  1. Jul 2025
    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors attempted to dissect the function of a long non-coding RNA, lnc-FANCI-2, in cervical cancer. They profiled lnc-FANCI-2 in different cell lines and tissues, generated knockout cell lines, and characterized the gene using multiple assays.

      Strengths:

      A large body of experimental data has been presented and can serve as a useful resource for the scientific community, including transcriptomics and proteomics datasets. The reported results also span different parts of the regulatory network and open up multiple avenues for future research.

      Thanks for your positive comments on the strengths.

      Weaknesses:

      The write-up is somewhat unfocused and lacks deep mechanistic insights in some places.

      As lnc-FANCI-2 is a novel lncRNA that had never been studied functionally, our report is the first to show that it regulates RAS signaling. This report therefore focuses on lnc-FANCI-2 and the RAS signaling pathway, but it also includes some screening data that are important for our readers to understand how we arrived at RAS signaling.

      Reviewer #2 (Public review):

      The study by Liu et al provides a functional analysis of lnc-FANCI-2 in cervical carcinogenesis, building on their previous discovery of FANCI-2 being upregulated in cervical cancer by HPV E7.

      The authors conducted a comprehensive investigation by knocking out (KO) FANCI-2 in CaSki cells and assessing viral gene expression, cellular morphology, altered protein expression and secretion, altered RNA expression through RNA sequencing (verification of which by RT-PCR is well appreciated), protein binding, etc. Verification experiments by RT-PCR, Western blot, etc are notable strengths of the study.

      The KO and KD were related to increased Ras signaling and EMT and reduced IFN-γ/α responses.

      Thanks for your positive comments. It took us a few years to reach this level of understanding of lnc-FANCI-2 function.

      Although the large amount of data is well acknowledged, it is a limitation that most data come from CaSki cells, in which FANCI-2 localization is different from SiHa cells and cancer tissues (Figure 1). The cytoplasmic versus nuclear localization is somewhat puzzling.

      lnc-FANCI-2 can be both cytoplasmic and nuclear in cervical cancer tissues, in HPV16- or HPV18-infected keratinocytes, and in the HPV16+ cervical cancer cell line CaSki, which contains multiple integrated HPV16 DNA copies. Surprisingly, it is detected mostly in the nucleus in HPV16+ SiHa cells, which contain only one copy of integrated HPV16 DNA (Yu, L., et al. mBio 15: e00729-24, 2024). Regardless of its localization, knockdown of lnc-FANCI-2 expression in SiHa cells induces RAS signaling, leading to an increase in the expression of p-AKT and p-Erk1/2 (suppl. Fig. S6B).

      Reviewer #3 (Public review):

      Summary:

      A long noncoding RNA, lnc-FANCI-2, was reported by this group to be regulated by the HPV E7 oncoprotein and a cellular transcription factor, YY1. The current study shows that the function of lnc-FANCI-2 in HPV-16 positive cervical cancer is to intrinsically regulate RAS signaling, thereby facilitating our further understanding of additional cellular alterations during HPV oncogenesis. The authors used advanced technical approaches such as KO, transcriptome, IRPCRP, and LC-MS/MS analyses in the current study and concluded that KO of lnc-FANCI-2 significantly increases RAS signaling, especially phosphorylation of Akt and Erk1/2.

      Strengths:

      (1) HPV E6E7 are required for full immortalization and maintenance of the malignant phenotype of cervical cancer, but they are NOT sufficient for full transformation and tumorigenesis. This study helps further understanding of other cellular alterations in HPV oncogenesis.

      (2) lnc-FANCI-2 is upregulated in cervical lesion progression from CIN1, CIN2-3 to cervical cancer, cancer cell lines, and HPV transduced cell lines.

      (3) Viral E7 of high-risk HPVs and host transcription factor YY1 are two major factors promoting lnc-FANCI-2 expression.

      (4) Proteomic profiling of cytosolic and secreted proteins showed inhibition of MCAM, PODXL2, and ECM1 and increased levels of ADAM8 and TIMP2 in KO cells.

      (5) RNA-seq analyses revealed that KO cells exhibited significantly increased RAS signaling but decreased IFN pathways.

      (6) Increased phosphorylated Akt and Erk1/2, IGFBP3, MCAM, VIM, and CCND2 (cyclin D2) and decreased RAC3 were observed in KO cells.

      Thanks for your positive comments. It has taken us almost nine years to reach this point to gradually understand lnc-FANCI-2 functions, which are more complex than our initial thoughts.  

      Weaknesses:

      (1) The authors observed increased lnc-FANCI-2 in HPV 16 and 18 transduced cells and in other cervical cancer tissues as well, but HPV-18 positive HeLa cells exhibited different expression of lnc-FANCI-2.

      Both HPV16 and HPV18 infections induce lnc-FANCI-2 expression in keratinocytes (Liu H., et al. PNAS, 2021). However, the HPV18+ cervical cancer cell lines HeLa and C4II (Figure S1A and S1B) do not express lnc-FANCI-2, similar to what we see in HPV-negative cell lines such as HCT116, HEK293, HaCaT, and BCBL1 cells. Although we don’t know why, our preliminary data from dual luciferase assays show that the lnc-FANCI-2 promoter functions well and is sensitive to YY1 binding in lnc-FANCI-2-expressing CaSki and C33A cells but is much less sensitive to YY1 binding in HeLa and HCT116 cells, indicating that some unknown cellular factors negatively regulate lnc-FANCI-2 promoter activity.

      Author response image 1.

      A firefly luciferase (FLuc) reporter containing either the wild-type (−600 wt) or the YY1-binding-site-mutated lnc-FANCI-2 promoter was evaluated for promoter activity in CaSki, HeLa, C33A, and HCT116 cells, with Renilla luciferase (RLuc) activity driven by a TK promoter serving as an internal control. The two YY1-binding motifs (A and B), with an X marking each mutation, are illustrated in the right diagram.
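      The internal-control normalization described in this legend can be sketched as follows. This is a minimal illustration only: the function name, the choice to express activity relative to the wild-type construct, and all readings are hypothetical and are not taken from the authors' data.

```python
def relative_promoter_activity(fluc: float, rluc: float,
                               fluc_ref: float, rluc_ref: float) -> float:
    """Normalize a reporter's FLuc signal to its RLuc internal control,
    then express it relative to a reference construct (assumed here to be
    the wild-type promoter)."""
    normalized = fluc / rluc              # corrects for transfection efficiency
    normalized_ref = fluc_ref / rluc_ref  # same correction for the reference
    return normalized / normalized_ref


# Hypothetical raw luminescence readings (arbitrary units), for illustration only.
wt_fluc, wt_rluc = 1000.0, 100.0
mut_fluc, mut_rluc = 500.0, 100.0

activity = relative_promoter_activity(mut_fluc, mut_rluc, wt_fluc, wt_rluc)
print(f"mutant promoter activity relative to wild type: {activity:.2f}")
```

      Dividing by the RLuc signal first means that a well with fewer transfected cells is not mistaken for one with a weaker promoter.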

      (2) Previous studies and data in the current study showed a steady increase in lnc-FANCI-2 during cancer progression; however, the authors did not observe significant changes in cell behaviors (both morphology and proliferation) in lnc-FANCI-2 KO cells.

      Thanks. We do see decreases in cell proliferation, colony formation, and cell migration, accompanied by increased cell senescence, in the lnc-FANCI-2 KO cells compared with the parental WT cells. These data are now added to the revised Fig. 1 and the revised supplemental Fig. S3.

      (3) The authors observed the significant changes of RAS signaling (downstream) in KO cells, but they provided limited interpretations of how these results contributed to full transformation or tumorigenesis in HPV-positive cancer.

      As we state in the title, lnc-FANCI-2 intrinsically restricts RAS signaling and phosphorylation of Akt and Erk in HPV16-infected cervical cancer. Presumably, high RAS-AKT-ERK signaling inhibits tumor cell survival by inducing senescence, as we show in our new Figure 1 and supplemental Fig. S3. A similar finding was reported in a lung cancer study (Patricia Nieto, et al. Nature 548: 239-243, 2017).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) A major issue is that parts of the manuscript read like a collection of experimental results. However, some of the results do not contribute directly to the central story. Besides confusing the reader, the large amount of apparently disparate results can raise more questions. For example:

      a) Why is lnc-FANCI-2 highly expressed in HPV16-infected cervical cancer cell lines (but not in HPV18-infected cells)?

      b) How do p53 and RB repress the expression of lnc-FANCI-2?

      c) What regulates the sub-cellular localization of lnc-FANCI-2?

      d) How does lnc-FANCI-2 negatively regulate RAS signalling?

      e) How does MAP4K4 bind to lnc-FANCI-2?

      f) Do lnc-FANCI-2 and MAP4K4 require each other to regulate RAS signalling?

      g) How does RAS signalling regulate the transcription of MCAM and IGFBP3?

      h) How does MCAM feedback on RAS? Do the different MCAM isoforms impact on RAS signalling differently?

      i) How does IGFBP3 feedback on ERK but not AKT?

      j) How do the other mentioned proteins like ADAM8 fit into the regulatory network?

      k) Each question will require a lot more work to address. I think it would be good if the authors could think through carefully what the key message(s) in the current manuscript should be and then present a more focused write-up.

      Thanks for the critical comments. Because this study is the first to explore lnc-FANCI-2 functions, we would like to be inclusive. We believe these data are important to guide any future studies. We really appreciate our reviewer listing many questions (a to k) related to HPV infection, cell biology, RAS signaling, and cancer biology. Addressing each question in a satisfactory way would require a separate study, but fortunately, our report points out such directions with some preliminary data for future studies. Below are our responses to each question from a to k:

      a) Both HPV16 and HPV18 infection induce lnc-FANCI-2 expression in keratinocytes (Liu H., et al. PNAS, 2021). However, the HPV18+ cervical cancer cell lines HeLa and C4II (Figure S1A and S1B) do not express lnc-FANCI-2, similar to what we see in HPV-negative cell lines such as HCT116, HEK293, HaCaT, and BCBL1 cells. Although we don’t know why, our preliminary data show that the lnc-FANCI-2 promoter functions well and is sensitive to YY1 binding in lnc-FANCI-2-expressing CaSki and C33A cells but is much less sensitive to YY1 in HeLa and HCT116 cells, indicating that some unknown cellular factors negatively regulate lnc-FANCI-2 promoter activity.

      b) We don’t know whether p53 and pRB repress the expression of lnc-FANCI-2, although C33A cells bearing a mutant p53 and a mutant pRB express a high amount of lnc-FANCI-2. However, KD of E2F1 had no effect on lnc-FANCI-2 promoter activity in CaSki cells (Liu, H., et al. PNAS, 2021).

      c) RNA cellular localization can be affected by many factors, including splicing, export, and polyadenylation. As lnc-FANCI-2 is a long non-coding RNA, the regulation of its cellular localization could be more complicated than that of mRNAs and thus could be a future research direction.

      d) The conclusion that lnc-FANCI-2 negatively regulates RAS signaling is based on both lnc-FANCI-2 KO and KD studies. Please see the proposed hypothetical model in Figure 8E.

      e) MAP4K4 binding to lnc-FANCI-2 was demonstrated by our IRPCRP-mass spectrometry (Fig. 8A and 8C), although the exact binding site on lnc-FANCI-2 was not explored. As you probably know, many enzymes have turned out to be RNA-binding proteins (Castello A., et al. Trends Endocrinol. Metab. 26: 746-757, 2015; Hentze MW., et al. Nat. Rev. Mol. Cell Biol. 19: 327-341, 2018).

      f) Yes, they rely slightly on each other in regulating RAS signaling. We found that KD of MAP4K4 in parental CaSki cells (Figure 8D) had a greater effect on RAS signaling (MCAM, IGFBP3, p-Akt) than KD of MAP4K4 in lnc-FANCI-2 KO ΔPr-A9 cells. In contrast, the latter displayed more p-Erk1/2 than that induced by KD of lnc-FANCI-2 in the parental CaSki cells (Figure S7C).

      g) We believe RAS signaling most likely regulates the transcription of MCAM and IGFBP3 through phosphorylated transcription factors (Figure 8E diagram).

      h) As a signaling molecule with at least 13 ligands/coreceptors (Joshkon A., et al. Biomedicines 8: 633, 2020), the increased MCAM appears to sustain RAS signaling (Fig. 7J and Fig. 8E). We assume that the full-length cytoplasmic MCAM plays a predominant role in RAS signaling because it is more abundant than the cleaved nuclear MCAM, which lacks both the transmembrane and cytoplasmic regions. In addition, RAS signaling mainly occurs in the cytosol.

      i) The exact mechanism remains unknown. lnc-FANCI-2 KO cells exhibit high expression levels of IGFBP3 RNA and protein and of p-Erk1/2, but less so of p-Akt, possibly because IGFBP3 regulates MAPK for Erk phosphorylation but has less effect on PI3K for Akt phosphorylation.

      j) The dysregulation of RAS signaling and ADAM protein activity is implicated in various cancers. ADAM proteins can modulate RAS signaling by cleaving and releasing ligands that activate or inactivate RAS-related pathways (Schafer B., et al. JBC 279: 47929-38, 2004; Ohtsu H., et al. Am J Physiol Cell Physiol 291: C1-C10, 2006; Dang M, et al. JBC 286: 17704-17713, 2011; Kleino I, et al. PLoS One 10: e0121301, 2015). Some ADAM proteins are involved in the migration and invasion of cancer cells, and their loss can promote the degradation of KRAS (Huang Y-K., et al. Nat Cancer 5: 400-419, 2024). In this revision, we include a brief discussion on ADAMs and RAS signaling.

      k) We agree with our reviewer that each question will require a lot more work to address. However, as this study explores lnc-FANCI-2 function for the first time, we prefer to keep the data that we selectively included in this write-up. We hope reviewer 1 will be satisfied with our response to each question from a to j.

      (2) Figures S1A & S1C - Replicates are needed.

      Yes, we have repeated all of the experiments. The quantification shown in Figure S1A and S1C was performed in triplicate, and error bars have been added to the updated figure.

      (3) Figure S1D - There seems to be some lnc-FANCI-2 RNA in the nucleus of CaSki cells as well. Please quantify the relative amount of lnc-FANCI-2 in the nucleus vs cytoplasm.

      Yes, a small fraction of lnc-FANCI-2 is in the nucleus of CaSki cells as we reported (Liu H., PNAS, 2021, Movies S1 and S2). We did quantify by fractionation and RT-qPCR the relative amount of lnc-FANCI-2 in the nucleus vs cytoplasm in Figure S1C. 

      (4) Figure S2B - (a) For ΔPr-A9 cells, it looks like there is an increase in E6 and a decrease in E7, instead of "little change" as the authors claimed. (b) I suggest checking the protein levels for all the control and KO clones.

      Thanks for the questions. We had some variation in E6 and E7 detection, and the submitted blot was one representative experiment. We regrew the lnc-FANCI-2 KO clones A9 and B3 and reexamined the expression of HPV16 E6/E7 proteins and their downstream targets, p53 and E2F1. As shown in new Figure S3A expt II, we again saw some variation in detection (~20-30%), and this variation is not reflected in a noticeable change in their downstream targets. Thus, we do not consider these changes significant enough to draw a conclusion in our study; they most likely result from sampling in the assays.

      (5) In the Proteome Profiler Human sReceptor Array analysis, multiple proteins were highlighted as having at least 30% change. But it is unclear how they relate to RAS signaling.

      Thanks for this comment. Cellular soluble receptors are essential for RAS signaling, the EMT pathway, and IFN responses. For example, the dysregulation of RAS signaling and ADAM protein activity is implicated in various cancers. ADAM proteins can modulate RAS signaling by cleaving and releasing ligands that activate or inactivate RAS-related pathways (Schafer B., et al. JBC 279: 47929-38, 2004; Ohtsu H., et al. Am J Physiol Cell Physiol 291: C1-C10, 2006; Dang M, et al. JBC 286: 17704-17713, 2011; Kleino I, et al. PLoS One 10: e0121301, 2015). Some ADAM proteins are involved in the migration and invasion of cancer cells, and their loss can promote the degradation of KRAS (Huang Y-K., et al. Nat Cancer 5: 400-419, 2024). In this revision, we include a brief discussion on ADAMs and RAS signaling.

      (6) Does knockdown of MAP4K4 lead to an increase in MCAM and IGFBP3?

      Yes, MAP4K4 KD in parental WT CaSki cells does lead to an increase in MCAM (~70%) and IGFBP3 (~30%), which is similar to the effect of lnc-FANCI-2 knockdown, as shown in the revised Figure 8D.

      Minor comments:

      (7) In the opinion of this reviewer the title is somewhat unwieldy.

      Thanks. We have shortened the title to “The lnc-FANCI-2 intrinsically restricts RAS signaling in HPV16-infected cervical cancer”.

      (8) The abstract can be more focused and doesn't have to mention so many gene names. In fact, the significance paragraph works better as an abstract. For the significance, the authors can provide another write-up on the implications of their research instead.

      Thanks. We have revised the abstract and added the implications of this research.

      (9) The last sentence of the introduction feels a little abrupt. It would be good to elaborate a little more on the key findings.

      Thanks for this critical comment. We have revised it as follows: In this report, we demonstrate that lnc-FANCI-2 in HPV16-infected cells controls RAS signaling by interaction with MAP4K4 and other RNA-binding proteins. Ablation of lnc-FANCI-2 in the cells promotes RAS signaling and phosphorylation of Akt and Erk. High levels of lnc-FANCI-2 and low levels of MCAM expression in cervical cancer patients correlate with improved survival, indicating that lnc-FANCI-2 plays a critical role in regulating RAS signaling to affect cervical cancer progression and patient outcomes.

      (10) Typo on line 191: Should be ADAM8 and not ADMA8.

      Corrected.

      Reviewer #2 (Recommendations for the authors):

      The paper contains a vast amount of data and would greatly benefit from an expanded version of the schematic of Figure 8E summarizing the main results. Including additional details on FANCI-2 regulation by HPV (primarily from previous studies) and its implications for HPV16-driven carcinogenesis would provide a more comprehensive overview.

      Thanks for the suggestion. We have modified our Figure 8E to include HR-HPV E7 and YY1 in regulation of lnc-FANCI-2 transcription.

      Further specific comments:

      (1) The introduction may be shortened to increase readability (e.g. lines 77-90; 94-105).

      We have shortened the introduction by deleting lines 94-105 of our initial submission.

      (2) Lines 55-57 the number of cervical cancer diagnoses and mortality need to be updated to the latest literature. The reference is from 2012.

      Thanks. We have revised and updated accordingly with a new citation (Bray F., et al: Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 74, 229-263 (2024))

      (3) Line 61: Progression rate of CIN3 is incorrect (31% in 30 years according to reference 5).

      Thanks. Corrected.

      (4) Lines 108-112 are difficult to understand and should be rewritten.

      Thanks. Revised accordingly.

      (5) Line 116 Is this correct or should 'but' be 'and'?

      Thanks. Corrected accordingly.

      (6) Figure 1A top: The difference between cervical cancer and normal areas is hard to see in the top figure. The region labeled as "normal" does not resemble typical differentiating epithelium or normal glandular epithelium, though this is difficult to assess accurately from the image provided. I suggest adding HE staining and also the histotypes.

      We have added an H&E staining panel in the corresponding region to Figure 1A, which clearly shows the normal and cancer regions. Both cervical cancer tissues were cervical squamous cell carcinoma.

      (7) HFK-HPV16 & 18 cells (Figure 1B) are not described in the Materials & Methods.

      Thanks. We revised our Materials and Methods by citing our two previous publications.

      (8) Figure 2E (RNA scope on FANCI-2 KO) only shows 2 to 3 cells, which makes it somewhat difficult to assess downregulated expression in the KO. I suggest replacing these with pictures showing more cells (i.e. >10) to strengthen the results.

      We have replaced the image in Figure 2E to include more cells.

      (9) The spindle-like morphology in deltaPr-A9 cells shown in FigS2A is not very distinct. Including images at higher magnification could help clarify this feature.

      Good comment. We have enlarged the images for a better view and revised the text.

      (10) Both protein and RNA expression analysis have been performed on WT CaSki cells and FANCI-2 KO cells. If I am correct there is little overlap between the significantly changed gene products. What does this mean? Have you looked into the comparison?

      The DEGs identified from RNA-seq indicated a genome-wide transcriptome change, while the protein array we used covered only 105 soluble protein receptors. However, we did find that 9 of 15 (60%) membrane proteins in cell lysates (PODXL2, ECM1, NECTIN2, MCAM, ADAM9, CDH5, ADAM10, ITGA5, NOTCH1, SCARF2, ADAM8, TIMP2, LGALS3BP, CDH13, and ITGB6) exhibited consistent changes in expression (underlined) in both the RNA-seq and protein array assays. We have revised the text with this information (page 11). The inconsistent expression of the other six proteins (40%) between the two assays could be due to post-translational mechanisms, such as protein stability, modification, and secretion.

      (11) Figure S7, which represents TCGA data and survival is quite complex. It would be more effective to display a similar figure for FANCI-2, as was done for MCAM in Figure 7I, to simplify the comparison and enhance clarity.

      Thanks. However, the suggested figure for lnc-FANCI-2 was already published in our PNAS paper (Liu H., et al. PNAS, 2021). Figure S8 in this revision is the result from our in-house GradientScanSurv pipeline, a new way to correlate expression and survival more accurately.

      What do the Figures look like if you analyse only HPV16+ patients versus HPV18+ patients, considering that FANCI-2 upregulation in cell lines is related to HPV16 and not 18? Is there an effect of histotype? Or tumor stage?

      HPV18-infected keratinocytes express a high level of lnc-FANCI-2. The lack of lnc-FANCI-2 expression in the two HPV18<sup>+</sup> cell lines HeLa and C4II and in HPV-negative cell lines such as HCT116 could be due to the presence of some unknown repressive factors. In our dual luciferase assays, we found that the lnc-FANCI-2 promoter responds well to YY1 binding in CaSki and C33A cells, which express lnc-FANCI-2, but does not do so in HeLa and HCT116 cells.

      (12) It remains puzzling that FANCI-2 upregulation was previously shown to already occur in CIN lesions and increase further in cervical cancer, while the current data indicate that FANCI-2 suppresses AKT activation. If I am correct Akt activation has been linked to cervical carcinogenesis. Similarly, line 434 states that increased MCAM might promote cervical tumorigenesis, implying that low FANCI-2 would stimulate tumorigenesis. If I understand correctly, the increase in FANCI-2 observed in CIN lesions would reflect a "brake" on the carcinogenic pathway and its sustained increase in cancer might indicate that growth is still (partly) controlled. As mentioned earlier, a Figure illustrating the relation between FANCI-2, HPV, and the carcinogenic process would be beneficial for clarity.

      Yes. Increased MCAM, but low level of lnc-FANCI-2, correlates with poor cervical cancer survival. We have revised Figure 8E to illustrate this relation better.  

      (13) May part of the potentially conflicting findings be explained by CaSki cells being of metastatic origin? Related to this, does the expression of FANCI-2 or MCAM depend on the tumor stage?

      Thanks for this important suggestion. Unfortunately, we found that the expression of lnc-FANCI-2 and MCAM is not associated with cervical cancer stage based on the TCGA data (http://gepia.cancer-pku.cn/index.html). See the data below:

      Author response image 2.

      Despite some lingering uncertainty, the extensive experiments conducted using KO and KD cells do provide compelling evidence that lnc-FANCI-2 function is linked to RAS signaling and EMT.

      Thanks for your positive review and instructive comments.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors observed increased lnc-FANCI-2 in HPV 16 and 18 transduced cells and in other cervical cancer tissues as well, but HPV-18 positive HeLa cells exhibited different expression of lnc-FANCI-2. I suggest the authors provide more discussion of this difference, for example, HPV genotypes, HPV genome status in host cells, or cell types.

      Thanks. We found that keratinocyte infection with HPV16, HPV18, and other HR-HPVs could induce lnc-FANCI-2 expression (Liu H., et al. PNAS, 2021). In this report, we found that HPV18<sup>+</sup> HeLa and C4II cells and other HPV-negative cell lines do not express it. Our preliminary data from lnc-FANCI-2 promoter activity assays indicated the presence of negative regulatory factor(s) in cells that do not express lnc-FANCI-2. See the data in Author response image 1.

      We have revised our discussion to include these luciferase data as data not shown.

      (2) I suggest the authors discuss more details on how the changes of RAS signaling in KO cells help our further understanding of the molecular mechanisms for HPV-associated full-cell transformation and malignancy in addition to the well-known functions of HPV E6 and E7.

      Thanks. We have modified Figure 8E as suggested by reviewer 2 and revised the discussion further.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Detecting unexpected epistatic interactions among multiple mutations requires a robust null expectation - or neutral function - that predicts the combined effects of multiple mutations on phenotype, based on the effects of individual mutations. This study assessed the validity of the product neutrality function, where the fitness of double mutants is represented as the multiplicative combination of the fitness of single mutants, in the absence of epistatic interactions. The authors utilized a comprehensive dataset on fitness, specifically measuring yeast colony size, to analyze epistatic interactions.
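      The product neutrality function summarized above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: scoring epistasis as the difference between observed and expected double-mutant fitness is one common convention assumed here, and the fitness values are invented for the example (wild type = 1).

```python
def product_expectation(w_a: float, w_b: float) -> float:
    """Expected double-mutant fitness under the product neutrality function."""
    return w_a * w_b


def epistasis(w_a: float, w_b: float, w_ab: float) -> float:
    """Deviation of the observed double-mutant fitness from the product
    expectation (assumed convention: negative = more severe than expected)."""
    return w_ab - product_expectation(w_a, w_b)


# Hypothetical relative fitness values, not taken from the dataset.
w_a, w_b, w_ab = 0.8, 0.5, 0.3

expected = product_expectation(w_a, w_b)
score = epistasis(w_a, w_b, w_ab)
print(f"expected double-mutant fitness: {expected:.2f}")
print(f"epistasis score: {score:+.2f}")
```

      A score near zero indicates that the two mutations combine as the null model predicts; the sign of a nonzero score separates negative from positive interactions.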

      The study confirmed that the product function outperformed other neutral functions in predicting the fitness of double mutants, showing no bias between negative and positive epistatic interactions. Additionally, in the theoretical portion of the study, the authors applied a well-established theoretical model of bacterial cell growth to simulate the growth rates of both single and double mutants under various parameters. The simulations further demonstrated that the product function was superior to other functions in predicting the fitness of hypothetical double mutants. Based on these findings, the authors concluded that the product function is a robust tool for analyzing epistatic interactions in growth fitness and effectively reflects how growth rates depend on the combination of multiple biochemical pathways.

      Strengths:

      By leveraging a previously published extensive dataset of yeast colony sizes for single- and double-knockout mutants, this study validated the relevance of the product function, commonly used in genetics to analyze epistatic interactions. The finding that the product function provides a more reliable prediction of double-mutant fitness compared to other neutral functions offers significant value for researchers studying epistatic interactions, particularly those using the same dataset.

      Notably, this dataset has previously been employed in studies investigating epistatic interactions using the product neutrality function. The current study's findings affirm the validity of the product function, potentially enhancing confidence in the conclusions drawn from those earlier studies. Consequently, both researchers utilizing this dataset and readers of previous research will benefit from the confirmation provided by this study's results.

      Weaknesses:

      This study exhibits several significant logical flaws, primarily arising from the following issues: a failure to differentiate between distinct phenotypes, instead treating them as identical; an oversight of the substantial differences in the mechanisms regulating cell growth between prokaryotes and eukaryotes; and the adoption of an overly specific and unrealistic set of assumptions in the mutation model. Additionally, the study fails to clearly address its stated objective of investigating the mechanistic origin of the multiplicative model. Although it discusses conditions under which deviations occur, it falls short of achieving its primary goal. Moreover, the paper includes misleading descriptions and unsubstantiated reasoning, presented without proper citations, as if they were widely accepted facts. Readers should consider these issues when evaluating this paper. Further details are discussed below.

      (1) Misrepresentation of the dataset and phenotypes

      The authors analyze a dataset on the fitness of yeast mutants, describing it as representative of the Malthusian parameter of an exponential growth model. However, they provide no evidence to support this claim. They assert that the growth of colony size in the dataset adheres to exponential growth kinetics; in contrast, it is known to exhibit linear growth over time, as indicated in [Supplementary Note 1 of https://doi.org/10.1038/nmeth.1534]. Consequently, fitness derived from colony size should be recognized as a different metric and phenotype from the Malthusian parameter. Equating these distinct phenotypes and fitness measures constitutes a fundamental error, which significantly compromises the theoretical discussions based on the Malthusian parameter in the study.

      The reviewer is correct in pointing out that colony-size measurements are distinct from exponential growth kinetics. We acknowledge that our original text implied that the dataset directly measured the exponential growth rate (Malthusian parameter), when in fact it measured yeast colony expansion rates on solid media. Colony growth under these conditions often follows a biphasic pattern: there is typically an initial microscopic phase in which cells can grow exponentially, but as the colony expands further, the growth dynamics become more linear (Meunier and Choder 1999). We have revised our text to state clearly what the experiment measured.

      However, while colony size does not exhibit exponential growth kinetics, several studies have argued that the rate of colony expansion is related to the exponential growth rate of cells growing in non-limiting nutrient conditions in liquid culture. This is because colony growth is dominated by cells at the colony boundaries that have access to nutrients and are in exponential growth. Cells in the colony interior lack nutrients and therefore contribute little to colony growth. This has been shown both in theoretical and experimental studies, finding that the linear growth rate of the colony is directly linked to the single-cell exponential growth rate (Pirt 1967; Gray and Kirwan 1974; Korolev et al. 2012; Gandhi et al. 2016; Meunier and Choder 1999). In particular, the above studies suggest that the linear colony growth rate is directly proportional to the square root of the exponential growth rate. Therefore, one would expect that the validity of the product model for one fitness measure implies its validity for the other measure. In addition, colony size was found to be highly correlated with the exponential growth rate of cells in non-limiting nutrients in liquid culture (Baryshnikova et al. 2010; Zackrisson et al. 2016; Miller et al. 2022). For these reasons, we treated the colony size and exponential growth rate as interchangeable in our original manuscript. 
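      The step from one fitness measure to the other can be checked numerically. The sketch below assumes, as the cited studies suggest, that the linear colony expansion rate is proportional to the square root of the exponential growth rate; the function name and the growth-rate values are hypothetical. Because the square root of a product equals the product of the square roots, a double mutant that obeys the product model on the growth-rate scale also obeys it on the colony scale.

```python
import math


def colony_fitness(rel_growth_rate: float) -> float:
    """Relative colony expansion rate, assuming it scales as the square root
    of the relative exponential growth rate (both normalized to wild type)."""
    return math.sqrt(rel_growth_rate)


# Hypothetical relative exponential growth rates (wild type = 1).
w_a, w_b = 0.64, 0.25
w_ab = w_a * w_b  # double mutant that obeys the product model exactly

f_a, f_b, f_ab = (colony_fitness(w) for w in (w_a, w_b, w_ab))

# Deviation from the product model on the colony-based scale.
deviation = f_ab - f_a * f_b
print(f"deviation from product model on colony scale: {deviation:.2e}")
```

      The same argument does not carry over to an additive neutrality function, since the square root of a sum is not the sum of the square roots.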

      To address the important point raised by the reviewer, we now explain more clearly in the text what the analyzed data on colony size show and why we believe they reflect the exponential growth rate. Finally, we note that our results supporting the product neutrality function are consistent with the work of Mani et al. (2008), which used smaller datasets based on liquid culture growth rates (Jasnos and Korona 2007; Onge et al. 2007).

      The text in Section 2.3 now reads:

      “Having verified empirically that the Product neutrality function is supported by the latest data for cell proliferation, we now turn our attention to its origins. Addressing this question requires some mechanistic model of biosynthesis. However, most mechanistic models of growth apply directly to single cells in rich nutrient conditions, which may not directly apply to the SGA measurements of colony expansion rates. In particular, colony growth has been shown to follow a biphasic pattern (Meunier et al. 1999). A first exponential phase is followed by a slower linear phase as the colony expands. Previous modeling and empirical work indicates that this second linear expansion rate reflects the underlying exponential growth of cells in the periphery of the colony (Pirt 1967; Gray et al. 1974; Gandhi et al. 2016; Baryshnikova et al. 2010; Zackrisson et al. 2016; Miller et al. 2022). More precisely, mathematical models show the linear colony-size expansion rate is directly proportional to the square root of the exponential growth rate under non-limiting conditions. Intuitively, this relationship arises because colony growth is dominated by the expansion of the population of cells in an annulus at the colony border that are exposed to rich nutrient conditions. These cells expand at a rate similar to the exponential rate of cells growing in a rich nutrient liquid culture. In contrast, the cells in the interior of the colony experience poor nutrient conditions, grow very slowly, and do not contribute to colony growth.

      This intimate relationship between both proliferation rates allows us to explore the origin of the Product neutrality function in mechanistic models of cell growth. Indeed, if colony-based fitnesses follow a Product model, then

      where the superscript c indicates colony-based values for the fitness W and the growth rate λ. Taking into account the relationship between single-cell exponential growth rates and colony growth rates, we can write

      where the superscript l denotes liquid cultures. Combining these expressions, we obtain

      In other words, from the perspective of the Product neutrality function, fitnesses based on colony expansion rates are equivalent to fitnesses based on single-cell exponential growth rates. The prevalence of the Product neutrality model—both in the SGA data and in previous studies on datasets from liquid cultures (Jasnos et al. 2007; Onge et al. 2007; Mani et al. 2008)—encourages the exploration of its origin in mechanistic models of cell growth.”
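      The chain of reasoning in the passage quoted above can be sketched as follows. This is a minimal sketch, assuming that fitness is the growth rate normalized to wild type and that the proportionality constant between the colony expansion rate and the square root of the liquid growth rate (written here as a hypothetical constant \alpha) is the same for all genotypes; the notation is illustrative and may differ from that of the manuscript.

```latex
% Assumption: W_i = \lambda_i / \lambda_{wt} for each genotype i (1, 2, or 12).
% Assumption: \lambda^{c}_{i} = \alpha \sqrt{\lambda^{l}_{i}}, with \alpha genotype-independent.
\begin{align}
  % Product model stated for colony-based fitnesses
  W^{c}_{12} &= W^{c}_{1}\, W^{c}_{2},
  \qquad W^{c}_{i} \equiv \frac{\lambda^{c}_{i}}{\lambda^{c}_{\mathrm{wt}}}, \\
  % Square-root relationship between colony and liquid growth rates
  W^{c}_{i} &= \frac{\alpha\sqrt{\lambda^{l}_{i}}}{\alpha\sqrt{\lambda^{l}_{\mathrm{wt}}}}
             = \sqrt{W^{l}_{i}}, \\
  % Substitute into the first line and square both sides
  \sqrt{W^{l}_{12}} &= \sqrt{W^{l}_{1}}\,\sqrt{W^{l}_{2}}
  \;\Longrightarrow\;
  W^{l}_{12} = W^{l}_{1}\, W^{l}_{2}.
\end{align}
```

      The constant \alpha cancels in the normalization to wild type, which is why the argument does not depend on its value.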

      (2) Misapplication of prokaryotic growth models

      The study attempts to explain the mechanistic origin of the multiplicative model observed in yeast colony fitness using a bacterial cell growth model, particularly the Scott-Hwa model. However, the application of this bacterial model to yeast systems lacks valid justification. The Scott-Hwa model is heavily dependent on specific molecular mechanisms such as ppGpp-mediated regulation, which plays a crucial role in adjusting ribosome expression and activity during translation. This mechanism is pivotal for ensuring the growth-dependency of the ribosome fraction in the proteome, as described in [https://doi.org/10.1073/pnas.2201585119]. Unlike bacteria, yeast cells do not possess this regulatory mechanism, rendering the direct application of bacterial growth models to yeast inappropriate and potentially misleading. This fundamental difference in regulatory mechanisms undermines the relevance and accuracy of using bacterial models to infer yeast colony growth dynamics.

      If the authors intend to apply a growth model with macroscopic variables to yeast double-mutant experimental data, they should avoid simply repurposing a bacterial growth model. Instead, they should develop and rigorously validate a yeast-specific growth model before incorporating it into their study.

      Nothing in the Scott-Hwa model is specific to prokaryotes. It does not include the ppGpp mechanism that regulates the ribosome fraction in bacteria and that does not exist in eukaryotes. The general features of the model, such as the proportionality of the ribosome fraction to the growth rate, have indeed been validated in yeast (Metzl-Raz et al. 2017; Elsemman et al. 2022; Xia et al. 2022). Performing a detailed physiological analysis of budding yeast across varying growth conditions in order to build a more extensive model is beyond the scope of this work. Finally, we note that the Weiße model, which we also analyzed, is also generic and has replicated empirical measurements from both bacteria and yeast (Weiße et al. 2015).

      To clarify this point in the text, we have added the following to Section 2.3: 

      “Experimental measurements in other organisms suggest that the observations leading to this model, including that the cellular ribosome fraction increases with growth rate, are in fact generic and also seen in the yeast S. cerevisiae (Metzl-Raz et al. 2017; Elsemman et al. 2022; Xia et al. 2022).”

      (3) Overly specific assumptions in the theoretical model

      The theoretical model in question assumes that two mutations affect only independent parameters of specific biochemical processes, an overly restrictive premise that undermines its ability to broadly explain the occurrence of the multiplicative model in mutations. Additionally, experimental evidence highlights significant limitations to this approach. For example, in most viable yeast deletion mutants with reduced growth rates, the expression of ribosomal proteins remains largely unchanged, in direct contradiction to the predictions of the Scott-Hwa model, as indicated in [https://doi.org/10.7554/eLife.28034]. This discrepancy emphasizes that the Scott-Hwa model and its derivatives do not reliably explain the growth rates of mutants based on current experimental data, suggesting that these models may need to be reevaluated or alternative theories developed to more accurately reflect the complex dynamics of mutant growth.

      In the data from the Barkai lab referenced by the reviewer (reproduced below), we see that the ribosomal transcript fraction is in fact proportional to growth rate in response to gene deletions, in contradiction to the reviewer’s interpretation. However, it is notable that the ribosomal transcript fraction is somewhat higher for a given growth rate if that growth rate results from a mutation rather than from a suboptimal nutrient condition. We know that the very simple Scott-Hwa model is not a perfect representation of the cell. Nevertheless, it does recapitulate important aspects of growth physiology, and we therefore thought it useful to analyze its response to mutations and compare those responses to the different neutrality functions. We never claimed the Scott-Hwa model was a perfect model and fully agree with the referee’s statement above that “... these models may need to be reevaluated, or alternative theories developed to more accurately reflect the complex dynamics of mutant growth.” Indeed, we say as much in our discussion, where we wrote:

      “While we focused on coarse-grained models for their simplicity and mechanistic interpretability, they might be too simple to effectively model large double-mutant datasets and the resulting double-mutant fitness distributions. We therefore expect the combination of high throughput genetic data with the analysis of larger-scale models, for instance based on Flux Balance Analysis, Metabolic Control Analysis, or whole-cell modeling, to lead to important complementary insights regarding the regulation of cell growth and proliferation.”

      To further clarify this point, we now discuss and cite the Barkai lab data for gene deletions (see Figure 2 of Metzl-Raz et al. 2017).

      (4) Lack of clarity on the mechanistic origin of the multiplicative model

      The study falls short of providing a definitive explanation for its primary objective: elucidating the "mechanistic origin" of the multiplicative model. Notably, even in the simplest case involving the Scott-Hwa model, the underlying mechanistic basis remains unexplained, leaving the central research question unresolved. Furthermore, the study does not clearly specify what types of data or models would be required to advance the understanding of the mechanistic origin of the multiplicative model. This omission limits the study's contribution to uncovering the biological principles underlying the observed fitness patterns.

      We appreciate the reviewer’s interest in a more complete mechanistic explanation for the product model of fitness. The primary goal of this study was to explore the validity of the Product model from the perspective of coarse-grained models of cell growth, and to extract mechanistic insights where possible. We view our work as a first step toward a deeper understanding of how double-mutant fitnesses combine, rather than a final, all-encompassing theory. As the referee notes, we are limited by the current state of the field, which has an incomplete understanding of cell growth. 

      Nonetheless, our analysis does propose concrete, mechanistically informed explanations. For example, we highlight how growth-optimizing feedback—such as cells’ ability to reallocate ribosomes or adjust proteome composition—naturally leads to multiplicative rather than additive or minimal fitness effects. We also link the empirical deviations from pure multiplicative behavior to differences in how specific pathways re-balance under perturbation, and we suggest that a product-like rule emerges when multiple interconnected processes each partially limit cell growth.

      In the discussion, we clarify what additional data and models we think will be required to advance this question. Namely, we propose extending our approach through larger-scale, more detailed modeling frameworks that may include explicit modeling of ppGpp or TOR activities in bacteria or eukaryotic cells, respectively. We also emphasize the importance of refining the measurement of cell growth rates to uncover subtle deviations from the product rule that could yield greater mechanistic insight. By integrating high-throughput genetic data with next-generation computational models, it should be possible to home in on the specific biological principles (e.g., metabolic bottlenecks, resource reallocation) that underlie the multiplicative neutrality function.

      Reviewer #2 (Public review):

      The paper deals with the important question of gene epistasis, focusing on asking what is the correct null model for which we should declare no epistasis.

      In the first part, they use the Synthetic Genetic Array dataset to claim that the effects of a double mutation on growth rate are well predicted by the product of the individual effects (much more than e.g. the additive model). The second (main) part shows this is also the prediction of two simple, coarse-grained models for cell growth.

      I find the topic interesting, the paper well-written, and the approach innovative.

      One concern I have with the first part is that they claim that:

      "In these experiments, the colony area on the plate, a proxy for colony size, followed exponential growth kinetics. The fitness of a mutant strain was determined as the rate of exponential growth normalized to the rate in wild type cells."

      There are many works on "range expansions" showing that colonies expand at a constant velocity, the speed of which scales as the square root of the growth rate (these are called "Fisher waves", predicted in the 1940s, and there are many experimental works on them, e.g. https://www.pnas.org/doi/epdf/10.1073/pnas.0710150104). If that's the case, the area of the colony should be proportional to growth_rate × time^2, rather than exp(growth_rate*time), so the fitness they might be using here could be the log(growth_rate) rather than growth_rate itself? That could potentially have a big effect on the results.

      We thank the reviewer for their thoughtful remarks. As they rightly pointed out, a large body of literature supports that colonies expand at constant velocity both from a theoretical and experimental standpoint. 

      As discussed in the answer to the first question of Reviewer 1, this body of work also suggests that the linear expansion rate of the colony front is directly related to the single-cell exponential growth rate of the cells at the periphery. Hence, although the macroscopic colony growth may not be exponential in time, measuring colony size (or radial expansion) across different genotypes still provides a consistent and meaningful proxy for comparing their underlying growth capabilities. 

      In particular, these studies suggest (consistent with Fisher-wave theory) that the linear growth rate of the colony 𝐾 is proportional to the square root of the exponential growth rate 𝜆. Under the assumption that the product model is valid for a given double mutant and for the exponential growth rate, we would have that

      The associated wave-front velocities would then be predicted to be

      In other words, if the product model is valid for fitness measures based on exponential growth rates, it should also be valid for fitness measures based on linear colony growth rates. 
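      This claim can be checked numerically with a minimal sketch, assuming the square-root relationship K = alpha * sqrt(lambda) with a genotype-independent prefactor. The growth rates, the value of alpha, and the function names below are hypothetical and chosen for illustration only; they are not taken from the SGA data. The sketch also shows that the same invariance does not hold for an additive rule.

```python
import math


def colony_rate(lam, alpha=2.5):
    """Linear colony expansion rate, ASSUMING K = alpha * sqrt(lam)."""
    return alpha * math.sqrt(lam)


def fitness(rate, rate_wt):
    """Fitness defined as a rate normalized to the wild-type rate."""
    return rate / rate_wt


# Hypothetical exponential growth rates (arbitrary units).
lam_wt, lam_1, lam_2 = 0.40, 0.30, 0.20

# Double mutant built to obey the product model on exponential growth rates.
w1, w2 = fitness(lam_1, lam_wt), fitness(lam_2, lam_wt)
lam_12_product = lam_wt * w1 * w2

# Same quantities expressed as colony expansion rates.
k_wt = colony_rate(lam_wt)
wk1 = fitness(colony_rate(lam_1), k_wt)
wk2 = fitness(colony_rate(lam_2), k_wt)
wk12_product = fitness(colony_rate(lam_12_product), k_wt)

# Deviation from the product rule, measured on colony-based fitnesses.
product_residual_colony = wk12_product - wk1 * wk2

# Contrast: a double mutant that is additive on exponential growth rates
# is generally NOT additive on colony-based fitnesses.
lam_12_additive = lam_wt * (w1 + w2 - 1.0)
wk12_additive = fitness(colony_rate(lam_12_additive), k_wt)
additive_residual_colony = wk12_additive - (wk1 + wk2 - 1.0)

print(f"product residual on colony fitness:  {product_residual_colony:.2e}")
print(f"additive residual on colony fitness: {additive_residual_colony:.3f}")
```

      The product residual vanishes up to floating-point error because the square root of a product is the product of the square roots, whereas the additive residual is clearly nonzero for these illustrative values.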

      We now include this discussion in the revised version of Section 2.3.

      Additional comments/questions:

      (1) What is the motivation for the model where the effect of two genes is the minimum of the two?

      The motivation for the minimal model is the notion that there might be a particular process that is rate-limiting for growth due to a mutation. In this case, a mutation in process X makes it really slow and process Y proceeds in parallel and has plenty of time to finish its job before cell division takes place. In this case, even a mutation to process Y might not slow down growth because there is an excess amount of time for it to be completed. Thus, the double mutant might then be anticipated to have the growth rate associated with the single mutation to process X. We now add a similar description when we introduce the different neutrality functions in Section 2.1.
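      The neutrality functions compared in this response can be sketched as follows. This is a minimal illustration, assuming the standard definitions for single-mutant fitnesses relative to wild type (as in Mani et al. 2008); the function names and the example fitness values are hypothetical and are not taken from the SGA data.

```python
def product(w_x, w_y):
    """Product neutrality function: fitness defects combine multiplicatively."""
    return w_x * w_y


def additive(w_x, w_y):
    """Additive neutrality function: fitness defects (1 - W) add up."""
    return w_x + w_y - 1.0


def minimum(w_x, w_y):
    """Minimum neutrality function: the slower (rate-limiting) process sets the pace."""
    return min(w_x, w_y)


# Hypothetical single-mutant fitnesses: process X is strongly impaired,
# process Y only mildly impaired.
w_x, w_y = 0.5, 0.9

expected = {
    "product": product(w_x, w_y),
    "additive": additive(w_x, w_y),
    "minimum": minimum(w_x, w_y),
}

for name, value in expected.items():
    print(f"{name:>8}: expected double-mutant fitness = {value:.2f}")
```

      Under the minimum function, the expected double-mutant fitness equals that of the single mutant in the rate-limiting process X, which matches the intuition described above: the second mutation adds no further slowdown.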

      (2) How seriously should we take the Scott-Hwa model? Should we view it as a toy model to explain the phenomenon or more than that? If the latter, then since the number of categories in the GO analysis is much more than two (47?) in many cases the analysis of the experimental data would take pairs of genes that both affect one process in the Scott-Hwa model - and then the product prediction should presumably fail? The same comment applies to the other coarse-grained model.

      From our perspective, models like the Scott-Hwa model constitute the simplest representation of growth based on data that is not trivial. Moreover, the Scott-Hwa model is able to incorporate interactions between two different biological processes. We believe models, like the Scott-Hwa and Weiße models, should be viewed as more than mere toy models because they have been backed up by some empirical data, such as that showing the ribosome fraction increases with growth rate. However, the Scott-Hwa model is inherently limited by its low dimensionality and relative simplicity. We do not claim that such models can provide a full picture of the cell. As argued in the main text, we have chosen to focus on such models because of their tractability and in the hope of extracting general principles. We nonetheless agree with the reviewer that they do not have the capacity to represent interactions between genes in the same biological process. We now note this limitation in the text. 

      (3) There are many works in the literature discussing additive fitness contributions, including Kauffman's famous NK model as well as spin-glass-type models (e.g. Guo and Amir, Science Advances 2019, Reddy and Desai, eLife 2021, Boffi et al., eLife 2023). These should be addressed in this context.

      We thank the reviewer for pointing out this part of the literature. We do believe these works constitute a relevant body of work tackling the emergence of epistasis patterns from a theoretical grounding, and now reference and discuss them in the text. 

      (4) The experimental data is for deletions, but it would be interesting to know the theoretical model's prediction for the expected effects of beneficial mutations and how they interact since that's relevant (as mentioned in the paper) for evolutionary experiments. Perhaps in this case the question of additive vs. multiplicative matters less since the fitness effects are much smaller.

      This is an interesting question. Since gene deletions and other systematic perturbations rarely generate mutations that increase the growth rate, we did not focus on them. Of course, as the reviewer notes, in the case of evolution experiments, these fitness-enhancing mutations are selected for. To address the reviewer's question, we can first consider the Scott-Hwa model. In this case, the analytical solution remains valid for fitness-enhancing mutations, so that the fitness of the double mutant will be the product neutrality function multiplied by an additional interaction term (see Figure 3). The mathematical derivation predicts that the double mutant fitness can potentially grow indefinitely. Indeed, the denominator can be equal to zero in some cases. In simulations, we see that the observation for deleterious mutations does not seem to hold for beneficial mutations (new supplementary Figure S5 shown below). Indeed, no model seems to replicate double mutant fitnesses much better than any other. This suggests that the growth-optimizing feedback we discuss in Section 2.3 may have compound effects that ultimately make double-mutant fitnesses much larger than any model predicts.

      We recognize this may be an important point, and discuss it in detail in the revised Section 2.3 as well as in the discussion.

      Baryshnikova, Anastasia, Michael Costanzo, Scott Dixon, Franco J. Vizeacoumar, Chad L. Myers, Brenda Andrews, and Charles Boone. 2010. “Synthetic Genetic Array (SGA) Analysis in Saccharomyces Cerevisiae and Schizosaccharomyces Pombe.” Methods in Enzymology 470 (March):145–79.

      Elsemman, Ibrahim E., Angelica Rodriguez Prado, Pranas Grigaitis, Manuel Garcia Albornoz, Victoria Harman, Stephen W. Holman, Johan van Heerden, et al. 2022. “Whole-Cell Modeling in Yeast Predicts Compartment-Specific Proteome Constraints That Drive Metabolic Strategies.” Nature Communications 13 (1): 801.

      Gandhi, Saurabh R., Eugene Anatoly Yurtsev, Kirill S. Korolev, and Jeff Gore. 2016. “Range Expansions Transition from Pulled to Pushed Waves as Growth Becomes More Cooperative in an Experimental Microbial Population.” Proceedings of the National Academy of Sciences of the United States of America 113 (25): 6922–27.

      Gray, B. F., and N. A. Kirwan. 1974. “Growth Rates of Yeast Colonies on Solid Media.” Biophysical Chemistry 1 (3): 204–13.

      Jasnos, Lukasz, and Ryszard Korona. 2007. “Epistatic Buffering of Fitness Loss in Yeast Double Deletion Strains.” Nature Genetics 39 (4): 550–54.

      Korolev, Kirill S., Melanie J. I. Müller, Nilay Karahan, Andrew W. Murray, Oskar Hallatschek, and David R. Nelson. 2012. “Selective Sweeps in Growing Microbial Colonies.” Physical Biology 9 (2): 026008.

      Mani, Ramamurthy, Robert P. St Onge, John L. Hartman 4th, Guri Giaever, and Frederick P. Roth. 2008. “Defining Genetic Interaction.” Proceedings of the National Academy of Sciences of the United States of America 105 (9): 3461–66.

      Metzl-Raz, Eyal, Moshe Kafri, Gilad Yaakov, Ilya Soifer, Yonat Gurvich, and Naama Barkai. 2017. “Principles of Cellular Resource Allocation Revealed by Condition-Dependent Proteome Profiling.” eLife 6 (August). https://doi.org/10.7554/elife.28034.

      Meunier, J. R., and M. Choder. 1999. “Saccharomyces Cerevisiae Colony Growth and Ageing: Biphasic Growth Accompanied by Changes in Gene Expression.” Yeast (Chichester, England) 15 (12): 1159–69.

      Miller, James H., Vincent J. Fasanello, Ping Liu, Emery R. Longan, Carlos A. Botero, and Justin C. Fay. 2022. “Using Colony Size to Measure Fitness in Saccharomyces Cerevisiae.” PLoS ONE 17 (10): e0271709.

      Onge, Robert P. St, Ramamurthy Mani, Julia Oh, Michael Proctor, Eula Fung, Ronald W. Davis, Corey Nislow, Frederick P. Roth, and Guri Giaever. 2007. “Systematic Pathway Analysis Using High-Resolution Fitness Profiling of Combinatorial Gene Deletions.” Nature Genetics 39 (2): 199–206.

      Pirt, S. J. 1967. “A Kinetic Study of the Mode of Growth of Surface Colonies of Bacteria and Fungi.” Journal of General Microbiology 47 (2): 181–97.

      Weiße, Andrea Y., Diego A. Oyarzún, Vincent Danos, and Peter S. Swain. 2015. “Mechanistic Links between Cellular Trade-Offs, Gene Expression, and Growth.” Proceedings of the National Academy of Sciences of the United States of America 112 (9): E1038–47.

      Xia, Jianye, Benjamin J. Sánchez, Yu Chen, Kate Campbell, Sergo Kasvandik, and Jens Nielsen. 2022. “Proteome Allocations Change Linearly with the Specific Growth Rate of Saccharomyces Cerevisiae under Glucose Limitation.” Nature Communications 13 (1): 2819.

      Zackrisson, Martin, Johan Hallin, Lars-Göran Ottosson, Peter Dahl, Esteban Fernandez-Parada, Erik Ländström, Luciano Fernandez-Ricaud, et al. 2016. “Scan-O-Matic: High-Resolution Microbial Phenomics at a Massive Scale.” G3 (Bethesda, Md.) 6 (9): 3003–14.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This work provides a new potential tool to manipulate Tregs function for therapeutic use. It focuses on the role of PGAM in Tregs differentiation and function. The authors, interrogating publicly available transcriptomic and proteomic data of human regulatory T cells and CD4 T cells, state that Tregs express higher levels of PGAM (at both message and protein levels) compared to CD4 T cells. They then inhibit PGAM by using a known inhibitor, EGCG, and show that this inhibition affects Tregs differentiation. This result was also observed when they used antisense oligonucleotides (ASOs) to knock down PGAM1.

      PGAM1 catalyzes the conversion of 3PG to 2PG in the glycolysis cascade. However, the authors focused their attention on the additional role of 3PG: acting as starting material for the de novo synthesis of serine.

      They hypothesized that PGAM1 regulates Tregs differentiation by regulating the levels of 3PG that are available for de novo synthesis of serine, which has a negative impact on Tregs differentiation. Indeed, they tested whether the effect on Tregs differentiation observed by reducing PGAM1 levels was reverted by inhibiting the enzyme that catalyzes the synthesis of serine from 3PG.

      The authors continued by testing whether both synthesized and exogenous serine affect Tregs differentiation and continued with in vivo experiments to examine the effects of dietary serine restriction on Tregs function.

      In order to understand the mechanism by which serine impacts Tregs function, the authors assessed whether this depends on the contribution of serine to one-carbon metabolism and to DNA methylation.

      The authors therefore propose that extracellular serine and serine whose synthesis is regulated by PGAM1 induce methylation of Treg-associated genes, downregulating their expression and overall impacting Tregs differentiation and suppressive functions.

      Strengths:

      The strength of this paper is the number of approaches taken by the authors to verify their hypothesis. Indeed, by using both pharmacological and genetic tools in in vitro and in vivo systems they identified a potential new metabolic regulation of Tregs differentiation and function.

      We are grateful to the reviewer for their thoughtful and constructive consideration of our work. We appreciate their comment that the number of approaches taken to test our hypothesis represents a strength that increases confidence in the conclusions.

      Weaknesses:

      Using publicly available transcriptomic and proteomic data of human T cells, the authors claim that both ex vivo and in vitro polarized Tregs express higher levels of PGAM1 protein compared to CD4 T cells (naïve or cultured under Th0 polarizing conditions). The experiments shown in this paper have all been carried out in murine Tregs. Publicly available resources for murine data (ImmGen -RNAseq and ImmPRes - Proteomics) however show that Tregs do not express higher PGAM1 (mRNA and protein) compared to CD4 T cells. It would be good to verify this in the system/condition used in the paper.

      This is a fair comment. Although our pharmacologic and genetic studies demonstrated the importance of PGAM in Treg differentiation and suppressive function in murine cells, thereby corroborating the hypothesis formed based on human CD4 cell expression data, we agree that investigating PGAM expression in murine Tregs is important in the context of our work. In reviewing the ImmPRes proteomics database, the reviewer is correct that PGAM1 expression was not higher in iTregs compared to other subsets, including Th17 cells. However, when compared to other glycolytic enzymes, expression of PGAM1 increases out of proportion in iTregs. In particular, the ratio of PGAM1 to GAPDH expression is much greater in iTregs compared to Th17 cells. These data are now shown in the revised Figure S5. The disproportionate increase in PGAM1 expression is consistent with the regulatory role of PGAM in the Treg-Th17 axis via modulation of concentrations of 3PG, a metabolite that lies between GAPDH and PGAM in the glycolytic pathway. The divergent expression changes between GAPDH and PGAM furthermore support the conclusion that GAPDH and PGAM play opposite roles in Treg differentiation.

      It would also be good to assess the levels of both PGAM1 mRNA and protein in Tregs PGAM1 knockdown compared to scramble using different methods e.g. qPCR and western blot. However, due to the high levels of cell death and differentiation variability, that would require cells to be sorted.

      We appreciate this comment. As noted by the reviewer, assessing PGAM1 expression via qPCR and Western blot would require cell sorting, which we do not currently have the resources to pursue. However, we measured the effect of ASOs on PGAM1 protein expression by flow cytometry using an anti-PGAM1 antibody, which allowed gating on viable cells. As shown in Figure S3A, PGAM-targeted ASOs led to an approximately 40% decrease in PGAM1 expression, as measured by mean fluorescence intensity (MFI). Furthermore, we now show in revised Figure S2 that ASO uptake was near-complete in our cultured CD4 cells.

      It is not specified anywhere in the paper whether cells were sorted for bulk experiments. Based on the variability of cell differentiation, it would be good if this was mentioned in the paper as it could help to interpret the data with a different perspective.

      Cells were not sorted for bulk experiments. In the revised manuscript, this point is made clear in the text, figure legends, and Methods. It is worth noting that all bulk experiments were conducted on samples with greater than 70% cell viability (greater than 90% for stable isotope tracing studies).

      Reviewer #2 (Public review):

      Summary:

      The authors have tried to determine the regulatory role of phosphoglycerate mutase (PGAM), an enzyme involved in converting 3-phosphoglycerate to 2-phosphoglycerate in glycolysis, in differentiation and suppressive function of regulatory CD4 T cells through de novo serine synthesis. This is done by contributing to one-carbon metabolism and eventually epigenetic regulation of Treg differentiation.

      Strengths:

      The authors have rigorously used inhibitors and antisense RNA to verify the contribution of these pathways in Treg differentiation in-vitro. This has also been verified in an in-vivo murine model of autoimmune colitis. This has further clinical implications in autoimmune disorders and cancer.

      We very much appreciate these comments about the rigor of the work and its implications.

      Weaknesses:

      The authors have used inhibitors to study pathways involved in Treg differentiation. However, they have not studied the context of overexpression of PGAM, which was the actual reason to pursue this study.

      We appreciate this comment and agree that overexpression of PGAM would be an excellent way to complement and further corroborate our findings. Unfortunately, despite attempting several methods, we were unable to consistently induce overexpression of PGAM1 in our primary T cell cultures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would suggest increasing the font size for flow cytometry gates. Percentages are the focus of the analysis, and it is very hard to read any.

      We have increased the font size on all flow cytometry gates, as suggested.

      Moreover, most of the flow data show Tregs polarization based on CD25 and FOXP3 expression. However, Figure 3 A, Figure 4D and Figure S3 show Tregs polarization based on FSC and Foxp3. Is there any reason for this?

      Antibody staining against CD25 was poor in the experiments noted, which is why Foxp3 alone was used to identify Treg cells in these experiments.

      Especially for Figure 3A, other cells could also express Foxp3 making interpretation difficult.

      This is a fair comment. With respect to Figures 4D and S3 (now revised Figure S4), these experiments were conducted in isolated CD4 cells, in which the population of CD25-Foxp3+ cells is minimal following Treg polarization (as evident in our other figures). Regarding Figure 3A, previous work has found minimal expression of Foxp3 in circulating non-T cells (Devaud et al., 2014, PMID 25063364), such that we have confidence the identified Foxp3 expressing cells are, in fact, Treg cells. Notably, Figure 3A was already gated on CD4+ T cells, and in the periphery of wild-type mice, these would be reasonably referred to as Tregs, although this does not apply to diseased states or specific cases such as the tumor microenvironment.

      The level of murine Tregs differentiation varies a lot among experiments. The % of CD4+CD25+FOXP3+ cells ranges from 14% to 77% (controls). It would be good to understand and verify why there is such differentiation variability.

      For most of our Treg polarization experiments, % differentiation in the control group falls within the 35–55% range. We found that treatment with ASOs (even scrambled control ASOs) tended to decrease Treg polarization overall, leading to lower levels of Foxp3 expression in these experiments. Differentiation was similarly low in a few experiments that did not involve the use of ASOs, which we believe was caused by batch variability in the recombinant TGF-β that was used for polarization. Despite this variability, we conducted enough independent experiments and biological replicates to observe consistent trends and to have confidence in the results, as corroborated by statistical testing and the wide variety of experimental approaches used to verify our conclusions. Notably, controls were run in every experiment, allowing accurate comparisons to be made in each individual experiment.

      Similar comments apply to the level of cell death observed in the cultures of polarizing Tregs.

      Although there was some variability in cell viability between experiments, flow cytometry experiments were always gated on live cells, and we believe concerns about reproducibility are substantially mitigated by the number of independent experiments, biological replicates, and distinct experimental approaches used for verification of the experimental findings. For all bulk experiments, cell viability was greater than 70% and equal across samples. For the flux studies, viability was greater than 90% and equal across samples.

      Figure 2 B and D: EGCG has been used at two different concentrations. Is it lower in Figure 2D because of one condition being a combination of inhibitors or is it a typo?

      The doses stated in the original legend are correct. Yes, drug doses were optimized for combination-treatment experiments. This point is now clarified in the figure legend.

      Figure 2G: The description in the results does not match figure legend - Text - serine/glycine-free media or control (serine/glycine-containing) media; figure legend - serine/glycine-free media or media containing 4 mM serine.

      We thank the reviewer for pointing out this discrepancy, which was an error in the text. The two conditions used were 1) serine/glycine-free media, and 2) serine/glycine-free media supplemented with 4 mM serine. The text and figure legend have both been updated to clarify this point.

      Figure 3 F and G: the graphs do not show the individual points.

      Individual points were not shown in these graphs because they are derived from scRNA-seq data, with SCFEA calculated from individual cells. As such, there are far too many data points to display all individual values.

      CD4+ T-cell isolation and culture: cells were cultured in 50%RPMI and 50% AIM-V.

      I thought that AIM-V medium was intended to be for human cultures. Could some of the conditions explain the low level of differentiation observed in some experiments? If there is such variability it might be because the conditions used are not optimal and therefore not reproducible.

      We appreciate this critique. Although AIM-V media is often used for ex vivo human T cell cultures, it can similarly be used for mouse T cell culture with the addition of β-mercaptoethanol, as suggested by ThermoFisher and as used in prior publications, such as PMID 36947105. As outlined in the responses above, the differentiation we observed was consistent in most experiments, with some variability based on experimental conditions (such as lower differentiation in the setting of ASO treatment). Furthermore, we believe the number of independent experiments, biological replicates, and independent experimental approaches used in the study supports the reproducibility of our findings.

      Figures S1 A, S2 B, and S4: the flow data are shown using both heights (FSC) and area (zombie NIR dye). It would be better to use areas for both parameters.

      In the revised manuscript, areas are now used on both the x- and y-axes for these figures.

      Figure S1 B and S2 C: The bar graphs are both showing proliferation index, however, the graphs are labelled differently in the two figures and in the legend (proliferation index -Fig S1 B; division index -Fig S2 C and replication index in the legend of Fig S2 C). The explanation of how the index has been calculated should probably go in the legend of the first figure that shows it.

      We thank the reviewer for this comment. In the revised manuscript, we have ensured consistency in the terminology (“proliferation index” is now used consistently), and the explanation of the proliferation index calculation is now included in the legend to Figure S1, where the proliferation index first appears.

      Were Tregs PGAM1 KD used for RNAseq sorted or not? Based on the plots shown in Figure S2 B there is ~ 50% death which needs to be taken into consideration for the analysis if not depleted.

      Similar question for all bulk experiments. It is not specified in the methods or figure legends.

      The cells used for RNAseq and other bulk experiments were not sorted. This point is now made clear in the text, figure legends, and Methods. However, cultures were only used for bulk analyses if the viability in those particular experiments was greater than 70%. Given the sensitivity of stable isotope tracing analyses, cultures were only analyzed for those studies if viability was greater than 90%. In these experiments, viability was similar across samples.

      It was mentioned in Figure 1 that the PGAM KD led to transcriptional changes that impacted MYC targets and mTORC1 signalling. It would be good to validate these findings maybe with more targeted experiments.

      We appreciate this suggestion and agree that validation and further investigation of these critical targets would be worthwhile. However, because of limitations to resources and the fact that these findings are not critical to the main conclusions of the study, we consider these experiments as future directions beyond the scope of the current work.

      Reviewer #2 (Recommendations for the authors):

      Here are a few suggestions and recommendations to improve the research study.

      (1) The authors have used the word 'vehicle' in most of the figures, however, this word is not explained well in the figure legend. The authors may want to clarify to readers whether vehicle is a plasmid or a solvent for control purposes. For example, in Figure 1D, if vehicle is a plasmid, then another sample for vehicle +/-EGCG should be considered for the rigor in results.

      Thank you for identifying this point of confusion. For all drug treatment experiments, vehicle controls consisted of solvent alone without drug. For ASO experiments, the control condition consisted of scrambled ASO. This point is now made clear in the Methods (“Drug and ASO Treatments” section) as well as in the main text. Furthermore, the figure legends and axes have been edited such that “vehicle” is only used to refer to drug experiments (in which solvent vehicle alone was used as control), and “control” is used to refer to ASO experiments (in which scrambled ASO served as control).

      (2) Figure 1H represents the RNAseq data for knockdown of PGAM1. It might be interesting to see similar data for the overexpression of PGAM1.

      We appreciate this comment and agree that overexpression of PGAM1 would be an excellent way to complement and further corroborate our findings using PGAM1 knockdown and pharmacologic inhibition. Unfortunately, despite attempting several methods, we were unable to consistently induce overexpression of PGAM1 in our primary T cell cultures.

      (3) The font in most of the data from flow cytometry experiments (for example 1I) is not legible. Please increase the font size to make it legible.

      Font sizes have been increased.

      (4) Figure S2: PGAM expression was measured by flow cytometry. A similar experiment using Western blot, the direct measurement of protein expression, would strengthen the evidence.

      We appreciate this comment. As noted in the public reviews, Western blot would require sorting of viable cells, and unfortunately we do not currently have the resources to conduct additional experiments with FACS. However, we respectfully note that assessing protein expression via flow cytometry quantifies protein levels based on antibody binding, similar to Western blot (or in-cell Western blot), while also allowing gating on viable cells. We also note that nearly 100% of cultured CD4 cells took up ASO, as shown in revised Figure S2.

      (5) Figure 1J: it is mentioned in the text that 10 datasets were studied. A normalized parameter such as overexpression or suppression could be studied with the variance. It will be good to understand the variability in response among different datasets.

      We thank the reviewer for the opportunity to clarify this data. This data was taken from a single published dataset (Dykema et al., 2023, PMID 37713507) in which 10 distinct subsets of tumor-infiltrating Tregs (TIL-Tregs) were identified, rather than from 10 distinct datasets. After identifying the Activated (1)/OX40hiGITRhi cluster of TIL-Tregs as a highly suppressive subset that correlates with resistance to immune checkpoint blockade, Dykema et al. compared gene expression in this subset to the bulked collection of the other 9 subsets, and the data shown in Figure 1J is derived from this analysis. As such, the data in Figure 1J is, indeed, a normalized parameter of overexpression, showing overexpression of PGAM1 in this highly suppressive subset versus other subsets, out of proportion to proximal rate-limiting glycolytic enzymes. The main text and figure/figure legend have been edited to clarify this point.

      (6) It will be good to rephrase that the roles of PGAM and GAPDH are opposite, this paragraph is confusing since words such as "supporting Treg differentiation" and "augments Treg differentiation" have been used, although the data in S3 and 1D are opposite. Any possible explanation for the opposing roles of PGAM and GAPDH, despite their involvement in the same pathway of glycolysis, can be added to build up the interest of readers. What is the comparison of the expression of GAPDH and PGAM in Figure 1J?

      We thank the reviewer for this comment, as we appreciate that the language used in our initial manuscript was confusing. We have edited the main text, in both the Results and Discussion sections, in order to clarify this point and provide explanation as suggested. Indeed, our experimental data indicate that GAPDH and PGAM play opposing roles in Treg differentiation; whereas inhibiting GAPDH activity leads to greater Treg differentiation (shown in revised Figure S4 and our previously published work), inhibiting PGAM in the same manner leads to diminished Treg differentiation. We view this point (that enzymes within the same glycolytic pathway can have divergent roles in T cells) as a primary implication of these findings, with the explanation that individual enzymes within the same pathway can differentially regulate the concentrations of key immunoactive metabolites. In our study, we identified 3PG as a key immunoactive metabolite whose concentration would be differentially impacted by GAPDH activity versus PGAM activity, since it lies downstream of GAPDH but upstream of PGAM.

      To provide further evidence for the opposing roles of GAPDH and PGAM, we analyzed existing datasets. In the revised Figure S5, we show that the PGAM1/GAPDH expression ratio increases in both human and mouse Tregs compared to other CD4 subsets.

      (7) Figure 2C: what are M+1, M+2, etc.? Do they represent the number of hrs? If so, why are the results for 6 hrs not shown, since the study was for 6 hrs? And what is happening with M+2?

      We appreciate the opportunity to clarify this point and apologize for prior confusion. The terminology “M+n” refers to the mass-shift produced by incorporation of 13-carbon. When a metabolite incorporates a single 13-carbon atom, it has a mass-shift of one (M+1), whereas incorporation of three 13-carbon atoms produces a mass-shift of three (M+3). Because we used uniformly 13-carbon labeled glucose, 3PG derived from the labeled glucose will have all three carbons labeled (M+3), as will serine that is newly synthesized from 3PG. Because serine can enter the downstream one-carbon cycle and be recycled, we also see the appearance of recycled serine with a single 13-carbon (M+1). The critical point in Figure 2C is that labeled serine is higher in Th17 versus Treg cells, demonstrating that de novo serine synthesis from glycolysis is greater in Th17 cells. The main text has been edited to clarify this important point.
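      The M+n bookkeeping described above can be illustrated with a minimal sketch. The intensities are made-up numbers, not data from the study, the function names are hypothetical, and correction for natural 13-carbon abundance is deliberately omitted.

```python
def isotopologue_fractions(intensities):
    """Normalize raw M+0..M+n intensities so that they sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [value / total for value in intensities]


def labeled_fraction(intensities):
    """Fraction of the pool with at least one 13C atom, i.e. 1 - (M+0)."""
    return 1.0 - isotopologue_fractions(intensities)[0]


# Hypothetical serine intensities for M+0, M+1, M+2 and M+3.
# Serine has three carbons, so M+3 is the fully labeled species.
serine_example = [700.0, 100.0, 0.0, 200.0]
fractions = isotopologue_fractions(serine_example)
```

      In this toy example, the M+3 fraction would correspond to serine newly synthesized from fully labeled 3PG, and the M+1 fraction to recycled serine.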

      (8) Including the quantification of the inhibition and rescuing effects of EGCG and NCT will be helpful to readers.

      The inhibition and rescuing effects of these drugs are quantified in Figures 2D and 2E as they relate to Treg differentiation. The reviewer may be referring to quantification of relative effects on 3PG levels and serine synthesis. If so, we unfortunately do not have the resources to complete these studies, which would require large-scale quantitative mass spectrometry studies or enzyme activity assays.

      (9) Figure 2D and 2E: The authors could also experiment with a dose dependence curve on EGCG and NCT on this phenotype for Treg differentiation. That can help understand the balance between serine pathways and glycolysis pathways. Similarly, the dose dependence of 3PG for Figure 2E and comparing it to the kinetic constants of these enzymes involved and cellular concentrations, these details will be helpful to understand the metabolic dynamics, because this phenotype could be an interplay of both 3PG and serine concentrations.

      We appreciate this suggestion and agree that establishing detailed dose-dependence curves and relating these findings to enzyme kinetics would yield additional insights into the biochemical regulation provided by PGAM and PHGDH. Unfortunately we do not have the resources to pursue these additional studies, which therefore lie beyond the scope of our current work.

      (10) Figure 4: Explanation for no effect of methionine supplementation?

      Thank you for raising this point. We speculate that methionine supplementation had minimal effect because physiologic levels of serine were sufficient to provide basal substrates for the one-carbon cycle. On the other hand, eliminating methionine produced enough of a decrease in one-carbon metabolism to potentiate the effects of excess serine. This point is now briefly addressed in the text.

      (11) For direct connection between PGAM and methylation, methylation experiments could be worked out with NCT1 and SHIN1 (as in Figure 4H).

      We very much appreciate this suggestion, which we agree would provide a strong complementary approach. Unfortunately we do not have the resources to pursue these studies currently. However, we regard the increased methylation observed following PGAM knockdown (Figure 4G) as strong evidence that PGAM activity directly modulates methylation.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is an interesting theoretical study examining the viability of the Virtual Circular Genome (VCG) model, a recently proposed scenario of prebiotic replication in which a relatively long sequence is stored as a collection of its shorter subsequences (and their complements). It was previously pointed out that the VCG model is prone to so-called sequence scrambling, which limits the overall length of such a genome. In the present paper, additional limitations are identified. Specifically, it is shown that the VCG is well replicated when the oligomers are elongated by sufficiently short chains from the “feedstock” pool. However, ligation of oligomers from the VCG itself results in a high error rate. I believe the research is of high quality and well written. However, the presentation could be improved and the key messages could be clarified.

      Strengths:

      High-quality theoretical modeling of an important problem is implemented.

      Weaknesses:

      The conclusions are somewhat convoluted and could be presented better.

      (1) It is not clear from the paper whether the observed error has the same nature as sequence scrambling.

      We thank the Reviewer for pointing out that this important point was not clearly explained. The sequence errors observed in our model are indeed of the same nature as sequence scrambling previously identified by Chamanian and Higgs (Chamanian and Higgs, PLoS Comp Biol 2022). The core issue is the ligation of two oligomers representing non-adjacent segments of the genome sequence, leading to the formation of “chimeric” products that are not part of the desired genome.

      Our analysis identifies the ligation of VCG oligomers (V+V reactions) as the primary mechanism driving sequence scrambling. This allowed us to propose two strategies to mitigate sequence scrambling: (i) tuning the length and concentration of the VCG oligomers, and (ii) considering scenarios where only feedstock monomers contribute to elongation (non-reactive VCG oligomers). We modified the Introduction and Results section of our manuscript to convey this connection more clearly.
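      The notion of a chimeric V+V product can be made concrete with a small sketch. It assumes a circular genome read on both strands and perfect matching; the 8-nt toy genome and all function names are hypothetical and serve only as an illustration, not as the authors' simulation code.

```python
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def occurs_in_circular_genome(oligo, genome):
    """True if `oligo` matches either strand of the circular genome."""
    if len(oligo) > len(genome):
        return False
    for strand in (genome, reverse_complement(genome)):
        # Append the first len(oligo) - 1 bases to cover wrap-around windows.
        unrolled = strand + strand[: len(oligo) - 1]
        if oligo in unrolled:
            return True
    return False


def is_chimeric(left, right, genome):
    """True if ligating `left` and `right` gives a product absent from the genome."""
    return not occurs_in_circular_genome(left + right, genome)


toy_genome = "AACGUCUG"  # arbitrary example, not a genome from the study
```

      Here `is_chimeric("AAC", "GUC", toy_genome)` describes a ligation of adjacent segments, whereas `is_chimeric("AAC", "CUG", toy_genome)` joins two oligomers that each occur in the genome but are not neighbors.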

      (2) The authors introduce two important lengths LS1 and LS2 only in the conclusions and do not explain enough why each of them is important. It would make sense to discuss this early in the manuscript.

      We agree with the Reviewer and have followed the suggestion to introduce the two important length scales earlier in the manuscript (in the Model section of the main text). In the updated version, we refer to these length scales as the exhaustive coverage length L<sub>E</sub> (formerly LS1) and the unique subsequence length L<sub>U</sub> (formerly LS2). The exhaustive coverage length L<sub>E</sub> is defined as the maximum motif length for which all possible sequences of that length appear somewhere in the genome. In contrast, the unique subsequence length L<sub>U</sub> is the minimum motif length such that each subsequence of that length occurs only once in the genome, thus giving each motif a unique “address”.

      Generally, a genome of length L<sub>G</sub> contains at most 2L<sub>G</sub> distinct subsequences of any given length, implying that L<sub>E</sub> can be at most L<sup>max</sup><sub>E</sub> = ⌊log<sub>4</sub>(2L<sub>G</sub>)⌋, and L<sub>U</sub> must be at least L<sup>min</sup><sub>U</sub> = ⌈log<sub>4</sub>(2L<sub>G</sub>)⌉, where ⌊...⌋ and ⌈...⌉ denote the next lower and higher integer, respectively. While the previous version of the manuscript focused exclusively on the limiting case L<sub>E</sub> = L<sup>max</sup><sub>E</sub> and L<sub>U</sub> = L<sup>min</sup><sub>U</sub>, we have extended our analysis to genomes with a broader range of L<sub>E</sub> and L<sub>U</sub> values in the revised manuscript.
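      The two length scales and their bounds can be sketched in a few lines. The sketch assumes a four-letter alphabet, a circular genome read on both strands, and the counting argument that 4<sup>L</sup> possible motifs are compared against 2L<sub>G</sub> windows; the toy genome and function names are hypothetical. Note that, under this counting, a self-complementary motif appears once on each strand and is therefore treated as non-unique, which is a choice of this sketch.

```python
from collections import Counter

COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def motifs(genome, length):
    """All windows of `length` on both strands of a circular genome."""
    windows = []
    for strand in (genome, reverse_complement(genome)):
        unrolled = strand + strand[: length - 1]
        windows.extend(unrolled[i:i + length] for i in range(len(strand)))
    return windows


def exhaustive_coverage_length(genome, alphabet_size=4):
    """Largest L for which every possible motif of length L occurs (L_E)."""
    length = 0
    while (length < len(genome)
           and len(set(motifs(genome, length + 1))) == alphabet_size ** (length + 1)):
        length += 1
    return length


def unique_subsequence_length(genome):
    """Smallest L at which every motif occurs exactly once (L_U), or None."""
    for length in range(1, len(genome) + 1):
        if max(Counter(motifs(genome, length)).values()) == 1:
            return length
    return None


def length_bounds(genome_length, alphabet_size=4):
    """Integer versions of floor/ceil of log_4(2 * L_G)."""
    windows = 2 * genome_length
    max_le = 0
    while alphabet_size ** (max_le + 1) <= windows:
        max_le += 1
    min_lu = max_le if alphabet_size ** max_le == windows else max_le + 1
    return max_le, min_lu


toy_genome = "AACGUCUG"  # arbitrary example, not a genome from the study
```

      For this arbitrary toy genome the measured values lie well inside the bounds, which illustrates that the limiting case is a property of specially constructed genomes rather than of generic sequences.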

      This extended analysis reveals that, for accurate and efficient replication, the VCG oligomer length must always exceed L<sub>U</sub>, regardless of the choice of L<sub>E</sub>. The required margin beyond L<sub>U</sub> depends on the distribution of intermediate-length motifs (i.e., with L<sub>E</sub> < L < L<sub>U</sub>), but is typically only a few nucleotides.

      We have integrated these new findings into the Results section of the main text and expanded the discussion of their implications for the prebiotic relevance of the VCG scenario in the Discussion section. Full methodological details are provided in the Supplementary Material (Sections S1 and S8).

      (3) It is not entirely clear why a specific length distribution for VCG oligomers has to be assumed rather than emerging from simulations.

      We thank the Reviewer for this insightful question. Our choice to assume specific length distributions for VCG oligomers is motivated by both conceptual and practical considerations. We explain our reasoning more clearly in the revised manuscript, in the beginning of the Model section of the main text.

      Conceptually, our study focuses on the propagation of sequence information by an already-formed VCG, rather than its emergence from a random pool. As discussed by Chamanian and Higgs, the spontaneous formation of a VCG from randomly interacting oligomers is a rare event. Our aim is to understand whether, once formed, such a structure can robustly replicate under prebiotic conditions. This question is best addressed when the genome and the oligomer pool (including their lengths and concentrations) can be systematically controlled.

      From a practical standpoint, working with a controllable pool of oligomers facilitates direct comparison to recent experimental studies that use predefined and well-characterized oligomer pools (Ding et al. JACS 2023). With our current methods and realistic rate constants, simulating the emergence of such pools from simple building blocks (e.g., monomers and dimers) would be computationally prohibitive, due to the low ligation rate. For example, in a system containing monomers (concentration 0.1mM) and octamers (concentration 1µM) in a volume of V = 3.3µm<sup>3</sup>, simulating the time between two ligation events takes over 300 hours of compute time (see SI Fig. S2). This renders dynamic pool generation unfeasible for the scope of our study.
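      To give a sense of the system size in this example, the stated concentrations and volume can be converted into molecule numbers. This conversion is an illustrative calculation added here, not a figure quoted by the authors; the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecule_count(concentration_molar, volume_um3):
    """Expected number of molecules at a molar concentration in a volume in µm^3."""
    volume_litres = volume_um3 * 1e-15  # 1 µm^3 = 1e-15 L
    return concentration_molar * volume_litres * AVOGADRO


monomers = molecule_count(0.1e-3, 3.3)  # 0.1 mM monomers in 3.3 µm^3
octamers = molecule_count(1e-6, 3.3)    # 1 µM octamers in 3.3 µm^3
```

      Under these inputs the simulation volume holds on the order of 2 × 10<sup>5</sup> monomers and 2 × 10<sup>3</sup> octamers, so a very large number of fast (de)hybridization events must be simulated between two rare ligation events.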

      (4) Furthermore, the problem has another important length, L0, that is never introduced or discussed: a minimal hybridization length with a lifetime longer than the ligation time. From the parameters given, it appears that L0 is sufficiently long (∼ 10 bases). In other words, it appears that the study is done in a somewhat suboptimal regime: most hybridization events do not lead to a ligation. Am I right in this assessment? If that is the case, the authors might want to explore another regime, L0 < LS1, by considering a higher ligation rate.

      Indeed, we assume that the ligation rate is smaller than both the hybridization and dehybridization rates for any oligomer typically included in the pool (up to length 10). In terms of effective length scales, this corresponds to L<sub>0</sub> ≈ 10nt, with L<sub>0</sub> defined as stated by the Reviewer, i.e., the hybridization length corresponding to a lifetime comparable to the ligation time. Most of our analysis actually exploits the small ligation rate, by employing an adiabatic approximation in which ligation is assumed to be slower than any hybridization or dehybridization process in the pool irrespective of oligomer length. As the Reviewer states, in this regime most hybridization events are transient, and will not result in ligation, since the complexes typically dissociate before ligation can occur.

      While we agree that this assumption limits the overall yield of replication, it has a beneficial effect on replication fidelity. Oligomers that hybridize with mismatches tend to unbind more quickly due to the destabilizing effect of mismatches. In the slow-ligation regime, such complexes are likely to dissociate before a ligation can occur, preventing the formation of incorrect products. In contrast, if the ligation rate were comparable to the unbinding rate of mismatched hybrids, these incorrect associations could undergo ligation, thereby lowering the fidelity of replication. We thus view the regime L<sub>0</sub> > L<sub>V</sub> as more favorable for studying the error-suppressing potential of the VCG mechanism, though we acknowledge that exploring the effects of faster ligation rates is an interesting question for future work.
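      The definition of L<sub>0</sub> as the hybridization length whose lifetime matches the ligation time can be sketched as follows. This is a deliberately crude illustration: it assumes a binding free energy proportional to duplex length instead of a nearest-neighbor model, and every numerical value (`gamma`, `k_on`, `c_ref`, `k_lig`) is a placeholder chosen only to make the example run, not a parameter of the study.

```python
import math


def duplex_lifetime(length, k_on=1e6, gamma=2.5, c_ref=1.0):
    """Mean lifetime (s) of a perfectly paired duplex of `length` base pairs.

    Assumes k_off = k_on * c_ref * exp(-gamma * length), where gamma is a
    placeholder stabilization per base pair in units of k_B T.
    """
    k_off = k_on * c_ref * math.exp(-gamma * length)
    return 1.0 / k_off


def minimal_stable_length(k_lig, max_length=50, **kwargs):
    """Smallest length whose duplex lifetime reaches the ligation time 1/k_lig."""
    tau_lig = 1.0 / k_lig
    for length in range(1, max_length + 1):
        if duplex_lifetime(length, **kwargs) >= tau_lig:
            return length
    return None


l0_slow_ligation = minimal_stable_length(k_lig=1e-5)  # placeholder rate
l0_fast_ligation = minimal_stable_length(k_lig=1e-3)  # placeholder rate
```

      With any such parameterization, raising the ligation rate lowers L<sub>0</sub>, which is the alternative regime the Reviewer proposes to explore.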

      Reviewer #2 (Public review):

      Summary:

      This important theoretical and computational study by Burger and Gerland attempts to set environmental, compositional, kinetic, and thermodynamic constraints on the proposed virtual circular genome (VCG) model for the early non-enzymatic replication of RNA. The authors create a solid kinetic model using published kinetic and thermodynamic parameters for non-enzymatic RNA ligation and (de)hybridization, which allows them to test a variety of hypotheses about the VCG. Prominently, the authors find that the length (longer is better) and concentration (intermediate is better) of the VCG oligos have an outsized impact on the fidelity and yield of VCG production with important implications for future VCG design. They also identify that activation of only RNA monomers, which can be achieved using environmental separation of the activation and replication, can relax the constraints on the concentration of long VCG component oligos by avoiding the error-prone oligo-oligo ligation. Finally, in a complex scenario with multiple VCG oligo lengths, the authors demonstrate a clear bias for the extension of shorter oligos compared to the longer ones. This effect has been observed experimentally (Ding et al., JACS 2023) but was unexplained rigorously until now. Overall, this manuscript will be of interest to scientists studying the origin of life and the behavior of complex nucleic acid systems.

      Strengths:

      • The kinetic model is carefully and realistically created, enabling the authors to probe the VCG thoroughly.

      • Fig. 6 outlines important constraints for scientists studying the origin of life. It supports the claim that the separation of activation and replication chemistry is required for efficient non-enzymatic replication. One could easily imagine a scenario where activation of molecules occurs, followed by their diffusion into another environment containing protocells that encapsulate a VCG. The selective diffusion of activated monomers across protocell membranes would then result in only activated monomers being available to the VCG, which is the constraint outlined in this work. The proposed exclusive replication by monomers also mirrors the modern biological systems, which nearly exclusively replicate by monomer extension.

      • Another strength of the work is that it explains why shorter oligos extend better compared to the long ones in complex VCG mixtures. This point is independent of the activation chemistry used (it simply depends on the kinetics and thermodynamics of RNA base-pairing) so it should be very generalizable.

      We thank the Reviewer for the careful assessment of our work and this concise summary of our main points.

      Weaknesses:

      • Most of the experimental work on the VCG has been performed with the bridged 2-aminoimidazolium dinucleotides, which are not featured in the kinetic model of this work. Other studies by Szostak and colleagues have demonstrated that non-enzymatic RNA extension with bridged dinucleotides has superior kinetics (Walton et al. JACS 2016, Li et al. JACS 2017), fidelity (Duzdevich et al. NAR 2021), and regioselectivity (Giurgiu et al. JACS 2017) compared to activated monomers, establishing the bridged dinucleotides as important for non-enzymatic RNA replication. Therefore, the omission of these species in the kinetic model presented here can be perceived as problematic. The major claim that avoidance of oligo ligations is beneficial for VCGs may be irrelevant if bridged dinucleotides are used as the extending species, because oligo ligations (V + V in this work) are kinetically orders of magnitude slower than monomer extensions (F + V in this work) (Ding et al. NAR 2022). Formally adding the bridged dinucleotides to the kinetic model is likely outside of the scope of this work, but perhaps the authors could test if this should be done in the future by simply increasing the rate of monomer extension (F + V) to match the bridged dinucleotide rate without changing the rate of V + V ligation?

      We thank the Reviewer for this insightful comment. Indeed, we did not design our model to specifically describe the use of bridged 2-aminoimidazolium dinucleotides as feedstock for the VCG scenario. Adding the bridged dinucleotides to our model would require allowing for feedstock that effectively changes its length during the ligation reaction. As anticipated already by the Reviewer, this is outside the scope of our current modeling framework, which was chosen to explore the generic issue of sequence scrambling in the VCG scenario without distinguishing between different types of activation chemistries.

      Along the lines of the Reviewer’s suggestion, we clarified in the revised manuscript that we consider two limiting cases out of a family of models with two different ligation rate constants, k<sub>lig,1</sub> for ligations involving a monomer and k<sub>lig,>1</sub> for ligations involving no monomer, allowing for kinetic discrimination between these processes. We consider the two limiting cases where either k<sub>lig,1</sub> = k<sub>lig,>1</sub> or k<sub>lig,>1</sub>/k<sub>lig,1</sub> → 0. The latter case captures the behavior expected from an activation chemistry that enables fast primer extension but slow ligation, thereby suppressing sequence scrambling via V+V ligation events. The corresponding results, presented in Figures 6 and 7, indeed show that the VCG replication efficiency approaches 100% for pools that are rich in VCG oligomers.
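      The distinction between the two rate constants can be written as a small selection rule. This is a hypothetical sketch of the bookkeeping only, with placeholder rate values; it is not the authors' implementation.

```python
def ligation_rate_constant(len_a, len_b, k_lig_1, k_lig_gt1):
    """Rate constant for ligating two strands of lengths `len_a` and `len_b`.

    k_lig_1 applies when at least one strand is a monomer,
    k_lig_gt1 applies when neither strand is a monomer.
    """
    involves_monomer = len_a == 1 or len_b == 1
    return k_lig_1 if involves_monomer else k_lig_gt1


# Limiting case 1: no kinetic discrimination (placeholder value).
equal_rates = dict(k_lig_1=1.0, k_lig_gt1=1.0)
# Limiting case 2: ligations without a monomer are switched off.
monomer_only = dict(k_lig_1=1.0, k_lig_gt1=0.0)

rate_vv_equal = ligation_rate_constant(8, 8, **equal_rates)
rate_vv_monomer_only = ligation_rate_constant(8, 8, **monomer_only)
rate_fv_monomer_only = ligation_rate_constant(1, 8, **monomer_only)
```

      In the second limiting case, a ligation between two octamers has zero rate while monomer addition to an octamer proceeds, which is how V+V scrambling is suppressed in this family of models.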

      Our coarse-grained model, which does not explicitly describe the activation chemistry, was sufficient to capture important kinetic and thermodynamic constraints of the VCG scenario, and to qualitatively explain the experimental observation of a preferential extension of short over long VCG oligomers (Fig. 7B). For future work, we plan to extend our model to account for the activation chemistry in detail, to allow for a more quantitative comparison between theory and experiment.

      • The kinetic and thermodynamic parameters for oligo binding appear to be missing two potentially important components. First, base-paired RNA strands that contain gaps where an activated monomer or oligo can bind have been shown to display significantly different kinetics of ligation and binding/unbinding than complexes that do not contain such gaps (see Prywes et al. eLife 2016, Banerjee et al. Nature Nanotechnology 2023, and Todisco et al. JACS 2024). Would inclusion of such parameters alter the overall kinetic model?

      We thank the Reviewer for highlighting these recent studies. Todisco et al. (JACS 2024) report that complexes with gaps are well described by standard nearest-neighbor models, while stacking interactions at nick sites confer additional stability beyond these predictions. Our model is therefore expected to capture the thermodynamics of complexes with gaps accurately, but likely underestimates the stability of complexes containing nicks. In the VCG pool, all productive ligation complexes (F+F, F+V, V+V) inherently contain a nick and thus benefit from this stabilization, whereas unproductive complexes typically do not. The added stability is expected to increase the residence time of oligomers in productive complexes, thereby enhancing overall extension rates. However, since this stabilization applies uniformly across all productive complexes, it does not shift the relative contributions of different ligation pathways (in particular, correct vs. incorrect).

      This reasoning assumes that hybridization and dehybridization occur on timescales faster than ligation or primer extension. It is conceivable that this separation of timescales does not hold, particularly for oligomers binding to templates with gaps, where association is slower due to steric hindrance, while dissociation is further slowed by stabilizing nicks. As a result, the residence time of such complexes can become comparable to (or longer than) the ligation timescale. We now discuss this aspect more thoroughly in the revised Results and Discussion sections. Capturing the resulting effects in our analytical framework would require relaxing the adiabatic assumption, which is beyond the scope of this work. We recognize the relevance of the non-adiabatic regime of the dynamics, and hope to explore this regime in follow-up work.

      • Second, it has been shown that long base-paired RNA can tolerate mismatches to an extent that can result in monomer ligation to such mismatched duplexes (see Todisco et al. NAR 2024). Would inclusion of the parameters published in Todisco et al. NAR 2024 alter the kinetic model significantly?

      In contrast to complexes with nicks and gaps, mismatched complexes (Todisco et al. NAR 2024) will decrease replication fidelity relative to the results presented in our manuscript. Our current model assumes perfect base pairing, such that replication errors arise only from binding events involving regions too short to reliably identify the correct genomic position (sequence scrambling). Allowing mismatches will indeed introduce an additional error mechanism via imperfect yet sufficiently stable duplexes, thereby increasing the rate of incorrect extensions. However, we expect this effect to be limited. Due to the thermodynamic cost of internal loops, mismatched duplexes most often have their mismatches near the ends of the hybridized region, where their destabilizing effect is weakest (Todisco et al. NAR 2024). Terminal mismatches at the 3′ end of the primer have been shown to reduce the primer-extension rate significantly via a stalling effect (Rajamani et al. JACS 2010, Leu et al. JACS 2013). Hence, we would expect errors due to mismatched duplexes to primarily occur for mismatches at the 5′ end. Such errors could be mitigated by a VCG pool consisting only of oligomers that are sufficiently long relative to the unique motif length of the virtual genome.

      We have extended the Discussion section to address this interesting issue.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      • ’(apostrophes) should be prime symbols instead of apostrophes

      We thank the Reviewer for spotting this mistake, which we have now corrected.

      • In the Introduction, the section that discusses the fidelity of enzyme-free copying should include a reference to Duzdevich et al. NAR 2021, as that work measured the fidelity experimentally.

      We have included this reference together with other references on the kinetics of hybridization/dehybridization to nicks and gaps in the main text.

      • The term feedstock oligomers may be problematic, because these also include monomers. In the ”Templated ligation” section of the Model, the statement ”We consider pools in which all oligomers are activated, as well as pools in which only monomers are activated” is imprecise. ”All oligomers, including monomers,...” would be better so as to avoid confusion in readers accustomed to standard RNA language.

      We thank the Reviewer for this helpful suggestion. In the revised manuscript, we now use the term feedstock (rather than feedstock oligomers) to avoid confusion. We have also revised the sentence in the ”Templated ligation” section to read ”all oligomers, including monomers, ...” as recommended.

      • The ”Experimentally determined association rate constants” reference 24-26, which measured the rate constants for DNA. Considering that the authors are modeling RNA, I wonder if Ashwood et al. Biophysical Journal 2023 contains any relevant RNA data that could help refine the model?

      We thank the Reviewer for pointing us to the study by Ashwood et al. We have added this reference to the corresponding paragraph in the revised manuscript. Their RNA association rate constant (∼ 5 × 10<sup>7</sup> M<sup>−1</sup> s<sup>−1</sup>) is larger than the one we used (∼ 1 × 10<sup>6</sup> M<sup>−1</sup> s<sup>−1</sup>). However, a larger association rate is in fact beneficial for the validity of our adiabatic approximation, and thus would not affect our results, as long as the thermodynamic stability remains the same. This is because, at fixed stability, faster association also implies faster dissociation, so the ratio of the (de)hybridization timescales to the ligation timescale becomes even smaller, which is the regime where the adiabatic approximation made in our analysis is justified.
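      The timescale argument above can be illustrated with a minimal numerical sketch. It assumes a simple two-state binding model in which the dissociation constant K_d = k_off / k_on is held fixed. Only the two association rate constants come from the text; the dissociation constant and the ligation rate are hypothetical values chosen for illustration, and the function names are not taken from the manuscript.

```python
def dissociation_rate(k_on, K_d):
    """Two-state binding: K_d = k_off / k_on, hence k_off = k_on * K_d (in 1/s)."""
    return k_on * K_d


def hybridization_to_ligation_ratio(k_on, K_d, k_lig):
    """Dehybridization timescale (1/k_off) divided by ligation timescale (1/k_lig)."""
    return k_lig / dissociation_rate(k_on, K_d)


# Hypothetical values, for illustration only (not from the manuscript).
K_D = 1e-6     # dissociation constant in M, kept fixed (same thermodynamic stability)
K_LIG = 1e-3   # ligation rate in 1/s

# Association rate constants quoted in the response (M^-1 s^-1).
ratio_slow = hybridization_to_ligation_ratio(1e6, K_D, K_LIG)
ratio_fast = hybridization_to_ligation_ratio(5e7, K_D, K_LIG)

# The faster association rate gives the smaller ratio, i.e. a better
# separation of timescales under the fixed-stability assumption.
print(ratio_slow, ratio_fast)
```

      Under these assumptions, the fifty-fold larger association rate shrinks the ratio by the same factor, which is the direction in which the adiabatic approximation improves.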

      • In ”Triplexes of type 1—8 and 1—9...”, the word triplexes will confuse readers with RNA expertise, as triplexes imply a triple-stranded RNA.

      We thank the Reviewer for pointing out the potentially ambiguous nomenclature. To avoid confusion with triple-stranded RNA structures, we now refer to binary (ternary, ...) complexes instead of duplexes (triplexes, ...) throughout the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors have assembled a cohort of 10 SiNET, 1 SiAdeno, and 1 lung MiNEN samples to explore the biology of neuroendocrine neoplasms. They employ single-cell RNA sequencing to profile 5 samples (siAdeno, SiNETs 1-3, MiNEN) and single-nuclei RNA sequencing to profile seven frozen samples (SiNET 4-10).

      They identify two subtypes of siNETs, characterized by either epithelial or neuronal NE cells, through a series of DE analyses. They also report findings of higher proliferation in non-malignant cell types across both subtypes. Additionally, they identify a potential progenitor cell population in a single-lung MiNEN sample.

      Strengths:

      Overall, this study adds interesting insights into this set of rare cancers that could be very informative for the cancer research community. The team probes an understudied cancer type and provides thoughtful investigations and observations that may have translational relevance.

      Weaknesses:

      The study could be improved by clarifying some of the technical approaches and aspects as currently presented, toward enhancing the support of the conclusions:

      (1) Methods: As currently presented, it is possible that the separation of samples by program may be impacted by tissue source (fresh vs. frozen) and/or the associated sequencing modality (single cell vs. single nuclei). For instance, two (SiNET1 and SiNET2) of the three fresh tissues are categorized into the same subtype, while the third (SiNET9) has very few neuroendocrine cells. Additionally, samples from patient 1 (SiNET1 and SiNET6) are separated into different subtypes based on fresh and frozen tissue. The current text alludes to investigations (i.e.: "Technical effects (e.g., fresh vs. frozen samples) could also impact the capture of distinct cell types, although we did not observe a clear pattern of such bias."), but the study would be strengthened with more detail.

      We thank the reviewer for the thoughtful and constructive review. Due to the difficulty in obtaining enough SiNET samples, we used two platforms to generate data - single cell analysis of fresh samples, and single nuclei analysis of frozen samples. We opted to combine both sample types in our analysis while being fully aware of the potential for batch effects. We therefore agree that this is a limitation of our work, and that differences between samples should be interpreted with caution.

      Nevertheless, we argue that the two SiNET subtypes that we have identified are very unlikely to be due to such batch effect. First, the epithelial SiNET subtype was not only detected in two fresh samples but also in one frozen sample (albeit with relatively few cells, as the reviewer correctly noted). Second, and more importantly, the epithelial SiNET subtype was also identified in analysis of an external and much larger cohort of bulk RNA-seq SiNET samples that does not share the issue of two platforms (as seen in Fig. 2f). Moreover, the proportion of samples assigned to the two subtypes is similar between our data and the external data. We therefore argue that the identification of two SiNET subtypes cannot be explained by the use of two data platforms. However, we agree that the results should be further investigated and validated by future studies.

      The reviewer also commented that two samples from the same patient which were profiled by different platforms (SiNET1 and SiNET6) were separated into different subtypes. We would like to clarify that this is not the case, since SiNET6 was not included in the subtype analysis due to too few detected Neuroendocrine cells, and was not assigned to any subtype, as noted in the text and as can be seen by its exclusion from Figure 2 where subtypes are defined. We apologize that our manuscript may have given the wrong impression about SiNET6 classification (it was labeled in Fig. 4a in a misleading manner). In the revised manuscript, we corrected the labeling in Fig. 4a and clarified that SiNET6 is not assigned to any subtype. We also further acknowledge the limitation of the two platforms and the arguments in favor of the existence of two SiNET subtypes.     

      (Additional specific recommendations for the authors are provided below)

      (2) Results:

      Heterogeneity in the SiNET tumor microenvironment: It is unclear if the current analysis of intratumor heterogeneity distinguishes the subtypes. It may be informative if patterns of tumor microenvironment (TME) heterogeneity were identified between samples of the same subtype. The team could also evaluate this in an extension cohort of published SiNET tumors (i.e. revisiting additional analyses using the SiNET bulk RNAseq from Alvarez et al 2018, a subset of single-cell data from Hoffman et al 2023, or additional bulk RNAseq validation cohorts for this cancer type if they exist [if they do not, then this could be mentioned as a need in Discussion])

      We agree that analysis of an independent cohort will assist in defining the association between TME and the SiNET subtype. However, the sample size required for that is significantly larger than the data available. In the revised manuscript we note that as a direction for future studies.

      (3) Proliferation of NE and immune cells in SiNETs: The observed proliferation of NE and immune cells in SiNETs may also be influenced by technical factors (including those noted above). For instance, prior studies have shown that scRNA-seq tends to capture a higher proportion of immune cells compared to snRNA-seq, which should be considered in the interpretation of these results. Could the team clarify this element?

      We agree that different platforms could affect the observed proportions of immune cells, and more generally the proportions of specific cell types. However, the low proliferation of Neuroendocrine cells and the higher proliferation of immune cells (especially B cells, but also T cells and macrophages) is consistently observed in both platforms, as shown in Fig. 4a, and therefore appears to be reliable despite the limitations of our work. We clarify this consistency in the revised manuscript. 

      (4) Putative progenitors in mixed tumors: As written, the identification of putative progenitors in a single lung MiNEN sample feels somewhat disconnected from the rest of the study. These findings are interesting - are similar progenitor cell populations identified in SiNET samples? Recognizing that ideally additional validation is needed to confidently label and characterize these cells beyond gene expression data in this rare tumor, this limitation could be addressed in a revised Discussion.

      We do not find evidence for similar progenitors in the SiNET samples, but they also do not contain two co-existing lineages of cancer cells within the same tumor, so this is harder to define. We agree about the need for additional validation for this specific finding and have noted that in the revised Discussion.

      Reviewer #2 (Public review):

      Summary:

      The research identifies two main SiNET subtypes (epithelial-like and neuronal-like) and reveals heterogeneity in non-neuroendocrine cells within the tumor microenvironment. The study validates findings using external datasets and explores unexpected proliferation patterns. While it contributes to understanding SiNET oncogenic processes, the limited sample size and depth of analysis present challenges to the robustness of the conclusions.

      Strengths:

      The studies effectively identified two subtypes of SiNET based on epithelial and neuronal markers. Key findings include the low proliferation rates of neuroendocrine (NE) cells and the role of the tumor microenvironment (TME), such as the impact of Macrophage Migration Inhibitory Factor (MIF).

      Weaknesses:

      However, the analysis faces challenges such as a small sample size, lack of clear biological interpretation in some analyses, and concerns about batch effects and statistical significance.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to profile small intestine neuroendocrine tumors (siNETs) using single-cell/nucleus RNA sequencing, an established method to characterize the diversity of cell types and states in a tumor. Leveraging this dataset, they identified distinct malignant subtypes (epithelial-like versus neuronal-like) and characterized the proliferative index of malignant neuroendocrine cells versus non-malignant microenvironment cells. They found that malignant neuroendocrine cells were far less proliferative than some of their non-malignant counterparts (e.g., B cells, plasma cells, epithelial cells) and there was a strong subtype association such that epithelial-like siNETs were linked to high B/plasma cell proliferation, potentially mediated by MIF signaling, whereas neuronal-like siNETs were correlated with low B/plasma cell proliferation. The authors also examined a single case of a mixed lung tumor (neuroendocrine and squamous) and found evidence of intermediate/mixed and stem-like progenitor states that suggest the two differentiated tumor types may arise from the same progenitor.

      Strengths:

      The strengths of the paper include the unique dataset, which is the largest to date for siNETs, and the potentially clinically relevant hypotheses generated by their analysis of the data.

      Weaknesses:

      The weaknesses of the paper include the relatively small number of independent patients (n = 8 for siNETs), lack of direct comparison to other published single-cell NET datasets, mixing of two distinct methods (single-cell and single-nucleus RNA-seq), lack of direct cell-cell interaction analyses and spatially-resolved data, and lack of in vitro or in vivo functional validation of their findings.

      The analytical methods applied in this study appear to be appropriate, but the methods used are fairly standard to the field of single-cell omics without significant methodological innovation. As the authors bring forth in the Discussion, the results of the study do raise several compelling questions related to the possibility of distinct biology underlying the epithelial-like and neuronal-like subtypes, the origin of mixed tumors, drivers of proliferation, and microenvironmental heterogeneity. However, this study was not able to further explore these questions through spatially-resolved data or functional experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Methods:

      a) Could the team clarify the discrepancy in subtype assignment between two samples from the same patient? i.e. are these samples from the same tumor? If so, what does the team think is the explanation for the difference in subtype assignment?

      As noted above in response to the public review of reviewer #1, SiNET6 was in fact not assigned to any subtype (due to insufficient NE cells) and hence there was no discrepancy. We apologize for the misleading labeling of SiNET6 in the previous version and have corrected this in the revised version of Figure 4.

      b) What is the rationale for scoring tumor-derived programs on samples with no tumor cells? For instance, SiNET3 does not contain NE cells, and SiNET9 has a very low fraction of NE cells. Please clarify how the scoring was performed on these samples, as the program assignments may be driven by other cell types in samples with little to no NE cells.

      Scoring for tumor-derived programs was done only for the NE cells. Accordingly, SiNET3 was not scored or assigned to any of the programs. SiNET9 was included in this analysis: although it had a relatively small fraction of NE cells, the absolute number of profiled cells was particularly high in this sample, and therefore the number of NE cells was 130, higher than our cutoff of 100 cells.

      c) Given the heterogeneity of cell types within each sample, would there be a way to provide a refined sense of confidence for certain cell type annotations? This would be helpful given the heterogeneity in marker gene expression and the absence of gold-standard markers for fibroblasts and endothelial cells in this cancer type. Additionally, there seems to be an unusually large proportion of NK and T cells - was there selection for this (given that these tumors are largely not immune infiltrated)?

      Author Response: Except for the Neuroendocrine cells, there are six TME cell types that we consistently find in multiple SiNET samples: macrophages, T cells, B/plasma cells, fibroblasts, endothelial and epithelial cells. Each of these cell types is identified as a discrete cluster in analysis of the respective tumors (as shown in Fig. 1a,b and Fig. S1), and these are exactly the six most common non-malignant cell types that we and others found in single cell analysis across various other tumor types (e.g. see Gavish et al. 2023, ref. #15). The signatures used to annotate these cell types are shown in Table S2, and they primarily consist of classical markers that are traditionally used to define those cell types. We therefore believe that the annotation of these typical tumor-associated cell types is robust and does not include major uncertainties. In addition to these five common cell types, there are three cell types that we find only in 1-2 of the samples – epithelial cells, plasma cells and NK cells. Again, we believe that their annotation is robust, and these cell types are primarily not used for further analysis.

      There was no selection for any specific cell types in this study. Nevertheless, single cell (or single nuclei) analysis may lead to biases towards specific cell types that we cannot evaluate directly from the data. NK cells were detected only in one tumor. T cells were detected in eight of the ten samples, but in four of those samples the frequency of T cells was lower than 5%, and only in one sample was the frequency above 20%. Therefore, while we cannot exclude a technical bias towards high frequency of T/NK cells, we do not consider these frequencies high enough to suggest this specific type of bias. In the revised manuscript, we clarify that the commonly observed cell types in SiNETs are the same as those commonly observed in other tumors and we acknowledge the possibility of a technical bias in cell type capture.

      d) Evaluating the expression of one gene at a time may not effectively demonstrate subtype-specific patterns, particularly when comparing NE cells from one tumor to non-NE cells from another, which may not be an appropriate approach for identifying differentially expressed genes. DE analysis coupled with concordance analysis, for example, could strengthen the results.

      We apologize, but we do not fully understand this comment. We note that the initial normalization by non-NE cells was done in order to decrease batch effects when combining the data of the two platforms. We also note that the two subtypes were identified by two distinct approaches, as shown in Fig. 2c and in Fig. 2f.

      (2) Results:

      See the above public review.

      (3) Minor Comments:

      a) Results: Single cell and single nuclei RNA-seq profiling of SiNETs

      The results say ten primary tumor samples from eight patients. Later in the paragraph it says, "After initial quality controls, we retained 29,198 cells from the ten patients." Please clarify to either ten samples or eight patients.

      Indeed these are ten samples rather than ten patients. We corrected that in the revised version and thank the reviewer for noticing our error.

      b) Methods:

      - Please specify which computational tools were used to perform quality control, signature scoring, etc.

      The approaches for quality control, scoring etc. are described in the methods. We implemented these approaches with R code and did not use other computational tools.

      - Minor point but be consistent with naming convention (ie, siAdeno vs SiAdeno) throughout the paper. For example, under "Sample Normalization, Filtering and annotations" change "siAdeno" to "SiAdeno."

      Thank you for noting this, we corrected that.

      - Add processing and analysis of MiNEN sample to the methods section. It is not mentioned in the methods at all.

      As noted in the revised manuscript, the MiNEN sample was analyzed in the same way as the SiNET fresh samples.

      c) Supplementary Figures:

      Figure S1: Change (A-H) to (A-I) to account for all panels in the figure.

      Figure S4: Add (C) after "the siAdeno sample" in the legend.

      Thank you for noting this, we corrected that.

      (4) Font size is quite small in the main figures.

      We enlarged the font in selected figure panels.

      Reviewer #2 (Recommendations for the authors):

      (1) The small number of samples used in some analyses affects the robustness of the findings. Increasing the sample size or including more validation data could improve the statistical reliability and make the results more convincing. The authors should consider expanding the cohort size or integrating additional external datasets to increase statistical power.

      We agree with the reviewer that adding more samples would improve the reliability of the results. However, the external data that we found was not comparable enough to enable integration with our data, and we are unable to profile additional SiNET samples in our lab. We hope that future studies would support our results and extend them further.

      (2) The biological significance of differentially expressed genes needs more depth, limiting the insights into SiNET biology. The authors should perform a comprehensive pathway enrichment analysis and integrate findings with existing literature. Tools like Gene Set Enrichment Analysis (GSEA) or Overrepresentation Analysis (ORA) could provide a more holistic view of altered biological processes.

      We thank the reviewer for this suggestion. We did examine the functional enrichment of differentially expressed genes and did not find additional enrichments that we felt were important to highlight beyond what we described. We report the genes in supplementary tables, enabling other researchers to examine these lists further. 

      (3) The unexpected finding of higher proliferation in non-malignant cells requires further investigation and plausible biological explanation. The authors should perform additional analyses to explore potential mechanisms, such as investigating cell cycle regulators or performing in vitro validation experiments. The authors should consider single-cell trajectory analysis to explore these highly proliferative non-malignant cells' potential differentiation or activation states.

      We agree that our results are descriptive and that we do not fully explain the mechanism for the high level of non-malignant cell proliferation. We did attempt to perform follow up computational analysis. These analyses raised the hypothesis that high levels of MIF are causing the proliferation of immune cells. Additional analyses that we performed were not sufficient to conclusively identify a mechanism, and we felt that they were not informative enough to be included in the manuscript. Further in vitro (or in vivo) studies are beyond the scope of the current work.

      (3) More details are required on methods used for p-value adjustment, and criteria for statistical significance should be clearly defined. Additionally, integrating scRNA-seq and snRNA-seq data needs a more thorough explanation, including batch effect mitigation and more explicit cell clustering representation. The authors should clearly describe p-value adjustments (e.g., FDR) and batch correction methods (e.g., Harmony, FastMNN integration) and include additional figures showing corrected UMAP plots or heatmaps post-batch correction to enhance the confidence in results.

      We now clarify in the Methods our use of FDR for p-value adjustments. As for batch correction, we have avoided the use of integration methods as we believe that they tend to distort the data and decrease tumor-specific signals. Instead, we primarily analyzed one tumor at a time and never directly compared cell profiles across distinct tumors but only compared the differences between subpopulations; specifically, we normalized the expression of NE cells by subtracting the expression of reference non-NE cells from the same tumor as a method to decrease batch effects. We now clarify this point in the Methods section.
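      The within-tumor normalization described above can be sketched as follows. This is an illustrative sketch only: it assumes log-scale expression values and a per-gene mean over the reference non-NE cells, neither of which is specified in this response, and it is written in Python with NumPy for convenience even though the authors report using their own R code. The function name and toy matrices are hypothetical.

```python
import numpy as np


def relative_ne_expression(expr, is_ne):
    """Subtract the per-gene mean of reference (non-NE) cells from NE cells.

    expr  : genes x cells array of log-scale expression for ONE tumor
    is_ne : boolean mask over cells, True for neuroendocrine (NE) cells
    """
    expr = np.asarray(expr, dtype=float)
    is_ne = np.asarray(is_ne, dtype=bool)
    if is_ne.all() or not is_ne.any():
        raise ValueError("need both NE cells and reference non-NE cells")
    # Per-gene reference level, estimated within the same tumor.
    reference = expr[:, ~is_ne].mean(axis=1, keepdims=True)
    return expr[:, is_ne] - reference


# Toy tumor: 2 genes x 4 cells; the first two cells are NE cells.
mask = [True, True, False, False]
tumor_a = np.array([[5.0, 6.0, 1.0, 1.0],
                    [2.0, 2.0, 2.0, 2.0]])
# Same biology with a uniform additive offset, mimicking a platform effect.
tumor_b = tumor_a + 3.0

rel_a = relative_ne_expression(tumor_a, mask)
rel_b = relative_ne_expression(tumor_b, mask)
print(np.allclose(rel_a, rel_b))
```

      In this toy case a uniform additive shift cancels exactly because NE and reference cells share it. Real platform effects need not be uniform across cell types, so the subtraction should be read as reducing batch effects rather than removing them.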

      (4) The lack of analysis of interactions between different cell types limits understanding of tumor microenvironment dynamics. The authors should employ cell-cell interaction analysis tools (e.g., CellPhoneDB, NicheNet) to explore potential communication networks within the tumor microenvironment. This could provide valuable insights into how different cell types influence tumor progression and maintenance.

      We thank the reviewer for this suggestion. We have tried to use such methods but found the results difficult to interpret since these approaches generated very long lists of potential cell-cell interactions that are largely not unique to the SiNET context and their relevance remains unclear without follow up experiments, which are beyond the scope of this work. We therefore focused only on ligand/receptors that came up robustly through specific analyses such as the differences between SiNET subtypes. In particular, MIF is highly expressed in the epithelial subtype, and remarkably, MIF upregulation is shared across multiple cell types. Thus, the cell-cell interactions that are suggested by the SiNET data as somewhat unique to this context are those involving MIF and its receptor (CD74 on immune cell types), while other interactions detected by the proposed methods primarily reflect the generic ligand/receptors expressed by corresponding TME cell types.   

      Reviewer #3 (Recommendations for the authors):

      (1) For a relatively small dataset, the mixing of single-cell versus single-nucleus RNA-seq should be discussed more. It would be nice to have 1-2 tumors that are analyzed by both methods to compare and increase our understanding of how these different approaches may affect the results. This could be accomplished by splitting a fresh tumor into two parts, processing it fresh for single-cell RNA-seq, and freezing the other part for single-nucleus RNA-seq.

      We agree with the reviewer that the different techniques may bias our results and we refer to this limitation in the Results and Discussion sections. However, it is important to note that we do not directly integrate the primary data across these modalities, but rather analyze each tumor separately and only combine the results across tumors. For example, we first compare the NE cells from each tumor to control non-NE cells from the same tumor and then only compare the sets of NE-specific genes across tumors. Moreover, the subtypes that we detect cannot be explained by these modalities, as the first subtype contains samples from both methods and these subtypes are further demonstrated in external bulk data. Similarly, the results regarding low proliferation of NE cells and high proliferation of B/plasma cells are observed across both modalities. We therefore argue that while the combination of methods is a limitation of this work it does not account for the main results.  

      (2) The authors state that they defined the siNET transcriptomic signature by comparing their siNET single-cell/nucleus data to other NETs profiled by bulk RNA-seq. Some of the genes in the signature, such as CHGA, are widely used as markers for NETs (and not specific for siNET). The authors should address this in more detail.

      To define the SiNET transcriptomic signature we first analyzed each tumor separately and compared the expression of Neuroendocrine (NE) cells to that of non-NE cells to detect NE-specific genes. Next, we compared the lists of NE-specific genes across the 8 SiNET patients and found a subset of 26 genes which were shared across most of the analyzed SiNET samples (Fig. 2a). Thus, the signature was defined only from analysis of SiNETs and not based on comparison to other types of NETs and hence it is expected that the signature could contain both SiNET-specific genes and more generic NET genes such as CHGA.

      Only after defining this signature, we went on to compare it between SiNETs and other types of NETs (pancreatic and rectal) based on external bulk RNA-seq data. In this comparison, we observed that the signature was clearly higher in SiNETs than in the other NETs (Fig. 2b). This result supports the accuracy of the signature and further suggests that it contains a fraction of SiNET-specific genes and not only generic NET genes such as CHGA. Thus, we would expect this signature to perform well also for distinguishing between SiNETs and other types of NETs, but it does contain a subset of genes that would also be high in the other NETs. Finally, we note that even though CHGA is a generic NET marker, the bulk RNA-seq data would suggest that, at least at the mRNA level, this gene is still more highly expressed in SiNETs than in other NETs. To avoid confusion regarding the definition and specificity of the SiNET transcriptomic signature we have extended the description of this section in the revised manuscript.
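      The two-step definition of the signature described in this response (per-tumor NE-specific genes first, then genes shared across most tumors) can be sketched as below. The threshold for "most" (more than half of the tumors), the function name, and all gene names other than CHGA are hypothetical placeholders, and the per-tumor gene lists are toy data rather than results from the study.

```python
from collections import Counter


def shared_signature(ne_genes_per_tumor, min_fraction=0.5):
    """Return genes that are NE-specific in more than `min_fraction` of tumors.

    ne_genes_per_tumor : dict mapping tumor ID -> iterable of NE-specific genes,
                         each list derived separately within its own tumor
    """
    n_tumors = len(ne_genes_per_tumor)
    # Count each gene at most once per tumor.
    counts = Counter(
        gene for genes in ne_genes_per_tumor.values() for gene in set(genes)
    )
    return sorted(g for g, c in counts.items() if c / n_tumors > min_fraction)


# Toy per-tumor lists (placeholders, not data from the study).
toy_lists = {
    "T1": ["CHGA", "GENE_A", "GENE_B"],
    "T2": ["CHGA", "GENE_A"],
    "T3": ["CHGA", "GENE_C"],
    "T4": ["CHGA", "GENE_A", "GENE_C"],
}

signature = shared_signature(toy_lists)
print(signature)
```

      Because the lists are built within each tumor and only then intersected, a generic NET marker such as CHGA passes the filter alongside any genes specific to the tumor type, which matches the point made in the response.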

      (3) The authors only compare their data to bulk transcriptomic data on NETs. While in some instances this makes sense given the bulk dataset has >80 tumors, they should at least cite and do some comparison to other published single-cell RNA-seq datasets of NETs (e.g., PMID: 37756410, 34671197). The former study listed has 3 siNETs, 4 pNETs, and 1 gNET. Do the epithelial-like and neuronal-like signatures show up in this dataset too?

      We examined these studies but concluded that their data was inadequate to identify the two SiNET subtypes. The latter study was of pNETs, while the former study had 3 SiNET samples but only from 2 patients, and furthermore it was enriching for immune cells with only very low amounts of NE cells. Therefore, we now cite this work in the discussion but cannot use it to extend the results from our work.

      (4) How did the authors statistically handle patients with more than one tumor sample (true for n = 2)? These tumor samples would not be truly independent.

      In both cases where we had two distinct samples of the same patient, only one sample had sufficient NE cells to be included in NE-related analysis and therefore the other samples (SiNET3 and SiNET6) were excluded from all analysis of NE differential expression and subtypes. These samples were only included in the initial analysis (Fig. 1) and in TME-related analysis (Fig. 3-4) in which there was no statistical analysis of differences between patients and hence no problem with the inclusion of 2 samples for the same patient. We clarified this issue in the revised version.

      (5) The association between siNET subtype and B/plasma cell proliferation is very interesting, as is the hypothesis regarding MIF signaling. It would be illuminating for the authors to perform cell-cell interaction analyses with methods such as CellChat in this context rather than just relying on DE. Spatial mapping would be helpful too and while this may be outside the scope of this study, it should at least be expounded upon in the Discussion section.

      Indeed, spatial transcriptomic analysis would add interesting insight to our data and to SiNET biology. Unfortunately, this is not within the scope of the current project, but we note this interesting possibility in the Discussion. Regarding additional methods for cell-cell interactions, we have performed such analysis but found it not informative, as it highlighted a large number of interactions that are not unique to SiNETs and are difficult to interpret, and therefore we do not include this in the revised version.

      (6) The authors note that in the mixed lung tumor, the NE component was more proliferative than that observed with siNETs. How does the proliferation compare to pNETs, gNETs, in other published studies? How about assessing the clonality of the SCC and LNET malignant cells with various genomic or combined genomic/transcriptomic methods?

      The percentage of proliferating NE cells in the mixed lung tumor was higher than 60%. This is extremely high, approximately four-fold higher than the average that we found in a pan-cancer analysis and higher than the average of any of the >20 cancer types that we analyzed (Gavish et al. 2023, ref. #15). This remarkably high proliferation serves as a control for the low proliferation that we found in SiNET NE cells.

      (7) In the Discussion on page 13, the authors write "Second, proliferation of NE cells may be inhibited by prior treatments with somatostatin analogues." How many patients were treated in this manner? This information should be made more explicit in the manuscript.

      Details on pretreatment with somatostatin analogues are provided in Table S1. All patients were pretreated with somatostatin analogues, with the possible exception of one patient (P8, SiNET10) for whom we could not confidently obtain this information.

      (8) On page 5, "bone-fide" is misspelled.

      (9) On page 8, "exact identify" is misspelled.

      We thank the reviewer and have corrected the typos.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors provide a study among healthy individuals, general medical patients and patients receiving haematopoietic cell transplants (HCT) to study the gut microbiome through shotgun metagenomic sequencing of stool samples. The first two groups were sampled once, while the patients receiving HCT were sampled longitudinally. A range of metadata (including current and previous (up to 1 year before sampling) antibiotic use) was recorded for all sampled individuals. The authors then performed shotgun metagenomic sequencing (using the Illumina platform) and performed bioinformatic analyses on these data to determine the composition and diversity of the gut microbiota and the antibiotic resistance genes therein. The authors conclude, on the basis of these analyses, that some antibiotics had a large impact on gut microbiota diversity, and could select opportunistic pathogens and/or antibiotic resistance genes in the gut microbiota.

      Strengths:

      The major strength of this study is the considerable achievement of performing this observational study in a large cohort of individuals. Studies into the impact of antibiotic therapy on the gut microbiota are difficult to organise, perform and interpret, and this work follows state-of-the-art methodologies to achieve its goals. The authors have achieved their objectives, and the conclusions they draw on the impact of different antibiotics on the gut microbiota and its antibiotic resistance genes (the 'resistome', in short) are supported by the data presented in this work.

      Weaknesses:

      The weaknesses are the lack of information on the different resistance genes that have been identified and which could have been supplied as Supplementary Data.

      We have now supplied a list of individual resistance genes as supplementary data.

      In addition, no attempt is made to assess whether the identified resistance genes are associated with mobile genetic elements and/or (opportunistic) pathogens in the gut. While this is challenging with short-read data, alternative approaches like long-read metagenomics, Hi-C and/or culture-based profiling of bacterial communities could have been employed to further strengthen this work.

      We agree this is a limitation, and we now refer to this in the discussion. Unfortunately we did not have funding to perform additional profiling of the samples that would have provided more information about the genetic context of the AMR genes identified.

      Unfortunately, the authors have not attempted to perform corrections for multiple testing because many antibiotic exposures were correlated.

      The reviewer is correct that we did not perform formal correction for multiple testing. This was because correlation between antimicrobial exposures meant we could not determine what correction would be appropriate and not overly conservative. We now describe this more clearly in the statistical analysis section.

      Impact:

      The work may impact policies on the use of antibiotics, as those drugs that have major impacts on the diversity of the gut microbiota and select for antibiotic resistance genes in the gut are better avoided. However, the primary rationale for antibiotic therapy will remain the clinical effectiveness of antimicrobial drugs, and the impact on the gut microbiota and resistome will be secondary to these considerations.

      We agree that the primary consideration guiding antimicrobial therapy will usually be clinical effectiveness. However antimicrobial stewardship to minimise microbiome disruption and AMR selection is an increasingly important consideration, particularly as choices can often be made between different antibiotics that are likely to be equally clinically effective.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript by Peto et al., the authors describe the impact of different antimicrobials on gut microbiota in a prospective observational study of 225 participants (healthy volunteers, inpatients and outpatients). Both cross-sectional data (all participants) and longitudinal data (a subset of 79 haematopoietic cell transplant patients) were used. Using metagenomic sequencing, they estimated the impact of antibiotic exposure on gut microbiota composition and resistance genes. In their models, the authors aim to correct for potential confounders (e.g. demographics, non-antimicrobial exposures and physiological abnormalities), and for differences in the recency and total duration of antibiotic exposure. I consider these comprehensive models an important strength of this observational study. Yet, the underlying assumptions of such models may have impacted the study findings (detailed below). Other strengths include the presence of both cross-sectional and longitudinal exposure data and the presence of both healthy volunteers and patients. Together, these observational findings expand on previous studies (both observational and RCTs) describing the impact of antimicrobials on gut microbiota.

      Weaknesses:

      (1) The main weaknesses result from the observational design. This hampers causal interpretation and makes correction for potential confounding necessary. The authors have used comprehensive models to correct for potential confounders and for differences between participants in duration of antibiotic exposure and time between exposure and sample collection. I wonder if some of the choices made by the authors did affect these findings. For example, the authors did not include travel in the final model, but travel (most importantly, to South Asia) may result in the acquisition of AMR genes [Worby et al., Lancet Microbe 2023; PMID 37716364]. Moreover, non-antimicrobial drugs (such as proton pump inhibitors) were not included, but these have a well-known impact on gut microbiota and might be linked with exposure to antimicrobial drugs. Residual confounding may underlie some of the unexplained discrepancies between the cross-sectional and longitudinal data (e.g. for vancomycin).

      We agree that the observational design means there is the potential for confounding, which, as the reviewer notes, we attempt to account for as far as possible in the multivariable models presented. We cannot exclude the possibility of residual confounding, and we highlight this as a limitation in the discussion. We have expanded on this limitation, and mention it as a possible explanation for inconsistencies between longitudinal and cross-sectional models. Conducting randomised trials to assess the impacts of multiple antimicrobials in sick, hospitalised patients would be exceptionally difficult, and so it is hard to avoid reliance on observational data in these settings.

      We did record participants’ foreign travel and diet, but these exposures were not included in our models as they were not independently associated with an impact on the microbiome and their inclusion did not materially affect other estimates. However, because most participants were recruited from a healthcare setting, few had recent foreign travel and so this study was not well powered to assess the effects of travel on AMR carriage. We have added this as a limitation.

      In addition, the authors found a disruption half-life of 6 days to be the best fit based on Shannon diversity. If I'm understanding correctly, this results in a near-zero modelled exposure of a 14-day-course after 70 days (purple line; Supplementary Figure 2). However, it has been described that microbiota composition and resistome (not Shannon diversity!) remain altered for longer periods of time after (certain) antibiotic exposures (e.g. Anthony et al., Cell Reports 2022; PMID 35417701). The authors did not assess whether extending the disruption half-life would alter their conclusions.

      The reviewer is correct that the best fit disruption half-life of 6 days means the model assumes near-zero exposure by 70 days. We appreciate that antimicrobials can cause longer-term disruption than is represented in our model, and we refer to this in the discussion (we had cited two papers supporting this, and we are grateful for the additional reference above, which we have added). We agree that it is useful to clarify that the longer term effects may be seen in individual components of the microbiome or AMR genes, but not in overall measures of diversity, so have added this to the discussion.
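
      As a rough illustration of why a 6-day half-life implies near-zero modelled exposure by day 70, the sketch below assumes that the exposure weight decays purely exponentially once dosing stops. The function name and the pure-exponential form are assumptions made for illustration; the exposure model in the manuscript may be parameterised differently.

```python
def residual_exposure(days_since_last_dose: float, half_life_days: float = 6.0) -> float:
    """Fraction of modelled exposure remaining, assuming exponential decay.

    Illustrative only: the 6-day half-life comes from the response above,
    but the pure-exponential form is an assumption of this sketch.
    """
    return 0.5 ** (days_since_last_dose / half_life_days)


# Whether the 70 days are counted from the start or the end of a 14-day
# course (56 or 70 days after the last dose), the remaining weight is tiny.
weight_after_56_days = residual_exposure(56)
weight_after_70_days = residual_exposure(70)
print(f"{weight_after_56_days:.4f} {weight_after_70_days:.4f}")
```

      Under these assumptions the remaining weight is well below 1% in both cases, which is consistent with the reviewer's reading of Supplementary Figure 2.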

      (2) Another consequence of the observational design of this study is the relatively small number of participants available for some comparisons (e.g. oral clindamycin was only used by 6 participants). Care should be taken when drawing any conclusions from such small numbers.

      We agree. Although our participants received a large number of different antimicrobial exposures, these were dependent on routine clinical practice at our centre and we lack data on many potentially important exposures. We had mentioned this in relation to antimicrobials not used at our centre, and have now clarified in the discussion that this also limits reliability of estimates for antimicrobials that were rarely used in study participants.

      (3) The authors assessed log-transformed relative abundances of specific bacteria after subsampling to 3.5 million reads. While I agree that some kind of data transformation is probably preferable, these methods do not address the compositional nature of microbiome data, and using a pseudocount (10<sup>-6</sup>) is necessary for absent (i.e. undetected) taxa [Gloor et al., Front Microbiol 2017; PMID 29187837]. Given the centrality of these relative abundances to their conclusions, a sensitivity analysis using compositionally-aware methods (such as a centred log-ratio (clr) transformation) would have added robustness to their findings.

      We agree that using a pseudocount is necessary for undetected taxa, which we have done by assuming undetected taxa had an abundance of 10<sup>-6</sup> (based on the lower limit of detection at the depth we sequenced). We refer to this as truncation in the methods section, but for clarity we have now also described this as a pseudocount. Because our analysis focusses on major taxa that are almost ubiquitous in the human gut microbiome, a pseudocount was only used for 3 samples that had no detectable Enterobacteriaceae.
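
      The truncation described above can be sketched as follows. This is a minimal illustration, assuming relative abundances expressed as proportions; the function and constant names are hypothetical and are not taken from the authors' analysis code.

```python
import numpy as np

# Floor for undetected taxa, taken from the response above (10^-6).
DETECTION_FLOOR = 1e-6


def log10_relative_abundance(rel_abundance):
    """Log10-transform relative abundances, flooring undetected taxa.

    Values below the detection floor (including exact zeros) are replaced
    by the floor before taking the logarithm, so the result is always finite.
    """
    floored = np.maximum(np.asarray(rel_abundance, dtype=float), DETECTION_FLOOR)
    return np.log10(floored)


# Toy sample: one abundant taxon, one undetected taxon, one rare taxon.
example = log10_relative_abundance([0.02, 0.0, 1e-4])
print(example)
```

      Flooring only changes samples in which the taxon was not detected, which matches the statement that the pseudocount applied to just three samples.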

      We are aware that compositionally-aware methods are often used with microbiome data, and for some analyses these are necessary to avoid introducing spurious correlations. However the flaws in non-compositional analyses outlined in Gloor et al do not affect the analyses in this paper:

      (1) The problems related to differing sequence depths or inadequate normalisation do not apply to our dataset, as we took a random subset of 3.5 million reads from all samples (Gloor et al correctly point out that this method has the drawback of losing some information, but it avoids problems related to variable sequencing depth)

      (2) The remainder of Gloor et al critiques multivariate analyses that assess correlations between multiple microbiome measurements made on the same sample, starting with a dissimilarity matrix. With compositional data these can lead to spurious correlations, as measurements on an individual sample are not independent of other measurements made on the same sample. In contrast, our analyses do not use a dissimilarity matrix, but evaluate the association of multiple non-microbiome covariates (e.g. antibiotic exposures, age) with single microbiome measures. We use a separate model for each of 11 specified microbiome components, and display these results side-by-side. This does not lead to the same problem of spurious correlation as analyses of dissimilarity matrices. However, it does mean that estimates of effects on each taxon outcome have to be interpreted in the context of estimates on the other taxa. Specifically, in our models, the associations of antimicrobial exposure with different taxa/AMR genes are not necessarily independent of each other (e.g. if an antimicrobial eradicated only one taxon then it would be associated with an increase in others). This is not a spurious correlation, and makes intuitive sense when using relative abundance as the outcome. However, we agree this should be made more explicit.

      For these reasons, at this stage we would prefer not to increase the complexity of the manuscript by adding a sensitivity analysis.
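
      Point (1) above relies on subsampling every sample to the same depth. A minimal sketch of that idea is given below, assuming per-taxon read counts are already available and drawing reads without replacement. The function name and the toy counts are hypothetical; the authors subsampled reads, and this sketch works on per-taxon counts only for brevity.

```python
import numpy as np


def subsample_counts(counts, depth, seed=0):
    """Draw `depth` reads without replacement from per-taxon read counts.

    Illustrative rarefaction: every sample ends up with the same total,
    so downstream relative abundances are not affected by sequencing depth.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the target depth")
    return rng.multivariate_hypergeometric(counts, depth)


# Toy sample with 6.5 million reads across four taxa, one of them undetected.
toy_counts = [5_000_000, 1_200_000, 300_000, 0]
rarefied = subsample_counts(toy_counts, depth=3_500_000)
relative_abundance = rarefied / rarefied.sum()
print(rarefied, relative_abundance)
```

      As Gloor et al note, the cost of this approach is that reads above the target depth are discarded.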

      (4) An overall description of gut microbiota composition and resistome of the included participants is missing. This makes it difficult to compare the current study population to other studies. In addition, for correct interpretation of the findings, it would have been helpful if the reasons for hospital visits of the general medical patients were provided.

      We have added a summary of microbiome and resistome composition (in the results section and new supplementary table 2), and we also now include microbiome and resistome profiles of all samples in the supplementary data. We also provide some more detail about the types of general medical patients included. We are not able to provide a breakdown of the initial reason for admission as this was not collected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Provide a supplementary table with information on the abundance of individual genes in the samples.

      This supplementary data is now included.

      (2) Engage with an expert in statistics to discuss how statistical analyses can be improved.

      An experienced biostatistician has been involved in this study since its conception, and was involved in planning the analysis and in the responses to these comments.

      (3) Typos and other minor corrections:

      Methods: it is my understanding that litre should be abbreviated with a lowercase l.

      Different journals have different house styles: we are happy to follow Editorial guidance.

      p. 9: abuindance should be corrected to abundance.

      Corrected

      p. 9: relative species should be relevant species?  

      Yes, corrected. Thank you.

      p. 9 - 10: can the apparent lack of effect of beta-lactams on beta-lactamase gene abundance be explained by the focus on a small number of beta-lactamase resistance genes that are found in Enterobacteriaceae and which are not particularly prevalent, while other classes of resistance genes (e.g. Bacteroidal beta-lactamases) were excluded?

      It is possible that including other beta-lactamases would have led to different results, but as a small number of beta-lactamases in Enterobacteriaceae are of major clinical importance we decided to focus on these (already justified in the Methods). A full list of AMR genes identified is now provided in the supplementary data.

      p. 10: beta-lactamse should be beta-lactamase

      Corrected

      Figure 3A: could the data shown for tetracycline resistance genes be skewed by tetQ, which is probably one of the most abundant resistance genes in the human gut and acts through ribosome protection?

      TetQ was included, but only accounted for 23% of reads assigned to tetracycline resistance genes so is unlikely to have skewed the overall result. We limited the analysis to a few major categories of AMR genes and, other than VanA, have avoided presenting results for single genes to limit the degree of multiple testing. We now include the resistome profile for each sample in the supplementary data so that readers can explore the data if desired.

      Reviewer #2 (Recommendations For The Authors):

      (1) Given the importance of obligate anaerobic gut microbiota for human health, it might be interesting to divide antibiotics into categories based on their anti-anaerobic activity and assess whether these antibiotics differ in their effects on gut microbiota.

      The large majority of antibiotics used in clinical practice have activity against aerobic bacteria and anaerobic bacteria, so it is not possible to easily categorise them this way. There are two main exceptions (metronidazole and aminoglycosides) but there was insufficient use of these drugs to clearly detect or rule out a difference between them, even when categorising antimicrobials by class, so we prefer not to frame the results in these terms. Also see our comments on this categorisation below.

      (2) For estimating the abundance of anaerobic bacteria, three major groups were assessed: Bacteroidetes, Actinobacteria and Clostridia. To me, this seems a bit aspecific. For example, the phylum Bacteroidetes contains some aerobic bacteria (e.g. Flavobacteriia). Would it be possible to provide a more accurate estimation of anaerobic bacteria?

      We think that an emphasis on a binary aerobic/anaerobic classification is less biologically meaningful than the more granular genetic classification we use, and its use largely reflects the previous reliance on culture-based methods for bacterial identification. Although some important opportunistic human pathogens are aerobic, it is not clear that the benefit or harm of most gut commensals relates to their oxygen tolerance, and all luminal bacteria exist in an anaerobic environment. As such we prefer not to perform an additional analysis using this category. We are also not sure that this could be done reliably, as many of the taxa are characterised poorly, or not at all.

      We appreciate that Bacteroidetes, Actinobacteria and Clostridia are diverse taxa that include many different species, so may seem non-specific, but these were chosen because:

      i) they are non-overlapping with Enterobacteriaceae and Enterococcus, the major opportunistic pathogens of clinical relevance, so could be used in parallel, and

      ii) they make up the large majority of the gut microbiome in most people and most species are of low pathogenicity, so it is plausible that their disruption might drive colonisation with more pathogenic organisms (or those carrying important AMR genes).

      We have more clearly stated this rationale.

      (3) A statement on the availability of data and code for analysis is missing. I would highly recommend public sharing of raw sequence data and R code for analysis. If possible, it would be very valuable if processed microbiome data and patient metadata could be shared.

      We agree, and these have been submitted as supplementary data. We have added the following statement “The data and code used to produce this manuscript are available in the supplementary material, including processed microbiome data, and pseudonymised patient metadata. The sequence data for this study have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEB86785.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Cao et al. provides a compelling investigation into the role of mutational input in the rapid evolution of pesticide resistance, focusing on the two-spotted spider mite's response to the recent introduction of the acaricide cyetpyrafen. This well-documented introduction of the pesticide - and thus a clearly defined history of selection - offers a powerful framework for studying the temporal dynamics of rapid adaptation. The authors combine resistance phenotyping across multiple populations, extensive resequencing to track the frequency of resistance alleles, and genomic analyses of selection in both contemporary and historical samples. These approaches are further complemented by laboratory-based experimental evolution, which serves as a baseline for understanding the genetic architecture of resistance across mite populations in China. Their analyses identify two key resistance-associated genes, sdhB and sdhD, within which they detect 15 mutations in wild-collected samples. Protein modeling reveals that these mutations cluster around the pesticide's binding site, suggesting a direct functional role in resistance. The authors further examine signatures of selective sweeps and their distribution across populations to infer the mechanisms - such as de novo mutation or gene flow - driving the spread of resistance, a crucial consideration for predicting evolutionary responses to extreme selection pressure. Overall, this is a well-rounded, thoughtfully designed, and well-written manuscript. It shows significant novelty, as it is relatively rare to integrate broad-scale evolutionary inference from natural populations with experimentally informed bioassays. However, some aspects of the methods and discussion could be clarified and strengthened.

      Strengths:

      One of the most compelling aspects of this study is its integration of genomic time-series data in natural populations with controlled experimental evolution. By coupling genome sequencing of resistant field populations with laboratory selection experiments, the authors tease apart the individual effects of resistance alleles along with regions of the genome where selection is expected to occur, and compare that to the observed frequency in the wild populations over space and time. Their temporal data clearly demonstrates the pace at which evolution can occur in response to extreme selection. This type of approach is a powerful roadmap for the rest of the field of rapid adaptation.

      The study effectively links specific genetic changes to resistance phenotypes. The identification of sdhB and sdhD mutations as major drivers of cyetpyrafen resistance is well-supported by allele frequency shifts in both field and experimental populations. The scope of their sampling clearly facilitated the remarkable number of observed mutations within these target genes, and the authors provide a careful discussion of the likelihood of these mutations from de novo or standing variation. Furthermore, the discovered cross-resistance that these mutations confer to other mitochondrial complex II inhibitors highlights the potential for broader resistance management and evolution.

      Weaknesses:

      (1) Experimental Evolution:

      - Additional information about the lab experimental evolution would be useful in the main text. Specifically, the dose of cyetpyrafen used should be clarified, especially with respect to the LD50 values. How does it compare to recommended field doses? This is expected to influence the architecture of resistance evolution. What was the sample size? This will help readers contextualize how the experimental design could influence the role of standing variation.

      The experimental design involved sampling approximately 6,000 individuals from the wild population ZJSX1, which were subsequently divided into two parallel cohorts under controlled laboratory conditions. The selection group (LabR) was subjected to continuous selection pressure using cyetpyrafen, while the control group (LabS) was maintained under identical laboratory conditions without exposure to cyetpyrafen. A dynamic selection regime was implemented wherein the acaricide dosage was systematically adjusted every two generations to maintain a consistent selection intensity, achieving a mortality rate of 60% ± 10% in the LabR population. This adaptive dosage strategy ensured sustained evolutionary pressure while preventing population collapse. The LC<sub>50</sub> values were tested at F1, F32, F54, F60, F62, and F66 generations using standardized bioassay protocols to quantify resistance development trajectories and optimize dosage for subsequent selection cycles. We provided the additional information in subsection 4.1 of the materials and methods section.

      - The finding that lab-evolved strains show cross-resistance is interesting, but potentially complicates the story. It would help to know more about the other mitochondrial complex II inhibitors used across China and their impact on adaptive dynamics at these loci, particularly regarding pre-existing resistance alleles. For example, a comparison of usage data from 2013, 2017, and 2019 could help explain whether cyetpyrafen was the main driver of resistance or if previous pesticides played a role. What happened in 2020 that caused such rapid evolution 3 years after launch?

      Although the introduction of the other two SDHI acaricides complicates the story, we would like to provide a complete background on the usage of acaricides with this mode of action in China. Although cyflumetofen was released in 2013 before cyetpyrafen, and cyenopyrafen was released in 2019 after cyetpyrafen, their market share is minor (about 3.2%) compared to cyetpyrafen (about 96.8%, personal communication). Since cross-resistance is reported among SDHIs, we could not exclude the contribution of cyflumetofen to the initial accumulation of resistance alleles, but the effect should be minor, both because of their minimal market share and because of the independent evolution of resistance in the field as found in our study. Although the contribution of cyflumetofen and cyenopyrafen cannot be entirely excluded, the rapid evolution of resistance seems likely to be mainly explained by the intensive application of cyetpyrafen. To clarify this issue, we added relevant information in the first paragraph of the discussion section.

      (2) Evolutionary history of resistance alleles:

      - It would be beneficial to examine the population structure of the sampled populations, especially regarding the role of migration. Though resistance evolution appears to have had minimal impact on genome-wide diversity (as shown in Supplementary Figure 2), could admixture be influencing the results? An explicit multivariate regression framework could help to understand factors influencing diversity across populations, as right now much is left to the readers' visual acuity.

      The genetic structure of the populations was examined by Treemix analysis. We detected only one migration event, from JXNC to SHPD (no resistance data are available for these two populations), suggesting a limited role for migration in resistance evolution. The multiple regression analysis revealed that overall genetic diversity and Tajima’s D across the genome were not significantly associated with resistance levels, genetic structure or geographic coordinates (P > 0.05), all of which supports a limited role of migration in resistance development.

      - It is unclear why lab populations were included in the migration/treemix analysis. We might suggest redoing the analysis without including the laboratory populations to reveal biologically plausible patterns of resistance evolution.

      Thank you for the constructive suggestion. The Treemix analysis was redone by removing laboratory populations and is now reported.

      - Can the authors explore isolation by distance (IBD) in the frequency of resistance alleles?

      Thank you for the constructive suggestion. No significant isolation-by-distance pattern was detected for resistance allele frequencies across all surveyed years (2020: P=0.73; 2021: P=0.52; 2023: P=0.16; Mantel test). We added these results to the text.
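
      For readers unfamiliar with the Mantel test used here, the following is a minimal permutation-based sketch on simulated data. It assumes two symmetric distance matrices (geographic distance and pairwise difference in resistance allele frequency) and a one-sided test for positive correlation. The function name, the toy coordinates, and the toy frequencies are all hypothetical and are not the authors' data or code.

```python
import numpy as np


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test for positive correlation between two distance matrices.

    Rows and columns of the first matrix are permuted together, and the
    Pearson correlation of the upper triangles is recomputed each time.
    """
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    upper = np.triu_indices_from(a, k=1)
    observed = np.corrcoef(a[upper], b[upper])[0, 1]

    rng = np.random.default_rng(seed)
    at_least_as_large = 0
    for _ in range(permutations):
        order = rng.permutation(a.shape[0])
        permuted = a[np.ix_(order, order)]
        if np.corrcoef(permuted[upper], b[upper])[0, 1] >= observed:
            at_least_as_large += 1

    # Add one to numerator and denominator so the observed value counts as a permutation.
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return observed, p_value


# Simulated example: 8 populations with random locations and allele frequencies.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(8, 2))
geo_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
freq = rng.uniform(0, 1, size=8)
freq_dist = np.abs(freq[:, None] - freq[None, :])

r, p = mantel_test(geo_dist, freq_dist)
print(f"Mantel r = {r:.3f}, P = {p:.3f}")
```

      A large P value from such a test, as reported for 2020, 2021 and 2023, means that populations that are geographically closer are not detectably more similar in resistance allele frequency.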

      - Given the claim regarding the novelty of the number of pesticide resistance mutations, it is important to acknowledge the evolution of resistance to all pesticides (antibiotics, herbicides, etc.). ALS-inhibiting herbicides have driven remarkable repeatability across species based on numerous SNPs within the target gene.

      We appreciate this comment, which highlights the need to place our findings within the broader evolutionary context of pesticide resistance. We have investigated references relevant to the evolution of resistance to diverse pesticides. As far as we can tell, the 15 target mutations in eight amino acid residues are among the highest number of pesticide resistance mutations detected, especially within the context of animal studies. We have added relevant text to the second paragraph of the discussion.

      - Figure 5 A-B. Why not run a multivariate regression with status at each resistance mutation encoded as a separate predictor? It is interesting that focusing on the predominant mutation gives the strongest r2, but it is somewhat unintuitive and masks some interesting variation among populations.

      We conducted a multiple regression analysis to explore the influence of multiple mutations on resistance levels of field populations. However, of the 15 putative resistance mutations, only five were detected in more than three populations for which bioassay data are available, i.e. I260T, I260V, D116G, R119C and R119L. The frequencies of three of these mutations, I260T (P = 0.00128), I260V (P = 0.00423) and D116G (P = 0.00058), are significantly correlated with the resistance level of field populations. This has been added.

      (3) Haplotype Reconstruction (Line 271-):

      - We are a bit sceptical of the methods taken to reconstruct these haplotypes. It seems as though the authors did so with Sanger sequencing (this should be mentioned in the text), focusing only on homozygous SNPs. How many such SNPs were used to reconstruct haplotypes, along what length of sequence? For how many individuals were haplotypes reconstructed? Nonetheless, I appreciated that the authors looked into the extent to which the reconstructed haplotypes could be driven by recombination. Can the authors elaborate on the calculations in line 296? Is that the census population size estimate or effective?

      Because haplotypes could not be determined when more than two loci were heterozygous, we detected haplotypes from sequencing data with at most one heterozygous locus. In total, 844 and 696 individuals were used to detect haplotypes of sdhB and sdhD, respectively. We detected 11 haplotypes (with 8 SNPs) and 24 haplotypes (with 11 SNPs) along 216 bp of the sdhB and 155 bp of the sdhD genes, respectively. Please see the fifth paragraph of subsection 2.4. We used ρ = 4 × Ne × d (genetic distance) (Li and Stephens, 2003) to calculate the number of effective individuals for one recombination event.

      (4) Single Mutations and Their Effect (line 312-):

      - It's not entirely clear how the breeding scheme resulted in near-isogenic lines. Could the authors provide a clearer explanation of the process and its biological implications?

      To investigate the effect of single mutations or their combination on resistance levels, we isolated females and males with the same homozygous/hemizygous genotypes to create homozygous lines. Females from these lines were not near-isogenic, but were homozygous for the critical mutations. We revised the description in the methods section to clearly define these lines.

      - If they are indeed isogenic, it's interesting that individual resistance mutations have effects on resistance that vary considerably among lines. Could the authors run a multivariate analysis including all potential resistance SNPs to account for interactions between them? Given the variable effects of the D116G substitution (ranging from 4-25%), could polygenic or epistatic factors be influencing the evolution of resistance?

      We couldn’t conduct multivariate analysis because most lines have only one resistant SNP. The four lines homozygous for 116G were from the same population. The variable mortality may reflect other unknown mechanisms but these are beyond the scope of this study.

      - Why are there some populations that segregate for resistance mutations but have no survival to pesticides (i.e., the green points in Figure 5)? Some discussion of this heterogeneity seems required in the absence of validation of the effects of these particular mutations. Could it be dominance playing a role, or do the authors have some other explanation?

      We didn’t investigate the degree of dominance of each mutation. The mutation I260V shows incompletely dominant inheritance (Sun, et al. 2022). To investigate survival rate of different populations, the two-spotted spider mite T. urticae was exposed to 1000 mg/L of cyetpyrafen, higher than the recommended field dose of 100 mg/L. Such a high concentration may lead to death of an individual heterozygous for certain mutations, such as I260V.

      - The authors mention that all resistance mutations co-localized to the Q-site. Is this where the pesticide binds? This seems like an important point to follow their argument for these being resistance-related.

      Yes. We revised Fig. 3c to show the Q-site.

      (5) Statistical Considerations for Allele Frequency Changes (Figure 3):

      - It might be helpful to use a logistic regression model to assess the rate of allele frequency changes and determine the strength of selection acting on these alleles (e.g., Kreiner et al. 2022; Patel et al. 2024). This approach could refine the interpretation of selection dynamics over time.

      Thank you for this suggestion. A logistic regression model was used to track allele frequency trajectories. The selection coefficient of each allele and their joint effects were estimated.
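
      A minimal sketch of the idea behind such a model, assuming a simple haploid-style selection model in which the odds of the resistance allele grow by a factor (1 + s) each generation, so that logit(frequency) is linear in time. This is not the authors' model (which may, for example, be a binomial GLM on allele counts); the function names and the simulated trajectory are hypothetical.

```python
import math


def logit(p):
    """Log-odds of an allele frequency p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def fit_selection_coefficient(generations, freqs):
    """Least-squares slope of logit(frequency) on generation, as s = exp(slope) - 1."""
    ys = [logit(p) for p in freqs]
    n = len(generations)
    mean_t = sum(generations) / n
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(generations, ys))
    sxx = sum((t - mean_t) ** 2 for t in generations)
    slope = sxy / sxx
    return math.exp(slope) - 1.0


def simulate(p0, s, generations):
    """Deterministic trajectory: odds multiply by (1 + s) every generation."""
    freqs = []
    for t in generations:
        odds = p0 / (1.0 - p0) * (1.0 + s) ** t
        freqs.append(odds / (1.0 + odds))
    return freqs


# Hypothetical sampling scheme: one sample per year at 20 generations per year.
gens = [0, 20, 40, 60, 80, 100, 120, 140]
freqs = simulate(p0=0.001, s=0.1, generations=gens)
s_hat = fit_selection_coefficient(gens, freqs)
print(f"estimated s = {s_hat:.4f}")
```

      With real data, sampling noise and frequencies of exactly 0 or 1 would need handling (for instance by fitting counts with a binomial likelihood rather than transforming frequencies).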

      Reviewer #2 (Public review):

      Summary:

      This paper investigates the evolution of pesticide resistance in the two-spotted spider mite following the introduction of an SDHI acaricide, cyetpyrafen, in China. The authors make use of cyetpyrafen-naive populations collected before that pesticide was first used, as well as more recent populations (both sensitive and resistant) to conduct comparative population genomics. They report 15 different mutations in the insecticide target site from resistant populations, many reported here for the first time, and look at the mutation and selection processes underlying the evolution of resistance, through GWAS, haplotype mapping, and testing for loss of diversity indicating selective sweeps. None of the target site mutations found in resistant populations was found in pre-exposure populations, suggesting that the mutations may have arisen de novo rather than being present as standing variation, unless initially present at very low frequencies; a de novo origin is also supported by evidence of selective sweeps in some resistant populations. Furthermore, there is no significant evidence of migration of resistant genotypes between the sampled field populations, indicating multiple origins of common mutations. Overall, this indicates a very high mutation rate and a wide range of mutational pathways to resistance for this target site in this pest species. The series of population genomic analyses carried out here, in addition to the evolutionary processes that appear to underlie resistance development in this case, could have implications for the study of resistance evolution more widely.

      Strengths:

      This paper combines phenotypic characterisation with extensive comparative population genomics, made possible by the availability of multiple population samples (each with hundreds of individuals) collected before as well as after the introduction of the pesticide cyetpyrafen, as well as lab-evolved lines. This results in findings of mutation and selection processes that can be related back to the pesticide resistance trait of concern. Large numbers of mites were tested phenotypically to show the levels of resistance present, and the authors also made near-isogenic lines to confirm the phenotypic effects of key mutations. The population genomic analyses consider a range of alternative hypotheses, including mutations arising by de novo mutation or selection from standing genetic variation, and mutations in different populations arising independently or arriving by migration. The claim that mutations most likely arose by multiple repeated de novo mutations is therefore supported by multiple lines of evidence: the direct evidence of none of the mutations being found in over 2000 individuals from naive populations, and the indirect evidence from population genomics showing evidence of selective sweeps but not of significant migration between the sampled populations.

      Weaknesses:

      As acknowledged within the discussion, whilst evidence supports a de novo origin of the resistance-associated mutations, this cannot be proven definitively as mutations may have been present at a very low frequency and therefore not found within the tested pesticide-naive population samples.

      We agree that we could not definitively exclude the presence of a very low incidence of favoured mutations before the introduction of this novel acaricide.

      Near-isofemale lines were made to confirm the resistance levels associated with five of the 15 mutations, but otherwise, the genotype-phenotype associations are correlative, as confirmation by functional genetics was beyond the scope of this study.

      We hope that future functional studies will validate the effects of these mutations on resistance in both the two-spotted spider mite T. urticae and other spider mite species. This could be done by creating near-isogenic female lines or using CRISPR-Cas9 technology, as gene knockouts have recently been established for T. urticae.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Could the authors elaborate on the environmental context (e.g., climate, geography) of the sampled populations to give more nuance to the analysis of genetic differentiation and resistance evolution?

      We have explored the influence of geographic isolation on the frequency of resistance alleles by Mantel tests (isolation by distance). We didn’t investigate the influence of climate, because most of the samples were from greenhouses, where the climate to which the pest is exposed is unclear.
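
      For readers unfamiliar with the Mantel test mentioned above, a minimal permutation-based sketch follows. It is not the authors' analysis: the population coordinates and genetic distances are hypothetical toy values, and the one-sided test for a positive association is an assumed choice.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value) for two distance matrices."""
    rng = random.Random(seed)
    n = len(dist_a)
    flat_a = upper_triangle(dist_a)
    observed = pearson(flat_a, upper_triangle(dist_b))
    hits = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of dist_b together (relabel populations).
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(flat_a, upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical toy data: five populations on a line, with genetic distance
# exactly proportional to geographic distance (perfect isolation by distance).
coords = [0.0, 1.0, 3.0, 6.0, 10.0]
geo = [[abs(a - b) for b in coords] for a in coords]
gen = [[0.02 * d for d in row] for row in geo]

r_obs, p_value = mantel(geo, gen)
print(f"Mantel r = {r_obs:.3f}, p = {p_value:.3f}")
```

      A non-significant result on real data, as reported for the resistance alleles, would indicate that allele frequency differences are not explained by geographic distance alone.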

      (2) Line 161: is this supposed to be one R and one S?

      Yes, we added this information (LabR and LabS).

      (3) Line 207: variation is not saturated at the first two sites because the different combinations are not seen. This is a bit misleading.

      What we wanted to indicate was that the two codon positions are saturated, rather than their combinations. We revised this sentence by adding “of each codon position”.

      (4) Line 376: continuous selection did not "result in a new mutation arising". Rather, the mutation arose and was subsequently selected on.

      We revised the expression of this de novo mutation and selection process.

      (5) Line 402: can the authors explore what Ne would be necessary to drive the number of mutational origins they observe, as in (Karasov et al. 2010)?

      It is challenging to estimate Ne, especially when mutation rate data for the two-spotted spider mite T. urticae are unavailable. We observed 2.7 resistant mutations per population in samples collected in 2024, seven years after the release of cyetpyrafen. The estimated mutation rate (Θ) is 0.0193, given 20 generations per year for T. urticae. An effective population size (Ne) of 2.29 × 10⁶ would be necessary to reach the number of de novo mutations observed in this study, given Θ = 3Neμ (haplodiploid sex determination of T. urticae) and a mutation rate of μ = 2.8 × 10⁻⁹ per base pair per generation as estimated for Drosophila melanogaster (Keightley et al., 2014). The high reproductive capacity of T. urticae (> 100 eggs per female) and short generation time make it easier to reach such a population size in the field, as we now note.
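
      The arithmetic in this response can be retraced in a few lines. One reading that reproduces the stated Θ of 0.0193 is 2.7 mutations per population divided by the 140 generations elapsed (7 years × 20 generations per year); that derivation is an inference from the numbers given, not something the response states explicitly. Solving Θ = 3Neμ for Ne then gives a value close to the one quoted.

```python
# Values quoted in the response.
MUTATIONS_PER_POPULATION = 2.7   # resistance mutations observed per population
YEARS = 7                        # years since cyetpyrafen release
GENERATIONS_PER_YEAR = 20        # assumed for T. urticae in the response
MU = 2.8e-9                      # per-bp, per-generation rate from D. melanogaster

# Inferred step: mutations per population per generation.
theta = MUTATIONS_PER_POPULATION / (YEARS * GENERATIONS_PER_YEAR)

# Theta = 3 * Ne * mu for a haplodiploid species, solved for Ne.
ne = theta / (3 * MU)

print(f"theta = {theta:.4f}")
print(f"Ne    = {ne:.3e}")
```

      The small difference between this result and the quoted figure comes from rounding Θ before or after the division.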

      (6) Line 482: how did the authors precisely kill 60% of samples with their selection? What was the applied rate? In general, listing the rates of insecticide used in dose response would be useful to decipher if LD50s are projected outside of the doses used (seems like they are). In this case, authors should limit their estimates to those > the highest rate used in the dose response.

      It is difficult to control mortality precisely. We applied cyetpyrafen every two generations but did not determine the LC50 every two generations. When mortality was lower than 60%, another round of spraying was applied with an increased dosage of the pesticide. The LC50 values were determined at the F1, F32, F54, F60, F62, and F66 generations to establish the trajectory of resistance.

      (7) The light pink genomic region in Figure 2 was distracting. Why is it included if there is no discussion of genomic regions outside the sdh genes? Generally, there was a lot going on in this figure, and some guiding categories (i.e., lab selected vs wild population) on the figure itself could help orient the reader.

      We included chromosome 2, colored in light pink/red, to show the selection signal across a wider genomic region. In the figure legend, we added a description of the lab-selected, field-resistant and field-susceptible populations. Very little common selection signal was detected among resistant populations on chromosome 2, indicating that this region is less likely to be involved in the evolution of resistance to cyetpyrafen in T. urticae. We also described the result briefly in the figure legend.

      Reviewer #2 (Recommendations for the authors):

      (1) The most significant aspect of this study is the use of multiple pest population samples taken before as well as after the introduction of a class of pesticides, allowing a thorough comparative population genomics study in a species where a range of resistance mutations have appeared within a few years. I would prefer to see a title conveying this significance, rather than the current study, which focuses on the total number of mutations and claimed notoriety of the (at that point unnamed) study species. Similarly, I would prefer an abstract that relies less on superlative claims and includes more details: the scientific name of the study species; the number of years in which resistance evolved; the number of historical specimens; how the resistance levels for single mutations were shown.

      (1) The title was changed by adding “the two-spotted spider mite Tetranychus urticae” and removing the “unprecedented number” to emphasize that “recurrent mutations drive rapid evolution”, i.e., “Recurrent Mutations Drive the Rapid Evolution of Pesticide Resistance in the Two-spotted Spider Mite Tetranychus urticae.”

      (2) The scientific name of the study species was added.

      (3) The number of years in which resistance evolved was added.

      (4) The number of historical specimens was added (2666).

      (5) Because we used homozygous lines rather than isogenic or gene-edited lines, our bioassay data could not provide direct evidence on the level of resistance conferred by each mutation. We revised our description of the results and removed this content from the abstract.

      Line 29: if you want to claim the number is unprecedented, please specify the context: unprecedented for a pesticide target in an arthropod pest? (more resistance mutations may have been found in bacteria/fungi...).

      We revised the sentence by adding “in an arthropod pest”.

      Line 30: rather than a claim of notoriety, it may be better to specify what damage this pest causes.

      Revised by describing it as an arthropod pest.

      Line 34: please clarify, was this all in different haplotypes, or were some mutations found in combination?

      Done: We identified 15 target mutations, including six mutations on five amino acid residues of subunit sdhB, and nine mutations on three amino acid residues of subunit sdhD, with as many as five substitutions on one residue.

      (2) The introduction begins by framing the context as resistance evolution in invertebrate pests. However, the evolutionary processes examined in the study are applicable to resistance in other systems, and potentially to other cases of rapid contemporary evolution. The authors could show wider significance for their work beyond the subfield of invertebrate pests by including more of this wider context in their introduction and discussion: even if this means they can no longer claim novelty based on the number of mutations alone, the study is a strong example of the use of population genomics combined with functional and phenotypic characterisation to investigate the evolutionary processes underlying the emergence of resistance, so could have wider importance than within its current framing.

      The background was revised as mentioned above to take this into account.

      For example, in lines 48-50, please clarify what is meant by pesticides here (insects/arthropods? weeds and pathogens too?) In lines 69-73, the opposite is sometimes seen in fungal pathogens, with large numbers of mutations generated in lab-evolved strains.

      We extended pesticides to those targeting arthropods, weeds and pathogens. We still emphasize the situation mainly with respect to arthropod pests.

      (3) Lines 91-93: how many modes of action? How recently were SDHI acaricides introduced?

      Added: at least 11 groups of acaricides based on their modes of action. SDHI was launched in 2007.

      (4) Line 98-102: Use in China is a useful background for the study populations, but the global context should be included too.

      Yes, four SDHI acaricides developed around the globe were introduced.

      (5) Line 113: They show diverse mutations, but all within the mechanism of target-site point mutations.

      We agree with your suggestion. This sentence has been removed as it repeats information stated above it.

      (6) Line 115-116: Yes, agreed; I think this is the main strength of the current study and should be emphasised sooner.

      Thanks.

      (7) Line 158: Selective sweep signals were clear in half of the resistant populations but not in the others. The suggestion that the others had undergone soft sweeps, with multiple mutations increasing in frequency simultaneously but no one reaching fixation, seems reasonable; but the authors could compare the populations that did show a sweep with those that did not (for example, was there greater diversity or evenness of genotypes in those that did not?).

      Five resistant populations with selection signals identified by PBE analysis (Figure 2b) showed corresponding decreases in π and Tajima’s D near the two SDH genes but not across the genome (Figure S1).
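
      For context, the two summary statistics named in this response can be computed from aligned sequences as sketched below. This is a generic illustration using the standard Tajima (1989) formulas, not the authors' pipeline; the four toy sequences are hypothetical, and π is reported per sequence (mean pairwise differences) rather than per site.

```python
import math
from itertools import combinations


def pairwise_diversity(seqs):
    """Mean number of pairwise differences between aligned sequences (pi)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns with more than one allele (S)."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs):
    """Tajima's D; returns NaN when there are no segregating sites."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    return (pairwise_diversity(seqs) - s / a1) / math.sqrt(variance)


# Hypothetical toy alignment: three singleton variants on a common background,
# the excess of rare alleles expected shortly after a selective sweep.
toy = ["AAAA", "TAAA", "ATAA", "AATA"]
pi = pairwise_diversity(toy)
s_sites = segregating_sites(toy)
d_stat = tajimas_d(toy)
print(f"pi = {pi}, S = {s_sites}, Tajima's D = {d_stat:.3f}")
```

      A local drop in π together with negative Tajima's D near the sdh genes, but not genome-wide, is the pattern the response describes for the five populations with sweep signals.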

      (8) Line 313: please clarify "in combination with other mutations" within a mixed population or combined in one individual/haplotype? Also, the phrase "characterised the function" may be a little misleading, as this is a correlative analysis, not functional confirmation.

      None of the combinations of different resistant mutations was observed in a single haplotype. Here, we examine resistance levels associated with a single mutation or two mutations on sdhB and sdhD in one individual, i.e. sdhB_I260V and sdhD_R119C. We revised the sentences to avoid any implication of functional confirmation.

      (9) Line 358: again, please clarify the context: among arthropod pests?

      Done.

      (10) Line 360-363: please give some background on when and where these related compounds were introduced.

      Added.

      (11) Line 410: yes fitness costs may be a factor, but you could also give an example of a cost expressed in the absence of any pesticides, as well as the given example of negative cross-resistance.

      We added the example of the H258Y mutation which causes both fitness costs and negative cross-resistance.

      (12) Lines 419-438: this is one aspect where the situation for insecticides is in contrast with some other resistance areas.

      Yes, we restricted these statements to arthropod pests.

      (13) Line 466: some more detail could be given here: for example, SNP-specific monitoring would be less effective, but amplicon sequencing would be more suitable.

      Yes, revised.

      (14) Lines 472-475: Please list the numbers of field/lab, pre/post exposure, and sensitive/resistant populations within the main text.

      Done. The number of sensitive/resistant populations was reported in the result section.

      (15) Line 483: randomly selected individuals?

      Yes, added randomly selected individuals.

      (16) Line 556: Sanger sequencing to characterise populations? Or a number of individuals from each population?

      Revised.

      (17) References: there are some duplicate entries, please check this.

      Checked.

      (18) Figure 1e: consider a log(10) scale to better show large fold changes and avoid multiple axis breaks.

      Thanks for your suggestion. However, we didn’t rescale the LC50 values, because we wanted to show the specific impact of 1,000 mg/L. The breaks in the Y axis between 30 mg/L and 1,000 mg/L reveal that the LC50s of the resistant populations were all greater than 1,000 mg/L, while those of the susceptible populations were all below 30 mg/L. This justified the use of 1,000 mg/L as a discriminating dose to investigate resistance status and level in subsequent work.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Azlan et al. identified a novel maternal factor called Sakura that is required for proper oogenesis in Drosophila. They showed that Sakura is specifically expressed in the female germline cells. Consistent with its expression pattern, Sakura functioned autonomously in germline cells to ensure proper oogenesis. In sakura KO flies, germline cells were lost during early oogenesis and often became tumorous before degenerating by apoptosis. In these tumorous germ cells, piRNA production was defective and many transposons were derepressed. Interestingly, Smad signaling, a critical signaling pathway for the GSC maintenance, was abolished in sakura KO germline stem cells, resulting in ectopic expression of Bam in whole germline cells in the tumorous germline. A recent study reported that Bam acts together with the deubiquitinase Otu to stabilize Cyc A. In the absence of sakura, Cyc A was upregulated in tumorous germline cells in the germarium. Furthermore, the authors showed that Sakura co-immunoprecipitated Otu in ovarian extracts. A series of in vitro assays suggested that the Otu (1-339 aa) and Sakura (1-49 aa) are sufficient for their direct interaction. Finally, the authors demonstrated that the loss of otu phenocopies the loss of sakura, supporting their idea that Sakura plays a role in germ cell maintenance and differentiation through interaction with Otu during oogenesis.

      Strengths:

      To my knowledge, this is the first characterization of the role of the CG14545 gene. Each experiment seems to be well-designed and adequately controlled.

      Weaknesses:

      However, the conclusions from each experiment are somewhat separate, and the functional relationships between Sakura's functions are not well established. In other words, although the loss of Sakura in the germline causes pleiotropic effects, the cause-and-effect relationships between the individual defects remain unclear.

      Comments on latest version:

      The authors have attempted to address my initial concerns with additional experiments and refutations. Unfortunately, my concerns, especially my specific comments 1-3, remain unaddressed. The present manuscript is descriptive and fails to describe the molecular mechanism by which Sakura exerts its function in the germline. Nevertheless, this reviewer acknowledges that the observed defects in sakura mutant ovaries and the possible physiological significance of the Sakura-Otu interaction are worth sharing with the research community, as they may lay the groundwork for future research in functional analysis.

      We thank the reviewer for the valuable comments. We would like to investigate the molecular mechanism by which Sakura exerts its function in the germline in future studies.

      Reviewer #2 (Public review):

      In this study, the authors identified CG14545 (named it sakura), as a key gene essential for Drosophila oogenesis. Genetic analyses revealed that Sakura is vital for both oogenesis progression and ultimate female fertility, playing a central role in the renewal and differentiation of germ stem cells (GSC).

      The absence of Sakura disrupts the Dpp/BMP signaling pathway, resulting in abnormal bam gene expression, which impairs GSC differentiation and leads to GSC loss. Additionally, Sakura is critical for maintaining normal levels of piRNAs. Also, the authors convincingly demonstrate that Sakura physically interacts with Otu, identifying the specific domains necessary for this interaction, suggesting a cooperative role in germline regulation. Importantly, the loss of otu produces similar defects to those observed in sakura mutants, highlighting their functional collaboration.

      The authors provide compelling evidence that Sakura is a critical regulator of germ cell fate, maintenance, and differentiation in Drosophila. This regulatory role is mediated through modulation of pMad and Bam expression. However, the phenotypes observed in the germarium appear to stem from reduced pMad levels, which subsequently trigger premature and ectopic expression of Bam. This aberrant Bam expression could lead to increased CycA levels and altered transcriptional regulation, impacting piRNA expression. In this revised manuscript, the authors further investigated whether Sakura affects the function of Orb, a binding partner they identified, in deubiquitinase activity when Orb interacts with Bam.

      We appreciate the authors' efforts to address all our comments. While these revisions have greatly improved the clarity of certain sections, some of the concerns remain unclear, while details mentioned in the responses about these studies should be incorporated in the manuscript. Specifically, the manuscript still lacks the demonstration that Sakura co-localizes with Orb/Bam despite having the means for staining and visualization. This would bring insight into the selective binding of Orb with Bam vs. Sakura perhaps at different stages of oogenesis. Such analyses would allow for more specific conclusions, further alluding to the underlying mechanism, rather than the general observations currently presented.

      This elaborate study will be embraced by both germline-focused scientists and the developmental biology community.

      We thank the reviewer for the valuable comments. We believe that the reviewer meant Otu, not Orb, as the binding partner of Sakura that we identified. We would like to investigate the colocalization of Sakura with other proteins, including Otu, and the molecular mechanism by which Sakura exerts its function in the germline in future studies.

      Reviewer #3 (Public review):

      In this very thorough study, the authors characterize the function of a novel Drosophila gene, which they name Sakura. They start with the observation that sakura expression is predicted to be highly enriched in the ovary and they generate an anti-sakura antibody, a line with a GFP-tagged sakura transgene, and a sakura null allele to investigate sakura localization and function directly. They confirm the prediction that it is primarily expressed in the ovary and, specifically, that it is expressed in germ cells, and find that about 2/3 of the mutants lack germ cells completely and the remaining have tumorous ovaries. Further investigation reveals that Sakura is required for piRNA-mediated repression of transposons in germ cells. They also find evidence that sakura is important for germ cell specification during development and germline stem cell maintenance during adulthood. However, despite the role of sakura in maintaining germline stem cells, they find that sakura mutant germ cells also fail to differentiate properly such that mutant germline stem cell clones have an increased number of "GSC-like" cells. They attribute this phenotype to a failure in the repression of Bam by dpp signaling. Lastly, they demonstrate that sakura physically interacts with otu and that sakura and otu mutants have similar germ cell phenotypes. Overall, this study helps to advance the field by providing a characterization of a novel gene that is required for oogenesis. The data are generally high-quality and the new lines and reagents they generated will be useful for the field.

      Comments on latest version:

      With these revisions, the authors have addressed my main concerns.

      We thank the reviewer for valuable comments.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The manuscript is much improved based on the changes made upon recommendations from the reviewers.

      Though most of our comments have been addressed, we have a few more we wish to recommend. For previous points we made, we replied with further clarification for the authors.

      Figure 1

      (1) B should be the supplemental figure.

      We moved the former Fig 1B to Supplemental Figure 1.

      • Previous Fig1B (sakura mRNA expression level) is now Fig S2, not S1. Please make this data as Fig S1.

      We moved Fig S1 to main Fig7A and renumbered Fig S2-S16 to Fig S1-S15.

      (2) C - How were the different egg chamber stages selected in the WB? Naming them 'oocytes' is deceiving. Recommend labeling them as 'egg chambers', since an oocyte is claimed to be just the one-cell of that cyst.

      We changed the labeling to egg chambers.

      • The labels on lanes for Stages 12-13 and Stage 14, still only say "chambers", not "egg chambers". Also there is no Stage 1-3 egg chamber. More accurately, the label should be "Germarium - Stage 11 egg chambers".

      We updated the labels on the lanes as suggested by the reviewer.

      (3) Is the antibody not detecting Sakura in IF? There is no mention of this anywhere in the manuscript.

      While our Sakura antibody detects Sakura in IF, it seems to detect some other proteins as well. Since we have a Sakura-EGFP fly strain (which fully rescues sakura null phenotypes) to examine Sakura expression and localization without such non-specific signal issues, we relied on Sakura-EGFP rather than anti-Sakura antibodies for IF.

      • Please put this info into the Methods section.

      We added this info into the Methods section.

      (4) Expand on the reliance of the sakura-EGFP fly line. Does this overexpression cause any phenotypes?

      sakura-EGFP does not cause any phenotypes in the background of sakura[+/+] and sakura[+/-].

      • Please add this detail into the manuscript.

      We added this info into the Methods section.

      Figure 5

      (1) D - It might make more sense if this graph showed % instead of the numbers.

      We did not understand the reviewer's point. We think using numbers, not %, makes more sense.

      • Having a different 'n' number for each experiment does not allow one to compare anything except numbers of the egg chambers. This must be normalized.

      We still don’t agree with the reviewer. In Fig 5D, we are showing the numbers of stage 14 oocytes per fly (= per pair of ovaries). ‘n’ is the number of flies (= number of pairs of ovaries) examined. We have now clarified this in the figure legend. Different ‘n’ values do not prevent us from comparing the numbers of stage 14 oocytes per fly. Therefore, we would like to keep the figure as it is.

      (2) Line 213 - explain why RNAi 2 was chosen when RNAi 1 looks stronger.

      The fly stock of RNAi line 2 is much healthier than that of RNAi line 1 (when not driven by Gal4) for unknown reasons. We were concerned that RNAi line 1 might carry an unwanted genetic background. We chose to use RNAi line 2 to avoid such an issue.

      • Please add this information to the manuscript.

      We added this info into the Methods section.

      Figure 7/8 - can go to Supplemental.

      We moved Fig 8 to the supplement. However, we think the Fig 7 data are important and therefore we would like to present them as a main figure.

      • Current Fig S1 should go to Fig 7, to better understand the relationship between pMad and Bam expression.

      We moved Fig S1 to main Fig7A and renumbered Fig S2-S16 to Fig S1-S15.

      Figure 9C - Why the switch to S2 cells? Not able to use the Otu antibody in the IP of ovaries?

      We can use the Otu antibody in the IP of ovaries. However, in the anti-Sakura Western blot after anti-Otu IP, the antibody light chain bands of the Otu antibodies overlap with the Sakura band. Therefore, we switched to S2 cells and used an epitope tag to avoid this issue.

      • Please add this info to the Methods section.

      We added this info into the Methods section.

      Figure 10- Some images would be nice here to show that the truncations no longer colocalize.

      We did not understand the reviewer's point. In our study, even for the full-length proteins, we have not shown any colocalization of Sakura and Otu in S2 cells or in ovaries, except that they are both enriched in developing oocytes in egg chambers.

      • Based on your binding studies, we would expect them to colocalize in the egg chamber, and since there are antibodies and a GFP-line available, it would be important to demonstrate that via visualization.

      As we wrote in the response and now in the manuscript, our antibodies are not well suited for immunostaining. We will try to optimize the experimental conditions in future studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      There are four main areas that need further clarification:

      (1) Further and more complete assessment of senescence and the fibroblasts must be done to support the claims. 

      We sincerely appreciate the Reviewing Editor's valuable suggestion regarding the addition of cellular senescence detection markers. In the revised manuscript, we have incorporated additional detection markers for cellular senescence, such as H3K9me3 and SA-β-gal staining, in healthy and periodontitis gingival samples to further validate our findings (Figure 1A, B in the revised manuscript).

      (2) Confusion between ageing and senescence throughout the manuscript.

      We fully understand the concerns raised by the Reviewing Editor and reviewers regarding the confusion between the concepts of ageing and senescence in the manuscript. Cellular senescence is a manifestation of ageing at the cellular level. In the revised manuscript, we have given priority to the term ‘senescence’ to describe the cell condition instead of ‘aging’.

      (3) The lipid metabolism mechanistic claims are very speculative and largely unsupported by experimental data. 

      We greatly appreciate the Reviewing Editor and reviewers for pointing out the incorrect statements regarding the role of lipid metabolism in regulating cellular senescence. Since the mechanism by which cellular metabolism regulates cellular senescence is not the core focus of this manuscript, we have moved the results of the metabolic analysis from the sc-RNA sequencing data to the figure supplement (Figure 4-figure supplement 1) and revised the related statements in the revised manuscript (Page 7-8, Line 186-194).

      (4) Concerns about the use of Metformin as a senotherapy vs other pleiotropic effects in periodontitis and the suggestion of using an alternative Senolytic drug (Bcl2 inhibitors, etc.). 

      We fully understand the concerns of the Reviewing Editor and reviewers regarding metformin as an anti-aging therapy. In the revised manuscript, we have included additional experiments using another senolytic drug, ABT-263 (a Bcl2 inhibitor), in the ligature-induced periodontitis mouse model. The corresponding results can be found in Figure 6 and on Page 9-10, Line 248-264 of the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      While most of the experiments are elegantly designed and the procedures well conducted there are several critical weaknesses that temper my enthusiasm for this solid and timely work. Considering my main points, I would recommend the following:

      (1) Potentiate the senescent assessment in vitro and, most importantly, in vivo. E.g. SABgal with fresh tissue, other senescent biomarkers like SAHFs (HP1g or H3K9me3), etc.

      We sincerely appreciate the reviewers' suggestion to strengthen the assessment of cellular senescence. In the revised manuscript, we performed SA-β-gal staining on fresh frozen samples, revealing a significantly higher number of SA-β-gal-positive cells in the gingival tissue of periodontitis patients, particularly in the lamina propria, while few SA-β-gal-positive cells were observed in healthy gingival tissue (Figure 1A). Additionally, we assessed the protein-level changes of H3K9me3, a marker of senescence-associated heterochromatin foci (SAHF), in gingival tissues from healthy individuals and periodontitis patients. The results showed a notable increase in the number of H3K9me3-positive cells in periodontitis tissues, approximately double that found in healthy gingiva (Figure 1B). This trend aligns with our previous findings of elevated p16 and p21 levels. Collectively, these results further confirm that periodontitis gingival tissues contain a greater number of senescent cells than healthy gingiva.

      (2) Claims on disturbances in lipid metabolism as a driver of CD81+ fibroblast senescence require appropriate functional/mechanistic validations and experiments of metabolism rewiring.

      We sincerely appreciate the reviewers' suggestion for more experimental evidence regarding the role of lipid metabolism in driving CD81+ fibroblast senescence. The influence of lipid metabolism on cellular senescence, and the mechanisms involved, is a complex and important scientific issue, but it is not the central focus of this manuscript. Therefore, to avoid causing confusion for the reviewers and readers, we have moved the metabolism analysis to Figure 4-figure supplement 1 and revised the presentation of the relevant results in the revised manuscript to ensure a more rigorous interpretation of our findings (Page 7-8, Line 186-194).

      (3) Do LPS-stimulated HGFS implementing the senescent programme secrete C3? Detection of complement C3 at the protein level (e.g. by ELISA) would reinforce the proposed mechanism.

      This is indeed a very interesting question. In response to the reviewers' suggestion, we measured the levels of C3 protein, one of the markers of the senescence-associated secretory phenotype (SASP), secreted by human gingival fibroblasts induced by Pg-LPS. The results indicated that, compared to untreated fibroblasts, those induced by Pg-LPS exhibited significantly higher levels of C3 secretion, approximately 1.5 times that of the control group (Figure 5G). Additionally, we found that primary gingival fibroblasts derived from periodontitis tissues secreted more complement C3 than those derived from healthy tissues (Figure 5F). These findings suggest that the increased secretion of complement C3 by gingival fibroblasts in periodontitis tissues may be related to Pg-LPS-induced cellular senescence.

      (4) The mechanism of Metformin to impair senescence and/or the SASP is not fully validated and Metformin can produce other pleiotropic effects. A key experiment (including therapeutic implications) is using a senolytic drug (e.g. Navitoclax) to causally connect the eradication of senescent CD81+ fibroblasts with the recruitment of neutrophils. If the hypothesis of the authors is correct this approach should result in reduced levels of gingival CD81 and C3 positivity, prevention of neutrophils infiltration (reduced MPO positivity), and ameliorate bone damage in ligation-induced periodontitis murine models.

      We fully understand the reviewers' concerns regarding the role of metformin in alleviating cellular senescence and the possibility of it acting through non-senescent pathways. To clarify the role of cellular senescence in the recruitment of neutrophils by CD81+ fibroblasts through C3 in periodontitis, we treated a ligature-induced periodontitis (LIP) mouse model with ABT-263, also known as Navitoclax. The results showed that after ABT-263 treatment, the number of p16-positive or H3K9me3-positive senescent cells in the periodontitis mice significantly decreased. Additionally, we observed reductions, to varying degrees, in the quantities of CD81+ fibroblasts, C3 protein levels, neutrophil infiltration, and osteoclasts in the LIP model after ABT-263 treatment (Figure 6). These results further support our hypothesis that the eradication of senescent CD81+ fibroblasts could reduce neutrophil infiltration and alveolar bone resorption.

      (5) Have the authors considered using any of the available C3/C3aR inhibitors to validate the involvement of neutrophils and the inflammatory response in periodontitis? A C3/C3aR inhibitor would be an elegant treatment group in parallel with the senolytic approach.

      Thank you very much for the reviewers' suggestion to investigate neutrophil infiltration and inflammatory responses after treating periodontitis with C3/C3aR inhibitors. In a clinical study by Hasturk et al. in 2021 (Reference 1), the C3 inhibitor AMY-101 effectively alleviated gingival inflammation in periodontitis patients. This was reflected in significant decreases in clinical indicators such as the modified gingival index and bleeding on probing, as well as a marked reduction in inflammatory tissue destruction markers, including MMP-8 and MMP-9. In addition, Maekawa et al. (Reference 2) demonstrated that a peptide inhibitor of complement C3 effectively reduced inflammation levels and the extent of bone resorption in periodontitis. Moreover, research by Guglietta et al. (Reference 3) clarified that complement C3 promotes neutrophil recruitment and the formation of neutrophil extracellular traps (NETs) via C3aR, and NETs are considered key pathological factors in sustaining chronic inflammation in periodontitis (References 4 and 5). In summary, existing studies indicate that C3/C3aR inhibitors likely reduce neutrophil recruitment and inflammation in periodontitis.

      Reference

      (1) Hasturk, H., Hajishengallis, G., Forsyth Institute Center for Clinical and Translational Research staff, Lambris, J. D., Mastellos, D. C., & Yancopoulou, D. (2021). Phase IIa clinical trial of complement C3 inhibitor AMY-101 in adults with periodontal inflammation. The Journal of clinical investigation, 131(23), e152973.

      (2) Maekawa, T., Briones, R. A., Resuello, R. R., Tuplano, J. V., Hajishengallis, E., Kajikawa, T., Koutsogiannaki, S., Garcia, C. A., Ricklin, D., Lambris, J. D., & Hajishengallis, G. (2016). Inhibition of pre-existing natural periodontitis in non-human primates by a locally administered peptide inhibitor of complement C3. Journal of clinical periodontology, 43(3), 238–249.

      (3) Guglietta, S., Chiavelli, A., Zagato, E., Krieg, C., Gandini, S., Ravenda, P. S., Bazolli, B., Lu, B., Penna, G., & Rescigno, M. (2016). Coagulation induced by C3aR-dependent NETosis drives protumorigenic neutrophils during small intestinal tumorigenesis. Nature communications, 7, 11037.

      (4) Kim, T. S., Silva, L. M., Theofilou, V. I., Greenwell-Wild, T., Li, L., Williams, D. W., Ikeuchi, T., Brenchley, L., NIDCD/NIDCR Genomics and Computational Biology Core, Bugge, T. H., Diaz, P. I., Kaplan, M. J., Carmona-Rivera, C., & Moutsopoulos, N. M. (2023). Neutrophil extracellular traps and extracellular histones potentiate IL-17 inflammation in periodontitis. The Journal of experimental medicine, 220(9), e20221751.

      (5) Silva, L. M., Doyle, A. D., Greenwell-Wild, T., Dutzan, N., Tran, C. L., Abusleme, L., Juang, L. J., Leung, J., Chun, E. M., Lum, A. G., Agler, C. S., Zuazo, C. E., Sibree, M., Jani, P., Kram, V., Martin, D., Moss, K., Lionakis, M. S., Castellino, F. J., Kastrup, C. J., … Moutsopoulos, N. M. (2021). Fibrin is a critical regulator of neutrophil effector function at the oral mucosal barrier. Science (New York, N.Y.), 374(6575), eabl5450.

      Other comments

      (1) Figure 1. The authors report upregulation of the aging pathway in bulk RNAseq analyses. What about the upregulation of senescence-related pathways and differential expression of SASP-related genes in this experiment?

      Thanks for this interesting question. Through further analysis of the bulk RNA sequencing results of gingival tissues from the LIP mouse model, we found significant alterations in multiple senescence-associated secretory phenotype (SASP) genes and several cellular senescence-related pathways. SASP genes, such as Icam1, Mmp3, Nos3, Igfbp7, Igfbp4, Mmp14, Timp1, Ngf, Il6, Areg, and Vegfa, were markedly upregulated in the periodontitis samples of ligature-induced mice (Figure 1-figure supplement 2A). Moreover, we observed a significant reduction in oxidative phosphorylation levels and the tricarboxylic acid (TCA) cycle in the periodontitis group, suggesting that the occurrence of cellular senescence may be related to mitochondrial dysfunction (Figure 1-figure supplement 2B and C).

      Additionally, we noted the activation of the PI3K-AKT and MAPK pathways in the LIP model (Figure 1-figure supplement 2D and E), both of which can induce cellular senescence by activating the tumor suppressor pathway TP53/CDKN1A, leading to cell cycle arrest (References 1, 2). Furthermore, the NF-κB signaling pathway was also significantly enriched in the LIP model (Figure 1-figure supplement 2F), which is closely associated with the secretion of SASP factors (Reference 3).

      In summary, our bulk RNA sequencing results suggest enrichment of cellular senescence-related pathways in the periodontitis group, including mitochondrial metabolic dysregulation, senescence-related pathways, and alterations in the SASP. Related results were added on Page 5-6 of the revised manuscript.

      Reference

      (1) Tang Q, Markby GR, MacNair AJ, Tang K, Tkacz M, Parys M, Phadwal K, MacRae VE, Corcoran BM. TGF-β-induced PI3K/AKT/mTOR pathway controls myofibroblast differentiation and secretory phenotype of valvular interstitial cells through the modulation of cellular senescence in a naturally occurring in vitro canine model of myxomatous mitral valve disease. Cell Prolif. 2023 Jun;56(6):e13435. doi: 10.1111/cpr.13435.

      (2) Sayegh S, Fantecelle CH, Laphanuwat P, Subramanian P, Rustin MHA, Gomes DCO, Akbar AN, Chambers ES. Vitamin D3 inhibits p38 MAPK and senescence-associated inflammatory mediator secretion by senescent fibroblasts that impacts immune responses during ageing. Aging Cell. 2024 Apr;23(4):e14093.

      (3) Raynard C, Ma X, Huna A, Tessier N, Massemin A, Zhu K, Flaman JM, Moulin F, Goehrig D, Medard JJ, Vindrieux D, Treilleux I, Hernandez-Vargas H, Ducreux S, Martin N, Bernard D. NF-κB-dependent secretome of senescent cells can trigger neuroendocrine transdifferentiation of breast cancer cells. Aging Cell. 2022 Jul;21(7):e13632.

      (2) I wonder whether the authors could clarify how the semi quantifications for p21, p16, Masson's trichrome, C3, or MPO were done in Figures 1, 2, and 6.

      Thank you very much for the reviewer's suggestion. We have added the semi-quantitative methods for p21, p16, Masson's trichrome, C3, and MPO in the Methods section. Specifically, for semi-quantification of protein expressions, the mean optical density (MOD) of positive stains for p21, p16, and C3 was measured using the ImageJ2 software (version 2.14.0, National Institutes of Health, Bethesda, MD). The number of MPO-positive cells and collagen volume fractions (stained blue) for individual sections were also measured using the ImageJ2 software. (Page 19, Line 537-541 in the revised manuscripts).  
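
The semi-quantification described above can be illustrated with a minimal sketch. It is not the ImageJ2 procedure used in the study: the log-based optical density conversion, the positivity threshold `od_threshold`, and all function names are assumptions introduced here for illustration only.

```python
import math

def optical_density(intensity, white=255.0):
    """Convert an 8-bit grey value to optical density: OD = log10(white / intensity)."""
    # Clamp to 1 so that fully black pixels do not cause a division by zero.
    return math.log10(white / max(float(intensity), 1.0))

def mean_optical_density(pixels, od_threshold=0.15):
    """Mean OD over pixels whose OD exceeds a hypothetical positivity threshold."""
    positive = [od for od in map(optical_density, pixels) if od > od_threshold]
    if not positive:
        return 0.0
    return sum(positive) / len(positive)

def area_fraction(mask):
    """Fraction of True entries in a flat boolean mask (e.g. blue-stained collagen pixels)."""
    if not mask:
        return 0.0
    return sum(1 for m in mask if m) / len(mask)
```

In this sketch, `mean_optical_density` would correspond to the MOD of positively stained pixels and `area_fraction` to a collagen volume fraction, provided a stain mask has already been obtained by some separate segmentation step.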

      (3) Figure 2. It is unclear whether N=6 refers to 6 mice, maxilla, or fields per group.

      Thank you very much for the reviewer's question. To avoid any misunderstandings for the reviewer and readers, we have added a definition of the sample size in the description of the micro-CT analysis method. Specifically, in the micro-CT quantitative analysis, the sample size n for each group consists of 6 mice, with the average value of the BV/TV of the bilateral maxillary alveolar bone taken as one sample for statistical analysis (Page 17-18, Line 488-490 in the revised manuscripts).  
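
The sample-size definition above (one averaged bilateral BV/TV value per mouse) can be sketched as follows. The mouse identifiers and numeric values are hypothetical placeholders, not data from the study.

```python
from statistics import mean

def per_mouse_bvtv(measurements):
    """Collapse left/right BV/TV values into one value per mouse.

    `measurements` maps a mouse ID to a (left, right) pair of BV/TV values.
    """
    return {mouse: mean(sides) for mouse, sides in measurements.items()}

# Hypothetical values for illustration only (not data from the study).
group = {
    "mouse_1": (0.40, 0.44),
    "mouse_2": (0.38, 0.42),
}
samples = per_mouse_bvtv(group)
n = len(samples)  # the sample size is the number of mice, not the number of maxillae
```

Averaging the two sides first keeps the mouse as the statistical unit, so the left and right maxillae of one animal are not counted as independent samples.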

      (4) Figure 4K. Please provide separated staining for p16, VIM, and CD81, and not only the Merge. It is difficult to identify the triple-positive cells. Also, the arrows are difficult to observe.

      Thank you very much for the reviewer's suggestion. In the revised manuscript, we have included separated staining for p16, VIM, and CD81, and the triple-positive cells are indicated with white arrows (Figure 5-figure supplement 1). 

      (5) Overall, improve the magnifications in the IF experiments and show where the magnified areas come from.

      Thank you very much for the reviewer's suggestion. We have enlarged the fluorescence result images.

      (6) Refer to the original datasets of the scRNAseq results in figure legends.

      Thank you very much for the reviewer's suggestion. We have indicated the source of the raw single-cell sequencing data in the figure legend.

      (7) Check English grammar and writing.

      Thank you for the reviewer's suggestion. We checked the grammar and writing in the revised manuscript with the assistance of a native English speaker and AI tools such as ChatGPT.

      Reviewer #2 (Recommendations For The Authors):

      (1) When the authors refer to accelerated aging and/or senescence, they are doing so in comparison to what?

      Thank you for the reviewer's question, which allows us to further clarify the concepts of accelerated aging and/or senescence. In Section 2.1 and Figure 1 of this manuscript, we referred to accelerated aging and/or senescence. This indicates that the gingival tissues of periodontitis patients exhibit a higher number of senescent cells and elevated levels of senescence-related markers compared to healthy gingival tissues. In the title of this manuscript, we describe CD81+ fibroblasts as a unique subpopulation with accelerated cellular senescence. This means that CD81+ fibroblasts display higher expression levels of senescence-related genes, the cell cycle inhibitor p16, and SASP factors compared to other fibroblast subpopulations. To avoid any misunderstanding, we have deleted the text ‘accelerated senescence’ in the revised manuscript.

      (2) In general, the main text does not describe the results using exact and reproducible terminology. Phrases like "X was most active", "a significant increase was observed", "the highest proportion was", and "the level of aging increased" should be supported by adding quantification details and by detailing what these comparisons are made to, to improve the reproducibility of the results.

      Thank you for the reviewer's suggestion. To improve the reproducibility of the results, we have added quantification details in the results section and clarified what comparisons are being made throughout the manuscript.

      (3) In some sections of the main text and figure legends, it is not entirely clear which sequencing experiments were conducted by the authors, which analyses were conducted by the authors on publicly available sequencing data, and which analyses were conducted on their mouse sequencing data.

      Thank you for the valuable feedback from the reviewer. To further clarify the source of the sequencing data, we have clearly indicated the data source in both the results section and the figure legends. 

      (4) In Figure 3H, the images showing SA-beta-gal staining on LPS-treated fibroblasts do not show convincingly the difference between treatments that are represented in the graph.

      Thank you for the reviewer's suggestion. To show the differences between treatments more clearly, we have enlarged part of the SA-β-gal staining image, shown in Figure 2-figure supplement 2 of the revised manuscript.

      (5) The choice of colors for Figure 4K is far from ideal as it is very difficult to tell apart red from purple channels and thus to visualize triple positive cells. A different LUT should be chosen, and separate individual channels should be shown to clearly identify triple-positive cells from others. Arrows also do not currently point at triple-positive cells.

      Thank you for the reviewer's suggestion. In the revised manuscript, we have included separated staining for p16, VIM, and CD81, and the triple-positive cells are marked with white arrows (Figure 5-figure supplement 1 of the revised manuscript).

      (6) The authors state that treatment with metformin "alleviated.... inflammatory cell infiltration (Figure 2C), and collagen degradation (Figure 2D) as observed through H&E and Masson staining." However, I cannot find a description of how the "relative fraction of collagen" in Figure 2Gc was calculated and how the H&E image they provide shows evidence of a reduction in inflammatory cells at that magnification.

      Thank you for the reviewer's suggestion. In the revised manuscript, we have added details in the methods section regarding the calculation of the "relative fraction of collagen" (Page 19, Line 539-541). Specifically, the collagen volume fractions (stained blue) for individual sections were measured using ImageJ2 software. Additionally, we have marked the infiltrating inflammatory cells in the gingiva in the H&E images with black arrows (Figure 7-figure supplement 1B of the revised manuscript).

      (7) It appears that the in vivo experiment for metformin treatment was conducted with 6 animals per group, but this is not clear in the figures, main text, and methods.

      Thank you for the reviewer's suggestion. In the revised manuscript, we have included the number of mice in each group for the in vivo experiments, specifying that there are 6 mice per group in the figures, main text, and methods sections.

      (8) The methodology described for the bulk RNA-sequencing experiment in mice should describe the sequencing library characteristics and some reference to quality control thresholds that were implemented (mapped and aligned reads, sequencing depth and coverage, etc.).

      In the bulk RNA-sequencing experiment, the sequencing library characteristics and quality control thresholds were listed as follows:

      Sequencing Library Characteristics: We utilized the Illumina TruSeq RNA Library Construction Kit, generating libraries with an insert fragment length of approximately 400-500 bp.

      Quality Control Standards include the following:

      Alignment and Mapping Rates: The read data for all samples underwent preliminary quality control using FastQC (v0.11.9) and were aligned using HISAT2 (v2.2.1). The average mapping rate for each sample was over 90%.

      Sequencing Depth and Coverage: Each sample had a sequencing depth of 30M-40M paired reads to ensure sufficient transcript coverage. Detailed alignment statistics have been provided in the supplementary materials.

      Other Quality Control Measures: During the analysis, we also utilized RSeQC (v3.0.1) to evaluate the transcript coverage and GC bias of the sequencing data.

      The corresponding method description and reference were added on Page 19-20, Line 546-558 of the revised manuscript.
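
The quality-control figures reported above can be turned into a simple per-sample check, sketched below. Treating the reported values (mapping rate over 90%, at least 30M paired reads) as hard cut-offs is an assumption made for illustration, and the function name and dictionary keys are hypothetical.

```python
def qc_flags(sample, min_mapping_rate=0.90, min_paired_reads=30_000_000):
    """Return a list of QC warnings for one sample (empty list means it passes).

    `sample` is a dict with hypothetical keys 'mapping_rate' (0-1 fraction)
    and 'paired_reads' (number of read pairs).
    """
    flags = []
    # The response reports mapping rates "over 90%", so 90% itself is flagged.
    if sample["mapping_rate"] <= min_mapping_rate:
        flags.append("low mapping rate")
    if sample["paired_reads"] < min_paired_reads:
        flags.append("low sequencing depth")
    return flags
```

For example, `qc_flags({"mapping_rate": 0.95, "paired_reads": 35_000_000})` returns an empty list, while a sample with an 85% mapping rate and 20M read pairs would be flagged on both counts.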

      (9) Patients with periodontitis are labeled as diagnosed with "chronic periodontitis". I would like to know how the authors defined this chronic state of the disease in their inclusion criteria.

      Thank you very much for the reviewer’s question, which gives us the opportunity to further clarify the definition and diagnosis of chronic periodontitis. The diagnostic criteria for patients with chronic periodontitis in this study are based on the 1999 International Workshop for a Classification of Periodontal Diseases and Conditions (Reference 1). Chronic periodontitis is a type of periodontal disease distinct from aggressive periodontitis, and it is not diagnosed based on the rate of disease progression. Clinically, one of the primary criteria for diagnosing chronic periodontitis is clinical attachment loss (CAL) ≥ 4 mm or probing depth (PD) ≥ 5 mm.

      Reference

      (1) Armitage G. C. (2000). Development of a classification system for periodontal diseases and conditions. Northwest dentistry, 79(6), 31–35.
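
The numeric threshold stated in the response above (CAL ≥ 4 mm or PD ≥ 5 mm) can be written as a one-line check. This is only a sketch of that single criterion: the response describes it as one of several diagnostic criteria, and the function name and per-site framing are assumptions made here.

```python
def meets_threshold(cal_mm, pd_mm, cal_cutoff=4.0, pd_cutoff=5.0):
    """Return True if a measurement reaches CAL >= 4 mm or PD >= 5 mm.

    This encodes only the numeric threshold quoted in the response; it is
    not a complete diagnostic rule.
    """
    return cal_mm >= cal_cutoff or pd_mm >= pd_cutoff
```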

      (10) There is no detail about the age and sex of the donors for the healthy gingival fibroblast experiments. Are they some of the patients mentioned in Supplementary Table 1? Please clarify the source and number of independent primary cultures.

      We thank the reviewer for the opportunity to further clarify the source and number of independent primary cultures. In the cell experiments, we used gingival fibroblasts derived from the gingival tissue of two healthy volunteers and two patients with periodontitis. This information is listed in Supplementary Table 1.

      (11) Can the authors explain why their age inclusion criteria were different for the healthy and periodontitis groups according to their methods (healthy 18-50 years old: periodontitis 18-35 years old?)

      We thank the reviewer for pointing this out. We noticed that there was an error in the age range indicated for the healthy and periodontitis groups in the inclusion criteria. Based on the original inclusion criteria information, we have corrected the age range of the included population. Individuals aged 18-65 years were included in both the healthy and periodontitis groups (Page 14-15, Line 396-404 in the revised manuscript).

      (12) The methodology for inclusion is confusing and does not reflect the actual information of the recruited patients and samples thus analyzed. In the text, the healthy group appears to have included 8 young adult individuals and 8 middle-aged individuals. However, the list of recruited patients shows all healthy patients were in the young adult range (below 35 years of age) while all chronic periodontitis patients were middle-aged (above 50 years of age). Please clarify.

      We thank the reviewer for pointing out these issues. This study included 8 periodontally healthy individuals and 8 patients with periodontitis (Page 14, Line 396-398 and Supplementary Table 1 in the revised manuscript). Since periodontitis has a higher prevalence in middle-aged and elderly populations, the periodontitis samples included in this study were mostly from this demographic. In contrast, the healthy gingival samples were sourced from patients undergoing wisdom tooth extraction, which primarily involves younger individuals. Therefore, given the limited sample size, we could not enforce strict age matching. To address this, we repeated the relevant experiments in more consistent mouse models, which confirmed the increase in senescent cells in periodontal tissues (Figure 1D in the revised manuscript). In summary, although the clinical samples were limited, the experimental results from the mouse models still support our conclusions.

      (13) The number of biological replicates for each group used in the bulk RNA-sequencing experiment is unclear. The methods state:" For those with biological duplication, we used DESeq2 [8] (version: 1.34.0) to screen differentially expressed gene sets between two biological conditions; for those without biological duplication, we used edgeR". Please clarify the number of mouse samples sequenced and the description of the groups.

      We thank the reviewer for pointing out this error. For the transcriptome sequencing, we collected gingival tissues from 3 healthy mice and 3 ligature-induced periodontitis mice. Because biological replicates were available, we used DESeq2 (version: 1.34.0) to screen for differentially expressed genes. The corresponding descriptions were revised on Page 20, Line 554-555 of the revised manuscript.

      (14) Cluster group labels are misaligned in Figure 4C.

      Thank you very much for the reviewer's suggestion. The cluster group labels in Figure 3C of the revised manuscripts have been aligned.

      Reviewer #3 (Recommendations For The Authors):

      Major Comments for the Authors:

      (1) I do not find the immunohistochemical staining of p16 and p21 shown in Figures 2E and F to be particularly compelling. Especially as other stains of these markers used later in the manuscript are of higher quality (i.e. Figures 3F and G). Can this staining be improved to better reflect the quantifications in Figure 2G?

      Thank you very much for the reviewer's suggestion. In the revised manuscript, we have provided more representative images in Figure 7C in the revised manuscripts to reflect the effect of metformin treatment on the number of p16-positive cells in periodontitis. In Figure 7-figure supplement 1D of the revised manuscripts, we have marked p21-positive cells with black arrows to help readers better identify the p21-positive cells. Additionally, we have also assessed the H3K9me3 marker, which is more specific, and the results similarly indicate that metformin treatment can alleviate the formation of senescent cells in periodontitis (Figure 7-figure supplement 1E of the revised manuscript).

      (2) On line 140, Supplementary Figure 2C, D is quoted to show "...an increase in senescence characteristics of fibroblasts with the severity of periodontitis." This figure panel does not appear to support this statement. Please revise.

      Thank you very much for pointing out the error in the manuscript. In the revised version, we have corrected this part of the description to read “The results showed a decline in fibroblast proportion along with increasing disease severity (Figure 2-figure supplement 1C and D)” (Page 6, Line 153-154 of the revised manuscript).

      (3) I do not find the Western Blot experiment in Figure 4L to be particularly convincing. The text states that p21, p16, and CD81 increase in a context-dependent manner upon LPS stimulation, which doesn't appear to be very evident. I recommend repeating this experiment and showing both a representative blot alongside a blot density quantification where the bars have the error shown between experiments.

      Thank you very much for the reviewer’s suggestion regarding this result. During subsequent repeated experiments, we found that the result was not reproducible, and we have removed the related results.

      (4) The results state that metabolic profiling of senescent fibroblasts shows an increase in the biosynthesis of Linoleic acid, linolenic acid, arachidonic acid, and steroid. However, in Figure 5B only arachidonic acid and steroid biosynthesis appear to be elevated in CD81+ Fibroblasts, while Linoleic and linolenic acid appear to be decreased. Can the authors comment on this discrepancy? Moreover, in Figure 5C steroid biosynthesis is unchanged between healthy and periodontitis samples, contrary to the claimed increased trend in the results text. Please revise this section. Also, in Figures 5 B and C some of the terms are highlighted in a red or blue box. This is not discussed in the figure legend. Could the significance of this be explained or could these highlights be removed from the figure?

      Thank you very much for the reviewer’s correction regarding the errors in the manuscript. On Page 7-8, Line 186-194 of the revised manuscript, the statement was rewritten as “Pathways related to fatty acid biosynthesis, arachidonic acid metabolism, and steroid biosynthesis were significantly upregulated in CD81+ fibroblasts (Figure 4-figure supplement 1A)”. Moreover, we have removed the results from Figure 5C, and the highlights in Figures 5B and C, of the previous manuscript. Since the mechanism by which cellular metabolism regulates cellular senescence is not the core focus of this manuscript, we have moved the results of the metabolic analysis from the sc-RNA sequencing data to the figure supplement (Figure 4-figure supplement 1) and revised the related statements in the revised manuscript (Page 7-8, Line 186-194).

      (5) The authors state that arachidonic acid can be converted to prostaglandins and leukotrienes through COXs (which are expressed in their CD81+ Fibroblasts), accentuating inflammatory responses. Have the authors profiled for the expression of prostaglandins and leukotrienes in their CD81+ Fibroblasts or between healthy and periodontitis samples? Such data would be a great inclusion in the manuscript.

      Thank you very much for the reviewer’s suggestion. Our results indicated that CD81+ gingival fibroblasts expressed higher levels of PTGS1 and PTGS2 compared to other fibroblast subpopulations. These genes encode COX-1 and COX-2, respectively, which are key enzymes in prostaglandin biosynthesis (Figure 4-figure supplement 1 of the revised manuscript). Additionally, previous studies have reported high levels of prostaglandins and leukotrienes in periodontal tissues, and these pro-inflammatory mediators contribute to tissue destruction in periodontitis (References 1 and 2).

      Reference

      (1) Van Dyke, T. E., & Serhan, C. N. (2003). Resolution of inflammation: a new paradigm for the pathogenesis of periodontal diseases. Journal of dental research, 82(2), 82–90.

      (2) Hikiji, H., Takato, T., Shimizu, T., & Ishii, S. (2008). The roles of prostanoids, leukotrienes, and platelet-activating factor in bone metabolism and disease. Progress in lipid research, 47(2), 107–126.

      (6) Lines 199 and 200 state "...the cellular senescence of CD81+ fibroblasts could be attributed to disturbances in lipid metabolism". While altered lipid metabolic profiles are shown in Figure 5 to correlate with senescent fibroblasts/periodontitis tissue, no evidence is shown to suggest that they are the driver or cause of fibroblast senescence. Could this sentence be amended to better reflect the conclusions that can be drawn from the data presented?

      Thank you very much for the reviewer’s suggestion. We have revised the related statement to read that “lipid metabolism might play a role in cellular senescence of the gingival fibroblasts” (Page 7, Line 189 of the revised manuscript).

      Minor Comments for the Authors:

      (1) There are some sentences without references that I feel would warrant referencing: - Line 112 - "Metformin, an anti-aging drug has shown potential in inhibiting cell senescence in various disease models (REFERENCE)."

      Thank you for the reviewer's suggestion. We have included the relevant references on Page 10, Line 267-271 of the revised manuscript.

      Reference

      (1) Soukas, A. A., Hao, H., & Wu, L. (2019). Metformin as Anti-Aging Therapy: Is It for Everyone?. Trends in endocrinology and metabolism: TEM, 30(10), 745–755.

      (2) Kodali, M., Attaluri, S., Madhu, L. N., Shuai, B., Upadhya, R., Gonzalez, J. J., Rao, X., & Shetty, A. K. (2021). Metformin treatment in late middle age improves cognitive function with alleviation of microglial activation and enhancement of autophagy in the hippocampus. Aging cell, 20(2), e13277.

      - Line 210 - "Previous studies have demonstrated the importance of sustained neutrophil infiltration in the progression of periodontitis (REFERENCE)."

      Thank you for the suggestion. We have included the relevant references on Page 8, Lines 211-214 of the revised manuscript.

      Reference

      (1) Song, J., Zhang, Y., Bai, Y., Sun, X., Lu, Y., Guo, Y., He, Y., Gao, M., Chi, X., Heng, B. C., Zhang, X., Li, W., Xu, M., Wei, Y., You, F., Zhang, X., Lu, D., & Deng, X. (2023). The Deubiquitinase OTUD1 Suppresses Secretory Neutrophil Polarization And Ameliorates Immunopathology of Periodontitis. Advanced science (Weinheim, Baden-Wurttemberg, Germany), 10(30), e2303207.

      (2) Kim, T. S., Silva, L. M., Theofilou, V. I., Greenwell-Wild, T., Li, L., Williams, D. W., Ikeuchi, T., Brenchley, L., NIDCD/NIDCR Genomics and Computational Biology Core, Bugge, T. H., Diaz, P. I., Kaplan, M. J., Carmona-Rivera, C., & Moutsopoulos, N. M. (2023). Neutrophil extracellular traps and extracellular histones potentiate IL-17 inflammation in periodontitis. The Journal of experimental medicine, 220(9), e20221751.

      (3) Ando, Y., Tsukasaki, M., Huynh, N. C., Zang, S., Yan, M., Muro, R., Nakamura, K., Komagamine, M., Komatsu, N., Okamoto, K., Nakano, K., Okamura, T., Yamaguchi, A., Ishihara, K., & Takayanagi, H. (2024). The neutrophil-osteogenic cell axis promotes bone destruction in periodontitis. International journal of oral science, 16(1), 18.

      (2) To improve the quality of several of the authors' claims I would recommend some further quantification of their experimental analyses. Namely:

      - Figures 3 F and G

      - Figures 4 I, J and K

      - Figures 6 F and G

      - Supplementary Figures 4 A, B, and C

      Thank you for the suggestion. Following the reviewer's recommendations, we have added quantitative analyses for several images, specifically in Figure 2G, Figure 3G, Figure 5-figure supplement 1A and B, Figure 5-figure supplement 2A, and Figure 7-figure supplement 3A-D of the revised manuscript.

      (3) Figure 1L has missing x-axis annotation.

      Thank you for the reminder from the reviewer. The X-axis label has been added in Figure 1-figure supplement 1D for the GO term annotation. 

      (4) Line 117 is missing a reference for the experimental schematic shown in Figure 2A.

      Thank you for the reminder. The experimental schematic shown in Figure 7A is now referenced on Page 10, Lines 275-277.

      (5) The "BV/TV ratio" and "CEJ-ABC distance" should be briefly explained in the results text (Lines 118 and 119).

      Thank you for the suggestion. We have added explanations of "BV/TV ratio" and "CEJ-ABC distance" on Pages 10-11, Lines 279-281 of the revised manuscript.

      (6) Figure 2 could be improved by having some annotation for the anatomical regions shown.

      Thank you for the reviewer’s valuable suggestion. We have labeled the relevant anatomical structures to enhance clarity in Figure 7 in the revised manuscripts. 

      (7) The positive signal for p16 and p21 is difficult to interpret in Figure 2. Could the clarity of this be improved either by using more evident images or annotation with arrowheads indicating positive cells?

      Thank you for the suggestion. In the revised manuscript, we have provided more representative images in Figure 7C to reflect the effect of metformin treatment on the number of p16-positive cells in periodontitis. In Figure 7-figure supplement 1D, we have marked p21-positive cells with black arrows to help readers identify them. Additionally, we have assessed the more specific marker H3K9me3, and the results similarly indicate that metformin treatment can alleviate the formation of senescent cells in periodontitis (Figure 7-figure supplement 1E of the revised manuscript).

      (8) Figure 2Gc, d, and e are not mentioned in the results text. Please include references to these panels at the appropriate points.

      Thank you for the reminder. Figures 2Gc, d, and e of the previous manuscript are now mentioned in the text on Page 11, Lines 284-289 of the revised manuscript.

      (9) Scale bars are missing in Supplementary Figure 2E.

      Thank you for the suggestion. The scale bar has been added to Figure 7-figure supplement 2B in the revised manuscript.

      (10) The order of the figure panels is not always mentioned in the order they are referred to in the text. For example, Figure 3 is presented in the order of A, B, D then C. Could this be changed to reflect the order in the results text?

      Thank you for the feedback. We have renumbered the figures according to the order mentioned in the original manuscript (Page 6, Line 146-149, Figure 2 in the revised manuscripts).

      (11) To improve reader clarity it would be good to briefly introduce the gene expression datasets analysed, such as GSE152042. I.e. what the experimental condition is from which it is derived.

      Thank you for the suggestion. We have included a brief description of the information and sources of the samples from GSE152042 in Page 6, Line 140-142 of the revised manuscripts. 

      (12) To improve reader clarity I would recommend signifying clearly in the figure if the data shown is from mouse or human samples. For example in Figure 3F and G.

      Thank you for the suggestion. We have moved all the results from the mouse experiments to the figure supplements (Figure 5-figure supplements 1 and 2 in the revised manuscript).

      (13) The images shown in Figure 3H for SA-beta-Gal do not seem very convincing. Could this be improved?

      Thank you for the suggestion. To further illustrate the differences in SA-beta-Gal results between the groups, we have provided images at higher magnification in Figure 2-figure supplement 2 of the revised manuscript.

      (14) Supplementary Figure 2E would benefit from small experimental schematics that would allow the reader to appreciate the timings of the treatment for this experiment.

      Thank you for the suggestion. We have added a schematic diagram in Figure 7-figure supplement 2A of the revised manuscripts to illustrate the LPS treatment, metformin treatment, and the timing of the assessments. 

      (15) Figure 4K would benefit from showing the merged image and single channels of each of the stains to better assess the degree of colocalisation.

      Thank you for the suggestion. We have included each individual fluorescence channel in Figure 5-figure supplement 1C of the revised manuscripts. 

      (16) The writing on the X-axis of Figure 6B is almost illegible to me, although this may just be a compression artefact. This makes the interpretation of the data quite difficult. Also, for Figures 6 B and C, the meaning of the (H) and (P) annotations should be clear on either the figure or figure legend. I surmise that they represent "Healthy" and "Periodontic" samples respectively.

      Thank you for the suggestion. In the revised manuscript, we have enlarged Figure 6B in the previous manuscripts to better display the X-axis as shown in the Figure 5B of the revised manuscripts. Additionally, we have fully labeled "Healthy" and "Periodontitis" in Figure 5C of the revised manuscripts.

      (17) MPO-positive cells are introduced on line 216, however, no explanation is provided for what population or state the expression of this protein marks. I surmise the authors are using it to detect Neutrophil populations. If so, could the authors briefly state this the first time it is used?

      Thank you for the suggestion. In the revised manuscript, we have added an introduction to MPO. MPO, or myeloperoxidase, is considered one of the markers for neutrophils. (Page 9, Line 240-242 of the revised manuscripts)

      (18) Supplementary Figure 3D does not appear to be mentioned or discussed in the results text.

      Thank you for the reminder. Supplementary Figure 3D of the previous manuscript is now Figure 5-figure supplement 2C of the revised manuscript, and we have referenced it on Page 9, Lines 240-242.

      (19) Figure 6E showing increased C3 expression in periodontic samples is not very convincing and differences in expression are not evident. Can the authors provide an image that more convincingly matches their quantification?

      Thank you for the suggestion. In the revised manuscript, we have provided more representative images shown in Figure 5E of the revised manuscript.

      (20) Figure 6I shows the expression of CD81 and SOD2 in healthy and periodontic tissue. The associated results texts (Lines 220 to 223) discuss the spatial coincidence of CD81 and MPO. Can the authors address this discrepancy in either the results text or the figure panel? Moreover, can Figure 6H and I be annotated to show the location of the gingival lamina propria to improve clarity?

      Thank you for the reminder. We have revised the relevant statements in the text: "Interestingly, spatial transcriptomic analysis of gingival tissue revealed that the regions expressing CD81 and SOD2, a neutrophil marker, in periodontitis overlapped in the gingival lamina propria, showing a high spatial correlation" in Page 9, Line 223-226 of the revised manuscripts. Additionally, we have labeled the gingival lamina propria (LP) in Figure 5H of the revised manuscripts.

      (21) I am confused about the purpose of Supplementary Figure 3E and what evidence it provides. Can the authors comment on this?

      Thank you for the reminder. To avoid any potential misunderstanding by readers, we have deleted the Supplementary Figure 3 image from the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Wang et al show that differentiated peridermal cells of the zebrafish epidermis extend cytoneme-like protrusions toward the less differentiated, intermediate layer below. They present evidence that expression of a dominant-negative cdc42 inhibits cytoneme formation and leads to elevated expression of a marker of undifferentiated keratinocytes, krtt1c19e, in the periderm layer. Data are presented suggesting the involvement of Delta-Notch signaling in keratinocyte differentiation. Finally, changes in expression of the inflammatory cytokine IL-17 and its receptors are shown to affect cytoneme number and periderm structure in a manner similar to Notch and cdc42 perturbations.

      Strengths:

      Overall, the idea that differentiated cells signal to underlying undifferentiated cells via membrane protrusions in skin keratinocytes is interesting and novel, and it is clear that periderm cells send out thin membrane protrusions that contain a Notch ligand. Further, perturbations that affect cytoneme number, Notch signaling, and IL-17 expression clearly lead to changes in periderm structure and gene expression.

      Weaknesses:

      More work is needed to determine whether the effects on keratinocyte differentiation are due to a loss of cytonemes themselves, or to broader effects of inhibiting cdc42. Moreover, more evidence is needed to support the claim that periderm cytonemes deliver Delta ligands to induce Notch signaling below. Without these aspects of the study being solidified, understanding how IL-17 affects these processes seems premature.

      Reviewer #2 (Public Review):

      Summary:

      The aim of the study was to understand how cells of the skin communicate across dermal layers. The research group has previously demonstrated that cellular connections called airinemes contribute to this communication. The current work builds upon this knowledge by showing that differentiated keratinocytes also use cytonemes, specialized signaling filopodia, to communicate with undifferentiated keratinocytes. They show that cytonemes are the more abundant type of cellular extension used for communication between the differentiated keratinocyte layer and the undifferentiated keratinocytes. Disruption of cytoneme formation led to the expansion of the undifferentiated keratinocytes into the periderm, mimicking skin diseases like psoriasis. The authors go on to show that disruption of cytonemes results in perturbations in Notch signaling between the differentiated keratinocytes of the periderm and the underlying proliferating undifferentiated keratinocytes. Further, the authors show that Interleukin-17, also known to drive psoriasis, can restrict the formation of periderm cytonemes, possibly through the inhibition of Cdc42 expression. This work suggests that cytoneme-mediated Notch signaling plays a central role in normal epidermal regulation. The authors propose that disruption of cytoneme function may be an underlying cause of various human skin diseases.

      Strengths:

      The authors provide strong evidence that periderm keratinocyte cytonemes contain the Notch ligand DeltaC to promote Notch activation in the underlying intermediate layer to regulate accurate epidermal maintenance.

      Weaknesses:

      The impact of the study would be increased if the mechanism by which Interleukin-17 and Cdc42 collaborate to regulate cytonemes were defined. Experiments measuring Cdc42 activity, rather than just measuring expression, would strengthen the conclusions.

      Reviewer #3 (Public Review):

      Summary:

      Leveraging zebrafish as a research model, Wang et al identified "cytoneme-like structures" as a mechanism for mediating cell-cell communications among skin epidermal cells. The authors further demonstrated that the "cytoneme-like structures" can mediate Notch signaling, and the "cytoneme-like structures" are influenced by IL17 signaling.

      Strengths:

      Elegant zebrafish genetics, reporters, and live imaging.

      Weaknesses: (minor)

      This paper focused on characterizing the "cytoneme-like structures" between different layers and the NOTCH signaling. However, these "cytoneme-like structures" observed in undifferentiated KC (Figure 2B), although at a slightly lower frequency, were not interpreted. In addition, it is unclear if these "cytoneme-like structures" can mediate other signaling pathways than NOTCH.

      We are currently investigating the role of cytoneme-like protrusions extended from undifferentiated keratinocytes. We believe that addressing the function of undifferentiated keratinocyte cytonemes and exploring whether peridermal cytonemes can mediate other signaling pathways are beyond the scope of the current manuscript. However, we hope to publish our discoveries about them soon. It is worth noting that cytonemes mediate other morphogenetic signals, such as Hh, Wnt, Fgf, and TGFbeta, in other contexts.

      Overall, this is a solid paper with convincing data reporting the "cytoneme-like structures" in vivo, and with compelling data demonstrating the roles in NOTCH signaling and the regulation by IL17.

      These findings provide a foundation for future work exploring the "cytoneme-like structures" in the mammalian system and other epithelial tissue types. This paper also suggests a potential connection between the "cytoneme-like structures" and psoriasis, which needs to be further explored in clinical samples.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points

      - In general, representative images from each experiment should accompany the graphs shown. The inclusion of still frames from time-lapse imaging experiments in the main figures would help the reader understand the morphology and dynamics of these protrusions in control, cdc42, and IL-17 manipulations.

      Thank you for the comments. We appreciate your suggestion to include representative images alongside the graphs to better illustrate the morphology and dynamics of these protrusions.

      In response, we have made the following additions to our main figures.

      Figure 3A now includes still images from time-lapse movies for both control and cdc42 manipulations.

      Figure 5A and 6A,C now include still images for il17 manipulations.

      - Data in Figure 3 are crucial as they demonstrate that cdc42DN selectively impairs cytoneme extensions without affecting other actin-based structures. They also show that cdc42DN leads to upregulation of krtt1c19e in periderm. Therefore, these data should be presented in a comprehensive way. Still frames of high-magnification views of time-lapse images from control and cdc42DN should be included in the figure. Similarly, a counter label (E-Cadherin, perhaps) showing the presence of all three layers and goblet cells at different focal planes capturing the different layers of the skin should be included. It is stated that the goblet cell number is unaffected, but they seem to be absent in the image shown in Figure 3B.

      In this revised version, we have included magnified cross-sectional views. In addition to the images of the peridermal layer from the original version, we have now included the underlying intermediate and basal stem cell layers (Figure 3C-C”). We hope these data convincingly show that peridermal keratinocytes in cytoneme inhibited animals co-express krt4 and krtt1c19e markers, suggesting that peridermal keratinocytes are not fully differentiated.

      We agree that goblet cells appear largely absent in this particular image of the experimental group. However, when we quantified many animals, the number of goblet cells was not significantly different between control and experimental groups (Figure S2).

      - The effects on periderm architecture upon broad cdc42 inhibition may not be directly due to a loss of cytonemes. Performing this experiment in a mosaic manner to determine if the effects are local and in the range of cytoneme protrusion would strengthen the conclusions. Adding a secondary perturbation to inhibit cytoneme formation in periderm cells would also strengthen the conclusions that defects are not related specifically to cdc42 inhibition, but cytonemes themselves.

      Thank you for the suggestion. We confirmed that mosaic expression of cdc42DN in peridermal keratinocytes elicited local disorganization and elevated krtt1c19e expression, as seen in the transgenic lines. Also, the cdc42DN-expressing cells exhibited significantly lower cytoneme extension frequency.

      In addition, we found that, like cdc42DN, rac1DN-expressing keratinocytes exhibited a significant decrease in cytoneme extension frequency, whereas rhoabDN showed no effect (new Figure S3). These data suggest that cytoneme extension is regulated by cdc42 and rac1 but not rhoab. Further investigation is required; however, these data at least suggest that the effects we observe are likely due to the loss of cytonemes and are not specific to cdc42 inhibition.

      - Figure 4. The inclusion of an endogenous reporter of Notch activity, like Hes or Hey immunofluorescence, would strengthen the conclusion that the intermediate layer is Notch responsive.

      Thank you for the suggestion. In this revised version, we have included immunostaining data in Figure 4D demonstrating that Her6 (the ortholog of human HES1) protein is expressed in the intermediate layer.

      - It is not clear where along a differentiation trajectory Notch signaling and cytonemes are needed. What happens to the intermediate layer when Notch signaling or cdc42 is inhibited? Do the cells become more basal-like? Or failing to become periderm? Meaning - is Notch promoting the basal to intermediate fate transition, or the intermediate to periderm transition? A more comprehensive characterization of basal, intermediate, and periderm differentiation with markers selective to each layer would help define which step in the process is being altered.

      Notch signaling is known to regulate keratinocyte terminal differentiation. Thus, it is required for the transition from intermediate to peridermal keratinocytes. We observed that peridermal keratinocytes still strongly express krt19, suggesting that their terminal differentiation is inhibited when cytoneme-mediated Notch signaling is compromised.

      As seen in Figure 3C”, peridermal keratinocytes express both krt4 and krtt1c19e markers and are located in the peridermal layer, suggesting that they are not fully differentiated keratinocytes. In the images of the intermediate and basal layers that we have now included, we do not observe any noticeable defects in basal stem cells or complete depletion of intermediate keratinocytes (Fig 3C-C”). These observations suggest that Notch signaling, activated by cytonemes, is required for the differentiation of undifferentiated intermediate keratinocytes into peridermal keratinocytes.

      We included this interpretation in the main text.

      - A number of times in the text it is suggested that cytonemes, Notch, and IL-17 signaling are essential for keratinocyte differentiation and proliferation, but proliferation (% cells in S-phase and M-phase) is not measured. Also, #of keratinocytes @ periderm is not an accurate way to report the number of cells in the periderm unless every cell in the larvae has been counted. It should be # cells/unit area.

      In this revised version, we confirmed that the number of EdU+ cells among peridermal keratinocytes is significantly increased when cytonemes are inhibited (Figure 3F-G). Also, as indicated in the methods section, we counted the cells within a 290 µm × 200 µm region. We believe that together these data sufficiently support the conclusion that the number of keratinocytes in the periderm is significantly increased due to the lack of proper cytoneme-mediated signaling.

      - If the model is correct that Delta ligands from the periderm signal to intermediate cells to promote their differentiation and inhibit their proliferation, then depletion of Delta from Krt4 expressing cells should recapitulate the periderm phenotype.

      It is a great suggestion. However, zebrafish skin expresses multiple Delta ligands, and we do not know which specific combination of Deltas is delivered via cytonemes. In this manuscript we found that Dlc is expressed along the cytonemes and in krt4+ cells (revised Figure S4); however, we are unsure whether other Delta ligands are involved in Notch activation. Nevertheless, cytoneme inhibition is performed specifically in krt4+ cells, and the downregulation of Notch activation is observed in krtt1c19e+ undifferentiated keratinocytes. In this revised version, we found that the Notch-responsive protein Her6 is exclusively expressed in the cytoneme target keratinocytes, and that cytoneme-extending cells (krt4+) do not express Notch receptors.

      - rtPCR data in Figure S3 is not properly controlled. Each gene should be tested in both krt4 and krtt1c19e expressing cells to determine their relative expression levels in different skin layers that are proposed to signal to one another. Are Notch ligands present in basal cells? These could be activating Notch in the intermediate layer.

      Our intention was merely to confirm that the Notch signaling components are expressed in cytoneme-extending and cytoneme-receiving cells. Based on the new panel of RT-PCRs for Notch signaling components, we confirmed again that dlc is expressed in cytoneme-extending cells but not in receiving cells. Basal cells are also krtt1c19e+, but we did not detect dlc in them. Interestingly, we found that notch 2 is expressed exclusively in krtt1c19e+ cells and not in krt4+ cytoneme-extending cells (now new Figure S4).

      - It is not intuitive why NICD (activation) and SuHDN (inhibition) of Notch signaling should result in a similar effect on the periderm. What is the effect of NICD expression on the TP1:H2BGFP reporter? Does it hyperactivate as expected?

      We share the reviewer’s concern. It is well documented that psoriasis patients exhibit either loss or gain of Notch signaling (Ota et al., 2014 Acta Histochem Cytochem; Abdou et al., 2012 Annals of Diagnostic Pathology). However, the underlying mechanisms remain unknown. We merely intended to show that our zebrafish experimental manipulations recapitulate what is seen in human patients. However, we believe these data are not required to draw the overall conclusion and need further investigation to be explained. Thus, if the reviewers agree, we would like to omit them from this manuscript and leave them for future studies.

      - Due to the involvement of immune signaling in hyperproliferative skin diseases the paper then investigates the role of IL-17 on cytoneme formation by overexpressing two IL-17 receptors in the periderm. Fewer cytonemes were present in the receptor over-expressing periderm cells. The rationale for overexpressing the receptors was unclear. If relevant to endogenous cytokine signaling, the periderm would be expected to express IL-17 receptors normally and respond to elevated levels of IL-17.

      We overexpressed the IL-17 receptors to test whether the effect is autonomous to krt4+ peridermal cells. There is a debate as to whether the onset of psoriasis is autonomous to keratinocytes or a non-autonomous effect of immune malfunction. In addition to the overexpression of IL-17 receptors, we showed that overexpression of the IL-17 ligand has the same effect on cytoneme extension (Fig. 6A-B).

      - Experiments overexpressing IL-17 in macrophages are also suggested to limit cytoneme number whereas heterozygous deletion elevates them. Representative images and movies should be included to support the data. Western blots or immunofluorescence showing that IL-17 and its receptors are indeed overexpressed in the relevant layers/cell types should also be included as controls. Knockout of IL-17 protein in the new Crispr deletion mutant should also be shown.

      In response to the reviewer’s comments, we have included representative images of peridermal keratinocytes in IL-17 ligand overexpressed and il17 CRISPR KO animals (Fig. 6A,C).

      We have confirmed the overexpression of Il17rd, Il17ra1a and Il17a in the transgenic animals. For the il17 receptors, we FACS-sorted differentiated keratinocytes and performed qRT-PCR. Similarly, for the il17 ligand, we isolated skin tissue and conducted qRT-PCR (new Figure S7).

      Additionally, we confirmed that IL-17 protein expression is undetectable in il17a CRISPR KO fish (Fig. S8C).

      - Evidence that the effect of IL-17 upregulation on periderm architecture is via cytonemes is suggestive but not conclusive. Can the phenotype be rescued by a constitutively active cdc42?

      We appreciate the reviewer’s suggestion. We are unsure whether expression of constitutively active cdc42 (cdc42CA) can rescue the reduction in cytoneme extension frequency caused by IL-17 overexpression. cdc42CA would be expected to stabilize actin polymerization and, in turn, produce more cytonemes. However, sustained cdc42 activation is also known to lead, paradoxically, to actin depolymerization. We are therefore concerned that the result would likely be uninterpretable. In addition, this experiment would require generating a new transgenic line, and the baseline control experiments and validations would take a substantial amount of time and effort with no assurance of a clear outcome.

      We and others believe that cdc42 is a final effector molecule regulating cytoneme extension, given its role in actin polymerization. We provided evidence that IL-17 overexpression significantly reduced cdc42 and rac1 expression (Figure 6E) and that combining IL-17 overexpression with cdc42DN led to a further reduction in cytoneme extension frequency in peridermal keratinocytes (Figure 6H).

      - In a final experiment, the authors mutate a psoriasis-associated gene, clint1a gene and show an effect on cytonemes, Notch output, and periderm structure. More information about what this gene encodes, where the mRNA is expressed, and where the cell the protein should localize would help place this result in context for the reader.

      In this revised manuscript, we have included more information about clint1.

      “The clathrin interactor 1 (clint1), also referred to as enthoprotin and epsinR, functions as an adaptor molecule that binds SNARE proteins and plays a role in clathrin-mediated vesicular transport (Wasiak, 2002). It has also been reported that clint1 is expressed in the epidermis and plays an important role in epidermal homeostasis and development in zebrafish (Dodd et al., 2009)”.

      Minor points

      - The architecture of zebrafish skin is notably distinct from that of humans and other mammals and whether parallels can be drawn with regards to cytoneme mediated signaling requires further investigation. For this reason, I believe the title should include the words 'in zebrafish skin'.

      In this version, we changed the title to ‘Cytoneme-mediated intercellular signaling in keratinocytes essential for epidermal remodeling in zebrafish’.

      - More details about the timing of cdc42 inhibition should be given in the main text to interpret the data. How many hours or days are the larvae treated? How does this compare to the rate of division and differentiation in the zebrafish larval epidermis?

      We apologize for omitting the detailed experimental conditions for cytoneme inhibition. We have revised the main text as follows “Although the cytoneme inhibition is evident after overnight treatment with the inducing drugs, noticeable epidermal phenotypes begin to appear after 3 days of treatment. This reflects the higher cytoneme extension frequency and their potential role during metamorphic stages, which takes a couple of weeks (Figure 1C)”

      - What are the genotypes of animals in Figure 4B where 'Notch expression' is being measured upon cdc42DN inhibition? Is this the TP1:H2B-GFP reporter? Again, details of the timing of this experiment are needed to evaluate the results.

      We have indicated the supplementary figure used as the reference for the Notch activity measurement in the legend of Figure S4. We have also added the following sentence to the main text: “Similar to the effects on the epidermis after cytoneme inhibition (Figure 3), it takes 3 days to observe a significant reduction in Notch signal in the undifferentiated keratinocytes.”

      Reviewer #2 (Recommendations For The Authors):

      - Figure 2B: the authors indicate that the undifferentiated keratinocytes (krtt1c19e+) do extend some cytonemes. Although this behavior is not a focus of the study, it would be helpful to see an image of krtt1c19e:lyn-tdTomato cytonemes. The discussion ends with an interesting statement about downward pointed protrusions coming off the undifferentiated keratinocytes. A representative image of this should be included in Figure 2.

      In this revised version, we included an image of a krtt1c19e-positive cell that extends cytonemes in Figure 2C.

      - The evidence for hyperproliferation of the undifferentiated keratinocytes would be strengthened by quantifying proliferation. Most experiments result in increased expression of krtt1c19e in the periderm layer, but it is unclear whether this is invasion, remodeling, or incomplete differentiation of the cells. Notch suppression with krtt1c19e:SuHDN and overactivation with krtt1c19e:NICD phenocopy each other. Are there differences in proliferation vs differentiation rates in these two genotypes that result in a similar phenotype?

      We appreciate the reviewer’s comments. In response to the feedback, we included EdU experiments that show increased cell proliferation in keratinocytes in the periderm in the experimental groups. Additionally, we observed co-expression of both the differentiated marker krt4 and the undifferentiated marker krtt1c19e in keratinocytes in the periderm. Since we did not observe depletion of the intermediate layer, we believe it is reasonable to conclude that the phenotype represents incomplete differentiation (new Figure 3). For the krtt1c19e:NICD question, please refer to our response to Reviewer #1’s comment.

      - Do Cdc42DN and il17rd or il17ra1a work in parallel or in a hierarchy of signaling events to regulate cytoneme formation?

      Cdc42 is widely recognized as a final effector in cytoneme extension, given its well-established role in actin polymerization, which is critical for cytoneme extension. Our data support a model where il17 signaling acts upstream of cdc42. We showed that the overexpression of il17rd or il17ra1a significantly reduced the expression of Cdc42 (Figure 6E). In double transgenic fish overexpressing il17rd and cdc42DN, we observed a more marked decrease in cytoneme extension compared to single transgenic (Figure 6H). These results collectively indicate that, at least partially, Cdc42 functions downstream of il17 signaling in the context of cytoneme formation. However, we acknowledge that additional regulatory mechanisms may be involved, given the complexity of cellular signaling networks.  

      - Figure 6C: Are the effects of overexpression of il17rd specific to Cdc42, or are other Rho family GTPases like Rac and Rho also affected? Is the microridge defect (Figure 6D) also present in Tg(krt4:TetGBDTRE-v2a-cdc42DN) when induced, or could this be regulated by Rho/Rac?

      We used microridge formation as a readout to evaluate the effects of il17 receptor overexpression on actin polymerization. In this revision, we demonstrate that the expression of other small GTPases is also decreased in il17rd- or il17ra1a-overexpressing keratinocytes (Figure 6E). We also confirmed that microridges exhibit significantly shorter branch lengths when cdc42DN or rac1DN is overexpressed (new Figure S9). Of note, we have shown that the effects on cytonemes are regulated by cdc42 and rac1 (new Figure S3).

      - Please change the color of the individual data points from black to grey or another color so readers may better visualize the mean and error bars.

      We agree with this comment, and in response, we have revised the figures by changing the color of the individual data points to empty circles and now the error bars are better visualized.

      - Figure 1: What were the parameters used to identify an extension as a cytoneme? Please include the minimal length and max-width used in the analysis in the methods.

      Thank you for the comments. We have now included in the methods how we defined and measured cytonemes, as follows. In zebrafish keratinocytes, lamellipodial extensions are the dominant extension type, and most filopodial extensions are less than 1 µm in length; neither is easily visible at the confocal resolution we used for this study. Thus, it is easy to distinguish filopodia from cytonemes, as cytonemes have a minimum length of 4.36 µm in our observations. We did not use the width parameter since there are no other protrusions except cytonemes. We calculated the cytoneme extension frequency by counting how many cytonemes extended from a cell per hour. We analyzed movies with 3-minute intervals over a total of 10 hours, as described in the section above.
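
      The counting procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the `Extension` record, the function name, and the use of the reported 4.36 µm minimum as an explicit length cutoff are hypothetical and are not taken from the authors' analysis pipeline.

```python
from dataclasses import dataclass

MIN_CYTONEME_LENGTH_UM = 4.36  # shortest cytoneme reported in the response (used here as an assumed cutoff)
FRAME_INTERVAL_MIN = 3.0       # imaging interval stated in the response


@dataclass
class Extension:
    """One traced protrusion extension event (hypothetical record)."""
    cell_id: str
    max_length_um: float


def extension_frequency(events, n_intervals,
                        frame_interval_min=FRAME_INTERVAL_MIN,
                        min_length_um=MIN_CYTONEME_LENGTH_UM):
    """Return cytoneme extensions per cell per hour, keyed by cell id."""
    hours = n_intervals * frame_interval_min / 60.0
    counts = {}
    for ev in events:
        # Register every observed cell, even if it never extends a cytoneme.
        counts.setdefault(ev.cell_id, 0)
        if ev.max_length_um >= min_length_um:
            counts[ev.cell_id] += 1
    return {cell: n / hours for cell, n in counts.items()}


# 200 intervals of 3 minutes correspond to the 10-hour movies described above.
events = [Extension("a", 5.0), Extension("a", 0.8),
          Extension("a", 12.1), Extension("b", 0.5)]
frequencies = extension_frequency(events, n_intervals=200)
```

      Short protrusions are simply not counted, which mirrors the statement that filopodia fall well below the cytoneme length range.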

      - Line 149-150, (Figure S1) ML141 is a Cdc42 inhibitor, please correct the wording. Would the use of an actin polymerization inhibitor like Cytochalasin B or a depolymerizing agent (Latrunculin) increase the reduction in cytoneme formation?

      Thank you for pointing it out. We have revised the wording in this version. We tried Cytochalasin B and Latrunculin, but the treatments killed the animals.

      - Figure 2: What is the depth of the Z-axis images? Does the scale bar apply to the cross-sectional images as well? It may be beneficial to readers to expand the Z scale of the cross-section images for Figure 2C.

      Sure, we enlarged the cross-sectional images. Yes, the scale bar should apply to the cross-sectional images.

      - Figure 3B-B' cross-section images should be added to confirm images shown represent the periderm layer. Are there folds in the epidermis due to cdc42DN expression or are differentiated keratinocytes absent?

      In response, we have included z-stack images in the revised figure 3. We found that the epidermal tissue is not flat as compared to controls, presumably due to broad cdc42DN expression (Figure 3C”).

      - Figure S3: Do the EGFP+ and tdTomato+ cells have noticeable differential gene expression? The inclusion of RT-PCR analysis of all genes analyzed for both cell populations would bolster statements on lines 230-231 and 254-256.

      We agree with the reviewer’s comment and have revised the RT-PCR panel in this revised version (Figure S4).

      - Figure 4D-D', Please include cross-section images to indicate the focal plane for analysis.

      We included cross-section images in this revised version (Figure 4E-E”).

      - Figure 5B: Complementary images visualizing the reduction of Notch would be helpful.

      We apologize for not including these data previously. In this revised version, we included Notch reporter expression data comparing WT, Tg(krt4:il17rd), and Tg(krt4:il17ra1a) in Figure S5E.

      - Line 432-433: "Moreover, we have demonstrated that IL-17 can influence cytoneme extension by regulating Cdc42 GTPases, ultimately affecting actin polymerization." This claim would be strengthened by assaying for Cdc42 activity.

      It is a great idea, and we tried to address this issue. However, we realized that measuring activity with biosensors, especially in vivo, requires a substantial amount of time, effort, and validation, and we are not confident that it would work in our hands. In addition, the current methods, which work for in vitro samples, still have many limitations, such as sensitivity issues. Although we agree that measuring cdc42 activity would bolster our findings, it seems very challenging to apply to the zebrafish in vivo system.

      - Line 445-447: "Clint1(Clathrin Interactor 1) plays an important role in vesicle trafficking, and it is well established that endocytic pathways are critical for multiple steps in cytoneme-mediated morphogen delivery (Kalthoff et al., 2002)." Please add references to the "endocytic pathways are critical for multiple steps in cytoneme-mediated morphogen delivery" portion of the sentence.

      We revised the sentence, changing “it is well established” to “it is suggested”, and added a reference (Daly et al., 2022).

      Reviewer #3 (Recommendations For The Authors):

      The details of the "cytoneme inhibition" experiments need to be better clarified. How long was the dox treatment? How soon did the cells start to show "disorganization"? How soon did the KC in the periderm start to show increased proliferation?

      Thank you for the valuable comment. In response, we have revised the main text as follows: “Although the cytoneme inhibition is evident after overnight treatment with the inducing drugs, noticeable epidermal phenotypes begin to appear after 3 days of treatment. This reflects the higher cytoneme extension frequency and their potential role during metamorphic stages, which take a couple of weeks (Figure 1C).”

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a practical modification of the orthogonal hybridization chain reaction (HCR) technique, a promising yet underutilized method with broad potential for future applications across various fields. The authors advance this technique by integrating peptide ligation technology and nanobody-based antibody mimetics - cost-effective and scalable alternatives to conventional antibodies - into a DNA-immunoassay framework that merges oligonucleotide-based detection with immunoassay methodologies. Notably, they demonstrate that this approach facilitates a modified ELISA platform capable of simultaneously quantifying multiple target protein expression levels within a single protein mixture sample.

      Strengths:

      The hybridization chain reaction (HCR) technique was initially developed to enable the simultaneous detection of multiple mRNA expression levels within the same tissue. This method has since evolved into immuno-HCR, which extends its application to protein detection by utilizing antibodies. A key requirement of immuno-HCR is the coupling of oligonucleotides to antibodies, a process that can be challenging due to the inherent difficulties in expressing and purifying conventional antibodies.

      In this study, the authors present an innovative approach that circumvents these limitations by employing nanobody-based antibody mimetics, which recognize antibodies, instead of directly coupling oligonucleotides to conventional antibodies. This strategy facilitates oligonucleotide conjugation - designed to target the initiator hairpin oligonucleotide of HCR - through peptide ligation and click chemistry.

      Weaknesses:

      The sandwich-format technique presented in this study, which employs a nanobody that recognizes primary IgG antibodies, may have limited scalability compared to existing methods that directly couple oligonucleotides to primary antibodies. This limitation arises because the C-region types of primary antibodies are relatively restricted, meaning that the use of nanobody-based detection may constrain the number of target proteins that can be analyzed simultaneously. In contrast, the conventional approach of directly conjugating oligonucleotides to primary antibodies allows for a broader range of protein targets to be analyzed in parallel.

      We would like to clarify that MaMBA was specifically designed to address and overcome the limitations imposed by relying on primary antibodies’ Fc types for multiplexing. MaMBA utilizes DNA oligo-conjugated nanobodies that selectively and monovalently bind to the Fc region of IgG. This key feature allows us to barcode primary IgGs targeting different antigens independently. These barcoded IgGs can then be pooled together after barcoding, effectively minimizing the potential for cross-reactivity or crossover. Therefore, IgGs barcoded using MaMBA are functionally equivalent to those barcoded via conventional direct conjugation approaches with respect to multiplexing capability.

      Additionally, in the context of HCR-based protein detection, the number of proteins that can be analyzed simultaneously is inherently constrained by fluorescence wavelength overlap in microscopy, which limits its multiplexing capability. By comparison, direct coupling of oligonucleotides to primary antibodies can facilitate the simultaneous measurement of a significantly greater number of protein targets than the sandwich-based nanobody approach in the barcode-ELISA/NGS-based technique.

      As we have responded above, MaMBA barcoding of primary IgGs that target various antigens can be conducted separately. Once barcoded, these IgGs can then be combined into a single pool. Therefore, for BLISA (i.e., the barcode-ELISA/NGS-based technique), IgGs barcoded through MaMBA offer the same multiplexing capability as those barcoded using traditional direct conjugation methods.

      In in situ protein imaging, spectral overlap can indeed limit the throughput of multiplexed HCR fluorescent imaging. There are two strategies to address this challenge. As demonstrated in this work with _mis_HCR and _mis_HCRn, removing the HCR amplifiers allows for multiplexed detection using a limited number of fluorescence wavelengths. This is achieved through sequential rounds of HCR amplification and imaging. Alternatively, recent computational approaches offer promising solutions for “one-shot” multiplexed imaging. These include combinatorial multiplexing (PMID: 40133518) and spectral unmixing (PMID: 35513404), which can be applied to _mis_HCR to deconvolute overlapping spectra and increase multiplexing capacity in a single imaging acquisition.
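
      The sequential strategy described above, in which a limited set of fluorescence channels is reused across rounds of amplification, imaging, and amplifier removal, can be sketched as a simple batching step. This is an illustrative sketch only: the function name, the target names, and the channel labels are hypothetical and do not come from the manuscript.

```python
def plan_imaging_rounds(targets, channels):
    """Split targets into sequential rounds, one target per channel per round."""
    if not channels:
        raise ValueError("at least one fluorescence channel is required")
    rounds = []
    for start in range(0, len(targets), len(channels)):
        batch = targets[start:start + len(channels)]
        # zip stops at the shorter sequence, so the last round may leave channels unused.
        rounds.append(dict(zip(channels, batch)))
    return rounds


# Five hypothetical targets imaged with three hypothetical channels need two rounds.
rounds = plan_imaging_rounds(["A", "B", "C", "D", "E"], ["488", "561", "647"])
```

      The number of rounds grows as the number of targets divided by the number of channels, rounded up, which is why amplifier removal matters for throughput.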

      Reviewer #1 (Recommendations for the authors):

      (1) The introduction of nanobody and peptide ligation technology is a key highlight of this study. To strengthen the manuscript, the authors should provide a more detailed discussion of the principles and applications of HCR in the Introduction or Discussion sections.

      We have added a brief discussion of the HCR reaction to the revised manuscript.

      (2) It would also be beneficial to include results and/or discussion on how the affinity of nanobody binding to IgG influences the success and accuracy of the technique.

      We have added a brief discussion of the IgG nanobodies we used in MaMBA to the revised manuscript.

      (3) Additionally, a more detailed explanation of the recognition specificity of the AEP peptide ligase used in this study should be included in the Discussion section. Prior studies have reported on the specificity of amino acid residues positioned at the C-terminus of target A (-5 to -1) and the N-terminus of target B (1 to 3) in AEP-mediated ligation, and integrating this context would enhance clarity.

      We have added a brief discussion of the AEP-mediated ligation to the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment 

      The authors utilize a valuable computational approach to exploring the mechanisms of memory-dependent klinotaxis, with a hypothesis that is both plausible and testable. Although they provide a solid hypothesis of circuit function based on an established model, the model's lack of integration of newer experimental findings, its reliance on predefined synaptic states, and oversimplified sensory dynamics, make the investigation incomplete for both memory and internal-state modulation of taxis.

      We would like to express our gratitude to the editor for the assessment of our work. However, we respectfully disagree with the assessment that our investigation is incomplete, if the negative assessment is primarily due to the impact of AIY interneuron ablation on the chemotaxis index (CI) reported in Reference [1]. It is crucial to acknowledge that the CI determined through experimental means incorporates contributions from both klinokinesis and klinotaxis [1]. It is plausible that the impact of AIY ablation was not adequately reflected in the CI value. Consequently, the experimental observation does not necessarily diminish the role of AIY in klinotaxis. Anatomical evidence provided by the database (http://ims.dse.ibaraki.ac.jp/ccep-tool/) substantiates that ASE sensory neurons and AIZ interneurons, which have been demonstrated to play a crucial role in klinotaxis [Matsumoto et al., PNAS 121 (5) e2310735121], have a much higher number of synaptic connections with AIY interneurons. These findings provide substantial evidence supporting the validity of the presented minimal neural network responsible for salt klinotaxis.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This research focuses on C. elegans klinotaxis, a chemotactic behavior characterized by gradual turning, aiming to uncover the neural circuit mechanism responsible for the context-dependent reversal of salt concentration preference. The phenomenon observed is that the preferred salt concentration depends on the difference between the pre-assay cultivation conditions and the current environmental salt levels. 

      We would like to express our gratitude for the time and consideration you have dedicated to reviewing our manuscript.

      The authors propose that a synaptic-reversal plasticity mechanism at the primary sensory neuron, ASER, is critical for this memory- and context-dependent switching of preference. They build on prior findings regarding synaptic reversal between ASER and AIB, as well as the receptor composition of AIY neurons, to hypothesize that similar "plasticity" between ASER and AIY underpins salt preference behavior in klinotaxis. This plasticity differs conceptually from the classical one as it does not rely on any structural changes but rather synaptic transmission is modulated by the basal level of glutamate, and can switch from inhibitory to excitatory. 

      To test this hypothesis, the study employs a previously established neuroanatomically grounded model [4] and demonstrates that reversing the ASER-AIY synapse sign in the model agent reproduces the observed reversal in salt preference. The model is parameterized using a computational search technique (evolutionary algorithm) to optimize unknown electrophysiological parameters for chemotaxis performance. Experimental validity is ensured by incorporating constraints derived from published findings, confirming the plausibility of the proposed mechanism. 

      Finally, the circuit mechanism allowing C. elegans to switch behaviour to an exploration run when starved is also investigated. This extension highlights how internal states, such as hunger, can dynamically reshape sensory-motor programs to drive context-appropriate behaviors.

      We would like to thank the reviewer for the appropriate summary of our work. 

      Strengths and weaknesses: 

      The authors' approach of integrating prior knowledge of receptor composition and synaptic reversal with the repurposing of a published neuroanatomical model [4] is a significant strength. This methodology not only ensures biological plausibility but also leverages a solid, reproducible modeling foundation to explore and test novel hypotheses effectively.

      The evidence produced that the original model has been successfully reproduced is convincing.

      The writing of the manuscript needs revision as it makes comprehension difficult.  

      We would like to thank the reviewer for recognizing the usefulness of our approach. In the revised version, we improved the explanation according to your suggestions.  

      One major weakness is that the model does not incorporate key findings that have emerged since the original model's publication in 2013, limiting the support for the proposed mechanism. In particular, ablation studies indicate that AIY is not critical for chemotaxis, and other interneurons may play partially overlapping roles in positive versus negative chemotaxis. These findings challenge the centrality of AIY and suggest the model oversimplifies the circuit involved in klinotaxis.

      We would like to express our gratitude for the constructive feedback we have received. We concur with some of your assertions. In fact, our model is a minimal network for salt klinotaxis, which includes solely the interneurons that are connected to each other via the highest number of synaptic connections. It is important to note that our model does not consider redundant interneurons that exhibit overlapping roles. Consequently, the model is not applicable to the study of the impact of interneuron ablation. In Reference [1], the influence of interneuron ablations on the chemotaxis index (CI) was investigated. The experimentally determined CI value incorporates the contributions from both klinokinesis and klinotaxis. Consequently, it is plausible that the impact of AIY ablation was not significantly reflected in the CI value. The experimental observation does not necessarily diminish the role of AIY in klinotaxis.

      Reference [1] also shows that ASER neurons exhibit complex, memory- and context-dependent responses, which are not accounted for in the model and may have a significant impact on chemotactic model behaviour. 

      As the reviewer has noted, our model does not incorporate the context-dependent response of the ASER. Instead, the present study examined in detail the impact of the salt-concentration-dependent glutamate release from the ASER [S. Hiroki et al. Nat Commun 13, 2928 (2022)], which results from the ASER responses.

      The hypothesis of synaptic reversal between ASER and AIY is not explicitly modeled in terms of receptor-specific dynamics or glutamate basal levels. Instead, the ASER-to-AIY connection is predefined as inhibitory or excitatory in separate models. This approach limits the model's ability to test the full range of mechanisms hypothesized to drive behavioral switching.  

      We would like to express our gratitude to the reviewer for their constructive feedback. As you correctly noted, the hypothesized synaptic reversal between ASER and AIY is not explicitly modeled in terms of the sensitivity of the receptors in the AIY and the basal glutamate levels set by the ASER. In the present study, however, taking into account the substantial difference in sensitivity between the two glutamate receptors on the AIY, we sought to elucidate the impact of salt-concentration-dependent basal glutamate levels on klinotaxis. To this end, we comprehensively examined the full range of gradual change in the ASER-to-AIY connection from inhibitory to excitatory, as illustrated in Figures S4 and S5.
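
      To make the idea of a gradual sweep from inhibitory to excitatory concrete, the following is a minimal sketch, not the authors' model. It assumes a single leaky postsynaptic unit driven through a sigmoidal synapse, for which the steady state of tau*dy/dt = -y + w*sigmoid(y_pre + bias) is y = w*sigmoid(y_pre + bias). The function names, weight range, and presynaptic potential are all hypothetical.

```python
import math


def sigmoid(x):
    """Standard logistic function used as the synaptic output nonlinearity."""
    return 1.0 / (1.0 + math.exp(-x))


def postsynaptic_steady_state(weight, y_pre, bias_pre=0.0):
    """Steady state of tau*dy/dt = -y + weight*sigmoid(y_pre + bias_pre)."""
    return weight * sigmoid(y_pre + bias_pre)


def sweep_weights(w_min, w_max, n_points, y_pre):
    """Return (weight, steady-state potential) pairs across the weight range."""
    step = (w_max - w_min) / (n_points - 1)
    weights = [w_min + k * step for k in range(n_points)]
    return [(w, postsynaptic_steady_state(w, y_pre)) for w in weights]


# Negative weights hyperpolarize the postsynaptic unit; positive weights depolarize it.
sweep = sweep_weights(-2.0, 2.0, 5, y_pre=1.0)
```

      In this toy setting the postsynaptic response changes sign smoothly as the weight crosses zero, which is the qualitative behavior a gradual inhibitory-to-excitatory sweep is meant to probe.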

      While the main results - such as response dependence on step inputs at different phases of the oscillator - are consistent with those observed in chemotaxis models with explicit neural dynamics (e.g., Reference [2]), the lack of richer neural dynamics could overlook critical effects. For example, the authors highlight the influence of gap junctions on turning sensitivity but do not sufficiently analyze the underlying mechanisms driving these effects. The role of gap junctions in the model may be oversimplified because, as in the original model [4], the oscillator dynamics are not intrinsically generated by an oscillator circuit but are instead externally imposed via $z_\text{osc}$. This simplification should be carefully considered when interpreting the contributions of specific connections to network dynamics. Lastly, the complex and context-dependent responses of ASER [1] might interact with circuit dynamics in ways that are not captured by the current simplified implementation. These simplifications could limit the model's ability to account for the interplay between sensory encoding and motor responses in C. elegans chemotaxis.

      We may not have fully understood the substance of your comments. However, we understand that the oscillator dynamics are not intrinsically generated by an oscillator neural circuit explicitly incorporated into our model. The present study instead focuses on how the sensory input and the resulting interneuron dynamics regulate the oscillatory behavior of the SMB motor neurons to generate klinotaxis. The neuron dynamics via gap junctions result from the equilibration of the membrane potentials yi of the two neurons connected by gap junctions rather than from the zi. We added this explanation to the revised manuscript as follows.

      “The hyperpolarization signals in the AIZL are transmitted to the AIZR via the gap junction (Figs. S1d and S1f and Fig. 3d). This is because the neuron dynamics via gap junctions results from the equilibration of the membrane potential y<sub>i</sub> of two neurons connected by gap junctions rather than the z<sub>i</sub>.”
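
      The equilibration argument can be illustrated with a two-neuron toy simulation. This is a sketch under stated assumptions rather than the authors' implementation: it assumes leaky-integrator dynamics in which the gap-junction current is proportional to the difference in membrane potential, tau*dy_i/dt = -y_i + g*(y_j - y_i) + I_i, and all parameter values are arbitrary. The labels "left" and "right" stand in for AIZL and AIZR.

```python
def simulate_pair(g_gap, i_left=-1.0, tau=0.1, dt=0.01, steps=2000):
    """Euler-integrate two leaky neurons coupled by a gap junction.

    Only the left neuron receives a (hyperpolarizing) input current.
    Returns the final membrane potentials (y_left, y_right).
    """
    y_left, y_right = 0.0, 0.0
    for _ in range(steps):
        # The gap-junction current depends on the potential difference,
        # not on the neurons' synaptic outputs.
        gap_current = g_gap * (y_right - y_left)
        dy_left = (-y_left + gap_current + i_left) / tau
        dy_right = (-y_right - gap_current) / tau
        y_left += dt * dy_left
        y_right += dt * dy_right
    return y_left, y_right


coupled = simulate_pair(g_gap=1.0)    # hyperpolarization spreads to the right neuron
uncoupled = simulate_pair(g_gap=0.0)  # the right neuron stays at rest
```

      With coupling, the right neuron is pulled toward the hyperpolarized left neuron even though it receives no input of its own; without coupling, it stays at rest.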

      In the limitation, we added the following sentence:

      “In the present study, the oscillator components of the SMB are not intrinsically generated by an oscillator circuit but are instead externally imposed via 𝑧<sub>i</sub><sup>OSC</sup>. Furthermore, the complex and context-dependent responses of ASER {Luo:2014et} were not taken into consideration. It should be acknowledged as a limitation of this study that these omitted factors may interact with circuit dynamics in ways that are not captured by the current simplified implementation.”

      Appraisal: 

      The authors show that their model can reproduce memory-dependent reversal of preference in klinotaxis, demonstrating that the ASER-to-AIY synapse plays a key role in switching chemotactic preferences. By switching the ASER-AIY connection from excitatory to inhibitory they indeed show that salt preference reverses. They also show that the curving/turn rate underlying the preference change is gradual and depends on the weight between ASER-AIY. They further support their claim by showing that curving rates also depend on the cultivation concentration (set point).

      We would like to thank the reviewer for assessing our work.

      Thus within the constraints of the hypothesis and the framework, the model operates as expected and aligns with some experimental findings. However, significant omissions of key experimental evidence raise questions on whether the proposed neural mechanisms are sufficient for reversal in salt-preference chemotaxis.  

      We agree with your opinion. The present hypothesis should be verified by experiments.

      Previous work [1] has shown that individually ablating the AIZ or AIY interneurons has essentially no effect on the Chemotactic Index (CI) toward the set point ([1] Figure 6). Furthermore, in [1] the authors report that different postsynaptic neurons are required for movement above or below the set point. The manuscript should address how this evidence fits with their model by attempting similar ablations. It is possible that the CI is rescued by klinokinesis but this needs to be tested on an extension of this model to provide a more compelling argument.  

      We would like to express our gratitude for the constructive feedback we have received. In Reference [1], the influence of interneuron ablations on the chemotaxis index (CI) was investigated. It is important to acknowledge that the experimentally determined CI value encompasses the contributions of both klinokinesis and klinotaxis. It is plausible that the impact of AIY ablation was not reflected in the CI value. Consequently, these experimental observations do not necessarily diminish the role of AIY in klinotaxis. The neural circuit model employed in the present study constitutes a minimal network for salt klinotaxis, encompassing solely interneurons that are connected to each other via the highest number of synaptic connections. Anatomical evidence provided by the database (http://ims.dse.ibaraki.ac.jp/ccep-tool/) substantiates that ASE sensory neurons and AIZ interneurons, which have been demonstrated to play a crucial role in klinotaxis [Matsumoto et al., PNAS 121 (5) e2310735121], have a much higher number of synaptic connections with AIY interneurons. Our model does not take into account redundant interneurons with overlapping roles and is therefore not applicable to the study of the effects of interneuron ablation.

      The investigation of dispersal behaviour in starved individuals is rather limited to testing by imposing inhibition of the SMB neurons. Although a circuit is proposed for how hunger states modulate taxis in the absence of food, this circuit hypothesis is not explicitly modelled to test the theory or provide novel insights.  

      As the reviewer noted, the experimentally identified neural circuit that inhibits the SMB motor neurons in starved individuals is not incorporated in our model. Instead of incorporating this circuit explicitly, we examined whether our minimal network model could reproduce dispersal behavior under starvation conditions solely due to the experimentally demonstrated inhibitory effect of SMB motor neurons.

      Impact: 

      This research underscores the value of an embodied approach to understanding chemotaxis, addressing an important memory mechanism that enables adaptive behavior in the sensorimotor circuits supporting C. elegans chemotaxis. The principle of operation - the dependence of motor responses to sensory inputs on the phase of oscillation - appears to be a convergent solution to taxis. Similar mechanisms have been proposed in Drosophila larvae chemotaxis [2], zebrafish phototaxis [3], and other systems. Consequently, the proposed mechanism has broader implications for understanding how adaptive behaviors are embedded within sensorimotor systems and how experience shapes these circuits across species.

      We would like to express our gratitude for the useful suggestion. We added this argument to the Discussion of the revised manuscript as follows.

      “The principle of operation, in which motor responses to sensory inputs depend on the phase of motor oscillation, appears to be a convergent solution for taxis and navigation across species. In fact, analogous mechanisms have been postulated for chemotaxis in Drosophila larvae {Wystrach:2016bt} and phototaxis in zebrafish {Wolf:2017ei}. Consequently, the synaptic reversal mechanism highlighted in this study offers a framework for understanding how behaviors that are adaptive to the environment are embedded within sensorimotor systems and how experience shapes these neural circuits across species.”

      Although the reported reversal of synaptic connection from excitatory to inhibitory is an exciting phenomenon of broad interest, it is not entirely new, as the authors acknowledge similar reversals have been reported in ASER-to-AIB signaling for klinokinesis (Hiroki et al., 2022). The proposed reversal of the ASER-to-AIY synaptic connection from inhibitory to excitatory is a novel contribution in the specific context of klinotaxis. While the ASER's role in gradient sensing and memory encoding has been previously identified, the current paper mechanistically models these processes, introducing a hypothesis for synaptic plasticity as the basis for bidirectional salt preference in klinotaxis.

      The research also highlights how internal states, such as hunger, can dynamically reshape sensory-motor programs to drive context-appropriate behaviors.  

      The methodology of parameter search on a neural model of a connectome used here yielded the valuable insight that connectome information alone does not provide enough constraints to reproduce the neural circuits for behaviour. It demonstrates that additional neurophysiological constraints are required.  

      We would like to acknowledge the appropriate recognition of our work.

      Additional Context 

      Oscillators with stimulus-driven perturbations appear to be a convergent solution for taxis and navigation across species. Similar mechanisms have been studied in zebrafish phototaxis [3], Drosophila larvae chemotaxis [2], and have even been proposed to underlie search runs in ants. The modulation of taxis by context and memory is a ubiquitous requirement, with parallels across species. For example, Drosophila larvae modulate taxis based on current food availability and predicted rewards associated with odors, though the underlying mechanism remains elusive. The synaptic reversal mechanism highlighted in this study offers a compelling framework for understanding how taxis circuits integrate context-related memory retrieval more broadly.  

      We would like to express our gratitude for the insightful commentary. In the revised manuscript, we incorporated into the Discussion the argument that a similar oscillator mechanism with stimulus-driven perturbations has been observed for zebrafish phototaxis [3] and Drosophila larvae chemotaxis [2].

      As a side note, an interesting difference emerges when comparing C. elegans and Drosophila larvae chemotaxis. In Drosophila larvae, oscillatory mechanisms are hypothesized to underlie all chemotactic reorientations, ranging from large turns to smaller directional biases (weathervaning). By contrast, in C. elegans, weathervaning and pirouettes are treated as distinct strategies, often attributed to separate neural mechanisms. This raises the possibility that their motor execution could share a common oscillator-based framework. Re-examining their overlap might reveal deeper insights into the neural principles underlying these maneuvers. 

      We would like to acknowledge your thoughtfully articulated comment. As the reviewer pointed out, the anatomical database (http://ims.dse.ibaraki.ac.jp/ccep-tool/) shows that the neural circuits underlying weathervaning and pirouettes in C. elegans are predominantly distinct but exhibit partial overlap. When we restrict our search to the neurons that are connected to each other with the highest number of synaptic connections, we identify projections from the neural circuit of weathervaning to the circuit of pirouettes; however, we observed no projections in the reverse direction. This finding suggests that the neural circuit of weathervaning, namely our minimal neural network, is not likely to be affected by that of pirouettes, which consists of AIB interneurons and the interneurons and motor neurons downstream of them.

      (1) Luo, L., Wen, Q., Ren, J., Hendricks, M., Gershow, M., Qin, Y., Greenwood, J., Soucy, E.R., Klein, M., Smith-Parker, H.K., & Calvo, A.C. (2014). Dynamic encoding of perception, memory, and movement in a C. elegans chemotaxis circuit. Neuron, 82(5), 1115-1128. 

      (2) Antoine Wystrach, Konstantinos Lagogiannis, Barbara Webb (2016) Continuous lateral oscillations as a core mechanism for taxis in Drosophila larvae eLife 5:e15504. 

      (3) Wolf, S., Dubreuil, A.M., Bertoni, T. et al. Sensorimotor computation underlying phototaxis in zebrafish. Nat Commun 8, 651 (2017). 

      (4) Izquierdo, E.J. and Beer, R.D., 2013. Connecting a connectome to behavior: an ensemble of neuroanatomical models of C. elegans klinotaxis. PLoS computational biology, 9(2), p.e1002890. 

      Reviewer #2 (Public review): 

      Summary: 

      This study explores how a simple sensorimotor circuit in the nematode C. elegans enables it to navigate salt gradients based on past experiences. Using computational simulations and previously described neural connections, the study demonstrates how a single neuron, ASER, can change its signaling behavior in response to different salt conditions, with which the worm is able to "remember" prior environments and adjust its navigation toward "preferred" salinity accordingly.  

      We would like to express our gratitude for the time and consideration the reviewer has dedicated to reviewing our manuscript.

      Strengths: 

      The key novelty and strength of this paper is the explicit demonstration of computational neurobehavioral modeling and evolutionary algorithms to elucidate the synaptic plasticity in a minimal neural circuit that is sufficient to replicate memory-based chemotaxis. In particular, with changes in ASER's glutamate release and sensitivity of downstream neurons, the ASER neuron adjusts its output to be either excitatory or inhibitory depending on ambient salt concentration, enabling the worm to navigate toward or away from salt gradients based on prior exposure to salt concentration.

      We would like to thank the reviewer for appreciating our research. 

      Weaknesses: 

      While the model successfully replicates some behaviors observed in previous experiments, many key assumptions lack direct biological validation. As to the model output readouts, the model considers only endpoint behaviors (chemotaxis index) rather than the full dynamics of navigation, which limits its predictive power. Moreover, some results presented in the paper lack interpretation, and many descriptions in the main text are overly technical and require clearer definitions.  

      We would like to thank the reviewer for the constructive feedback. As the reviewer noted, the fundamental assumptions posited in the study have yet to be substantiated by biological validation, and consequently, these assumptions must be directly assessed by biological experimentation. The model performance for salt klinotaxis has been evaluated by multiple factors, including not only a chemotaxis index but also the curving rate vs. bearing (Fig. 4a, the bearing is defined in Fig. A3) and the curving rate vs. normal gradient (Fig. 4c). These two parameters work to characterize the trajectory during salt klinotaxis. In the revised version, we meticulously revised the manuscript according to the reviewer’s suggestions. We would like to express our sincere gratitude for your insightful review of our work.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      An interesting and engaging methodology combining theoretical and computational approaches. Overall I found the manuscript up to discussion a difficult read, and I would suggest revising it. I would also recommend introducing the general operating principle of the oscillator with sensory perturbations before jumping into the implementation details of signal propagation specific to C. elegans.

      In order to elucidate the relation between the general operating principle of the oscillator with sensory perturbations and the results shown by the two graphs from the bottom in Fig. 3d, the following statement was added on page 12.

      “It is remarkable that this regulatory mechanism derived via the optimization of the CI has been observed in the context of chemotaxis in Drosophila larvae chemotaxis {Wystrach:2016bt} and phototaxis in zebrafish {Wolf:2017ei}. The principle of operation, in which the dependence of motor responses to sensory inputs on the phase of motor oscillation, therefore, may serve as a convergent solution for taxis and navigation across species.”

      The abstract could benefit from a clarification of terms to benefit a broader audience:  The term "salt klinotaxis" is used without prior introduction or definition. It would be beneficial to briefly explain this term, as it may not be familiar to all readers. 

      Due to the limitation of the word number in the abstract, the explanation of salt klinotaxis could not be included.

      Although ASER is introduced as a right-side head sensory neuron, AIY neurons are not similarly introduced. It may also benefit to introduce here that ASER integrates memory with current salt gradients, tuning its output to produce context-appropriate behaviour.  

      Due to the word limit of the abstract, we could not add further explanation.

      "it can be anticipated that the ASER-AIY synaptic transmission will undergo a reversal due to alterations in the basal glutamate Release": Where is this expectation drawn from? Is it derived from biophysical or is it a functional expectation to explain the network's output constraints?  

      As delineated before this sentence, it is derived from a comprehensive consideration of the sensitivity of excitatory/inhibitory glutamate receptors expressed on the postsynaptic AIY interneurons, in conjunction with varying the basal level of glutamate transmission from ASER.

      The statement that the model "revealed the modular neural circuit function downstream of ASE" could be more explicit. What specific insights about the downstream circuit were uncovered? Highlighting one or two key findings would strengthen the impact.

      Due to the word limit of the abstract, no more details could be added here, but the sentence was revised to “revealed that the circuit downstream of ASE functions as a module that is responsible for salt klinotaxis.” This is because the salt-concentration-dependent behaviors in klinotaxis can be reproduced through the modulation of the ASER-AIY synaptic connections alone, with no alterations in the neural circuit downstream of AIY.

      I believe the authors should cite Luo et al. 2014, which also studies how chemotactic behaviours arise from neural circuit dynamics, including the dynamic encoding of salt concentration by ASER, and the crucial downstream interaction with AIY for chemotactic actions. 

      We would like to express our gratitude for the useful suggestion. We cited Luo et al. 2014 in the discussion of the limitations of our work.

      The introduction could also be improved for clarity. Specifically in the last paragraph authors should clarify how the observed synchrony of ASER excitation to the AIZ (Matsumoto et al., 2024), validates the resulting network.  

      We would like to express our gratitude for the useful suggestion. We added the following explanation to the last paragraph of the introduction.

      “Specifically, the synchrony of the excitation of the ASER and AIZ {Matsumoto:2024ig} taken together with the experimentally identified inhibitory synaptic transmission between the AIY and AIZ revealed that the ASER-AIY synaptic connections should be inhibitory, which was consistent with the network obtained from the most evolved model.”

      In addition, we added the following explanation after “It was then hypothesized that the ASER-AIY inhibitory synaptic connections are altered to become excitatory due to a decrease in the baseline release of glutamate from the ASER when individuals are cultured under C<sub>cult</sub> < C<sub>test</sub>.”

      This is due to the substantial difference in the sensitivity of excitatory/inhibitory glutamate receptors expressed on the postsynaptic AIY interneurons.

      I would also strongly recommend replacing the term "evolved model", with "Optimized Model" or "Best-Performing Model" to clarify this is a computational optimization process with limitations - optimization through GAs does not guarantee finding global optima.  

      We revised "evolved model" as "optimized model" in the main and SI text.

      The text overall would benefit from editing for clarity and expression.  

      According to the revisions mentioned above, we revised “best optimized model” as “most optimized model” in the main and SI text.

      The font size on the plot axis in Figures 3 c&d should be increased for readability on the printed page. Label the left/right panel to indicate unconstrained / constrained evolution.  

      As you noted, the font size of the subscript on the vertical axis in Figs 3c and 3d was too small. We have revised the font size of the subscript in Figs. 3c and 3d and also in Fig. 5e. At your suggestion, “unconstrained” and “constrained” have been added as labels to the left and right panels in Fig. 3.

      There is no input/transmission to AIYR to step input in either model shown in Figure 3? 

      As shown in Figs. S1e and S1f, the AIYR does receive transmission from both the ASEL and the ASER.

      Supplementary Figure 1 attempts to explain the interactions. There are inconsistent symbols used for inhibition and excitation between network schema (colours) and the z response plots (arrows vs circles), combined with different meanings for red/blue making it very confusing. 

      We could not address the inconsistency in the color of arrows and lines with an ending between Figs. S1c and S1d and Figs. S1a and S1b. On the other hand, Figs. S1e and S1f were revised so that the consistent symbols were used for inhibition, excitation, and electrical gap connections in Figs. S1c-S1f. The same revisions were made for Fig. S7c-S7f.

      Model parameters are given to 15 decimal precision, which seems excessive. Is model performance sensitive to that order? We would expect robustness around those values. The authors should identify relevant orders and truncate parameters accordingly. 

      We examined the influence of the parameter truncation on the trajectory and decided that the parameters with four decimal places were appropriate. According to this, we revised Table A4.
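
      The kind of check described here can be sketched as follows. This is a minimal illustration, not the authors' procedure: `toy_trajectory`, the parameter names, and the parameter values are hypothetical stand-ins for the actual worm model and for the entries of Table A4. It rounds each parameter to four decimal places and measures how far the resulting trajectory drifts from the full-precision one.

```python
import math


def round_params(params, decimals=4):
    """Return a copy of the parameter dict rounded to `decimals` places."""
    return {name: round(value, decimals) for name, value in params.items()}


def toy_trajectory(params, steps=200, dt=0.1):
    """Hypothetical stand-in for the worm model: an oscillating heading
    integrated into an (x, y) path."""
    x = y = heading = 0.0
    points = []
    for i in range(steps):
        t = i * dt
        heading += dt * params["amplitude"] * math.sin(params["omega"] * t + params["phase"])
        x += dt * params["speed"] * math.cos(heading)
        y += dt * params["speed"] * math.sin(heading)
        points.append((x, y))
    return points


def max_deviation(traj_a, traj_b):
    """Largest point-wise distance between two trajectories of equal length."""
    return max(math.dist(p, q) for p, q in zip(traj_a, traj_b))


# Illustrative values only; they are not taken from the manuscript.
full = {
    "amplitude": 1.234567890123456,
    "omega": 2.718281828459045,
    "phase": 0.333333333333333,
    "speed": 0.022000000000001,
}
rounded = round_params(full, decimals=4)
deviation = max_deviation(toy_trajectory(full), toy_trajectory(rounded))
print(f"max deviation after rounding to 4 decimals: {deviation:.2e}")
```

      Repeating the comparison for several values of `decimals` would show the precision at which the trajectory stops changing appreciably.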

      Figure 3 caption typo "step changes I the salt concentration".  

      The typo was revised in Fig. 3 caption. 

      Reviewer #2 (Recommendations for the authors): 

      (1) Overall, the language of the paper is not properly organized, making the paper's logic and purpose hard to follow. In the Results Section, many observations or findings lack explicit interpretation. To address this issue, the authors should consider (1) adopting the context-content-conclusion scheme, (2) optimizing the logic flow by clearly identifying the context and goals prior to discussing their results and findings, (3) more explicitly interpreting their results, especially in a biological context.

      We would like to express our gratitude for the helpful suggestions. We revised the main and SI texts according to the suggestions listed below.

      (2) In Figure 2, trajectories from the model with AIY-AIZ constraints show a faster convergence than those from the constraint-free model. However, in the corresponding texts in the Results section, the authors claimed no significant difference. It seems that the authors made this argument only based on CI (Chemotaxis Index). Therefore, in order to address such inconsistency, the authors need more explanation on why only relying on CI, which is an endpoint metric, instead of the whole navigation.  

      I would like to thank you for the helpful comment. In the present study, not only the CI but also the curving rate shown in Fig. 4 were applied to characterize the behavior in klinotaxis.

      According to your comments, we revised the related description in the main text as follows:

      “The difference between these CI values is slight, while the model optimized with the constraints exhibits a marginally accelerated attainment of the salt concentration peak, as shown by the trajectories. The slightly higher chemotaxis performance observed in the constrained model is not essentially attributed to the introduction of the AIY-AIZ synaptic constraints but rather depends on the specific individuals selected from the optimized individuals obtained from the evolutionary algorithm. In fact, even when the AIY-AIZ constraints are taken into consideration, the model retains a significant degree of freedom to reproduce salt klinotaxis due to the presence of a substantial parameter space. Consequently, the impact of the AIY-AIZ constraints on the optimization of the CI is expected to be negligible.”

      (3) In Figures 3a and b, some inter-neuron connections are relatively weak (e.g., AIYR to AIZR in Figure 3a) - thus it is unclear whether the polarity of such synapses would significantly influence the behavioral outcome or not. The authors could consider plotting the change of the connection strengths between neurons over the course of model optimization to get a sense of confidence in each inter-neuron connection. 

      In the evolutionary algorithm, the parameters of individuals vary discontinuously under the influence of selection, crossover, and mutation. Because this variation is not systematic, it is not straightforward to extract information about parameter optimization from the changes in the parameters.
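
      The point about discontinuous parameter variation can be illustrated with a generic evolutionary algorithm. The sketch below is an assumption-laden toy, not the algorithm used in the study: the fitness function, population size, selection scheme, and mutation settings are all hypothetical. It records the parameters of the best individual in each generation, which tend to jump whenever a different individual takes the lead rather than drifting smoothly.

```python
import random


def fitness(weights):
    # Toy objective (hypothetical): best when every weight has magnitude 1,
    # with either sign, so several distinct optima are equally good.
    return -sum((abs(w) - 1.0) ** 2 for w in weights)


def evolve(pop_size=30, n_weights=3, generations=40, seed=0):
    """Return the best individual's weights for each generation."""
    rng = random.Random(seed)
    population = [
        [rng.uniform(-2.0, 2.0) for _ in range(n_weights)] for _ in range(pop_size)
    ]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(list(population[0]))
        # Truncation selection: the better half survives unchanged.
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Uniform crossover: each weight is copied from one of the two parents.
            child = [m if rng.random() < 0.5 else f for m, f in zip(mother, father)]
            # Mutation: each weight is perturbed with probability 0.2.
            child = [w + rng.gauss(0.0, 0.3) if rng.random() < 0.2 else w for w in child]
            children.append(child)
        population = parents + children
    return best_per_generation


history = evolve()
for generation in (0, 1, 2, 5, 10, 39):
    print(generation, [round(w, 3) for w in history[generation]])
```

      Because the better half of the population is carried over unchanged, the best fitness never decreases, even though the underlying weights can change abruptly between generations.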

      (4) In Figure 3, the order of individual figure panels is incorrect: in the main text, Figure 3 a and b were mentioned after c and d. Also, the caption of Figure 3c "negative step changes I the" should be "in".  

      The main text underwent revision, with the description of Figures 3a and 3b being presented prior to that of Figures 3c and 3d. The typo was revised.

      (5) In Figure 4, the order of individual figure panels is messed up: in the main text, Figure 4 a was mentioned after b.  

      The main text underwent revision, with the description of Figure 4a being presented prior to that of Figure 4b.

      (6) Also in Figure 4, the authors need to provide a definition/explanation of "Bearing" and "Translational Gradient". In Figure 4d, the definition of positive and negative components is not clear.  

      Normal and Translational Salt Concentration Gradient in METHOD was referenced for the definition and explanation of the bearing and the translational gradient. We added the following explanation on the positive and negative components.

      “The positive and negative components of the curving rate are respectively sampled from the trajectory during leftward turns (as illustrated in Fig. 4b) and rightward turns, respectively.”

      (7) Figure 5: the authors need to explain why c has an error bar and how they were calculated, as this result is from a computational model. Figure 5d is experimental results - the authors need to add error bars to the data points and provide a sample size. 

      As explained in Analysis of the Salt Preference Behavior in Klinotaxis in METHOD, the ensemble average of these quantities was determined by performing 100,000 sets of the simulation with randomized initial orientation for a simulation time of T_sim=200 sec. The error bars for the experimental data were added in Figs. 5c, 6a, and S9a.
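
      A minimal sketch of how an ensemble average and an error bar could be obtained from repeated simulations with randomized initial orientation is given below. It rests on several assumptions: `simulate_once` is a hypothetical placeholder for one run of the model, the readout is an arbitrary scalar, and the error bar is taken to be the standard error of the mean, which the response does not specify.

```python
import math
import random
import statistics


def simulate_once(initial_orientation, rng):
    """Hypothetical placeholder for one simulation run; returns a scalar readout
    that depends weakly on the starting orientation plus run-to-run noise."""
    return 0.5 + 0.1 * math.cos(initial_orientation) + rng.gauss(0.0, 0.05)


def ensemble_mean_sem(n_runs, seed=0):
    """Average the readout over runs with uniformly random initial orientation
    and return (mean, standard error of the mean)."""
    rng = random.Random(seed)
    values = [
        simulate_once(rng.uniform(0.0, 2.0 * math.pi), rng) for _ in range(n_runs)
    ]
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n_runs)
    return mean, sem


mean, sem = ensemble_mean_sem(n_runs=2000)
print(f"ensemble mean = {mean:.4f}, SEM = {sem:.4f}")
```

      With the 100,000 runs mentioned above, an error bar of this kind becomes very small, since it shrinks with the square root of the number of runs.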

      (8) On Page 14, the authors said, "To this end, we used the best evolved network with the constraints, in which we varied the synaptic connections between ASER and AIY from inhibitory to excitatory." How did the model change the ASER-AIY signaling specifically? The authors should provide more explanation or at least refer to the Methods Section.

      We now refer to the caption of Fig. S4 for a detailed explanation of the method.

      (9) Page 15: "a subset exhibited a slight curve...". This observation from the model simulation is contradictory to experiments. However, their explanation of that is hard to understand.

      I would like to thank you for the helpful comment. To improve this, we added the following explanation:

      “In the case of step increases in 𝑧OFF as illustrated in the second right panel from the bottom in Fig.3d, the turning angle φ is increased from its ideal oscillatory component to a value close to zero, causing the model worm to deviate from the ideal sinusoidal trajectory and gradually turn toward lower salt concentrations. On the other hand, in the case of step increases in 𝑧ON as illustrated in the second left panel from the bottom in Fig.3d, the turning angle φ is again increased from its ideal oscillatory component to a value close to zero, causing the model worm to deviate from the ideal sinusoidal trajectory and gradually turn toward higher salt concentrations. The behaviors that are consistent with these analyses are observed in the trajectory illustrated in Fig. S8b.”

      (10) Last result section: inhibited SMB in starved worms is due to a mechanism unrelated to their neural network model upstream to SMB. Therefore, their results recapitulating the worms' dispersal behaviors cannot strengthen the validity of their model.

      We agree with your opinion. We think that the findings from the study of starved worms do not provide evidence to validate the neural network model upstream of SMB.   

      (11) Discussion: "in contrast, the remaining neurons...". This argument lacks evidence or references.  

      This argument is based on the results obtained from the present study. This sentence was revised as follows:

      “This regulatory process enables the reproduction of salt concentration memory-dependent reversal of preference behavior in klinotaxis, despite the remaining neurons further downstream of the ASER not undergoing alterations and simply functioning as a modular circuit to transmit the received signals to the motor systems. Consequently, the sensorimotor circuit allows a simple and efficient bidirectional regulation of salt preference behavior in klinotaxis.”

      (12) To increase the predictive power of their model, can the authors perform simulations on mutant worms, like those with altered glutamate basal level expression in ASER?  

      We would like to express our gratitude for the useful suggestion. The simulations in which the weight of the ASER-AIY synaptic connection is increased from negative (inhibitory connection) to positive (excitatory connection), as illustrated in Figure S4, provide valuable insights into the relationship between the basal level of glutamate release from ASER and behavior in klinotaxis, as measured for example by the chemotaxis index.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the present study, Chen et al. investigate the role of Endophilin A1 in regulating GABAergic synapse formation and function. To this end, the authors use constitutive or conditional knockout of Endophilin A1 (EEN1) to assess the consequences on GABAergic synapse composition and function, as well as the outcome for PTZ-induced seizure susceptibility. The authors show that EEN1 KO mice show a higher susceptibility to PTZ-induced seizures, accompanied by a reduction in the GABAergic synaptic scaffolding protein gephyrin as well as specific GABAAR subunits and eIPSCs. The authors then investigate the underlying mechanisms, demonstrating that Endophilin A1 binds directly to gephyrin and GABAAR subunits, and identifying the subdomains of Endophilin A1 that contribute to this effect. Overall, the authors state that their study places Endophilin A1 as a new regulator of GABAergic synapse function.

      Strengths:

      Overall, the topic of this manuscript is very timely, since there has been substantial recent interest in describing the mechanisms governing inhibitory synaptic transmission at GABAergic synapses. The study will therefore be of interest to a wide audience of neuroscientists studying synaptic transmission and its role in disease. The manuscript is well-written and contains a substantial quantity of data.

      Weaknesses:

      A number of questions remain to be answered in order to be able to fully evaluate the quality and conclusions of the study. In particular, a key concern throughout the manuscript regards the way that the number of samples for statistical analysis is defined, which may affect the validity of the data analysed. Addressing this weakness will be essential to providing conclusive results that support the authors' claims.

      We would like to thank the reviewer for appreciating the value of our study and for the careful critique, which helps us improve the manuscript. We will correct the way that the number of samples for statistical analysis is defined throughout the manuscript as suggested and update figures, figure legends, and Materials and Methods accordingly. For example, we will average the values for all dendritic segments from one neuron, so that each data point represents one neuron in the graphs.

      Reviewer #2 (Public review):

      Summary:

      The function of neural circuits relies heavily on the balance of excitatory and inhibitory inputs. Particularly, inhibitory inputs are understudied when compared to their excitatory counterparts due to the diversity of inhibitory neurons, their synaptic molecular heterogeneity, and their elusive signature. Thus, insights into these aspects of inhibitory inputs can inform us largely on the functions of neural circuits and the brain.

      Endophilin A1, an endocytic protein heavily expressed in neurons, has been implicated in numerous pre- and postsynaptic functions, however largely at excitatory synapses. Thus, whether this crucial protein plays any role in inhibitory synapse, and whether this regulates functions at the synaptic, circuit, or brain level remains to be determined.

      New Findings:

      (1) Endophilin A1 interacts with the postsynaptic scaffolding protein gephyrin at inhibitory postsynaptic densities within excitatory neurons.

      (2) Endophilin A1 promotes the organization of the inhibitory postsynaptic density and the subsequent recruitment/stabilization of GABA A receptors via Endophilin A1's membrane binding and actin polymerization activities.

      (3) Loss of Endophilin A1 in CA1 mouse hippocampal pyramidal neurons weakens inhibitory input and leads to susceptibility to epilepsy.

      (4) Thus the authors propose that via its role as a component of the inhibitory postsynaptic density within excitatory neurons, Endophilin A1 supports the organization, stability, and efficacy of inhibitory input to maintain the excitatory/inhibitory balance critical for brain function.

      (5) The conclusion of the manuscript is well supported by the data but will be strengthened by addressing our list of concerns and experiment suggestions.

      We would like to thank the reviewer for their favorable impression of the manuscript. We also appreciate the great experiment suggestions to help us improve the manuscript.

      Weaknesses:

      Technical concerns:

      (1) Figure 1F and Figure 1H, Figures 7H,J:

      Can the authors justify using a paired-pulse interval of 50 ms for eEPSCs and an interval of 200 ms for eIPSCs? Otherwise, experiments should be repeated using the same paired pulse interval.

      We apologize for the confusion. As illustrated by the schematic current traces, the decay time constants of eEPSCs and eIPSCs in hippocampal CA1 neurons are different. The eEPSCs exhibit a faster channel closing rate, corresponding to a smaller time constant Tau. Thus, a shorter inter-stimulus interval (50 ms) was chosen for paired-pulse ratio recordings. In contrast, the eIPSCs display a slower channel closing rate, with a Tau value larger than that of eEPSCs, so a longer inter-stimulus interval (200 ms) was used for PPR. This protocol has been long-established and adopted in previous studies (please see below for examples).

      Contractor, A., Swanson, G. & Heinemann, S. F. Kainate receptors are involved in short- and long-term plasticity at mossy fiber synapses in the hippocampus. Neuron 29, 209-216, doi:10.1016/s0896-6273(01)00191-x (2001).

      Babiec, W. E., Jami, S. A., Guglietta, R., Chen, P. B. & O'Dell, T. J. Differential Regulation of NMDA Receptor-Mediated Transmission by SK Channels Underlies Dorsal-Ventral Differences in Dynamics of Schaffer Collateral Synaptic Function. Journal of neuroscience 37, 1950-1964, doi:10.1523/JNEUROSCI.3196-16.2017 (2017).
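
      For readers less familiar with the measurement, the paired-pulse ratio is conventionally the amplitude of the second evoked response divided by the amplitude of the first. The sketch below pairs that calculation with the two inter-stimulus intervals stated in this response; the function name, the dictionary, and the example amplitudes are illustrative assumptions, not the authors' analysis code or data.

```python
# Inter-stimulus intervals stated in the response, in milliseconds.
PPR_INTERVAL_MS = {"eEPSC": 50, "eIPSC": 200}


def paired_pulse_ratio(first_amplitude, second_amplitude):
    """Return |second| / |first| for a pair of evoked current amplitudes.

    Absolute values are used so that inward (negative) and outward (positive)
    currents are handled in the same way.
    """
    if first_amplitude == 0:
        raise ValueError("first response amplitude must be non-zero")
    return abs(second_amplitude) / abs(first_amplitude)


# Illustrative amplitudes in pA; these are not measured values.
print("eEPSC interval (ms):", PPR_INTERVAL_MS["eEPSC"])
print("example PPR:", paired_pulse_ratio(-100.0, -150.0))
```

      A ratio above 1 indicates paired-pulse facilitation and a ratio below 1 indicates paired-pulse depression.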

      (2) Figures 3G,H,I:

      While 3D representations of proteins of interest bolster claims made by superresolution microscopy, SIM resolution is unreliable when deciphering the localization of proteins at the subsynaptic level given the small size of these structures (<1 micrometer). In order to determine the actual location of Endophilin A1, especially given the known presynaptic localization of this protein, the authors should complete SIM experiments with a presynaptic marker, perhaps an active zone protein, so that the relative localization of Endophilin A1 can be gleaned. Currently, overlapping signals could stem from the presynapse given the poor resolution of SIM in this context.

      Thanks for your suggestions. It is certainly preferable to investigate the relative localization of endophilin A1 using both presynaptic and postsynaptic markers. For SIM imaging in Figure 3G-I, to visualize neuronal morphology, we immunostained GFP as cell fill, leaving two other channels for detection of immunofluorescent signals of endophilin A1 and another protein. We will try co-immunostaining of endophilin A1, the active zone protein bassoon (presynaptic marker) and gephyrin without morphology labeling. Alternatively, we will do co-staining of endophilin A1 and bassoon in GFP-expressing neurons. We agree that overlapping signals or proximal localization of presynaptic endophilin A1 with gephyrin or GABA<sub>A</sub>R γ2 could not be ruled out. To note, if image resolution is improved with the use of a more advanced imaging system, the overlap between two proteins will become smaller or even disappear. With the ~110 nm lateral resolution of SIM microscopy, the degree of overlap between the two proteins of interest is much lower than in confocal microscopy. Given the presynaptic localization of endophilin, most likely we will observe a small overlap (presynaptic) or proximal localization (postsynaptic) of endophilin A1 with bassoon. Nevertheless, we will complete the SIM experiments as suggested to improve the manuscript.

      Manuscript consistency:

      (1) Figure 2:

      The authors looked at VGAT and noticed a reduction of signals in hippocampal regions in their P21 slices, indicating that the proposed postsynaptic organization/stabilization functions of Endophilin A1 extend to the inhibitory presynapse, perhaps via Neuroligin 2-Neurexin. Simultaneously, hippocampal regions in P21 slices showed a reduction in PSD-95 signals, indicating that excitatory synapses are also affected. It would be crucial to also look at excitatory presynapses, via VGLUT staining, to assess whether EndoA1 -/- also affects presynapses. Given the extensive roles of Endophilin A1 in presynapses, especially in excitatory presynapses, this should be investigated.

      Thanks for the thoughtful comments. Given that both VGAT and PSD95 signals are reduced in hippocampal regions in P21 slices, it is conceivable that the proposed postsynaptic organization/stabilization functions of endophilin A1 extend to the inhibitory presynapse via Neuroligin-2-Neurexin and to the excitatory presynapse as well during development. Of note, endophilin A1 knockout did not impair the distribution of Neuroligin-2 in inhibitory postsynapses (immunoisolated with anti-GABA<sub>A</sub>R α1) in mature mice (Figure 3K), and endophilin A1 did not bind to Neuroligin-2 (Figure 4D), suggesting that endophilin A1 might function via other mechanisms. Nevertheless, as functions of endophilin A family members at the presynaptic site are well-established, the reduction of presynaptic signals in developing hippocampal regions of EndoA1<sup>-/-</sup> mice might result from the depletion of presynaptic endophilin A1. The presynaptic deficits may be compensated by other mechanisms as neurons mature. Certainly, we will do VGLUT staining of EndoA1<sup>-/-</sup> brain slices as suggested to assess the role of endophilin A1 in excitatory presynapses in vivo.

      (2) Figure 7C:

      The authors do not assess whether p140Cap overexpression rescues the GABAAR loss exhibited in Endophilin A1 KO, as they did for gephyrin. This would be an important data point to show, as p140Cap may somehow rescue receptor loss by another pathway. In fact, it is mentioned in the text that this experiment was done, "Consistently, neither p140Cap nor the endophilin A1 loss-of-function mutants could rescue the GABAAR clustering phenotype in EEN1 KO neurons (Figure 7C, D)" yet the data for p140Cap overexpression seem to be missing. This should be remedied.

      Thanks a lot for the thoughtful comment. We will determine whether p140Cap overexpression also rescues the GABA<sub>A</sub>R clustering phenotype in EndoA1<sup>-/-</sup> neurons by surface GABA<sub>A</sub>R γ2 staining in our revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      Chen et al. identify endophilin A1 as a novel component of the inhibitory postsynaptic scaffold. Their data show impaired evoked inhibitory synaptic transmission in CA1 neurons of mice lacking endophilin A1, and an increased susceptibility to seizures. Endophilin can interact with the postsynaptic scaffold protein gephyrin and promote assembly of the inhibitory postsynaptic element. Endophilin A1 is known to play a role in presynaptic terminals and in dendritic spines, but a role for endophilin A1 at inhibitory postsynaptic densities has not yet been described.

      Strengths:

      The authors used a broad array of experimental approaches to investigate this, including tests of seizure susceptibility, electrophysiology, biochemistry, neuronal culture, and image analysis.

      Weaknesses:

      Many results are difficult to interpret, and the data quality is not always convincing, unfortunately. The basic premise of the study, that gephyrin and endophilin A1 interact, requires a more robust analysis to be convincing.

      We greatly appreciate the positive comment on our study and the very valuable feedback to help us improve the manuscript. We will conduct additional experiments to improve our data quality and strengthen our evidence according to these constructive suggestions. To obtain strong evidence for the interaction between endophilin A1 and gephyrin, we will perform an in vitro pull-down assay with recombinant proteins from a bacterial expression system.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) For all of the electrophysiology experiments, only the number of neurons recorded is stated, but not the number of independent animals that these neurons were obtained from. The number of independent animals used should be stated for each panel. At least 3 independent animals should be used in each group, otherwise, more data needs to be added.

      We apologize for missing the information in the original manuscript. For all electrophysiological experiments, data were obtained from more than 3 experimental animals. The figure legends were updated to include the number of independent animals used for each panel.

      (2) For the cell culture experiments analyzing dendritic puncta at GABAergic synapses, the number of data points analysed appears to be the number of dendritic segments quantified, regardless of whether they originate from the same neuron or not. This analysis method is not valid, since dendritic segments from the same neuron cannot be counted as statistically independent samples. The authors need to average the values for all dendritic segments from one neuron, such that one neuron equals one data point. This alteration should be made for Figures 2B, 2D, 4H, 4J, 5B, 5C, 5E, 5J, 5L, 6B, 6D, 6F, 6H, 6J, 6K, 7B, and 7D. In addition, the number of independent cultures from which the neurons were obtained should be stated for each panel. At least 3 independent cultures should be used in each group, otherwise, more data need to be added.

      Thanks for the criticism. We reanalyzed the data throughout the manuscript as suggested and updated the figure legends accordingly. Moreover, we increased the number of neurons from independent experiments to further confirm the results in our revised manuscript.

      In the revised manuscript, we averaged the values for all dendritic segments from a single neuron and updated the data in Figures 3B, 3D, 4H, 4J, 5B, 5C, 5E, 5K, 5M, 6B, 6D, 6F, 6H, 6J, 6K, 7B, and 7D.
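      The per-neuron averaging described above can be illustrated with a minimal sketch. It assumes each measurement is stored as a (neuron identifier, value) pair; the function name, identifiers, and numbers are hypothetical and are not taken from the manuscript or its analysis scripts.

```python
from collections import defaultdict
from statistics import mean


def average_segments_per_neuron(segments):
    """Collapse per-segment measurements to one value per neuron.

    `segments` is an iterable of (neuron_id, value) pairs, one pair per
    dendritic segment. Returns {neuron_id: mean of that neuron's segments}.
    """
    by_neuron = defaultdict(list)
    for neuron_id, value in segments:
        by_neuron[neuron_id].append(value)
    return {neuron_id: mean(values) for neuron_id, values in by_neuron.items()}


# Hypothetical puncta densities for six segments taken from three neurons.
example_segments = [
    ("neuron_1", 4.0), ("neuron_1", 6.0),
    ("neuron_2", 3.0), ("neuron_2", 5.0), ("neuron_2", 7.0),
    ("neuron_3", 8.0),
]
per_neuron = average_segments_per_neuron(example_segments)
# The sample size for statistics is now the number of neurons (3),
# not the number of segments (6).
```

      With this layout, the values passed to a statistical test are the entries of `per_neuron`, so segments from the same neuron are no longer treated as independent samples.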

      Neurons analyzed in each group were derived from at least 3 independent cultures. Due to the very low efficiency of sparse transfection in primary cultured hippocampal neurons, multiple experimental repetitions were necessary to obtain a sufficient number of neurons for analysis. We described the statistical analysis in the “Material and Methods” section of the original manuscript as follows:

      “For all biochemical, cell biological and electrophysiological recordings, at least three independent experiments were performed (independent cultures, transfections or different mice).”

      (3) Individual data points should be shown on all graphs, particularly in Figures 2C, 2F, 2I, 3F, 3K, and 3L.

      Thank you for the suggestion. We replaced the original graphs with scatterplots showing mean ± S.E.M. in the new figures.

      (4) For each experiment, the authors should state explicitly in the methods section whether that experiment was conducted blind to genotype.

      Thank you for the suggestion. In our revised manuscript, we have revised the description of blinded analysis for each experiment in the Methods section, for example: “Seizure susceptibility was measured blindly by rating seizures on a scale of 0 to 7 as follows…” and “Quantification of immunostaining were carried out blindly…”.

      (5) For each experiment, the authors should state whether they used male or female mice, and what age the mice were at the time of the experiment.

      Thanks a lot for the suggestion. We usually use both male and female mice for neuron culture and behavioral tests. We observed no sex-related differences in PTZ-induced behaviors, so the results were pooled.

      Regarding the ages of the mice, P0 pups were used for hippocampal neuron cultures and for virus injection in electrophysiological recording assays or FingR probe assays. P14-21 mice were used for electrophysiological recording, immunofluorescent staining and FingR probe detection in brain slices, while adult mice (P60) were used for behavioral tests, immunofluorescent staining in brain slices and biochemical assays. In our revised manuscript, we have revised the description of the sex and age of the mice in the Methods section to “To evaluate seizure susceptibility, 8-10-week-old male and female EndoA1<sup>+/+</sup> or EndoA1<sup>-/-</sup> littermates or EndoA1<sup>fl/fl</sup> littermates were intraperitoneally administered…”, “For virus injection, 8-9-week-old naive male and female littermates were anesthetized…”, “Male and female littermates (P21 or P60) were anesthetized and immediately perfused…”, “Hippocampi of female or male pups (P0) were rapidly dissected under sterile conditions…”, “PSD fractions from adult mouse brain were prepared as previously described…”, “Newborn EndoA1<sup>fl/fl</sup> littermates (male or female) were anesthetized on ice for 4-5 min…”.

      (6) For each experiment involving WT and KO mice, please state whether WTs and KOs were bred as littermates from heterozygous breeders.

      Sorry for the confusion. In our study, EndoA1<sup>+/+</sup> and EndoA1<sup>-/-</sup> mice were bred as littermates from heterozygous breeders. We added this information to the Methods section of our revised manuscript as follows: “EndoA1<sup>+/+</sup> and EndoA1<sup>-/-</sup> mice were bred as littermates from heterozygous breeders…”, “To evaluate seizure susceptibility, 8-10-week-old male and female EndoA1<sup>+/+</sup> or EndoA1<sup>-/-</sup> littermates or EndoA1<sup>fl/fl</sup> littermates…”, “For virus injection, 8-9-week-old naive male and female littermates were anesthetized…”, “Male and female littermates (P21 or P60) were anesthetized and immediately perfused…”, “For co-IP from brain lysates, the whole brain from 8-10-week-old WT and KO littermates were dissected…”, “Newborn EndoA1<sup>fl/fl</sup> littermates (male or female) were anesthetized on ice for 4-5 min…”.

      (7) For experiments comparing three or more groups, the authors claim in the methods section to have used a one-way ANOVA for statistical analysis. However, no ANOVA values are given, only the post-hoc tests. Please add the ANOVA values for each experiment before stating the values of the post-hoc analysis.

      Sorry for the missing information. We used one-way ANOVA for comparing three or more groups in the original manuscript and have changed to two-way ANOVA for behavioral data analysis in our revised manuscript, as suggested in Recommendation (18). We added the ANOVA values (F and p values) for each experiment in the new figures. For example, see Figure 1C.

      (8) In Figure 1A-C, seizure susceptibility was compared in EEN+/+ and EEN-/- mice, but the methods section states that seizure susceptibility was evaluated in 8-10-week-old male C57BL/6N mice (line 513). Was this meant to indicate that the EEN+/+ and EEN-/- mice were on a C57BL/6N background? How does this match with the statement that EEN1 -/- mice were generated on a C57BL/6J background (line 467)?

      We apologize for the mistake. In our study, EEN1<sup>-/-</sup> mice were generated on a C57BL/6J background, as stated in our previously published papers (Yang et al., 2021; Yang et al., 2018) and in “Animals” in Material and Methods of our original manuscript. We have corrected the statement to “To evaluate seizure susceptibility, 8-10-week-old male and female EndoA1<sup>+/+</sup> or EndoA1<sup>-/-</sup> littermates…” in Material and Methods of the revised manuscript.

      (9) In the electrophysiology experiments in Figure 1E-O, it is not clear to me which neurons were recorded in the control group. The methods section states that "Whole-cell recordings were performed on an AAV-infected neuron and a neighboring uninfected neuron" (line 736). However, the figure legends states that recordings were obtained from "10 control (Ctrl, mCherry alone) and 10 EEN1 KO (mCherry and Cre) pyramidal neurons" (line 1079), which would indicate that the controls are not uninfected neurons from the same animal, but AAV-mCherry infected neurons from a different animal. Please clarify which of the two descriptions is accurate.

      Thanks for catching the error! In all electrophysiological experiments, a neighboring uninfected neuron was used as the control in Figure 1E-O. This was incorrectly stated in the figure legend of the original manuscript. In the revised manuscript, the information has been corrected in figure legends of new Figure 1 (E-F).

      (10) The authors show that in Endophilin A1 KO animals, eIPSCs are reduced, but mIPSC frequency and amplitude are unaltered. How do they explain this finding in the context of the fact that gephyrin and GABAAR1.

      We apologize for the confusion about the electrophysiological recording data. eIPSCs are recorded in response to electrically evoked action potentials that elicit a substantial release of neurotransmitter. In contrast, mIPSCs are small, spontaneous currents recorded in the presence of TTX during patch-clamp experiments, resulting from the release of neurotransmitter from presynaptic terminals in the absence of action potentials. The amplitude of mIPSCs typically reflects the quantal release of neurotransmitters, while their frequency can vary depending on synaptic activity and the state of the neuron.

      A number of molecules fine-tune presynaptic neurotransmitter release and the functions of inhibitory postsynaptic receptors. In our study, inhibitory postsynapses were partially affected in endophilin A1 knockout neurons, while presynaptic endophilin A1 remained intact during electrophysiological recordings. Conceivably, the observed deficits in endophilin A1 knockout mice were mild. Following endophilin A1 depletion, inhibitory postsynaptic receptors appeared sufficient to respond to spontaneous neurotransmitter release but may be inadequate to respond to the large amounts of neurotransmitter released upon action potentials. Meanwhile, spontaneous synaptic activity and the state of the neuron were not obviously affected under basal conditions by endophilin A1 depletion during postnatal stages. Consequently, mIPSC frequency and amplitude remained unaltered, but eIPSCs were reduced compared to the control neurons. This finding is consistent with the behavioral experiments, where aggressive epileptic behaviors were induced by PTZ, rather than arising as spontaneous epilepsy, in endophilin A1 knockout mice.

      (11) Distribution of gephyrin, VGAT, and GABAARg2 differs substantially between the different layers of hippocampal area CA1, and the same goes for the other regions of the hippocampus. However, in Figure 2, it is not clear to me from the sample images which layers of each subregion the authors quantified, or indeed whether they paid attention to which layers they included in their analysis. This can lead to a substantial skewing of the data if different layers were preferentially included in the two genotypes. Please clarify which layers were analysed, and how comparability between WTs and KOs was ensured. This is particularly important given the authors' claim that Endophilin A1 acts equally at all subtypes of GABAergic synapses (lines 373- 376).

      Thanks for the cautiousness! We distinguished each hippocampal subregion based on the anatomical structure in brain slices. Quantification of fluorescent mean intensity of each synaptic protein in all layers of each subregion, as shown in new Figure 2 and Figure S2A-F, revealed that GABAergic synaptic proteins were impaired in both P21 and P60 KO mice.

      We further analyzed the fluorescent signal of the core postsynaptic component, gephyrin, in individual layers of each subregion in the hippocampus of mature WT and KO mice, as presented in new Figures S2G-H. Our findings demonstrated a decrease in gephyrin levels across all layers of each subregion in KO mice. Additionally, we examined gephyrin clustering across the soma, axon initial segment (AIS), and dendrites in cultured mature endophilin A1 knockout hippocampal neurons, as shown in new Figure S5E-H. The results showed that gephyrin was affected in all subcellular regions following endophilin A1 knockout.

      Collectively, these data suggest that endophilin A1 functions across all subtypes of GABAergic postsynapses.

      (12) In Figure 3E-F, the authors state that there was no change in the total level of synaptic neurons in EEN1 KO neurons (line 188). However, there is no quantification of the total level of synaptic neurons shown, and based on the immunoblot in Figure 3E, it looks like there is a substantial reduction in NR1, NL2, and g2. The authors should present a quantification of the total levels of these proteins and adjust their statement accordingly if necessary.

      Thanks a lot for your comments. We quantified the total protein levels in Figure 3E and added the result to new Figure 3F, showing that total protein levels were not obviously affected in cultured KO neurons. When normalized to total protein levels, the surface levels of GABA<sub>A</sub> receptors were significantly compromised compared to surface GluN1 and NL2. Furthermore, the total protein levels were not affected in brains of KO mice, as shown in Figures 3K (input) and 3L (S1). Collectively, there was no change in the total level of synaptic proteins in KO neurons.

      (13) In Figure 3G-I, the authors claim, based on super-resolution images as presented here, that Endophilin A1 colocalizes with gephyrin and g2. However, no quantification of this colocalization is presented. The authors should add this quantification to support their claim and indicate how many GABAergic synapses contain Endophilin A1.

      Thank you for the thoughtful comments. The resolution of the images is significantly improved by super-resolution microscopy. As a result, the apparent overlap between two proteins becomes smaller or even disappears. Since no two proteins can occupy the same physical space, they show lower colocalization and instead exhibit proximal localization. As expected, in Figures 3G and 3H, we observed only small overlap or proximal localization of endophilin A1 with gephyrin or GABA<sub>A</sub>R γ2. To further confirm the localization of endophilin A1 in inhibitory synapses, we co-stained endophilin A1 with both pre- and post-synaptic proteins, gephyrin and Bassoon. We then quantified the colocalization of endophilin A1 with gephyrin or with Bassoon using the method for super-resolution images described in the reference (Andrew D. McCall. Colocalization by cross-correlation, a new method of colocalization suited for super-resolution microscopy. McCall BMC Bioinformatics (2024) 25:55). The percentage of gephyrin or Bassoon puncta that were in close proximity with endophilin A1 was also calculated, as shown in new Video 5 and new Figure S4B-G. These data have been added to the revised manuscript as follows: “We further detected the localization of endophilin A1 to inhibitory synapses by co-immunostaining with both pre- and post-synaptic markers (Figure. S4B and Video 5). Quantitative analysis of super-resolution localization maps revealed that ~ 47 % puncta of gephyrin or Bassoon were proximal to endophilin A1 (Figure. S4G, n = 14), with a mean distance between endophilin A1- and gephyrin-positive pixels of ∼ 120 nm, or between endophilin A1- and Bassoon-positive pixels of ∼ 130 nm (Figure. S4C-F).”

      (14) In the quantification shown in Figure 3K-L, there are no error bars in the WT data sets. This presumably means that all values were normalized to WT. However, since this artificially eliminates the variance in the WT group, a t-test is no longer valid, since this assumes a normal distribution and normal variance, which are no longer given. The authors should either change the way they normalize their data to maintain the variance in the WT group or perform a different statistical test that can account for the artificial lack of variance in one of the groups.

      Thank you for the suggestions! We modified our analysis approach. Specifically, we used mean value of WTs to normalize data to preserve the variance in the WT group and performed unpaired t-tests to assess statistical significance in Figure 3K-L. Additionally, we replaced the bar graphs with modified graphs showing individual data points. Please see Response to Recommendation (12).
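      The normalization described in this response can be sketched as follows. This is only an illustration under stated assumptions: every value is divided by the WT group mean, and a pooled-variance (Student's) unpaired t statistic is computed. The function names and band intensities are hypothetical, and the manuscript does not specify which t-test variant or software was used.

```python
from math import sqrt
from statistics import mean, variance


def normalize_to_wt_mean(wt, ko):
    """Divide every value by the WT group mean.

    The WT mean becomes 1.0, but individual WT values keep their spread,
    so the WT group retains a non-zero variance.
    """
    wt_mean = mean(wt)
    return [x / wt_mean for x in wt], [x / wt_mean for x in ko]


def unpaired_t_statistic(a, b):
    """Return Student's t (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical raw band intensities from three WT and three KO samples.
wt_raw = [90.0, 100.0, 110.0]
ko_raw = [50.0, 60.0, 70.0]

wt_norm, ko_norm = normalize_to_wt_mean(wt_raw, ko_raw)
t_stat, dof = unpaired_t_statistic(wt_norm, ko_norm)
```

      By contrast, setting each WT value to 1 individually would give the WT group zero variance, which is the situation the reviewer objected to.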

      (15) What is the difference between the coIP experiment in Figure 4E and 3J, right panel? In both cases, an Endophilin A1 IP is performed, and gephyrin, GABAARg2, and GABAARa1 are assessed. However, Figure 3J's right panel indicates that Endophilin A1 does interact with the GABAAR subunits, whereas Figure 4E shows that it does not. How do the authors explain this discrepancy? Were these experiments performed more than once?

      Sorry for the confusion. Figure 3J and Figure 4E show data from an immunoisolation assay and conventional co-immunoprecipitation (co-IP), respectively. Immunoisolation allows for the rapid and efficient separation of subcellular membrane compartments using antibodies conjugated to magnetic beads. In Figure 3J, we used antibodies against GABA<sub>A</sub>R α1 subunit or endophilin A1 to isolate the inhibitory postsynaptic membranes or endophilin A1-associated membranous compartments. In contrast, co-immunoprecipitation detects direct protein-protein interactions in detergent-solubilized lysates. For Figure 4E, we applied antibodies against endophilin A1 to precipitate its interaction partners. The results in Figure 3J and Figure 4E demonstrate that endophilin A1 is localized in the inhibitory postsynaptic compartment and directly interacts with gephyrin, but not with GABA<sub>A</sub>Rs. Detailed information regarding the methods used for co-IP and immunoisolation can be found in “GST-pull down, co-immunoprecipitation (IP), and immunoisolation” in the “Material and Methods” section of the original manuscript.

      These experiments were repeated multiple times to ensure reliability. Consistent data showing endophilin A1 localization in the inhibitory postsynaptic compartment are presented in Figure 3K, together with their quantification.

      (16) For the colocalization analysis in Figure 5A-C, what percentage of gephyrin puncta contain g2 in the WT and Endophilin A1 KO? Currently, only a correlation coefficient is provided, but not the degree of overlap. Please add this information to the figure.

      Thanks for the comments on the colocalization analysis. We analyzed the percentage of gephyrin puncta overlapping with GABA<sub>A</sub>R γ2 and added the graphs in new Figure 5C.

      (17) Figure 6 investigates how actin depolymerization affects GABAergic synapse function, but does not assess how Endophilin A1 contributes to this process. The authors then provide an extremely short statement in the discussion, stating that their data are contradictory to a previous study (lines 412 - 417). This section of the discussion should be expanded to address the specific role of Endophilin A1 in the consequences of actin depolymerization.

      Thanks a lot for the advice. In the original manuscript, we discussed the specific role of endophilin A1 at inhibitory postsynapses as follows in Discussion:

      “As membrane-binding and actin polymerization-promoting activities of endophilin A1 are both required for its function in enhancing iPSD formation and g2–containing GABA<sub>A</sub>R clustering to iPSD, we propose that membrane-bound endophilin A1 promotes postsynaptic assembly by coordinating the plasma membrane tethering of the postsynaptic protein complex and its stabilization with the actin cytomatrix”

      Following your advice, we added a statement to the revised manuscript addressing the role of endophilin A1 in actin polymerization at inhibitory postsynapses, as follows: “In the present study, the impaired clustering of gephyrin and GABA<sub>A</sub> γ2 by F-actin depolymerization underscores the essential role of F-actin in the assembly and stabilization of the inhibitory postsynaptic machinery. Membrane-bound endophilin A1 promotes F-actin polymerization beneath the plasma membrane through its interaction with p140Cap, an F-actin regulatory protein, thereby facilitating and/or stabilizing the clustering of gephyrin and γ2-containing GABA<sub>A</sub> receptors at postsynapses.”

      (18) Which statistical analysis was conducted in Figure 7F? Given the nature of the data, a repeated measures ANOVA would be necessary to accurately assess the statistical accuracy.

      Sorry for the confusion. In the original Figure 7F, we conducted one-way ANOVA followed by Tukey's post hoc test at each time point. As suggested, we have now used repeated measures ANOVA followed by Tukey's post hoc test in new Figure 7F. We also reanalyzed the data in new Figure 1C with the same method, and modified the description in “Statistical analysis” and the figure legends for new Figures 1C and 7F in the revised manuscript.
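      For readers unfamiliar with the test, the sketch below shows how the F statistic of a one-way repeated measures ANOVA is obtained when the same animals are scored at several time points. It is a simplified illustration: it covers a single group with one within-subject factor, whereas the design in Figure 7F compares several groups over time and would need a mixed design in a statistics package. The function name and scores are hypothetical, and no post hoc test is included.

```python
from statistics import mean


def repeated_measures_anova(scores):
    """One-way repeated measures ANOVA for a single group.

    `scores[i][j]` is the score of subject i at time point j.
    Returns (F, df_time, df_error).
    """
    n = len(scores)         # number of subjects
    k = len(scores[0])      # number of time points
    grand = mean(x for row in scores for x in row)

    time_means = [mean(row[j] for row in scores) for j in range(k)]
    subject_means = [mean(row) for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    ss_subject = k * sum((m - grand) ** 2 for m in subject_means)
    # Removing between-subject variability is what distinguishes this
    # test from an ordinary one-way ANOVA applied at each time point.
    ss_error = ss_total - ss_time - ss_subject

    df_time = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_time / df_time) / (ss_error / df_error)
    return f_value, df_time, df_error


# Hypothetical seizure scores: three animals, three time points each.
example_scores = [
    [1.0, 2.0, 4.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 7.0],
]
f_value, df_time, df_error = repeated_measures_anova(example_scores)
```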

      Reviewer #2 (Recommendations for the authors):

      Data presentation:

      (1) Figures 2A, B, D, E, G, H. Figures S2A, B, D:

      Add P21 or P60 labels to these figures so that the difference between similarly stained samples (e.g. Figures 2A, B) is obvious to the reader.

      Thanks! We added “P21” or “P60” labels in new Figure 2 and Figure S2 as suggested.

      (2) Figures 4C, D:

      The authors must make their coIP data annotation consistent. In Figure 4C, they use actual microgram amounts when, e.g., describing how much input was present, yet in Figure 4D they use + and -. The authors should pick one.

      Thanks for the comments. We made the data annotation consistent in new Figures 4C and 4D, and also changed the labels in Figure 4F for consistency.

      (3) Figure 5A

      GFP is gray in this figure, but in all other figures, it is blue. Consider changing for presentation reasons.

      Thanks a lot for pointing out the problem. We replaced gray with blue color to indicate GFP in new Figure 5A.

      (4) Figures 6A, C, E, G

      Label graphs as either short-term or long-term drug treatment.

      Thanks for the suggestion. We labeled the graphs as 60 min for short-term or 120 min for long-term drug treatment in new Figure 6A, C, E, G for ease of reading.

      Annotation, grammar, spelling, typing errors:

      (1) Figure 4G:

      Merge and GFP labels are seemingly swapped.

      Thanks a lot for your sharp eye. We corrected the labels in new Figure 4G.

      (2) Fig 4I:

      The authors use "Gephryin" instead of GPN. They should be consistent and choose one.

      Sorry for the mistake. We changed the label in new Figure 4I to be consistent with the other figures and rearranged the images in the figures for a cleaner appearance.

      (3) "One-hour or two-hour treatment of mature neurons with nocodazole..."

      Thanks for your advice. We modified the sentence to “Treatment of mature neurons with nocodazole, a microtubule depolymerizing reagent, for one hour (short-term) or two hours (long-term), caused…”.

      (4) The authors should indicate that one-hour is their short-term treatment and that two-hour is their long-term treatment so that when these terms are used later to describe LatA experiments, it is clearer to the reader.

      Thanks for your comments. We modified the statement as described in Response to Recommendation (3), so that it is clearer to the reader.

      (5) EEA1. The authors should use a more conventional term EndoA1 so that the manuscript can be searched easily.

      Thanks a lot for the suggestion. We replaced all instances of the term “EEN1” with “EndoA1” in the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      Major Points

      (1) The number of observations for the electrophysiology experiments in Figure 1 (dots are neurons) is very low and it is not clear whether the data shown is derived from different mice. The same criticism applies to the data shown in Figures 7G-K.

      We apologize for the low number of neurons in the electrophysiology experiments. In the patch-clamp experiments, the number of neurons recorded was higher than what is shown in the figures. However, neurons with a membrane resistance (Rm) below 500 MΩ, indicating unstable seals or poor conditions, were excluded from the analysis. Additionally, we added the number of mice from which the data were derived for each group to the figure legends for Figures 1, 7 and S1. This point was also raised by Reviewer #1 (please see Response to Recommendation (1)).
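      The exclusion criterion and the reporting of animal numbers can be expressed as a small filter. This is a minimal sketch assuming each recording is stored as a record with hypothetical fields `neuron_id`, `animal_id`, and `rm_mohm`; the example values are invented for illustration and are not data from the study.

```python
RM_THRESHOLD_MOHM = 500.0  # neurons with Rm below this value are excluded


def apply_rm_criterion(recordings, threshold=RM_THRESHOLD_MOHM):
    """Keep recordings with Rm at or above the threshold.

    Returns the retained recordings and the number of distinct animals
    they came from, so both can be reported in a figure legend.
    """
    kept = [r for r in recordings if r["rm_mohm"] >= threshold]
    n_animals = len({r["animal_id"] for r in kept})
    return kept, n_animals


# Hypothetical recordings from three mice.
example_recordings = [
    {"neuron_id": "n1", "animal_id": "m1", "rm_mohm": 620.0},
    {"neuron_id": "n2", "animal_id": "m1", "rm_mohm": 480.0},
    {"neuron_id": "n3", "animal_id": "m2", "rm_mohm": 500.0},
    {"neuron_id": "n4", "animal_id": "m3", "rm_mohm": 350.0},
]
kept, n_animals = apply_rm_criterion(example_recordings)
```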

      (2) Images in Figure 2 are shown at low magnification, statements on changes in intensity of inhibitory synaptic markers in the hippocampal region are impossible to interpret. Analysis of inhibitory synapses in vivo would require sparse neuronal labeling and 3D reconstruction, for instance using gephyrin-FingRs (Gross et al., Neuron 2013).

      Thanks for your insightful suggestion. We obtained pCAG_PSD95.FingR-eGFP-CCR5TC and pCAG_GPN.FingR-eGFP-CCR5TC constructs from Addgene (plasmid #46295 & #46296). We attempted in utero electroporation (IUE) to introduce the DNAs into cortical or hippocampal neurons at E14.5, unfortunately without success. After numerous repeated attempts, we eventually obtained newborn pups of ICR mice after IUE. However, we failed to obtain any newborn pups of C57BL/6J mice due to abortion following the procedure. Furthermore, pregnant C57BL/6J mice (WTs or KOs) either did not survive or remained in a poor state of health after surgery. Therefore, we were unable to analyze synapses through sparse labeling and 3D reconstruction by IUE. Alternatively, we obtained commercial AAVs carrying rAAV-EF1a-PSD95.FingR-eGFP-CCR5TC and rAAV-EF1a-mRuby2-Gephyrin.FingR-IL2RGTC and injected them into the CA1 region of EndoA1<sup>fl/fl</sup> mice at P0. Mice were fixed at P21 and the fluorescent signals in the CA1 region were examined. Consistent with immunostaining with antibodies, decreased mRuby2-Gephyrin.FingR or PSD95.FingR-eGFP was observed in dendrites of KO neurons at P21, as shown in new Figure S3. In combination with electrophysiological recording, PSD fractionation and immunoisolation from brains, these data support our conclusion regarding the effects of endophilin A1 knockout on the inhibitory synapses.

      Additionally, we transfected DIV12 cultured hippocampal neurons with pCAG_PSD95.FingR-eGFP-CCR5TC or pCAG_GPN.FingR-eGFP-CCR5TC and observed fluorescent signals on DIV16. Both the signal intensity and number of GPN.FingR-eGFP clusters were also significantly attenuated, with no obvious changes in PSD95.FingR-eGFP clusters in dendrites of mature neurons, as shown in new Figure S5A-D. We are very pleased that this result further strengthened our original conclusion. We have added these new data to our revised manuscript.

      (3) Figure 3: surface labeling of GluA1 or the GABAAR gamma 2 subunit is difficult to interpret: the patterns are noisy and the numerous puncta appear largely non-synaptic although this is difficult to judge in the absence of additional synaptic markers. It appears statistics are done on dendritic segments rather than the number of neurons. The legend does not mention how many independent cultures this data is derived from. In their previous study (Yang et al., Front Mol Neurosci 2018), the authors noted a decrease in surface GluA1 levels in the absence of endophilin A1. How do they explain the absence of an effect on surface GluA1 levels in the current study?

      Sorry for the concern and thanks for your comments. First, we assessed changes in the surface levels of excitatory and inhibitory receptors by co-immunostaining in cultured WT and KO hippocampal neurons. Given the very low transfection efficiency of neurons in high-density culture, numerous receptor puncta from adjacent non-transfected neurons were also detected, which may contribute to the noisy pattern observed in Figure 3A. In addition, the z-stack projections of dendrites at higher magnification may have introduced higher background signals. We have now replaced the original images with those from the most recent repeat in new Figure 3A. Moreover, we confirmed a decrease in the surface expression of GABA<sub>A</sub>R γ2 by the biotinylation assay, as shown in Figure 3E. Indeed, we agree that some puncta of surface-labeled receptors appeared to be non-synaptic. To assess the decrease in proteins at synapses, we isolated the PSD fraction biochemically and found that gephyrin and GABA<sub>A</sub>R γ2, two major inhibitory postsynaptic components, were reduced in the PSD fraction from KO brains, as shown in Figure 3L. Their colocalization was also attenuated in the absence of endophilin A1, as shown in Figure 5A-C. Combined with electrophysiological recording, these data from multiple assays indicate that GluA1 at synapses was not obviously affected, whereas GABA<sub>A</sub>R γ2 at synapses was impaired, in endophilin A1 KO neurons in the present study.

      We have corrected the way that the number of samples is defined for statistical analysis as suggested. This point was also raised by Reviewer #1 (Recommendation (2)). We averaged the values from all dendritic segments of a single neuron, such that one neuron equaled one data point. We have replaced the original Figure 3B and 3D (please see Response to Recommendation (2) by Reviewer #1). Additionally, we added the number of independent cultures these data were derived from to the figure legends in the revised manuscript.

      Previously, we observed a small decrease in surface GluA1 levels in spines under basal conditions and a more pronounced suppression of surface GluA1 accumulation in spines upon chemical LTP in endophilin A1 KO neurons from EndoA1<sup>-/-</sup> mice, in which endophilin A1 is knocked out from embryonic developmental stages (Figure 5C,H. Yang et al., Front Mol Neurosci, 2018). In Figure 3A and B of the current study, we analyzed surface receptor levels in GFP-positive dendrites, rather than spines, under basal conditions when endophilin A1 was depleted at a later developmental stage. We found a decrease in surface GABA<sub>A</sub>R γ2 levels but no significant effects on surface GluA1 levels in dendrites. These findings indicate that endophilin A1 primarily affects excitatory synaptic proteins in spines during synaptic plasticity and inhibitory synaptic proteins in dendrites under basal conditions in mature neurons.

      (4) Super-resolution images in Figure 3G, H, I: endophilin A1 puncta look different in panel 3I compared to 3G and 3H, which are very noisy. It is difficult to interpret how specific these EEN1 puncta are. Previous images showing EEN1 distribution in dendrites look different (Yang et al., Front Mol Neurosci 2018); is the same KO-verified antibody being used here? Colocalization of EEN1 with gephyrin or the GABAAR gamma 2 subunit is difficult to interpret; gephyrin mostly does not seem to colocalize with EEN1 in the example shown.

      Sorry for your concerns. As stated previously in Major Points (3), transfection efficiency was very low in cultured neurons and our cultured neurons were at relatively high density. As a result, numerous protein puncta located in adjacent non-transfected neurons were also detected, which may contribute to the noisy signals observed in Figure 3G-I.

      In our previous paper, we confirmed the specificity of the antibody against endophilin A1 (Figure 5A,B. Yang et al., Front Mol Neurosci, 2018). We used the same antibody (rabbit anti-endophilin A1, Synaptic Systems GmbH, Germany) in the current study. While the previous images were obtained using confocal microscopy, the current images in Figures 3G, H, and I were acquired using super-resolution microscopy (SIM). The different patterns observed in the dendrites may be attributed to differences in image resolution, antibody dilution and reaction time.

      Reviewer #1 also raised the issue of quantifying the colocalization of gephyrin and GABA<sub>A</sub>R γ2 with endophilin A1. Please see Response to Recommendation (13) by Reviewer #1.

      (5) The interaction of gephyrin and endophilin A1 is based on coIP experiments in cells and brain tissue. To convincingly demonstrate that these proteins interact, biophysical experiments with purified proteins are necessary.

      Thanks a lot for your great suggestions on the interaction of endophilin A1 with gephyrin. To convincingly demonstrate their interaction, we performed a pull-down assay with purified recombinant proteins, and the result shows that both the G and E domains of gephyrin were involved in the interaction with endophilin A1. The data have been added to the revised manuscript as new Figure 5I. We also modified the statement about the data and the figure legends in the revised manuscript.

      (6) Figure 4G: the gephyrin images are not convincing; the inhibitory postsynaptic element typically looks somewhat elongated; these puncta are very noisy and do not appear to represent iPSDs. The same criticism applies to the images shown in Figures 5 and 7.

      Thanks for the comment. The gephyrin puncta in our images exhibited heterogeneous shapes and sizes, with some appearing somewhat elongated. To address this, we compared the puncta pattern of gephyrin with that shown in the reference. As illustrated in the figure from the reference, gephyrin puncta also displayed distinct shapes and sizes (Figure 3A-F, Neuron 78, 971–985, June 19, 2013). Please note that the images were z-stack projections at higher magnification, as described in the "Materials and Methods" section. This approach likely introduces higher background signals and may contribute to the more heterogeneous appearance of the puncta in Figures 4, 5, and 7. As mentioned previously, the numerous gephyrin puncta located in adjacent non-transfected neurons may also contribute to some of the noisy signals observed. We have replaced the original images with new images in new Figures 4G, 5, and 7.

      Moreover, to confirm the effects of endophilin A1 KO on gephyrin clustering, we also detected the endogenous clusters of gephyrin or PSD95 visualized by GPN.FingR-eGFP or PSD95.FingR-eGFP in cultured mature neurons. The results were consistent with immunostaining with antibodies against gephyrin. Please see Response to Recommendation (2).

      (7) Figure 7E, F: the rescue (Cre + WT) appears to perform better than the control (mCherry + GFP) in the PTZ condition; how do the authors explain this? Mixes of viral vectors were injected, would this approach achieve full rescue?

      Thanks for the thoughtful comment. Mixed viruses were injected bilaterally into the hippocampal CA1 regions. The results showed full rescue by WT endophilin A1 in knockout mice during the early days, and a slightly better effect than in the control group during the later days under the PTZ condition, as shown in Figures 7E and 7F. In the current study, overexpression of endophilin A1 increased the clustering of gephyrin and GABA<sub>A</sub>R γ2 in cultured neurons, as shown in Figures 4I-J and 5D-E. Presumably, the slightly better rescue observed in the behavioral tests is attributable to the enhanced clustering and/or stabilization of gephyrin/GABA<sub>A</sub>R γ2 by WT endophilin A1 expression in KO neurons in vivo. Moreover, the electrophysiological recordings also showed full rescue of eIPSCs by WT endophilin A1 in KO neurons (Figure 7G-K).

      Minor Points

      (1) The authors mention that they previously found a decrease in eEPSC amplitude in EEN1 KO mice (Yang et al., Front Mol Neurosci 2018). The data in Fig. 1E suggest a decrease in eEPSC amplitude, but it is not significant here, likely due to the small number of observations. If both eEPSC and eIPSC amplitudes are reduced in the absence of EEN1, would the E/I ratio still be significantly changed?

      We apologize for the confusion. In our previous study, AMPAR-mediated excitatory postsynaptic currents (eEPSCs) were found to be slightly but significantly reduced compared to the control group, while NMDAR-mediated excitatory postsynaptic currents showed no significant difference (Figure 4N,O. Yang et al., Front Mol Neurosci, 2018). In the current study, we adopted a different recording protocol, simultaneously measuring eEPSCs and eIPSCs from the same neuron to calculate the E/I ratio. Unlike in the previous study, we did not use inhibitors to suppress GABA receptor activity. As a result, the recorded signals did not distinguish AMPAR-mediated from NMDAR-mediated excitatory postsynaptic currents and instead reflect total eEPSCs, which may explain the non-significant reduction observed compared to control neurons in this study.

      It is possible that the eEPSC amplitude would show a significant reduction if a larger number of neurons were recorded. Nevertheless, the larger suppression of eIPSCs in the absence of endophilin A1 indicates that the E/I ratio is significantly altered.

      (2) Page 7: the authors mention they aim to exclude effects on presynaptic terminals of deleting endophilin A1 in cultured neurons; is this because of a sparse transfection approach? Please clarify.

      Sorry for the confusion. In cultured neurons, we always observed sparse transfection due to the very low transfection efficiency (~0.5%). Therefore, we could examine the effects of endophilin A1 knockout specifically in postsynaptic neurons expressing Cre driven by the CamKIIa promoter, while endophilin A1 remained intact in the non-transfected presynaptic neurons.

      (3) The representative blot of the surface biotinylation experiment (Figure 3E) suggests that loss of endophilin A1 also affects GluN1 and Nlgn2 levels, and error bars in panel 3F (lacking individual data points) suggest these experiments were highly variable.

      Sorry for the confusion. Reviewer #1 also raised this question, and we have quantified the total levels of GluN1 and NL2 in Figure 3E. We have also replaced the original graphs with scatterplots showing means ± S.E.M. Please see the Response to Recommendation (3) & (12) by Reviewer #1.

      (4) Have other studies analyzing inhibitory synapse composition identified endophilin A1 as a component? The rationale for this study seems to be primarily based on the presence of epileptic seizures and E/I imbalance.

      Thank you for your questions. To date, no other studies have investigated endophilin A1 as an inhibitory postsynaptic component. We observed the proximal localization of endophilin A1 with inhibitory postsynaptic proteins using super-resolution microscopy (SIM), and quantification showed that ~47% of gephyrin puncta correlated with endophilin A1 (Figure 3G-I and S4B-G). We further immunoisolated the inhibitory postsynaptic fraction using GABA<sub>A</sub> receptors and found that endophilin A1 was present in the isolated fraction, and vice versa (Figure 3J). Additionally, we demonstrated that endophilin A1 directly interacted with gephyrin through co-IP and pull-down assays (Figure 5J-I). Together with data from immunolabeling, biochemical assays, electrophysiological recordings, and behavioral tests, these results identify endophilin A1 as an inhibitory postsynaptic component.

      (5) Figure 3J: what are S100 and P100 labels? Is Nlgn2 part of the EEN1 complex? If it is, why are Nlgn2 surface levels not affected by EEN1 loss (Figure 3E, F, K)? Why does EEN1 not interact with Nlgn2 in HEK cells (Figure 4D)?

      Sorry for the confusion. Detailed information regarding S100 and P100 can be found under “GST-pull down, co-immunoprecipitation (IP), and immunoisolation” in the “Materials and Methods” section. S100 contains soluble proteins, while P100 refers to the membrane fraction after high-speed (100,000 × g) centrifugation.

      Figures 3J-K and 4C-F showed the data from immunoisolation and conventional co-immunoprecipitation assays, respectively. Immunoisolation, which uses antibodies coupled to magnetic beads, allows for the rapid and efficient separation of subcellular membrane compartments. In Figure 3J-K, we used antibodies against GABA<sub>A</sub>R α1 to isolate membrane protein complexes from the inhibitory postsynaptic fraction. In contrast, co-immunoprecipitation typically detects direct interactions between proteins solubilized by detergent treatment. For Figure 4C-F, FLAG beads were used in HEK293 lysates, or antibodies against endophilin A1 were employed in brain lysates to precipitate direct interaction partners. Combined with the results from Figure 3J-L, the data in 4C-F indicated that endophilin A1 was localized in the inhibitory postsynaptic compartment and directly bound to gephyrin but not to either GABA<sub>A</sub> receptors or Nlgn2 (NL2). This binding promoted the clustering of gephyrin and GABA<sub>A</sub>R γ2 at synapses, facilitating GABA<sub>A</sub>R assembly.

      Nlgn2 (NL2) is a key inhibitory postsynaptic component but does not directly bind to endophilin A1. Consequently, endophilin A1 failed to co-immunoprecipitate with NL2 in the presence of detergent in HEK293 cell lysates (Figure 4D). Furthermore, the surface levels of NL2 or its distribution in PSD fraction were unaffected by the loss of endophilin A1 (Figure 3E, F, K, L). This suggests that mechanisms independent of endophilin A1 orchestrate the surface expression and synaptic distribution of NL2.

      (6) How do the authors interpret the finding that endophilin A1, but not A2 or A3, binds gephyrin? What could explain these differences?

      Thanks for the thoughtful comment. Endophilin As contain BAR and SH3 domains. While the amino acid sequences of the BAR and SH3 domains are highly conserved, the intrinsically disordered loop region between them is highly variable. A study by the Verstreken lab revealed that a human mutation in the unstructured loop region of endophilin A1 increases the risk of Parkinson's disease. They also demonstrated that the disordered loop region controls protein flexibility, which fine-tunes protein-protein and protein-membrane interactions critical for endophilin A1 function (Bademosi et al., Neuron 111, 1402–1422, May 3, 2023). Our previous study showed that endophilin A1 and A3, but not A2, bind to p140Cap through their SH3 domains, despite the high sequence homology in the SH3 domains among these proteins (Figure 2A,B. Yang et al., Cell Research, 2015). These findings indicate that each endophilin A likely interacts with specific partners due to distinct key amino acids.

      Additionally, endophilin A1 is expressed at much higher levels than A2 and A3 in neurons, and the three proteins are distributed differently across brain regions. Our lab demonstrated that the function of A1 at postsynapses (both excitatory and inhibitory synapses) cannot be compensated by A2 or A3. Therefore, it is reasonable that endophilin A1, rather than A2 or A3, binds to gephyrin, even though the underlying mechanisms remain unclear.

      (7) Figure 4G: panels are mislabeled (GFP vs merge).

      Thanks for the careful reading, and sorry for the mistake. We corrected the label in new Figure 4G. Please see Response to Annotation, grammar, spelling, typing errors: (1) by Reviewer #2.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Ross, Miscik, and others describes an intriguing series of observations made when investigating the requirement for podxl during hepatic development in zebrafish. Podxl morphants and CRISPants display a reduced number of hepatic stellate cells (HSCs), while mutants are either phenotypically wild type or display an increased number of HSCs.

      The absence of observable phenotypes in genetic mutants could indeed be attributed to genetic compensation, as the authors postulate. However, in my opinion, the evidence provided in the manuscript at this point is insufficient to draw a firm conclusion. Furthermore, the opposite phenotype observed in the two deletion mutants is not readily explainable by genetic compensation and invokes additional mechanisms.

      Major concerns:

      (1) Considering discrepancies in phenotypes, the phenotypes observed in podxl morphants and CRISPants need to be more thoroughly validated. To generate morphants, authors use "well characterized and validated ATG Morpholino" (lines 373-374). However, published morphants, in addition to kidney malformations, display gross developmental defects including pericardial edema, yolk sac extension abnormalities, and body curvature at 2-3 dpf (reference 7 / PMID: 24224085). Were these gross developmental defects observed in the knockdown experiments performed in this paper? If yes, is it possible that the liver phenotype observed at 5 dpf is, to some extent, secondary to these preceding abnormalities? If not, why were they not observed? Did kidney malformations reproduce? On the CRISPant side, were these gross developmental defects also observed in sgRNA#1 and sgRNA#2 CRISPants? Considering that morphants and CRISPants show very similar effects on HSC development and assuming other phenotypes are specific as well, they would be expected to occur at similar frequencies. It would be helpful if full-size images of all relevant morphant and CRISPant embryos were displayed, as is done for tyr CRISPant in Figure S2. Finally, it is very important to thoroughly quantify the efficacy of podxl sgRNA#1 and sgRNA#2 in CRISPants. The HRMA data provided in Figure S1 is not quantitative in terms of the fraction of alleles with indels. Figure S3 indicates a very broad range of efficacies, averaging out at ~62% (line 100). Assuming random distribution of indels among cells and that even in-frame indels result in complete loss of function (possible for sgRNA#1 due to targeting the signal sequence), only ~38% (.62*.62) of all cells will be mutated bi-allelically. That does not seem sufficient to reliably induce loss-of-function phenotypes. My guess is that the capillary electrophoresis method used in Figure S3 underestimates the efficiency of mutagenesis, and that much higher mutagenesis rates would be observed if mutagenesis were assessed by amplicon sequencing (ideally NGS but Sanger followed by deconvolution analysis would suffice). This would strengthen the claim that CRISPant phenotypes are specific.

      The reviewer points out some excellent caveats regarding the morphant experiments. We agree that at least some of the effects of the podxl morpholino may be related to its effects on kidney development and/or gross developmental defects that impede liver development. Because of these limitations, we focused our experiments on analysis of CRISPant and mutant phenotypes, including showing that podxl (Ex1(p)_Ex7Δ) mutants are resistant to CRISPant effects on HSC number when injected with sgRNA#1. We did not observe any gross morphologic defects in podxl CRISPants. Liver size was not significantly altered in podxl CRISPants (Figure 2A). We will add brightfield images of podxl CRISPant larvae to the supplemental data for the revised manuscript.

      We agree with the reviewer that HRMA is not quantitative with respect to the fraction of alleles with indels and that capillary electrophoresis likely underestimates mutagenesis efficiency. Nonetheless, even with 100% mutation efficiency, podxl CRISPant knockdown, like most CRISPR knockdowns, would not represent complete loss of function: ~1/3 of alleles will contain in-frame mutations and likely retain at least some gene function, so ~1/3 × 1/3 = 1/9 of cells will have no out-of-frame indels and contain two copies of at least partially functional podxl, and ~2 × 1/3 × 2/3 = 4/9 of cells will have one out-of-frame indel and one copy of at least partially functional podxl. Thus, the decrease in HSCs we observe in podxl CRISPants likely represents a partial loss-of-function phenotype in any case.
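
      The genotype arithmetic in this response can be checked with a short sketch. It assumes, as the response does, that every allele carries an indel, that alleles are edited independently, and that roughly 1/3 of indels are in-frame; the function name and the dictionary keys are illustrative only and do not come from the manuscript.

```python
from fractions import Fraction


def genotype_fractions(p_in_frame=Fraction(1, 3)):
    """Per-cell genotype fractions when every allele carries an indel.

    Assumption: the two alleles are edited independently and each indel
    is in-frame with probability p_in_frame (the ~1/3 quoted above).
    """
    p_out = 1 - p_in_frame
    return {
        # both alleles in-frame: two partially functional copies
        "two_in_frame": p_in_frame * p_in_frame,
        # either allele may be the in-frame one, hence the factor of 2
        "one_in_frame_one_out_of_frame": 2 * p_in_frame * p_out,
        # both alleles out-of-frame: no functional copy expected
        "two_out_of_frame": p_out * p_out,
    }


shares = genotype_fractions()
for genotype, share in shares.items():
    print(f"{genotype}: {share}")
```

      Under these assumptions, the mixed genotype and the fully out-of-frame genotype each account for 4/9 of cells, so just over half of all cells would retain at least one partially functional copy.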

      (2) In addition to confidence in morphant and CRISPant phenotypes, the authors' claim of genetic compensation rests on the observation that podxl (Ex1(p)_Ex7Δ) mutants are resistant to CRISPant effect when injected with sgRNA#1 (Figure 3L). Considering the issues raised in the paragraph above, this is insufficient. There is a very straightforward way to address both concerns, though. The described podxl(-194_Ex7Δ) and podxl(-319_ex1(p)Δ) deletions remove the binding site for the ATG morpholino. Therefore, deletion mutants should be refractory to the Morpholino (specificity assessment recommended in PMID: 29049395, see also PMID: 32958829). Furthermore, both deletion mutants should be refractory to sgRNA#1 CRISPant phenotypes, with the first being refractory to sgRNA#2 as well.

      The reviewer proposes elegant experiments to address the specificity of the morpholino. For the revision, we plan to perform additional morpholino studies, including morpholino injections of podxl mutants and assessment of tp53 and other immune response/cellular stress pathway genes in podxl morphants.

      Reviewer #2 (Public review):

      In this manuscript, Ross and Miscik et al. described the phenotypic discrepancies between F0 zebrafish mosaic mutant ("CRISPants") and morpholino knockdown (Morphant) embryos versus a set of 5 different loss-of-function (LOF) stable mutants in one particular gene involved in hepatic stellate cell development: podxl. While transient LOF and mosaic mutants induced a decrease in hepatic stellate cell number, stable LOF zebrafish did not. The authors analyzed the molecular causes of these phenotypic differences and concluded that LOF mutants are genetically compensated through the upregulation of the expression of many genes. Additionally, they ruled out other better-known and described mechanisms such as the expression of redundant genes, protein feedback loops, or transcriptional adaptation.

      While the manuscript is clearly written and conclusions are, in general, properly supported, there are some aspects that need to be further clarified and studied.

      (1) It would be convenient to apply a method to better quantify potential loss-of-function mutations in the CRISPants. This would reveal not only the percentage of mutations in those embryos but also what fraction of them actually generate an out-of-frame mutation likely to drive gene loss of function (since deletions of 3-6 nucleotides removing 1-2 amino acids will likely not have an impact on protein activity, unless these amino acids are essential for it). With this, the authors could also correlate phenotype penetrance with the level of loss of function when quantifying embryo phenotypes, which would help to support their conclusions.

      Reviewer #2 raises an excellent point that is similar to Reviewer #1’s first concern. Please see our response above. In general, we agree that correlating phenotype penetrance with level of loss-of-function is a very good way to support conclusions regarding specificity in knockdown experiments. Unfortunately, because the phenotype we are examining (HSC number) has a relatively large standard deviation even in control/wildtype larvae (for example, 63 ± 19 (mean ± standard deviation) HSCs per liver in uninjected control siblings in Figure 1) it would be technically very difficult to do this experiment for podxl.
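
      To give a rough sense of how the quoted variability (63 ± 19 HSCs per liver) limits such an experiment, the following sketch applies the standard normal-approximation sample-size formula for comparing two group means. The effect sizes, significance level, and power are assumptions chosen for illustration, not values from the manuscript, and the calculation further assumes equal standard deviations in both groups.

```python
import math
from statistics import NormalDist


def larvae_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / difference)^2.
    Assumes equal SD and equal group sizes (illustrative assumptions).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return math.ceil(n)


control_mean, control_sd = 63, 19  # values quoted in the response
for reduction in (0.10, 0.25, 0.50):  # hypothetical effect sizes
    n = larvae_per_group(control_sd, reduction * control_mean)
    print(f"{reduction:.0%} reduction in HSC number: ~{n} larvae per group")
```

      Because the required group size scales with the inverse square of the difference to be detected, resolving graded effects across several loss-of-function levels would need far more larvae than detecting a single large effect.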

      (2) It is unclear that 4.93 ng of morpholino per embryo is totally safe. The amount of morpholino causing undesired effects can differ depending on the morpholino used. I would suggest performing some sanity check experiments to demonstrate that morpholino KD is not triggering other molecular outcomes, such as upregulation of p53 or innate immune response.

      Reviewer #2 raises an excellent point that is similar to Reviewer #1’s second concern. Please see our response above. We acknowledge that some of the effects of the podxl morpholino may be non-specific. To address this concern in the revised manuscript, we plan to perform additional morpholino studies, including morpholino injections of podxl mutants and assessment of tp53 and other immune response/cellular stress pathway genes in podxl morphants.

      (3) Although the authors made a set of controls to demonstrate the specificity of the CRISPant phenotypes, I believe that a rescue experiment could be beneficial to support their conclusions. Injecting an mRNA with podxl ORF (ideally with a tag to follow protein levels up) together with the induction of CRISPants could be a robust manner to demonstrate the specificity of the approach. A rescue experiment with morphants would also be good to have, although these are a bit more complicated, to ultimately demonstrate the specificity of the approach.

      (4) In lines 314-316, the authors speculate on a correlation between decreased HSC and Podxl levels. It would be interesting to actually test this hypothesis and perform RT-qPCR upon CRISPant induction or, even better and if antibodies are available, western blot analysis.

      We appreciate the reviewer’s acknowledgement of the controls we performed to demonstrate the specificity of the CRISPant phenotypes. The proposed experiments (rescue, assessment of Podxl levels) would help bolster our conclusions but are technically difficult due to the relatively large standard deviation for the HSC number phenotype even in wildtype larvae and the lack of well-characterized zebrafish antibodies against Podxl.

      (5) Similarly, in lines 337-338 and 342-344, the authors discuss the possibility that genes near the podxl locus could be upregulated in the mutants. Since they already have transcriptomic data, this seems an easy analysis to do that can address their own hypothesis.

      Thank you for this suggestion. We were referring in these sections to genes that are near the podxl locus with respect to three-dimensional chromatin structure; such genes would not necessarily be near the podxl locus on chromosome 4. We will clarify the text in this paragraph for the revised manuscript. At the same time, we will examine our transcriptomic data to check expression of mkln1, cyb5r3, and other nearby genes on chromosome 4 as suggested and include this analysis in the revised manuscript.

      (6) Figures 4 and 5 would be easier to follow if panels B-F included what mutants are (beyond having them in the figure legend). Moreover, would it be more accurate and appropriate if the authors group all three WT and mutant data per panel instead of showing individual fish? Representing technical replicates does not demonstrate in vivo variability, which is actually meaningful in this context. Then, statistical analysis can be done between WT and mutant per panel and per set of primers using these three independent 3-month-old zebrafish.

      Thank you for this suggestion. We will modify these figures to clarify our results.

      Reviewer #3 (Public review):

      Summary:

      Ross et al. show that knockdown of zebrafish podocalyxin-like (podxl) by CRISPR/Cas or morpholino injection decreased the number of hepatic stellate cells (HSC). The authors then generated 5 different mutant alleles representing a range of lesions, including premature stop codons, in-frame deletion of the transmembrane domain, and deletions of the promoter region encompassing the transcription start site. However, unlike their knockdown experiment, HSC numbers did not decrease in podxl mutants; in fact, for two of the mutant alleles, the number of HSCs increased compared to the control. Injection of podxl CRISPR/Cas constructs into these mutants had no effect on HSC number, suggesting that the knockdown phenotype is not due to off-target effects but instead that the mutants are somehow compensating for the loss of podxl. The authors then present multiple lines of evidence suggesting that compensation is not exclusively due to transcriptional adaptation: evidence of mRNA instability and nonsense-mediated decay was observed in some but not all mutants; expression of the related gene endoglycan (endo) was unchanged in the mutants and endo knockdown had no effect on HSC numbers; and expression profiling by RNA sequencing did not reveal changes in other genes that share sequence similarity with podxl. Instead, their RNA-seq data showed hundreds of differentially expressed genes, especially ECM-related genes, suggesting that compensation in podxl mutants is complex and multi-genic.

      Strengths:

      The data presented is impressively thorough, especially in its characterization of the 5 different podxl alleles and exploration of whether these mutants exhibit transcriptional adaptation.

      Thank you very much for appreciating the hard work that went into this manuscript.

      Weaknesses:

      RNA sequencing expression profiling was done on adult livers. However, compensation of HSC numbers is apparent by 6 dpf, suggesting compensatory mechanisms would be active at larval or even embryonic stages. Although possible, it's not clear that any compensatory changes in gene expression would persist to adulthood.

      This reviewer makes an excellent point. Our finding that the largest changes in gene expression were in extracellular matrix (ECM) genes, together with the fact that ECM modulation is a major function of HSCs, supports the hypothesis that genetic compensation is occurring in adults. Nonetheless, we agree that compensatory changes in adults may not fully reflect the compensatory changes during development, so it would bolster the conclusions of the paper to perform the RNA sequencing and qPCR experiments on zebrafish larval livers.

      We tried very hard to do the experiment proposed by Reviewer #3. In our hands, obtaining sufficient high-quality RNA for robust gene expression analysis typically requires pooling of ~10-15 larval livers. These larvae need to be obtained from a heterozygous in-cross in order to have matched wildtype sibling controls. Livers must be dissected from freshly euthanized (not fixed) zebrafish. Thus, this experiment requires genotyping live, individual larvae from a small amount of tissue (without sacrificing the larvae) before dissecting and pooling the livers. Unfortunately, we were unable to confidently and reproducibly genotype individual live podxl larvae with these small amounts of tissue despite trying multiple approaches. Therefore, we were not able to perform gene expression analysis on podxl mutant larval livers.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The study investigated how individuals living in urban slums in Salvador, Brazil, interact with environmental risk factors, particularly focusing on domestic rubbish piles, open sewers, and a central stream. The study makes use of the step selection functions using telemetry data, which is a method to estimate how likely individuals move towards these environmental features, differentiating among groups by gender, age, and leptospirosis serostatus. The results indicated that women tended to stay closer to the central stream while avoiding open sewers more than men. Furthermore, individuals who tested positive for leptospirosis tended to avoid open sewers, suggesting that behavioral patterns might influence exposure to risk factors for leptospirosis, hence ensuring more targeted interventions.

      Strengths:

      (1) The use of step selection functions to analyze human movement represents an innovative adaptation of a method typically used in animal ecology. This provides a robust quantitative framework for evaluating how people interact with environmental risk factors linked to infectious diseases (in this case, leptospirosis).

      (2) Detailed differentiation by gender and serological status allows for nuanced insights, which can help tailor targeted interventions and potentially improve public health measures in urban slum settings.

      (3) The integration of real-world telemetry data with epidemiological risk factors supports the development of predictive models that can be applied in future infectious disease research, helping to bridge the gap between environmental exposure and health outcomes.

      Weaknesses:

      (1) The sample size for the study was not calculated, although it was a nested cohort study.

      We thank Reviewer #1 for highlighting this weakness. We will make sure that this is explained in the next version of the manuscript. At the time of recruiting participants, we found no literature on how to perform a sample size calculation for movement studies involving GPS loggers and associated methods of analysis. Therefore, we aimed to recruit as many individuals as possible within the resource constraints of the study.

      (2) The step‐selection functions, though a novel method, may face challenges in fully capturing the complexity of human decision-making influenced by socio-cultural and economic factors that were not captured in the study.

      We agree with Reviewer #1 that this model may fail to capture the full breadth of human decision-making when it comes to moving through local environments. We included a section discussing the aspect of violence and how this influences residents’ choices, along with some possibilities on how to record and account for this. Although it is outside of the scope of this study, we believe that coupling these quantitative methods with qualitative studies would provide a comprehensive understanding of movement in these areas.

      (3) The study's context is limited to a specific urban slum in Salvador, Brazil, which may reduce the generalizability of its findings to other geographical areas or populations that experience different environmental or socio-economic conditions.

      (4) The reliance on self-reported or telemetry-based movement data might include some inaccuracies or biases that could affect the precision of the selection coefficients obtained, potentially limiting the study's predictive power.

      We agree that telemetry data has inherent inaccuracies, which we have tried to account for by using only those data points within the study areas. We would like to clarify that there is no self-reported movement data used in this study. All movement data was collected using GPS loggers.

      (5) Some participants with less than 50 relocations within the study area were excluded without clear justification, see line 149.

      We found that the SSF models would not run properly if there weren’t enough relocations. Therefore, we decided to remove these individuals from the analysis. They are also removed from any descriptive statistics presented.
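
      The exclusion rule described here can be sketched as a simple pre-filter applied before model fitting. The data layout (one tuple per GPS fix, with a participant identifier first), the toy coordinates, and the function name are hypothetical; only the threshold of 50 relocations comes from the review.

```python
from collections import Counter

MIN_RELOCATIONS = 50  # threshold cited by the reviewer (line 149 of the manuscript)


def participants_to_keep(fixes, min_relocations=MIN_RELOCATIONS):
    """Return IDs of participants with at least min_relocations fixes.

    fixes: iterable of (participant_id, latitude, longitude) tuples that
    already fall inside the study area (hypothetical layout).
    """
    counts = Counter(participant_id for participant_id, _, _ in fixes)
    return {pid for pid, n in counts.items() if n >= min_relocations}


# Toy data: participant "A" has 60 fixes in the study area, "B" only 12.
fixes = [("A", -12.90, -38.40)] * 60 + [("B", -12.90, -38.40)] * 12
kept = participants_to_keep(fixes)
analysis_fixes = [fix for fix in fixes if fix[0] in kept]
print(sorted(kept), len(analysis_fixes))
```

      Applying the same filter before computing descriptive statistics keeps the summary tables and the fitted models based on an identical set of participants.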

      (6) Some figures are not clear (see Figure 4 A & B).

      We will improve the quality of this image in the next version of the manuscript.

      (7) No statement on conflict of interest was included, considering sponsorship of the study.

      The conflict-of-interest forms for each author were sent to eLife separately. I believe these should be made available upon publication, but please reach out if these need to be re-sent.

      Reviewer #2 (Public review):

      Summary:

      Pablo Ruiz Cuenca et al. conducted a GPS logger study with 124 adult participants across four different slum areas in Salvador, Brazil, recording GPS locations every 35 seconds for 48 hours. The aim of their study was to investigate step-selection models, a technique widely used in movement ecology to quantify contact with environmental risk factors for exposure to leptospires (open sewers, community streams, and rubbish piles). The authors built two different types of models based on distance and based on buffer areas to model human environmental exposure to risk factors. They show differences in movement/contact with these risk factors based on gender and seropositivity status. This study shows the existence of modest differences in contact with environmental risk factors for leptospirosis at small spatial scales based on socio-demographics and infection status.

      Strengths:

      The authors assembled a rich dataset by collecting human GPS logger data, combined with field-recorded locations of open sewers, community streams, and rubbish piles, and testing individuals for leptospirosis via serology. This study was able to capture fine-scale exposure dynamics within an urban environment and shows differences by gender and seropositive status, using a method novel to epidemiology (step selection).

      Weaknesses:

      Due to environmental data being limited to the study area, exposure elsewhere could not be captured, despite previous research by Owers et al. showing that the extent of movement was associated with infection risk. Limitations of step selection for use in studying human participants in an urban environment would need to be explicitly discussed.

      The environmental factors used in the study required research teams to visit the sites and map the locations. Given that individuals travelled throughout the city of Salvador, performing this task at a large scale would be unachievable. Therefore, we limited the data to only those points within the study area boundaries to avoid any biases from interactions with unrecorded environmental factors. We will be including a more explicit discussion of the limitations of SSF in urban environmental settings with human participants in the next version of the manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Overview of reviewer's concerns after peer review: 

      As for the initial submission, the reviewers' unanimous opinion is that the authors should perform additional controls to show that their key findings may not be affected by experimental or analysis artefacts, and clarify key aspects of their core methods, chiefly:  

      (1) The fact that their extremely high decoding accuracy is driven by frequency bands that would reflect the key press movements and that these are located bilaterally in frontal brain regions (with the task being unilateral) are seen as key concerns, 

      The above statement that decoding was driven by bilateral frontal brain regions is not entirely consistent with our results. The confusion was likely caused by the way we originally presented our data in Figure 2. We have revised that figure to make it clearer that decoding performance at both the parcel- (Figure 2B) and voxel-space (Figure 2C) levels is predominantly driven by contralateral (as opposed to ipsilateral) sensorimotor regions. Figure 2D, which highlights bilateral sensorimotor and premotor regions, displays accuracy of individual regional voxel-space decoders assessed independently. This was the criterion used to determine which regional voxel-spaces were included in the hybrid-space decoder. This result is not surprising given that motor and premotor regions are known to display adaptive interhemispheric interactions during motor sequence learning [1, 2], and particularly so when the skill is performed with the non-dominant hand [3-5]. We now discuss this important detail in the revised manuscript:

      Discussion (lines 348-353)

      “The whole-brain parcel-space decoder likely emphasized more stable activity patterns in contralateral frontoparietal regions that differed between individual finger movements [21,35], while the regional voxel-space decoder likely incorporated information related to adaptive interhemispheric interactions operating during motor sequence learning [32,36,37], particularly pertinent when the skill is performed with the non-dominant hand [38-40].”

      We now also include new control analyses that directly address the potential contribution of movement-related artefact to the results.  These changes are reported in the revised manuscript as follows:

      Results (lines 207-211):

      “An alternate decoder trained on ICA components labeled as movement or physiological artefacts (e.g. – head movement, ECG, eye movements and blinks; Figure 3 – figure supplement 3A, D) and removed from the original input feature set during the pre-processing stage approached chance-level performance (Figure 4 – figure supplement 3), indicating that the 4-class hybrid decoder results were not driven by task-related artefacts.”

      Results (lines 261-268):

      “As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41% ± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C). Task-related eye movements did not explain these results since an alternate 5-class hybrid decoder constructed from three eye movement features (gaze position at the KeyDown event, gaze position 200ms later, and peak eye movement velocity within this window; Figure 4 – figure supplement 3A) performed at chance levels (cross-validated test accuracy = 0.2181; Figure 4 – figure supplement 3B, C).”

      Discussion (Lines 362-368):

      “Task-related movements—which also express in lower frequency ranges—did not explain these results given the near chance-level performance of alternative decoders trained on (a) artefact-related ICA components removed during MEG preprocessing (Figure 3 – figure supplement 3A-C) and on (b) task-related eye movement features (Figure 4 – figure supplement 3B, C). This explanation is also inconsistent with the minimal average head motion of 1.159 mm (± 1.077 SD) across the MEG recording (Figure 3 – figure supplement 3D).”

      (2) Relatedly, the use of a wide time window (~200 ms) for a 250-330 ms typing speed makes it hard to pinpoint the changes underpinning learning, 

      The revised manuscript now includes analyses carried out with decoding time windows ranging from 50 to 250ms in duration. These additional results are now reported in:

      Results (lines 258-261):

      “The improved decoding accuracy is supported by greater differentiation in neural representations of the index finger keypresses performed at positions 1 and 5 of the sequence (Figure 4A), and by the trial-by-trial increase in 2-class decoding accuracy over early learning (Figure 4C) across different decoder window durations (Figure 4 – figure supplement 2).”

      Results (lines 310-312):

      “Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R<sup>2</sup> = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C).”

      Discussion (lines 382-385):

      “This was further supported by the progressive differentiation of neural representations of the index finger keypress (Figure 4A) and by the robust trial-by-trial increase in 2-class decoding accuracy across time windows ranging between 50 and 250ms (Figure 4C; Figure 4 – figure supplement 2).”

      Discussion (lines 408-9):

      “Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1).”

      (3) These concerns make it hard to conclude from their data that learning is mediated by "contextualisation" ---a key claim in the manuscript; 

      We believe the revised manuscript now addresses all concerns raised in Editor points 1 and 2.

      (4) The hybrid voxel + parcel space decoder ---a key contribution of the paper--- is not clearly explained; 

      We now provide additional details regarding the hybrid-space decoder approach in the following sections of the revised manuscript:

      Results (lines 158-172):

      “Next, given that the brain simultaneously processes information more efficiently across multiple spatial and temporal scales [28, 32, 33], we asked if the combination of lower resolution whole-brain and higher resolution regional brain activity patterns further improve keypress prediction accuracy. We constructed hybrid-space decoders (N = 1295 ± 20 features; Figure 3A) combining whole-brain parcel-space activity (n = 148 features; Figure 2B) with regional voxel-space activity from a data-driven subset of brain areas (n = 1147 ± 20 features; Figure 2D). This subset covers brain regions showing the highest regional voxel-space decoding performances (top regions across all subjects shown in Figure 2D; Methods – Hybrid Spatial Approach). 

      […]

      Note that while features from contralateral brain regions were more important for whole-brain decoding (in both parcel- and voxel-spaces), regional voxel-space decoders performed best for bilateral sensorimotor areas on average across the group. Thus, a multi-scale hybrid-space representation best characterizes the keypress action manifolds.”

      Results (lines 275-282):

      “We used a Euclidean distance measure to evaluate the differentiation of the neural representation manifold of the same action (i.e. - an index-finger keypress) executed within different local sequence contexts (i.e. - ordinal position 1 vs. ordinal position 5; Figure 5). To make these distance measures comparable across participants, a new set of classifiers was then trained with group-optimal parameters (i.e. – broadband hybrid-space MEG data with subsequent manifold extraction (Figure 3 – figure supplements 2) and LDA classifiers (Figure 3 – figure supplements 7) trained on 200ms duration windows aligned to the KeyDown event (see Methods, Figure 3 – figure supplements 5)).”

      Discussion (lines 341-360):

      “The initial phase of the study focused on optimizing the accuracy of decoding individual finger keypresses from MEG brain activity. Recent work showed that the brain simultaneously processes information more efficiently across multiple—rather than a single—spatial scale(s) [28, 32]. To this effect, we developed a novel hybrid-space approach designed to integrate neural representation dynamics over two different spatial scales: (1) whole-brain parcel-space (i.e. – spatial activity patterns across all cortical brain regions) and (2) regional voxel-space (i.e. – spatial activity patterns within select brain regions) activity. We found consistent spatial differences between whole-brain parcel-space feature importance (predominantly contralateral frontoparietal, Figure 2B) and regional voxel-space decoder accuracy (bilateral sensorimotor regions, Figure 2D). The whole-brain parcel-space decoder likely emphasized more stable activity patterns in contralateral frontoparietal regions that differed between individual finger movements [21, 35], while the regional voxel-space decoder likely incorporated information related to adaptive interhemispheric interactions operating during motor sequence learning [32, 36, 37], particularly pertinent when the skill is performed with the non-dominant hand [38-40]. The observation of increased cross-validated test accuracy (as shown in Figure 3 – Figure Supplement 6) indicates that the spatially overlapping information in parcel- and voxel-space time-series in the hybrid decoder was complementary, rather than redundant [41]. The hybrid-space decoder, which achieved an accuracy exceeding 90%—and robustly generalized to Day 2 across trained and untrained sequences—surpassed the performance of both parcel-space and voxel-space decoders and compared favorably to other neuroimaging-based finger movement decoding strategies [6, 24, 42-44].”

      Methods (lines 636-647):

      “Hybrid Spatial Approach. First, we evaluated the decoding performance of each individual brain region in accurately labeling finger keypresses from regional voxel-space (i.e. - all voxels within a brain region as defined by the Desikan-Killiany Atlas) activity. Brain regions were then ranked from 1 to 148 based on their decoding accuracy at the group level. In a stepwise manner, we then constructed a “hybrid-space” decoder by incrementally concatenating regional voxel-space activity of brain regions—starting with the top-ranked region—with whole-brain parcel-level features and assessed decoding accuracy. Subsequently, we added the regional voxel-space features of the second-ranked brain region and continued this process until decoding accuracy reached saturation. The optimal “hybrid-space” input feature set over the group included the 148 parcel-space features and regional voxel-space features from a total of 8 brain regions (bilateral superior frontal, middle frontal, pre-central and post-central; N = 1295 ± 20 features).”
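
      The stepwise procedure quoted above can be sketched in a few lines. This is an illustration only, not the authors' implementation: the function name, the `score_fn` callable (a stand-in for cross-validated decoding accuracy), and the `min_gain` saturation rule are all hypothetical, since the quoted Methods text does not state how saturation was defined.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def build_hybrid_feature_set(
    parcel_features: Sequence[str],
    regional_voxel_features: Dict[str, Sequence[str]],
    regional_accuracy: Dict[str, float],
    score_fn: Callable[[List[str]], float],
    min_gain: float = 0.005,
) -> Tuple[List[str], List[str], float]:
    """Greedy stepwise construction of a hybrid feature set.

    Regions are ranked by stand-alone decoding accuracy. Their voxel features
    are appended to the parcel-space features one region at a time, and the
    loop stops once the accuracy gain falls below `min_gain` (an assumed
    saturation criterion).
    """
    ranked = sorted(regional_accuracy, key=regional_accuracy.get, reverse=True)
    features = list(parcel_features)
    selected: List[str] = []
    best = score_fn(features)  # baseline: parcel-space features only
    for region in ranked:
        candidate = features + list(regional_voxel_features[region])
        score = score_fn(candidate)
        if score - best < min_gain:
            break  # accuracy has saturated; keep the current feature set
        features, best = candidate, score
        selected.append(region)
    return selected, features, best


# Toy data (hypothetical): three regions, two parcel-level features.
parcels = ["parcel_0", "parcel_1"]
voxels = {"A": ["A_v0", "A_v1"], "B": ["B_v0"], "C": ["C_v0"]}
solo_accuracy = {"A": 0.80, "B": 0.75, "C": 0.60}


def toy_score(features: List[str]) -> float:
    # Stand-in for cross-validated accuracy; stops improving at 5 features.
    return min(len(features), 5) / 10 + 0.4


selected, features, accuracy = build_hybrid_feature_set(
    parcels, voxels, solo_accuracy, toy_score
)
```

      In this toy run the third region adds no accuracy, so only the two top-ranked regions are retained. In a real analysis `score_fn` would wrap whatever cross-validated classifier is being evaluated.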

      (5) More controls are needed to show that their decoder approach is capturing a neural representation dedicated to context rather than independent representations of consecutive keypresses; 

      These controls have been implemented and are now reported in the manuscript:

      Results (lines 318-328):

      “Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69). These findings were not explained by behavioral changes of typing rhythm (t = -0.03, p = 0.976; Figure 5 – figure supplement 5), adjacent keypress transition times (R<sup>2</sup> = 0.00507, F[1,3202] = 16.3; Figure 5 – figure supplement 6), or overall typing speed (between-subject; R<sup>2</sup> = 0.028, p = 0.41; Figure 5 – figure supplement 7).”

      Results (lines 385-390):

      “Further, the 5-class classifier—which directly incorporated information about the sequence location context of each keypress into the decoding pipeline—improved decoding accuracy relative to the 4-class classifier (Figure 4C). Importantly, testing on Day 2 revealed specificity of this representational differentiation for the trained skill but not for the same keypresses performed during various unpracticed control sequences (Figure 5C).”

      Discussion (lines 408-423):

      “Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1). This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A). On the other hand, online contextualization did not predict learning (Figure 5 – figure supplement 3). Consistent with these results the average within-subject correlation between offline contextualization and micro-offline gains was significantly stronger than within-subject correlations between online contextualization and either micro-online or micro-offline gains (Figure 5 – figure supplement 4). 

      Offline contextualization was not driven by trial-by-trial behavioral differences, including typing rhythm (Figure 5 – figure supplement 5) and adjacent keypress transition times (Figure 5 – figure supplement 6) nor by between-subject differences in overall typing speed (Figure 5 – figure supplement 7)—ruling out a reliance on differences in the temporal overlap of keypresses. Importantly, offline contextualization documented on Day 1 stabilized once a performance plateau was reached (trials 11-36), and was retained on Day 2, documenting overnight consolidation of the differentiated neural representations.”

      (6) The need to show more convincingly that their data is not affected by head movements, e.g., by regressing out signal components that are correlated with the fiducial signal;  

      We now include data in Figure 3 – figure supplement 3D showing that head movement was minimal in all participants (mean of 1.159 mm ± 1.077 SD).  Further, the requested additional control analyses have been carried out and are reported in the revised manuscript:

      Results (lines 204-211):

      “Testing the keypress state (4-class) hybrid decoder performance on Day 1 after randomly shuffling keypress labels for held-out test data resulted in a performance drop approaching expected chance levels (22.12% ± SD 9.1%; Figure 3 – figure supplement 3C). An alternate decoder trained on ICA components labeled as movement or physiological artefacts (e.g. – head movement, ECG, eye movements and blinks; Figure 3 – figure supplement 3A, D) and removed from the original input feature set during the pre-processing stage approached chance-level performance (Figure 4 – figure supplement 3), indicating that the 4-class hybrid decoder results were not driven by task-related artefacts.”

      Results (lines 261-268):

      “As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41% ± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C). Task-related eye movements did not explain these results since an alternate 5-class hybrid decoder constructed from three eye movement features (gaze position at the KeyDown event, gaze position 200ms later, and peak eye movement velocity within this window; Figure 4 – figure supplement 3A) performed at chance levels (cross-validated test accuracy = 0.2181; Figure 4 – figure supplement 3B, C).”

      Discussion (Lines 362-368):

      “Task-related movements—which also express in lower frequency ranges—did not explain these results given the near chance-level performance of alternative decoders trained on (a) artefact-related ICA components removed during MEG preprocessing (Figure 3 – figure supplement 3A-C) and on (b) task-related eye movement features (Figure 4 – figure supplement 3B, C). This explanation is also inconsistent with the minimal average head motion of 1.159 mm (± 1.077 SD) across the MEG recording (Figure 3 – figure supplement 3D).”
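
      The shuffled-label control quoted in the Results passages above rests on a simple expectation: with balanced classes, a decoder scored against permuted labels should land near 1/k (25% for 4 classes, 20% for 5). The sketch below illustrates that logic with synthetic labels. The function names, the balanced toy data, and the number of permutations are assumptions for illustration and are not taken from the manuscript.

```python
import random
from typing import List, Sequence


def accuracy(predicted: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


def shuffled_label_accuracy(
    predicted: Sequence[int],
    labels: Sequence[int],
    n_permutations: int = 200,
    seed: int = 0,
) -> float:
    """Mean accuracy after repeatedly permuting the test labels."""
    rng = random.Random(seed)
    shuffled: List[int] = list(labels)
    scores = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the pairing between labels and data
        scores.append(accuracy(predicted, shuffled))
    return sum(scores) / len(scores)


# Synthetic example: 5 balanced classes and a hypothetical perfect decoder.
n_classes = 5
true_labels = [c for c in range(n_classes) for _ in range(100)]
predictions = list(true_labels)

observed = accuracy(predictions, true_labels)
chance = shuffled_label_accuracy(predictions, true_labels)
```

      Here `observed` is 1.0 by construction, while `chance` falls close to 0.20, the level against which the reported 5-class shuffled result is compared.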

      (7) The offline neural representation analysis as executed is a bit odd, since it seems to be based on comparing the last key press to the first key press of the next sequence, rather than focus on the inter-sequence interval

      While we previously evaluated replay of skill sequences during rest intervals, identification of how offline reactivation patterns of a single keypress state representation evolve with learning presents non-trivial challenges. First, replay events tend to occur in clusters with irregular temporal spacing as previously shown by our group and others.  Second, replay of experienced sequences is intermixed with replay of sequences that have never been experienced but are possible. Finally, and perhaps the most significant issue, replay is temporally compressed up to 20x with respect to the behavior [6]. That means our decoders would need to accurately evaluate spatial pattern changes related to individual keypresses over much smaller time windows (i.e. - less than 10 ms) than evaluated here. This future work, which is undoubtedly of great interest to our research group, will require more substantial tool development before we can address this question. We now articulate this future direction in the Discussion:

      Discussion (lines 423-427):

      “A possible neural mechanism supporting contextualization could be the emergence and stabilization of conjunctive “what–where” representations of procedural memories [64] with the corresponding modulation of neuronal population dynamics [65, 66] during early learning. Exploring the link between contextualization and neural replay could provide additional insights into this issue [6, 12, 13, 15].”

      (8) And this analysis could be confounded by the fact that they are comparing the last element in a sequence vs the first movement in a new one. 

      We have now addressed this control analysis in the revised manuscript:

      Results (Lines 310-316)

      “Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R<sup>2</sup> = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches).”

      Discussion (lines 408-416):

      “Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1). This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A). On the other hand, online contextualization did not predict learning (Figure 5 – figure supplement 3). Consistent with these results the average within-subject correlation between offline contextualization and micro-offline gains was significantly stronger than within-subject correlations between online contextualization and either micro-online or micro-offline gains (Figure 5 – figure supplement 4).”

      It also seems to be the case that many analyses suggested by the reviewers in the first round of revisions that could have helped strengthen the manuscript have not been included (they are only in the rebuttal). Moreover, some of the control analyses mentioned in the rebuttal seem not to be described anywhere, neither in the manuscript, nor in the rebuttal itself; please double check that. 

      All suggested analyses carried out and mentioned are now in the revised manuscript.

      eLife Assessment 

      This valuable study investigates how the neural representation of individual finger movements changes during the early period of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide incomplete evidence of an early, swift change in the brain regions correlated with sequence learning…

      We have now included all the requested control analyses supporting “an early, swift change in the brain regions correlated with sequence learning”:

      The addition of more control analyses to rule out that head movement artefacts influence the findings, 

      We now include data in Figure 3 – figure supplement 3D showing that head movement was minimal in all participants (mean of 1.159 mm ± 1.077 SD).  Further, we have implemented the requested additional control analyses addressing this issue:

      Results (lines 207-211):

      “An alternate decoder trained on ICA components labeled as movement or physiological artefacts (e.g. – head movement, ECG, eye movements and blinks; Figure 3 – figure supplement 3A, D) and removed from the original input feature set during the pre-processing stage approached chance-level performance (Figure 4 – figure supplement 3), indicating that the 4-class hybrid decoder results were not driven by task-related artefacts.”

      Results (lines 261-268):

      “As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41% ± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C). Task-related eye movements did not explain these results since an alternate 5-class hybrid decoder constructed from three eye movement features (gaze position at the KeyDown event, gaze position 200ms later, and peak eye movement velocity within this window; Figure 4 – figure supplement 3A) performed at chance levels (cross-validated test accuracy = 0.2181; Figure 4 – figure supplement 3B, C).”

      Discussion (Lines 362-368):

      “Task-related movements—which also express in lower frequency ranges—did not explain these results given the near chance-level performance of alternative decoders trained on (a) artefact-related ICA components removed during MEG preprocessing (Figure 3 – figure supplement 3A-C) and on (b) task-related eye movement features (Figure 4 – figure supplement 3B, C). This explanation is also inconsistent with the minimal average head motion of 1.159 mm (± 1.077 SD) across the MEG recording (Figure 3 – figure supplement 3D).”

      and to further explain the proposal of offline contextualization during short rest periods as the basis for improvement performance would strengthen the manuscript. 

      We have edited the manuscript to clarify that the degree of representational differentiation (contextualization) parallels skill learning.  We have no evidence at this point to indicate that “offline contextualization during short rest periods is the basis for improvement in performance”.  The following areas of the revised manuscript now clarify this point:  

      Summary (Lines 455-458):

      “In summary, individual sequence action representations contextualize during early learning of a new skill and the degree of differentiation parallels skill gains. Differentiation of the neural representations developed during rest intervals of early learning to a larger extent than during practice in parallel with rapid consolidation of skill.”

      Additional control analyses are also provided supporting a link between offline contextualization and early learning:

      Results (lines 302-318):

      “The Euclidean distance between neural representations of Index<sub>OP1</sub> (i.e. - index finger keypress at ordinal position 1 of the sequence) and Index<sub>OP5</sub> (i.e. - index finger keypress at ordinal position 5 of the sequence) increased progressively during early learning (Figure 5A)—predominantly during rest intervals (offline contextualization) rather than during practice (online) (t = 4.84, p < 0.001, df = 25, Cohen's d = 1.2; Figure 5B; Figure 5 – figure supplement 1A). An alternative online contextualization determination equaling the time interval between online and offline comparisons (Trial-based; 10 seconds between Index<sub>OP1</sub> and Index<sub>OP5</sub> observations in both cases) rendered a similar result (Figure 5 – figure supplement 2B).

      Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R<sup>2</sup> = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches). Conversely, online contextualization (using either measurement approach) did not explain early online learning gains (i.e. – Figure 5 – figure supplement 3).”  
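
      The distance measure described in the quoted Results can be illustrated with a minimal sketch. The data layout and both definitions below are assumptions made for illustration: online contextualization is taken as the mean within-sequence distance between Index<sub>OP1</sub> and Index<sub>OP5</sub> feature vectors in one trial, and offline contextualization as the distance between the last Index<sub>OP5</sub> vector of a trial and the first Index<sub>OP1</sub> vector of the next trial. The manuscript's Methods should be consulted for the exact definitions.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
# Hypothetical layout: one feature vector per sequence iteration in a trial.
Trial = Dict[str, List[Vector]]


def online_contextualization(trial: Trial) -> float:
    """Mean OP1-to-OP5 Euclidean distance within the sequences of one trial."""
    distances = [math.dist(a, b) for a, b in zip(trial["op1"], trial["op5"])]
    return sum(distances) / len(distances)


def offline_contextualization(trial: Trial, next_trial: Trial) -> float:
    """Distance across the rest interval separating two consecutive trials."""
    return math.dist(trial["op5"][-1], next_trial["op1"][0])


# Toy two-dimensional feature vectors (synthetic values).
trial_1: Trial = {"op1": [(0.0, 0.0), (0.0, 0.0)], "op5": [(0.0, 1.0), (0.0, 1.0)]}
trial_2: Trial = {"op1": [(3.0, 5.0), (3.0, 5.0)], "op5": [(3.0, 6.0), (3.0, 6.0)]}

online = online_contextualization(trial_1)
offline = offline_contextualization(trial_1, trial_2)
```

      With these synthetic values the within-trial distance is 1.0 and the across-rest distance is 5.0, mirroring (in exaggerated form) the pattern of larger offline than online changes that the response reports.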

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements and accomplish an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain machine interfaces - one now can decode individual elements within a sequence with high precision, but these representations are not static but develop over the course of learning. 

      Strengths: 

      The work follows a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most of the gains in behaviour (ie speed of finger movements) occur in these so-called micro-offline rest periods. 

      The authors use a range of new decoding techniques, and exhaustively interrogate their data in different ways, using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed, but when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%. 

      Weaknesses:  

      A formal analysis and quantification of how head movement may have contributed to the results should be included in the paper or supplemental material. The type of correlated head movements coming from vigorous key presses aren't necessarily visible to the naked eye, and even if arms etc are restricted, this will not preclude shoulder, neck or head movement necessarily; if ICA was conducted, for example, the authors are in the position to show the components that relate to such movement; but eye-balling the data would not seem sufficient. The related issue of eye movements is addressed via classifier analysis. A formal analysis which directly accounts for finger/eye movements in the same analysis as the main result (ie any variance related to these factors) should be presented.

      We now present additional data related to head (Figure 3 – figure supplement 3; note that average measured head movement across participants was 1.159 mm ± 1.077 SD) and eye movements (Figure 4 – figure supplement 3) and have implemented the requested control analyses addressing this issue. They are reported in the revised manuscript in the following locations: Results (lines 207-211), Results (lines 261-268), Discussion (Lines 362-368).

      This reviewer recommends inclusion of a formal analysis that the intra-vs inter parcels are indeed completely independent. For example, the authors state that the inter-parcel features reflect "lower spatially resolved whole-brain activity patterns or global brain dynamics". A formal quantitative demonstration that the signals indeed show "complete independence" (as claimed by the authors) and are orthogonal would be helpful.

      Please note that we never claim in the manuscript that the parcel-space and regional voxel-space features show “complete independence”.  More importantly, input feature orthogonality is not a requirement for the machine learning-based decoding methods utilized in the present study while non-redundancy is [7] (a requirement satisfied by our data, see below). Finally, our results show that the hybrid space decoder out-performed all other methods even after input features were fully orthogonalized with LDA (the procedure used in all contextualization analyses) or PCA dimensionality reduction procedures prior to the classification step (Figure 3 – figure supplement 2).

      Relevant to this issue, please note that if spatially overlapping parcel- and voxel-space time-series only provided redundant information, inclusion of both as input features should increase model over-fitting to the training dataset and decrease overall cross-validated test accuracy [8]. In the present study however, we see the opposite effect on decoder performance. First, Figure 3 – figure supplement 1 & 2 clearly show that decoders constructed from hybrid-space features outperform the other input feature (sensor-, whole-brain parcel- and whole-brain voxel-) spaces in every case (e.g. – wideband, all narrowband frequency ranges, and even after the input space is fully orthogonalized through dimensionality reduction procedures prior to the decoding step). Furthermore, Figure 3 – figure supplement 6 shows that hybrid-space decoder performance suffers when parcel time-series that spatially overlap with the included regional voxel-spaces are removed from the input feature set. 

      We state in the Discussion (lines 353-356)

      “The observation of increased cross-validated test accuracy (as shown in Figure 3 – Figure Supplement 6) indicates that the spatially overlapping information in parcel- and voxel-space time-series in the hybrid decoder was complementary, rather than redundant [41].”

      To gain insight into the complementary information contributed by the two spatial scales to the hybrid-space decoder, we first independently computed the matrix rank for whole-brain parcel- and voxel-space input features for each participant (shown in Author response image 1). The results indicate that whole-brain parcel-space input features are full rank (rank = 148) for all participants (i.e. - MEG activity is orthogonal between all parcels). The matrix rank of voxel-space input features (rank = 267 ± 17 SD), exceeded the parcel-space rank for all participants and approached the number of useable MEG sensor channels (n = 272). Thus, voxel-space features provide both additional and complementary information to representations at the parcel-space scale.  

      Author response image 1.

      Matrix rank computed for whole-brain parcel- and voxel-space time-series in individual subjects across the training run. The results indicate that whole-brain parcel-space input features are full rank (rank = 148) for all participants (i.e. - MEG activity is orthogonal between all parcels). The matrix rank of voxel-space input features (rank = 267 ± 17 SD), on the other hand, approached the number of useable MEG sensor channels (n = 272). Although not full rank, the voxel-space rank exceeded the parcel-space rank for all participants. Thus, some voxel-space features provide additional orthogonal information to representations at the parcel-space scale.  An expression of this is shown in the correlation distribution between parcel and constituent voxel time-series in Figure 2—figure Supplement 2.
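      As an illustration only (this is not the analysis code used in the study), the rank comparison described above can be sketched with synthetic data. The sketch assumes that parcel- and voxel-space time-series are linear projections of the sensor recordings, so that neither rank can exceed the sensor count. The function name `feature_rank`, the random projection matrices, and the voxel and sample counts are hypothetical; only the parcel count (148) and sensor count (272) come from the text.

```python
import numpy as np


def feature_rank(timeseries: np.ndarray) -> int:
    """Return the numerical rank of a (features x samples) matrix."""
    return int(np.linalg.matrix_rank(timeseries))


rng = np.random.default_rng(0)

n_sensors = 272   # useable MEG sensor channels (from the text)
n_parcels = 148   # whole-brain parcels (from the text)
n_voxels = 400    # hypothetical voxel count, chosen only to exceed n_sensors
n_samples = 1000  # hypothetical number of time samples

# Synthetic sensor recordings (sensors x samples).
sensors = rng.standard_normal((n_sensors, n_samples))

# Assumption: source time-series are linear projections of sensor data.
# Random matrices stand in for the real inverse operators.
parcel_ts = rng.standard_normal((n_parcels, n_sensors)) @ sensors
voxel_ts = rng.standard_normal((n_voxels, n_sensors)) @ sensors

parcel_rank = feature_rank(parcel_ts)  # limited by the number of parcels
voxel_rank = feature_rank(voxel_ts)    # limited by the number of sensors

print(f"parcel-space rank: {parcel_rank} of {n_parcels} features")
print(f"voxel-space rank:  {voxel_rank} of {n_voxels} features")
```

      In this toy setting the parcel-space matrix is full rank, while the voxel-space matrix has more features than independent dimensions and its rank is capped by the sensor count, mirroring the pattern reported in Author response image 1.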

      Figure 2—figure Supplement 2 in the revised manuscript now shows that the degree of dependence between the two spatial scales varies over the regional voxel-space. That is, some voxels within a given parcel correlate strongly with the time-series of the parcel they belong to, while others do not. This finding is consistent with a documented increase in correlational structure of neural activity across spatial scales that does not reflect perfect dependency or orthogonality [9]. Notably, the regional voxel-spaces included in the hybrid-space decoder are significantly less correlated with the averaged parcel-space time-series than excluded voxels. We now point readers to this new figure in the results.

      Taken together, these results indicate that the multi-scale information in the hybrid feature set is complementary rather than orthogonal.  This is consistent with the idea that hybrid-space features better represent multi-scale temporospatial dynamics reported to be a fundamental characteristic of how the brain stores and adapts memories, and generates behavior across species [9].

      Reviewer #2 (Public review): 

      Summary: 

      The current paper consists of two parts. The first part is the rigorous feature optimization of the MEG signal to decode individual finger identity performed in a sequence (4-1-3-2-4; 1~4 corresponds to little~index fingers of the left hand). By optimizing various parameters for the MEG signal, in terms of (i) reconstructed source activity in voxel- and parcel-level resolution and their combination, (ii) frequency bands, and (iii) time window relative to press onset for each finger movement, as well as the choice of decoders, the resultant "hybrid decoder" achieved extremely high decoding accuracy (~95%). This part seems driven almost by pure engineering interest in gaining as high decoding accuracy as possible. 

      In the second part of the paper, armed with the successful 'hybrid decoder,' the authors asked more scientific questions about how neural representation of individual finger movement that is embedded in a sequence, changes during a very early period of skill learning and whether and how such representational change can predict skill learning. They assessed the difference in MEG feature patterns between the first and the last press 4 in sequence 41324 at each training trial and found that the pattern differentiation progressively increased over the course of early learning trials. Additionally, they found that this pattern differentiation specifically occurred during the rest period rather than during the practice trial. With a significant correlation between the trial-by-trial profile of this pattern differentiation and that for accumulation of offline learning, the authors argue that such "contextualization" of finger movement in a sequence (e.g., what-where association) underlies the early improvement of sequential skill. This is an important and timely topic for the field of motor learning and beyond. 

      Strengths: 

      Each part has its own strength. For the first part, the use of temporally rich neural information (MEG signal) has a significant advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. For the second part, the finding of the early "contextualization" of the finger movement in a sequence and its correlation to early (offline) skill improvement is interesting and important. The comparison between "online" and "offline" pattern distance is a neat idea. 

      Weaknesses: 

      Despite the strengths raised, the specific goal for each part of the current paper, i.e., achieving high decoding accuracy and answering the scientific question of early skill learning, seems not to harmonize with each other very well. In short, the current approach, which is solely optimized for achieving high decoding accuracy, does not provide enough support and interpretability for the paper's interesting scientific claim. This reminds me of the accuracy-explainability tradeoff in machine learning studies (e.g., Linardatos et al., 2020). More details follow. 

      There are a number of different neural processes occurring before and after a key press, such as planning of upcoming movement and ahead around premotor/parietal cortices, motor command generation in primary motor cortex, sensory feedback related processes in sensory cortices, and performance monitoring/evaluation around the prefrontal area. Some of these may show learning-dependent change and others may not.  

      In this paper, the focus as stated in the Introduction was to evaluate “the millisecond-level differentiation of discrete action representations during learning”, a proposal that first required the development of more accurate computational tools.  Our first step, reported here, was to develop that tool. With that in hand, we then proceeded to test if neural representations differentiated during early skill learning. Our results showed they did.  Addressing the question the Reviewer asks is part of exciting future work, now possible based on the results presented in this paper.  We acknowledge this issue in the revised Discussion:  

      Discussion (Lines 428-434):

      “In this study, classifiers were trained on MEG activity recorded during or immediately after each keypress, emphasizing neural representations related to action execution, memory consolidation and recall over those related to planning. An important direction for future research is determining whether separate decoders can be developed to distinguish the representations or networks separately supporting these processes. Ongoing work in our lab is addressing this question. The present accuracy results across varied decoding window durations and alignment with each keypress action support the feasibility of this approach (Figure 3—figure supplement 5).”

      Given the use of whole-brain MEG features with a wide time window (up to ~200 ms after each key press) under the situation of 3~4 Hz (i.e., 250~330 ms press interval) typing speed, these different processes in different brain regions could have contributed to the expression of the "contextualization," making it difficult to interpret what really contributed to the "contextualization" and whether it is learning related. Critically, the majority of data used for decoder training has the chance of such potential overlap of signal, as the typing speed almost reached a plateau already at the end of the 11th trial and stayed until the 36th trial. Thus, the decoder could have relied on such overlapping features related to the future presses. If that is the case, a gradual increase in "contextualization" (pattern separation) during earlier trials makes sense, simply because the temporal overlap of the MEG feature was insufficient for the earlier trials due to slower typing speed.  Several direct ways to address the above concern, at the cost of decoding accuracy to some degree, would be either using the shorter temporal window for the MEG feature or training the model with the early learning period data only (trials 1 through 11) to see if the main results are unaffected would be some example. 

      We now include additional analyses carried out with decoding time windows ranging from 50 to 250ms in duration, which have been added to the revised manuscript as follows: 

      Results (lines 258-261):

      “The improved decoding accuracy is supported by greater differentiation in neural representations of the index finger keypresses performed at positions 1 and 5 of the sequence (Figure 4A), and by the trial-by-trial increase in 2-class decoding accuracy over early learning (Figure 4C) across different decoder window durations (Figure 4 – figure supplement 2).”

      Results (lines 310-312):

      “Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R<sup>2</sup> = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C).”

      Discussion (lines 382-385):

      “This was further supported by the progressive differentiation of neural representations of the index finger keypress (Figure 4A) and by the robust trial-by-trial increase in 2-class decoding accuracy across time windows ranging between 50 and 250ms (Figure 4C; Figure 4 – figure supplement 2).”

      Discussion (lines 408-9):

      “Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1).”

      Several new control analyses are also provided addressing the question of overlapping keypresses:

      Reviewer #3 (Public review):

      Summary: 

      One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multi-scale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements.

      Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training and correlates with a performance metric which the authors interpret as an indicator of offline learning. 

      Strengths: 

      A strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of concurrent distribution of neural coding across local circuits as well as large-scale networks. A further strength of the study is the large number of tested dimension reduction techniques and classifiers.

      Weaknesses: 

      A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, which partly arise from the experimental design (mainly the use of a single sequence) and which are described below, question the neurobiological implications proposed by the authors and provide a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence, described below, casts doubt on this assumption.  

      Please see below for detailed responses to each of these points.

      Specifically: The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence and test the classifier on other sequences that require the same movements, but in different positions (Kornysheva et al., Neuron 2019). In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4).

      A crucial difference between our present study and the elegant study from Kornysheva et al. (2019) in Neuron highlighted by the Reviewer is that while ours is a learning study, the Kornysheva et al. study is not. Kornysheva et al. included an initial separate behavioral training session (i.e. – performed outside of the MEG) during which participants learned associations between fractal image patterns and different keypress sequences. Then in a separate, later MEG session—after the stimulus-response associations had been already learned in the first session—participants were tasked with recalling the learned sequences in response to a presented visual cue (i.e. – the paired fractal pattern). 

      Our rationale for not including multiple sequences in the same Day 1 training session of our study design was that it would lead to prominent interference effects, as widely reported in the literature [10-12].  Thus, while we had to take the issue of interference into consideration for our design, the Kornysheva et al. study did not. While Kornysheva et al. aimed to “dissociate ordinal position information from information about the moving effectors”, we tested various untrained sequences on Day 2 allowing us to determine that the contextualization result was specific to the trained sequence. By using this approach, we avoided interference effects on the learning of the primary skill caused by simultaneous acquisition of a second skill.

      The revised manuscript states our findings related to the Day 2 Control data in the following locations:

      Results (lines 117-122):

      “On the following day, participants were retested on performance of the same sequence (4-1-3-2-4) over 9 trials (Day 2 Retest), as well as on the single-trial performance of 9 different untrained control sequences (Day 2 Controls: 2-1-3-4-2, 4-2-4-3-1, 3-4-2-3-1, 1-4-3-4-2, 3-2-4-3-1, 1-4-2-3-1, 3-2-4-2-1, 3-2-1-4-2, and 4-2-3-1-4). As expected, an upward shift in performance of the trained sequence (0.68 ± SD 0.56 keypresses/s; t = 7.21, p < 0.001) was observed during Day 2 Retest, indicative of an overnight skill consolidation effect (Figure 1 – figure supplement 1A).”

      Results (lines 212-219):

      “Utilizing the highest performing decoders that included LDA-based manifold extraction, we assessed the robustness of hybrid-space decoding over multiple sessions by applying it to data collected on the following day during the Day 2 Retest (9-trial retest of the trained sequence) and Day 2 Control (single-trial performance of 9 different untrained sequences) blocks. The decoding accuracy for Day 2 MEG data remained high (87.11% ± SD 8.54% for the trained sequence during Retest, and 79.44% ± SD 5.54% for the untrained Control sequences; Figure 3 – figure supplement 4). Thus, index finger classifiers constructed using the hybrid decoding approach robustly generalized from Day 1 to Day 2 across trained and untrained keypress sequences.”

      Results (lines 269-273):

      “On Day 2, incorporating contextual information into the hybrid-space decoder enhanced classification accuracy for the trained sequence only (improving from 87.11% for 4-class to 90.22% for 5-class), while performing at or below-chance levels for the Control sequences (≤ 30.22% ± SD 0.44%). Thus, the accuracy improvements resulting from inclusion of contextual information in the decoding framework were specific for the trained skill sequence.”

      As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 4 - supplement 2 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the keypress, up to at least +/-100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress. 

      Currently, the manuscript provides little evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context. 

      During the review process, the authors pointed out that a "mixing" of temporally overlapping information from consecutive keypresses, as described above, should result in systematic misclassifications and therefore be detectable in the confusion matrices in Figures 3C and 4B, which indeed do not provide any evidence that consecutive keypresses are systematically confused. However, such absence of evidence (of systematic misclassification) should be interpreted with caution, and, of course, provides no evidence of absence. The authors also pointed out that such "mixing" would hamper the discriminability of the two ordinal positions of the index finger, given that "ordinal position 5" is systematically followed by "ordinal position 1". This is a valid point which, however, cannot rule out that "contextualization" nevertheless reflects the described "mixing".

      The revised manuscript contains several control analyses which rule out this potential confound.

      Results (lines 318-328):

      “Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69). These findings were not explained by behavioral changes of typing rhythm (t = -0.03, p = 0.976; Figure 5 – figure supplement 5), adjacent keypress transition times (R<sup>2</sup> = 0.00507, F[1,3202] = 16.3; Figure 5 – figure supplement 6), or overall typing speed (between-subject; R<sup>2</sup> = 0.028, p = 0.41; Figure 5 – figure supplement 7).”

      Results (lines 385-390):

      “Further, the 5-class classifier—which directly incorporated information about the sequence location context of each keypress into the decoding pipeline—improved decoding accuracy relative to the 4-class classifier (Figure 4C). Importantly, testing on Day 2 revealed specificity of this representational differentiation for the trained skill but not for the same keypresses performed during various unpracticed control sequences (Figure 5C).”

      Discussion (lines 408-423):

      “Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1). This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A). On the other hand, online contextualization did not predict learning (Figure 5 – figure supplement 3). Consistent with these results the average within-subject correlation between offline contextualization and micro-offline gains was significantly stronger than within subject correlations between online contextualization and either micro-online or micro-offline gains (Figure 5 – figure supplement 4). 

      Offline contextualization was not driven by trial-by-trial behavioral differences, including typing rhythm (Figure 5 – figure supplement 5) and adjacent keypress transition times (Figure 5 – figure supplement 6) nor by between-subject differences in overall typing speed (Figure 5 – figure supplement 7)—ruling out a reliance on differences in the temporal overlap of keypresses. Importantly, offline contextualization documented on Day 1 stabilized once a performance plateau was reached (trials 11-36), and was retained on Day 2, documenting overnight consolidation of the differentiated neural representations.”

      During the review process, the authors responded to my concern that training of a single sequence introduces the potential confound of "mixing" described above, which could have been avoided by training on several sequences, as in Kornysheva et al. (Neuron 2019), by arguing that Day 2 in their study did include control sequences. However, the authors' findings regarding these control sequences are fundamentally different from the findings in Kornysheva et al. (2019), and do not provide any indication of effector-independent ordinal information in the described contextualization - but, actually, the contrary. In Kornysheva et al. (Neuron 2019), ordinal, or positional, information refers purely to the rank of a movement in a sequence. In line with the idea of competitive queuing, Kornysheva et al. (2019) have shown that humans prepare for a motor sequence via a simultaneous representation of several of the upcoming movements, weighted by their rank in the sequence. Importantly, they could show that this gradient carries information that is largely devoid of information about the order of specific effectors involved in a sequence, or their timing, in line with competitive queuing. They showed this by training a classifier to discriminate between the five consecutive movements that constituted one specific sequence of finger movements (five classes: 1st, 2nd, 3rd, 4th, 5th movement in the sequence) and then testing whether that classifier could identify the rank (1st, 2nd, 3rd, etc) of movements in another sequence, in which the fingers moved in a different order, and with different timings. Importantly, this approach demonstrated that the graded representations observed during preparation were largely maintained after this cross decoding, indicating that the sequence was represented via ordinal position information that was largely devoid of information about the specific effectors or timings involved in sequence execution. 
      This result differs completely from the findings in the current manuscript. Dash et al. report a drop in detected ordinal position information (degree of contextualization in figure 5C) when testing for contextualization in their novel, untrained sequences on Day 2, indicating that context and ordinal information as defined in Dash et al. is not at all devoid of information about the specific effectors involved in a sequence. In this regard, a main concern in my public review, as well as the second reviewer's public review, is that Dash et al. cannot tell apart, by design, whether there is truly contextualization in the neural representation of a sequence (which they claim), or whether their results regarding "contextualization" are explained by what they call "mixing" in their author response, i.e., an overlap of representations of consecutive movements, as suggested as an alternative explanation by Reviewer 2 and myself.

      Again, as stated in response to a related comment by the Reviewer above, it is not surprising that our results differ from the study by Kornysheva et al. (2019). A crucial difference between the studies that the Reviewer fails to recognize is that while ours is a learning study, the Kornysheva et al. study is not. Our rationale for not including multiple sequences in the same Day 1 training session of our study design was that it would lead to prominent interference effects, as widely reported in the literature [10-12].  Thus, while we had to take the issue of interference into consideration for our design, the Kornysheva et al. study did not, since it was not concerned with learning dynamics. The strength of the elegant Kornysheva study highlighted by the Reviewer (that the pre-planned sequence queuing gradient of sequence actions was independent of the effectors or timings used) is precisely due to the fact that participants were selecting between sequence options that had been previously, and equivalently, learned. The decoders in the Kornysheva study were trained, by design, to classify effector- and timing-independent sequence position information, so it is not surprising that this is the information they reflect.

      The questions asked in our study were different: 1) Do the neural representations of the same sequence action executed in different skill (ordinal sequence) locations differentiate (contextualize) during early learning?  and 2) Is the observed contextualization specific to the learned sequence? Thus, while Kornysheva et al. aimed to “dissociate ordinal position information from information about the moving effectors”, we tested various untrained sequences on Day 2 allowing us to determine that the contextualization result was specific to the trained sequence. By using this approach, we avoided interference effects on the learning of the primary skill caused by simultaneous acquisition of a second skill.

      Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. The authors seem to argue that their regression analysis in Figure 5 - figure supplement 3 speaks against any influence of tapping speed on "ordinal coding" (even though that argument is not made explicitly in the manuscript). However, Figure 5 - figure supplement 3 shows inter-individual differences in a between-subject analysis (across trials, as in panel A, or separately for each trial, as in panel B), and, therefore, says little about the within-subject dynamics of "ordinal coding" across the experiment. A regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject, or at a group-level, after averaging across subjects) could address this issue. Given the highly similar dynamics of "ordinal coding" on the one hand (Figure 4C), and tapping speed on the other hand (Figure 1B), I would expect a strong relationship between the two in the suggested within-subject (or group-level) regression.

      The aim of the between-subject regression analysis presented in the Results (see below) and in Figure 5—figure supplement 7 (previously Figure 5—figure supplement 3) of the revised manuscript, was to rule out a general effect of tapping speed on the magnitude of contextualization observed. If temporal overlap of neural representations was driving their differentiation, then participants typing at higher speeds should also show greater contextualization scores. We made the decision to use a between-subject analysis to address this issue since within-subject skill speed variance was rather small over most of the training session. 

      The Reviewer’s request that we additionally carry out a “regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject, or at a group-level, after averaging across subjects)” is essentially the same as the request of Reviewer 2 above. That request was to perform a modified simple linear regression analysis where the predictor is the sum of the 4-4 and 4-1 transition times, since these transitions are where any temporal overlaps of neural representations would occur.  A new Figure 5 – figure supplement 6 in the revised manuscript includes a scatter plot showing the sum of adjacent index finger keypress transition times (i.e. – the 4-4 transition at the conclusion of one sequence iteration and the 4-1 transition at the beginning of the next sequence iteration) versus online contextualization distances measured during practice trials. Both the keypress transition times and online contextualization scores were z-score normalized within individual subjects, and then concatenated into a single data superset. As is clear from the figure, the regression analysis showed a very weak linear relationship between the two (R<sup>2</sup> = 0.00507, F[1,3202] = 16.3). Thus, contextualization score magnitudes do not reflect the amount of overlap between adjacent keypresses when assessed either within- or between-subject.
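      For readers who wish to reproduce the logic of this control analysis, the following is a minimal sketch (not the code used in the study) of within-subject z-scoring followed by a pooled simple linear regression. It assumes ordinary least squares with a single predictor, for which R<sup>2</sup> equals the squared Pearson correlation and F = R<sup>2</sup> / (1 − R<sup>2</sup>) × df<sub>residual</sub>. All function and variable names are hypothetical.

```python
import numpy as np


def zscore_within(values, subject_ids):
    """Z-score values separately within each subject (sample SD, ddof=1)."""
    values = np.asarray(values, dtype=float)
    subject_ids = np.asarray(subject_ids)
    out = np.empty_like(values)
    for subject in np.unique(subject_ids):
        mask = subject_ids == subject
        out[mask] = (values[mask] - values[mask].mean()) / values[mask].std(ddof=1)
    return out


def f_from_r2(r2, df_resid):
    """F statistic of a one-predictor OLS regression, given R^2 and residual df."""
    return r2 / (1.0 - r2) * df_resid


def pooled_regression(x, y):
    """Return (R^2, F, residual df) for a simple linear regression of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]   # Pearson correlation
    r2 = r ** 2                   # equals R^2 for a single predictor
    df_resid = x.size - 2         # n minus slope and intercept
    return r2, f_from_r2(r2, df_resid), df_resid


# Consistency check on the values reported above.
reported_f = f_from_r2(0.00507, 3202)
print(f"F implied by R^2 = 0.00507 and df = 3202: {reported_f:.1f}")
```

      Under these assumptions, the reported R<sup>2</sup> and degrees of freedom imply an F value that matches the reported statistic to the stated precision, which illustrates how a statistically detectable slope can coexist with a negligible share of explained variance when the pooled sample is large.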

      The revised manuscript now states:

      Results (lines 318-328):

      “Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69). These findings were not explained by behavioral changes of typing rhythm (t = -0.03, p = 0.976; Figure 5 – figure supplement 5), adjacent keypress transition times (R<sup>2</sup> = 0.00507, F[1,3202] = 16.3; Figure 5 – figure supplement 6), or overall typing speed (between-subject; R<sup>2</sup> = 0.028, p = 0.41; Figure 5 – figure supplement 7).”
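      As a hedged illustration of the first test quoted above (whether the average within-subject correlation differs from zero), the sketch below computes a one-sample t statistic and Cohen's d from a vector of per-subject correlation values. It assumes d is defined as the sample mean divided by the sample standard deviation, in which case t = d × √n. The function names are hypothetical, and the comparisons between correlation types quoted above may rely on a different effect-size definition, so this relation is not expected to hold for them.

```python
import math

import numpy as np


def one_sample_t_and_d(values):
    """One-sample t statistic (against zero), Cohen's d, and degrees of freedom."""
    v = np.asarray(values, dtype=float)
    n = v.size
    d = v.mean() / v.std(ddof=1)  # assumed definition: mean / sample SD
    t = d * math.sqrt(n)          # equivalent to mean / (SD / sqrt(n))
    return t, d, n - 1


def d_from_t(t, df):
    """Cohen's d implied by a one-sample t statistic with df = n - 1."""
    return t / math.sqrt(df + 1)


# Consistency check on the first reported test (t = 3.87, df = 25).
implied_d = d_from_t(3.87, 25)
print(f"Cohen's d implied by t = 3.87, df = 25: {implied_d:.2f}")
```

      Under the stated assumption, the effect size implied by the first reported t statistic agrees with the reported Cohen's d to two decimal places.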

      Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. To draw that conclusion, the physical context should remain stable (or any changes to the physical context should be controlled for). 

      The revised manuscript now addresses specifically the question of mixing of temporally overlapping information:

      Results (Lines 310-328)

      “Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R<sup>2</sup> = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches). Conversely, online contextualization (using either measurement approach) did not explain early online learning gains (i.e. – Figure 5 – figure supplement 3). Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69). These findings were not explained by behavioral changes of typing rhythm (t = -0.03, p = 0.976; Figure 5 – figure supplement 5), adjacent keypress transition times (R<sup>2</sup> = 0.00507, F[1,3202] = 16.3; Figure 5 – figure supplement 6), or overall typing speed (between-subject; R<sup>2</sup> = 0.028, p = 0.41; Figure 5 – figure supplement 7).”

      Discussion (Lines 417-423)

      “Offline contextualization was not driven by trial-by-trial behavioral differences, including typing rhythm (Figure 5 – figure supplement 5) and adjacent keypress transition times (Figure 5 – figure supplement 6) nor by between-subject differences in overall typing speed (Figure 5 – figure supplement 7)—ruling out a reliance on differences in the temporal overlap of keypresses. Importantly, offline contextualization documented on Day 1 stabilized once a performance plateau was reached (trials 11-36), and was retained on Day 2, documenting overnight consolidation of the differentiated neural representations.”

      A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses. Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023).  

      The revised manuscript now addresses specifically the question of pre-planning:

      Results (lines 310-318):

      “Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R<sup>2</sup> = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches). Conversely, online contextualization (using either measurement approach) did not explain early online learning gains (i.e. – Figure 5 – figure supplement 3).”

      Discussion (lines 408-416):

      “Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1). This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A). On the other hand, online contextualization did not predict learning (Figure 5 – figure supplement 3). Consistent with these results the average within-subject correlation between offline contextualization and micro-offline gains was significantly stronger than within-subject correlations between online contextualization and either micro-online or micro-offline gains (Figure 5 – figure supplement 4).”

      A further complication in interpreting the results stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen. It is not clear why the authors introduced this complicating visual feedback in their task, besides consistency with their previous studies. The resulting systematic link between the pattern of visual stimulation (the number of asterisks on the screen) and the ordinal position of a keypress makes the interpretation of "contextual information" that differentiates between ordinal positions difficult. During the review process, the authors reported a confusion matrix from a classification of asterisk position based on eye tracking data recorded during the task and concluded that the classifier performed at chance level and gaze was, thus, apparently not biased by the visual stimulation. However, the confusion matrix showed a huge bias that was difficult to interpret (a very strong tendency to predict one of the five asterisk positions, despite chance-level performance). Without including additional information for this analysis (or simply the gaze position as a function of the number of asterisks on the screen) in the manuscript, this important control analysis cannot be properly assessed, and is not available to the public.

      We now include the gaze position data requested by the Reviewer alongside the confusion matrix results in Figure 4 – figure supplement 3.

      Results (lines 207-211):

      “An alternate decoder trained on ICA components labeled as movement or physiological artefacts (e.g. – head movement, ECG, eye movements and blinks; Figure 3 – figure supplement 3A, D) and removed from the original input feature set during the pre-processing stage approached chance-level performance (Figure 4 – figure supplement 3), indicating that the 4-class hybrid decoder results were not driven by task-related artefacts.”

      Results (lines 261-268):

      “As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41% ± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C). Task-related eye movements did not explain these results since an alternate 5-class hybrid decoder constructed from three eye movement features (gaze position at the KeyDown event, gaze position 200ms later, and peak eye movement velocity within this window; Figure 4 – figure supplement 3A) performed at chance levels (cross-validated test accuracy = 0.2181; Figure 4 – figure supplement 3B, C).”
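
      As a rough illustration of why shuffled labels should yield accuracy near 20% in a balanced 5-class problem, the following minimal sketch scores a fixed set of predictions against randomly permuted labels. The data, the function name `shuffled_label_accuracy`, and the balanced class counts are hypothetical and are not taken from the manuscript; the authors' actual shuffling procedure may differ.

```python
import random


def shuffled_label_accuracy(labels, predictions, seed=0):
    """Accuracy of fixed predictions scored against randomly permuted labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)  # break any link between predictions and labels
    hits = sum(p == y for p, y in zip(predictions, shuffled))
    return hits / len(shuffled)


# Hypothetical balanced data: 5 classes (ordinal positions), equally frequent.
n_per_class = 2000
labels = [k for k in range(5) for _ in range(n_per_class)]
predictions = list(labels)  # a decoder that is perfect on the true labels

chance = 1 / 5
acc = shuffled_label_accuracy(labels, predictions)
print(f"accuracy vs. shuffled labels: {acc:.3f} (chance = {chance:.3f})")
```

      Even a decoder that is perfect on the true labels falls to roughly 1/5 accuracy here, which is the reference level against which the reported shuffled-label and eye-movement decoder results are compared.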

      Discussion (Lines 362-368):

      “Task-related movements—which also express in lower frequency ranges—did not explain these results given the near chance-level performance of alternative decoders trained on (a) artefact-related ICA components removed during MEG preprocessing (Figure 3 – figure supplement 3A-C) and on (b) task-related eye movement features (Figure 4 – figure supplement 3B, C). This explanation is also inconsistent with the minimal average head motion of 1.159 mm (± 1.077 SD) across the MEG recording (Figure 3 – figure supplement 3D).”

      The rationale for the task design including the asterisks is presented below:

      Methods (Lines 500-514)

      “The five-item sequence was displayed on the computer screen for the duration of each practice round and participants were directed to fix their gaze on the sequence. Small asterisks were displayed above a sequence item after each successive keypress, signaling the participants' present position within the sequence. Inclusion of this feedback minimizes working memory loads during task performance [73]. Following the completion of a full sequence iteration, the asterisk returned to the first sequence item. The asterisk did not provide error feedback as it appeared for both correct and incorrect keypresses. At the end of each practice round, the displayed number sequence was replaced by a string of five "X" symbols displayed on the computer screen, which remained for the duration of the rest break. Participants were instructed to focus their gaze on the screen during this time. The behavior in this explicit, motor learning task consists of generative action sequences rather than sequences of stimulus-induced responses as in the serial reaction time task (SRTT). A similar real-world example would be manually inputting a long password into a secure online application in which one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user.”

      The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, this does not address the question whether there is a trial-by-trial relation between the degree of "contextualization" and the amount of micro-offline gains - i.e., the question whether performance changes (micro-offline gains) are less pronounced across rest periods for which the change in "contextualization" is relatively low. The single-subject correlation between contextualization changes "during" rest and micro-offline gains (Figure 5 - figure supplement 4) addresses this question, however, the critical statistical test (are correlation coefficients significantly different from zero) is not included. Given the displayed distribution, it seems unlikely that correlation coefficients are significantly above zero. 

      As recommended by the Reviewer, we now include one-way right-tailed t-test results which provide further support to the previously reported finding. The mean of within-subject correlations between offline contextualization and cumulative micro-offline gains was significantly greater than zero (t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76; see Figure 5 – figure supplement 4, left), while correlations for online contextualization versus cumulative micro-online (t = -1.14, p = 0.8669, df = 25, Cohen's d = -0.22) or micro-offline gains (t = -0.097, p = 0.5384, df = 25, Cohen's d = -0.019) were not. We have incorporated the significant one-way t-test for offline contextualization and cumulative micro-offline gains in the Results section of the revised manuscript (lines 313-318) and the Figure 5 – figure supplement 4 legend.
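
      For readers who want to check the relation between the reported t statistics and effect sizes, the sketch below computes a one-sample t statistic and Cohen's d from a list of correlation coefficients, using the identity t = d × √n for a test against zero. The `r_values` list and the function name are hypothetical, and the sketch assumes that d was computed as the sample mean divided by the sample standard deviation, which the response does not state explicitly.

```python
import math
import statistics


def one_sample_t_and_d(values, mu0=0.0):
    """One-sample t statistic, Cohen's d, and degrees of freedom against mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)   # sample SD (n - 1 denominator)
    d = (mean - mu0) / sd           # Cohen's d for a one-sample design
    t = d * math.sqrt(n)            # equivalent to (mean - mu0) / (sd / sqrt(n))
    return t, d, n - 1


# Hypothetical within-subject correlation coefficients (not the study data).
r_values = [0.42, 0.10, -0.05, 0.33, 0.27, 0.51, 0.08, 0.19]
t_stat, cohen_d, df = one_sample_t_and_d(r_values)
print(f"t({df}) = {t_stat:.2f}, d = {cohen_d:.2f}")

# Consistency check on the reported values: df = 25 implies n = 26.
n_reported = 26
for t_reported in (3.87, -1.14, -0.097):
    print(t_reported, round(t_reported / math.sqrt(n_reported), 3))
```

      Under that assumption, the three reported pairs (t = 3.87, d = 0.76; t = -1.14, d = -0.22; t = -0.097, d = -0.019) are mutually consistent with n = 26 participants.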

      The authors follow the assumption that micro-offline gains reflect offline learning.

      However, there is no compelling evidence in the literature, and no evidence in the present manuscript, that micro-offline gains (during any training phase) reflect offline learning. Instead, emerging evidence in the literature indicates that they do not (Das et al., bioRxiv 2024), and instead reflect transient performance benefits when participants train with breaks, compared to participants who train without breaks, however, these benefits vanish within seconds after training if both groups of participants perform under comparable conditions (Das et al., bioRxiv 2024). During the review process, the authors argued that differences in the design between Das et al. (2024) on the one hand (Experiments 1 and 2), and the study by Bönstrup et al. (2019) on the other hand, may have prevented Das et al. (2024) from finding the assumed (lasting) learning benefit by micro-offline consolidation. However, the Supplementary Material of Das et al. (2024) includes an experiment (Experiment S1) whose design closely follows the early learning phase of Bönstrup et al. (2019), and which, nevertheless, demonstrates that there is no lasting benefit of taking breaks for the acquired skill level, despite the presence of micro-offline gains. 

      We thank the Reviewer for alerting us to these new data added to the revised supplementary materials of Das et al. (2024) posted to bioRxiv. However, despite the Reviewer’s claim to the contrary, a careful comparison between the Das et al. and Bönstrup et al. studies reveals more substantive differences than similarities and does not support the statement that Experiment S1 “closely follows a large proportion of the early learning phase of Bönstrup et al. (2019)”.

      In the Das et al. Experiment S1, sixty-two participants were randomly assigned to “with breaks” or “no breaks” skill training groups. The “with breaks” group alternated 10 seconds of skill sequence practice with 10 seconds of rest over seven trials (2 min 20 sec total training duration). This amounts to 66.7% of the early learning period defined by Bönstrup et al. (2019) (i.e. - eleven 10-second-long practice periods interleaved with ten 10-second-long rest breaks; 3 min 30 sec total training duration).
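
      The 66.7% figure can be reproduced with simple arithmetic. The sketch below assumes that each of the seven practice periods in Das et al. Experiment S1 is followed by a 10-second rest (seven rests in total, as implied by the statement that the “with breaks” group ends with 10 seconds of rest); the helper name `schedule_duration` is hypothetical.

```python
def schedule_duration(n_practice, n_rest, period_s=10):
    """Total duration in seconds of alternating practice and rest periods."""
    return (n_practice + n_rest) * period_s


das_s1_s = schedule_duration(n_practice=7, n_rest=7)       # Das et al., Exp. S1
bonstrup_s = schedule_duration(n_practice=11, n_rest=10)   # Bönstrup et al. early learning

fraction = das_s1_s / bonstrup_s
print(f"Das S1: {das_s1_s} s, Bönstrup: {bonstrup_s} s, ratio: {fraction:.1%}")
```

      With these assumptions the training durations are 140 s and 210 s, respectively, and their ratio is 66.7%.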

      Also, please note that while no performance feedback nor reward was given in the Bönstrup et al. (2019) study, participants in the Das et al. study received explicit performance-based monetary rewards, a potentially crucial driver of differentiated behavior between the two studies:

      “Participants were incentivized with bonus money based on the total number of correct sequences completed throughout the experiment.”

      The “no breaks” group in the Das et al. study practiced the skill sequence for 70 continuous seconds. Both groups (despite one being labeled “no breaks”) follow training with a long 3-minute break (also note that since the “with breaks” group ends with 10 seconds of rest, their break is actually longer) before finishing with a skill “test” over a continuous 50-second-long block. During the 70 seconds of training, the “with breaks” group shows more learning than the “no breaks” group. Interestingly, following the long 3-minute break, the “with breaks” group displays a performance drop (relative to their performance at the end of training) that is stable over the full 50-second test, while the “no breaks” group shows an immediate performance improvement following the long break that continues to increase over the 50-second test.

      Separately, there are important issues regarding the Das et al. study that should be considered through the lens of recent findings not referred to in the preprint. A major element of their experimental design is that both groups—“with breaks” and “no breaks”— actually receive quite a long 3-minute break just before the skill test. This long break is more than 2.5x the cumulative interleaved rest experienced by the “with breaks” group. Thus, although the design is intended to contrast the presence or absence of rest “breaks”, that difference between groups is no longer maintained at the point of the skill test. 

      The Das et al. results are most consistent with an alternative interpretation of the data—that the “no breaks” group experiences offline learning during their long 3-minute break. This is supported by the recent work of Griffin et al. (2025) where micro-array recordings from primary and premotor cortex were obtained from macaque monkeys while they performed blocks of ten continuous reaching sequences up to 81.4 seconds in duration (see source data for Extended Data Figure 1h) with 90 seconds of interleaved rest. Griffin et al. observed offline improvement in skill immediately following the rest break that was causally related to neural reactivations (i.e. – neural replay) that occurred during the rest break. Importantly, the highest density of reactivations was present in the very first 90-second break between Blocks 1 and 2 (see Fig. 2f in Griffin et al., 2025). This supports the interpretation that both the “with breaks” and “no breaks” groups express offline learning gains, with these gains being delayed in the “no breaks” group due to the practice schedule.

      On the other hand, if offline learning can occur during this longer break, then why would the “with breaks” group show no benefit? Again, it could be that most of the offline gains for this group were front-loaded during the seven shorter 10-second rest breaks. Another possible, though not mutually exclusive, explanation is that the observed drop in performance in the “with breaks” group is driven by contextual interference. Specifically, similar to Experiments 1 and 2 in Das et al. (2024), the skill test is conducted under very different conditions than those under which the “with breaks” group practiced the skill (short bursts of practice alternating with equally short breaks). On the other hand, the “no breaks” group is tested (50 seconds of continuous practice) under quite similar conditions to their training schedule (70 seconds of continuous practice). Thus, it is possible that this dissimilarity between training and test could lead to reduced performance in the “with breaks” group.

      We made the following manuscript revisions related to these important issues: 

      Introduction (Lines 26-56)

      “Practicing a new motor skill elicits rapid performance improvements (early learning) [1] that precede skill performance plateaus [5]. Skill gains during early learning accumulate over rest periods (micro-offline) interspersed with practice [1, 6-10], and are up to four times larger than offline performance improvements reported following overnight sleep [1]. During this initial interval of prominent learning, retroactive interference immediately following each practice interval reduces learning rates relative to interference after passage of time, consistent with stabilization of the motor memory [11]. Micro-offline gains observed during early learning are reproducible [7, 10-13] and are similar in magnitude even when practice periods are reduced by half to 5 seconds in length, thereby confirming that they are not merely a result of recovery from performance fatigue [11]. Additionally, they are unaffected by the random termination of practice periods, which eliminates the possibility of predictive motor slowing as a contributing factor [11]. Collectively, these behavioral findings point towards the interpretation that micro-offline gains during early learning represent a form of memory consolidation [1].

      This interpretation has been further supported by brain imaging and electrophysiological studies linking known memory-related networks and consolidation mechanisms to rapid offline performance improvements. In humans, the rate of hippocampo-neocortical neural replay predicts micro-offline gains [6]. Consistent with these findings, Chen et al. [12] and Sjøgård et al. [13] furnished direct evidence from intracranial human EEG studies, demonstrating a connection between the density of hippocampal sharp-wave ripples (80-120 Hz)—recognized markers of neural replay—and micro-offline gains during early learning. Further, Griffin et al. reported that neural replay of task-related ensembles in the motor cortex of macaques during brief rest periods— akin to those observed in humans [1, 6-8, 14]—are not merely correlated with, but are causal drivers of micro-offline learning [15]. Specifically, the same reach directions that were replayed the most during rest breaks showed the greatest reduction in path length (i.e. – more efficient movement path between two locations in the reach sequence) during subsequent trials, while stimulation applied during rest intervals preceding performance plateau reduced reactivation rates and virtually abolished micro-offline gains [15]. Thus, converging evidence in humans and non-human primates across indirect non-invasive and direct invasive recording techniques link hippocampal activity, neural replay dynamics and offline skill gains in early motor learning that precede performance plateau.”

      Next, in the Methods, we articulate important constraints formulated by Pan and Rickard and Bönstrup et al. for meaningful measurements:

      Methods (Lines 493-499)

      “The study design followed specific recommendations by Pan and Rickard (2015): 1) utilizing 10-second practice trials and 2) constraining analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur) that precede the emergence of “scalloped” performance dynamics strongly linked to reactive inhibition effects ([29, 72]). This is precisely the portion of the learning curve Pan and Rickard referred to when they stated “…rapid learning during that period masks any reactive inhibition effect” [29].”

      We finally discuss the implications of neglecting some or all of these recommendations:

      Discussion (Lines 444-452):

      “Finally, caution should be exercised when extrapolating findings during early skill learning, a period of steep performance improvements, to findings reported after insufficient practice [67], post-plateau performance periods [68], or non-learning situations (e.g. performance of non-repeating keypress sequences in [67]) when reactive inhibition or contextual interference effects are prominent. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g. - memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.”

      Along these lines, the authors' claim, based on Bönstrup et al. 2020, that "retroactive interference immediately following practice periods reduces micro-offline learning", is not supported by that very reference. Citing Bönstrup et al. (2020), "Regarding early learning dynamics (trials 1-5), we found no differences in microscale learning parameters (micro online/offline) or total early learning between both interference groups." That is, contrary to Dash et al.'s current claim, Bönstrup et al. (2020) did not find any retroactive interference effect on the specific behavioral readout (micro-offline gains) that the authors assume to reflect consolidation. 

      Please note that the Bönstrup et al. 2020 paper abstract states:

      “Third, retroactive interference immediately after each practice period reduced the learning rate relative to interference after passage of time (N = 373), indicating stabilization of the motor memory at a microscale of several seconds.”

      which is further supported by this statement in the Results: 

      “The model comprised three parameters representing the initial performance, maximum performance and learning rate (see Eq. 1, “Methods”, “Data Analysis” section). We then statistically compared the model parameters between the interference groups (Fig. 2d). The late interference group showed a higher learning rate compared with the early interference group (late: 0.26 ± 0.23, early: 2.15 ± 0.20, P=0.04). The effect size of the group difference was small to medium (Cohen’s d 0.15)[29]. Similar differences with a stronger rise in the learning curve of a late interference groups vs. an early interference group were found in a smaller sample collected in the lab environment (Supplementary Fig. 3).”

      We have modified the statement in the revised manuscript to specify that the difference observed was between learning rates:

      Introduction (Lines 30-32)

      “During this initial interval of prominent learning, retroactive interference immediately following each practice interval reduces learning rates relative to interference after passage of time, consistent with stabilization of the motor memory [11].”

      The authors conclude that performance improves, and representation manifolds differentiate, "during" rest periods (see, e.g., abstract). However, micro-offline gains (as well as offline contextualization) are computed from data obtained during practice, not rest, and may, thus, just as well reflect a change that occurs "online", e.g., at the very onset of practice (like pre-planning) or throughout practice (like fatigue, or reactive inhibition).  

      The Reviewer raises again the issue of a potential confound of “pre-planning” on our contextualization measures as in the comment above: 

      “Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023).”

      The cited studies by Ariani et al. indicate that effects of pre-planning are likely to impact the first 3 keypresses of the initial sequence iteration in each trial. As stated in the response to this comment above, we conducted a control analysis of contextualization that ignores the first sequence iteration in each trial to partial out any potential pre-planning effect. This control analysis yielded comparable results, indicating that pre-planning is not a major driver of our reported contextualization effects. We now report this in the revised manuscript (Results, lines 310-318, and Discussion, lines 408-416, both quoted above).

      We also state in the Figure 1 legend (Lines 99-103) in the revised manuscript that pre-planning has no effect on the behavioral measures of micro-offline and micro-online gains in our dataset.

      The Reviewer also raises the issue of possible effects stemming from “fatigue” and “reactive inhibition” which inhibit performance and are indeed relevant to skill learning studies. We designed our task to specifically mitigate these effects. We now more clearly articulate this rationale in the description of the task design as well as the measurement constraints essential for minimizing their impact.

      We also discuss the implications of fatigue and reactive inhibition effects in experimental designs that neglect to follow these recommendations formulated by Pan and Rickard in the Discussion section and propose how this issue can be better addressed in future investigations.

      To summarize, the results of our study indicate that: (a) offline contextualization effects are not explained by pre-planning of the first action sequence iteration in each practice trial; and (b) the task design implemented in this study purposefully minimizes any possible effects of reactive inhibition or fatigue. Circling back to the Reviewer’s proposal that “contextualization…may just as well reflect a change that occurs "online"”, we show in this paper direct empirical evidence that contextualization develops to a greater extent across rest periods rather than across practice trials, contrary to the Reviewer’s proposal.

      That is, the definition of micro-offline gains (as well as offline contextualization) conflates online and "offline" processes. This becomes strikingly clear in the recent Nature paper by Griffin et al. (2025), who computed micro-offline gains as the difference in average performance across the first five sequences in a practice period (a block, in their terminology) and the last five sequences in the previous practice period. Averaging across sequences in this way minimises the chance to detect online performance changes and inflates changes in performance "offline". The problem that "online" gains (or contextualization) are actually computed from data entirely generated online, and therefore subject to processes that occur online, is inherent in the very definition of micro-online gains, whether or not they are computed from averaged performance.

      We would like to make it clear that the issue raised by the Reviewer with respect to averaging across sequences done in the Griffin et al. (2025) study does not impact our study in any way. The primary skill measure used in all analyses reported in our paper is not temporally averaged. We estimated instantaneous correct sequence speed over the entire trial. Once the first sequence iteration within a trial is completed, the speed estimate is then updated at the resolution of individual keypresses. All micro-online and -offline behavioral changes are measured as the difference in instantaneous speed at the beginning and end of individual practice trials.

      Methods (lines 528-530):

      “The instantaneous correct sequence speed was calculated as the inverse of the average KTT across a single correct sequence iteration and was updated for each correct keypress.”
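
      The speed measure and the resulting micro-online and micro-offline gains can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes all keypresses are correct, that the "single correct sequence iteration" window is the most recent five keypresses (four transitions) and slides by one keypress, and that gains are differences between the first and last speed estimates of a trial. The timestamps and function name are hypothetical.

```python
def instantaneous_speed(key_times, seq_len=5):
    """Speed (keypresses/s) as the inverse of the mean keypress transition
    time (KTT) over the most recent seq_len keypresses, updated per keypress."""
    speeds = []
    for i in range(seq_len - 1, len(key_times)):
        window = key_times[i - seq_len + 1 : i + 1]
        mean_ktt = (window[-1] - window[0]) / (seq_len - 1)
        speeds.append(1.0 / mean_ktt)
    return speeds


# Hypothetical keypress timestamps (seconds from trial onset) for two trials.
trial_a = [0.0, 0.4, 0.8, 1.2, 1.6, 1.9, 2.2, 2.5, 2.8, 3.1]
trial_b = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5]

speed_a = instantaneous_speed(trial_a)
speed_b = instantaneous_speed(trial_b)

micro_online = speed_a[-1] - speed_a[0]   # change within trial A (practice)
micro_offline = speed_b[0] - speed_a[-1]  # change across the rest before trial B
print(f"micro-online: {micro_online:.3f}, micro-offline: {micro_offline:.3f}")
```

      Because the estimate is refreshed at every keypress rather than averaged over several sequences, changes that occur within a practice trial remain visible in the micro-online term.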

      The instantaneous speed measure used in our analyses, in fact, maximizes the likelihood of detecting changes in online performance, as the Reviewer indicates.  Despite this optimally sensitive measurement of online changes, our findings remained robust, consistently converging on the same outcome across our original analyses and the multiple controls recommended by the reviewers. Notably, online contextualization changes are significantly weaker than offline contextualization in all comparisons with different measurement approaches.

      Results (lines 302-309)

      “The Euclidean distance between neural representations of Index<sub>OP1</sub> (i.e. - index finger keypress at ordinal position 1 of the sequence) and Index<sub>OP5</sub> (i.e. - index finger keypress at ordinal position 5 of the sequence) increased progressively during early learning (Figure 5A)—predominantly during rest intervals (offline contextualization) rather than during practice (online) (t = 4.84, p < 0.001, df = 25, Cohen's d = 1.2; Figure 5B; Figure 5 – figure supplement 1A). An alternative online contextualization determination equalling the time interval between online and offline comparisons (Trial-based; 10 seconds between Index<sub>OP1</sub> and Index<sub>OP5</sub> observations in both cases) rendered a similar result (Figure 5 – figure supplement 2B).”
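
      A minimal sketch of the distance-based contextualization measure is given below. It assumes each trial is summarized by one feature vector for the first index-finger keypress (ordinal position 1) and one for the last (ordinal position 5); the offline measure follows the comparison described earlier in this response (last OP5 of a trial versus first OP1 of the next trial), while the within-trial online definition, the toy feature vectors, and the dictionary keys are assumptions made for illustration only.

```python
import math


def online_contextualization(trial):
    """Distance between OP1 and OP5 representations within one practice trial."""
    return math.dist(trial["op1"], trial["op5"])


def offline_contextualization(prev_trial, next_trial):
    """Distance between the last OP5 of one trial and the first OP1 of the next."""
    return math.dist(prev_trial["op5"], next_trial["op1"])


# Hypothetical 2-D feature vectors standing in for hybrid-space features.
trials = [
    {"op1": [0.0, 0.0], "op5": [3.0, 4.0]},
    {"op1": [6.0, 8.0], "op5": [6.0, 9.0]},
]

online = [online_contextualization(t) for t in trials]
offline = [offline_contextualization(a, b) for a, b in zip(trials, trials[1:])]
print("online:", online, "offline:", offline)
```

      In an actual analysis the vectors would be the decoder's high-dimensional input features, and the per-trial distances would then be related to the corresponding micro-online and micro-offline gains.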

      Results (lines 316-318)

      “Conversely, online contextualization (using either measurement approach) did not explain early online learning gains (i.e. – Figure 5 – figure supplement 3).”

      Results (lines 318-328)

      “Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69). These findings were not explained by behavioral changes of typing rhythm (t = -0.03, p = 0.976; Figure 5 – figure supplement 5), adjacent keypress transition times (R<sup>2</sup> = 0.00507, F[1,3202] = 16.3; Figure 5 – figure supplement 6), or overall typing speed (between-subject; R<sup>2</sup> = 0.028, p = 0.41; Figure 5 – figure supplement 7).”

      We disagree with the Reviewer’s statement that “the definition of micro-offline gains (as well as offline contextualization) conflates online and "offline" processes”.  From a strictly behavioral point of view, it is obviously true that one can only measure skill (rather than the absence of it during rest) to determine how it changes over time.  While skill changes surrounding rest are used to infer offline learning processes, recovery of skill decay following intense practice is used to infer “unmeasurable” recovery from fatigue or reactive inhibition. In other words, the alternative processes proposed by the Reviewer also rely on the same inferential reasoning. 

      Importantly, inferences can be validated through the identification of mechanisms. Our experiment constrained the study to evaluation of changes in neural representations of the same action in different contexts, while minimizing the impact of mechanisms related to fatigue/reactive inhibition [13, 14]. In this way, we observed that behavioral gains and neural contextualization occur to a greater extent over rest breaks than during practice trials and that offline contextualization changes strongly correlate with the offline behavioral gains, while online contextualization does not. This result was supported by the results of all control analyses recommended by the Reviewers. Specifically:

      Methods (Lines 493-499)

      “The study design followed specific recommendations by Pan and Rickard (2015): 1) utilizing 10-second practice trials and 2) constraining analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur) that precede the emergence of “scalloped” performance dynamics strongly linked to reactive inhibition effects ([29, 72]). This is precisely the portion of the learning curve Pan and Rickard referred to when they stated “…rapid learning during that period masks any reactive inhibition effect” [29].”

      And Discussion (Lines 444-448):

      “Finally, caution should be exercised when extrapolating findings during early skill learning, a period of steep performance improvements, to findings reported after insufficient practice [67], post-plateau performance periods [68], or non-learning situations (e.g. performance of non-repeating keypress sequences in  [67]) when reactive inhibition or contextual interference effects are prominent.”

      Next, we show that offline contextualization is greater than online contextualization and predicts offline behavioral gains across all measurement approaches, including all controls suggested by the Reviewer’s comments and recommendations. 

      Results (lines 302-318):

      “The Euclidian distance between neural representations of Index<sub>OP1</sub> (i.e. - index finger keypress at ordinal position 1 of the sequence) and Index<sub>OP5</sub> (i.e. - index finger keypress at ordinal position 5 of the sequence) increased progressively during early learning (Figure 5A)—predominantly during rest intervals (offline contextualization) rather than during practice (online) (t = 4.84, p < 0.001, df = 25, Cohen's d = 1.2; Figure 5B; Figure 5 – figure supplement 1A). An alternative online contextualization determination equalling the time interval between online and offline comparisons (Trial-based; 10 seconds between Index<sub>OP1</sub> and Index<sub>OP5</sub> observations in both cases) rendered a similar result (Figure 5 – figure supplement 2B).

      Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R<sup>2</sup> = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches). Conversely, online contextualization (using either measurement approach) did not explain early online learning gains (i.e. – Figure 5 – figure supplement 3).”

      Results (lines 318-324)

      “Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69).”
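      The link between the reported t statistics and effect sizes can be illustrated with a short sketch. It assumes a plain one-sample t-test of per-participant correlation coefficients against zero, with Cohen's d taken as the mean divided by the sample standard deviation. Whether a Fisher z-transform was applied beforehand is not stated in the response, and the function name and example values below are hypothetical.

```python
import numpy as np

def one_sample_t_and_d(values, popmean=0.0):
    """One-sample t statistic, Cohen's d and degrees of freedom.

    values: one summary value per participant (here, a within-subject
    correlation coefficient). Hypothetical helper, not the authors' code.
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    # Cohen's d for a one-sample design: mean shift in units of sample SD
    d = (x.mean() - popmean) / x.std(ddof=1)
    # The t statistic is d scaled by the square root of the sample size
    t = d * np.sqrt(n)
    return t, d, n - 1

# Made-up correlation coefficients for six participants (illustration only)
r_values = [0.10, 0.30, -0.05, 0.40, 0.20, 0.25]
t_stat, cohen_d, dof = one_sample_t_and_d(r_values)
print(f"t({dof}) = {t_stat:.2f}, d = {cohen_d:.2f}")
```

      Under this assumption t = d × √n, so d = 0.76 with n = 26 (df = 25) gives t of roughly 3.9, close to the first reported comparison (t = 3.87).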

      Discussion (lines 408-416):

      “Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1). This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A). On the other hand, online contextualization did not predict learning (Figure 5 – figure supplement 3). Consistent with these results, the average within-subject correlation between offline contextualization and micro-offline gains was significantly stronger than within-subject correlations between online contextualization and either micro-online or micro-offline gains (Figure 5 – figure supplement 4).”

      We then show that offline contextualization is not explained by pre-planning of the first action sequence:

      Results (lines 310-316):

      “Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R<sup>2</sup> = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches).”

      Discussion (lines 409-412):

      “This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A).”

      In summary, none of the presented evidence in this paper—including results of the multiple control analyses carried out in response to the Reviewers’ recommendations— supports the Reviewer’s position. 

      Please note that the micro-offline learning "inference" has extensive mechanistic support across species and neural recording techniques (see Introduction, lines 26-56). In contrast, the reactive inhibition "inference," which is the Reviewer's alternative interpretation, has no such support yet [15].

      Introduction (Lines 26-56)

      “Practicing a new motor skill elicits rapid performance improvements (early learning) [1] that precede skill performance plateaus [5]. Skill gains during early learning accumulate over rest periods (micro-offline) interspersed with practice [1, 6-10], and are up to four times larger than offline performance improvements reported following overnight sleep [1]. During this initial interval of prominent learning, retroactive interference immediately following each practice interval reduces learning rates relative to interference after passage of time, consistent with stabilization of the motor memory [11]. Micro-offline gains observed during early learning are reproducible [7, 10-13] and are similar in magnitude even when practice periods are reduced by half to 5 seconds in length, thereby confirming that they are not merely a result of recovery from performance fatigue [11]. Additionally, they are unaffected by the random termination of practice periods, which eliminates the possibility of predictive motor slowing as a contributing factor [11]. Collectively, these behavioral findings point towards the interpretation that micro-offline gains during early learning represent a form of memory consolidation [1].

      This interpretation has been further supported by brain imaging and electrophysiological studies linking known memory-related networks and consolidation mechanisms to rapid offline performance improvements. In humans, the rate of hippocampo-neocortical neural replay predicts micro-offline gains [6].

      Consistent with these findings, Chen et al. [12] and Sjøgård et al. [13] furnished direct evidence from intracranial human EEG studies, demonstrating a connection between the density of hippocampal sharp-wave ripples (80-120 Hz)—recognized markers of neural replay—and micro-offline gains during early learning. Further, Griffin et al. reported that neural replay of task-related ensembles in the motor cortex of macaques during brief rest periods— akin to those observed in humans [1, 6-8, 14]—are not merely correlated with, but are causal drivers of micro-offline learning [15]. Specifically, the same reach directions that were replayed the most during rest breaks showed the greatest reduction in path length (i.e. – more efficient movement path between two locations in the reach sequence) during subsequent trials, while stimulation applied during rest intervals preceding performance plateau reduced reactivation rates and virtually abolished micro-offline gains [15]. Thus, converging evidence in humans and non-human primates across indirect non-invasive and direct invasive recording techniques link hippocampal activity, neural replay dynamics and offline skill gains in early motor learning that precede performance plateau.”

      That said, absence of evidence is not evidence of absence, and for that reason we also state in the Discussion (lines 448-452):

      A simple control analysis based on shuffled class labels could lend further support to the authors' complex decoding approach. As a control analysis that completely rules out any source of overfitting, the authors could test the decoder after shuffling class labels. Following such shuffling, decoding accuracies should drop to chance-level for all decoding approaches, including the optimized decoder. This would also provide an estimate of actual chance-level performance (which is informative over and beyond the theoretical chance level). During the review process, the authors reported this analysis to the reviewers. Given that readers may consider following the presented decoding approach in their own work, it would have been important to include that control analysis in the manuscript to convince readers of its validity. 

      As requested, the label-shuffling analysis was carried out for both 4- and 5-class decoders and is now reported in the revised manuscript.

      Results (lines 204-207):

      “Testing the keypress state (4-class) hybrid decoder performance on Day 1 after randomly shuffling keypress labels for held-out test data resulted in a performance drop approaching expected chance levels (22.12%± SD 9.1%; Figure 3 – figure supplement 3C).”

      Results (lines 261-264):

      “As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41%± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C).”
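      The logic of this label-shuffling control can be sketched on synthetic data. The example below is a minimal illustration rather than the authors' pipeline: it substitutes a simple nearest-centroid classifier for the hybrid-space decoder, uses made-up Gaussian features in place of MEG data, and all function names and parameters are hypothetical. Labels are shuffled only for the held-out test data, as described in the quoted Results text.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Return the class labels and the mean feature vector of each class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(X, classes, centroids):
    """Assign each sample to the class with the closest centroid."""
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dist.argmin(axis=1)]

def shuffled_label_control(n_classes=4, n_per_class=200, n_features=20,
                           n_perm=200, seed=0):
    """Compare test accuracy with true labels and with shuffled test labels."""
    rng = np.random.default_rng(seed)
    # Synthetic, well-separated classes (stand-in for keypress features)
    means = rng.normal(scale=2.0, size=(n_classes, n_features))
    y = np.repeat(np.arange(n_classes), n_per_class)
    X = means[y] + rng.normal(size=(y.size, n_features))
    # Random 50/50 split into training and held-out test samples
    idx = rng.permutation(y.size)
    train, test = idx[: y.size // 2], idx[y.size // 2:]
    classes, centroids = nearest_centroid_fit(X[train], y[train])
    pred = nearest_centroid_predict(X[test], classes, centroids)
    true_acc = np.mean(pred == y[test])
    # Score the same predictions against randomly permuted test labels
    shuffled = np.array([np.mean(pred == rng.permutation(y[test]))
                         for _ in range(n_perm)])
    return true_acc, shuffled.mean(), shuffled.std()

true_acc, chance_mean, chance_sd = shuffled_label_control()
print(f"true labels: {true_acc:.3f}; shuffled: {chance_mean:.3f} ± {chance_sd:.3f}")
```

      With balanced classes the shuffled-label accuracy should settle near 1 / n_classes, which gives an empirical chance level to compare against the theoretical one.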

      Furthermore, the authors' approach to cortical parcellation raises questions regarding the information carried by varying dipole orientations within a parcel (which currently seems to be ignored?) and the implementation of the mean-flipping method (given that there are two dimensions - space and time - it is unclear what the authors refer to when they talk about the sign of the "average source", line 477). 

      The revised manuscript now provides a more detailed explanation of the parcellation and sign-flipping procedures implemented:

      Methods (lines 604-611):

      “Source-space parcellation was carried out by averaging all voxel time-series located within distinct anatomical regions defined in the Desikan-Killiany Atlas [31]. Since source time-series estimated with beamforming approaches are inherently sign-ambiguous, a custom Matlab-based implementation of the mne.extract_label_time_course with “mean_flip” sign-flipping procedure in MNE-Python [78] was applied prior to averaging to prevent within-parcel signal cancellation. All voxel time-series within each parcel were extracted and the time-series sign was flipped at locations where the orientation difference was greater than 90° from the parcel mode. A mean time-series was then computed across all voxels within the parcel after sign-flipping.”
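      The sign-flipping step described above can be sketched in a few lines. This is an illustrative NumPy version under stated assumptions, not the authors' Matlab code or the MNE-Python source: the "parcel mode" is taken here to be the first right singular vector of the voxel orientation matrix, and the function name and toy inputs are hypothetical.

```python
import numpy as np

def mean_flip_parcel(ts, ori):
    """Average voxel time-series within one parcel after sign-flipping.

    ts  : (n_voxels, n_times) source time-series for one parcel
    ori : (n_voxels, 3) unit dipole orientation of each voxel
    """
    # Dominant orientation of the parcel (assumed definition of the "mode")
    _, _, vt = np.linalg.svd(ori, full_matrices=False)
    dots = ori @ vt[0]
    # The sign of a singular vector is arbitrary, so align it with the majority
    if dots.mean() < 0:
        dots = -dots
    # A negative dot product means the angle to the mode exceeds 90 degrees
    flip = np.sign(dots)
    flip[flip == 0] = 1.0
    return (flip[:, None] * ts).mean(axis=0), flip

# Toy parcel: three voxels carrying the same signal, one with opposite orientation
t = np.linspace(0, 1, 100)
signal = np.sin(2 * np.pi * 5 * t)
ts = np.vstack([signal, -signal, signal])
ori = np.array([[0, 0, 1.0], [0, 0, -1.0], [0, 0, 1.0]])

flipped_mean, flip = mean_flip_parcel(ts, ori)
naive_mean = ts.mean(axis=0)
print(np.abs(naive_mean).max(), np.abs(flipped_mean).max())
```

      In this toy case the naive average attenuates the signal to one third of its amplitude, whereas the sign-flipped average recovers it, which is the within-parcel cancellation the procedure is meant to prevent.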

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors): 

      Comments on the revision: 

      The authors have made large efforts to address all concerns raised. A couple of suggestions remain: 

      - formally show if and how movement artefacts may contribute to the signal and analysis; it seems that the authors have data to allow for such an analysis  

      We have implemented the requested control analyses addressing this issue. They are reported in the Results (lines 207-211 and 261-268) and Discussion (lines 362-368).

      - formally show that the signals from the intra- and inter parcel spaces are orthogonal. 

      Please note that, despite the Reviewer’s statement above, we never claim in the manuscript that the parcel-space and regional voxel-space features show “complete independence”. 

      Furthermore, the machine learning-based decoding methods used in the present study do not require input feature orthogonality, but instead non-redundancy [7], which is a requirement satisfied by our data (see below and the new Figure 2 – figure supplement 2 in the revised manuscript). Finally, our results already show that the hybrid space decoder outperformed all other methods even after input features were fully orthogonalized with LDA or PCA dimensionality reduction procedures prior to the classification step (Figure 3 – figure supplement 2).

      We also highlight several additional results that are informative regarding this issue. For example, if spatially overlapping parcel- and voxel-space time-series only provided redundant information, inclusion of both as input features should increase model overfitting to the training dataset and decrease overall cross-validated test accuracy [8]. In the present study however, we see the opposite effect on decoder performance. First, Figure 3 – figure supplements 1 & 2 clearly show that decoders constructed from hybrid-space features outperform the other input feature (sensor-, whole-brain parcel- and whole-brain voxel-) spaces in every case (e.g. – wideband, all narrowband frequency ranges, and even after the input space is fully orthogonalized through dimensionality reduction procedures prior to the decoding step). Furthermore, Figure 3 – figure supplement 6 shows that hybrid-space decoder performance suffers when parcel time-series that spatially overlap with the included regional voxel-spaces are removed from the input feature set.  We state in the Discussion (lines 353-356):

      “The observation of increased cross-validated test accuracy (as shown in Figure 3 – Figure Supplement 6) indicates that the spatially overlapping information in parcel- and voxel-space time-series in the hybrid decoder was complementary, rather than redundant [41].”

      To gain insight into the complementary information contributed by the two spatial scales to the hybrid-space decoder, we first independently computed the matrix rank for whole-brain parcel- and voxel-space input features for each participant (shown in Author response image 1). The results indicate that whole-brain parcel-space input features are full rank (rank = 148) for all participants (i.e. - MEG activity is orthogonal between all parcels). The matrix rank of voxel-space input features (rank = 267± 17 SD) exceeded the parcel-space rank for all participants and approached the number of useable MEG sensor channels (n = 272). Thus, voxel-space features provide both additional and complementary information to representations at the parcel-space scale.
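      The rank comparison can be illustrated with a toy simulation. The sketch below is not the authors' analysis: it assumes that voxel time-series are a linear mixture of sensor signals (a random matrix stands in for an inverse operator), that parcels are plain averages of contiguous voxel groups, and all sizes and names are hypothetical. Matrix rank here counts the number of linearly independent feature time-series.

```python
import numpy as np

def feature_rank(features):
    """Number of linearly independent rows in a (n_features, n_times) array."""
    return int(np.linalg.matrix_rank(features))

def simulate_two_scales(n_sensors=20, n_voxels=50, n_parcels=10,
                        n_times=500, seed=0):
    """Toy voxel- and parcel-space features derived from the same sensors."""
    rng = np.random.default_rng(seed)
    sensors = rng.normal(size=(n_sensors, n_times))
    # Random linear map from sensors to voxels (stand-in for an inverse operator)
    mixing = rng.normal(size=(n_voxels, n_sensors))
    voxels = mixing @ sensors
    # Each parcel is the average of a contiguous group of voxels
    group = n_voxels // n_parcels
    parcels = voxels.reshape(n_parcels, group, n_times).mean(axis=1)
    return voxels, parcels

voxels, parcels = simulate_two_scales()
print("parcel-space rank:", feature_rank(parcels))
print("voxel-space rank:", feature_rank(voxels))
print("stacked (hybrid) rank:", feature_rank(np.vstack([parcels, voxels])))
```

      In this toy setting the voxel-space rank is capped by the number of simulated sensors and exceeds the parcel-space rank, the same qualitative pattern as the reported values (267 ± 17 versus 148, with 272 usable channels).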

      Figure 2—figure Supplement 2 in the revised manuscript now shows that the degree of dependence between the two spatial scales varies over the regional voxel-space. That is, some voxels within a given parcel correlate strongly with the time-series of the parcel they belong to, while others do not. This finding is consistent with a documented increase in correlational structure of neural activity across spatial scales that does not reflect perfect dependency or orthogonality [9]. Notably, the regional voxel-spaces included in the hybrid-space decoder are significantly less correlated with the averaged parcel-space time-series than excluded voxels. We now point readers to this new figure in the results.

      Taken together, these results indicate that the multi-scale information in the hybrid feature set is complementary rather than orthogonal.  This is consistent with the idea that hybrid-space features better represent multi-scale temporospatial dynamics reported to be a fundamental characteristic of how the brain stores and adapts memories, and generates behavior across species [9].

      Reviewer #2 (Recommendations for the authors):  

      I appreciate the authors' efforts in addressing the concerns I raised. The responses generally made sense to me. However, I had some trouble finding several corrections/additions that the authors claim they made in the revised manuscript: 

      "We addressed this question by conducting a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4, and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis also affirmed that the possible alternative explanation that contextualization effects are simple reflections of increased mixing is not supported by the data (Adjusted R<sup>2</sup> = 0.00431; F = 5.62).  We now include this new negative control analysis in the revised manuscript."  

      This approach is now reported in the manuscript in the Results (Lines 324-328) and Figure 5-Figure Supplement 6 legend.
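      A regression control of this kind can be sketched as follows. The example is a hedged illustration on synthetic data, not the authors' analysis: it assumes ordinary least squares with an intercept, z-scoring of predictors and response within each subject, and the standard adjusted R² formula. The function names, sample sizes and random data are hypothetical.

```python
import numpy as np

def zscore_within(values, subject_ids):
    """Z-score each column of `values` separately within every subject."""
    values = np.asarray(values, dtype=float)
    subject_ids = np.asarray(subject_ids)
    out = np.empty_like(values)
    for s in np.unique(subject_ids):
        rows = subject_ids == s
        out[rows] = (values[rows] - values[rows].mean(axis=0)) / values[rows].std(axis=0, ddof=1)
    return out

def adjusted_r2(X, y):
    """Adjusted R^2 of an ordinary least-squares fit of y on X (plus intercept)."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    # Penalize R^2 for the number of predictors
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(0)
subjects = np.repeat(np.arange(5), 400)       # 5 simulated subjects
transitions = rng.normal(size=(2000, 3))      # stand-in for 4-1, 2-4, 4-4 times
distance = rng.normal(size=2000)              # stand-in for the distance score

X = zscore_within(transitions, subjects)
y = zscore_within(distance, subjects)
print(f"adjusted R^2 = {adjusted_r2(X, y):.4f}")
```

      When the response is unrelated to the predictors, as in this simulation, the adjusted R² stays near zero, which is the pattern a negative control of this type is looking for.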

      "We strongly agree with the Reviewer that the issue of generalizability is extremely important and have added a new paragraph to the Discussion in the revised manuscript highlighting the strengths and weaknesses of our study with respect to this issue." 

      Discussion (Lines 436-441)

      “One limitation of this study is that contextualization was investigated for only one finger movement (index finger or digit 4) embedded within a relatively short 5-item skill sequence. Determining if representational contextualization is exhibited across multiple finger movements embedded within for example longer sequences (e.g. – two index finger and two little finger keypresses performed within a short piece of piano music) will be an important extension to the present results.”

      "We strongly agree with the Reviewer that any intended clinical application must carefully consider the specific input feature constraints dictated by the clinical cohort, and in turn impose appropriate and complementary constraints on classifier parameters that may differ from the ones used in the present study. We now highlight this issue in the Discussion of the revised manuscript and relate our present findings to published clinical BCI work within this context."  

      Discussion (Lines 441-444)

      “While a supervised manifold learning approach (LDA) was used here because it optimized hybrid-space decoder performance, unsupervised strategies (e.g. - PCA and MDS, which also substantially improved decoding accuracy in the present study; Figure 3 – figure supplement 2) are likely more suitable for real-time BCI applications.”

      and 

      "The Reviewer makes a good point. We have now implemented the suggested normalization procedure in the analysis provided in the revised manuscript." 

      Results (lines 275-282)

      “We used a Euclidian distance measure to evaluate the differentiation of the neural representation manifold of the same action (i.e. - an index-finger keypress) executed within different local sequence contexts (i.e. - ordinal position 1 vs. ordinal position 5; Figure 5). To make these distance measures comparable across participants, a new set of classifiers was then trained with group-optimal parameters (i.e. – broadband hybrid-space MEG data with subsequent manifold extraction (Figure 3 – figure supplements 2) and LDA classifiers (Figure 3 – figure supplements 7) trained on 200ms duration windows aligned to the KeyDown event (see Methods, Figure 3 – figure supplements 5)).”

      Where are they in the manuscript? Did I read the wrong version? It would be more helpful to specify with page/line numbers. Please also add the detailed procedure of the control/additional analyses in the Method. 

      As requested, we now refer to all manuscript revisions with specific line numbers. We have also included all detailed procedures related to any additional analyses requested by reviewers.

      I also have a few other comments back to the authors' following responses: 

      "Thus, increased overlap between the "4" and "1" keypresses (at the start of the sequence) and "2" and "4" keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged. One must also keep in mind that since participants repeat the sequence multiple times within the same trial, a majority of the index finger keypresses are performed adjacent to one another (i.e. - the "4-4" transition marking the end of one sequence and the beginning of the next). Thus, increased overlap between consecutive index finger keypresses as typing speed increased should increase their similarity and mask contextualization- related changes to the underlying neural representations."  "We also re-examined our previously reported classification results with respect to this issue. 

      We reasoned that if mixing effects reflecting the ordinal sequence structure is an important driver of the contextualization finding, these effects should be observable in the distribution of decoder misclassifications. For example, "4" keypresses would be more likely to be misclassified as "1" or "2" keypresses (or vice versa) than as "3" keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3-figure supplement 3A display a distribution of misclassifications that is inconsistent with an alternative mixing effect explanation of contextualization." 

      "Based upon the increased overlap between adjacent index finger keypresses (i.e. - "4-4" transition), we also reasoned that the decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position, should show decreased performance as typing speed increases. However, Figure 4C in our manuscript shows that this is not the case. The 2-class hybrid classifier actually displays improved classification performance over early practice trials despite greater temporal overlap. Again, this is inconsistent with the idea that the contextualization effect simply reflects increased mixing of individual keypress features."  

      As the time window for MEG feature is defined after the onset of each press, it is more likely that the feature overlap is the current and the future presses, rather than the current and the past presses (of course the three will overlap at very fast typing speed). Therefore, for sequence 41324, if we note the planning-related processes by a Roman numeral, the overlapping features would be '4i', '1iii', '3ii', '2iv', and '4iv'. Assuming execution-related process (e.g., 1) and planning-related process (e.g., i) are not necessarily similar, especially in finer temporal resolution, the patterns for '4i' and '4iv' are well separated in terms of process 'i' and 'iv,' and this advantage will be larger in faster typing speed. This also applies to the other presses. Thus, the author's arguments about the masking of contextualization and misclassification due to pattern overlap seem odd. The most direct and probably easiest way to resolve this would be to use a shorter time window for the MEG feature. Some decrease in decoding accuracy in this case is totally acceptable for the science purpose.  

      The revised manuscript now includes analyses carried out with decoding time windows ranging from 50 to 250ms in duration. These additional results are now reported in:

      Results (lines 258-268):

      “The improved decoding accuracy is supported by greater differentiation in neural representations of the index finger keypresses performed at positions 1 and 5 of the sequence (Figure 4A), and by the trial-by-trial increase in 2-class decoding accuracy over early learning (Figure 4C) across different decoder window durations (Figure 4 – figure supplement 2). As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41%± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C). Task-related eye movements did not explain these results since an alternate 5-class hybrid decoder constructed from three eye movement features (gaze position at the KeyDown event, gaze position 200ms later, and peak eye movement velocity within this window; Figure 4 – figure supplement 3A) performed at chance levels (cross-validated test accuracy = 0.2181; Figure 4 – figure supplement 3B, C).”

      Results (lines 310-316):

      “Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R² = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches).”

      Discussion (lines 380-385):

      “The first hint of representational differentiation was the highest false-negative and lowest false-positive misclassification rates for index finger keypresses performed at different locations in the sequence compared with all other digits (Figure 3C). This was further supported by the progressive differentiation of neural representations of the index finger keypress (Figure 4A) and by the robust trial-by-trial increase in 2-class decoding accuracy across time windows ranging between 50 and 250ms (Figure 4C; Figure 4 – figure supplement 2).”

      Discussion (lines 408-9):

      “Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1).”

      "We addressed this question by conducting a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence" 

      For the regression analysis, I recommend using the total keypress time per sequence (or the sum of 4-1 and 4-4) instead of specific transition intervals, because the transition intervals are likely to be correlated with one another. Using correlated regressors may distort the result.  

      This approach is now reported in the manuscript:

      Results (Lines 324-328) and Figure 5-Figure Supplement 6 legend.

      "We do agree with the Reviewer that the naturalistic, generative, self-paced task employed in the present study results in overlapping brain processes related to planning, execution, evaluation and memory of the action sequence. We also agree that there are several tradeoffs to consider in the construction of the classifiers depending on the study aim. Given our aim of optimizing keypress decoder accuracy in the present study, the set of tradeoffs resulted in representations reflecting more the latter three processes, and less so the planning component. Whether separate decoders can be constructed to tease apart the representations or networks supporting these overlapping processes is an important future direction of research in this area. For example, work presently underway in our lab constrains the selection of windowing parameters in a manner that allows individual classifiers to be temporally linked to specific planning, execution, evaluation or memory-related processes to discern which brain networks are involved and how they adaptively reorganize with learning. Results from the present study (Figure 4-figure supplement 2) showing hybrid-space decoder prediction accuracies exceeding 74% for temporal windows spanning as little as 25ms and located up to 100ms prior to the KeyDown event strongly support the feasibility of such an approach." 

      I recommend that the authors add this paragraph or a paragraph like this to the Discussion. This perspective is very important and still missing in the revised manuscript. 

      We have now included in the manuscript the following sections addressing this point:

      Discussion (lines 334-338)

      “The main findings of this study during which subjects engaged in a naturalistic, self-paced task were that individual sequence action representations differentiate during early skill learning in a manner reflecting the local sequence context in which they were performed, and that the degree of representational differentiation— particularly prominent over rest intervals—correlated with skill gains.”

      Discussion (lines 428-434)

      “In this study, classifiers were trained on MEG activity recorded during or immediately after each keypress, emphasizing neural representations related to action execution, memory consolidation and recall over those related to planning. An important direction for future research is determining whether separate decoders can be developed to distinguish the representations or networks separately supporting these processes. Ongoing work in our lab is addressing this question. The present accuracy results across varied decoding window durations and alignment with each keypress action support the feasibility of this approach (Figure 3—figure supplement 5).”

      "The rapid initial skill gains that characterize early learning are followed by micro-scale fluctuations around skill plateau levels (i.e. following trial 11 in Figure 1B)"  Is this a mention of Figure 1 Supplement 1 A?  

      The sentence was replaced with the following: Results (lines 108-110)

      “Participants reached 95% of maximal skill (i.e. - Early Learning) within the initial 11 practice trials (Figure 1B), with improvements developing over inter-practice rest periods (micro-offline gains) accounting for almost all total learning across participants (Figure 1B, inset) [1].”

      The citation below seems to have been selected by mistake; 

      "9. Chen, S. & Epps, J. Using task-induced pupil diameter and blink rate to infer cognitive load. Hum Comput Interact 29, 390-413 (2014)." 

      We thank the Reviewer for bringing this mistake to our attention. This citation has now been corrected.

      Reviewer #3 (Recommendations for the authors):  

      The authors write in their response that "We now provide additional details in the Methods of the revised manuscript pertaining to the parcellation procedure and how the sign ambiguity problem was addressed in our analysis." I could not find anything along these lines in the (redlined) version of the manuscript and therefore did not change the corresponding comment in the public review.  

      The revised manuscript now provides a more detailed explanation of the parcellation and sign-flipping procedure implemented:

      Methods (lines 604-611):

      “Source-space parcellation was carried out by averaging all voxel time-series located within distinct anatomical regions defined in the Desikan-Killiany Atlas [31]. Since source time-series estimated with beamforming approaches are inherently sign-ambiguous, a custom Matlab-based implementation of the mne.extract_label_time_course with “mean_flip” sign-flipping procedure in MNE-Python [78] was applied prior to averaging to prevent within-parcel signal cancellation. All voxel time-series within each parcel were extracted and the time-series sign was flipped at locations where the orientation difference was greater than 90° from the parcel mode. A mean time-series was then computed across all voxels within the parcel after sign-flipping.”

      The control analysis based on a multivariate regression that assessed whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times, as briefly mentioned in the authors' responses to Reviewer 2 and myself, was not included in the manuscript and could not be sufficiently evaluated. 

      This approach is now reported in the manuscript: Results (Lines 324-328) and Figure 5-Figure Supplement 6 legend.

      The authors argue that differences in the design between Das et al. (2024) on the one hand (Experiments 1 and 2), and the study by Bönstrup et al. (2019) on the other hand, may have prevented Das et al. (2024) from finding the assumed learning benefit by micro-offline consolidation. However, the Supplementary Material of Das et al. (2024) includes an experiment (Experiment S1) whose design closely follows a large proportion of the early learning phase of Bönstrup et al. (2019), and which, nevertheless, demonstrates that there is no lasting benefit of taking breaks with respect to the acquired skill level, despite the presence of micro-offline gains.  

      We thank the Reviewer for alerting us to the new data added to the revised supplementary materials of Das et al. (2024) posted to bioRxiv. However, despite the Reviewer’s claim to the contrary, a careful comparison of the Das et al. and Bönstrup et al. studies reveals more substantive differences than similarities, and does not support the statement that Experiment S1 “closely follows a large proportion of the early learning phase of Bönstrup et al. (2019)”.

      In the Das et al. Experiment S1, sixty-two participants were randomly assigned to “with breaks” or “no breaks” skill training groups. The “with breaks” group alternated 10 seconds of skill sequence practice with 10 seconds of rest over seven trials (2 min 20 sec total training duration). This amounts to 66.7% of the early learning period defined by Bönstrup et al. (2019) (i.e., eleven 10-second-long practice periods interleaved with ten 10-second-long rest breaks; 3 min 30 sec total training duration). Also, please note that while neither performance feedback nor reward was given in the Bönstrup et al. (2019) study, participants in the Das et al. study received explicit performance-based monetary rewards, a potentially crucial driver of differentiated behavior between the two studies:

      “Participants were incentivized with bonus money based on the total number of correct sequences completed throughout the experiment.”

      The “no breaks” group in the Das et al. study practiced the skill sequence for 70 continuous seconds. Both groups (despite one being labeled “no breaks”) follow training with a long 3-minute break (also note that since the “with breaks” group ends with 10 seconds of rest, their break is actually longer), before finishing with a skill “test” over a continuous 50-second-long block. During the 70 seconds of training, the “with breaks” group shows more learning than the “no breaks” group. Interestingly, following the long 3-minute break, the “with breaks” group displays a performance drop (relative to their performance at the end of training) that is stable over the full 50-second test, while the “no breaks” group shows an immediate performance improvement following the long break that continues to increase over the 50-second test.

      Separately, there are important issues regarding the Das et al. study that should be considered through the lens of recent findings not referred to in the preprint. A major element of their experimental design is that both groups (“with breaks” and “no breaks”) actually receive quite a long 3-minute break just before the skill test. This long break is more than 2.5x the cumulative interleaved rest experienced by the “with breaks” group. Thus, although the design is intended to contrast the presence or absence of rest “breaks”, that difference between groups is no longer maintained at the point of the skill test.
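
      The durations compared in this response can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the trial counts and durations are those quoted in this response (seven 10-second practice and rest periods in Das et al. Experiment S1; eleven practice and ten rest periods in Bönstrup et al.; a 3-minute pre-test break).

```python
# Das et al. (2024), Experiment S1, "with breaks" group:
# seven trials, each 10 s of practice followed by 10 s of rest.
das_training_s = 7 * (10 + 10)       # 140 s, i.e. 2 min 20 s
das_interleaved_rest_s = 7 * 10      # 70 s of cumulative interleaved rest

# Bönstrup et al. (2019) early learning period:
# eleven 10 s practice periods interleaved with ten 10 s rest breaks.
bonstrup_early_s = 11 * 10 + 10 * 10  # 210 s, i.e. 3 min 30 s

# Share of the Bönstrup early learning period covered by Experiment S1.
fraction = das_training_s / bonstrup_early_s

# Long pre-test break relative to the cumulative interleaved rest.
long_break_s = 3 * 60
rest_ratio = long_break_s / das_interleaved_rest_s

print(f"{fraction:.1%} of early learning; long break = {rest_ratio:.2f}x interleaved rest")
```

      Under these figures the “with breaks” training covers two thirds of the early learning period, and the 3-minute break exceeds 2.5 times the cumulative interleaved rest, consistent with the values stated above.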

      The Das et al. results are most consistent with an alternative interpretation of the data: that the “no breaks” group experiences offline learning during their long 3-minute break. This is supported by the recent work of Griffin et al. (2025), where micro-array recordings from primary and premotor cortex were obtained from macaque monkeys while they performed blocks of ten continuous reaching sequences up to 81.4 seconds in duration (see source data for Extended Data Figure 1h) with 90 seconds of interleaved rest. Griffin et al. observed offline improvement in skill immediately following the rest break that was causally related to neural reactivations (i.e., neural replay) that occurred during the rest break. Importantly, the highest density of reactivations was present in the very first 90-second break between Blocks 1 and 2 (see Fig. 2f in Griffin et al., 2025). This supports the interpretation that both the “with breaks” and “no breaks” groups express offline learning gains, with these gains being delayed in the “no breaks” group due to the practice schedule.

      On the other hand, if offline learning can occur during this longer break, then why would the “with breaks” group show no benefit? Again, it could be that most of the offline gains for this group were front-loaded during the seven shorter 10-second rest breaks. Another possible, though not mutually exclusive, explanation is that the observed drop in performance in the “with breaks” group is driven by contextual interference. Specifically, similar to Experiments 1 and 2 in Das et al. (2024), the skill test is conducted under very different conditions than those under which the “with breaks” group practiced the skill (short bursts of practice alternating with equally short breaks). In contrast, the “no breaks” group is tested (50 seconds of continuous practice) under conditions quite similar to their training schedule (70 seconds of continuous practice). Thus, it is possible that this dissimilarity between training and test could lead to reduced performance in the “with breaks” group.

      We made the following manuscript revisions related to these important issues: 

      Introduction (Lines 26-56)

      “Practicing a new motor skill elicits rapid performance improvements (early learning) [1] that precede skill performance plateaus [5]. Skill gains during early learning accumulate over rest periods (micro-offline) interspersed with practice [1, 6-10], and are up to four times larger than offline performance improvements reported following overnight sleep [1]. During this initial interval of prominent learning, retroactive interference immediately following each practice interval reduces learning rates relative to interference after passage of time, consistent with stabilization of the motor memory [11]. Micro-offline gains observed during early learning are reproducible [7, 10-13] and are similar in magnitude even when practice periods are reduced by half to 5 seconds in length, thereby confirming that they are not merely a result of recovery from performance fatigue [11]. Additionally, they are unaffected by the random termination of practice periods, which eliminates the possibility of predictive motor slowing as a contributing factor [11]. Collectively, these behavioral findings point towards the interpretation that micro-offline gains during early learning represent a form of memory consolidation [1]. 

      This interpretation has been further supported by brain imaging and electrophysiological studies linking known memory-related networks and consolidation mechanisms to rapid offline performance improvements. In humans, the rate of hippocampo-neocortical neural replay predicts micro-offline gains [6]. Consistent with these findings, Chen et al. [12] and Sjøgård et al. [13] furnished direct evidence from intracranial human EEG studies, demonstrating a connection between the density of hippocampal sharp-wave ripples (80-120 Hz)—recognized markers of neural replay—and micro-offline gains during early learning. Further, Griffin et al. reported that neural replay of task-related ensembles in the motor cortex of macaques during brief rest periods— akin to those observed in humans [1, 6-8, 14]—are not merely correlated with, but are causal drivers of micro-offline learning [15]. Specifically, the same reach directions that were replayed the most during rest breaks showed the greatest reduction in path length (i.e. – more efficient movement path between two locations in the reach sequence) during subsequent trials, while stimulation applied during rest intervals preceding performance plateau reduced reactivation rates and virtually abolished micro-offline gains [15]. Thus, converging evidence in humans and non-human primates across indirect non-invasive and direct invasive recording techniques link hippocampal activity, neural replay dynamics and offline skill gains in early motor learning that precede performance plateau.”

      Next, in the Methods, we articulate important constraints formulated by Pan and Rickard (2015) and Bönstrup et al. (2019) for meaningful measurements:

      Methods (Lines 493-499)

      “The study design followed specific recommendations by Pan and Rickard (2015): 1) utilizing 10-second practice trials and 2) constraining analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur) that precede the emergence of “scalloped” performance dynamics strongly linked to reactive inhibition effects ([29, 72]). This is precisely the portion of the learning curve Pan and Rickard referred to when they stated “…rapid learning during that period masks any reactive inhibition effect” [29].”

      We finally discuss the implications of neglecting some or all of these recommendations:

      Discussion (Lines 444-452):

      “Finally, caution should be exercised when extrapolating findings during early skill learning, a period of steep performance improvements, to findings reported after insufficient practice [67], post-plateau performance periods [68], or non-learning situations (e.g. performance of non-repeating keypress sequences in  [67]) when reactive inhibition or contextual interference effects are prominent. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g. - memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.”

      Personally, given that the idea of (micro-offline) consolidation seems to attract a lot of interest (and therefore cause a lot of future effort/cost public money) in the scientific community, I would find it extremely important to be cautious in interpreting results in this field. For me, this would include abstaining from the claim that processes occur "during" a rest period (see abstract, for example), given that micro-offline gains (as well as offline contextualization) are computed from data obtained during practice, not rest, and may, thus, just as well reflect a change that occurs "online", e.g., at the very onset of practice (like pre-planning) or throughout practice (like fatigue, or reactive inhibition). In addition, I would suggest to discuss in more depth the actual evidence not only in favour, but also against, the assumption of micro-offline gains as a phenomenon of learning.  

      We agree with the reviewer that caution is warranted. Based upon these suggestions, we have now expanded the manuscript to very clearly define the experimental constraints under which different groups have successfully studied micro-offline learning and its mechanisms, the impact of fatigue/reactive inhibition on micro-offline performance changes unrelated to learning, as well as the interpretation problems that emerge when those recommendations are not followed. 

      We clearly articulate the crucial constraints recommended by Pan and Rickard (2015) and Bönstrup et al. (2019) for meaningful measurement and interpretation of offline gains in the revised manuscript. 

      Methods (Lines 493-499)

      “The study design followed specific recommendations by Pan and Rickard (2015): 1) utilizing 10-second practice trials and 2) constraining analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur) that precede the emergence of “scalloped” performance dynamics strongly linked to reactive inhibition effects ( [29, 72]). This is precisely the portion of the learning curve Pan and Rickard referred to when they stated “…rapid learning during that period masks any reactive inhibition effect” [29].”

      In the Introduction, we review the extensive evidence emerging from LFP and microelectrode recordings in humans and monkeys (including causality of neural replay with respect to micro-offline gains and early learning in the Griffin et al. Nature 2025 publication):

      Introduction (Lines 26-56)

      “Practicing a new motor skill elicits rapid performance improvements (early learning) [1] that precede skill performance plateaus [5]. Skill gains during early learning accumulate over rest periods (micro-offline) interspersed with practice [1, 6-10], and are up to four times larger than offline performance improvements reported following overnight sleep [1]. During this initial interval of prominent learning, retroactive interference immediately following each practice interval reduces learning rates relative to interference after passage of time, consistent with stabilization of the motor memory [11]. Micro-offline gains observed during early learning are reproducible [7, 10-13] and are similar in magnitude even when practice periods are reduced by half to 5 seconds in length, thereby confirming that they are not merely a result of recovery from performance fatigue [11]. Additionally, they are unaffected by the random termination of practice periods, which eliminates the possibility of predictive motor slowing as a contributing factor [11]. Collectively, these behavioral findings point towards the interpretation that micro-offline gains during early learning represent a form of memory consolidation [1]. 

      This interpretation has been further supported by brain imaging and electrophysiological studies linking known memory-related networks and consolidation mechanisms to rapid offline performance improvements. In humans, the rate of hippocampo-neocortical neural replay predicts micro-offline gains [6]. Consistent with these findings, Chen et al. [12] and Sjøgård et al. [13] furnished direct evidence from intracranial human EEG studies, demonstrating a connection between the density of hippocampal sharp-wave ripples (80-120 Hz)—recognized markers of neural replay—and micro-offline gains during early learning. Further, Griffin et al. reported that neural replay of task-related ensembles in the motor cortex of macaques during brief rest periods— akin to those observed in humans [1, 6-8, 14]—are not merely correlated with, but are causal drivers of micro-offline learning [15]. Specifically, the same reach directions that were replayed the most during rest breaks showed the greatest reduction in path length (i.e. – more efficient movement path between two locations in the reach sequence) during subsequent trials, while stimulation applied during rest intervals preceding performance plateau reduced reactivation rates and virtually abolished micro-offline gains [15]. Thus, converging evidence in humans and non-human primates across indirect non-invasive and direct invasive recording techniques link hippocampal activity, neural replay dynamics and offline skill gains in early motor learning that precede performance plateau.”

      Following the reviewer’s advice, we have expanded our discussion in the revised manuscript of alternative hypotheses put forward in the literature and call for caution when extrapolating results across studies with fundamental differences in design (e.g. – different practice and rest durations, or presence/absence of extrinsic reward, etc). 

      Discussion (Lines 444-452):

      “Finally, caution should be exercised when extrapolating findings during early skill learning, a period of steep performance improvements, to findings reported after insufficient practice [67], post-plateau performance periods [68], or non-learning situations (e.g. performance of non-repeating keypress sequences in  [67]) when reactive inhibition or contextual interference effects are prominent. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g. - memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.”

      References

      (1) Zimerman, M., et al., Disrupting the Ipsilateral Motor Cortex Interferes with Training of a Complex Motor Task in Older Adults. Cereb Cortex, 2012.

      (2) Waters, S., T. Wiestler, and J. Diedrichsen, Cooperation Not Competition: Bihemispheric tDCS and fMRI Show Role for Ipsilateral Hemisphere in Motor Learning. J Neurosci, 2017. 37(31): p. 7500-7512.

      (3) Sawamura, D., et al., Acquisition of chopstick-operation skills with the nondominant hand and concomitant changes in brain activity. Sci Rep, 2019. 9(1): p. 20397.

      (4) Lee, S.H., S.H. Jin, and J. An, The difference in cortical activation pattern for complex motor skills: A functional near-infrared spectroscopy study. Sci Rep, 2019. 9(1): p. 14066.

      (5) Grafton, S.T., E. Hazeltine, and R.B. Ivry, Motor sequence learning with the nondominant left hand. A PET functional imaging study. Exp Brain Res, 2002. 146(3): p. 369-78.

      (6) Buch, E.R., et al., Consolidation of human skill linked to waking hippocampo-neocortical replay. Cell Rep, 2021. 35(10): p. 109193.

      (7) Wang, L. and S. Jiang, A feature selection method via analysis of relevance, redundancy, and interaction, in Expert Systems with Applications, Elsevier, Editor. 2021.

      (8) Yu, L. and H. Liu, Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 2004. 5: p. 1205-1224.

      (9) Munn, B.R., et al., Multiscale organization of neuronal activity unifies scale-dependent theories of brain function. Cell, 2024.

      (10) Borragan, G., et al., Sleep and memory consolidation: motor performance and proactive interference effects in sequence learning. Brain Cogn, 2015. 95: p. 54-61.

      (11) Landry, S., C. Anderson, and R. Conduit, The effects of sleep, wake activity and time-on-task on offline motor sequence learning. Neurobiol Learn Mem, 2016. 127: p. 56-63.

      (12) Gabitov, E., et al., Susceptibility of consolidated procedural memory to interference is independent of its active task-based retrieval. PLoS One, 2019. 14(1): p. e0210876.

      (13) Pan, S.C. and T.C. Rickard, Sleep and motor learning: Is there room for consolidation? Psychol Bull, 2015. 141(4): p. 812-34.

      (14) Bönstrup, M., et al., A Rapid Form of Offline Consolidation in Skill Learning. Curr Biol, 2019. 29(8): p. 1346-1351 e4.

      (15) Gupta, M.W. and T.C. Rickard, Comparison of online, offline, and hybrid hypotheses of motor sequence learning using a quantitative model that incorporate reactive inhibition. Sci Rep, 2024. 14(1): p. 4661.

  2. Jun 2025
    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Diarrheal diseases represent an important public health issue. Among the many pathogens that contribute to this problem, Salmonella enterica serovar Typhimurium is an important one. Due to the rise in antimicrobial resistance and the problems associated with widespread antibiotic use, the discovery and development of new strategies to combat bacterial infections is urgently needed. The microbiome field is constantly providing us with various health-related properties elicited by the commensals that inhabit their mammalian hosts. Harnessing the potential of these commensals for knowledge about host-microbe interactions, as well as useful properties with therapeutic implications, will likely remain a fruitful field for decades to come. In this manuscript, Wang et al. use various methods, encompassing classic microbiology, genomics, chemical biology, and immunology, to identify a potent probiotic strain that protects nematode and murine hosts from S. enterica infection. Additionally, the authors identify gut metabolites that are correlated with protection, and show that a single metabolite can recapitulate the effects of probiotic administration.

      We gratefully appreciate your positive and professional comments.

      Strengths:

      The utilization of varied methods by the authors, together with the impressive amount of data generated, to support the claims and conclusions made in the manuscript is a major strength of the work. Also, the ability to move beyond simple identification of the active probiotic, also identifying compounds that are at least partially responsible for the protective effects, is commendable.

      We gratefully appreciate your positive and professional comments.

      Weaknesses:

      No major weaknesses noted.

      We gratefully appreciate your positive comments.

      Reviewer #2 (Public review):

      Summary:

      In this work, the investigators isolated one Lacticaseibacillus rhamnosus strain (P118), and determined this strain worked well against Salmonella Typhimurium infection. Then, further studies were performed to identify the mechanism of bacterial resistance, and a list of confirmatory assays were carried out to test the hypothesis.

      We gratefully appreciate your positive and professional comments.

      Strengths:

      The authors provided details regarding all assays performed in this work, and this reviewer trusted that the conclusion in this manuscript is solid. I appreciate the efforts of the authors to perform different types of in vivo and in vitro studies to confirm the hypothesis.

      We gratefully appreciate your positive and professional comments.

      Weaknesses:

      I have mainly two questions for this work.

      Main point-1:

      The authors provided the information below about the sources from which Lacticaseibacillus rhamnosus was isolated. More details are needed. What were the criteria for choosing these samples? Where did these samples originate? How many strains of bacteria were obtained from which types of samples?

      Lines 486-488: Lactic acid bacteria (LAB) and Enterococcus strains were isolated from the fermented yoghurts collected from families in multiple cities of China and the intestinal contents from healthy piglets without pathogen infection and diarrhoea by our lab.

      We apologize for the ambiguous and limited information provided previously. More details have been added to the Materials and Methods section of the revised manuscript (see Lines 482-493; the manuscript with marked changes is available as the “Related Manuscript File” in the submission system). We gratefully appreciate your professional comments.

      Lines 482-493: “Lactic acid bacteria (LAB) and Enterococcus strains were isolated from 39 samples: 33 fermented yoghurt samples (collected from families in multiple cities of China, including Lanzhou, Urumqi, Guangzhou, Shenzhen, Shanghai, Hohhot, Nanjing, Yangling, Dali, Zhengzhou, Shangqiu, Harbin, Kunming, Puer), and 6 healthy piglet rectal content samples without pathogen infection and diarrhea from a pig farm in Zhejiang province (Table 1). Ten isolates were randomly selected from each sample. De Man-Rogosa-Sharpe (MRS) broth with 2.0% CaCO<sub>3</sub> (a selective culture medium that favors the luxuriant growth of Lactobacilli) and Brain heart infusion (BHI) broth (Huankai Microbial, Guangzhou, China) were used for bacterial isolation and cultivation. Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS, Bruker Daltonik GmbH, Bremen, Germany) was employed to identify bacterial species with a confidence level ≥ 90% (He et al., 2022).”

      Lines 129-133: A total of 290 bacterial strains were isolated and identified from 32 samples of the fermented yoghurt and piglet rectal contents collected across diverse regions within China using MRS and BHI medium, which consist of 63 Streptococcus strains, 158 Lactobacillus/Lacticaseibacillus/Limosilactobacillus strains and 69 Enterococcus strains.

      We apologize for the ambiguous information. We have carefully revised this section and added more details (see Lines 129-133). We gratefully appreciate your professional comments.

      Lines 129-133: “Following identification by MALDI-TOF MS, a total of 290 bacterial isolates were obtained from 33 fermented yoghurt samples and 6 healthy piglet rectal content samples. Those isolates consist of 63 Streptococcus isolates, 158 Lactobacillus/Lacticaseibacillus/Limosilactobacillus isolates, and 69 Enterococcus isolates (Figure 1A, Table 1).”

      Main-point-2:

      As a probiotic, Lacticaseibacillus rhamnosus has been widely studied. In fact, there are many commercially available products in which Lacticaseibacillus rhamnosus is the main bacterium. There are also ATCC strains such as ATCC 53103.

      I am sure the authors are also interested to know whether P118 is a better probiotic candidate than other commercially available strains. Also, would the mechanism described for P118 apply to other Lacticaseibacillus rhamnosus strains?

      It would be ideal if the authors could include one or two Lacticaseibacillus rhamnosus which are currently commercially used, or from the ATCC. Then, the authors can compare the efficacy and antibacterial mechanisms of their P118 with other strains. This would open the windows for future work.

      We gratefully appreciate your professional comments and valuable suggestions. We fully agree that including well-known, recognized, or commercial probiotics as positive controls would allow a more comprehensive evaluation of the isolated P118 strain as a probiotic candidate, particularly in comparison to other well-established probiotics, and would also help assess whether the mechanisms described for P118 are applicable to other L. rhamnosus strains or to lactic acid bacteria in general. These issues will be taken fully into consideration and addressed in future work. Nonetheless, we have left the door open for future research in the Conclusion section (see Lines 477-479): “Further investigations are needed to assess whether the mechanisms observed in P118 are strain-specific or broadly applicable to other L. rhamnosus strains, or LAB species in general.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      This reviewer appreciates the efforts from the authors to provide the details related to this work. In the meantime, the manuscript shall be written in a way which is easy for the readers to follow.

      We have done our best to revise and improve the whole manuscript to make it easier for readers to follow (e.g., see Lines 27-30, 115-120, 129-133, 140-143, 325-328, 482-493, 501-502, 663-667, 709-710, and 1003-1143). We gratefully appreciate your valuable suggestions.

      For example, under the sections of Materials and Methods, there are 19 sub-titles. The authors could consider combining some sections, and/or cite other references for the standard procedures.

      We gratefully appreciate your professional comments and valuable suggestions. Some sections have been combined according to the reviewer’s suggestions (see Lines 501-710).

      Another example: the figures have great resolution, but they are way too busy. The figures 1 and 2 have 14-18 panels. Figure 5 has 21 panels. Please consider separating into more figures, or condensing some panels.

      We fully agree that some of the submitted figures are too busy. However, it is difficult for us to move results into the supplementary information, because all of them are essential for fully supporting our hypothesis and conclusions. Nonetheless, some panels have been combined or condensed according to the reviewer’s suggestions (see Lines 1003-1024 and 1056-1075). We gratefully appreciate your professional comments and valuable suggestions.

      More minor comments:

      line 30: spell out "C." please.

      Done as requested (see Line 29, Line 31). We gratefully appreciate your valuable suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Walton et al. set out to isolate new phages targeting the opportunistic pathogen Pseudomonas aeruginosa. Using a double ∆fliF ∆pilA mutant strain, they were able to isolate 4 new phages, CLEW-1, -3, -6, and -10, which were unable to infect the parental PAO1F Wt strain. Further experiments showed that the 4 phages were only able to infect a ∆fliF strain, indicating a role of the MS-protein in the flagellum complex. Through further mutational analysis of the flagellum apparatus, the authors were able to identify the involvement of c-di-GMP in phage infection. Depletion of c-di-GMP levels by an inducible phosphodiesterase renders the bacteria resistant to phage infection, while elevation of c-di-GMP through the Wsp system made the cells sensitive to infection by CLEW-1. Using TnSeq, the authors were able not only to reaffirm the involvement of c-di-GMP in phage infection but also to identify the exopolysaccharide PSL as a downstream target for CLEW-1. C-di-GMP is a known regulator of PSL biosynthesis. The authors show that CLEW-1 binds directly to PSL on the cell surface and that deletion of the pslC gene resulted in complete phage resistance. The authors also provide evidence that the phage-PSL interaction happens during the biofilm mode of growth and that the addition of the CLEW-1 phage specifically resulted in a significant loss of biofilm biomass. Lastly, the authors set out to test if CLEW-1 could be used to resolve a biofilm infection using a mouse keratitis model. Unfortunately, while the authors noted a reduction in bacterial load assessed by GFP fluorescence, the keratitis did not resolve under the tested parameters. 

      Strengths: 

      The experiments carried out in this manuscript are thoughtful and rational, and sufficient explanation is provided for why the authors chose each specific set of experiments. The data presented strongly support their conclusions, and they present compelling explanations for any deviation. The authors have not only developed a new technique for screening for phages targeting P. aeruginosa, but also highlight the importance of looking for phages during the biofilm mode of growth, as opposed to the more standard techniques involving planktonic cultures. 

      Weaknesses: 

      While the paper is strong, I do feel that further discussions could have gone into the decision to focus on CLEW-1 for the majority of the paper. The paper also doesn't provide any detailed information on the genetic composition of the phages. It is unclear if the phages isolated are temperate or virulent. Many temperate phages enter the lytic cycle in response to QS signalling, and while the data as it is doesn't suggest that is the case, perhaps the paper would be strengthened by further elimination of this possibility. At the very least it might be worth mentioning in the discussion section. 

      Thank you for your review. The genomes of all Clew phages and Ocp-2 have been uploaded [Genbank accession# PQ790658.1, PQ790659.1, PQ790660.1, PQ790661.1, and PQ790662.1]. It turns out that the Clew phage are highly related, which is highlighted by the genomic comparison in the supplementary figure S1. It therefore made sense to focus our in-depth analysis on one of the phage. We have included a supplementary figure (S1A), demonstrating that the other Clew phage also require an intact psl locus for infection, to make that logic clearer. The phage are virulent (there is apparently a bit of a debate about this with regard to Bruynogheviruses, but we have not been able to isolate lysogens). This is now mentioned in the discussion.  

      Reviewer #2 (Public review): 

      This manuscript by Walton et al. suggests that they have identified a new bacteriophage that uses the exopolysaccharide Psl from Pseudomonas aeruginosa (PA) as a receptor. As Psl is an important component in biofilms, the authors suggest that this phage (and others similarly isolated) may be able to specifically target biofilm-growing bacteria. While an interesting suggestion, the manner in which this paper is written makes it difficult to draw this conclusion. Also, some of the results do not directly follow from the data as presented and some relevant controls seem to be missing. 

      Thank you for your review. We would argue that the combination of demonstrating Psl-dependent binding of Clew-1 to P. aeruginosa, as well as demonstration of direct binding of Clew-1 to affinity-purified Psl, indicates that the phage binds directly to Psl and uses it as a receptor. In looking at the recommendations, it appears that the remark about controls refers to not using the ∆pslC mutant alone (as opposed to the ∆fliF2 ∆pslC double mutant) as a control for some of the binding experiments. However, since the ∆fliF2 mutant is more permissive for phage infection, analyzing the effect of deleting pslC in the context of the ∆fliF2 mutant background is the more stringent test. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      First off, I would like to congratulate the authors on this study and manuscript. It is very well executed and the writing and flow of the paper are excellent. The findings are intriguing and I believe the paper will be very well received by the phage, Pseudomonas, and biofilm communities.

      Thank you for your kind review of our work!

      I have very little to critique about the paper but I have listed a few suggestions that I believe could strengthen the paper if corrected: 

      Comments and suggestions: 

      (1) The paper initially describes 4 isolated phages but no rationale is given for why they chose to continue with CLEW-1, as opposed to CLEW-3, -6, and -10. The paper would benefit from going into more detail with phage genomics and perhaps characterize the phage receptor binding to PSL. 

      Clew-1, -3, -6, and -10 are actually quite similar to one another. The genomes are now uploaded to Genbank [accession# PQ790658.1, PQ790659.1, PQ790660.1, and PQ790661.1]. They all require an intact Psl locus for infection; we have updated Fig. S1 to show this for the remaining Clew phage. In the end, it made sense to focus on one of these related phage and characterize it in depth.

      (2) PA14 was used in some experiments but not listed in the strain table. 

      Thank you, this has been added in the resubmission.

      (3) Would have been good to see more strains/isolates used.

      We are currently characterizing the host range of Clew-1. It appears to be pretty limited, but this will likely be included in another paper that will focus on host range, not only of Clew-1, but other biofilm-tropic phage that we have isolated since then.

      (4) Could purified PSL be added to make non-PSL strain (like PA14) susceptible? 

      We have tried adding purified Psl to a psl mutant strain, but this does not result in phage sensitivity. Further characterization of the Psl receptor is something we are currently working on, but it will likely be a much bigger story than can be easily accommodated in a revised manuscript.

      (5) No data on resistance development. 

      We have not done this as yet.

      (6) Alternative biofilm models. Both in vitro and in vivo. 

      We agree that exploring the interaction of Clew-1 with biofilms in greater detail is a logical next step. The revised manuscript does have data on the viability of P. aeruginosa biofilm bacteria after Clew-1 infection using either a bead biofilm model or LIVE/DEAD staining of static biofilms. However, expanding on this further (setting up flow-cell biofilms, developing reporters to monitor phage infection, etc.) is beyond the scope of this initial report and characterization of Clew-1.

      (7) There is a mistake in at least one reference. An unknown author is listed in reference 48. DA Garsin is not part of the paper. Might be worth looking into further mistakes in the reference list as I suspect this might be an issue related to the citation software.

      Thank you. Yes, odd how that extra author got snuck in. This has been corrected.

      (8) I don't seem to be able to locate a Genbank file or accession number. If it wasn't performed how was evolutionary relatedness data generated?

      The genomes of all Clew phages and Ocp-2 have been uploaded [Genbank accession# PQ790658.1, PQ790659.1, PQ790660.1, PQ790661.1, and PQ790662.1].

      (9) No genomic information about the isolated phages. Are they temperate or virulent? This would be important information as only strictly lytic phages are currently deemed appropriate for phage therapy. 

      These phage are virulent. We have only been able to isolate resistant bacteria from plaques, but they do not harbor the phage (as detected by PCR). This matches what other researchers have found for Bruynogheviruses.

      Reviewer #2 (Recommendations for the authors): 

      Others have used different PA mutants lacking known phage receptors to pan for new phages. However, it is not totally clear how the screen here was selected for the Psl-specific phage. The authors used flagella and pili mutants and found Clew-1, -3, -6, and -10. These were all Bruynogheviruses. They also isolated a phage that uses the O antigen as a receptor. The family of this latter phage and how it is known to use this as a receptor is not described. 

      Phage Ocp-2 is a Pbunavirus. We added new supplementary figure S3, addressing the O-antigen receptor.

      The authors focused on Clew-1, but the receptor for these other Clew phages is not presented. For Clew-1 the phage could plaque on the fliF deletion mutant but not the wild-type strain. The reason for this never appears to be addressed. The authors leap to consider the involvement of c-di-GMP, but how this relates to fliF appears to be lacking. 

      We have included a supplementary figure demonstrating that all the Clew phage require Psl for infection (Fig. S1A). As noted above, we have uploaded the genomic data that underpins the comparison in our supplementary figure. The phage are all closely related. It therefore made sense to focus on one of the phage for the analysis.  

      It is particularly unclear why this phage doesn't plaque on PAO1 as this strain does make Psl. Related to this, it actually looks like something is happening to PAO1 in Figure S4 (although what units are on the x-axis is not entirely clear).

      We hypothesize that the fraction of susceptible cells in the population dictates whether the phage can make overt plaques. The supplementary figure S4 indicates that a subpopulation of the wild-type culture is susceptible and this is borne out by the fraction of wild type cells that the phage can bind to (~50%). The fliF mutation increases this frequency of susceptible cells to 80-90% (Fig. 3).

      The Tnseq screen to identify receptors is clever and identifies additional phosphodiesterase genes, the deletion of which makes PAO1 susceptible. And the screen to find resistant fliF mutants identified genes involved in Psl. However, the link between the phosphodiesterase mutants and the amount of Psl produced never appears to be established. And the statement that Psl is required for infection (line 130) is never actually tested.

      The link between c-di-GMP and Psl production is well-established in the literature. I think the requirement for Psl in infection is demonstrated in multiple ways, including lack of plaque formation on psl mutant strains, lack of phage binding to strains that do not produce Psl, and direct binding of the phage to affinity-purified Psl.

      Figure 2C describes using a ∆fliF2 strain but how this is different (or if it is different) from ∆fliF described in the text is never explained.

      The difference in the deletions is explained in table S1, in the description for the deletion constructs used in their construction, pEXG2-∆fliF and pEXG2-∆fliF2 (∆fliF2 is smaller than ∆fliF and can be complemented completely with our complementing plasmid, pP37-fliF, which is the reason why we used the ∆fliF2 mutation going forward, rather than the ∆fliF mutation on which the phage was originally isolated).

      Similarly, there is a sentence (line 138) that "Attachment of Clew-1 is Psl-dependent" but this would appear to have no context.

      The relevant figure, Fig. 3, is cited in the next sentence and is the subject of the remaining paragraphs in this section of the manuscript.

      For Figure 3B, why wasn't the single ∆pslC mutant visualized in this analysis? Similar questions relate to the data in Figure 4.

      Analyzing the effect of the pslC deletion in the context of the ∆fliF2 mutant background, which is more permissive for phage infection, is the more stringent test.  

      The efficacy of Clew-1 in the mouse keratitis model is intriguing but it is unclear why the CFU/eye are so variable. The description of how the experiment was actually carried out is not clear. Was only one eye scratched or both? Were controls included with a scratch and no bacteria (± phage)?

      One eye was infected. We did not conduct a no-bacteria control (just scratching the cornea is not sufficient to cause disease). The revised manuscript has an updated animal experiment in which we carried the infection forward to 72h with two phage treatments. Following this regimen, there is a significant decrease in CFU, as well as corneal opacity (disease). Variability of the data is a fairly common feature in animal experiments. A number of factors could explain this variability, such as whether the mouse blinks and removes some of the inoculum shortly after deposition of the bacteria, or some of the phage after each treatment.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      The revised manuscript has gained much clarity and consistency. One previous criticism, however, has in my opinion not been properly addressed. I think the problem boils down to not clearly distinguishing between orthologs and paralogs/homologs. As this problem affects a main conclusion - the prevalence of deletions over insertions in the MTBC - it should be addressed, if not through additional analyses, then at least in the discussion.

      Insertions and deletions are now distinguished in the following way: "Accessory regions were further classified as a deletion if present in over 50% of the 192 sub-lineages or an insertion/duplication if present in less than 50% of sub-lineages." The outcome of this classification is suspicious: not a single accessory region was classified as an insertion/duplication. As a check of sanity, I'd expect at least some insertions of IS6110 to show up, which has produced lineage- or sublineage-specific insertions (Roychowdhury et al. 2015, Shitikov et al. 2019). Why, for example, wouldn't IS6110 insertions in the single L8 strain show up here?

      In a fully clonal organism, any insertion/duplication will be an insertion/duplication of an existing sequence, and thus produce a paralog. If I'm correctly understanding your methods section, paralogs are systematically excluded in the pangraph analysis. Genomic blocks are summarized at the sublineage levels as follows (l. 184): "The DNA sequences from genomic blocks present in at least one sub-lineage but completely absent in others were extracted to look for long-term evolution patterns in the pangenome." I presume this is done using blastn, as in other steps of the analysis.

      So a sublineage-specific copy of IS6110 would be excluded here, because IS6110 is present somewhere in the genome in all sublineages. However, the appropriate category of comparison, at least for the discussion of genome reduction, is orthology rather than homology: is the same, orthologous copy of IS6110, at the same position in the genome, present or absent in other sublineages? The same considerations apply to potential sublineage-specific duplicates of PE, PPE, and Esx genes. These gene families play important roles in host-pathogen interactions, so I'd argue that the neglect of paralogs is not a finicky detail, but could be of broader biological relevance.
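
      The distinction drawn here can be illustrated with a toy sketch. The sublineage labels, locus labels, and data layout below are hypothetical and not taken from the manuscript; the sketch only shows that a sequence-level (homology) comparison reports IS6110 as present everywhere, while a position-aware (orthology) comparison exposes a sublineage-specific copy.

```python
# Toy illustration with made-up data: each sublineage maps to a set of
# (element, genomic_locus) pairs. Locus labels are hypothetical.
copies = {
    "L1.1": {("IS6110", "locus_A")},
    "L2.2": {("IS6110", "locus_A"), ("IS6110", "locus_B")},
    "L4.9": {("IS6110", "locus_A")},
}

def homology_presence(copies, element):
    """Element counts as present if any copy exists anywhere in the genome."""
    return {sub: any(e == element for e, _ in loci) for sub, loci in copies.items()}

def orthology_presence(copies, element, locus):
    """Element counts as present only if a copy sits at the given locus."""
    return {sub: (element, locus) in loci for sub, loci in copies.items()}

# Sequence-level view: IS6110 is present in every sublineage.
by_sequence = homology_presence(copies, "IS6110")
# Position-aware view: only L2.2 carries the copy at locus_B.
by_position = orthology_presence(copies, "IS6110", "locus_B")
```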

      Within the analysis we undertook we did look at paralogous blocks in pangraph, based on copy number per genome. However, this could have been clearer in the text and we will rectify this. We also focussed on duplicated/deleted blocks that were present in two or more sub-lineages. This is noted in the figure 4 legend but we will make this clearer in other sections of the manuscript.

      We agree that indeed the way paralogs are handled could still be optimised, and that gene duplicates of some genes could have biological importance. The reviewer is suggesting that a synteny analysis between genomes would be best for finding specific regions that are duplicated/deleted within a genome, and if those sections are duplicated/deleted in the same regions of the genome. Since Pangraph does not give such information readily, a larger amount of analysis would be required to confirm such genome position-specific duplications. While this is indeed important, we deem this to be out of scope for the current publication, but will note this as a limitation in the discussion. However, this does not fundamentally change the main conclusions of our analysis.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Behruznia and colleagues use long-read sequencing data for 335 strains of the Mycobacterium tuberculosis complex to study genome evolution in this clonal bacterial pathogen. They use both a "classical" pangenome approach that looks at the presence and absence of genes, and a more general pangenome graph approach to investigate structural variants also in non-coding regions. The two main results of the study are that (1) the MTBC has a small pangenome with few accessory genes, and that (2) pangenome evolution is driven by deletions in sublineage-specific regions of difference. Combining the gene-based approach with a pangenome graph is innovative, and the former analysis is largely sound apart from a lack of information about the data set used. The graph part, however, requires more work and currently fails to support the second main result. Problems include the omission of important information and the confusing analysis of structural variants in terms of "regions of difference", which unnecessarily introduces reference bias. Overall, I very much like the direction taken in this article, but think that it needs more work: on the one hand by simply telling the reader what exactly was done, on the other by taking advantage of the information contained in the pangenome graph.

      Strengths:

      The authors put together a large data set of long-read assemblies representing most lineages of the Mycobacterium tuberculosis complex, covering a large geographic area. State-of-the-art methods are used to analyze gene presence-absence polymorphisms (Panaroo) and to construct a pangenome graph (PanGraph). Additional analysis steps are performed to address known problems with misannotated or misassembled genes in pangenome analysis.

      Weaknesses:

      The study does not quite live up to the expectations raised in the introduction. Firstly, while the importance of using a curated data set is emphasized, little information is given about the data set apart from the geographic origin of the samples (Figure 1). A BUSCO analysis is conducted to filter for assembly quality, but no results are reported. It is also not clear whether the authors assembled genomes themselves in the cases where, according to Supplementary Table 1, only the reads were published but not the assemblies. In the end, we simply have to trust that single-contig assemblies based on long-reads are reliable.

      We have now added a robust overview of the dataset to supplementary file 1. This is split into 3 sections: public genomes, which were assembled by others; sequenced genomes, which were created and assembled by us; the BUSCO information for all the genomes together. We did not assemble any public data ourselves but retrieved these from elsewhere. We have modified the text to be more specific on this (Line 114 onwards) and the supplementary file is updated to better outline the data.

      One issue with long read assemblies could be that high rates of sequencing errors result in artificial indels when coverage is low, which in turn could affect gene annotation and pangenome inference (e.g. Watson & Warr 2019, https://doi.org/10.1038/s41587-018-0004-z). Some of the older long-read data used by the authors could well be problematic (PacBio RSII), but also their own Nanopore assemblies, six of which have a mean coverage below 50 (Wick et al. 2023 recommend 200x for ONT, https://doi.org/10.1371/journal.pcbi.1010905). Could the results be affected by such assembly errors? Are there lineages, for example, for which there is an increased proportion of RSII data? Given the large heterogeneity in data quality on the NCBI, I think more information about the reads and the assemblies should be provided.

      We have now included an analysis where we looked to see if the sequencing platform influenced the resulting accessory genome size and the pseudogene count. The details of this are included in lines 207-219, and the results are outlined in lines 251-258. Essentially, we found no correlation between sequencing platform and genome characteristics, although less stringent cut-offs did suggest that PacBio SMRT-only assembled genomes may have larger accessory genomes. We do not believe this is enough to influence our larger inferences from this data. It should be noted that complete genomes, in general, give a better indication of pangenome size compared to draft genomes, as has been shown previously (e.g. Marin et al., 2024). Even with some small potential bias, this makes our analysis more robust than any previously published.

      In relation to the sequencing depth of our own data, all genomes had coverage above 30x, which Sanderson et al. (2024) have shown to be sufficient for highly accurate sequence recovery. We fixed an issue with the L9 isolate from the previous submission, which resulted in a better BUSCO score and overall quality of that isolate and the overall dataset.

      The part of the paper I struggled most with is the pangenome graph analysis and the interpretation of structural variants in terms of "regions of difference". To start with, the method section states that "multiple whole genomes were aligned into a graph using PanGraph" (l.159/160), without stating which genomes were used and for what reason. From Figure 5 I understand that you included all genomes, and that Figure 6 summarizes the information at the sublineage level. This should be stated clearly; at present the reader has to figure out what was done. It was also not clear to me why the authors focus on the sublineage level: a minority of accessory genes (107 of 506) are "specific to certain lineages or sublineages" (l. 240), so why conclude that the pangenome is "driven by sublineage-specific regions of difference", as the title states? What does "driven by" mean? Instead of cutting the phylogeny arbitrarily at the sublineage level, polymorphisms could be described more generally by their frequencies.

      We apologise for the ambiguity in the methodology. All the isolates were inputted to Pangraph to create the pangenome using this method. This is now made clearer in lines 175-177. Standard pangenome statistics (size, genome fluidity, etc.) derived from this Pangraph output are now present in the results section as well (lines 301-320).

      We then only looked at regions of difference at the sub-lineage level, meaning we grouped genomes by sub-lineage within the resulting graph and looked for blocks common between isolates of the same sub-lineage but absent from one or more other sub-lineages. We did this from both the Panaroo output and the Pangraph output and then retained only blocks found by both. The results of this are now outlined in lines 351-383.
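
      The grouping step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input layout, the function name, and the default of two genomes per sub-lineage are hypothetical, and it assumes Panaroo and Pangraph results have already been mapped to a shared set of block identifiers so that the two outputs can be intersected afterwards.

```python
from collections import defaultdict

def sublineage_specific_blocks(presence, sublineage_of, min_genomes=2):
    """Return {block: set of sub-lineages carrying it} for blocks present in
    every genome of at least one sub-lineage and completely absent from at
    least one other sub-lineage.

    presence: {genome_id: set of block ids}            (hypothetical layout)
    sublineage_of: {genome_id: sub-lineage label}
    min_genomes: sub-lineages with fewer genomes are skipped (assumption)
    """
    groups = defaultdict(list)
    for genome, sub in sublineage_of.items():
        groups[sub].append(genome)
    groups = {s: gs for s, gs in groups.items() if len(gs) >= min_genomes}

    all_blocks = set().union(*(presence[g] for gs in groups.values() for g in gs))
    result = {}
    for block in all_blocks:
        fixed_in = {s for s, gs in groups.items()
                    if all(block in presence[g] for g in gs)}
        absent_in = {s for s, gs in groups.items()
                     if not any(block in presence[g] for g in gs)}
        if fixed_in and absent_in:
            result[block] = fixed_in
    return result

# Toy input: b1 is core, b2 is fixed in A and absent from B, b3 is a singleton.
presence = {
    "g1": {"b1", "b2"}, "g2": {"b1", "b2"},   # sub-lineage A
    "g3": {"b1"}, "g4": {"b1", "b3"},         # sub-lineage B
}
sublineage_of = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
hits = sublineage_specific_blocks(presence, sublineage_of)
```

      Running the same function on each tool's presence table and intersecting the resulting keys (for example `set(panaroo_hits) & set(pangraph_hits)`, both names hypothetical) would correspond to retaining only blocks found by both.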

      We focussed on these sub-lineage-specific regions to capture long-term evolution patterns and not be influenced by single-genome short-term changes. We do not have enough genomes of closely related isolates to truly look at very recent evolution, although the small accessory genome indicates this is not substantial in terms of gene presence/absence. We also did not want potential mis-annotations in a single genome to heavily influence our findings due to the potential issues pointed out by the reviewer above. We state this more clearly in the introduction (lines 106-108), methods (lines 184-186) and results (lines 345-347), and we indicate the limitations in the Discussion, lines 452-457 and 471-473. We also changed the title to ‘shaped’ instead of ‘driven by’.

      I fully agree that pangenome graphs are the way to go and that the non-coding part of the genome deserves as much attention as the coding part, as stated in the introduction. Here, however, the analysis of the pangenome graph consists of extracting variants from the graph and blasting them against the reference genome H37Rv in order to identify genes and "regions of difference" (RDs) that are variable. It is not clear what the authors do with structural variants that yield no blast hit against H37Rv. Are they ignored? Are they included as new "regions of difference"? How many of them are there? etc. The key advantage of pangenome graphs is that they allow a reference-free, full representation of genetic variation in a sample. Here reference bias is reintroduced in the first analysis step.

      We apologise for the confusion here as indeed the RDs terminology is very MTBC-specific. Current RDs are always defined relative to H37Rv, as that is how the original discovery of these regions was done and that is how RDScan works. We clarify this in the introduction (lines 67-68). If we found a large sequence polymorphism (e.g. by Pangraph) and searched for known RDs using RDScan, we then assigned a current RD name to this LSP. This uses H37Rv as a reference. If we did not find a known RD, we then classified the LSP as a new RD if it is present in H37Rv, or left the designation as an LSP if not in H37Rv, thus expanding the analysis beyond the H37Rv-centric approaches used by others previously. This is hopefully now made clearer in the methods, lines 187-194.
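
      The naming logic described above amounts to a three-way decision, sketched here for readers outside the TB field. The function name, arguments, and the format of the provisional labels are hypothetical; the response only states that novel regions received a lineage-related name.

```python
def name_variant(known_rd_name, present_in_h37rv, lineage_label):
    """Assign a label to a large sequence polymorphism (LSP).

    known_rd_name: name returned by a known-RD lookup (e.g. via RDScan),
                   or None if no known RD matched.
    present_in_h37rv: True if the sequence is found in the H37Rv reference.
    lineage_label: tag used to build a provisional name (hypothetical format).
    """
    if known_rd_name is not None:
        return known_rd_name                 # already catalogued RD
    if present_in_h37rv:
        return f"new_RD_{lineage_label}"     # novel RD, definable against H37Rv
    return f"LSP_{lineage_label}"            # not in H37Rv, stays an LSP

labels = [
    name_variant("RD_example", True, "L2"),
    name_variant(None, True, "L4.1"),
    name_variant(None, False, "L4.1"),
]
```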

      Along similar lines, I find the interpretation of structural variants in terms of "regions of difference" confusing, and probably many people outside the TB field will do so. For one thing, it is not clear where these RDs and their names come from. Did the authors use an annotation of RDs in the reference genome H37Rv from previously published work (e.g. Bespiatykh et al. 2021)? This is important basic information; its lack makes it difficult to judge the validity of the results. The Bespiatykh et al. study uses a large short-read data set (721 strains) to characterize diversity in RDs and specifically focuses on the sublineage-specific variants. While the authors cite the paper, it would be relevant to compare the results of the two studies in more detail.

      We have amended the introduction to explain this terminology better (lines 67-68). Naming of the RDs here came from using RDScan to assign current names to any accessory regions we found; if such a region was not a known RD, we gave it a lineage-related name, allowing for proper RD naming later (lines 187-194). Because the Bespiatykh et al. paper is the basis for RDScan, our work implicitly compares to this throughout, as any RDs we find which were not picked up by RDScan are thus novel compared to that paper.

      As far as I understand, "regions of difference" have been used in the tuberculosis field to describe structural variants relative to the reference genome H37Rv. Colloquially, regions present in H37Rv but absent in another strain have been called "deletions". Whether these polymorphisms have indeed originated through deletion or through insertion in H37Rv or its ancestors requires a comparison with additional strains. While the pangenome graph does contain this information, the authors do not attempt to categorize structural variants into insertions and deletions but simply seem to assume that "regions of difference" are deletions. This, as well as the neglect of paralogs in the "classical" pangenome analysis, puts a question mark behind their conclusion that deletion drives pangenome evolution in the MTBC.

      We have now amended the analysis to specifically designate a structural variant as a deletion if present in the majority of strains and absent in a minority, or an insertion/duplication if present in a minority and absent in a majority (lines 191-192). We also ran Panaroo without merging paralogs to examine duplication in this output; Pangraph implicitly includes paralogs already.

      From all these analyses we did not find any structural variants classed as insertions/duplications and did not find paralogs to be a major feature at the sub-lineage level (lines 377-383). While these features could be important on shorter timescales, we do not have enough closed genomes to confidently state this (limitation outlined in lines 452-457). Therefore, our assertion that deletions are a primary force shaping the long-term evolution in this group still holds.
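
      The majority rule described above can be written down compactly. This is a minimal sketch, assuming presence is counted over the 192 sub-lineages mentioned in the manuscript text quoted by the reviewer; the function name and the handling of an exact 50% split are assumptions, since the quoted rule does not cover that case.

```python
def classify_accessory_region(n_present, n_total=192):
    """Classify an accessory region by the fraction of sub-lineages carrying it."""
    if not 0 < n_present < n_total:
        raise ValueError("an accessory region is present in some but not all sub-lineages")
    fraction = n_present / n_total
    if fraction > 0.5:
        return "deletion"                # present in the majority, lost in a minority
    if fraction < 0.5:
        return "insertion/duplication"   # present in a minority only
    return "unclassified"                # exactly 50%: not covered by the stated rule

calls = [classify_accessory_region(n) for n in (150, 10, 96)]
```

      Note that a frequency rule of this kind polarises each variant without an outgroup or ancestral reconstruction, so it is an approximation of the true mutational direction.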

      Reviewer #2 (Public Review):

      Summary:

      The authors attempted to investigate the pangenome of MTBC by using a selection of state-of-the-art bioinformatic tools to analyse 324 complete and 11 new genomes representing all known lineages and sublineages. The aim of their work was to describe the total diversity of the MTBC and to investigate the driving evolutionary force. By using long read and hybrid approaches for genome assembly, an important attempt was made to understand why previous reports gave varying estimates of the MTBC pangenome size.

      Strengths:

      A stand-out feature of this work is the inclusion of non-coding regions as opposed to only coding regions, which were the focus of previous papers and analyses that investigated the MTBC pangenome. A unique feature of this work is that it highlights sublineage-specific regions of difference (RDs) that were previously unknown. Another major strength is the utilisation of long-read whole-genome sequences, in combination with short-read sequences when available. It is known that using only short reads for genome assembly has several pitfalls. The parallel approach of utilizing both Panaroo and Pangraph for pangenomic reconstruction illuminated the limitations of both tools while highlighting genomic features identified by both. This is important for any future work and perhaps alludes to the need for more MTBC-specific tools to be developed.

      Weaknesses:

      The only major weakness was the limited number of isolates from certain lineages and the over-representation of others, which was also acknowledged by the authors. However, since the case is made that the MTBC has a closed pangenome, the inclusion of additional genomes would not result in the identification of any new genes. This is a strong statement without an illustration/statistical analysis to support this.

      We have included a Heaps' law and genome fluidity calculation for each pangenome estimation to demonstrate that the pangenome is closed. This is detailed in lines 225-228 with results shown in lines 274-278 and 316-320 and Supplementary Figure 2. We agree that more closely related genomes would benefit a future version of this analysis and indicate the limitations in the Discussion, lines 452-457 and 471-473.
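
      For readers unfamiliar with these two statistics, a minimal sketch follows. It is not the authors' implementation and the response does not say which software or which form of Heaps' law was used. The sketch assumes the commonly used formulation in which the number of new genes found on adding the N-th genome decays as k * N**(-alpha), with alpha > 1 indicating a closed pangenome, and the pairwise definition of genome fluidity (unique genes divided by total genes, averaged over all genome pairs). All names and the toy inputs are hypothetical.

```python
import math
from itertools import combinations

def genome_fluidity(gene_sets):
    """Mean, over all genome pairs, of (genes unique to either genome)
    divided by (sum of the two gene counts)."""
    ratios = []
    for a, b in combinations(gene_sets, 2):
        unique = len(a - b) + len(b - a)
        ratios.append(unique / (len(a) + len(b)))
    return sum(ratios) / len(ratios)

def fit_power_law_exponent(n_genomes, new_genes):
    """Least-squares slope of log(new_genes) against log(n_genomes).
    For new_genes ~ k * N**(-alpha) the returned value is alpha.
    All inputs must be strictly positive."""
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(g) for g in new_genes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope

# Toy inputs (made up): three small gene sets and an exact N**-2 decay.
phi = genome_fluidity([{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "c"}])
alpha = fit_power_law_exponent([1, 2, 4, 8], [100 / n ** 2 for n in (1, 2, 4, 8)])
```

      In practice the new-gene counts would be averaged over many random orderings of the genomes before fitting, since a single ordering gives a noisy curve.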

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract

      l. 24, "with distinct genomic features". I'm not sure what you are referring to here.

      We refer to the differences in accessory genome and related functional profiles but did not want to bloat the abstract with such additional details.

      Introduction

      l. 40, "L1 to L9". A lineage 10 has been described recently: https://doi.org/10.3201/eid3003.231466.

      We have updated the text and the reference. Unfortunately, no closed genome for this lineage exists so we have not included it in the analyses. We note this in the results, line 232.

      l.62/3, "caused by the absence of horizontal gene transfer, plasmids, and recombination". Recombination is not absent in the MTBC, only horizontal gene transfer seems to be, which is what the cited studies show. Indeed a few sentences later homologous recombination is mentioned as a cause of deletions.

      This has now been removed from the introduction.

      l. 67, "within lineage diversity is thought to be mostly driven by SNPs". Again I'm not sure what is meant here with "driven by". Point mutations are probably the most common mutational events, but duplications, insertions, deletions, and gene conversion also occur and can affect large regions and possibly important genes, as shown in a recent preprint (https://doi.org/10.1101/2024.03.08.584093).

      We have changed the text to say ‘mostly composed of’. While indeed other SNVs may be contributing, the prevailing thought at lineage level is that SNPs are the primary source of diversity. The linked pre-print is looking at within transmission clusters and this has not been described at the lineage level, which could be done in a future work.

      l. 100/1. "that can account for variations in virulence, metabolism, and antibiotic resistance". I would phrase this conservatively since the functional inferences in this study are speculative.

      This has now been tempered to be less specific.

      Methods

      l. 108. That an assembly has a single contig does not mean that it is "closed". Many single contig assemblies on NCBI are reference-guided short-read assemblies, that is, fragments patched together rather than closed assemblies. The same could be true for long-read assemblies.

      We specifically chose those listed as closed on NCBI so rely on their checks to ensure this is true. We have stated this better in the paper, line 117.

      l. 111. From Supplementary Table 1 I understand that for many genomes only the reads were available (no ASM number). Did you assemble these genomes? If yes, how? The assembly method is not indicated in the supplement, contrary to what is written here.

      All public genomes were downloaded in their assembled forms from the various sources. This is specified better in the text (line 118) and the supplementary table 1 now lists the accessions for all the assemblies.

      l. 113. How many assemblies passed this threshold? And is BUSCO actually useful to assess assembly quality in the MTBC? I assume the dynamic, repetitive gene families that cause problems for assembly and mapping in TB (PE, PPE, ESX) do not figure in the BUSCO list of single-copy orthologs.

      All assemblies passed the BUSCO thresholds for high-quality genomes as laid out in Supplementary Table 1. While indeed this does not include multi-copy genes such as PE/PPE we focussed on regions of difference at the sub-lineage level where two or more genomes represent that sub-lineage. This means any assembly issues in a single genome would need to be exactly the same in another of the same sub-lineage to be included in our results. Through this, we aimed to buffer out issues in individual assemblies.
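
      The buffering argument above can be sketched in code. This is a minimal illustration, not the authors' pipeline: the function name, the input dictionaries, the block names, and the sub-lineage labels are all hypothetical. It reports a block as missing from a sub-lineage only when at least two genomes represent that sub-lineage and every one of them lacks the block, so an absence seen in a single assembly is never reported on its own.

```python
from collections import defaultdict


def sublineage_consistent_absences(presence, sublineage_of, min_genomes=2):
    """Return {sub-lineage: blocks absent from every genome of that sub-lineage}.

    presence      -- dict: genome name -> set of block names found in that genome
    sublineage_of -- dict: genome name -> sub-lineage label
    min_genomes   -- sub-lineages with fewer genomes than this are skipped
    """
    members = defaultdict(list)
    for genome, sub in sublineage_of.items():
        members[sub].append(genome)

    all_blocks = set().union(*presence.values())
    result = {}
    for sub, genomes in members.items():
        if len(genomes) < min_genomes:
            continue  # a single assembly cannot corroborate itself
        present_somewhere = set().union(*(presence[g] for g in genomes))
        # Only blocks missing from ALL members survive this filter.
        result[sub] = all_blocks - present_somewhere
    return result


# Hypothetical toy data: g3 lacks block C, but its sub-lineage partner g4 has it,
# so the absence of C is treated as a possible assembly artefact and not reported.
presence = {
    "g1": {"A", "B"},
    "g2": {"A", "B"},
    "g3": {"A"},
    "g4": {"A", "C"},
    "g5": {"A", "B", "C"},
}
sublineage_of = {"g1": "L4.1", "g2": "L4.1", "g3": "L2.2", "g4": "L2.2", "g5": "L1.1"}
absences = sublineage_consistent_absences(presence, sublineage_of)
```

      In this toy example, block C is reported for the two-genome sub-lineage that lacks it in both members, while the single-genome sub-lineage is skipped entirely.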

      l. 147: Why is Panaroo used with -merge-paralogs? I understand that near-identical genes may not be too interesting from a functional perspective, but if the aim of the analysis is to make broad claims about processes driving genome evolution, paralogs should be considered.

      We chose to do so with merged paralogs to look for larger patterns of diversity beyond within-genome paralogs. Additionally, this was required to build the core phylogenetic tree. However, as the reviewer points out, this may bias our findings towards deletions and away from duplications as a primary evolutionary force.

      We repeated this without the merged paralogs option and indeed found a larger pangenome, as outlined in Table 1. However, at the sub-lineage level, this did not result in any new presence/absence patterns (lines 381-383). This means the paralogs tended to be in single genomes only. This still indicates that deletions are the primary force in the longer-term evolution of the complex but indeed on shorter spans this may be different.

      l. 153: remove the comment in brackets.

      This has been fixed and the proper URL placed in instead.

      l. 159: which genomes, and why those?

      This is now clarified to state all genomes were used for this analysis.

      l. 161, "gene blocks": since this analysis is introduced as capturing the non-coding part of the genome, maybe just call them "blocks"?

      All references to gene blocks are now changed to genomic blocks to be more specific.

      l. 162: what happens with blocks that yield no hits against RvD1, TbD1, and H37Rv?

      We named these with lineage-specific names (supplementary table 4) but did not assign RD names specifically.

      l. 164: where does the information about the regions of difference come from? How exactly were these regions determined?

      We have expanded this section to be more specific on the use of RDScan and new naming, along with how we determine if something is an RD/LSP.

      Results

      l. 185ff: This paragraph gives many details about the geographic origin of the samples, but what I'd expect here is a short description of assembly qualities, for example, the results of the BUSCO analysis, a description of your own Nanopore assemblies, or a small analysis of the number of indels/pseudogenes relative to sequencing technology or coverage (see comment in the public review).

      This section (lines 231-258) has been expanded considerably to give a better overview of the dataset and any potential biases. Supplementary table 1 has also been expanded to include more information on each strain.

      l. 187, "324 genomes published previously": 322 according to the methods section.

      The number has been fixed throughout to the proper total of public genomes (329).

      l. 201: define the soft core, shell, and cloud genes.

      This is now defined on line 262.

      l. 228, "defined primarily by RD105 and RD207 deletions": this claim seems to come from the analysis of variable importance (Factoextra), which should be made clear here.

      This has been clarified on line 333.

      l. 237, "L8, serving as the ancestor of the MTBC": this is incorrect, equivalent to saying that the Chimpanzee is the ancestor of Homo sapiens.

      We have changed this to basal to align with how it is described in the original paper.

      l. 239, "The accessory genome of the MTBC". It is a bit confusing that the same term, 'accessory genome', is used here for the graph-based analysis, which is presented as a way to look at the non-coding part of the genome.

      We have clarified the terminology on line 347 and improved consistency throughout.

      l. 240/1, "specific to certain lineages and sublineages". What exactly do you mean by "specific" to? Present only in members of a certain lineage/sublineage? In all members of a certain lineage/sublineage? Maybe an additional panel in Figure 5, showing examples of lineage- and sublineage-specific variants, would help the reader grasp this key concept.

      We have clarified this on line 349 and the legend of what is now figure 4.

      l. 241/2, "82 lineage and sublineage-specific genomic regions ranging from 270 bp to 9.8 kb". Were "gene blocks" filtered for a minimum size, or why are there no variants smaller than 270 bp? A short description of all the blocks identified in the graph could be informative (their sizes, frequencies ...).

      Yes, a minimum of 250 bp was set for the blocks to only look at larger polymorphisms. This is clarified on lines 177 and 304.

      A second point: It is not entirely clear to me what Figure 6 is showing. Are you showing here a single representative strain per sublineage? Or have you somehow summarized the regions of difference shown in Figure 5 at the sublineage level? What is the tree on the left? This should be made clear in the legend and maybe also in the methods/results.

      In figure 4 (which was figure 6), because each RD is common to all members of the same sub-lineage, we have placed a single branch for each sub-lineage. This has been clarified in the legend.

      l. 254, "this gene was classified as being in the core genome": why should a partially deleted gene not be in the core genome?

      You are correct, we have removed that statement.

      l. 258/259, "The Pangraph alignment approach identified partial gene deletion and non-coding regions of the DNA that were impacted by genomic deletion". I do not understand how you classify a structural variant identified in the pangenome graph as a deletion or an insertion.

      This has been clarified as relative to H37Rv, as this is standard practice for RDs and general evolutionary analyses in MTBC, as outlined above.

      l. 262/263 , "the accessory genome of the MTBC is small and is acquired vertically from a common ancestor within the lineage". If deletion is the main process involved here, "acquired" seems a bit strange.

      We agree and have changed the header to better reflect the discussion on mis-annotation issues.

      Figure 1: Good to know, but not directly relevant for the rest of the paper. Maybe move it to the supplement?

      This has been moved to Supplementary figure 1.

      Figure 2: the y-axis is labeled 'Variable genome size', but from the text and the legend I figure it should be 'Number of accessory genes'?

      This has been changed to ‘accessory genes’ in Figure 1 (which was figure 2 in previous version).

      Figure 4: too small.

      We will endeavour to ensure this is as large as possible in the final version.

      Discussion

      l. 271, "MTBC accessory genome is ... acquired vertically". See above.

      Changed, as outlined above.

      l. 292, "appeared to be fragmented genes caused by misassemblies". Is there a way to distinguish "true" pseudogenes from misassemblies? This could be a relevant issue for low-coverage long-read assemblies (see public review).

      Not that we are currently aware of, but we do know other groups which are working on this issue.

      l. 300/1, "the whole-genome approach could capture higher genetic variations". Do you mean the graph approach? I'm not sure that comparing the two approaches here makes sense, as they serve different purposes. A pangenome graph is a summary of all genetic variation, while the purpose of Panaroo is to study gene absence/presence. So by definition, the graph should capture more genetic variation.

      This statement was intended to convey that much of the genetic variation in MTBC lies outside the coding genes, so traditional “pangenome” analyses do not capture the full genomic variation.

      l. 302/3, "this method identified non-coding regions of the genome that were affected by genomic deletions". See the comments above regarding deletions versus insertions. I'd say this method identifies coding and non-coding regions that were affected by genomic deletions and insertions.

      We have undertaken additional analyses to be sure these are likely deletions, as outlined above.

      l. 305: what are "lineage-independent deletions"?

      We labelled these as convergent evolution, now clarified on line 443.

      l. 329: How is RD105 "caused" by the insertion of IS6110? I did not find RD105 mentioned in the Alonso et al. paper. Similarly below, l. 331, how is RD207 "linked" to IS6110?

      The RD105 connection was misattributed as IS6110 insertion is related to RD152, not RD105. This has now been removed.

      RD207 is linked to IS6110 as its deletion is due to recombination between two such elements. This is now clarified on line 486.

      l. 345, "the growth advantage gene group": not quite sure what this is.

      We have fixed this on line 499 to state they are genes which confer growth advantages.

      l. 373ff: The role of genetic drift in the evolution of the MTBC is an open question, other studies have come to different conclusions than Hershberg et al. (this has been recently reviewed: https://doi.org/10.24072/pcjournal.322).

      We have outlined this debate better in lines 527-531.

      l. 375/6, "Gene loss, driven by genetic drift, is likely to be a key contributor to the observed genetic diversity within the MTBC." This sentence would need some elaboration to be intelligible. How does genetic drift drive gene loss?

      We have removed this.

      l. 395/6, "... predominantly driven by genome reduction. This observation underlines the importance of genomic deletions in the evolution of the MTBC." See comments above regarding deletions. I'm not convinced that your study really shows this, as it completely ignores paralogs and the processes counteracting reductive genome evolution: duplication and gene amplification.

      As outlined above, we have undertaken additional analyses to more strongly support this statement.

      l. 399, "the accessory genome of MTBC is a product of gene deletions, which can be classified into lineage-specific and independent deletions". Again, I'm not sure what is meant by lineage-independent deletions.

      We have better defined this in the text, line 443, to be related to convergent evolution.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses.

      In lines 120-121, it is mentioned that TB-profiler v4.4.2 was used for lineage classification, but this version was released in February 2023. As I understand it, there have been some changes (inclusion/exclusion) of certain lineage markers. Would it not be appropriate to repeat lineage classification with a more recent version? This would of course require extensive re-analysis, so could the lineage marker database perhaps also be cited?

      We have rerun all the genomes through TB-Profiler v6.5 and updated the text to state this; the exact database used is also now stated.

      Could the authors perhaps include the sequencing summary or quality of the nanopore sequences? The L9 (Mtb8) sample had a relatively lower depth and resulted in two contigs. Yet one contig was the initial inclusion criteria. It is unclear whether these samples were excluded from some of the analyses. Mtb6 also has relatively low coverage. Was the sequencing quality adequate to accurately identify all the lineage markers, in particular those with a lower depth of coverage? Could a hybrid approach be an inexpensive way to polish these assemblies?

      We reanalysed the L9 sample and, with some better cleaning, got it to a single contig with better depth and overall score. This is outlined in the Supplementary table 1 sheets. While depth is average, it is still above the recommended 30x, which is needed for good sequence recovery (Sanderson et al., 2024). We did indeed recover all lineage markers from these assemblies.

      Recommendations for improving the writing and presentation.

      The introduction is well-written and recent MTBC pangenomic studies have been incorporated, but I am curious as to why this paper was not referred to: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6922483/ I believe this was the first attempt to study the pangenome, albeit with a different research question. Nearly all previous analyses largely focused on utilizing the pangenome to investigate transmission.

      Indeed this study did look at a pangenome of sorts, but specifically SNPs and not genes or regions. Since the latter is the main basis for pangenome work these days, we chose not to include this paper.

      Minor corrections to the text and figures.

      In line 129, it is explained that DNA was extracted to be suitable for PacBio sequencing, but ONT sequencing was used for the 11 new sequences. Is this a minor oversight or do the authors feel that DNA extracted for PacBio would be suitable for ONT sequencing? It is a fair assumption.

      We apologise, this is a long-read extraction approach and not specific to PacBio. We have amended the text to state this.

      In line 153, this should be removed: (Conor, could you please add the script to your GitHub page?).

      This has been fixed now.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Seo et al highlights knowledge gaps regarding the role of cerebellar complex spike (CS) activity during different phases of learning related to optokinetic reflex (OKR) in mice. The novelty of the approach is twofold: first, specifically perturbing the activity of climbing fibers (CFs) in the flocculus (as opposed to disrupting communication between the inferior olive (IO) and its cerebellar targets globally); and second, examining whether disruption of the CS activity during the putative "consolidation phase" following training affects OKR performance.

      The first part of the results provides adequate evidence supporting the notion that optogenetic disruption of normal CF-Purkinje neuron (PN) signaling results in the degradation of OKR performance. As no effects are seen in OKR performance in animals subjected to optogenetic irradiation during the memory consolidation or retrieval phases, the authors conclude that CF function is not essential beyond memory acquisition. However, the manuscript does not provide a sufficiently solid demonstration that their long-term activity manipulation of CF activity is effective, thus undermining the confidence of the conclusions.

      Strengths:

      The main strength of the work is the aim to examine the specific involvement of the CF activity in the flocculus during distinct phases of learning. This is a challenging goal, due to the technical challenges related to the anatomical location of the flocculus as well as the IO. These obstacles are counterbalanced by the use of a well-established and easy-to-analyse behavioral model (OKR), that can lead to fundamental insights regarding the long-term cerebellar learning process.

      Weaknesses:

      The impact of the work is diminished by several methodological shortcomings.

      Most importantly, the key finding concerning prolonged optogenetic inhibition of CFs (for 30 min to 6 hours after the training period) must be complemented by the demonstration that the manipulation maintains its efficacy. In its current form, the authors only show inhibition by short-term optogenetic irradiation in the context of electrical-stimulation-evoked CSs in an ex vivo preparation. As the inhibitory effect of even the eNpHR3.0 is greatly diminished during seconds-long stimulations (especially when using the yellow laser as is done in this work; see Zhang, Chuanqiang, et al. "Optimized photo-stimulation of halorhodopsin for long-term neuronal inhibition." BMC biology 17.1 (2019): 1-17), we remain skeptical of the extent of inhibition during the long manipulations. In short, without a demonstration of effective inhibition throughout the putative consolidation phase (for example by showing a significant decrease in CS frequency throughout the irradiation period), the main claim of the manuscript of phase-specific involvement of CF activity in OKR learning cannot be considered to be based on evidence.

      Second, the choice of viral targeting strategy leaves gaps in the argument for CF-specific mechanisms. CaMKII promoters are not selective for the IO neurons, and even the most precise viral injections always lead to the transfection of neurons in the surrounding brainstem, many of which project to the cerebellar cortex in the form of mossy fibers (MF). Figure 1Bii shows sparsely-labelled CFs in the flocculus, but possibly also MFs. While obtaining homogenous and strong labeling in all floccular CFs might be impossible, at the very least the authors should demonstrate that their optogenetic manipulation does not affect simple spiking in PNs.

      Finally, while the paper explicitly focuses on the effects of CF-evoked complex spikes in the PNs and not, for example, on those mediated by molecular layer interneurons or via direct interaction of the CF with vestibular nuclear neurons, it would be best if these other dimensions of CF involvement in cerebellar learning were candidly discussed.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to explore the role of climbing fibers (CFs) in cerebellar learning, with a focus on optokinetic reflex (OKR) adaptation. Their goal was to understand how CF activity influences memory acquisition, memory consolidation, and memory retrieval by optogenetically suppressing CF inputs at various stages of the learning process.

      Strengths:

      The study addresses a significant question in the cerebellar field by focusing on the specific role of CFs in adaptive learning. The authors use optogenetic tools to manipulate CF activity. This provides a direct method to test the causal relationship between CF activity and learning outcomes.

      Weaknesses:

      Despite shedding light on the potential role of CFs in cerebellar learning, the study is hampered by significant methodological issues that question the validity of its conclusions. The absence of detailed evidence on the effectiveness of CF suppression and concerns over tissue damage from optogenetic stimulation weakens the argument that CFs are not essential for memory consolidation. These challenges make it difficult to confirm whether the study's objectives were fully met or if the findings conclusively support the authors' claims. The research commendably attempts to unravel the temporal involvement of CFs in learning but also underscores the difficulties in pinpointing specific neural mechanisms that underlie the phases of learning. Addressing these methodological issues, investigating other signals that might instruct consolidation, and understanding CFs' broader impact on various learning behaviors are crucial steps for future studies.

      We thank the editors and reviewers for their constructive feedback and careful consideration of our manuscript. They acknowledged the potential of our study to yield valuable insights into the role of CF activity in cerebellar learning and its phase-specific involvement, and we have addressed all the methodological concerns raised by providing additional clarifications and explanations in this letter.

      In response to concerns regarding the efficacy of long-term optogenetic inhibition, we conducted additional in vivo monitoring of CF activity during the irradiation period, confirming sustained inhibition of complex spikes throughout the consolidation phase (Figure 2, lines 112-139). Although stable single-unit recording beyond 40 minutes was not feasible due to technical challenges, the robust suppression of CF-evoked complex spikes we observed during this period provides strong evidence that halorhodopsin-mediated inhibition persists over the longer irradiation intervals employed in our behavioral assays.

      Moreover, given that there is a concern regarding the CaMKII promoter also inducing expression in neighboring mossy fibers, potentially affecting simple spike activity, we have presented data in Figure 2C, which illustrates that PC simple spike firing rates remain unchanged during prolonged illumination. This finding confirms that our optogenetic manipulation selectively disrupts CF-mediated complex spikes without influencing mossy fiber to PC transmission. We have elucidated these results further in lines 128 to 136.

      Lastly, we have broadened our Discussion to consider alternative mechanisms of CF involvement in cerebellar learning, including the modulation of molecular layer interneurons (Rowan et al., 2018) and direct CF interactions with vestibular nuclear neurons (Balaban et al., 1981), thereby offering a more comprehensive perspective on the multifaceted role of CF signaling. Specific clarifications regarding these points are articulated from lines 222 to 242 and 243 to 254 in the manuscript. We are confident that these revisions adequately address the reviewers' concerns and further substantiate the specificity and significance of our study findings.

      (1) Rowan, Matthew JM, et al. "Graded control of climbing-fiber-mediated plasticity and learning by inhibition in the cerebellum." Neuron 99.5 (2018): 999-1015.

      (2) Balaban, Carey D., Yasuo Kawaguchi, and Eiju Watanabe. "Evidence of a collateralized climbing fiber projection from the inferior olive to the flocculus and vestibular nuclei in rabbits." Neuroscience letters 22.1 (1981): 23-29.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      This paper describes technically-impressive measurements of calcium signals near synaptic ribbons in goldfish bipolar cells. The data presented provides high spatial and temporal resolution information about calcium concentrations along the ribbon at various distances from the site of entry at the plasma membrane. This is important information. Important gaps in the data presented mean that the evidence for the main conclusions is currently inadequate.

      Thank you very much for this positive evaluation of our work. We would like to respectfully point out to the Reviewer that our current study was conducted using zebrafish as a model and not goldfish. We have revised the paper to eliminate any gaps in the data presentation.

      Strengths

      (1) The technical aspects of the measurements are impressive. The authors use calcium indicators bound to the ribbon and high-speed line scans to resolve changes with a spatial resolution of ~250 nm and a temporal resolution of less than 10 ms. These spatial and temporal scales are much closer to those relevant for vesicle release than previous measurements.

      (2) The use of calcium indicators with very different affinities and different intracellular calcium buffers helps provide confirmation of key results.

      Thank you very much for this positive evaluation of our work.

      Weaknesses

      (1) Multiple key points of the paper lack statistical tests or summary data from populations of cells. For example, the text states that the proximal and distal calcium kinetics in Figure 2A differ. This is not clear from the inset to Figure 2A - where the traces look like scaled versions of each other. Values for time to half-maximal peak fluorescence are given for one example cell but no statistics or summary are provided. Figure 8 shows examples from one cell with no summary data. This issue comes up in other places as well.

      Thank you for this feedback. We have addressed this in our revised manuscript where possible. We now include the results of paired t-tests to compare the amplitudes of proximal vs. distal calcium signals shown in Fig. 2A & C, Fig. 3C & D, Fig. 4C & D, Fig. 5A-D, and Fig. 8E & F. Because proximal and distal calcium signals were obtained from the same ribbons within 500-nm distances, as the Reviewer pointed out, “the traces look like scaled versions of each other”. For experiments where we make comparisons across cells or different calcium indicators, as shown in Fig. 3E & F, Fig. 5E, and Fig. 8B & C, we now include the results of an unpaired t-test. The t-test statistics are now included in the respective figure legends in the revised version.

      Regarding the Reviewer’s concern that “values for time to half-maximal peak fluorescence are given for one example cell, but no statistics or summary are provided,” we estimated the fluorescence rise times by only fitting the average traces to compare the overall qualitative behavior of the corresponding calcium indicator fluorescence. We did attempt to analyze the uncertainty for the rise-time estimates, but the simultaneous fitting of the rise- and decay-behavior of time traces is notoriously sensitive to noise, and therefore, a much higher signal-to-noise ratio would be required to provide reliable uncertainty estimation for the corresponding rise-time and decay-time characteristics. This is now explicitly explained in the corresponding Methods subsection.
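
      For readers unfamiliar with the quantity under discussion, the following is a minimal sketch of how a time to half-maximal fluorescence can be read off a trace by linear interpolation. It is not the fitting procedure described above: the function name and the sample trace are hypothetical, and a direct threshold crossing like this one is sensitive to noise, which is consistent with the authors' choice to work with averaged traces.

```python
def time_to_half_max(times, values, baseline=0.0):
    """Return the first time at which the trace reaches half of its peak.

    times, values -- equal-length sequences describing one fluorescence trace
    baseline      -- pre-stimulus fluorescence level (assumed known)
    Linear interpolation is used between the two samples that bracket the
    half-maximal level. Returns None if no upward crossing is found.
    """
    peak = max(values)
    half = baseline + 0.5 * (peak - baseline)
    if values[0] >= half:
        return times[0]
    for i in range(1, len(values)):
        lo, hi = values[i - 1], values[i]
        if lo < half <= hi:
            frac = (half - lo) / (hi - lo)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


# Hypothetical averaged trace sampled every 2 ms (arbitrary fluorescence units).
times_ms = [0, 2, 4, 6, 8]
trace = [0.0, 0.2, 0.6, 1.0, 0.9]
t_half = time_to_half_max(times_ms, trace)
```

      With this toy trace the half-maximal level is crossed between the 2 ms and 4 ms samples, so the estimate lands between those two time points.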

      In Figure 8, we now show example fluorescence traces from one cell at the bottom of the A and D panels, and the summary data is described in B-C and E-F, with statistics provided in the figure legends.

      (2) Figure 5 is confusing. The figure caption describes red, green, and blue traces, but the figure itself has only two traces in each panel and none are red, green, or blue. It's not possible currently to evaluate this figure.

      Thank you for pointing out this oversight. The figure shows the proximal and distal calcium signals, not the cytoplasmic ones. The figure caption was adjusted to correctly reflect what is shown in the figure.

      (3) The rise time measurements in Figure 2 are very different for low and high-affinity indicators, but no explanation is given for this difference. Similarly, the measurements of peak calcium concentration in Figure 4 are very different from the two indicators. That might suggest that the high-affinity indicator is strongly saturated, which raises concerns about whether that is impacting the kinetic measurements.

      We agree with the Reviewer and had mentioned in the text that we do believe that the high-affinity version of the dye is at least partially saturated. This will be especially a problem for strong depolarizations and signals near the membrane. We slightly changed the corresponding description of results on page 6 to acknowledge this point: “However, it should be noted that Cal520HA will be at least partially saturated at the Ca2+ levels expected in Ca2+ microdomains relevant for vesicle exocytosis, affecting both the amplitude and the kinetics of the fluorescence signal”. 
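
      The saturation concern can be made concrete with a simple 1:1 binding estimate, in which the fraction of indicator bound to calcium is [Ca2+] / ([Ca2+] + Kd). The sketch below is illustrative only: the Kd values and calcium concentrations are placeholder assumptions, not values taken from the manuscript, and real indicator behaviour also depends on kinetics and on competing buffers.

```python
def fraction_bound(ca_um, kd_um):
    """Equilibrium fraction of a 1:1 indicator bound to Ca2+.

    ca_um -- free Ca2+ concentration in micromolar
    kd_um -- indicator dissociation constant in micromolar
    """
    return ca_um / (ca_um + kd_um)


# Placeholder dissociation constants (assumptions for illustration only).
KD_HIGH_AFFINITY_UM = 0.3
KD_LOW_AFFINITY_UM = 90.0

# Hypothetical Ca2+ levels spanning bulk cytoplasm to a microdomain.
for ca in (0.1, 1.0, 10.0, 50.0):
    high = fraction_bound(ca, KD_HIGH_AFFINITY_UM)
    low = fraction_bound(ca, KD_LOW_AFFINITY_UM)
    print(f"[Ca2+] = {ca:5.1f} uM  high-affinity: {high:.2f}  low-affinity: {low:.2f}")
```

      Under these assumed values, the high-affinity indicator is almost fully bound at tens of micromolar calcium, so further increases barely change its signal, whereas the low-affinity indicator still responds across that range.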

      Recommendations:

      (1) It would be good to describe the location of calcium channels relative to the ribbon in the introduction.

      We have provided this information in the discussion (please see p. 19: “The faster, smaller, and more spatially confined Ca2+ signals that are insensitive to the application of high concentrations of exogenous Ca2+ buffers, referred to here as ribbon proximal Ca2+ signals, could be due to Ca2+ influx through Cav channel clusters beneath the synaptic ribbon”). We have now provided this information in the last paragraph of the introduction as well.

      (2) The introduction is quite technical and would benefit from a more complete description of the findings of the paper (e.g. expanding the last sentence to a full paragraph).

      We have updated the last paragraph of the introduction as per the reviewer’s advice.

      (3) It is not clear that the capacitance measurements in Figure 1 are needed (I did not see them used anywhere else in the paper).

      We have removed the capacitance measurements from the figure.

      (4) Please add legends in the figures themselves defining different line colors and weights so that a reader does not need to search for them in the figure caption.

      We agree that such figure improvements facilitate reading. We have added legends in the figures themselves, where appropriate.

      (5) The insets with the expanded traces in many cases are too small - e.g. Figure 1F.

      We have enlarged the insets in applicable figures as much as possible to facilitate visualization. These changes can be seen in Figures 1, 2, 3, 4, 5, and 8, as well as Supplementary Figure 3.

      (6) Page 5, statistics for amplitude of calcium changes. Is p < 0.001 really correct here? The SEMs indicate an overlap of the two distributions of mean amplitudes - and later data for which you give p = 0.001 has much less overlap.

      Since the two data sets in question come from paired recordings, with a high Pearson correlation coefficient of 0.93, the p-values are, in fact, correct despite this significant overlap. We conducted paired t-tests to compare proximal vs. distal calcium signals obtained from a single calcium indicator shown in Fig. 2A & C, Fig. 3C & D, Fig. 4C & D, Fig. 5A-D, and Fig. 8E & F. For experiments where we make comparisons across cells or across different calcium indicators, as shown in Fig. 3E & F, Fig. 5E, and Fig. 8B & C, we performed an unpaired t-test. In response to the Reviewer’s comment, we now provide details on t-statistics in the respective figure legends in the revised version.
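
      The point about paired recordings can be illustrated with a small numerical sketch. The data below are synthetic and chosen only to be strongly correlated across ribbons; they are not measurements from the study. The unpaired statistic is computed in its Welch form, which is an assumption, since the response does not say which unpaired variant was used.

```python
import math
import statistics


def paired_t(x, y):
    """t statistic for a paired t-test: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def welch_t(x, y):
    """t statistic for an unpaired (Welch) t-test on two independent samples."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


# Synthetic amplitudes (arbitrary units): each position is one ribbon measured
# at a proximal and a distal location, so the two lists vary together.
proximal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
distal = [0.8, 1.7, 2.7, 3.5, 4.6, 5.4]

t_paired = paired_t(proximal, distal)
t_unpaired = welch_t(proximal, distal)
print(f"paired t = {t_paired:.2f}, unpaired t = {t_unpaired:.2f}")
```

      Because the paired test works on within-ribbon differences, the large ribbon-to-ribbon spread that makes the two group means overlap drops out, and the paired statistic comes out far larger than the unpaired one for the same numbers.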

      (7) The text on page 6 describing Figure 3 appears to repeat several technical aspects of the measurements that have already been described in Figure 1. I would reduce that overlap as it is confusing for a reader.

      Since Fig.1 describes calcium measurements with free calcium indicator, whereas Fig.3 describes bound calcium indicator, we would prefer to keep the information for the sake of completeness, despite some small amount of repetition.

      (8) Figure 4A needs to be described in more detail.

      We have provided the vesicle pool details in Supplementary Fig. 1.

      (9) The text in Figure 7 is too small.

      We have redone Fig. 7 and Supplemental Fig. 4 to ensure that the tick labels and other text are sufficiently large.

      (10) Are the units (nM) in Figure 8 correct?

      Thank you for pointing that out. The units were supposed to be µM and have been corrected in the figure.

      Reviewer #2 (Public review):

      Summary:

      The study introduces new tools for measuring intracellular Ca2+ concentration gradients around retinal rod bipolar cell (rbc) synaptic ribbons. This is done by comparing the Ca2+ profiles measured with mobile Ca2+ indicator dyes versus ribbon-tethered (immobile) Ca2+ indicator dyes. The Ca2+ imaging results provide a straightforward demonstration of Ca2+ gradients around the ribbon and validate their experimental strategy. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in Ca2+ domains as a function of Ca2+ buffering. In addition, the authors try to demonstrate that there is heterogeneity among synaptic ribbons within an individual rbc terminal.

      Strengths:

      The study introduces a new set of tools for estimating Ca2+ concentration gradients at ribbon AZs, and the experimental results are accompanied by an open-source, computational model that nicely describes Ca2+ buffering at the rbc synaptic ribbon. In addition, the dissociated retinal preparation remains a valuable approach for studying ribbon synapses. Lastly, excellent EM.

      Thank you very much for this appreciation of our work.

      Weaknesses:

      Heterogeneity in the spatiotemporal dynamics of Ca2+ influx was not convincingly related to ribbon size, nor was the functional relevance of Ca2+ dynamics to rod bipolars demonstrated (e.g., exocytosis to different postsynaptic targets). In addition, the study would benefit from the inclusion of the Ca2+ currents that were recorded in parallel with the Ca2+ imaging.

      Thank you for this critique. We agree that our data do not establish the relationship between ribbon size and Ca<sup>2+</sup> signal. By analogy to the hair cell literature, we believe that it is a reasonable hypothesis, but more studies will be necessary to definitively determine whether the signal relates to ribbon size or synaptic signaling. This will be addressed in future experiments.

      We have included the calcium current recorded in parallel with calcium imaging in Fig. 1, where we show a single example. We now do the same for the individual examples shown in Fig. 8A and D, bottom. The calcium imaging data shown in Figs. 2-5 and Supp. Fig. 3 are average traces; thus, we have provided the averages of the peak calcium current and statistics. Since some ribbons in Figure 8D-F have only one reading, we have not conducted statistical analysis in this case.

      Recommendations:

      The major conclusion of the work is that within bipolar cells, heterogeneity exists between Ca2+ microdomains formed at synaptic ribbons, which is supported by the results; however, what causes this is not clear. Most of the comments below are suggestions that hopefully help the authors strengthen the association of Ca2+ domain heterogeneity with features of ribbon AZs or at least offer additional options for the authors to communicate their work.

      (1) In the current study, anatomical segregation of SRs by size does not appear to exist across the ZF rod bipolar terminal, nor has this been reported for mouse rod bipolars. In the absence of this, the current study lacks the fortuitous attributes, and thus reasoning, utilized in the hair cell (HC) studies (those cited in the current MS). Namely, the HC studies utilized the following anatomical features to compare EM, IF, and physio results: a) identified differences in ribbon synapses along a tonotopic gradient (basal to apical cochlea), b) compared ribbons on different sides of an inner HC (pillar vs. modiolar), or c) examined age-dependent changes in HC ribbons.

      Thank you for this comment. We agree that we do not show any interesting systematic relationships between ribbon size and cell position or other large-scale morphological features. We added text on page 19 to stress this (“However, in comparing our findings with studies of ribbon size heterogeneity in hair cell…”). However, to our knowledge, diversity in ribbon size has never been reported in bipolar cells. 

      (2) In the absence of intrinsic topographical segregation in ribbon size within rod bipolars, then a) the imaging data attained from dissoc cells needs to be internally as sound as possible, and b) the parameters used to define ribbon dimensions in light (LM) and electron microscopy should be as communicative/interchangeable as possible.

      Thank you for this comment. Our confocal images show a moderate correlation between ribbon size, measured as the fluorescence of ribeye-binding peptide, and calcium hot spots. Similarly, SBF-SEM images demonstrate that ribbon active zone length and width are moderately correlated. We have summarized these findings in Figure 11. Thus, as the Reviewer pointed out, our confocal and SBF-SEM findings support each other.

      (3) It is not entirely clear how the authors distinguish rod bipolars (a subset of On-bipolars) from all other ON-bipolars? The two different preparations: dissoc or intact retina, present distinct challenges. In the example presented in Supplementary Figure 2B, the PKCalpha stained bipolar has an axon that is approx. 25 um long, but the expected length should be approx. 50um based on ZF retinal anatomy and recent study on rbc1/2 (Hellevik et al BioRxiv 2023). One could argue rather that the enzymatic treatment or mechanical shear forces caused the axon to shrink. If that is the line of reasoning, then present a low mag field of view with an assortment of dissoc bipolars stained for PKCalpha, zoom in, and describe cell morphologies and their assignment as PKCa + or -. Then you can summarize how axon terminal size, axon length, and PKC staining are or aren't correlated. Based on the results, one might have to perform IF on each dissoc cell that was assayed under LM (Ca2+ imaging) and ephys to verify it's a rod bipolar. In the case of the EM, the authors refer to the terminals analyzed as rbcs because they have larger terminals and less branching than the cbs. Since these are really nice EM images, data-rich, with better resolution than I have ever seen for retinal SBF-EM, do due diligence by tracing the terminals of neighboring bcs (ignoring details within terminals just outline terminals) and make a visual presentation that illustrates that those you selected as rbs have larger terminals than cbs (this can also give of sense of the density distribution of terminal types). Is there a published ephysio on the ZF rbcs which has been correlated with morphology? The Hellevik et al BioRxiv 2023 study shows light responses but not necessary rbcs distinguished from other On-bcs.

      We have quantified the number of rod bipolar cells obtained from our isolation procedure using two approaches: (1) fixing the isolated bipolar cells and performing immunofluorescence for PKC alpha, and (2) isolating bipolar cells from Tg(vsx1: memCerulean)<sup>q19</sup> transgenic zebrafish, a line that labels rod bipolar cell type 1 (RBC1) and that we recently obtained from Dr. Yoshimatsu (Hellevik et al., 2024). Of note, the circuitry of RBC1 has been shown to be similar to the mammalian rod bipolar cell pathway (Hellevik et al., 2024). Below, we list our findings:

      The average terminal size of fixed bipolar cells labeled with PKC alpha was 5.9 ± 0.2 µm, whereas the freshly isolated living bipolar cells used for our physiology experiments had an average terminal size of 6.3 ± 0.2 µm, and the rod bipolar cells from the Tg(vsx1: memCerulean)<sup>q19</sup> line had an average terminal size of 6.9 ± 0.2 µm. We also measured terminal size for fixed bipolar cells not labeled with PKC alpha (3.3 ± 0.2 µm) and for unlabeled cells from the Tg(vsx1: memCerulean)<sup>q19</sup> line (4.0 ± 0.2 µm).

      In addition, we pay attention to the soma shape and dendrites, as the primary dendrite of the RBC is thick and short. Connaughton and Nelson performed a thorough morphological classification, but no measurements were given (https://onlinelibrary.wiley.com/doi/10.1002/cne.20261). Since the axon length is not retained during the isolation procedure, we do not use it as an identification marker for rod bipolar cells in our experiments.

      We re-imaged vsx1 with the DIC channel to compare the terminal sizes of fluorescently labeled RBC1 terminals with those of other BPCs in the DIC channel. Below are the images that can give a sense of the density distribution of terminal types and measurements.

      Author response image 1.

      Tracing all neighboring terminals in SBF-SEM is laborious and beyond the scope of this manuscript, but we will do full reconstructions in a future publication.

      (4) How to strengthen the description of heterogeneity within the dissoc measurements? There are two places in the LM data where heterogeneity may be relevant. The first point here is that Ribbon size (TAMRA- Ribeye binding peptide) and active zone size (Cal520HA/LA-RBP) measurements depend on labelling the ribbon/Ribeye; thus, Ribbon size and AZ size should be correlated on this basis alone. I would expect Pearson's r value to show a stronger association (r > 0.7) than what is reported in Figure 11B/C (r: 0.52 or 0.32). I would interpret a moderate to weak correlation (r < 0.5 to 0.3) as an indication that ribbons are heterogeneous (variability in Ca influx per unit ribbon size). Now to the second point, in Figure 8 and Supplementary Figure 5 there is time-signal amplitude heterogeneity. >>> My curiosity is whether signal amplitude is heterogeneous in space (ribbon size, my speculation) and in time (complex, but compare ribeye bound and free Ca2+ indicator)? It seems like the data in Figure 8 and 11 should cross over and possibly offer the authors more to say.

      We appreciate the Reviewer’s insightful observation and have added a sentence at the very end of the Results section reflecting the Reviewer’s argument (“we note that a large correlation between the inferred ribbon size and active zone size…”).

      The Reviewer’s second point about the connection between heterogeneity of signal amplitude in space and in time is an interesting one as well and could be grounds for an additional investigation in the future.

      (5) As the authors know, a very powerful tool for exploring Ca microdomain dynamics is to exploit the Voltage dependence of Cavs (as exemplified in the numerous HC studies that are cited). An I-V protocol would provide a valuable means to illustrate different rates of saturating the LA and HA Ca indicators. More generally, the Ca currents and associated patch clamp parameters (Gm, leak...) can tell us much about the health of the cell and provide an added metric to assess normal variability between cells. A few places in the MS currents are mentioned yet this data is missing (Figure S5 , last line: Amplitude variability between two cells with similar Ca currents.).

      Thank you for the valuable suggestion. We will include an I-V protocol across several ribbons in future experiments. We have included the calcium currents for all the calcium transient traces. We have also included the statistics to compare those currents across conditions.

      Technical comments

      (6) Since the Ribeye-Ca2+ indicator covers the entire ribbon, it will contribute to a signal gradient. The proximal signal is assumed to be closest to the base of the ribbon where presumably the Cav channels are located, and the distal signal will originate from the top (apex) of the ribbon some 200 nm from the base of the ribbon. Have you tried to measure "ribbon lengths and widths" with the HA and LA Ca indicators? My guess would be that the LA will show a gradient, and give you a better indication of the base of the ribbon; whereas the HA signal will have dimensions similar to the TAMRA-peptide.

      Due to the point spread function limitation in light microscopy, we obtained all ribbon measurements from the SBF-SEM images only.

      As a surrogate for size in light microscopy, we used ribbon fluorescence, which we expect to scale with the number of ribeye molecules in the ribbon (Figure 11B).

      (7) Normalize proximal and distal LM data to highlight kinetic differences (Fig 2-5, 8), and when describing temporal heterogeneity please use a better description that includes time, such as time-to-pk, and decay1, decay 2....

      In the current manuscript, we focus only on the amplitude, as it provides information about the number of calcium channels. We used the rise time measurements to compare the time to reach the peak amplitude at the proximal vs. distal locations, demonstrating that proximal calcium signals reach the peak faster since the calcium channels are located beneath the ribbon.

      We tried to perform fittings to the individual traces. Since they are too noisy to pick out true kinetic differences between ribbons, we would need to average several traces from each ribbon. We plan to apply our high-resolution approach established in this paper to a longer stimulus and perform the fittings as per the Reviewer’s advice for a future paper.

      We now describe on pages 6-7 the two decay components for data in Figs. 2 and 3.

      (8) Why not measure ribbon length in EM as done in confocal and then compare lengths from LM and EM. In Figure S8, you have made a nice presentation of AZ Area from EM. Make similar plots for EM ribbon length (and width?), and compare the distributions to Figure 11 LM data. Maybe use other statistical descriptions like Coeff of Var or look for different populations by using multi-distribution fits. If the differences in length or area (EM data) can be segregated into short and long distances, then a similar feature might arise from the LM data. If no such morphological segregation exists, then the heterogeneity in Ca microdomains may arise from variable Cav channel density or gating, Ca buffer, etc.

      Due to the point spread function limitation in light microscopy, ribbon dimensions cannot be reliably measured in light microscopy. As a surrogate, we used total fluorescence of the ribbon, which should correlate with the number of ribeye molecules in the ribbon. To obtain ribbon dimensions, we used measurements from the SBF-SEM images only. We summarized the distribution of ribbon width and length in Figures 11C and 11D. The distribution of the active zone size is summarized in Supplementary Figure 8. The Pearson’s correlation coefficients are positive but weak, suggesting that multiple mechanisms likely contribute to heterogeneity in the local calcium signals, as the Reviewer pointed out.
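
      For readers less familiar with the two summary statistics discussed in this exchange, the minimal Python sketch below computes a Pearson correlation coefficient and a coefficient of variation. The ribbon lengths and widths are hypothetical values chosen only for illustration; they are not measurements from the manuscript, and the function names are likewise illustrative.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    covariance_sum = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance_sum / math.sqrt(spread_x * spread_y)

def coefficient_of_variation(x):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return statistics.stdev(x) / statistics.mean(x)

# Hypothetical ribbon dimensions in nm, NOT measurements from the manuscript.
lengths_nm = [180, 220, 250, 300, 340, 400]
widths_nm = [150, 120, 170, 140, 190, 160]

r = pearson_r(lengths_nm, widths_nm)
cv_length = coefficient_of_variation(lengths_nm)
print(f"r(length, width) = {r:.2f}; CV(length) = {cv_length:.2f}")
```

      A positive but modest r, as in this toy data set, means that one dimension explains only a small share of the variance in the other, which leaves room for additional sources of heterogeneity.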

      (9) Again, the quality of the EM data is great, and sufficient to make the assignment of SVs to different pools, as you have done in Fig S1. My only complaint is that the Ultrafast pool as indicated in the schematic of S1A seems to have a misassignment with respect to the green SV that is 15 nm from the PM. In the original Mennerick and Matthews 1996 study, the UF pool emptied in ~1msec. The morphological correlate for the UF has been assumed to be SVs touching the plasma membrane. 15 nm away is about 14 nm too far to be in the UF.

      Thank you for pointing that out. We have updated the vesicle labeling in Supplementary Figure 1 and Main Figure 4.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors have developed a new Ca indicator conjugated to the peptide, which likely recognizes synaptic ribbons, and have measured microdomain Ca near synaptic ribbons at retinal bipolar cells. This interesting approach allows one to measure Ca close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. Though microdomain Ca at the active zone of ribbon synapses has been measured by Hudspeth and Moser, the new study uses the peptide recognizing synaptic ribbons, potentially measuring the Ca concentration relatively proximal to the release sites.

      Thank you very much for this positive evaluation of our work.

      Strengths:

      The study is in principle technically well done, and the peptide approach is technically interesting, which allows one to image Ca near the particular protein complexes. The approach is potentially applicable to other types of imaging.

      Thank you very much for this appreciation.

      Weaknesses:

      Peptides may not be entirely specific, and the genetic approach tagging particular active zone proteins with fluorescent Ca indicator proteins may well be more specific. I also feel that "Nano-physiology" is overselling, because the measured Ca is most likely the local average surrounding synaptic ribbons. With this approach, nobody knows about the real release site Ca or the Ca relevant for synaptic vesicle replenishment. It is rather "microdomain physiology" which measures the local Ca near synaptic ribbons, relatively large structures responsible for fusion, replenishment, and recycling of synaptic vesicles.

      The peptide approach has been used fairly extensively in the ribbon synapse field, and the evidence that it efficiently labels the ribbon is well established; however, we do acknowledge that the peptide is in equilibrium with a cytoplasmic pool. Thus, some of the signal arises from this cytoplasmic pool. The alternative of a genetically encoded Ca-indicator concatenated to a ribbon protein would not have this problem, but would offer less flexibility in changing calcium indicators. We believe both approaches have their merits, each with separate advantages and disadvantages.

      As for the nano vs. micro argument, we certainly do not want to suggest that we are measuring the same nano-domains, on the spatial scale of 10s of nanometers, that drive neurotransmitter release, but we do believe we are in the sub-micrometer -- 100s of nm -- range. We chose the term based on the usage by other authors to describe similar measurements (Neef et al., 2018; https://doi.org/10.1038/s41467-017-02612-y), but we see the reviewer’s point.

      Recommendations:

      I have no recommendation for additional experiments. However, the statement of "nanophysiology" is too much, and the authors should tone down the ms, recognizing some caveats.

      As we mention above, we chose the term based on the usage by other authors to describe similar measurements, and we do believe that we achieve resolution of a few hundred nanometers, and therefore would prefer to keep the current title of the manuscript. For example, Figure 5E shows that, with ribeye-bound low-affinity calcium indicator, the proximal calcium signals were preserved in the presence of BAPTA, rising and decaying abruptly, as expected for a nanodomain Ca<sup>2+</sup> elevation. Thus, we believe that this measurement in particular describes a nanodomain-scale signal. However, we acknowledge that we are not currently able to resolve the spatial distribution of Ca<sup>2+</sup> signals with a spatial resolution of 10s of nanometers.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      The authors use a synthetic approach to introduce synaptic ribbon proteins into HEK cells and analyze the ability of the resulting assemblies to cluster calcium channels at the active zone. The use of this ground-up approach is valuable as it establishes a system to study molecular interactions at the active zone. The work relies on a solid combination of super-resolution microscopy and electrophysiology, but would benefit from: (i) additional ultrastructural analysis to establish ribbon formation (in the absence of which the claim of these being synthetic ribbons might not be supported); (ii) data quantification (to confirm colocalization of different proteins); (iii) stronger validation of impact on Ca2+ function; (iv) in-depth discussion of problems derived from the use of an over-expression approach.

      We thank the editors and the reviewers for the constructive comments and appreciation of our work. Please find a detailed point-to-point response below. In response to the critique received, we have now (i) included an ultrastructural analysis of the SyRibbons using correlative light microscopy and cryo-electron tomography, (ii) performed quantifications to confirm the colocalisation of the various proteins, (iii) discussed and carefully rephrased our interpretation of the role of the ribbon in modulating Ca<sup>2+</sup> channel function and (iv) discussed concerns regarding the use of an overexpression system. 

      Public Reviews:

      Reviewer #1 (Public Review):

      We would like to thank the reviewer for the comments and advice to further improve our manuscript. We have completely overhauled the manuscript taking the suggestions of the reviewer into account.

      (1) Are these truly "synthetic ribbons". The ribbon synapse is traditionally defined by its morphology at the EM level. To what extent these structures recapitulate ribbons is not shown. It has been previously shown that Ribeye forms aggregates on its own. Do these structures look any more ribbonlike than ribeye aggregates in the absence of its binding partners?

      We thank reviewer 1 for their constructive feedback and critique of the work. 

      We agree that traditionally, ribbon synapses have always been defined by the distinct morphology observed at the EM level. However, since the discovery of the core components of ribbons (RIBEYE and Piccolino), confocal and super-resolution imaging of immunofluorescently labelled ribbons has gained importance for analysing ribbon synapses. A correspondence of RIBEYE immunofluorescent structures at the active zone to electron microscopy observations of ribbons has been established in numerous studies (Wong et al, 2014; Michanski et al, 2019, 2023; Maxeiner et al, 2016; Jean et al, 2018) even though direct correlative approaches have yet to be performed to our knowledge. We have now analysed SyRibbons using cryo-correlative electron-light microscopy. We observed that GFP-positive RIBEYE spots corresponded well with electron-dense structures, as is characteristic for synaptic ribbons (Robertis & Franchi, 1956; Smith & Sjöstrand, 1961; Matthews & Fuchs, 2010). We could also observe SyRibbons within 100 nm of the plasma membrane (see Fig. 3). We have now added this qualitative ultrastructural analysis of SyRibbons to the main manuscript (lines 272 - 294, Fig. 3 and Supplementary Fig. 3).

      (2) No new biology is discovered here. The clustering of channels is accomplished by taking advantage of previously described interactions between RBP2, Ca channels and bassoon. The localization of Ribeye to bassoon takes advantage of a previously described interaction between the two. Even the membrane localization of the complexes required the introduction of a membrane-anchoring motif.

      We respectfully disagree with the overall assessment. Our study emphasizes the synthetic establishment of protein assemblies that mimic key aspects of ribbon-type active zones, defining minimum molecular requirements. Numerous previous studies have described the role of the synaptic ribbon in organising the spatial arrangement of Ca<sup>2+</sup> channels, regulating their abundance and possibly also modulating their physiological properties (Maxeiner et al, 2016; Frank et al, 2010; Jean et al, 2018; Wong et al, 2014; Grabner & Moser, 2021; Lv et al, 2016). We would like to highlight that there remain major gaps between existing in vitro and in vivo data; for instance, no direct or indirect interactions between Ca<sup>2+</sup> channels and RIBEYE have been demonstrated so far. While we do indeed take advantage of previously known interactions between RIBEYE and Bassoon (tom Dieck et al, 2005); between Bassoon, RBP2 and P/Q-type Ca<sup>2+</sup> channels (Davydova et al, 2014); and between RBP2 and L-type Ca<sup>2+</sup> channels (Hibino et al, 2002), our study tries to bridge these gaps by establishing the indirect link between the synaptic ribbon (RIBEYE) and L-type CaV1.3 Ca<sup>2+</sup> channels using a bottom-up approach, a link which has previously been merely speculative. Our data show how, even in a synapse-naive heterologous expression system, ribbon synapse components assemble Ca<sup>2+</sup> channel clusters and even show a partial localisation of Ca<sup>2+</sup> signal. Moreover, we argue that the established reconstitution approach provides other interesting insights, such as laying ground-up evidence supporting the anchoring of the synaptic ribbon by Bassoon.
      Finally, we expect that the established system will serve future studies aimed at deciphering the role of putative CaV1.3 or CaV1.4 interacting proteins in regulating Ca<sup>2+</sup> channels of ribbon synapses by providing a more realistic Ca<sup>2+</sup> channel assembly than has been available in heterologous expression systems used so far. In response to the reviewer’s comment, we have augmented the discussion accordingly.

      (3) The only thing ribbon-specific about these "syn-ribbons" is the expression of ribeye and ribeye does not seem to participate in the localization of other proteins in these complexes. Bsn, Cav1.3 and RBP2 can be found in other neurons.

      The synaptic ribbon made of RIBEYE is the key difference in the molecular AZ ultrastructure of ribbon synapses in the eye and the ear. We hypothesize that the ribbon acts as a superscaffold that enables AZs with large Ca<sup>2+</sup> channel assemblies and readily releasable pools. In further support of this hypothesis, the present study on synthetic ribbons shows that CaV1.3 Ca<sup>2+</sup> channel clusters are larger in the presence of SyRibbons compared to SyRibbon-less CaV1.3 Ca<sup>2+</sup> channel clusters in tetra-transfected HEK cells (Ca<sup>2+</sup> channels, RBP, membrane-anchored Bassoon, and RIBEYE, Fig. 6). In response to the reviewer’s comment, we have now added an analysis of triple-transfected HEK cells (Ca<sup>2+</sup> channels, RBP, membrane-anchored Bassoon), in which CaV1.3 Ca<sup>2+</sup> channel clusters again are significantly smaller than at the SyRibbons and indistinguishable from SyRibbon-less CaV1.3 Ca<sup>2+</sup> channel clusters (Fig. 6E, F).

      (4) As the authors point out, RBP2 is not necessary for some Ca channel clustering in hair cells, yet seems to be essential for clustering to bassoon here.

      Here we would like to clarify that RBP2 is indeed important in inner hair cells for promoting a larger complement of CaV1.3: RBP2 KO mice show smaller CaV1.3 channel clusters and reduced whole-cell and single-AZ Ca<sup>2+</sup> influx amplitudes (Krinner et al, 2017). However, a key difference we emphasize is that even though CaV1.3 clusters appeared smaller, they did not appear broken or fragmented as they do upon genetic perturbation of Bassoon (Frank et al, 2010), RIBEYE (Jean et al, 2018) or Piccolino (Michanski et al, 2023). This highlights how there may be a hierarchy in the spatial assembly of CaV1.3 channels at the inner hair cell ribbon synapse (also described in the discussion section “insights into presynaptic Ca<sup>2+</sup> channel clustering and function”), with proteins like RBP2 regulating the abundance of CaV1.3 channels at the synapse and organising them into smaller clusters, which we have termed “nanoclustering”, while Bassoon and RIBEYE may serve as super-scaffolds further organizing these CaV1.3 nanoclusters into “microclusters”. Observations of fragmented Ca<sup>2+</sup> channel clusters and a broader spread of Ca<sup>2+</sup> signal seen upon Ca<sup>2+</sup> imaging in RIBEYE and Bassoon mutants (Jean et al, 2018; Frank et al, 2010; Neef et al, 2018), and the absence of such a phenotype in RBP2 mutants (Krinner et al, 2017), may be explained by such a differential role of these proteins in organising Ca<sup>2+</sup> channel spatial assembly. The data of the present study on reconstituted ribbon-containing AZs are in line with these observations in inner hair cells: RBP2 appears important to tether Ca<sup>2+</sup> channels to Bassoon, and these AZ-like assemblies are organised to their full extent by the presence of RIBEYE.
      As mentioned in the response to point 3 of the reviewer, we have now further strengthened this point by adding the analysis of SyRibbon-less CaV1.3 Ca<sup>2+</sup> channel clusters in triple-transfected HEK cells (Ca<sup>2+</sup> channels, RBP, membrane-anchored Bassoon, Fig. 6E, F). Moreover, we have revised the discussion accordingly.

      (5) The difference in Ca imaging between SyRibbons and other locations is extremely subtle.

      We agree with the reviewer on the modest increase in Ca<sup>2+</sup> signal amplitude seen in the presence of SyRibbons and provide the following reasoning for this observation:

      (i) It is plausible that due to the overexpression approach, Ca<sup>2+</sup> channels (along with RBP2 and PalmBassoon) still show considerable expression throughout the membrane even in regions where SyRibbons are not localised. Indeed, this is evident in the images shown in the lower panel in Fig. 6B, where Ca<sup>2+</sup> channel immunofluorescence is distributed across the plasma membrane with larger clusters formed underneath SyRibbons (for an opposing scenario, please see the cell in Fig. 6B upper panel with very localised CaV1.3 distribution underneath SyRibbons). This would of course diminish the difference in the Ca<sup>2+</sup> signals between membrane regions with and without SyRibbons. We note that while the contrast is greater for native synapses, extrasynaptic Ca<sup>2+</sup> channels have been described in numerous studies for hair cells alone (Roberts et al, 1990; Brandt, 2005; Zampini et al, 2010; Wong et al, 2014).

      (ii) Nevertheless, we do not expect a large difference in Ca<sup>2+</sup> influx due to the presence of SyRibbons in the first place. Ribbon-less AZs in inner hair cells of RIBEYE KO mice showed normal Ca<sup>2+</sup> current amplitudes at the whole-cell and the single-AZ level (Jean et al, 2018). However, it was the spatial spread of the Ca<sup>2+</sup> signal at the single-AZ level which appeared to be broader and more diffuse in these mutants in the absence of the ribbon, in contrast to the more confined Ca<sup>2+</sup> hotspots seen in the wild-type controls.

      So, in agreement with these published observations, it appears that the presence of SyRibbons helps in spatially confining the Ca<sup>2+</sup> signal by super-scaffolding nanoclusters into microclusters (see also our response to points 3 and 4 of the reviewer): this is evident from seeing some spatial confinement of Ca<sup>2+</sup> signals near SyRibbons on top of the diffuse Ca<sup>2+</sup> signal across the rest of the membrane as a result of overexpression in HEK cells.

      We have now carefully rephrased our interpretation throughout the manuscript and added further explanation in the discussion section.   

      (6) The effect of the expression of palm-Bsn, RBP2 and the combination of the two on Ca-current is ambiguous. It appears that while the combination is larger than the control, it probably isn't significantly different from either of the other two alone (Fig 5). Moreover, expression of Ribeye + the other two showed no effect on Ca current (Figure 7). Also, why is the IV curve right shifted in Figure 7 vs Figure 5?

      We agree with the reviewer that co-expression of palm-Bassoon and RBP2 seems to augment Ca<sup>2+</sup> currents, while the additional expression of RIBEYE results in no change when compared to wild-type controls. We currently do not have an explanation for this observation and would refrain from making any claims without concrete evidence. As the reviewer also correctly pointed out, while the expression of the combination of palm-Bassoon and RBP2 raises Ca<sup>2+</sup> currents, current amplitudes are not significantly different when compared to the individual expression of the two proteins (P > 0.05, Kruskal-Wallis test). In light of this, we have now carefully rephrased our MS. Moreover, we would like to thank reviewer 1 for pointing out the right shift in the IV curve which was due to an error in the values plotted on the x-axis. This has been corrected in the updated version of the manuscript. 

      (7) While some of the IHC is quantified, some of it is simply shown as single images. EV2, EV3 and Figure 4a in particular (4b looks convincing enough on its own, but could also benefit from a larger sample size and quantification)

      We have now added quantifications for the colocalisations of the various transfection combinations depicted in the above-mentioned figures collectively in Supplementary Figure 7 and added the corresponding results and methods accordingly. 

      Reviewer #2 (Public Review):

      We would like to thank the reviewer for the comments and advice to further improve our manuscript.

      (1) Relies on over-expression, which almost certainly diminishes the experimentally-measured parameters (e.g. pre-synapse clustering, localization of Ca2+ entry).

      We acknowledge this limitation highlighted by the reviewer arising from the use of an overexpression system and have carefully rephrased our interpretation and discussed possible caveats in the discussion section. 

      (2) Are HEK cells the best model? HEK cells secrete substances and have a studied endocytic pathway, but they do not create neurosecretory vesicles. Why didn't the authors try to reconstitute a ribbon synapse in a cell that makes neurosecretory vesicles like a PC12 cell?

      This is a valid point that we also discussed extensively among ourselves. We did consider pheochromocytoma cells (PC12 cells) for reconstitution of ribbon-type AZs and performed experiments with them in the initial stages of the project. PC12 cells offer the advantage of providing synaptic-like microvesicles and also endogenously express several components of the presynaptic machinery such as Bassoon, RIM2, ELKS etc. (Inoue et al, 2006), such that overexpression of exogenous AZ proteins would have to be limited to RIBEYE only.

      However, a major drawback of PC12 cells as a model is the complex molecular background of these cells. We have also briefly described this in the discussion section (lines 615–619). Naïve, undifferentiated PC12 cells show highly heterogeneous expression of various CaV channel types (Janigro et al, 1989); however, CaV1.3, the predominant type in ribbon synapses of the ear, does not seem to be expressed in these cells (Liu et al, 1996). Furthermore, our attempts at immunostaining against CaV1.3 and at overexpressing CaV1.3 in PC12 cells were not successful, and we decided not to pursue this further (data not shown).

      In contrast, HEK293 cells, being “synapse-naïve”, provide the advantage of serving as a “blank canvas” for performing such reconstitutions, e.g. they lack voltage-gated Ca<sup>2+</sup> channels and multidomain proteins of the active zone. Moreover, an important practical aspect for our choice was the availability of the HEK293 cell line with stable (and inducible) expression of the CaV1.3 Ca<sup>2+</sup> channel complex. Finally, as described in lines 613–614 of the discussion section, even though HEK293 cells lack SVs and the molecular machinery required for their release, our work paves the way for future studies that could deliver SV machinery via co-expression (Park et al, 2021), which could then be analyzed by the correlative light and electron microscopy workflow we worked out and added during revision.

      (3) Related to 1 and 2: the Ca channel localization observed is significant but not so striking given the presence of Cav protein and measurements of Ca2+ influx distributed across the membrane. Presumably, this is the result of overexpression and an absence of pathways for pre-synaptic targeting of Ca channels. But, still, it was surprising that Ca channel localization was so diffuse. I suppose that the authors tried to reduce the effect of over-expression by using an inducible Cav1.3? Even so, the accessory subunits were constitutively over-expressed.

      We agree with the reviewer on the modest increase in Ca<sup>2+</sup> signal amplitude seen in the presence of SyRibbons. Yes, we employed inducible expression of the CaV1.3a subunit and tried to reduce the effect of overexpression by testing different induction times. However, we did not observe any major differences in expression and observed large variability in CaV1.3 expression across cells irrespective of induction duration. At all time points, there were cells with diffuse CaV1.3 localisation also in regions without SyRibbons which likely reduced the contrast of the Ca<sup>2+</sup> signal we observe. We provide the following reasoning for this observation: 

      (i) It is plausible that, due to the overexpression approach, Ca<sup>2+</sup> channels (along with RBP2 and PalmBassoon) still show considerable expression along the membrane, including in regions where SyRibbons are not localised. Indeed, this is evident in the images shown in the lower panel in Fig. 6B, where Ca<sup>2+</sup> channel immunofluorescence is distributed across the plasma membrane with larger clusters formed underneath SyRibbons. This would of course diminish the difference in the Ca<sup>2+</sup> signals between membrane regions with and without SyRibbons. We note that while the contrast is greater for native synapses, extrasynaptic Ca<sup>2+</sup> channels have been described in numerous studies for hair cells alone (Roberts et al, 1990; Brandt, 2005; Zampini et al, 2010; Wong et al, 2014).

      (ii) Nevertheless, we do not expect a striking difference in Ca<sup>2+</sup> influx amplitude due to the presence of SyRibbons in the first place. Ribbon-less AZs in inner hair cells of RIBEYE KO mice showed normal Ca<sup>2+</sup> current amplitudes at the whole-cell and the single-AZ level (Jean et al, 2018). Instead, it was the spatial spread of the Ca<sup>2+</sup> signal at the single-AZ level which appeared to be broader and more diffuse in these mutants in the absence of the ribbon, in contrast to the more confined Ca<sup>2+</sup> hotspots seen in the wildtype controls. 

      So, in agreement with these published observations, it appears that the presence of SyRibbons helps to spatially confine the Ca<sup>2+</sup> signal by super-scaffolding nanoclusters into microclusters. This is evident from the partial spatial confinement of Ca<sup>2+</sup> signals near SyRibbons, on top of the diffuse Ca<sup>2+</sup> signal across the rest of the membrane that results from overexpression in HEK cells.

      We have now carefully rephrased our interpretation throughout the manuscript and added further explanation in the discussion section.   

      Reviewer #3 (Public Review):

      We would like to thank the reviewer for the comments and advice to further improve our manuscript.

      (1) The results obtained in a heterologous system (HEK293 cells) need to be interpreted with caution. They will importantly speed the generation of models and hypothesis that will, however, require in vivo validation.

      We acknowledge this limitation highlighted by Reviewer 3 arising from the use of an overexpression system and have carefully rephrased our interpretation and discussed possible caveats in the discussion section. We employed inducible expression of the CaV1.3a subunit and tried to reduce the effect of overexpression by testing different induction times. However, we did not observe any major differences in expression and observed large variability in CaV1.3 expression across cells irrespective of induction duration. At all time points, there were cells with diffuse CaV1.3 localisation, even in regions without SyRibbons and this could reduce the contrast of the Ca<sup>2+</sup> signal we observe. We provide the following reasoning for this observation: 

      (i) It is plausible that, due to the overexpression approach, Ca<sup>2+</sup> channels (along with RBP2 and PalmBassoon) still show considerable expression along the membrane, including in regions where SyRibbons are not localised. Indeed, this is evident in the images shown in the lower panel in Fig. 6B, where Ca<sup>2+</sup> channel immunofluorescence is distributed across the plasma membrane with larger clusters formed underneath SyRibbons. This would of course diminish the difference in the Ca<sup>2+</sup> signals between membrane regions with and without SyRibbons. We note that while the contrast is greater for native synapses, extrasynaptic Ca<sup>2+</sup> channels have been described in numerous studies for hair cells alone (Roberts et al, 1990; Brandt, 2005; Zampini et al, 2010; Wong et al, 2014).

      (ii) Nevertheless, we do not expect a striking difference in Ca<sup>2+</sup> influx amplitude due to the presence of SyRibbons in the first place. Ribbon-less AZs in inner hair cells of RIBEYE KO mice showed normal Ca<sup>2+</sup> current amplitudes at the whole-cell and the single-AZ level (Jean et al, 2018). Instead, it was the spatial spread of the Ca<sup>2+</sup> signal at the single-AZ level which appeared to be broader and more diffuse in these mutants in the absence of the ribbon, in contrast to the more confined Ca<sup>2+</sup> hotspots seen in the wildtype controls. 

      So, in agreement with these published observations, it appears that the presence of SyRibbons helps to spatially confine the Ca<sup>2+</sup> signal by super-scaffolding nanoclusters into microclusters. This is evident from the partial spatial confinement of Ca<sup>2+</sup> signals near SyRibbons, on top of the diffuse Ca<sup>2+</sup> signal across the rest of the membrane that results from overexpression in HEK cells.

      (2) The authors analyzed the distribution of RIBEYE clusters in different membrane compartments and correctly conclude that RIBEYE clusters are not trapped in any of those compartments, but it is soluble instead. The authors, however, did not carry out a similar analysis for Palm-Bassoon. It is therefore unknown if Palm-Bassoon binds to other membrane compartments besides the plasma membrane. That could occur because in non-neuronal cells GAP43 has been described to be in internal membrane compartments. This should be investigated to document the existence of ectopic internal Synribbons beyond the plasma membrane because it might have implications for interpreting functional data in case Ca2+-channels become part of those internal Synribbons.

      In response to this valid concern, we have now included the suggested experiment in Supplementary Figure 1. We investigated the subcellular localisation of Palm-Bassoon and did not find Palm-Bassoon puncta to colocalise with ER, Golgi, or lysosomal markers, which argues against binding to membrane compartments inside the cell. We have added the following sentence in the results section, line 145: “Palm-Bassoon does not appear to localize in the ER, Golgi apparatus or lysosomes (Supplementary Fig 1 D, E and F).”

      (3) The co-expression of RBP2 and Palm-Bassoon induces a rather minor but significant increase in Ca2+-currents (Figure 5). Such an increase does not occur upon expression of (1) Palm-Bassoon alone, (2) RBP2 alone or (3) RIBEYE alone (Figure 5). Intriguingly, the concomitant expression of PalmBassoon, RBP2 and RIBEYE does not translate into an increase of Ca2+-currents either (Figure 7).

      We agree with the reviewer that co-expression of palm-Bassoon and RBP2 seems to augment Ca<sup>2+</sup> currents, while the additional expression of RIBEYE results in no change when compared to wild-type controls. We currently do not have an explanation for this observation and would refrain from making any claims without concrete evidence. We also highlight that, while the expression of the combination of palm-Bassoon and RBP2 raises Ca<sup>2+</sup> currents, current amplitudes are not significantly different when compared to the individual expression of the two proteins (P > 0.05, Kruskal-Wallis test). In light of this, we have now carefully rephrased our manuscript.

      (4) The authors claim that Ca2+-imaging reveals increased Ca2+-signal intensity at synthetic ribbon-type AZs. That claim is a subject of concern because the increase is rather small and it does not correlate with an increase in Ca2+-currents.

      Thanks for the comment: please see our response to your first comment and the lines 585 – 610 in the discussion section.

      Recommendations for the authors:  

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors should have a better discussion of problems derived from over-expression.

      Done. Please see above. 

      (2) Ideally, the authors would repeat the study using a secretory cell line, but this is of course not possible. The idea could be brought forth, though.

      As described above in our response to the public review of reviewer 2, we have discussed this idea in the discussion section (refer to lines 615–619), emphasizing both the advantages and the limitations of using a secretory cell line (e.g. PC12 cells) instead of HEK293 cells as a model for performing such reconstitutions.

      Reviewer #3 (Recommendations For The Authors):

      (1) There are several figures in which colocalization between different proteins is studied only displaying images but without any quantitative data. This should be corrected by providing such a quantitative analysis.

      We have now added quantifications for the colocalisations of the various transfection combinations depicted in the above-mentioned figures collectively in Supplementary Figure 7 and added the corresponding results and methods accordingly. 

      (2) The small increase in Ca2+-currents and Ca2+-influx associated with the clustering of Ca2+-channels to Synribbons is a concern. The authors should discuss whether such a minor increase (found only when Palm-Bassoon and RBP2 are co-expressed) would have physiological consequences in an actual synapse. They might compare those results with results obtained in genetically modified mice in which Ca2+-currents are affected upon the removal of AZ proteins. On the other hand, they should explain why Ca2+-currents do not increase when the Synribbons are formed by RIBEYE, Palm-Bassoon and RBP2.

      Done. Please see above. 

      (3) The description of the patch-clamp experiments should be enriched by including representative currents. Did the authors measure tail currents?

      We would like to thank the reviewer for the valuable suggestion and have now added representative currents to the figures (see Supplementary Figure 5B). We agree with the reviewer on the importance of further characterizing the Ca<sup>2+</sup> currents in the presence and absence of SyRibbons by analysis of tail currents for counting the number of Ca<sup>2+</sup> channels by non-stationary fluctuation analysis but consider this to be out of scope of the current study and an objective for future studies. 

      (4) The current displayed in Figure 7 E should be explained better.

      Previous studies have shown that Ca<sup>2+</sup>-binding proteins (CaBPs) compete with calmodulin to reduce Ca<sup>2+</sup>-dependent inactivation (CDI) and promote sustained Ca<sup>2+</sup> influx in inner hair cells (Cui et al, 2007; Picher et al, 2017). In the absence of CaBPs, CaV1.3-mediated Ca<sup>2+</sup> currents show more rapid CDI, as is the case here upon heterologous expression in HEK cells (Koschak et al, 2001; see also Picher et al, 2017, where co-expression of CaBP2 with CaV1.3 inhibits CDI in HEK293 cells). The inactivation kinetics of CaV1.3 are also regulated by the subunit composition (Cui et al, 2007) and modulated by interaction partners; given the reconstitution performed here, we do not find the currents very surprising.

      (5) Is the difference in Ca2+-influx still significantly higher upon the removal of the maximum value measured in positive Syribbons spots (Figure 7, panel K)?

      Yes, on removing the maximum value, the P value increases from 0.01 to 0.03 but remains statistically significant. 
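
      As an illustration of this kind of sensitivity check, a minimal sketch follows. It assumes a two-sided permutation test on group means, which is not necessarily the test used in the study, and the values, variable names and helper functions are hypothetical placeholders rather than measured data or the analysis code behind Figure 7K.

```python
import random
import statistics


def permutation_p_value(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)


def drop_max(values):
    """Return a copy of values without one occurrence of the largest entry."""
    trimmed = list(values)
    trimmed.remove(max(trimmed))
    return trimmed


# Synthetic placeholder values (assumption), NOT data from the study.
positive_spots = [1.8, 2.1, 2.4, 2.0, 2.6, 4.9]
negative_spots = [1.2, 1.5, 1.1, 1.7, 1.4, 1.6]

p_full = permutation_p_value(positive_spots, negative_spots)
p_trimmed = permutation_p_value(drop_max(positive_spots), negative_spots)
print(f"all values: p = {p_full:.3f}; maximum removed: p = {p_trimmed:.3f}")
```

      Repeating the test with and without the largest value shows whether a single extreme measurement drives the reported difference.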

      (6) In summary, although the approach pioneered by the authors is exciting and provides relevant results, there is a major concern regarding the interpretation of the modulation of Ca2+ channels.

      We have now carefully rephrased our interpretation on the modulation of Ca<sup>2+</sup> channels.  

      References

      Brandt A (2005) Few CaV1.3 Channels Regulate the Exocytosis of a Synaptic Vesicle at the Hair Cell Ribbon Synapse. Journal of Neuroscience 25: 11577–11585

      Cui G, Meyer AC, Calin-Jageman I, Neef J, Haeseleer F, Moser T & Lee A (2007) Ca2+-binding proteins tune Ca2+-feedback to Cav1.3 channels in mouse auditory hair cells. The Journal of Physiology 585: 791–803

      Davydova D, Marini C, King C, Klueva J, Bischof F, Romorini S, Montenegro-Venegas C, Heine M, Schneider R, Schröder MS, et al (2014) Bassoon specifically controls presynaptic P/Q-type Ca(2+) channels via RIM-binding protein. Neuron 82: 181–194

      tom Dieck S, Altrock WD, Kessels MM, Qualmann B, Regus H, Brauner D, Fejtová A, Bracko O, Gundelfinger ED & Brandstätter JH (2005) Molecular dissection of the photoreceptor ribbon synapse: physical interaction of Bassoon and RIBEYE is essential for the assembly of the ribbon complex. J Cell Biol 168: 825–836

      Frank T, Rutherford MA, Strenzke N, Neef A, Pangršič T, Khimich D, Fejtova A, Gundelfinger ED, Liberman MC, Harke B, et al (2010) Bassoon and the synaptic ribbon organize Ca2+ channels and vesicles to add release sites and promote refilling. Neuron 68: 724–738

      Grabner CP & Moser T (2021) The mammalian rod synaptic ribbon is essential for Cav channel facilitation and ultrafast synaptic vesicle fusion. eLife 10: e63844

      Hibino H, Pironkova R, Onwumere O, Vologodskaia M, Hudspeth AJ & Lesage F (2002) RIM-binding proteins (RBPs) couple Rab3-interacting molecules (RIMs) to voltage-gated Ca2+ channels. Neuron 34: 411–423

      Inoue E, Deguchi-Tawarada M, Takao-Rikitsu E, Inoue M, Kitajima I, Ohtsuka T & Takai Y (2006) ELKS, a protein structurally related to the active zone protein CAST, is involved in Ca2+-dependent exocytosis from PC12 cells. Genes to Cells 11: 659–672

      Janigro D, Maccaferri G & Meldolesi J (1989) Calcium channels in undifferentiated PC12 rat pheochromocytoma cells. FEBS Letters 255: 398–400

      Jean P, Morena DL de la, Michanski S, Tobón LMJ, Chakrabarti R, Picher MM, Neef J, Jung S, Gültas M, Maxeiner S, et al (2018) The synaptic ribbon is critical for sound encoding at high rates and with temporal precision. eLife 7: e29275

      Koschak A, Reimer D, Huber I, Grabner M, Glossmann H, Engel J & Striessnig J (2001) alpha 1D (Cav1.3) subunits can form l-type Ca2+ channels activating at negative voltages. J Biol Chem 276: 22100–22106

      Krinner S, Butola T, Jung S, Wichmann C & Moser T (2017) RIM-Binding Protein 2 Promotes a Large Number of CaV1.3 Ca2+-Channels and Contributes to Fast Synaptic Vesicle Replenishment at Hair Cell Active Zones. Front Cell Neurosci 11: 334

      Liu H, Felix R, Gurnett CA, De Waard M, Witcher DR & Campbell KP (1996) Expression and Subunit Interaction of Voltage-Dependent Ca2+ Channels in PC12 Cells. J Neurosci 16: 7557–7565

      Lv C, Stewart WJ, Akanyeti O, Frederick C, Zhu J, Santos-Sacchi J, Sheets L, Liao JC & Zenisek D (2016) Synaptic Ribbons Require Ribeye for Electron Density, Proper Synaptic Localization, and Recruitment of Calcium Channels. Cell Reports 15: 2784–2795

      Matthews G & Fuchs P (2010) The diverse roles of ribbon synapses in sensory neurotransmission. Nat Rev Neurosci 11: 812–822

      Maxeiner S, Luo F, Tan A, Schmitz F & Südhof TC (2016) How to make a synaptic ribbon: RIBEYE deletion abolishes ribbons in retinal synapses and disrupts neurotransmitter release. The EMBO Journal 35: 1098–1114

      Michanski S, Kapoor R, Steyer AM, Möbius W, Früholz I, Ackermann F, Gültas M, Garner CC, Hamra FK, Neef J, et al (2023) Piccolino is required for ribbon architecture at cochlear inner hair cell synapses and for hearing. EMBO Rep 24: e56702

      Michanski S, Smaluch K, Steyer AM, Chakrabarti R, Setz C, Oestreicher D, Fischer C, Möbius W, Moser T, Vogl C, et al (2019) Mapping developmental maturation of inner hair cell ribbon synapses in the apical mouse cochlea. PNAS 116: 6415–6424

      Neef J, Urban NT, Ohn T-L, Frank T, Jean P, Hell SW, Willig KI & Moser T (2018) Quantitative optical nanophysiology of Ca2+ signaling at inner hair cell active zones. Nat Commun 9: 290

      Park D, Wu Y, Lee S-E, Kim G, Jeong S, Milovanovic D, Camilli PD & Chang S (2021) Cooperative function of synaptophysin and synapsin in the generation of synaptic vesicle-like clusters in non-neuronal cells. Nat Commun 12

      Picher MM, Gehrt A, Meese S, Ivanovic A, Predoehl F, Jung S, Schrauwen I, Dragonetti AG, Colombo R, Camp GV, et al (2017) Ca2+-binding protein 2 inhibits Ca2+-channel inactivation in mouse inner hair cells. PNAS 114: E1717–E1726

      Robertis ED & Franchi CM (1956) Electron Microscope Observations on Synaptic Vesicles in Synapses of the Retinal Rods and Cones. J Biophys Biochem Cytol 2: 307–318

      Roberts WM, Jacobs RA & Hudspeth AJ (1990) Colocalization of ion channels involved in frequency selectivity and synaptic transmission at presynaptic active zones of hair cells. J Neurosci 10: 3664–3684

      Smith CA & Sjöstrand FS (1961) A synaptic structure in the hair cells of the guinea pig cochlea. Journal of Ultrastructure Research 5: 184–192

      Wong AB, Rutherford MA, Gabrielaitis M, Pangršič T, Göttfert F, Frank T, Michanski S, Hell S, Wolf F, Wichmann C, et al (2014) Developmental refinement of hair cell synapses tightens the coupling of Ca2+ influx to exocytosis. EMBO J 33: 247–264

      Zampini V, Johnson SL, Franz C, Lawrence ND, Münkner S, Engel J, Knipper M, Magistretti J, Masetto S & Marcotti W (2010) Elementary properties of CaV1.3 Ca(2+) channels expressed in mouse cochlear inner hair cells. J Physiol 588: 187–199

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors, Dalal et al., determined cryo-EM structures of open, closed, and desensitized states of the pentameric ligand-gated ion channel ELIC reconstituted in liposomes, and compared them to structures determined in varying nanodisc diameters. They argue that the liposomal reconstitution method is more representative of functional ELIC channels, as they were able to test and recapitulate channel kinetics through stopped-flow thallium flux liposomal assay. The authors and others have described channel interactions with membrane scaffold proteins (MSP), initially thought to be in a size-dependent manner. However, the authors reported that their cryo-EM ELIC structure interacts with the large nanodisc spNW25, contrary to their original hypotheses. This suggests that the channel's interactions with MSPs might alter its structure, possibly not accurately representing/reflecting functional states of the channel.

      Strengths:

      Cryo-EM structural determination from proteoliposomes is a promising methodology within the ion channel field due to their large surface area and lack of MSP or other membrane mimetics that could alter channel structure. Comparing liposomal ELIC to structures in various-sized nanodiscs gives rise to important discussions for other membrane protein structural studies when deciding the best method for individual circumstances.

      Weaknesses:

      The overarching goal of the study was to determine structural differences of ELIC in detergent nanodiscs and liposomes. Including comparisons of the results to the native bacterial lipid environment would provide a more encompassing discussion of how the determined liposome structures might or might not relate to the native receptor in its native environment. The authors stated they determined open, closed, and desensitized states of ELIC reconstituted in liposomes and suggest the desensitization gate is at the 9' region of the pore. However, no functional studies were performed to validate this statement.

      The goal of this study was to determine structures of ELIC in the same lipid environment in which its function is characterized. However, it is also worth noting that phosphatidylethanolamine and phosphatidylglycerol, two lipids used for the liposome formation, are necessary for ELIC function (PMID 36385237) and are principal lipid components of the gram-negative bacterial membranes in which ELIC is expressed.

      The desensitized structure of ELIC in liposomes shows a pore diameter at the hydrophobic L240 (9’) residue of 3.3 Å, which is anticipated to pose a large energetic barrier to the passage of ions due to the hydrophobic effect. We have included a graphical representation of pore diameters from the HOLE analysis for all liposome structures in Supplementary Figure 6B. While we have not tested the role of L240 in desensitization with functional experiments, it was shown by Gonzalez-Gutierrez and colleagues (PMID 22474383) that the L240A mutation apparently eliminates desensitization in ELIC. This finding is consistent with L240 (9’) being the desensitization gate of ELIC. We have referenced this study when discussing the desensitization gate in the Results.

      Reviewer #2 (Public review):

      Summary

      The report by Dalal and colleagues introduces a significant novelty in the field of pentameric ligand-gated ion channels (pLGICs). Within this family of receptors, numerous structures are available, but a widely recognised problem remains in assigning structures to functional states observed in biological membranes. Here, the authors obtain both structural and functional information of a pLGIC in a liposome environment. The model receptor ELIC is captured in the resting, desensitized, and open states. Structures in large nanodiscs, possibly biased by receptor-scaffold protein interactions, are also reported. Altogether, these results set the stage for the adoption of liposomes as a proxy for the biological membranes, for cryoEM studies of pLGICs and membrane proteins in general.

      Strengths

      The structural data is comprehensive, with structures in liposomes in the 3 main states (and for each, both inward-facing and outward-facing), and an agonist-bound structure in the large spNW25 nanodisc (and a retreatment of previous data obtained in a smaller disc). It adds up to a series of work from the same team that constitutes a much-needed exploration of various types of environment for the transmembrane domain of pLGICs. The structural analysis is thorough.

      The tone of the report is particularly pleasant, in the sense that the authors' claims are not inflated. For instance, a sentence such as "By performing structural and functional characterization under the same reconstitution conditions, we increase our confidence in the functional annotation of these structures." is exemplary.

      Weaknesses

      Core parts of the method are not described and/or discussed in enough detail. While I do believe that liposomes will be, in most cases, better than, say, nanodiscs, the process that leads from the protein in its membrane down to the liposome will play a big role in preserving the native structure, and should be an integral part of the report. Therefore, I strongly felt that biochemistry should be better described and discussed. The results section starts with "Optimal reconstitution of ELIC in liposomes [...] was achieved by dialysis". There is no information on why dialysis is optimal, what it was compared to, the distribution of liposome sizes using different preparation techniques, etc. Reading the title, I would have expected a couple of paragraphs and figure panels on liposome reconstitution. Similarly, potential biochemical challenges are not discussed. The methods section mentions that the sample was "dialyzed [...] over 5-7 days". In such a time window, most of the members of this protein family would aggregate, and it is therefore a protocol that cannot be directly generalised. This has to be mentioned explicitly, and a discussion on why this can't be done in two days, what else the authors tested (biobeads? ... ?) would strengthen the manuscript.

      To a lesser extent, the relative lack of both technical details and of a broad discussion also pertains to the cryoEM and thallium flux results. Regarding the cryoEM part, the authors focus their analysis on reconstructions from outward-facing particles on the basis of their better resolutions, yet there was little discussion about it. Is it common for liposome-based structures? Are inward-facing reconstructions worse because of the increased background due to electrons going through two membranes? Are there often impurities inside the liposomes (we see some in the figures)? The influence of the membrane mimetics on conformation could be discussed by referring to other families of proteins where it has been explored (for instance, ABC transporters, but I'm sure there are many other examples). If there are studies in other families of channels in liposomes that were inspirational, those could be mentioned. Regarding thallium flux assays, one argument is that they give access to kinetics and set the stage for time-resolved cryoEM, but if I did not miss it, no comparison of kinetics with other techniques, such as electrophysiology, nor references to eventual pioneer time-resolved studies are provided.

      Altogether, in my view, an updated version would benefit from insisting on every aspect of the methodological development. I may well be wrong, but I see this paper more like a milestone on sample prep for cryoEM imaging than being about the details of the ELIC conformations.

      Additions have been made to the Results and Discussion sections elaborating on the following points: 1) reconstitution of ELIC in liposomes using dialysis, the advantage of this over other methods such as biobeads, and whether the dialysis protocol can be shortened for other less stable proteins; 2) the issue of separating outward- and inward-facing channels; 3) referencing the effect of nanodiscs on ABC transporters, structures of membrane proteins in liposomes, and pioneering time-resolved cryo-EM studies; and 4) comparison of ELIC gating kinetics with electrophysiology measurements. With regard to the first point, it should be noted that all necessary details are provided in the Methods to reproduce the experiments, including the reconstitution and stopped-flow thallium flux assay. It is also important to note that the same preparation for making proteoliposomes was used for assessing function using the stopped-flow thallium flux assay and for determining the structure by cryo-EM. This is now stated in the Results.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major revisions:

      (1) The authors suggest that the desensitization gate is located at the 9' region within the pore. However, as stated by the authors, the 2' residues function as the desensitization gate in related channels. In a few of their HOLE analyzed structures (e.g. Figure 2B and 4B), there seems to be a constriction also at 2', but this finding is not discussed in the context of desensitization. Further functional testing of mutated 9' and/or 2' gates would bolster the argument for the location of the desensitization gate.

      As stated above, we have included HOLE plots of pore radius in Supplementary Fig. 6B and referenced the study showing that the L240A mutation (9’) in ELIC (PMID 22474383) appears to eliminate desensitization. This result along with the narrow pore diameter at 9’ in the desensitized structure suggests that 9’ is likely a desensitization gate in ELIC. In contrast, mutation of Q233 (2’) to a cysteine in a previous study produced a channel that still desensitizes (PMID 25960405). Since Q233 is a hydrophilic residue in contrast to L240, Q233 probably does not pose the same energetic barrier to ion translocation as L240 based on the structure.

      (2) In discussing functional states of ELIC and ELIC5 in different reconstitution methods, the authors reference constriction sites determined by HOLE analysis software. These constriction sites were key evidence for the authors to determine functional state, however, it is difficult to discern pore sizes based on the figures. Pore diameters and clear color designation (ie, green vs orange) with the figures would greatly aid their discussions.

      HOLE plots are displayed in Supplementary Fig. 6B and pore diameters are now provided in the text.

      (3) The authors had an intriguing finding that ELIC dimers are found in spNW25 scaffolds. Is there any functional evidence to suggest they could be functioning as dimers?

      There is no evidence that the function of ELIC or other pLGICs is altered by the formation of dimers of pentamers. Therefore, while this result is intriguing and likely facilitated by concentrating multiple ELIC pentamers within the nanodisc, it is not clear if these interactions have any functional importance. We have stated this in the Results.

      (4) Thallium flux assay to validate channel function within proteoliposomes. Proteoliposomes are known to be generally very leaky membranes, so it would be good to have controls without ELIC added to determine baseline changes in fluorescence.

      We have established from multiple previous studies that liposomes composed of 2:1:1 POPC:POPE:POPG (PMID 36385237 and 31724949) do not show significant thallium flux as measured by the stopped-flow assay (PMID 29058195) in the absence of ELIC activity. Furthermore, in the present study, the data in Fig. 1A of WT ELIC shows a low thallium flux rate 60 seconds after exposure to agonist when the ion channel has mostly desensitized. Therefore, this data serves also as a control indicating that the high thallium flux rates in response to agonist (at earlier delay times) are not due to leak, but rather due to ELIC channel activity.

      Minor revisions:

      (1) Abstract and introduction. 'Liganded' should be ligand

      We removed this word and changed it to “agonist-bound” for consistency throughout the manuscript.

      (2) Inconsistent formatting of FSC graphs in Supplemental Figure 4

      The difference is a consequence of the different formatting between cryoSPARC and Relion FSC graphs.

      Reviewer #2 (Recommendations for the authors):

      Minor writing remarks:

      The present report builds on previous work from the same team, and to my eye it would be a plus if this were conveyed more explicitly. I see it as a strength to explore various developments in several papers that complement each other. E.g., in the introduction when citing reference 12 (Dalal 2024), and later when introducing ref 15 (Petroff 2022), I wish I had been reminded of the main findings and how they fit with the new results.

      We have expanded on the Results and Discussion detailing key findings from these studies that are relevant to the current study.

      Suggestions for analysis:

      Data treatment. Maybe I missed it, but I wondered if C1 vs C5 treatment of the liposome data showed any interesting differences? When I think about the biological membrane, I picture it as a very crowded place with lots of neighbouring proteins. I would not be surprised if, similarly to what they do in discs, the receptor would tend to stick to, or bump into, anything present also in liposomes (a neighboring liposome, some undefined density inside the liposome).

      We attempted to perform C1 heterogeneous refinement jobs in cryoSPARC and C1 3D classification in Relion5. For the WT datasets, these did not produce 3D reconstructions that were of sufficient quality for further refinement. For ELIC5 with agonist, the C1 reconstructions were not different than the C5 reconstructions. Furthermore, there was no evidence of dimers of pentamers from the 2D or 3D treatments, unlike what was observed in the spNW25 nanodiscs. This is likely because the density of ELIC pentamers in the liposomes was too low to capture these transient interactions. We have included this information in the Methods.

      In data treatment, we sometimes find only what we're looking for. I wondered if the authors tried to find, for instance, the open and D conformations in the resting dataset during classifications.

      This is an interesting question since some population of ELIC channels could visit a desensitized conformation in the absence of agonist and this would not be detected in our flux assay. After extensive heterogeneous refinement jobs in cryoSPARC and 3D classification jobs in Relion5, we did not detect any unexpected structures such as open/desensitized conformations in the apo dataset.

      In the analysis of the M4 motions, is there info to be gained by looking at how it interacts with the rest of the TMD? For instance, I wondered if the buried surface area between M4 and the rest was changed. Also one could imagine to look at that M4 separately in outward-facing and inward-facing conformations (because the tension due to the bilayer will not be the same in the outer layer in both orientations - intuitively, I'd expect different levels of M4 motions)

      We have expanded our analysis of the structures as recommended. We determined the buried surface area between M4 and the rest of the channel in the liganded WT and ELIC5 structures in liposomes and nanodiscs, as well as the area between the TMD interfaces for these structures. There appears to be a pattern where liposome structures show less buried surface area between M4 and the rest of the channel, and less area at the TMD interfaces. Overall, this suggests that the liposome structures of ELIC in the open-channel or desensitized conformations are more loosely packed in the TMD compared to the nanodisc structures.

      We have also further discussed the issue of separating outward- and inward-facing conformations in the Results. The problem with classifying outward- and inward-facing orientations is that top/down or tilted views of the particles cannot be easily distinguished as coming from channels in one orientation or the other, unless there are conformational differences between outward- and inward-facing channels that would allow for their separation during 3D heterogeneous refinement or 3D classification. Furthermore, since the inward-facing reconstructions are of much lower resolution than the outward-facing reconstructions, we suspect that these particles are more heterogeneous possibly containing junk, multiple conformations, or particles that are both inward- and outward-facing. On the other hand, the outward-facing structures are of good quality, and therefore we are more confident that these come from a more homogeneous set of particles that are likely outward-facing (Note that most particles are outward facing based on side views of the 2D class averages). That said, when examining the conformation of M4 in outward- and inward-facing structures, we do not see any significant differences with the caveat that the inward-facing structures are of poor quality and that inward- and outward-facing particles may not have been well-separated.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the Reviewers for their thorough reading and thoughtful feedback. Below, we address each of the concerns raised in the public reviews, and outline our revisions that aim to further clarify and strengthen the manuscript.

      In our response, we clarify our conceptualization of elasticity as a dimension of controllability, formalizing it within an information-theoretic framework, and demonstrating that controllability and its elasticity are partially dissociable. Furthermore, we provide clarifications and additional modeling results showing that our experimental design and modeling approach are well-suited to dissociating elasticity inference from more general learning processes, and are not inherently biased to find overestimates of elasticity. Finally, we clarify the advantages and disadvantages of our canonical correlation analysis (CCA) approach for identifying latent relationships between multidimensional data sets, and provide additional analyses that strengthen the link between elasticity estimation biases and a specific psychopathology profile. 

      Public Reviews:

      Reviewer 1 (Public review): 

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment, and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform the understanding of control across domains, which is a topic of great importance.

      We thank the Reviewer for their favorable appraisal and valuable suggestions, which have helped clarify and strengthen the study’s conclusion. 

      An overarching concern is that this paper is framed as addressing resource investments across domains that include time, money, and effort, and the introductory examples focus heavily on effort-based resources (e.g., exercising, studying, practicing). The experiments, though, focus entirely on the equivalent of monetary resources - participants make discrete actions based on the number of points they want to use on a given turn. While the same ideas might generalize to decisions about other kinds of resources (e.g., if participants were having to invest the effort to reach a goal), this seems like the kind of speculation that would be better reserved for the Discussion section rather than using effort investment as a means of introducing a new concept (elasticity of control) that the paper will go on to test.

      We thank the Reviewer for pointing out a lack of clarity regarding the kinds of resources tested in the present experiment. Investing additional resources in the form of extra tickets did not only require participants to pay more money. It also required them to invest additional time – since each additional ticket meant making another attempt to board the vehicle, extending the duration of the trial, and attentional effort – since every attempt required precisely timing a spacebar press as the vehicle crossed the screen. Given this involvement of money, time, and effort resources, we believe it would be imprecise to present the study as concerning monetary resources in particular. That said, we agree with the Reviewer that results might differ depending on the resource type that the experiment or the participant considers most. Thus, we now clarify the kinds of resources the experiment involved (lines 87-97): 

      “To investigate how people learn the elasticity of control, we allowed participants to invest different amounts of resources in attempting to board their preferred vehicle. Participants could purchase one (40 coins), two (60 coins), or three tickets (80 coins) or otherwise walk for free to the nearest location. Participants were informed that a single ticket allowed them to board only if the vehicle stopped at the station, while additional tickets provided extra chances to board even after the vehicle had left the platform. For each additional ticket, the chosen vehicle appeared moving from left to right across the screen, and participants could attempt to board it by pressing the spacebar when it reached the center of the screen. Thus, each additional ticket could increase the chance of boarding but also required a greater investment of resources—decreasing earnings, extending the trial duration, and demanding attentional effort to precisely time a button press when attempting to board.”

      In addition, in the revised discussion, we now highlight the open question of whether inferences concerning the elasticity of control generalize across different resource domains (lines 341-348):

      “Another interesting possibility is that individual elasticity biases vary across different resource types (e.g., money, time, effort). For instance, a given individual may assume that controllability tends to be highly elastic to money but inelastic to effort. Although the task incorporated multiple resource types (money, time, and attentional effort), the results may differ depending on the type of resources on which the participant focuses. Future studies could explore this possibility by developing tasks that separately manipulate elasticity with respect to different resource types. This would clarify whether elasticity biases are domain-specific or domain-general, and thus elucidate their impact on everyday decision-making.”

      Setting aside the framing of the core concepts, my understanding of the task is that it effectively captures people's estimates of the likelihood of achieving their goal (Pr(success)) conditional on a given investment of resources. The ground truth across the different environments varies such that this function is sometimes flat (low controllability), sometimes increases linearly (elastic controllability), and sometimes increases as a step function (inelastic controllability). If this is accurate, then it raises two questions.

      First, on the modeling front, I wonder if a suitable alternative to the current model would be to assume that the participants are simply considering different continuous functions like these and, within a Bayesian framework, evaluating the probabilistic evidence for each function based on each trial's outcome. This would give participants an estimate of the marginal increase in Pr(success) for each ticket, and they could then weigh the expected value of that ticket choice (Pr(success)*150 points) against the marginal increase in point cost for each ticket. This should yield similar predictions for optimal performance (e.g., opt-out for lower controllability environments, i.e., flatter functions), and the continuous nature of this form of function approximation also has the benefit of enabling tests of generalization to predict changes in behavior if there was, for instance, changes in available tickets for purchase (e.g., up to 4 or 5) or changes in ticket prices. Such a model would of course also maintain a critical role for priors based on one's experience within the task as well as over longer timescales, and could be meaningfully interpreted as such (e.g., priors related to the likelihood of success/failure and whether one's actions influence these). It could also potentially reduce the complexity of the model by replacing controllability-specific parameters with multiple candidate functions (presumably learned through past experience, and/or tuned by experience in this task environment), each of which is being updated simultaneously.

      We thank the Reviewer for suggesting this interesting alternative modeling approach. We agree that a Bayesian framework evaluating different continuous functions could offer advantages, particularly in its ability to generalize to other ticket quantities and prices. To test the Reviewer's suggestion, we implemented a Bayesian model where participants continuously estimate both controllability and its elasticity as a mixture of three archetypal functions mapping ticket quantities to success probabilities. The flat function provides no control regardless of how many tickets are purchased (corresponding to low controllability). The step function provides the same level of control as long as at least one ticket is purchased (inelastic controllability). The linear function increases control proportionally with each additional ticket (elastic controllability). The model computes the likelihood that each of the functions produced each new observation, and accordingly updates its beliefs. Using these beliefs, the model estimates the probability of success for purchasing each number of tickets, allowing participants to weigh expected control against increasing ticket costs. Despite its theoretical advantages for generalization to different ticket quantities, this continuous function approximation model performed significantly worse than our elastic controllability model (log Bayes Factor > 4100 on combined datasets). We surmise that the main advantage offered by the elastic controllability model is that it does not assume a linear increase in control as a function of resource investment – even though this linear relationship was actually true in our experiment and is required for generalizing to other ticket quantities, it likely does not match what participants were doing. We present these findings in a new section ‘Testing alternative methods’ (lines 686-701):

      “We next examined whether participant behavior would be better characterized as a continuous function approximation rather than the discrete inferences in our model. To test this, we implemented a Bayesian model where participants continuously estimate both controllability and its elasticity as a mixture of three archetypal functions mapping ticket quantities to success probabilities. The flat function provides no control regardless of how many tickets are purchased (corresponding to low controllability). The step function provides full control as long as at least one ticket is purchased (inelastic controllability). The linear function linearly increases control with the number of extra tickets (i.e., 0%, 50%, and 100% control for 1, 2, and 3 tickets, respectively; elastic controllability). The model computes the likelihood that each of the functions produced each new observation, and accordingly updates its beliefs. Using these beliefs, the model estimates the probability of success for purchasing each number of tickets, allowing participants to weigh expected control against increasing ticket costs. Despite its theoretical advantages for generalization to different ticket quantities, this continuous function approximation model performed significantly worse than the elastic controllability model (log Bayes Factor > 4100 on combined datasets), suggesting that participants did not assume that control increases linearly with resource investment.”
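      The belief updating described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the correct vehicle is always chosen, that a failed boarding attempt falls back to walking (which reaches the goal on 20% of trials, as stated elsewhere in this response), and that success probabilities are clipped by a hypothetical lapse term EPS so that no single outcome rules out a hypothesis entirely. The reward of 150 points comes from the Reviewer's comment and the ticket costs from the task description; all function and variable names are invented for illustration.

```python
WALK_SUCCESS = 0.2                       # walking reaches the goal on 20% of trials
REWARD = 150                             # points for reaching the goal (Reviewer's figure)
TICKET_COST = {0: 0, 1: 40, 2: 60, 3: 80}
EPS = 0.01                               # hypothetical lapse term (assumption)

# Control (boarding probability) afforded by 1, 2, or 3 tickets under each archetype.
CONTROL = {
    "flat":   {1: 0.0, 2: 0.0, 3: 0.0},  # low controllability
    "step":   {1: 1.0, 2: 1.0, 3: 1.0},  # inelastic controllability
    "linear": {1: 0.0, 2: 0.5, 3: 1.0},  # elastic controllability
}

def p_success(hypothesis, tickets):
    """Probability of reaching the goal given an archetype and a ticket count."""
    if tickets == 0:
        return WALK_SUCCESS
    control = CONTROL[hypothesis][tickets]
    p = control + (1 - control) * WALK_SUCCESS
    return min(max(p, EPS), 1 - EPS)

def update(beliefs, tickets, success):
    """Bayes' rule: reweight each archetype by the likelihood of the outcome."""
    posterior = {}
    for hypothesis, prior in beliefs.items():
        p = p_success(hypothesis, tickets)
        posterior[hypothesis] = prior * (p if success else 1 - p)
    total = sum(posterior.values())
    return {h: v / total for h, v in posterior.items()}

def expected_values(beliefs):
    """Expected net payoff of each ticket count under the current mixture."""
    values = {}
    for tickets, cost in TICKET_COST.items():
        p = sum(b * p_success(h, tickets) for h, b in beliefs.items())
        values[tickets] = p * REWARD - cost
    return values

beliefs = {h: 1 / 3 for h in CONTROL}    # uniform prior over archetypes
for _ in range(5):                       # five successes with a single ticket
    beliefs = update(beliefs, tickets=1, success=True)
values = expected_values(beliefs)
print(beliefs, values)
```

      After repeated successes with a single ticket, the mixture concentrates on the step function and one ticket becomes the best purchase, since extra tickets add cost without adding control.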

      We also refer to this analysis in our updated discussion (lines 326-339):

      “Second, future models could enable generalization to levels of resource investment not previously experienced. For example, controllability and its elasticity could be jointly estimated via function approximation that considers control as a function of invested resources. Although our implementation of this model did not fit participants’ choices well (see Methods), other modeling assumptions or experimental designs may offer a better test of this idea.”

      Second, if the reframing above is apt (regardless of the best model for implementing it), it seems like the taxonomy being offered by the authors risks a form of "jangle fallacy," in particular by positing distinct constructs (controllability and elasticity) for processes that ultimately comprise aspects of the same process (estimation of the relationship between investment and outcome likelihood). Which of these two frames is used doesn't bear on the rigor of the approach or the strength of the findings, but it does bear on how readers will digest and draw inferences from this work. It is ultimately up to the authors which of these they choose to favor, but I think the paper would benefit from some discussion of a common-process alternative, at least to prevent too strong of inferences about separate processes/modes that may not exist. I personally think the approach and findings in this paper would also be easier to digest under a common-construct approach rather than forcing new terminology but, again, I defer to the authors on this.

      We acknowledge the Reviewer's important point about avoiding a potential "jangle fallacy." We entirely agree with the Reviewer that elasticity and controllability inferences are not distinct processes. Specifically, we view resource elasticity as a dimension of controllability, hence the name of our ‘elastic controllability’ model. In response to this and other Reviewers’ comments, in the revised manuscript, we now offer a formal definition of elasticity as the reduction in uncertainty about controllability due to knowing the amount of resources available to the agent (lines 16-20; see further details in response to Reviewer 3 below).  

      With respect to how this conceptualization is expressed in the modeling, we note that the representation in our model of maximum controllability and its elasticity via different variables is analogous to how a distribution may be represented by separate mean and variance parameters. Even the model suggested by the Reviewer required a dedicated variable representing elastic controllability, namely the probability of the linear controllability function. More generally, a single-process account allows that different aspects of the said process would be differently biased (e.g., one can have an accurate estimate of the mean of a distribution but overestimate its variance). Therefore, our characterization of distinct elasticity and controllability biases (or to put it more accurately, 'elasticity of controllability bias' and 'maximum controllability bias') is consistent with a common construct account.

      To avoid misunderstandings, we have now modified the text to clarify that we view elasticity as a dimension of controllability that can only be estimated in conjunction with controllability. Here are a few examples:

      Lines 21-28: “While only controllable environments can be elastic, the inverse is not necessarily true – controllability can be high, yet inelastic to invested resources – for example, choosing between bus routes affords equal control over commute time to anyone who can afford the basic fare (Figure 1; Supplementary Note 1). That said, since all actions require some resource investment, no controllable environment is completely inelastic when considering the full spectrum of possible agents, including those with insufficient resources to act (e.g., those unable to purchase a bus fare or pay for a fixed-price meal).”

      Lines 45-47: “Experimental paradigms to date have conflated overall controllability and its elasticity, such that controllability was either low or elastic[16-20]. The elasticity of control, however, must be dissociated from overall controllability to accurately diagnose mismanagement of resources.”

      Lines 70-72: “These findings establish elasticity as a crucial dimension of controllability that guides adaptive behavior, and a computational marker of control-related psychopathology.”

      Lines 87-88: “To investigate how people learn the elasticity of control, we allowed participants to invest different amounts of resources in attempting to board their preferred vehicle.”

      Reviewer 2 (Public review):

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Interestingly, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals some important findings about how people consider components of controllability.

      We appreciate the Reviewer's positive assessment of our findings and computational approach to dissociating elasticity and overall controllability.

      The primary weakness of this research is that it is not entirely clear what is meant by "elastic" and "inelastic" and how these constructs differ from existing considerations of various factors/calculations that contribute to perceptions of and decisions about controllability. I think this weakness is primarily an issue of framing, where it's not clear whether elasticity is, in fact, theoretically dissociable from controllability. Instead, it seems that the elements that make up "elasticity" are simply some of the many calculations that contribute to controllability. In other words, an "elastic" environment is inherently more controllable than an "inelastic" one, since both environments might have the same level of predictability, but in an "elastic" environment, one can also partake in additional actions to have additional control over achieving the goal (i.e., expend effort, money, time).

      We thank the Reviewer for highlighting the lack of clarity about the concept of elasticity. We first clarify that elasticity cannot be entirely dissociated from controllability because it is a dimension of controllability. If no controllability is afforded, then there cannot be elasticity or inelasticity. This is why in describing the experimental environments, we only label high-controllability, but not low-controllability, environments as ‘elastic’ or ‘inelastic’. For further details on this conceptualization of elasticity, and associated revisions of the text, see our response above to Reviewer 1. 

      Second, we now clarify that controllability can also be computed without knowing the amount of resources the agent is able and willing to invest, for instance by assuming infinite resources available or a particular distribution of resource availabilities. However, knowing the agent’s available resources often reduces uncertainty concerning controllability. This reduction in uncertainty is what we define as elasticity. Since any action requires some resources, this means that no controllable environment is entirely inelastic if we also consider agents that do not have enough resources to commit any action. However, even in this case, environments can differ in the degree to which they are elastic. For further details on this formal definition, and associated revisions of the text, see our response to Reviewer 3.

      Importantly, whether an environment is more or less elastic does not fully determine whether it is more or less controllable. In particular, environments can be more controllable yet less elastic. This is true even if we allow that investing different levels of resources (i.e., purchasing 0, 1, 2, or 3 tickets) constitute different actions, in conjunction with participants’ vehicle choices. Below, we show this using two existing definitions of controllability. 

      Definition 1, reward-based controllability[1]: If control is defined as the fraction of available reward that is controllably achievable, and we assume all participants are in principle willing and able to invest 3 tickets, controllability can be computed in the present task as:

      Controllability = max over (A, C) of P(S' = goal ∣ S, A, C) – min over (A, C) of P(S' = goal ∣ S, A, C)

      where P(S' = goal ∣ S, A, C) is the probability of reaching the treasure from present state S when taking action A and investing C resources in executing the action. In any of the task environments, the probability of reaching the goal is maximized by purchasing 3 tickets (C = 3) and choosing the vehicle that leads to the goal (A = correct vehicle). Conversely, the probability of reaching the goal is minimized by purchasing 3 tickets (C = 3) and choosing the vehicle that does not lead to the goal (A = wrong vehicle). This calculation is thus entirely independent of elasticity, since it only considers what would be achieved by maximal resource investment, whereas elasticity consists of the reduction in controllability that would arise if the maximal available C is reduced. Consequently, any environment where the maximum available control is higher yet varies less with resource investment would be more controllable and less elastic.

      Note that if we also account for ticket costs in calculating reward, this will only reduce the fraction of achievable reward and thus the calculated control in elastic environments.   
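      A small numerical sketch may help illustrate this point. It assumes, as an interpretation of the description above rather than a formula confirmed by the source, that reward-based controllability is the difference between the highest and lowest achievable probability of reaching the goal. It also assumes that a failed boarding attempt falls back to walking, which reaches the goal on 20% of trials. The boarding probabilities are those given for the inelastic and elastic environments in this response; the function names and the `max_tickets` budget parameter are hypothetical.

```python
WALK_SUCCESS = 0.2  # walking reaches the goal on 20% of trials

def goal_probability(board_prob, correct_vehicle):
    """P(goal) when boarding succeeds with board_prob; failing to board means walking."""
    reach_if_boarded = 1.0 if correct_vehicle else 0.0
    return board_prob * reach_if_boarded + (1 - board_prob) * WALK_SUCCESS

def reward_based_controllability(board_probs, max_tickets):
    """Max minus min goal probability over all options affordable within max_tickets.

    Assumption: this max-minus-min form is inferred from the text, not quoted from it.
    """
    options = [WALK_SUCCESS]  # 0 tickets: walk to the nearest location
    for tickets in range(1, max_tickets + 1):
        for correct_vehicle in (True, False):
            options.append(goal_probability(board_probs[tickets], correct_vehicle))
    return max(options) - min(options)

# Probability of boarding the chosen vehicle with 1, 2, or 3 tickets.
INELASTIC = {1: 1.0, 2: 1.0, 3: 1.0}
ELASTIC = {1: 0.0, 2: 0.5, 3: 1.0}

for budget in (1, 2, 3):
    print(budget,
          reward_based_controllability(INELASTIC, budget),
          reward_based_controllability(ELASTIC, budget))
```

      With a budget of 3 tickets the two environments come out equally controllable under this assumed definition. Only when the budget is reduced does the elastic environment lose control, which is the dependence on resources that elasticity is meant to capture.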

      Definition 2, information-theoretic controllability[2]: Here controllability is defined as the reduction in outcome entropy due to knowing which action is taken:

      I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C)

      where H(S'|S) is the conditional entropy of the distribution of outcomes S' given the present state S, and H(S'|S, A, C) is the conditional entropy of the outcome given the present state, action, and resource investment.

      To compare controllability, we consider two environments with the same maximum control:

      • Inelastic environment: If the correct vehicle is chosen, there is a 100% chance of reaching the goal state with 1, 2, or 3 tickets. Thus, out of 7 possible action-resource investment combinations, three deterministically lead to the goal state (≥1 tickets and correct vehicle choice), three never lead to it (≥1 tickets and wrong vehicle choice), and one (0 tickets) leads to it 20% of the time (since walking leads to the treasure on 20% of trials).

      • Elastic environment: If the correct vehicle is chosen, the probability of boarding it is 0% with 1 ticket, 50% with 2 tickets, and 100% with 3 tickets. Thus, out of 7 possible action-resource investment combinations, one deterministically leads to the goal state (3 tickets and correct vehicle choice), one never leads to it (3 tickets and wrong vehicle choice), one leads to it 60% of the time (2 tickets and correct vehicle choice: 50% boarding + 50% × 20% when failing to board), one leads to it 10% of the time (2 tickets and wrong vehicle choice), and three lead to it 20% of the time (0-1 tickets).

      Here we assume a uniform prior over actions, which renders the information-theoretic definition of controllability equal to another definition termed ‘instrumental divergence’[3,4]. We note that changing the uniform prior assumption would change the results for the two environments, but that would not change the general conclusion that there can be environments that are more controllable yet less elastic. 

      Step 1: Calculating H(S'|S)

      For the inelastic environment:

      P(goal) = (3 × 100% + 3 × 0% + 1 × 20%)/7 = .46, P(non-goal) = .54

      H(S'|S) = – [.46 × log₂(.46) + .54 × log₂(.54)] = 1 bit

      For the elastic environment:

      P(goal) = (1 × 100% + 1 × 0% + 1 × 60% + 1 × 10% + 3 × 20%)/7 = .33, P(non-goal) = .67

      H(S'|S) = – [.33 × log₂(.33) + .67 × log₂(.67)] = .91 bits

      Step 2: Calculating H(S'|S, A, C)

      Inelastic environment: Six action-resource investment combinations have deterministic outcomes entailing zero entropy, whereas investing 0 tickets has a probabilistic outcome (20%). The entropy for 0 tickets is: H(S'|C = 0) = – [.2 × log₂(.2) + .8 × log₂(.8)] = .72 bits. Since this action-resource investment combination is chosen with probability 1/7, the total conditional entropy is approximately .10 bits.

      Elastic environment: Two action-resource investment combinations have deterministic outcomes (3 tickets with correct/wrong vehicle), whereas the other five have probabilistic outcomes:

      2 tickets and correct vehicle (60% success):

      H(S'|A = correct, C = 2) = – [.6 × log₂(.6) + .4 × log₂(.4)] = .97 bits

      2 tickets and wrong vehicle (10% success):

      H(S'|A = wrong, C = 2) = – [.1 × log₂(.1) + .9 × log₂(.9)] = .47 bits

      0-1 tickets (20% success):

      H(S'|C = 0-1) = – [.2 × log₂(.2) + .8 × log₂(.8)] = .72 bits

      Thus the total conditional entropy of the elastic environment is: H(S'|S, A, C) = (1/7) × .97 + (1/7) × .47 + (3/7) × .72 = .52 bits

      Step 3: Calculating I(S'; A, C | S)

      Inelastic environment: I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C) = 1 – .1 = .9 bits

      Elastic environment: I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C) = .91 – .52 = .39 bits

      Thus, the inelastic environment offers higher information-theoretic controllability (.9 bits) compared to the elastic environment (.39 bits). 
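      The three steps above can be reproduced with a short script. This is a sketch under the same assumptions as the calculation (binary goal/non-goal outcomes and a uniform prior over the seven action-resource investment combinations); the function names are illustrative and not taken from the manuscript.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a goal/non-goal outcome with goal probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def info_controllability(goal_probs):
    """I(S'; A, C | S) = H(S'|S) - H(S'|S, A, C) under a uniform prior.

    goal_probs holds one goal probability per action-resource combination.
    """
    n = len(goal_probs)
    h_marginal = binary_entropy(sum(goal_probs) / n)               # Step 1
    h_conditional = sum(binary_entropy(p) for p in goal_probs) / n  # Step 2
    return h_marginal - h_conditional                              # Step 3

# Goal probabilities for the 7 combinations listed for each environment.
inelastic = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.2]
elastic = [1.0, 0.0, 0.6, 0.1, 0.2, 0.2, 0.2]

print(round(info_controllability(inelastic), 2))
print(round(info_controllability(elastic), 2))
```

      Without intermediate rounding, the values come out at roughly 0.89 and 0.40 bits, in line with the rounded figures of .9 and .39 bits reported above.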

      Of note, even if each combination of cost and success/failure to reach the goal is defined as a distinct outcome, information-theoretic controllability is still higher for the inelastic (2.81 bits) than for the elastic (2.30 bits) environment. These calculations are now included in the Supplementary materials (Supplementary Note 1).

      In sum, for both definitions of controllability, we see that environments can be more elastic yet less controllable. We have also revised the manuscript to clarify this distinction (lines 21-28):

      “While only controllable environments can be elastic, the inverse is not necessarily true – controllability can be high, yet inelastic to invested resources – for example, choosing between bus routes affords equal control over commute time to anyone who can afford the basic fare (Figure 1; Supplementary Note 1). That said, since all actions require some resource investment, no controllable environment is completely inelastic when considering the full spectrum of possible agents, including those with insufficient resources to act (e.g., those unable to purchase a bus fare or pay for a fixed-price meal).”

      Reviewer 3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome. In particular, the authors highlight the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally propose that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea thus has the potential to change how we think about mental disorders in a substantial way, and could even help us better understand how healthy people navigate challenging decision-making problems.

      Unfortunately, my view is that neither the theoretical nor empirical aspects of the paper really deliver on that promise. In particular, most (perhaps all) of the interesting claims in the paper have weak empirical support.

      We appreciate the Reviewer's thoughtful engagement with our research and recognition of the potential significance of distinguishing between different dimensions of control in understanding psychopathology. We believe that all the Reviewer’s comments can be addressed with clarifications or additional analyses, as detailed below.  

      Starting with theory, the elasticity idea does not truly "extend" the standard control model in the way the authors suggest. The reason is that effort is simply one dimension of action. Thus, the proposed model ultimately grounds out in how strongly our outcomes depend on our actions (as in the standard model). Contrary to the authors' claims, the elasticity of control is still a fixed property of the environment. Consistent with this, the computational model proposed here is a learning model of this fixed environmental property. The idea is still valuable, however, because it identifies a key dimension of action (namely, effort) that is particularly relevant to the notion of perceived control. Expressing the elasticity idea in this way might support a more general theoretical formulation of the idea that could be applied in other contexts. See Huys & Dayan (2009), Zorowitz, Momennejad, & Daw (2018), and Gagne & Dayan (2022) for examples of generalizable formulations of perceived control.

      We thank the Reviewer for the suggestion that we formalize our concept of elasticity to resource investment, which we agree is a dimension of action. We first note that we have not argued against the claim that elasticity is a fixed property of the environment. We surmise the Reviewer might have misread our statement that “controllability is not a fixed property of the environment”. The latter statement is motivated by the observation that controllability is often higher for agents that can invest more resources (e.g., a richer person can buy more things). We clarify this in our revision of the manuscript in lines 8-15 (changes in bold): 

      “The degree of control we possess over our environment, however, may itself depend on the resources we are willing and able to invest. For example, the control a biker has over their commute time depends on the power they are willing and able to invest in pedaling. In this respect, a highly trained biker would typically have more control than a novice. Likewise, the control a diner in a restaurant has over their meal may depend on how much money they have to spend. In such situations, controllability is not fixed but rather elastic to available resources (i.e., in the same sense that supply and demand may be elastic to changing prices[14]).”

      To formalize elasticity, we build on Huys & Dayan’s definition of controllability[1] as the fraction of reward that is controllably achievable, 𝜒 (though using information-theoretic definitions[2,3] would work as well). To the extent that this fraction depends on the amount of resources the agent is able and willing to invest (max 𝐶), this formulation can be probabilistically computed without information about the particular agent involved, specifically, by assuming a certain distribution of agents with different amounts of available resources. This would result in a probability distribution over 𝜒. Elasticity can thus be defined as the amount of information obtained about controllability due to knowing the amount of resources available to the agent: I(𝜒; max 𝐶). We have added this formal definition to the manuscript (lines 15-20):

      “To formalize how elasticity relates to control, we build on an established definition of controllability as the fraction of reward that is controllably achievable[15], 𝜒. Uncertainty about this fraction could result from uncertainty about the amount of resources that the agent is able and willing to invest, 𝑚𝑎𝑥 𝐶. Elasticity can thus be defined as the amount of information obtained about controllability by knowing the amount of available resources: 𝐼(𝜒; 𝑚𝑎𝑥 𝐶).”
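
      As a toy illustration of this definition, the sketch below computes I(𝜒; max 𝐶) for two hypothetical environments. It is not the authors' implementation: the function names, the three equally likely agent types (able to invest 1, 2, or 3 tickets), and the 𝜒 values are all invented for illustration, and 𝜒 is assumed to be a deterministic function of max 𝐶.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I(X; Y) in bits from a dict mapping (x, y) to probability."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


def elasticity(chi_given_resources, resource_probs):
    """I(chi; max C) when chi is a deterministic function of max C."""
    joint = defaultdict(float)
    for max_c, p_c in resource_probs.items():
        joint[(chi_given_resources[max_c], max_c)] += p_c
    return mutual_information(joint)


# Hypothetical population: agents able to invest 1, 2, or 3 tickets.
resource_probs = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}

# Inelastic controllability: chi is the same whatever the agent can invest.
inelastic = {1: 0.8, 2: 0.8, 3: 0.8}
# Elastic controllability: chi grows with the resources available.
elastic = {1: 0.2, 2: 0.5, 3: 0.8}

print(elasticity(inelastic, resource_probs))
print(elasticity(elastic, resource_probs))
```

      In this toy case, knowing max 𝐶 says nothing about 𝜒 in the inelastic environment, whereas in the elastic environment it fully determines 𝜒, so the information gained equals the entropy of the resource distribution.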

      Turning to experiment, the authors make two key claims: (1) people infer the elasticity of control, and (2) individual differences in how people make this inference are importantly related to psychopathology. Starting with claim 1, there are three sub-claims here; implicitly, the authors make all three. (1A) People's behavior is sensitive to differences in elasticity, (1B) people actually represent/track something like elasticity, and (1C) people do so naturally as they go about their daily lives. The results clearly support 1A. However, 1B and 1C are not supported. Starting with 1B, the experiment cannot support the claim that people represent or track elasticity because the effort is the only dimension over which participants can engage in any meaningful decision-making (the other dimension, selecting which destination to visit, simply amounts to selecting the location where you were just told the treasure lies). Thus, any adaptive behavior will necessarily come out in a sensitivity to how outcomes depend on effort. More concretely, any model that captures the fact that you are more likely to succeed in two attempts than one will produce the observed behavior. The null models do not make this basic assumption and thus do not provide a useful comparison.

      We appreciate the Reviewer's critical analysis of our claims regarding elasticity inference, which as detailed below, has led to an important new analysis that strengthens the study’s conclusions. However, we respectfully disagree with two of the Reviewer’s arguments. First, resource investment was not the only meaningful decision dimension in our task, since participants also needed to choose the correct vehicle to get to the right destination. That this was not trivial is evidenced by our exclusion of over 8% of participants who made incorrect vehicle choices more than 10% of the time. Included participants also occasionally erred in this choice (mean error rate = 3%, range [0-10%]; now specified in lines 363-366).

      Second, the experimental task cannot be solved well by a model that simply tracks how outcomes depend on effort because 20% of the time participants reached the treasure despite failing to board their vehicle of choice. In such cases, reward outcomes and control were decoupled. Participants could identify when this was the case by observing the starting location (since depending on the starting location, the treasure location could have been automatically reached by walking), which was revealed together with the outcome. To determine whether participants distinguished between control-related and non-control-related reward, we have now fitted a variant of our model to the data that allows learning from each of these kinds of outcomes by means of a different free parameter. The results show that participants learned considerably more from control-related outcomes. They were thus not merely tracking outcomes, but specifically inferred when outcomes can be attributed to control. We now include this new analysis in the revised manuscript (Methods lines 648-661):

      “To ascertain that participants were truly learning latent estimates of controllability rather than simpler associations, we conducted two complementary analyses.

      First, we implemented a simple Q-learning model that directly maps ticket quantities to expected values based on reward prediction errors, without representing latent controllability. This associative model performed substantially worse than even our simple controllability model (log Bayes Factor ≥ 1854 on the combined datasets). Second, we fitted a variant of the elastic controllability model that compared learning from control-related versus chance outcomes via separate parameters (instead of assuming no learning from chance outcomes). Chance outcomes were observed by participants in the 20% of trials where reward and control were decoupled, in the sense that participants reached the treasure regardless of whether they boarded their vehicle of choice. Results showed that participants learned considerably more from control-related, as compared to chance, outcomes (mean learning ratio=1.90, CI= [1.83, 1.97]). Together, these analyses show that participants were forming latent controllability estimates rather than direct action-outcome associations.”
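
      For readers unfamiliar with this class of model, the associative baseline can be sketched as a generic delta-rule update. This is a minimal sketch of standard Q-learning over ticket quantities, not the fitted model: the function name, the learning rate, the zero initial values, and the toy trial sequence are assumptions made for illustration.

```python
def q_learning_values(trials, learning_rate=0.1, n_ticket_options=4):
    """Learn an expected value for each ticket quantity (0-3 tickets).

    trials is a sequence of (tickets, reward) pairs. No latent
    controllability is represented: each value is nudged toward the
    reward that followed the corresponding ticket quantity.
    """
    q = [0.0] * n_ticket_options
    for tickets, reward in trials:
        prediction_error = reward - q[tickets]
        q[tickets] += learning_rate * prediction_error
    return q


# Hypothetical trial sequence: (tickets purchased, reward obtained).
toy_trials = [(3, 1.0), (3, 1.0), (1, 0.0), (3, 1.0)]
q_values = q_learning_values(toy_trials, learning_rate=0.5)
print(q_values)
```

      Such a learner updates identically whether or not a reward was attributable to the participant's own action, which is the property that the comparison with the controllability models is designed to test.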

      Controllability inference by itself, however, still does not suffice to explain the observed behavior. This is shown by our ‘controllability’ model, which learns to invest more resources to improve control, yet still fails to capture key features of participants’ behavior, as detailed in the manuscript. This means that explaining participants’ behavior requires a model that not only infers controllability, beyond merely outcome probability, but also assumes a priori that increased effort could enhance control. Building this a priori assumption into the model amounts to embedding within it an understanding of elasticity – the idea that control over the environment may be increased by greater resource investment.

      That being said, we acknowledge the value in considering alternative computational formulations of adaptation to elasticity, as now expressed in the revised discussion (lines 326-333; reproduced below in response to the Reviewer’s comment on updating controllability beliefs when losing with less than 3 tickets).

      For 1C, the claim that people infer elasticity outside of the experimental task cannot be supported because the authors explicitly tell people about the two notions of control as part of the training phase: "To reinforce participants' understanding of how elasticity and controllability were manifested in each planet, [participants] were informed of the planet type they had visited after every 15 trips." (line 384).

      We thank the Reviewer for highlighting this point. We agree that our experimental design does not test whether people infer elasticity spontaneously. However, our research question was whether people can distinguish between elastic and inelastic controllability. The results strongly support that they can, and this does have potential implications for behavior outside of the experimental task. Specifically, to the extent that people are aware that in some contexts additional resource investment improves control, whereas in other contexts it does not, then our results indicate that they would be able to distinguish between these two kinds of contexts through trial-and-error learning. That said, we agree that investigating whether and how people spontaneously infer elasticity is an interesting direction for future work. We have now added this to the discussion of future directions (lines 287-295):

      “Additionally, real life typically doesn’t offer the streamlined recurrence of homogenized experiences that makes learning easier in experimental tasks, nor are people systematically instructed and trained about elastic and inelastic control in each environment. These complexities introduce substantial additional uncertainty into inferences of elasticity in naturalistic settings, thus allowing more room for prior biases to exert their influences. The elasticity biases observed in the present studies are therefore likely to be amplified in real-life behavior. Future research should examine how these complexities affect judgments about the elasticity of control to better understand how people allocate resources in real-life.”

      Finally, I turn to claim 2, that individual differences in how people infer elasticity are importantly related to psychopathology. There is much to say about the decision to treat psychopathology as a unidimensional construct. However, I will keep it concrete and simply note that CCA (by design) obscures the relationship between any two variables. Thus, as suggestive as Figure 6B is, we cannot conclude that there is a strong relationship between Sense of Agency and the elasticity bias---this result is consistent with any possible relationship (even a negative one). The fact that the direct relationship between these two variables is not shown or reported leads me to infer that they do not have a significant or strong relationship in the data.

      We agree that CCA is not designed to reveal the relationship between any two variables. However, the advantage of this analysis is that it pulls together information from multiple variables. Doing so does not treat psychopathology as unidimensional. Rather, it seeks a particular dimension that most strongly correlates with different aspects of task performance.

      This is especially useful for multidimensional psychopathology data because such data are often dominated by strong correlations between dimensions, whereas the research seeks to explain the distinctions between the dimensions. Similar considerations apply to the multidimensional task parameters, which although less correlated, may still jointly predict the relevant psychopathological profile better than each parameter does in isolation. Thus, the CCA enabled us to identify a general relationship between task performance and psychopathology that accounts for different symptom measures and aspects of controllability inference. 

      Using CCA can thus reveal relationships that do not readily show up in two-variable analyses. Indeed, the direct correlation between Sense of Agency (SOA) and elasticity bias was not significant – a result that, for completeness, we now report in Supplementary Figure 3 along with all other direct correlations. We note, however, that the CCA analysis was preregistered and its results were replicated. Additionally, participants scoring higher on the psychopathology profile also overinvested resources in inelastic environments but did not futilely invest in uncontrollable environments (Figure 6A), providing external validation to the conclusion that the CCA captured meaningful variance specific to elasticity inference. Most importantly, an auxiliary analysis specifically confirmed the contributions of both elasticity bias (Figure 6D, middle plot) and, although not reported in the original paper, of the Sense of Agency score (SOA; p=.03 permutation test; see updated Figure 6D, bottom plot) to the observed canonical correlation. The results thus enable us to safely conclude that differences in elasticity inferences are significantly associated with a profile of control-related psychopathology to which SOA contributed significantly. We now report this when presenting the CCA results (lines 255-257): 

      “Loadings on the side of psychopathology were dominated by an impaired sense of agency (SOA; contribution to canonical correlation: p=.03, Figure 6D, bottom plot), along with obsessive compulsive symptoms (OCD), and social anxiety (LSAS) – all symptoms that have been linked to an impaired sense of control[22-25].”

      Finally, whereas interpretation of individual CCA loadings that were not specifically tested remains speculative, we note that the pattern of loadings largely replicated across the initial and replication studies (see Figure 6B), and aligns with prior findings. For instance, the positive loadings of SOA and OCD match prior suggestions that a lower sense of control leads to greater compensatory effort[7], whereas the negative loading for depression scores matches prior work showing reduced resource investment in depression[5-6].

      We have now revised the manuscript to clarify the justification for our analytical approach (lines 236-248):

      “To examine whether the individual biases in controllability and elasticity inference have psychopathological ramifications, we assayed participants on a range of self-report measures of psychopathologies previously linked to a distorted sense of control (see Methods, pg. 24). Examining the direct correlations between model parameters and psychopathology measures (reported in Supplementary Figure 3) does not account for the substantial variance that is typically shared among different forms of psychopathology. For this reason, we instead used a canonical correlation analysis (CCA) to identify particular dimensions within the parameter and psychopathology spaces that most strongly correlate with one another.”

      We also now include a cautionary note in the discussion (lines 309-315):

      “Whereas our pre-registered CCA effectively identified associations between task parameters and a psychopathological profile, this analysis method does not directly reveal relationships between individual variables. Auxiliary analyses confirmed significant contributions of both elasticity bias and sense of agency to the observed canonical correlation, but the contribution of other measures remains to be determined by future work. Such work could employ other established measures of agency, including both behavioral indices and subjective self-reports, to better understand how these constructs relate across different contexts and populations.”

      There is also a feature of the task that limits our ability to draw strong conclusions about individual differences in elasticity inference. As the authors clearly acknowledge, the task was designed "to be especially sensitive to overestimation of elasticity" (line 287). A straightforward consequence of this is that the resulting *empirical* estimate of estimation bias (i.e., the gamma_elasticity parameter) is itself biased. This immediately undermines any claim that references the directionality of the elasticity bias (e.g. in the abstract). Concretely, an undirected deficit such as slower learning of elasticity would appear as a directed overestimation bias. When we further consider that elasticity inference is the only meaningful learning/decisionmaking problem in the task (argued above), the situation becomes much worse. Many general deficits in learning or decision-making would be captured by the elasticity bias parameter. Thus, a conservative interpretation of the results is simply that psychopathology is associated with impaired learning and decision-making.

      We apologize for our imprecise statement that the task was ‘especially sensitive to overestimation of elasticity’, which justifiably led to the Reviewer’s concern that slower elasticity learning can be mistaken for elasticity bias. To make sure this was not the case, we made use of the fact that our computational model explicitly separates bias direction (𝜆) from the rate of learning through two distinct parameters, which initialize the prior concentration and mean of the model’s initial beliefs concerning elasticity (see Methods pg. 23). The higher the concentration of the initial beliefs (𝜖), the slower the learning. Parameter recovery tests confirmed that our task enables acceptable recovery of both the bias λ<sub>elasticity</sub> (r=.81) and the concentration 𝜖<sub>elasticity</sub> (r=.59) parameters. Importantly, the level of confusion between the parameters was low (confusion of 0.15 for 𝜖<sub>elasticity</sub> → λ<sub>elasticity</sub> and 0.04 for λ<sub>elasticity</sub> → 𝜖<sub>elasticity</sub>). This result confirms that our task enables dissociating elasticity biases from the rate of elasticity learning.

      Moreover, to validate that the minimal level of confusion existing between bias and the rate of learning did not drive our psychopathology results, we re-ran the CCA while separating concentration from bias parameters. The results (figure below) demonstrate that differences in learning rate (𝜖) had virtually no contribution to our CCA results, whereas the contribution of the pure bias (𝜆) was preserved. 

      We now report on this additional analysis in the text (lines 617-627):

      “To capture prior biases that planets are controllable and elastic, we introduced parameters γ<sub>controllability</sub> and γ<sub>elasticity</sub>, each computed by multiplying the direction (λ – 0.5) and strength (ϵ) of individuals’ prior belief. λ<sub>controllability</sub> and λ<sub>elasticity</sub> range between 0 and 1, with values above 0.5 indicating a bias towards high controllability or elasticity, and values below 0.5 indicating a bias towards low controllability or elasticity. 𝜖<sub>controllability</sub> and 𝜖<sub>elasticity</sub> are positively valued parameters capturing confidence in the bias. Parameter recovery analyses confirmed both good recoverability (see S2 Table) and low confusion between bias direction and strength (𝜖<sub>controllability</sub> → λ<sub>controllability</sub> = −.07, λ<sub>controllability</sub> → 𝜖<sub>controllability</sub> = .16, 𝜖<sub>elasticity</sub> → λ<sub>elasticity</sub> = .15, λ<sub>elasticity</sub> → 𝜖<sub>elasticity</sub> = .04), ensuring that observed biases and their relation to psychopathology do not merely reflect slower learning (Supplementary Figure 4), which can result from changes in bias strength but not direction.”
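
      The distinction between the direction and the strength of a prior can be illustrated with a generic Beta-Bernoulli learner. This is a sketch under stated assumptions, not the model fitted in the study: it assumes a Beta prior parameterized by its mean and concentration, and the function name and all numbers are hypothetical.

```python
def posterior_mean(prior_mean, concentration, successes, trials):
    """Posterior mean of a Beta-Bernoulli learner.

    The prior is Beta(a, b) with a = prior_mean * concentration and
    b = (1 - prior_mean) * concentration. The prior mean sets the
    direction of the bias, and the concentration sets how many
    observations are needed to overcome it.
    """
    a = prior_mean * concentration
    b = (1.0 - prior_mean) * concentration
    return (a + successes) / (a + b + trials)


# Same biased prior mean (0.8) and same evidence (2 successes in 10 trials),
# but different prior concentrations.
weak_prior = posterior_mean(0.8, concentration=2, successes=2, trials=10)
strong_prior = posterior_mean(0.8, concentration=20, successes=2, trials=10)
print(weak_prior, strong_prior)
```

      With the weakly concentrated prior the estimate moves most of the way toward the observed rate of 0.2, whereas with the strongly concentrated prior it stays closer to 0.8. In this simplified setting, concentration governs the speed of learning and the prior mean governs the direction of the bias, which is why the two can in principle be dissociated.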

      We also more precisely articulate the impact of providing participants with three free tickets at their initial visits to each planet.

      Showing that a model parameter correlates with the data it was fit to does not provide any new information, and cannot support claims like "a prior assumption that control is likely available was reflected in a futile investment of resources in uncontrollable environments." To make that claim, one must collect independent measures of the assumption and the investment.

      We apologize if this and related statements seemed to be describing independent findings. They were meant to describe the relationship between model parameters and model-independent measures of task performance. It is inaccurate, though, to say that they provide no new information, since results could have been otherwise. For instance, whether a higher controllability bias maps onto resource misallocation in uncontrollable environments (as we observed) depends on the range of this parameter in our population sample. Had the range been more negative, a higher controllability bias could have instead manifested as optimal allocation in controllable environments. Additionally, these analyses serve two other purposes: as a validity check, confirming that our computational model effectively captured observed individual differences, and as a help for readers to understand what each parameter in our model represents in terms of observable behavior. We now better clarify the descriptive purposes of these regressions (lines 214-220, 231-235): 

      “To clarify how fitted model parameters related to observable behavior, we regressed participants’ opt-in rates and extra ticket purchases on the parameters (Figure 6A) ...”

      “... In sum, the model parameters captured meaningful individual differences in how participants allocated their resources across environments, with the controllability parameter primarily explaining variance in resource allocation in uncontrollable environments, and the elasticity parameter primarily explaining variance in resource allocation in environments where control was inelastic.”

      Did participants always make two attempts when purchasing tickets? This seems to violate the intuitive model, in which you would sometimes succeed on the first jump. If so, why was this choice made? Relatedly, it is not clear to me after a close reading how the outcome of each trial was actually determined.

      We thank the Reviewer for highlighting the need to clarify these aspects of the task in the revised manuscript. 

      When participants purchased two extra tickets, they attempted both jumps, and were never informed about whether either of them succeeded. Instead, after choosing a vehicle and attempting both jumps, participants were notified of where they had arrived. This outcome was determined based on the cumulative probability of either of the two jumps succeeding. Success meant that participants arrived at the destination of their chosen vehicle, whereas failure meant they walked to the nearest location (as determined by where they started from).

      Though it is unintuitive to attempt a second jump before seeing whether the first succeeded, this design choice ensured two key objectives. First, that participants would consistently need to invest not only more money but also more effort and time in planets with high elastic controllability. Second, that the task could potentially generalize to the many real-world situations where the amount of invested effort has to be determined prior to seeing any outcome, for instance, preparing for an exam or a job interview. We now explicitly state these details when describing the experimental task (lines 393-395):

      “When participants purchased multiple tickets, they made all boarding attempts in sequence without intermediate feedback, only learning whether they successfully boarded upon reaching their final destination. This served two purposes. First, to ensure that participants would consistently need to invest not only more money but also more effort and time in planets with high elastic controllability. Second, to ensure that results could potentially generalize to the many real-world situations where the amount of invested effort has to be determined prior to seeing any outcome (e.g., preparing for an exam or a job interview).”
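
      The outcome rule described above can be sketched as follows. This is a hypothetical illustration rather than the task code: it assumes that boarding attempts succeed independently with a common per-attempt probability, and the function names and parameter values are invented.

```python
import random


def boarding_probability(p_single, n_attempts):
    """Probability that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p_single) ** n_attempts


def resolve_trip(vehicle_destination, walking_destination,
                 p_single, n_attempts, rng=random):
    """Return the arrival location without revealing per-attempt outcomes.

    The participant reaches the chosen vehicle's destination if any
    attempt succeeded, and otherwise walks to the nearest location.
    """
    boarded = rng.random() < boarding_probability(p_single, n_attempts)
    return vehicle_destination if boarded else walking_destination


# Hypothetical per-attempt success probability of 0.5.
print(boarding_probability(0.5, 1))
print(boarding_probability(0.5, 2))
print(resolve_trip("treasure", "nearest", p_single=0.5, n_attempts=2))
```

      Drawing a single outcome from the cumulative probability, rather than simulating each attempt in turn, mirrors the design in which participants receive feedback only upon arrival.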

      It should be noted that the model is heuristically defined and does not reflect Bayesian updating. In particular, it overestimates control by not using losses with less than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). I wonder if the forced three-ticket trials in the task might be historically related to this modeling choice.

      We apologize for not making this clear, but in fact losing with less than 3 tickets does reduce the model’s estimate of available control. It does so by increasing the elasticity estimates (a<sub>elastic≥1</sub>, a<sub>elastic2</sub> parameters), signifying that more tickets are needed to obtain the maximum available level of control, thereby reducing the average controllability estimate across ticket investment options. We note this now in the presentation of the computational model (caption Figure 4):

      “A failure to board does not change estimated maximum controllability, but rather suggests that 1 ticket might not suffice to obtain control (a<sub>elastic≥1</sub> + 1; light green diminished). As a result, the model’s estimate of average controllability across ticket options is reduced.”

      It would be interesting to further develop the model such that losing with less than 3 tickets would also impact inferences concerning the maximum available control, depending on present beliefs concerning elasticity, but the forced three-ticket purchases already expose participants to the maximum available control, and thus, the present data may not be best suited to test such a model. These trials were implemented to minimize individual differences concerning inferences of maximum available control, thereby focusing differences on elasticity inferences. We now explicitly address these considerations in the revised discussion (lines 326-333) with the following: 

      “Future research could explore alternative models for implementing elasticity inference that extend beyond our current paradigm. First, further investigation is warranted concerning how uncertainty about controllability and its elasticity interact. In the present study, we minimized individual differences in the estimation of maximum available control by providing participants with three free tickets at their initial visits to each planet. We made this design choice to isolate differences in the estimation of elasticity, as opposed to maximum controllability. To study how these two types of estimations interact, future work could benefit from modifying this aspect of our experimental design.”

      Furthermore, we have now tested a Bayesian model suggested by Reviewer 1, but we found that this model fitted participants’ choices worse (see details in the response to Reviewer 1’s comments). 

      Recommendations for the authors:

      Reviewer 1 (Recommendations for the authors):

      (1) In the introduction, the definition of controllability and elasticity, and the scope of "resources" investigated in the current study were unclear. If I understand correctly, controllability is defined as "the degree to which actions influence the probability of obtaining a reward", and elasticity is defined as the change in controllability based on invested resources. This would define the controllability of the environment and the elasticity of controllability of the environment. However, phrases such as "elastic environment" seem to imply that elasticity can directly attach to an environment, instead of attaching to the controllability of the environment.

      We thank the Reviewer for highlighting the need to clarify our conceptualization of elasticity and controllability. We now provide formal definitions of both, with controllability defined as the fraction of controllably achievable reward[1], and elasticity as the reduction in uncertainty about controllability due to knowing the amount of resources the agent is willing and able to invest (see further details in the response to Reviewer 3’s public comments). In the revised manuscript, we now use more precise language to clarify that elasticity is a property of controllability, not of environments themselves. In addition, we now clarify that the current study manipulated monetary, attentional effort, and time costs together (see further details in the response to Reviewer 1’s public comments).   

      (2) Some of the real-world examples were confusing. For example, the authors mention that investing additional effort due to the belief that this leads to better outcomes in OCD patients is overestimated elasticity, but exercising due to the belief that this can make one taller is overestimated controllability. What's the distinction between the examples? The example of the chess expert practicing to win against a novice, because the amount of effort they invest would not change their level of control over the outcome is also unclear. If the control over the outcome depends on their skill set, wouldn't practicing influence the control over the outcome? In the case of the meeting time example, wouldn't the bus routes differ in their time investments even though they are the same price? In addition to focusing the introductory examples around monetary resources, I would also generally recommend tightening the link between those examples and the experimental task.

      We thank the Reviewer for highlighting the need to clarify the examples used to illustrate elasticity and controllability. We have now revised these examples to more clearly distinguish between the concepts and to strengthen their connection to the experimental task.

      Regarding the OCD example, the possibility that OCD patients overestimate elasticity comes from research suggesting they experience low perceived control but nevertheless engage in excessive resource investment[2], reflecting a belief that only through repeated and intense effort can they achieve sufficient control over outcomes. As an example, consider an OCD patient investing unnecessary effort in repeatedly locking their door. This behavior cannot result from an overestimation of controllability because controllability truly is close to maximal. It also cannot result from an underestimation of the maximum attainable control, since in that case investing more effort is futile. Such behavior, however, can result from an overestimation of the degree to which controllability requires effort (i.e., overestimation of elasticity).

      Similarly, with regard to the chess expert, we intended to illustrate a situation where, given their current level, the chess expert is already virtually guaranteed to win, such that additional practice time does not improve their chances. Conversely, the height example illustrates overestimated controllability because the outcome (becoming taller through exercise) is in fact not amenable to control through any amount of resource investment.

      Finally, the meeting time example was meant to illustrate that if the desired outcome is reaching a meeting in time, then different bus routes that cost the same provide equal control over this outcome to anyone who can afford the basic fare. This demonstrates inelastic controllability with respect to money, as spending more on transportation doesn't increase the probability of reaching the meeting on time. The Reviewer correctly notes that time investment may differ between routes. However, investing more time does not improve the expected outcome. This illustrates that inelastic controllability does not preclude agents from investing more resources, but such investment does not increase the fraction of controllably achievable reward (i.e., the probability of reaching the meeting in time).

      In the revised manuscript, we’ve refined each of the above examples to better clarify the specific resources being considered, the outcomes they influence, and their precise relationship to both elasticity and controllability: 

      OCD (lines 40-43): Conversely, the repetitive and unusual amount of effort invested by people with obsessive-compulsive disorder in attempts to exert control[23,24] could indicate an overestimation of elasticity, that is, a belief that adequate control can only be achieved through excessive and repeated resource investment[25].  

      Chess expert (lines 54-57): Alternatively, they may do so because they overestimate the elasticity of control – for example, a chess expert practicing unnecessarily hard to win against a novice, when their existing skill level already ensures control over the match's outcome.

      Height (lines 53-54): A given individual, for instance, may tend to overinvest resources because they overestimate controllability – for example, exercising due to a misguided belief that this can make one taller, when in fact height cannot be controlled.

      Meeting time (lines 26-28): Choosing between bus routes affords equal control over commute time to anyone who can afford the basic fare (Figure 1).

      Methods

      (1) In the elastic controllability model definition, controllability is defined as "the belief that boarding is possible" (with any number of tickets). The definition again is different from in the task description where controllability is defined as "the probability of the chosen vehicle stopping at the platform if purchasing a single ticket."

      We clarify that "the probability of the chosen vehicle stopping at the platform if purchasing a single ticket" is our definition for inelastic controllability, as opposed to overall/maximum controllability, as stated here (lines 101-103):

      "We defined inelastic controllability as the probability that even one ticket would lead to successfully boarding the vehicle, and elastic controllability as the degree to which two extra tickets would increase that probability."

      Overall controllability is the summation of the two. This summation is referred to in the elastic controllability model definition as "the belief that boarding is possible". We now clarify this in the caption to Figure 4:

      Elastic Controllability model: Represents beliefs about maximum controllability (black outline) and the degree to which one or two extra tickets are necessary to obtain it. These beliefs are used to calculate the expected control when purchasing 1 ticket (inelastic controllability) and the additional control afforded by 2 and 3 tickets (elastic controllability).    

      We also clarify this in the methods when describing the parameterization of the model (lines 529-531): 

      The expected value of one beta distribution (defined by a<sub>control</sub>, b<sub>control</sub>) represents the belief that boarding is possible (controllability) with any number of tickets.
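
      As an illustration (not taken from the manuscript's code), the decomposition described above can be sketched in a few lines. The boarding probabilities and pseudo-counts are made-up values, and the function names are hypothetical; the only assumptions carried over from the text are that overall controllability is the sum of the inelastic and elastic parts, and that a belief is read out as the mean of a beta distribution.

```python
def beta_mean(a, b):
    """Expected value of a Beta(a, b) distribution."""
    return a / (a + b)


def decompose_controllability(p_board_1_ticket, p_board_3_tickets):
    """Split overall controllability into inelastic and elastic parts.

    inelastic: probability that even one ticket leads to boarding
    elastic:   extra boarding probability gained from two extra tickets
    overall:   their sum (boarding probability with maximum investment)
    """
    inelastic = p_board_1_ticket
    elastic = p_board_3_tickets - p_board_1_ticket
    return {"inelastic": inelastic, "elastic": elastic,
            "overall": inelastic + elastic}


# Hypothetical planets (illustrative numbers only, not task parameters).
high_elastic = decompose_controllability(0.2, 0.8)
high_inelastic = decompose_controllability(0.8, 0.8)

# Belief that boarding is possible at all, from hypothetical pseudo-counts.
belief_control = beta_mean(a=3.0, b=1.0)
```

      In this toy case both planets have the same overall controllability, but only the first requires extra tickets to reach it, which is the distinction the two terms are meant to capture.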

      (2) The free parameter K is confusing. What is the psychological meaning of this parameter? Is it there just to account for the fact that failure with 3 tickets made participants favor 3 tickets or is there meaning attached to including this parameter?

      This parameter captures how participants update their beliefs about resource requirements after failing to board with maximum resource investment. Our psychological interpretation is that participants who experience failure despite maximum investment (3 tickets) prioritize resolving uncertainty about whether control is fundamentally possible (before exploring whether control is elastic), which can only be determined by continuing to invest maximum resources. 

      We now clarify this in the methods (lines 555-559):

      To account for our finding that failure with 3 tickets made participants favor 3, over 1 and 2, tickets, we introduced a modified elastic controllability* model, wherein purchasing extra tickets is also favored upon receiving evidence of low controllability (loss with 3 tickets). This effect was modulated by a free parameter 𝜅 which reflects a tendency to prioritize resolving uncertainty about whether control is at all possible by investing maximum resources.

      This interpretation is supported by our analysis of 3-ticket choice trajectories (Supplementary Figure 2 presented in response to Reviewer 2). As shown in the figure, participants who win less than 50% of their 3-ticket attempts persistently purchase 3 tickets over the first 10 trials, despite frequent failures. This persistence gradually declines as participants accumulate evidence about their limited control, corresponding with an increase in opt-out rates.

      (3) Some additional details about the task design would be helpful. It seems that participants first completed 90 practice trials and were informed of the planet type every 15 trials (6 times during practice). What message is given to the participants about the planets? Did the authors analyze the last 15 trials of each condition in the regression analysis, and all 30 trials in the modeling analysis? How does the computational model (especially the prior beliefs parameters) reset when the planet changes? How do points accumulate over the session and/or are participants motivated to budget the points? Is it possible for participants to accumulate many points and then switch to a heuristic of purchasing 3 tickets on each trial?

      We apologize for not previously clarifying these details of the experimental design.

      During practice blocks, participants received explicit feedback about each planet's controllability characteristics, to help them understand when additional resources would or would not improve their boarding success. For high inelastic controllability planets, the message read: "Your ride actually would stop for you with 1 ticket! So purchasing extra tickets, since they do cost money, is a WASTE." For low controllability planets: "Doesn't seem like the vehicle stops for you nor does purchasing extra tickets help." Lastly, for high elastic controllability planets: "Hopefully by now it's clear that only by purchasing 3 tickets (LOADING AREA) are you consistently successful in catching your ride." We now include these messages in the methods section describing the task (lines 453-458).

      We indeed analyzed the last 15 trials of each condition in the regression analysis, and all 30 trials in the modeling analysis. Whereas the modeling attempted to explain participants’ learning process, the regression focused on explaining the resultant behavior, which, in our pilot data (N = 19), manifested fairly stably in the last 15 trials (ticket-choice SD = 0.33, compared to 0.63 in the first 15 trials). The former is already stated in the text (lines 409-415), and we now also clarify the latter when discussing the model fitting procedure (line 695):

      Reinforcement-learning models were fitted to all choices made by participants via an expectation maximization approach used in previous work.

      The computational model was initialized with the same prior parameters for all planets. When a participant moved to a new planet, the model's beliefs were reset to these prior values, capturing how participants would approach each new environment with their characteristic expectations about controllability and elasticity. We now clarify this in the methods (line 628): 

      For each new planet participants encountered, these parameters were used to initialize the beta distributions representing participants’ beliefs
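
      The reset described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name, the dictionary keys, and the prior pseudo-counts are all hypothetical, and in the fitted model the priors would be participant-specific free parameters.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """A belief stored as the two pseudo-counts of a beta distribution."""
    a: float
    b: float

    def mean(self):
        return self.a / (self.a + self.b)


# Hypothetical prior pseudo-counts (assumed values for illustration).
PRIORS = {"control": (2.0, 1.0), "elasticity": (1.0, 2.0)}


def new_planet_beliefs(priors=PRIORS):
    """Return fresh beliefs, initialized from the same priors for every planet."""
    return {name: BetaBelief(a, b) for name, (a, b) in priors.items()}


beliefs = new_planet_beliefs()
beliefs["control"].b += 1.0      # evidence against control gathered on planet 1
beliefs = new_planet_beliefs()   # moving to planet 2 discards that evidence
```

      Because fresh objects are created for each planet, nothing learned on one planet leaks into the next; only the priors persist.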

      Points accumulated across all planets throughout the session, with participants explicitly motivated to maximize their total points as this directly determined their monetary bonus payment. To address the Reviewer's question about changes in ticket purchasing behavior, we conducted a mixed probit regression examining whether accumulated points influenced participants’ decisions to purchase extra tickets. We did not find such an effect (𝛽<sub>coins accumulated</sub> = .01, 𝑝 = .87), indicating that participants did not switch to simple heuristic strategies after accumulating enough coins. We now report this analysis in the methods (lines 421-427):

      Points accumulated across all planets throughout the session, with participants explicitly motivated to maximize their total points as this directly determined their monetary bonus payment. To ensure that accumulated gains did not lead participants to adopt a simple heuristic strategy of always purchasing 3 tickets, we conducted a mixed probit regression examining whether the number of accumulated coins influenced participants' decisions to purchase extra tickets. We did not find such an effect (𝛽<sub>coins accumulated</sub> = .01, 𝑝 = .87), ruling out the potential strategy shift.

      Following the modeling section, it may be helpful to have a table of the fitted models, the parameters of each model, and the meaning/interpretation of each parameter.

      We thank the Reviewer for this suggestion. We have now added a table (Supplementary Table 3) that summarizes all fitted models, their parameters, and the meaning/interpretation of each parameter.

      (1) The conclusions from regressing the task choices (opt-in rates and ticket purchases) on the fitted parameters seem confusing given that the model parameters were fitted on the task behavior, and the relationship between these variables seems circular. For example, the authors found that preferences for purchasing 2 or 3 tickets (a2 and a3; computational parameters) were associated with purchasing more tickets (task behavior). But wouldn't this type of task behavior be what the parameters are explaining? It's not clear whether these correlation analyses are about how individuals allocate their resources or about the validity check of the parameters. Perhaps analyses on individual deviation from the optimal strategy and parameter associations with such deviation are better suited for the questions about whether individual biases lead to resource misallocation.

      We thank the Reviewer for highlighting this seeming confusion. These regressions were meant to describe the relationship between model parameters and model-independent measures of task performance. This serves three purposes. First, a validity check, confirming that our computational model effectively captured observed individual differences. Second, to help readers understand what each parameter in our model represents in terms of observable behavior. Third, to examine in greater detail how parameter values specifically mapped onto observable behavior. For instance, whether a higher controllability bias maps onto resource misallocation in uncontrollable environments (as we observed) depends on the range of this parameter in our population sample. Had the range been more negative, a higher controllability bias could have instead manifested as optimal allocation in controllable environments. We now better clarify the descriptive purposes of these regressions (lines 214-220, 231-235): 

      To clarify how fitted model parameters related to observable behavior, we regressed participants’ opt-in rates and extra ticket purchases on the parameters (Figure 6A) ... 

      ... In sum, the model parameters captured meaningful individual differences in how participants allocated their resources across environments, with the controllability parameter primarily explaining variance in resource allocation in uncontrollable environments, and the elasticity parameter primarily explaining variance in resource allocation in environments where control was inelastic.  

      Regarding the suggestion to analyze deviation from optimal strategy, this corresponds with our present approach in that opting in is always optimal in high controllability environments and always non-optimal in low controllability environments, and similarly, purchasing extra tickets is always optimal in elastic controllability environments and always non-optimal elsewhere. Thus, positive or negative coefficients can be directly translated into closer or farther from optimal, depending on the planet type, as indicated in the figure by color. We now clarify this mapping in the figure legend:

      (2) Minor: The legend of Figure 6A is difficult to read. It might be helpful to label the colors as their planet types (low controllability, high elastic controllability, high inelastic controllability).

      We thank the Reviewer for this helpful suggestion. We have revised the figure accordingly.

      Reviewer 2 (Recommendations for the authors):

      As noted above, I'm not sure I agree with (or perhaps don't fully understand) the claims the authors make about the distinctions between their "elastic" and "inelastic" experimental conditions. Let's take the travel example from Figure 1 - is this not just an example of “hierarchical” controllability calculations? In other words, in the elastic example, my choice is between going one speed or another (i.e., exerting more or less effort), and in the inelastic example, my choice is first, which route to take (also a consideration of speed, but with lower effort costs than the elastic scenario), and second, an estimate of the time cost (not within my direct control, but could be estimated). In the elastic scenarios, additional value considerations vary between options, and in others (inelastic), they don't, with control over the first choice point (which bus route to choose, or which lunch option to take), but not over the price. I wonder if the paper would be better framed (or emphasized) as exploring the influences of effort and related "costs" of control. There isn't really such a thing as controllability that does not have any costs associated with it (whether that be action costs, effort, money, or simply scenario complexity).

      We thank the Reviewer for highlighting the need to clarify our distinction between elastic and inelastic controllability as it manifests in our examples. We first clarify that elasticity concerns how controllability varies with resources, not costs. Though resource investment and costs are often tightly linked, that is not always the case, especially not when comparing between agents. For example, it may be equally difficult (i.e., costly) for a professional biker to pedal at a high speed as it is for a novice to pedal at a medium speed, simply because the biker’s muscles are better trained. This resource advantage increases the biker’s control over his commute time without incurring additional costs as compared to the novice. We now clarify this distinction in the text by revising our example to (lines 9-11): 

      “For example, the control a biker has over their commute time depends on the power they are willing and able to invest in pedaling. In this respect, a highly trained biker would typically have more control than a novice.”

      Second, whereas in our examples additional value considerations indeed vary in elastic environments, that does not have to be the case, and indeed, that is not the case in our experiment. In our experimental task, participants are given the option to purchase as many tickets as they wish regardless of whether they are in an elastic or an inelastic environment.  

      We agree that elastic environments often raise considerations regarding the cost of control (for instance, whether it is worth it to pedal harder to get to the destination in time). To consider this cost against potential payoffs, however, the agent must first determine what are the potential payoffs – that is, it must determine the degree to which controllability is elastic to invested resources. It is this antecedent inference that our experiment studies. We uniquely study this inference using environments where control may not only be low or high, but also, where high control may or may not require additional resource investments. We now clarify this point in Figure 1’s caption:

      “In all situations, agents must infer the degree to which controllability is elastic to be able to determine whether the potential gains in control outweigh the costs of investing additional resources (e.g., physical exertion, money spent, time invested).”

      For a formal definition of the elasticity of control, see our response to Reviewer 3’s public comments. 

      Relatedly, another issue I have with the distinctions between inelastic/elastic is that a high/elastic condition has inherently ‘more’ controllability than a high/inelastic condition, no matter what. For example, in the lunch option scenario, I always have more control in the elastic situation because I have two opportunities to exert choice (food option ‘and’ cost). Is there really a significant difference, then, between calling these distinctions "elastic/inelastic" vs. "higher/lower controllability?" Not that it's uninteresting to test behavioral differences between these two types of scenarios, just that it seems unnecessary to refer to these as conceptually distinct.

      As noted in the response above, control over costs may be higher in elastic environments, but it does not have to be so, as exemplified by the elastic environments in our experimental task. For a fuller explanation of why higher elasticity does not imply higher controllability, see our response to Reviewer 2’s public comments. 

      I also wonder whether it's actually the case that people purchased more tickets in the high control elastic condition simply because this is the optimal solution to achieve the desired outcome, not due to a preference for elastic control. To test this, you would need to include a condition in which people opted to spend more money/effort to have high elastic control in an instance where it was not beneficial to do so.

      We appreciate the Reviewer's question about potential preferences for elastic control. We first clarify that participants did not choose which environment type they encountered, so if control was low or inelastic, investing extra resources did not give them more control. Furthermore, our results show that the average participant did not prefer a priori to purchase more tickets. This is evidenced by participants’ successful adaptation to inelastic environments wherein they purchased significantly fewer tickets (see Figure 2B and 2C), and by participants’ parameter fits, which reveal an a priori bias to assume that controllability is inelastic (𝜆<sub>elasticity</sub> = .16 ± .19), as well as a fixed preference against purchasing the full number of tickets (𝛼<sub>3</sub> = −.74 ± .37).

      We now clarify these findings by including a table of all parameter fits in the revised manuscript (see response to Reviewer 1). 

      It was interesting that the authors found that failure with 3 tickets made people more likely to continue to try 3 tickets, however, there is another possible interpretation. Could it be that this is simply evidence of a general controllability bias, where people just think that it is expected that you should be able to exert more money/effort/time to gain control, and if this initially fails, it is an unusual outcome, and they should try again? Did you look at this trajectory over time? i.e., whether repeated tries with 3 tickets immediately followed a failure with 3 tickets? Relatedly, does the perseveration parameter from the model also correlate with psychopathology?

      We thank the Reviewer for this suggestion. Our model accounts for a general controllability bias through the 𝜆<sub>controllability</sub> parameter, which represents a prior belief that planets are controllable. It also accounts, through the 𝜆<sub>elasticity</sub> parameter, for the prior belief that you should be able to exert more money/effort/time to gain control. Now, our addition of 𝜅 to the model captures the observation that failures with 3 tickets made participants more likely to purchase 3 tickets when they opted in. If this observation were due to participants not accepting that the planet is not controllable, then we would expect the increase in 3-ticket purchases when opting in to be coupled with a diminished reduction in opting in. To determine whether this was the case, we tested a variant of our model where 𝜅 not only increases the elasticity estimate but also reduces the controllability update (using 𝛽<sub>control</sub> + (1 − 𝜅) instead of 𝛽<sub>control</sub> + 1) after failures with 3 tickets. However, implementing this coupling diminished the model's fit to the data, as compared to allowing both effects to occur independently, indicating that the increase in 3-ticket purchases upon failing with 3 tickets did not result from participants not accepting that controllability is in fact low. Thus, we maintain our original interpretation that failure with 3 tickets increases uncertainty about whether control is possible at all, leading participants who continue to opt in to invest maximum resources to resolve this uncertainty. We now report these results in the revised text (lines 662-674).
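
      The difference between the two update rules compared above can be made concrete with a small sketch. It assumes, for illustration only, that 𝛽<sub>control</sub> acts as the "failure" pseudo-count of the controllability belief; the function name and all numeric values are hypothetical.

```python
def update_beta_control(beta_control, kappa, coupled):
    """Update the failure pseudo-count after a failure with 3 tickets.

    coupled=False: standard update, beta_control + 1
    coupled=True:  tested variant, beta_control + (1 - kappa)
    """
    increment = (1.0 - kappa) if coupled else 1.0
    return beta_control + increment


# Hypothetical values chosen only to show the direction of the effect.
alpha_control, beta_control, kappa = 2.0, 2.0, 0.5

standard_count = update_beta_control(beta_control, kappa, coupled=False)
coupled_count = update_beta_control(beta_control, kappa, coupled=True)

# Resulting belief that control is possible (mean of the beta distribution).
belief_standard = alpha_control / (alpha_control + standard_count)
belief_coupled = alpha_control / (alpha_control + coupled_count)
```

      Under these assumptions the coupled variant lowers the controllability belief less after each 3-ticket failure, which is the "diminished reduction in opting in" that the variant was designed to test.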

      The trajectory over time is consistent with this interpretation (new Supplementary Figure 2 shown below). Specifically, we see that under low controllability (0-50%, orange line), over the first 10 trials participants show higher persistence with 3 tickets after failing, despite experiencing frequent failures, but also a higher opt-out probability. As these participants accumulate evidence about their limited control, we observe a gradual decrease in 3-ticket selections that corresponds directly with a further increase in opting out (right panel, orange line). This pattern qualitatively corresponds with the behavior of our computational model (empty circles). We present the results of the new analysis in lines 180-190:

      “In fact, failure with 3 tickets even made participants favor 3, over 1 and 2, tickets. This favoring of 3 tickets continued until participants accumulated sufficient evidence about their limited control to opt out (Supplementary Figure 2). Presumably, the initial failures with 3 tickets resulted in an increased uncertainty about whether it is at all possible to control one’s destination. Consequently, participants who nevertheless opted in invested maximum resources to resolve this uncertainty before exploring whether control is elastic.”

      Regarding correlations between the perseveration parameter and psychopathology, we have now conducted a comprehensive exploratory analysis of all two-way relationships between parameters and psychopathology scores (new Supplementary Figure 3). Whereas we observed modest correlations with social anxiety (LSAS, r=-0.13), cyclothymic temperament (r=0.13), and alcohol use (AUDIT, r=-0.13), none reached statistical significance after FDR correction for multiple comparisons.
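
      For readers unfamiliar with FDR correction, the sketch below implements the Benjamini-Hochberg step-up procedure. The response does not state which FDR procedure was used, so this choice is an assumption, and the p-values are invented for illustration rather than taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values: two are below .05 uncorrected but do not survive.
rejected = benjamini_hochberg([0.001, 0.04, 0.03, 0.20])
# rejected == [True, False, False, False]
```

      The example shows how correlations that look nominally significant on their own can fail to survive once the number of comparisons is taken into account.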

      Regarding the modeling, I also wondered whether a better alternative model than the controllability model would be a simple associative learning model, where a number of tickets are mapped to outcomes, regardless of elasticity.

      We thank the Reviewer for suggesting this alternative model. Following this suggestion, we implemented a simple associative learning model that directly maps each option to its expected value, without a latent representation of elasticity or controllability. Unlike our controllability model which learns the probability of reaching the goal state for each ticket quantity, this associative learning model simply updates option values based on reward prediction errors.

      We found that this simple Q-learning model performed worse than even the controllability model at explaining participant data (log Bayes Factor ≥ 1854 on the combined datasets), further supporting our hypothesis that participants are learning latent estimates of control rather than simply associating options with outcomes. We present the results of this analysis in lines 662-664:

      We implemented a simple Q-learning model that directly maps ticket quantities to expected values based on reward prediction errors, without representing latent controllability. This associative model performed substantially worse than even our simple controllability model (log Bayes Factor ≥ 1854 on the combined datasets).
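
      A generic version of such an associative model can be sketched as follows. This is a textbook delta-rule learner with a softmax choice rule, written under assumptions: the option coding (index 0 for opting out, 1 to 3 for ticket quantities), the learning rate, and the inverse temperature are hypothetical and are not the authors' fitted specification.

```python
import math


def softmax(values, inverse_temperature):
    """Convert option values into choice probabilities."""
    top = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(inverse_temperature * (v - top)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def q_update(q_values, option, reward, learning_rate):
    """Delta-rule update: move the chosen option's value toward the reward."""
    prediction_error = reward - q_values[option]
    q_values[option] += learning_rate * prediction_error
    return prediction_error


# Index 0 = opt out, 1-3 = number of tickets purchased (assumed coding).
q_values = [0.0, 0.0, 0.0, 0.0]
error = q_update(q_values, option=3, reward=1.0, learning_rate=0.1)
choice_probabilities = softmax(q_values, inverse_temperature=5.0)
```

      Note that each option's value is learned in isolation here: a win with 3 tickets says nothing about 1 or 2 tickets, whereas a model with latent controllability and elasticity generalizes across ticket quantities.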

      Reviewer 3 (Recommendations for the authors):

      Please make all materials available, including code (analysis and experiment) and data. Please also provide a link to the task or a video of a few trials of the main task.

      We thank the reviewer for this important suggestion. All requested materials are now available at https://github.com/lsolomyak/human_inference_of_elastic_control. This includes all experiment code, analysis code, processed data, and a video showing multiple sample trials of the main task.

      References

      (1)  Huys, Q. J. M., & Dayan, P. (2009). A Bayesian formulation of behavioral control. Cognition, 113(3), 314–328.

      (2)  Ligneul, R. (2021). Prediction or causation? Towards a redefinition of task controllability. Trends in Cognitive Sciences, 25(6), 431–433.

      (3)  Mistry, P., & Liljeholm, M. (2016). Instrumental divergence and the value of control. Scientific Reports, 6, 36295.

      (4)  Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145–151.

      (5)  Cohen RM, Weingartner H, Smallberg SA, Pickar D, Murphy DL. Effort and cognition in depression. Arch Gen Psychiatry. 1982 May;39(5):593-7. doi: 10.1001/archpsyc.1982.04290050061012. PMID: 7092490.

      (6)  Bi R, Dong W, Zheng Z, Li S, Zhang D. Altered motivation of effortful decision-making for self and others in subthreshold depression. Depress Anxiety. 2022 Aug;39(8-9):633-645. doi: 10.1002/da.23267. Epub 2022 Jun 3. PMID: 35657301; PMCID: PMC9543190.

      (7)  Tapal, A., Oren, E., Dar, R., & Eitam, B. (2017). The Sense of Agency Scale: A measure of consciously perceived control over one's mind, body, and the immediate environment. Frontiers in Psychology, 8, 1552.

    1. Author response:

      The following is the authors’ response to the original reviews

      Summary of our revisions

      (1) We have explained why the untrained RNN with only readout (value-weight) learning could not learn the simple task well. We trained the models continuously across trials with random inter-trial intervals, rather than separately for each episodic trial. It was therefore not trivial for the models to recognize that cue presentation in different trials constitutes the same single state, since the activity of the untrained RNN upon cue presentation should differ from trial to trial (Line 177-185).

      (2) We have shown that dimensionality was higher in the value-RNNs than in the untrained RNN (Fig. 2K,6H).

      (3) We have shown that even when a distractor cue was introduced, the value-RNNs could learn the task (Fig. 10).

      (4) We have shown that extended value-RNNs incorporating excitatory and inhibitory units and conforming to Dale's law could still learn the tasks (Fig. 9,10-right column).

      (5) In the original manuscript, the non-negatively constrained value-RNN showed loose alignment of value-weight and random feedback from the beginning but did not show further alignment over trials. We have clarified the reason for this and found a way to make further alignment occur, namely introducing a slight decay (forgetting) (Fig. 8E,F).

      (6) We have shown that the value-RNNs could learn the tasks with longer cue-reward delay (Fig. 2M,6J) or action selection (Fig. 11), and found cases where random feedback performed worse than symmetric feedback.

      (7) We compared our value-RNNs with e-prop (Bellec et al., 2020, Nat Commun). While e-prop incorporates the effects of changes in RNN weights across distant times through an "eligibility trace", our value-RNNs do not. We consider that our models can still learn the tasks with cue-reward delay because they use the TD error, and TD learning itself, even TD(0) without an eligibility trace, is a solution for temporal credit assignment. In fact, Bellec et al. also examined TD error-based e-prop, but for that setup they showed results with symmetric feedback, not with random feedback (their Fig. 4,5), whereas for another setup, reward-based e-prop without TD error, they showed results with random feedback (their SuppFig. 5). We have noted these points in Line 695-711 (and also partly in Line 96-99).

      (8) In the original manuscript, we emphasized only the spatial locality (random rather than symmetric feedback) of our learning rule. But we have now also emphasized the temporal locality (online learning) as it is also crucial for bio-plausibility and critically different from the original value-RNN with BPTT. We also changed the title.

      (9) We have realized that our estimation of true state values was invalid (as detailed in page 34 of this document). Effects of this error on performance comparisons were small, but we apologize for this error.
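
      The point in item (7), that TD(0) alone can bridge a cue-reward delay, can be illustrated with a deliberately simplified tabular sketch. It is not the value-RNN: states are just time steps since the cue, trials are episodic rather than continuous, and the function name and parameter values are assumptions made for illustration.

```python
def td0_cue_reward(n_steps=4, n_trials=300, alpha=0.1, gamma=0.9):
    """Tabular TD(0) on a chain: cue at step 0, reward on leaving the last step."""
    values = [0.0] * (n_steps + 1)  # final entry is the terminal state (value 0)
    for _ in range(n_trials):
        for t in range(n_steps):
            reward = 1.0 if t == n_steps - 1 else 0.0
            # One-step TD error; no eligibility trace is used.
            td_error = reward + gamma * values[t + 1] - values[t]
            values[t] += alpha * td_error
    return values[:n_steps]


learned_values = td0_cue_reward()
# With enough trials, learned_values[t] approaches gamma ** (n_steps - 1 - t).
```

      Each update only looks one step ahead, yet over repeated trials the value estimate propagates backward from the reward to the cue, which is the sense in which TD learning handles temporal credit assignment without an eligibility trace.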

      Reviewer #1 (Public review):

      Summary:

      Can a plastic RNN serve as a basis function for learning to estimate value? In previous work this was shown to be the case, with a similar architecture to that proposed here. The learning rule in previous work was back-prop with an objective function that was the TD error function (delta) squared. Such a learning rule is non-local as the changes in weights within the RNN, and from inputs to the RNN, depend on the weights from the RNN to the output, which estimates value. This is non-local, and in addition, these weights themselves change over learning. The main idea in this paper is to examine if replacing the values of these non-local changing weights, used for credit assignment, with random fixed weights can still produce similar results to those obtained with complete bp. This random feedback approach is motivated by a similar approach used for deep feed-forward neural networks.

      This work shows that this random feedback in credit assignment performs well but is not as well as the precise gradient-based approach. When more constraints due to biological plausibility are imposed performance degrades. These results are not surprising given previous results on random feedback. This work is incomplete because the delay times used were only a few time steps, and it is not clear how well random feedback would operate with longer delays. Additionally, the examples simulated with a single cue and a single reward are overly simplistic and the field should move beyond these exceptionally simple examples.

      Strengths:

      • The authors show that random feedback can approximate well a model trained with detailed credit assignment.

      • The authors simulate several experiments including some with probabilistic reward schedules and show results similar to those obtained with detailed credit assignments as well as in experiments.

      • The paper examines the impact of more biologically realistic learning rules and the results are still quite similar to the detailed back-prop model.

      Weaknesses:

      *please note that we numbered your public review comments and recommendations for the authors as Pub1 and Rec1 etc so that we can refer to them in our replies to other comments.

      Pub1. The authors also show that an untrained RNN does not perform as well as the trained RNN. However, they never explain what they mean by an untrained RNN. It should be clearly explained.

      These results are actually surprising. An untrained RNN with enough units and sufficiently large variance of recurrent weights can have high dimensionality and generate a complete or nearly complete basis, though not orthonormal (e.g., Rajan&Abbott 2006). It should be possible to use such a basis to learn this simple classical conditioning paradigm. It would be useful to measure the dimensionality of network dynamics, in both trained and untrained RNNs.

      We have added an explanation of untrained RNN in Line 144-147:

      “As a negative control, we also conducted simulations in which these connections were not updated from initial values, referring to as the case with "untrained (fixed) RNN". Notably, the value weights w (i.e., connection weights from the RNN to the striatal value unit) were still trained in the models with untrained RNN.”

      We have also analyzed the dimensionality of network dynamics by calculating the contribution ratios of each principal component of the trajectory of RNN activities. This revealed that the contribution ratios of later principal components were smaller in the cases with untrained RNN than in the cases with trained value RNN. We have added these results in Fig. 2K and Line 210-220 (for our original models without non-negative constraint):

      “In order to examine the dimensionality of RNN dynamics, we conducted principal component analysis (PCA) of the time series (for 1000 trials) of RNN activities and calculated the contribution ratios of PCs in the cases of oVRNNbp, oVRNNrf, and untrained RNN with 20 RNN units. Figure 2K shows a log of contribution ratios of 20 PCs in each case. Compared with the case of untrained RNN, in oVRNNbp and oVRNNrf, initial component(s) had smaller contributions (PC1 (t-test p = 0.00018 in oVRNNbp; p = 0.0058 in oVRNNrf) and PC2 (p = 0.080 in oVRNNbp; p = 0.0026 in oVRNNrf)) while later components had larger contributions (PC3~10,15~20 p < 0.041 in oVRNNbp; PC5~20 p < 0.0017 in oVRNNrf) on average, and this is considered to underlie their superior learning performance. We noticed that late components had larger contributions in oVRNNrf than in oVRNNbp, although these two models with 20 RNN units were comparable in terms of cue~reward state values (Fig. 2J-left).”

      and Fig. 6H and Line 412-416 (for our extended models with non-negative constraint):

      “Figure 6H shows contribution ratios of PCs of the time series of RNN activities in each model with 20 RNN units. Compared with the cases with naive/shuffled untrained RNN, in oVRNNbp-rev and oVRNNrf-bio, later components had relatively high contributions (PC5~20 p < 1.4×10<sup>−6</sup> (t-test vs naive) or < 0.014 (vs shuffled) in oVRNNbp-rev; PC6~20 p < 2.0×10<sup>−7</sup> (vs naive) or PC7~20 p < 5.9×10<sup>−14</sup> (vs shuffled) in oVRNNrf-bio), explaining their superior value-learning performance.”
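
      The contribution-ratio analysis described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used for the manuscript: the function name `pc_contribution_ratios`, the synthetic stand-in data (20 units driven by 3 latent signals), and the choice of a base-10 log are all hypothetical.

```python
import numpy as np

def pc_contribution_ratios(activities):
    """Fraction of total variance carried by each PC, in descending order.

    activities: array of shape (time_points, units).
    """
    centered = activities - activities.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional to PC variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

rng = np.random.default_rng(0)
# Hypothetical stand-in for RNN activities: 3 latent signals mixed into 20 units,
# plus weak noise, so that later PCs contribute very little.
latents = rng.standard_normal((1000, 3))
mixing = rng.standard_normal((3, 20))
low_dim = latents @ mixing + 0.01 * rng.standard_normal((1000, 20))

ratios = pc_contribution_ratios(low_dim)
log_ratios = np.log10(ratios)  # log base is an assumption
```

      In such low-dimensional stand-in data, almost all variance falls on the first three components, so the log ratios of later components drop sharply. Relatively larger contributions of later components would then indicate a higher-dimensional trajectory.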

      Regarding the poor performance of the model with untrained RNN, we would like to add a note. It is certainly true that an untrained RNN with sufficient dimensions should be able to represent just <10 different states well, and that state values should be learnable through TD learning regardless of the representation used. However, a difficulty (nontriviality) lies in the fact that, because we modeled the tasks in a continuous way rather than in an episodic way, the activity of the untrained RNN upon cue presentation should generally differ from trial to trial. Therefore, it was not trivial for the RNN to know that cue presentation in different trials, even after random lengths of inter-trial interval, should constitute the same single state. We have added this note in Line 177-185:

      “This inferiority of untrained RNN may sound odd because there were only four states from cue to reward while random RNN with enough units is expected to be able to represent many different states (c.f., [49]) and the effectiveness of training of only the readout weights has been shown in reservoir computing studies [50-53]. However, there was a difficulty stemming from the continuous training across trials (rather than episodic training of separate trials): the activity of untrained RNN upon cue presentation generally differed from trial to trial, and so it is non-trivial that cue presentation in different trials should be regarded as the same single state, even if it could eventually be dealt with at the readout level if the number of units increases.”

      The original value RNN study (Hennig et al., 2023, PLoS Comput Biol) also modeled tasks in a continuous way (though using backprop-through-time (BPTT) for training) and their model with untrained RNN also showed considerably larger RPE error than the value RNN even when the number of RNN units was 100 (the maximum number plotted in their Fig. 6A).

      Pub2. The impact of the article is limited by using a network with discrete time-steps, and only a small number of time steps from stimulus to reward. What is the length of each time step? If it's on the order of the membrane time constant, then a few time steps are only tens of ms. In the classical conditioning experiments typical delays are on the order of hundreds of milliseconds to seconds. Authors should test if random feedback weights work as well for larger time spans. This can be done by simply using a much larger number of time steps.

      In the revised manuscript, we examined the cases in which the cue-reward delay (originally 3 time steps) was elongated to 4, 5, or 6 time-steps. Our online value RNN models with random feedback could still achieve better performance (smaller squared value error) than the models with untrained RNN, although the performance degraded as the cue-reward delay increased. We have added these results in Fig. 2M and Line 223-228 (for our original models without non-negative constraint):

      “We further examined the cases with longer cue-reward delays. As shown in Fig. 2M, as the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp and oVRNNrf over the model with untrained RNN remained to hold, except for cases with small number of RNN units (5) and long delay (5 or 6) (p < 0.0025 in Wilcoxon rank sum test for oVRNNbp or oVRNNrf vs untrained for each number of RNN units for each delay).”

      and Fig. 6J and Line 422-429 (for our extended models with non-negative constraint):

      “Figure 6J shows the cases with longer cue-reward delays, with default or halved learning rates. As the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp-rev and oVRNNrf-bio over the models with untrained RNN remained to hold, except for a few cases with 5 RNN units (5 delay oVRNNrf-bio vs shuffled with default learning rate, 6 delay oVRNNrf-bio vs naive or shuffled with halved learning rate) (p < 0.047 in Wilcoxon rank sum test for oVRNNbp-rev or oVRNNrf-bio vs naive or shuffled untrained for each number of RNN units for each delay).”

      Also, we have added the note about our assumption and consideration on the time-step that we described in our provisional reply in Line 136-142:

      “We assumed that a single RNN unit corresponds to a small population of neurons that intrinsically share inputs and outputs, for genetic or developmental reasons, and the activity of each unit represents the (relative) firing rate of the population. Cortical population activity is suggested to be sustained not only by fast synaptic transmission and spiking but also, even predominantly, by slower synaptic neurochemical dynamics [46] such as short-term facilitation, whose time constant can be around 500 milliseconds [47]. Therefore, we assumed that single time-step of our rate-based (rather than spike-based) model corresponds to 500 milliseconds.”

      Pub3. In the section with more biologically constrained learning rules, while the output weights are restricted to only be positive (as well as the random feedback weights), the recurrent weights and weights from input to RNN are still bi-polar and can change signs during learning. Why is the constraint imposed only on the output weights? It seems reasonable that the whole setup will fail if the recurrent weights were only positive as in such a case most neurons will have very similar dynamics, and the network dimensionality would be very low. However, it is possible that only negative weights might work. It is unclear to me how to justify that bipolar weights that change sign are appropriate for the recurrent connections and inappropriate for the output connections. On the other hand, an RNN with excitatory and inhibitory neurons in which weight signs do not change could possibly work.

      We examined extended models that incorporated inhibitory and excitatory units and followed Dale's law with certain assumptions, and found that these models could still learn the tasks. We have added these results in Fig. 9 and subsection “4.1 Models with excitatory and inhibitory units” and described the details of the extended models in Line 844-862:

      Pub4. Like most papers in the field this work assumes a world composed of a single cue. In the real world there are many more cues than rewards, some cues are not associated with any rewards, and some are associated with other rewards or even punishments. In the simplest case, it would be useful to show that this network could actually work if there are additional distractor cues that appear at random either before the CS, or between the CS and US. There are good reasons to believe such distractor cues will be fatal for an untrained RNN, but might work with a trained RNN, either using BPTT or random feedback. Although this assumption is a common flaw in most work in the field, we should no longer ignore these slightly more realistic scenarios.

      We examined the performance of the models in a task in which a distractor cue randomly appeared. As a result, our model with random feedback, as well as the model with backprop, could still learn the state values much better than the models with untrained RNN. We have added these results in Fig. 10 and subsection “4.2 Task with distractor cue”.

      Reviewer #1 (Recommendations for the authors):

      Detailed comments to authors

      Rec1. Are the untrained RNNs discussed in methods? It seems quite good in estimating value but has a strong dopamine response at time of reward. Is nothing trained in the untrained RNN or are the W values trained. Untrained RNN are not bad at estimating value, but not as good as the two other options. It would seem reasonable that an untrained RNN (if I understand what it is) will be sufficient for such simple Pavlovian conditioning paradigms. This is provided that the RNN generates a complete, or nearly complete basis. Random RNN's provided that the random weights are chosen properly can indeed generate a nearly complete basis. Once there is a nearly complete temporal basis, it seems that a powerful enough learning rule will be able to learn the very simple Pavlovian conditioning. Since there are only 3 time-steps from cue to reward, an RNN dimensionality of 3 would be sufficient. A failure to get a good approximation can also arise from the failure of the learning algorithm for the output weights (W).

      As we mentioned in our reply to your public comment Pub1 (page 3-5), we have added an explanation of "untrained RNN" (in which the value weights were still learnt) (Line 144-147). We also analyzed the dimensionality of network dynamics by calculating the contribution ratios of principal components of the trajectory of RNN activities, showing that the contribution ratios of later principal components were smaller in the cases with untrained RNN than in the cases with trained value RNN (Fig. 2K/Line 210-220, Fig.6H/Line 412-416). Moreover, also as we mentioned in our reply to your public comment Pub1, we have added a note that even learning of a small number of states was not trivially easy because we considered continuous learning across trials rather than episodic learning of separate trials and thus it was not trivial for the model to know that cue presentation in different trials after random lengths of inter-trial interval should still be regarded as a same single state (Line 177-185).

      Rec2. For all cases, it will be useful to estimate the dimensionality of the RNN. Is the dimensionality of the untrained RNN smaller than in the trained cases? If this is the case, this might depend on the choice of the initial random (I assume) recurrent connectivity matrix.

      As mentioned above, we have analyzed the dimensionality of the network dynamics, and as you said, the dimensionality of the model with untrained RNN (which was indeed the initial random matrix as you said, as we mentioned above) was on average smaller than the trained value RNN models (Fig. 2K/Line 210-220, Fig.6H/Line 412-416).

      Rec3. It is surprising that the error starts increasing for more RNN units above ~15. See discussion. This might indicate a failure to adjust the learning parameters of the network rather than a true and interesting finding.

      Thank you very much for this insightful comment. In the original manuscript, we set the learning rate to a fixed value (0.1), without normalization by the squared norm of the feature vector (as we mentioned in Line 656-7 of the original manuscript), because we thought such a normalization could not be locally (biologically) implemented. However, we have realized that the lack of normalization resulted in an excessively large learning rate when the number of RNN units was large, and that this could cause instability and an increase in error, as you suggested. Therefore, in the revised manuscript, we have implemented a normalization of the learning rate (of value weights) that does not require non-local computations, specifically, division by the number of RNN units. As a result, the error now monotonically decreased, as the number of RNN units increased, in the non-negatively constrained models (Fig. 6E-left) and also largely in the unconstrained model with random feedback, although still not in the unconstrained model with backprop or untrained RNN (Fig. 2J-left).
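
      One way to see why an unnormalized learning rate becomes effectively larger with more units is the following toy sketch. It is not the manuscript's model: the function `value_after_update`, the equal activity of all units, and the use of `target` as a stand-in for r + γ·v(next) are assumptions made only for illustration. With v = w·x, an update w ← w + α·δ·x changes v by α·δ·‖x‖², which grows with the number of units unless α is scaled down.

```python
def value_after_update(n_units, alpha=0.1, normalize=False, activity=0.5, target=1.0):
    """Value estimate v = w . x after a single value-weight update from w = 0.

    Hypothetical setup: all units share the same activity, and `target`
    plays the role of r + gamma * v_next, so that delta = target - v.
    """
    x = [activity] * n_units
    w = [0.0] * n_units
    # Dividing by the number of units needs no information beyond the unit count.
    rate = alpha / n_units if normalize else alpha
    delta = target - sum(wi * xi for wi, xi in zip(w, x))
    w = [wi + rate * delta * xi for wi, xi in zip(w, x)]
    return sum(wi * xi for wi, xi in zip(w, x))

for n in (5, 20, 50):
    print(n, value_after_update(n), value_after_update(n, normalize=True))
```

      Under these assumptions, the unnormalized single-step estimate scales linearly with the number of units and overshoots the target at 50 units, whereas the normalized estimate is the same for every network size.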

      Rec4. Not numbering equations is a problem. For example, the explanations of feedback alignment (lines 194-206) rely on equations in the methods section which are not numbered. This makes it hard to read these explanations. Indeed, it will also be better to include a detailed derivation of the explanation in these lines in a mathematical appendix. Key equations should be numbered.

      We have added numbers to key equations in the Methods, and references to the numbers of corresponding equations in the main text. Detailed derivations are included in the Methods.

      Rec5. What is shown in Figure 3C? - an equation will help.

      We have added an explanation using equations in the main text (Line 256-259).

      Rec6. The explanation of why alignment occurs is not satisfactory, but neither is it in previous work on feedforward networks. The least that should be done though

      Regarding why alignment occurs, what remained mysterious (to us) was that in the case of the non-negatively constrained model, while the angle between the value weight vector (w) and the random feedback vector (c) was relatively small (loosely aligned) from the beginning, it appeared (as mentioned in the manuscript) that there was no further alignment over trials, even though the same mechanism for feedback alignment that we derived for the model without non-negative constraint was expected to operate also under the non-negative constraint. We have now clarified the reason for this, and found a way, namely the introduction of a slight decay (forgetting) of value weights, by which feedback alignment came to occur in the non-negatively constrained model. We have added these in the revised manuscript (Line 463-477):

      “As mentioned above, while the angle between w and c was on average smaller than 90° from the beginning, there was no further alignment over trials. This seemed mysterious because the mechanism for feedback alignment that we derived for the models without non-negative constraint was expected to work also for the models with non-negative constraint. As a possible reason for the non-occurrence of feedback alignment, we guessed that one or a few element(s) of w grew prominently during learning, and so w became close to an edge or boundary of the non-negative quadrant and thereby angle between w and other vector became generally large (as illustrated in Fig. 8D). Figure 8Ea shows the mean±SEM of the elements of w ordered from the largest to smallest ones after 1500 trials. As conjectured above, a few elements indeed grew prominently.

      We considered that if a slight decay (forgetting) of value weights (c.f., [59-61]) was assumed, such a prominent growth of a few elements of w may be mitigated and alignment of w to c, beyond the initial loose alignment because of the non-negative constraint, may occur. These conjectures were indeed confirmed by simulations (Fig. 8Eb,c and Fig. 8F). The mean squared value error slightly increased when the value-weight decay was assumed (Fig. 8G), however, presumably reflecting a decrease in developed values and a deterioration of learning because of the decay.”
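
      The geometric intuition in the passage above, that a non-negative w dominated by a few elements lies near an edge of the non-negative quadrant and therefore forms a large angle with most other non-negative vectors, can be illustrated with a small sketch. The vectors `uniform_w` and `peaked_w`, the uniform sampling of c, and the function names are hypothetical choices, not taken from the manuscript.

```python
import math
import random

def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))

def mean_angle_to_random_nonneg(w, n_samples=2000, seed=0):
    """Mean angle between w and random non-negative vectors (elements in [0, 1))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        c = [rng.random() for _ in w]
        total += angle_deg(w, c)
    return total / n_samples

n = 20
uniform_w = [1.0] * n               # all elements of comparable size
peaked_w = [10.0] + [0.1] * (n - 1)  # one element grown prominently

mean_uniform = mean_angle_to_random_nonneg(uniform_w)
mean_peaked = mean_angle_to_random_nonneg(peaked_w)
print(mean_uniform, mean_peaked)
```

      Both mean angles stay below 90° because all vectors are non-negative, but the peaked vector is on average much further from a random non-negative vector than the uniform one, which is consistent with the conjecture that prominent growth of a few elements hinders further alignment.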

      Rec7. I don't understand the qualitative difference between 4G and 4H. The difference seems to be smaller but there is still an apparent difference. Can this be quantified?

      We have added pointers indicating which were compared and statistical significance on Fig. 4D-H, and also Fig. 7 and Fig. 9C.

      Rec8. More biologically realistic constraints.

      Are the weights allowed to become negative? - No.

      Figure 6C - untrained RNN with non-negative x_i. Again - it was not explained what untrained RNN is. However, given my previous assumption, this is probably because the units developed in an untrained RNN is much further from representing a complete basis function. This cannot be done with only positive values. It would be useful to see network dynamics of units for untrained RNN. It might also be useful in all cases to estimate the dimensionality of the RNN. For 3 time-steps, it needs to be at least 3, and for more time steps as in Figure 4, larger.

      As we mentioned in our reply to your public comment Pub3 (page 6-8), in the revised manuscript we examined models that incorporated inhibitory and excitatory units and followed Dale's law, which could still learn the tasks (Fig. 9, Line 479-520). We have also analyzed the dimensionality of network dynamics as we mentioned in our replies to your public comment Pub1 and recommendations Rec1 and Rec2.

      Rec9. A new type of untrained RNN is introduced (Fig 6D); this is the first time an explanation of the untrained RNN is given. Indeed, the dimensionality of the second type of untrained RNN should be similar to the bioVRNNrf. The results are still not good.

      In the model with the new type of untrained RNN whose elements were shuffled from trained bioVRNNrf, contribution ratios of later principal components of the trajectory of RNN activities (Fig. 6H gray dotted line) were indeed larger than those in the model with naive untrained RNN (gray solid line) but still much smaller than those in the trained value RNN models with backprop (red line) or random feedback (blue line). It is considered that in value RNN, RNN connections were trained to realize high-dimensional trajectory, and shuffling did not generally preserve such an ability.

      Rec10. The discussion is too long and verbose. This is not a review paper.

      We have made the original discussion much more compact (from 1686 words to 940 words). We have added new discussion in response to the review comments, but the total length remains shorter than before (1589 words).

      Reviewer #2 (Public review):

      Summary:

      Tsurumi et al. show that recurrent neural networks can learn state and value representations in simple reinforcement learning tasks when trained with random feedback weights. The traditional method of learning for recurrent networks in such tasks (backpropagation through time) requires feedback weights which are a transposed copy of the feed-forward weights, a biologically implausible assumption. This manuscript builds on previous work regarding "random feedback alignment" and "value-RNNs", and extends them to a reinforcement learning context. The authors also demonstrate that certain nonnegative constraints can enforce a "loose alignment" of feedback weights. The authors' results suggest that random feedback may be a powerful tool of learning in biological networks, even in reinforcement learning tasks.

      Strengths:

      The authors describe well the issues regarding biologically plausible learning in recurrent networks and in reinforcement learning tasks. They take care to propose networks which might be implemented in biological systems and compare their proposed learning rules to those already existing in literature. Further, they use small networks on relatively simple tasks, which allows for easier intuition into the learning dynamics.

      Weaknesses:

      The principles discovered by the authors in these smaller networks are not applied to deeper networks or more complicated tasks, so it remains unclear to what degree these methods can scale up, or can be used more generally.

      We have examined extended models that incorporated inhibitory and excitatory units and followed Dale's law with certain assumptions, and found that these models could still learn the tasks. We have added these results in Fig. 9 and subsection “4.1 Models with excitatory and inhibitory units”.

      We have also examined the performance of the models in a task in which a distractor cue randomly appeared, finding that our models could still learn the state values much better than the models with untrained RNN. We have added these results in Fig. 10 and subsection “4.2 Task with distractor cue”.

      Regarding the depth, we continue to think about it but have not yet come up with concrete ideas.

      Reviewer #2 (Recommendations for the authors):

      (1) I think the work would greatly benefit from more proofreading. There are language errors/oddities throughout the paper, I will list just a few examples from the introduction:

      Thank you for pointing this out. We have made revisions throughout the paper.

      line 63: "simultaneously learnt in the downstream of RNN". Simultaneously learnt in networks downstream of the RNN? Simultaneously learnt in a downstream RNN? The meaning is not clear in the original sentence.

      We have revised it to "simultaneously learnt in connections downstream of the RNN" (Line 67-68).

      starting in line 65: "A major problem, among others.... value-encoding unit" is a run-on sentence and would be more readable if split into multiple sentences.

      We have extensively revised this part, which now consists of short sentences (Line 70-75).

      line 77: "in supervised learning of feed-forward network" should be either "in supervised learning of a feed-forward network" or "in supervised learning of feed-forward networks".

      We have changed "feed-forward network" to "feed-forward networks" (Line 83).

      (2) Under what conditions can you use an online learning rule which only considers the influence of the previous timestep? It's not clear to me how your networks solve the temporal credit assignment problem when the cue-reward delay in your tasks is 3-5ish time steps. How far can you stretch this delay before your networks stop learning correctly because of this one-step assumption? Further, how much does feedback alignment constrain your ability to learn long timescales, such as in Murray, J.M. (2019)?

      The reason why our models can solve the temporal credit assignment problem, at least to a certain extent, is considered to be that temporal-difference (TD) learning, which we adopted, itself has the power to resolve temporal credit assignment, as exemplified by the fact that TD(0) algorithms without an eligibility trace can still learn the value of distant rewards. We have added a discussion on this in Line 702-705:

      “…our models do not have "eligibility trace" (nor memorable/gated unit, different from the original value-RNN [26]), but could still solve temporal credit assignment to a certain extent because TD learning is by itself a solution for it (notably, recent work showed that combination of TD(0) and model-based RL well explained rat's choice and DA patterns [132]).”
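
      The point that TD(0) alone can carry reward information back across a cue-reward delay can be illustrated with a minimal tabular sketch. This is not our value-RNN: the chain of discrete states, the learning rate `alpha`, the discount `gamma`, and the function name `td0_chain` are assumptions made purely for illustration.

```python
def td0_chain(n_steps=4, n_trials=2000, alpha=0.1, gamma=0.8, reward=1.0):
    """Tabular TD(0) on a chain: cue (state 0) -> ... -> reward state (n_steps - 1).

    Hypothetical toy task. No eligibility trace is used: each update changes
    only the value of the current state, using the value of its successor.
    """
    values = [0.0] * n_steps
    for _ in range(n_trials):
        for s in range(n_steps):
            r = reward if s == n_steps - 1 else 0.0
            next_value = values[s + 1] if s + 1 < n_steps else 0.0  # end of trial
            delta = r + gamma * next_value - values[s]  # TD error
            values[s] += alpha * delta
    return values

values = td0_chain()
print(values)
```

      Under these assumptions, the value of the reward state is learnt first and then spreads backward by one state per trial, so the cue state eventually acquires a value of approximately gamma raised to the number of steps remaining until reward, even though no update ever looks further than one step ahead.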

      We have also examined the cases in which the cue-reward delay (originally 3 time steps) was elongated to 4, 5, or 6 time-steps, and our models with random feedback could still achieve better performance than the models with untrained RNN, although the performance degraded as the cue-reward delay increased. We have added these results in Fig. 2M and Line 223-228 (for our original models without non-negative constraint):

      “We further examined the cases with longer cue-reward delays. As shown in Fig. 2M, as the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp and oVRNNrf over the model with untrained RNN remained to hold, except for cases with small number of RNN units (5) and long delay (5 or 6) (p < 0.0025 in Wilcoxon rank sum test for oVRNNbp or oVRNNrf vs untrained for each number of RNN units for each delay).”

      and Fig. 6J and Line 422-429 (for our extended models with non-negative constraint):

      “Figure 6J shows the cases with longer cue-reward delays, with default or halved learning rates. As the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp-rev and oVRNNrf-bio over the models with untrained RNN remained to hold, except for a few cases with 5 RNN units (5 delay oVRNNrf-bio vs shuffled with default learning rate, 6 delay oVRNNrf-bio vs naive or shuffled with halved learning rate) (p < 0.047 in Wilcoxon rank sum test for oVRNNbp-rev or oVRNNrf-bio vs naive or shuffled untrained for each number of RNN units for each delay).”

      As for the difficulty due to random feedback compared to backprop, there appeared to be little difference in the models without non-negative constraint (Fig. 2M), whereas in the models with nonnegative constraint, when the cue-reward delay was elongated to 6 time-steps, the model with random feedback performed worse than the model with backprop (Fig. 6J bottom-left panel).

      (3) Line 150: Were the RNN methods trained with continuation between trials?

      Yes, we have added

      “The oVRNN models, and the model with untrained RNN, were continuously trained across trials in each task, because we considered that it was ecologically more plausible than episodic training of separate trials.” in Line 147-150. This is considered to make learning of even the simple cue-reward association task nontrivial, as we describe in our reply to your comment 9 below.

      (4) Figure 2I, J: indicate the statistical significance of the difference between the three methods for each of these measures.

      We have added statistical information for Fig. 2J (Line 198-203):

      “As shown in the left panel of Fig. 2J, on average across simulations, oVRNNbp and oVRNNrf exhibited largely comparable performance and always outperformed the untrained RNN (p < 0.00022 in Wilcoxon rank sum test for oVRNNbp or oVRNNrf vs untrained for each number of RNN units), although oVRNNbp somewhat outperformed or underperformed oVRNNrf when the number of RNN units was small (≤10 (p < 0.049)) or large (≥25 (p < 0.045)), respectively.”

      and also Fig. 6E (for non-negative models) (Line 385-390):

      “As shown in the left panel of Fig. 6E, oVRNNbp-rev and oVRNNrf-bio exhibited largely comparable performance and always outperformed the models with untrained RNN (p < 2.5×10<sup>−12</sup> in Wilcoxon rank sum test for oVRNNbp-rev or oVRNNrf-bio vs naive or shuffled untrained for each number of RNN units), although oVRNNbp-rev somewhat outperformed or underperformed oVRNNrf-bio when the number of RNN units was small (≤10 (p < 0.00029)) or large (≥25 (p < 3.7×10<sup>−6</sup>)), respectively…”

      Fig. 2I shows distributions, whose means are plotted in Fig. 2J, and we did not add statistics to Fig. 2I itself.

      (5) Line 178: Has learning reached a steady state after 1000 trials for each of these networks? Can you show a plot of error vs. trial number?

      We have added a plot of error vs trial number for original models (Fig. 2L, Line 221-223):

      “We examined how learning proceeded across trials in the models with 20 RNN units. As shown in Fig. 2L, learning became largely converged by 1000-th trial, although slight improvement continued afterward.”

      and non-negatively constrained models (Fig. 6I, Line 417-422):

      “Figure 6I shows how learning proceeded across trials in the models with 20 RNN units. While oVRNNbp-rev and oVRNNrf-bio eventually reached a comparable level of errors, oVRNNrf-bio outperformed oVRNNbp-rev in early trials (at 200, 300, 400, or 500 trials; p < 0.049 in Wilcoxon rank sum test for each). This is presumably because the value weights did not develop well in early trials and so the backprop-type feedback, which was the same as the value weights, did not work well, while the non-negative fixed random feedback worked finely from the beginning.”

      As shown in these figures, learning became largely steady at 1000 trials, but still slightly continued, and we have added simulations with 3000 trials (Fig. 2M and Fig. 6J).

      (6) Line 191: Put these regression values in the figure caption, as well as on the plot in Figure 3B.

      We have added the regression values in Fig. 3B and its caption.

      (7) Line 199: This idea of being in the same quadrant is interesting, but I think the term "relatively close angle" is too vague. Is there another more quantitative way to describe what you mean by this?

      We have revised this (Line 252-254) to “a vector that is in a relatively close angle with c, or more specifically, is in the same quadrant as (and thus within at maximum 90° from) c (for example, [c<sub>1</sub> c<sub>2</sub> c<sub>3</sub>]<sup>T</sup> and [0.5c<sub>1</sub> 1.2c<sub>2</sub> 0.8c<sub>3</sub>]<sup>T</sup>)”.
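
      The statement that a vector in the same quadrant as c lies within 90° of c can be checked numerically. In the sketch below, the particular values of `c` and the range of the random positive scalings are hypothetical; only the scaling factors 0.5, 1.2, and 0.8 come from the example quoted above. Scaling each element of c by a positive factor preserves its sign, so the dot product with c is a sum of positive terms and the angle is below 90°.

```python
import math
import random

def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))

c = [0.7, -1.3, 0.4]  # hypothetical feedback vector with mixed signs
scaled = [0.5 * c[0], 1.2 * c[1], 0.8 * c[2]]  # scaling from the quoted example
example_angle = angle_deg(c, scaled)

# Any element-wise positive scaling keeps the vector in the same quadrant as c.
rng = random.Random(0)
angles = [
    angle_deg(c, [rng.uniform(0.1, 5.0) * ci for ci in c])
    for _ in range(1000)
]
print(example_angle, max(angles))
```

      The bound of 90° is loose: the quoted example vector is much closer to c than that, while strongly uneven scalings approach but never reach 90°.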

      (8) Line 275: I'd like to see this measure directly in a plot, along with the statistical significance.

      We have added pointers indicating which were compared and statistical significance on Fig. 4D-H, and also Fig. 7 and Fig. 9C.

      (9) Line 280: Surely the untrained RNN should be able to solve the task if the reservoir is big enough, no? Maybe much bigger than 50 units, but still.

      We think this is not certain. A difficulty lies in that, because we modeled the tasks in a continuous way rather than in an episodic way (as we mentioned in our reply to your comment 3), the activity of untrained RNN upon cue presentation should generally differ from trial to trial. Therefore, it was not trivial for RNN to know that cue presentation in different trials, even after random lengths of inter-trial interval, should constitute the same single state. We have added this note in Line 177-185:

      “This inferiority of untrained RNN may sound odd because there were only four states from cue to reward while random RNN with enough units is expected to be able to represent many different states (c.f., [49]) and the effectiveness of training of only the readout weights has been shown in reservoir computing studies [50-53]. However, there was a difficulty stemming from the continuous training across trials (rather than episodic training of separate trials): the activity of untrained RNN upon cue presentation generally differed from trial to trial, and so it is non-trivial that cue presentation in different trials should be regarded as the same single state, even if it could eventually be dealt with at the readout level if the number of units increases.”

      The original value RNN study (Hennig et al., 2023, PLoS Comput Biol) also modeled tasks in a continuous way (though using BPTT for training) and their model with untrained RNN also showed considerably larger RPE error than the value RNN even when the number of RNN units was 100 (the maximum number plotted in their Fig. 6A).

      (10) It's a bit confusing to compare Figure 4C to Figure 4D-H because there are also many features of D-H which do not match those of C (response to cue, response to late reward in task 1). It would make sense to address this in some way. Is there another way to calculate the true values of the states (e.g., maybe you only start from the time of the cue) which better approximates what the networks are doing?

      As we mentioned in our replies to your comments 3 and 9, our models with RNN were trained continuously across trials rather than separately for each episodic trial, and whether the models could still learn the state representation is a key issue. Therefore, starting learning from the time of cue would not be an appropriate way to compare the models, and instead we have made statistical comparison regarding key features, specifically, TD-RPEs at early and late rewards, as indicated in Fig. 4D-H.

      (11) Line 309: Can you explain why this non-monotonic feature exists? Why do you believe it would be more biologically plausible to assume monotonic dependence? It doesn't seem so straightforward to me, I can imagine that competing LTP/LTD mechanisms may produce plasticity which would have a non-monotonic dependence on post-synaptic activity.

      Thank you for this insightful comment. As you suggested, non-monotonic dependence on the postsynaptic activity (BCM rule) has been proposed for unsupervised learning (cortical self-organization) (Bienenstock et al., 1982 J Neurosci), and there were suggestions that triplet-based STDP could be reduced to a BCM-like rule and additional components (Gjorgjieva et al., 2011 PNAS; Shouval, 2011 PNAS). However, the non-monotonicity that appeared in our model, derived from the backprop rule, is maximized at the middle and is thus opposite to the BCM rule, which is minimized at the middle (i.e., it initially decreases and thereafter increases). Therefore we consider that such an increase-then-decrease-type non-monotonicity would be less plausible than a monotonic increase, which could approximate an extreme case (with a minimum dip) of the BCM rule. We have added a note on this point in Line 355-358:

      “…the dependence on the post-synaptic activity was non-monotonic, maximized at the middle of the range of activity. It would be more biologically plausible to assume a monotonic increase (while an opposite shape of nonmonotonicity, once decrease and thereafter increase, called the BCM (Bienenstock-Cooper-Munro) rule has actually been suggested [56-58]).”
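      To illustrate the shape described in this passage, the following is a minimal sketch under an assumption that is not stated in this reply: the unit uses a standard logistic sigmoid with output x in (0, 1), whose derivative with respect to its input is x(1 − x). That derivative is the post-synaptic factor that typically enters a backprop-derived update for a sigmoidal unit. The exact update rule of the manuscript is not reproduced here.

```python
import numpy as np

# Post-synaptic activity of a logistic unit, assumed to lie in [0, 1].
x = np.linspace(0.0, 1.0, 101)

# Derivative of the logistic sigmoid expressed through its output:
# this is the post-synaptic factor in a backprop-type update (assumption).
post_factor = x * (1.0 - x)

# Activity level at which the factor is largest.
peak_activity = float(x[np.argmax(post_factor)])

# The factor rises toward the middle of the range and falls afterward.
rises_then_falls = bool(
    post_factor[50] > post_factor[10] and post_factor[50] > post_factor[90]
)
print(peak_activity, rises_then_falls)
```

      Under this assumption the factor peaks at mid-range activity, which is the increase-then-decrease shape contrasted above with the decrease-then-increase shape of the BCM rule.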

      (12) Line 363: This is the most exciting part of the paper (for me). I want to learn way more about this! Don't hide this in a few sentences. I want to know all about loose vs. feedback alignment. Show visualizations in 3D space of the idea of loose alignment (starting in the same quadrant), and compare it to how feedback alignment develops (ending in the same quadrant). Does this "loose" alignment idea give us an idea why the random feedback seems to settle at 45 degree angle? it just needs to get the signs right (same quadrant) for each element?

      In reply to this encouraging comment, we have made further analyses of the loose alignment. By the term "loose alignment", we meant that the value weight vector w and the feedback vector c are in the same (non-negative) quadrant, as you said. But what remained mysterious (to us) was that, while the angle between w and c was relatively close (loosely aligned) from the beginning, it appeared (as mentioned in the manuscript) that there was no further alignment over trials (and the angle actually settled at somewhat larger than 45°), even though the same mechanism for feedback alignment that we derived for the model without non-negative constraint was expected to operate also under the non-negative constraint. We have now clarified the reason for this, and found a way, introduction of slight decay (forgetting) of value weights, by which feedback alignment came to occur in the non-negatively constrained model. We have added this in Line 463-477:

      “As mentioned above, while the angle between w and c was on average smaller than 90° from the beginning, there was no further alignment over trials. This seemed mysterious because the mechanism for feedback alignment that we derived for the models without non-negative constraint was expected to work also for the models with non-negative constraint. As a possible reason for the non-occurrence of feedback alignment, we guessed that one or a few element(s) of w grew prominently during learning, and so w became close to an edge or boundary of the non-negative quadrant and thereby angle between w and other vector became generally large (as illustrated in Fig. 8D). Figure 8Ea shows the mean±SEM of the elements of w ordered from the largest to smallest ones after 1500 trials. As conjectured above, a few elements indeed grew prominently.

      We considered that if a slight decay (forgetting) of value weights (c.f., [59-61]) was assumed, such a prominent growth of a few elements of w may be mitigated and alignment of w to c, beyond the initial loose alignment because of the non-negative constraint, may occur. These conjectures were indeed confirmed by simulations (Fig. 8Eb,c and Fig. 8F). The mean squared value error slightly increased when the value-weight decay was assumed (Fig. 8G), however, presumably reflecting a decrease in developed values and a deterioration of learning because of the decay.”
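      The geometric argument in the quoted passage can be illustrated with a small numerical sketch. It is not the simulation reported in the manuscript: the dimension of 12, the uniform sampling of c, and the two hand-made weight vectors are assumptions chosen only for illustration. It shows (i) that any two non-negative vectors are less than 90° apart, and (ii) that a w dominated by a single element lies near an edge of the non-negative quadrant and is, on average, farther from random non-negative vectors than a balanced w.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 12          # assumed dimension, for illustration only
n_samples = 2000  # number of random non-negative feedback vectors

def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Random non-negative feedback vectors c (hypothetical distribution).
c_samples = rng.uniform(0.0, 1.0, size=(n_samples, dim))

# A balanced non-negative w versus a w in which one element dominates.
w_balanced = np.ones(dim)
w_peaked = np.full(dim, 0.05)
w_peaked[0] = 1.0

angles_balanced = [angle_deg(w_balanced, c) for c in c_samples]
angles_peaked = [angle_deg(w_peaked, c) for c in c_samples]

mean_angle_balanced = float(np.mean(angles_balanced))
mean_angle_peaked = float(np.mean(angles_peaked))
max_angle = max(max(angles_balanced), max(angles_peaked))
print(mean_angle_balanced, mean_angle_peaked, max_angle)
```

      In this sketch the peaked vector stays within the non-negative quadrant yet sits at a larger mean angle, which is consistent with the idea that limiting the growth of a few elements (for example by weight decay) leaves room for closer alignment.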

      As for visualization, because the model's dimension was high (e.g., 12), we could not come up with better ways of visualization than the trial versus angle plot (Fig. 3A, 8A,F). Nevertheless, we would expect that the abovementioned additional analyses of loose alignment (with graphs) are useful to understand what is going on.

      (13) Line 426: how does this compare to some of the reward modulated hebbian rules proposed in other RNNs? See Hoerzer, G. M., Legenstein, R., & Maass, W. (2014). Put another way, you arrived at this from a top-down approach (gradient descent->BP->approximated by RF->non-negativity constraint->leads to DA dependent modulation of Hebbian plasticity). How might this compare to a bottom up approach (i.e. starting from the principle of Hebbian learning, and adding in reward modulation)?

      The study of Hoerzer et al. 2014 used a stochastic perturbation, which we did not assume but can potentially be integrated. On the other hand, Hoerzer et al. trained the readout of untrained RNN, whereas we trained both RNN and its readout. We have added discussion to compare our model with Hoerzer et al. and other works that also used perturbation methods, as well as other top-down approximation method, in Line 685-711 (reference 128 is Hoerzer et al. 2014 Cereb Cortex):

      “As an alternative to backprop in hierarchical network, aside from feedback alignment [36], Associative Reward-Penalty (A<sub>R-P</sub>) algorithm has been proposed [124-126]. In A<sub>R-P</sub>, the hidden units behave stochastically, allowing the gradient to be estimated via stochastic sampling. Recent work [127] has proposed Phaseless Alignment Learning (PAL), in which high-frequency noise-induced learning of feedback projections proceeds simultaneously with learning of forward projections using the feedback in a lower frequency. Noise-induced learning of the weights on readout neurons from untrained RNN by reward-modulated Hebbian plasticity has also been demonstrated [128]. Such noise- or perturbation-based [40] mechanisms are biologically plausible because neurons and neural networks can exhibit noisy or chaotic behavior [129-131], and might improve the performance of value-RNN if implemented.

      Regarding learning of RNN, "e-prop" [35] was proposed as a locally learnable online approximation of BPTT [27], which was used in the original value RNN [26]. In e-prop, neuron-specific learning signal is combined with weight-specific locally-updatable "eligibility trace". Reward-based e-prop was also shown to work [35], both in a setup not introducing TD-RPE with symmetric or random feedback (their Supplementary Figure 5) and in another setup introducing TD-RPE with symmetric feedback (their Figure 4 and 5). Compared to these, our models differ in multiple ways.

      First, we have shown that alignment to random feedback occurs in the models driven by TD-RPE. Second, our models do not have "eligibility trace" (nor memorable/gated unit, different from the original valueRNN [26]), but could still solve temporal credit assignment to a certain extent because TD learning is by itself a solution for it (notably, recent work showed that combination of TD(0) and model-based RL well explained rat's choice and DA patterns [132]). However, as mentioned before, single time-step in our models was assumed to correspond to hundreds of milliseconds, incorporating slow synaptic dynamics, whereas e-prop is an algorithm for spiking neuron models with a much finer time scale. From this aspect, our models could be seen as a coarse-time-scale approximation of e-prop. On top of these, our results point to a potential computational benefit of biological non-negative constraint, which could effectively limit the parameter space and promote learning.”

      Related to your latter point (and also replying to other reviewer's comment), we also examined the cases where the random feedback in our model was replaced with uniform feedback, which corresponds to a simple bottom-up reward-modulated triplet plasticity rule. As a result, the model with uniform feedback showed largely comparable, but somewhat worse, performance than the model with random feedback. We have added the results in Fig. 2J-right and Line 206-209 (for our original models without non-negative constraint):

      “The green line in Fig. 2J-right shows the performance of a special case where the random feedback in oVRNNrf was fixed to the direction of (1, 1, ..., 1)<sup>T</sup> (i.e., uniform feedback) with a random coefficient, which was largely comparable to, but somewhat worse than, that for the general oVRNNrf (blue line).”

      and Fig. 6E-right and Line 402-407 (for our extended models with non-negative constraint):

      “The green and light blue lines in the right panels of Figure 6E and Figure 6F show the results for special cases where the random feedback in oVRNNrf-bio was fixed to the direction of (1, 1, ..., 1) <sup>T</sup> (i.e., uniform feedback) with a random non-negative magnitude (green line) or a fixed magnitude of 0.5 (light blue line). The performance of these special cases, especially the former (with random magnitude) was somewhat worse than that of oVRNNrf-bio, but still better than that of the models with untrained RNN.”

      We have also added a biological implication of the results in Line 644-652:

      “We have shown that oVRNNrf and oVRNNrf-bio could work even when the random feedback was uniform, i.e., fixed to the direction of (1, 1, ..., 1) <sup>T</sup>, although the performance was somewhat worse. This is reasonable because uniform feedback can still encode scalar TD-RPE that drives our models, in contrast to a previous study [45], which considered DA's encoding of vector error and thus regarded uniform feedback as a negative control. If oVRNNrf/oVRNNrf-bio-like mechanism indeed operates in the brain and the feedback is near uniform, alignment of the value weights w to near (1, 1, ..., 1) is expected to occur. This means that states are (learned to be) represented in such a way that simple summation of cortical neuronal activity approximates value, thereby potentially explaining why value is often correlated with regional activation (fMRI BOLD signal) of cortical regions [113].”

      Reviewer #3 (Public review):

      Summary:

      The paper studies learning rules in a simple sigmoidal recurrent neural network setting. The recurrent network has a single layer of 10 to 40 units. It is first confirmed that feedback alignment (FA) can learn a value function in this setting. Then so-called bio-plausible constraints are added: (1) when value weights (readout) are non-negative, (2) when the activity is non-negative (normal sigmoid rather than downscaled between -0.5 and 0.5), (3) when the feedback weights are non-negative, (4) when the learning rule is revised to be monotonic: the weights are not downregulated. In the simple task considered, none of the four biological features appears to impair learning entirely.

      Strengths:

      (1) The learning rules are implemented in a low-level fashion of the form: (pre-synaptic-activity) x (post-synaptic-activity) x feedback x RPE. Which is therefore interpretable in terms of measurable quantities in the wet-lab.

      (2) I find that non-negative FA (FA with non negative c and w) is the most valuable theoretical insight of this paper: I understand why the alignment between w and c is automatically better at initialization.

      (3) The task choice is relevant since it connects with experimental settings of reward conditioning with possible plasticity measurements.

      Weaknesses:

      (4) The task is rather easy, so it's not clear that it really captures the computational gap that exists between FA (gradient-like learning) and a simpler learning rule like a delta rule: RPE x (pre-synaptic) x (post-synaptic). To control if the task is not too trivial, I suggest adding a control where the vector c is constant c_i=1.

      We have examined the cases where the feedback was uniform, i.e., in the direction of (1, 1, ..., 1) in both models without and with non-negative constraint. In both models, the models with uniform feedback performed somewhat worse than the original models with random feedback, but still better than the models with untrained RNN. We have added the results in Fig. 2J-right and Line 206-209 (for our original models without non-negative constraint):

      “The green line in Fig. 2J-right shows the performance of a special case where the random feedback in oVRNNrf was fixed to the direction of (1, 1, ..., 1) <sup>T</sup> (i.e., uniform feedback) with a random coefficient, which was largely comparable to, but somewhat worse than, that for the general oVRNNrf (blue line).”

      and Fig. 6E-right and Line 402-407 (for our extended models with non-negative constraint):

      “The green and light blue lines in the right panels of Figure 6E and Figure 6F show the results for special cases where the random feedback in oVRNNrf-bio was fixed to the direction of (1, 1, ..., 1) <sup>T</sup> (i.e., uniform feedback) with a random non-negative magnitude (green line) or a fixed magnitude of 0.5 (light blue line). The performance of these special cases, especially the former (with random magnitude) was somewhat worse than that of oVRNNrf-bio, but still better than that of the models with untrained RNN.”

      We have also added a discussion on the biological implication of the model with uniform feedback mentioned in our provisional reply in Line 644-652:

      “We have shown that oVRNNrf and oVRNNrf-bio could work even when the random feedback was uniform, i.e., fixed to the direction of (1, 1, ..., 1) <sup>T</sup>, although the performance was somewhat worse. This is reasonable because uniform feedback can still encode scalar TD-RPE that drives our models, in contrast to a previous study [45], which considered DA's encoding of vector error and thus regarded uniform feedback as a negative control. If oVRNNrf/oVRNNrf-bio-like mechanism indeed operates in the brain and the feedback is near uniform, alignment of the value weights w to near (1, 1, ..., 1) is expected to occur. This means that states are (learned to be) represented in such a way that simple summation of cortical neuronal activity approximates value, thereby potentially explaining why value is often correlated with regional activation (fMRI BOLD signal) of cortical regions [113].”

      In addition, while preparing the revised manuscript, we found a recent simulation study, which showed that uniform feedback coupled with positive forward weights was effective in supervised learning of one-dimensional output in feed-forward network (Konishi et al., 2023, Front Neurosci).

      We have briefly discussed this work in Line 653-655:

      “Notably, uniform feedback coupled with positive forward weights was shown to be effective also in supervised learning of one-dimensional output in feed-forward network [114], and we guess that loose alignment may underlie it.”

      (5) Related to point 3), the main strength of this paper is to draw potential connection with experimental data. It would be good to highlight more concretely the prediction of the theory for experimental findings. (Ideally, what should be observed with non-negative FA that is not expected with FA or a delta rule (constant global feedback) ?).

      We have added a discussion on the prediction of our models, mentioned in our provisional reply, in Line 627-638:

      “oVRNNrf predicts that the feedback vector c and the value-weight vector w become gradually aligned, while oVRNNrf-bio predicts that c and w are loosely aligned from the beginning. Element of c could be measured as the magnitude of pyramidal cell's response to DA stimulation. Element of w corresponding to a given pyramidal cell could be measured, if striatal neuron that receives input from that pyramidal cell can be identified (although technically demanding), as the magnitude of response of the striatal neuron to activation of the pyramidal cell. Then, the abovementioned predictions could be tested by (i) identify cortical, striatal, and VTA regions that are connected, (ii) identify pairs of cortical pyramidal cells and striatal neurons that are connected, (iii) measure the responses of identified pyramidal cells to DA stimulation, as well as the responses of identified striatal neurons to activation of the connected pyramidal cells, and (iv) test whether DA→pyramidal responses and pyramidal→striatal responses are associated across pyramidal cells, and whether such associations develop through learning.”

      Moreover, we have considered another (technically more doable) prediction of our model, and described it in Line 639-643:

      “Testing this prediction, however, would be technically quite demanding, as mentioned above. An alternative way of testing our model is to manipulate the cortical DA feedback and see if it will cause (re-)alignment of value weights (i.e., cortical striatal strengths). Specifically, our model predicts that if DA projection to a particular cortical locus is silenced, effect of the activity of that locus on the value-encoding striatal activity will become diminished.”

      (6a) Random feedback with RNN in RL have been studied in the past, so it is maybe worth giving some insights how the results and the analyzes compare to this previous line of work (for instance in this paper [1]). For instance, I am not very surprised that FA also works for value prediction with TD error. It is also expected from the literature that the RL + RNN + FA setting would scale to tasks that are more complex than the conditioning problem proposed here, so is there a more specific take-home message about non-negative FA? or benefits from this simpler toy task? [1] https://www.nature.com/articles/s41467-020-17236-y

      As for a specific feature of non-negative models, we did not describe (actually did not well recognize) an intriguing result that the non-negative random feedback model performed generally better than the models without non-negative constraint with either backprop or random feedback (Fig. 2J-left versus Fig. 6E-left (please mind the difference in the vertical scales)). This suggests that the non-negative constraint effectively limited the parameter space and thereby learning became efficient. We have added this result in Line 392-395:

      “Remarkably, oVRNNrf-bio generally achieved better performance than both oVRNNbp and oVRNNrf, which did not have the non-negative constraint (Wilcoxon rank sum test, vs oVRNNbp : p < 7.8×10<sup>−6</sup> for 5 or ≥25 RNN units; vs oVRNNrf: p < 0.021 for ≤10 or ≥20 RNN units).”

      Also, in the models with non-negative constraint, the model with random feedback learned more rapidly than the model with backprop although they eventually reached a comparable level of errors, at least in the case with 20 RNN units. This is presumably because the value weights did not develop well in early trials and so the backprop-based feedback, which was the same as the value weights, did not work well, while the non-negative fixed random feedback worked finely from the beginning. We have added this result in Fig. 6I and Line 417-422:

      “Figure 6I shows how learning proceeded across trials in the models with 20 RNN units. While oVRNNbp-rev and oVRNNrf-bio eventually reached a comparable level of errors, oVRNNrf-bio outperformed oVRNNbp-rev in early trials (at 200, 300, 400, or 500 trials; p < 0.049 in Wilcoxon rank sum test for each). This is presumably because the value weights did not develop well in early trials and so the backprop-type feedback, which was the same as the value weights, did not work well, while the non-negative fixed random feedback worked finely from the beginning.”

      We have also added a discussion on how our model can be positioned in relation to other models including the study you mentioned (e-prop by Bellec, ..., Maass, 2020) in subsection “Comparison to other algorithms” of the Discussion.

      Regarding the slightly better performance of the non-negative model with random feedback than that of the non-negative model with backprop when the number of RNN units was large (mentioned in our provisional reply), state values in the backprop model appeared less developed than those in the random feedback model. Slightly better performance of random feedback than backprop held also in our extended model incorporating excitatory and inhibitory units (Fig. 9B).

      (6b) Related to task complexity, it is not clear to me if non-negative value and feedback weights would generally scale to harder tasks. If the task is so simple that a global RPE signal is sufficient to learn (see 4 and 5), then it could be good to extend the task to find a substantial gap between: global RPE, non-negative FA, FA, BP. For a well chosen task, I expect to see a performance gap between any pair of these four learning rules. In the context of the present paper, this would be particularly interesting to study the failure mode of non-negative FA and the cases where it does perform as well as FA.

      In the cue-reward association task with 3 time-steps delay, the non-negative model with random feedback performed largely comparably to the non-negative model with backprop, and this continued to hold in a task where a distractor cue, which was not associated with reward, appeared at random timings. We have added the results in Fig. 10 and subsection “4.2 Task with distractor cue”.

      We have also examined the cases where the cue-reward delay was elongated. In the case of longer cue-reward delay (6 time-steps), in the models without non-negative constraint, the model with random feedback performed comparably to (and slightly better than when the number of RNN units was large) the model with backprop (Fig. 2M). In contrast, in the models with non-negative constraint, the model with random feedback underperformed the model with backprop (Fig. 6J, left-bottom). This indicates a difference between the effect of non-negative random feedback and the effect of positive+negative random feedback.

      We have further examined the performance of the models in terms of action selection, by extending the models to incorporate an actor-critic algorithm. In a task with inter-temporal choice (i.e., immediate small reward vs delayed large reward), the non-negative model with random feedback performed worse than the non-negative model with backprop when the number of RNN units was small. When the number of RNN increased, these models performed more comparably. These results are described in Fig. 11 and subsection “4.3 Incorporation of action selection”.

      (7) I find that the writing could be improved, it mostly feels more technical and difficult than it should. Here are some recommendations:

      7a) for instance the technical description of the task (CSC) is not fully described and requires background knowledge from other paper which is not desirable.

      7b) Also the rationale for the added difficulty with the stochastic reward and new state is not well explained.

      7c) In the technical description of the results I find that the text dives into descriptive comments of the figures but high-level take home messages would be helpful to guide the reader. I got a bit lost, although I feel that there is probably a lot of depth in these paragraphs.

      As for 7a), 'CSC (complete serial compound)' was actually not the name of the task but the name of the 'punctate' state representation, in which each state (timing from cue) is represented in a punctate manner, i.e., by a one-hot vector such as (1, 0, ..., 0), (0, 1, ..., 0), ..., and (0, 0, ..., 1). As you pointed out, using the name of 'CSC' would make the text appear more technical than it actually is, and so we have moved the reference to the name of 'CSC' to the Methods (Line 903-907):

      “For the agents with punctate state representation, which is also referred to as the complete serial compound (CSC) representation [1, 48, 133], each timing from a cue in the tasks was represented by a 10-dimensional one-hot vector, starting from (1 0 0 ... 0)<sup>T</sup> for the cue state, with the next state (0 1 0 ... 0) <sup>T</sup> and so on.”

      and in the Results we have instead added a clearer explanation (Line 163-165):

      “First, for comparison, we examined traditional TD-RL agent with punctate state representation (without using the RNN), in which each state (time-step from a cue) was represented in a punctate manner, i.e., by a one-hot vector such as (1, 0, ..., 0), (0, 1, ..., 0), and so on.”
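      As a concrete illustration of a TD-RL agent with punctate (one-hot) state representation, the following is a minimal sketch and not the implementation used in the manuscript. It assumes an episodic chain of four states from cue to reward, a reward of 1 delivered at the last state, a discount factor of 0.8, and a learning rate of 0.1; all of these values, and the episodic treatment itself, are simplifying assumptions (the models in the manuscript were trained continuously across trials). Only the 10-dimensional one-hot coding follows the Methods text quoted above.

```python
import numpy as np

n_dims = 10    # dimension of the one-hot state vectors (from the Methods text)
n_steps = 4    # assumed number of states from cue to reward
gamma = 0.8    # assumed discount factor
alpha = 0.1    # assumed learning rate

# Row t is the one-hot vector of the state t time-steps after the cue.
states = np.eye(n_dims)
w = np.zeros(n_dims)  # value weights; value of a state is w @ x

for episode in range(2000):
    for t in range(n_steps):
        x_now = states[t]
        is_last = t == n_steps - 1
        reward = 1.0 if is_last else 0.0           # reward at the last state
        v_next = 0.0 if is_last else w @ states[t + 1]
        delta = reward + gamma * v_next - w @ x_now  # TD reward prediction error
        w += alpha * delta * x_now                   # TD(0) update of value weights

learned_values = w[:n_steps]
# Discounted sum of future reward for each state under the assumptions above.
expected = gamma ** np.arange(n_steps - 1, -1, -1)
print(learned_values, expected)
```

      With one-hot states, each weight is simply the value of one state, so the learned weights approach the discounted reward for each time-step and the weights of unvisited states stay at zero.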

      As for 7b), we have added the rationale for our examination of the tasks with probabilistic structures (Line 282-294):

      “Previous work [54] examined the response of DA neurons in cue-reward association tasks in which reward timing was probabilistically determined (early in some trials but late in other trials). There were two tasks, which were largely similar but there was a key difference that reward was given in all the trials in one task whereas reward was omitted in some randomly determined trials in another task. Starkweather et al. [54] found that the DA response to later reward was smaller than the response to earlier reward in the former task, presumably reflecting the animal's belief that delayed reward will surely come, but the opposite was the case in the latter task, presumably because the animal suspected that reward was omitted in that trial. Starkweather et al.[54] then showed that such response patterns could be explained if DA encoded TD-RPE under particular state representations that incorporated the probabilistic structures of the task (called the 'belief state'). In that study, such state representations were 'handcrafted' by the authors, but the subsequent work [26] showed that the original value-RNN with backprop (BPTT) could develop similar representations and reproduce the experimentally observed DA patterns.”

      As for 7c), we have extensively revised the text of the results, adding high-level explanations while trying to reduce the lengthy low-level descriptions (e.g., Line 172-177 for Fig2E-G).

      (8) Related to the writing issue and 5), I wished that "bio-plausibility" was not the only reason to study positive feedback and value weights. Is it possible to develop a bit more specifically what and why this positivity is interesting? Is there an expected finding with non-negative FA both in the model capability? or maybe there is a simpler and crisp take-home message to communicate the experimental predictions to the community would be useful?

      There is actually an unexpected finding with non-negative model: the non-negative random feedback model performed generally better than the models without non-negative constraint with either backprop or random feedback (Fig. 2J-left versus Fig. 6E-left), presumably because the non-negative constraint effectively limited the parameter space and thereby learning became efficient, as we mentioned in our reply to your point 6a above (we did not well recognize this at the time of original submission).

      Another potential merit of our present work is the simplicity of the model and the task. This simplicity enabled us to derive an intuitive explanation on why feedback alignment could occur. Such an intuitive explanation was lacking in previous studies while more precise mathematical explanations did exist. Related to the mechanism of feedback alignment, one thing remained mysterious to us at the time of original submission. Specifically, in the non-negatively constrained random feedback model, while the angle between the value weight (w) and the random feedback (c) was relatively close (loosely aligned) from the beginning, it appeared (as mentioned in the manuscript) that there was no further alignment over trials (and the angle actually settled at somewhat larger than 45°), even though the same mechanism for feedback alignment that we derived for the model without non-negative constraint was expected to operate also under the non-negative constraint. We have now clarified the reason for this, and found a way, introduction of slight decay (forgetting) of value weights, by which feedback alignment came to occur in the non-negatively constrained model. We have added this in Line 463-477:

      “As mentioned above, while the angle between w and c was on average smaller than 90° from the beginning, there was no further alignment over trials. This seemed mysterious because the mechanism for feedback alignment that we derived for the models without non-negative constraint was expected to work also for the models with non-negative constraint. As a possible reason for the non-occurrence of feedback alignment, we guessed that one or a few element(s) of w grew prominently during learning, and so w became close to an edge or boundary of the non-negative quadrant and thereby angle between w and other vector became generally large (as illustrated in Fig. 8D). Figure 8Ea shows the mean±SEM of the elements of w ordered from the largest to smallest ones after 1500 trials. As conjectured above, a few elements indeed grew prominently.

      We considered that if a slight decay (forgetting) of value weights (c.f., [59-61]) was assumed, such a prominent growth of a few elements of w may be mitigated and alignment of w to c, beyond the initial loose alignment because of the non-negative constraint, may occur. These conjectures were indeed confirmed by simulations (Fig. 8Eb,c and Fig. 8F). The mean squared value error slightly increased when the value-weight decay was assumed (Fig. 8G), however, presumably reflecting a decrease in developed values and a deterioration of learning because of the decay.”
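
      The geometric point in the quoted passage (a non-negative vector dominated by one element lies near an edge of the non-negative quadrant and therefore makes a large angle with most other non-negative vectors) can be illustrated with a minimal sketch. The vectors below are hypothetical values chosen for illustration; they are not taken from the simulations, and the sketch does not model learning or weight decay.

```python
import math

def angle_deg(u, v):
    """Angle in degrees between two non-zero vectors given as sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))

# Hypothetical non-negative vectors (illustration only).
c = [1.0, 1.0, 1.0, 1.0]          # stand-in for the random feedback vector
w_even = [1.0, 0.8, 1.2, 0.9]     # value weights with elements of similar size
w_peaked = [10.0, 0.1, 0.1, 0.1]  # value weights dominated by one element

print(angle_deg(c, w_even))    # small angle: w_even points into the interior
print(angle_deg(c, w_peaked))  # much larger angle: w_peaked hugs one axis
```

      In this toy case the peaked vector sits close to a coordinate axis, so its angle to c is bounded away from zero regardless of how c is drawn within the non-negative quadrant.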

      Correction of an error in the original manuscript

      In addition to revising the manuscript according to your comments, we have corrected the way the true state values are estimated. Specifically, in the original manuscript, we defined states by relative time-steps from a reward and estimated their values by calculating the sums of discounted future rewards starting from them through simulations. However, we assumed variable inter-trial intervals (ITIs) (4, 5, 6, or 7 time-steps with equal probabilities), and so, until receiving cue information, the agent cannot know when the next reward will come. Therefore, states for the timings up to the cue timing cannot be defined by the upcoming reward, but previously we did so (e.g., state of "one timestep before cue") without taking into account the ITI variability.

      We have now corrected this issue by defining the states of timings with respect to the previous (rather than upcoming) reward. For example, when the ITI was 4 time-steps and the agent was in its last time-step, the agent would in fact receive a cue at the next time-step, but the agent cannot know this until actually receiving the cue information and instead should assume that s/he was at the last time-step of the ITI (if ITI was 4), last − 1 (if ITI was 5), last − 2 (if ITI was 6), or last − 3 (if ITI was 7) with equal probabilities (in a similar fashion to what we considered when thinking about state definition for the probabilistic tasks). We estimated the true values of states defined in this way through simulations. As a result, the corrected true value of the cue-timing has become slightly smaller than the value described in the original manuscript (reflecting the uncertainty about ITI length), and consequently a small positive TD-RPE has now appeared at the cue timing.
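
      The state definition described above can be sketched as follows. This is a simplified illustration, not the procedure used in the manuscript: it considers only the next reward (the manuscript estimated sums of all discounted future rewards by simulation), and the discount factor, the cue-to-reward delay, the reward size, and the discounting convention are all assumed values introduced here.

```python
ITI_LENGTHS = (4, 5, 6, 7)  # from the text: equally probable ITI lengths

def iti_posterior(k, iti_lengths=ITI_LENGTHS):
    """Posterior over ITI length for an agent at the k-th time-step (1-indexed)
    since the previous reward, with no cue observed yet.

    With a uniform prior, every ITI length >= k is equally likely."""
    consistent = [length for length in iti_lengths if length >= k]
    if not consistent:
        raise ValueError("no ITI length is consistent with k = %d" % k)
    p = 1.0 / len(consistent)
    return {length: p for length in consistent}

def next_reward_value(k, gamma=0.9, cue_to_reward=3, reward=1.0):
    """Expected discounted value of the next reward only (assumption).

    gamma, cue_to_reward and reward are hypothetical parameters.
    Assumed convention: a reward n time-steps ahead is worth gamma**n * reward."""
    value = 0.0
    for length, p in iti_posterior(k).items():
        steps_to_cue = length - k + 1  # the cue follows the last ITI time-step
        value += p * reward * gamma ** (steps_to_cue + cue_to_reward)
    return value

# At the 4th time-step all four ITI lengths remain possible (probability 1/4 each).
print(iti_posterior(4))
print(next_reward_value(4))
```

      Averaging over the ITI lengths that remain possible is what makes the state value just before the cue lower than it would be if the cue were known to be imminent, which is consistent with the small positive TD-RPE at cue timing described above.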

      Because we measured the performance of the models by squared errors in state values, this correction affected the results reporting the performance. Fortunately, the effects were relatively minor and did not largely alter the results of performance comparisons. However, we sincerely apologize for this error. In the revised manuscript, we have used the corrected true values throughout the manuscript, and we have described the ways of estimating these values in Line 919-976.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      MHC (Major Histocompatibility Complex) genes have long been mentioned as cases of trans-species polymorphism (TSP), where alleles might have their most recent common ancestor with alleles in a different species, rather than other alleles in the same species (e.g., a human MHC allele might coalesce with a chimp MHC allele, more recently than the two coalesce with other alleles in either species). This paper provides a more complete estimate of the extent and ages of TSP in primate MHC loci. The data clearly support deep TSP linking alleles in humans to (in some cases) old world monkeys, but the amount of TSP varies between loci.

      Strengths:

      The authors use publicly available datasets to build phylogenetic trees of MHC alleles and loci. From these trees they are able to estimate whether there is compelling support for Trans-species polymorphisms (TSPs) using Bayes Factor tests comparing different alternative hypotheses for tree shape. The phylogenetic methods are state-of-the-art and appropriate to the task.

      The authors supplement their analyses of TSP with estimates of selection (e.g., dN/dS ratios) on motifs within the MHC protein. They confirm what one would suspect: classical MHC genes exhibit stronger selection at amino acid residues that are part of the peptide binding region, and non-classical MHC exhibit less evidence of selection. The selected sites are associated with various diseases in GWAS studies.

      Weaknesses:

      An implication drawn from this paper (and previous literature) is that MHC has atypically high rates of TSP. However, rates of TSP are not estimated for other genes or gene families, so readers have no basis of comparison and no framework to know whether the depth and frequency of TSP are unusual for MHC family genes, relative to other random genes in the genome, or immune genes in particular. I expect (from previous work on the topic) that MHC is indeed exceptional in this regard, but some direct comparison would provide greater confidence in this conclusion.

      We agree that context is important! Although we expected to get the most interesting results from studying the classical genes, we did include the non-classical genes specifically for comparison. They are located in the same genomic region, have multiple sequences catalogued in different species (although they are less diverse), and perform critical immune functions. We think this is a more appropriate set to compare with the classical MHC genes than, say, a random set of genes. Interestingly, we did not detect TSP in these non-classical genes. This likely means that the classical MHC genes are truly exceptional, but it could also mean that not enough sequences are available for the non-classical genes to detect TSP. 

      It would be very interesting to repeat this analysis for another gene family to see whether such deep TSP also occurs in other immune or non-immune gene families. We are lucky that decades of past work and a dedicated database exist for cataloging MHC sequences. When this level of sequence collection is achieved for other highly polymorphic gene families, it will be possible to do a comparable analysis.

      Given the companion paper's evidence of genic gain/loss, it seems like there is a real risk that the present study under-estimates TSP, if cases of TSP have been obscured by the loss of the TSP-carrying gene paralog from some lineages needed to detect the TSP. Are the present analyses simply calculating rates of TSP of observed alleles, or are you able to infer TSP rates conditional on rates of gene gain/loss?

      We were not able to infer TSP rates conditional on rates of gene gain/loss. We agree that some cases of TSP were likely lost due to the loss of a gene paralog from certain species. Furthermore, the dearth of MHC whole-region and allele sequences available for most primates makes it difficult to detect TSP, even if the gene paralog is still present. Long-read sequencing of more primate genomes should help with this. We agree that it would also be very interesting to study TSPs that were maintained for millions of years but were lost recently.

      Figure 5 (and 6) provide regression model fits (red lines in panel C) relating evolutionary rates (y axis not labeled) to site distance from the peptide binding groove, on the protein product. This is a nice result. I wonder, however, whether a linear model (as opposed to non-linear) is the most biologically reasonable choice, and whether non-linear functions have been evaluated. The authors might consider generalized additive models (GAMs) as an alternative that relaxes linearity assumptions.

      We agree that a linear model is likely not the most biologically reasonable choice, as protein interactions are complex. However, we made the choice to implement the simplest model because the evolutionary rates we inferred were relative, making parameters relatively meaningless. We were mainly concerned with positive or negative slopes and we leave the rest to the protein interaction experts.

      The connection between rapidly evolving sites, and disease associations (lines 382-3) is very interesting. However, this is not being presented as a statistical test of association. The authors note that fast-evolving amino acids all have at least one association: but is this really more disease-association than a random amino acid in the MHC? Or, a randomly chosen polymorphic amino acid in MHC? A statistical test confirming an excess of disease associations would strengthen this claim.

      To strengthen this claim, we added Figure 6 - Figure Supplement 7 (NOTE: this needs to be renamed as Table 1 - Figure Supplement 1, which the eLife template does not allow). Here, we plot the number of associations for each amino acid against evolutionary rate, revealing a significant positive slope in Class I. We also added explanatory text for this figure in lines 400-404.
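
      The kind of fit described here (number of disease associations regressed on evolutionary rate) can be sketched with an ordinary least-squares line. The data points below are hypothetical and serve only to show the calculation; the sketch reports a slope and intercept and does not perform the significance test mentioned above.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical per-site values (illustration only, not from the paper).
relative_rate = [0.5, 1.0, 1.5, 2.0]   # relative evolutionary rate per amino acid
n_associations = [1, 2, 3, 4]          # number of disease associations per amino acid

slope, intercept = ols_slope_intercept(relative_rate, n_associations)
print(slope, intercept)  # a positive slope means faster sites carry more associations
```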

      Reviewer #2 (Public review):

      Summary

      In this study, the authors characterized population genetic variation in the MHC locus across primates and looked for signals of long-term balancing selection (specifically trans-species polymorphism, TSP) in this highly polymorphic region. To carry out these tasks, they used Bayesian methods for phylogenetic inference (i.e. BEAST2) and applied a new Bayesian test to quantify evidence supporting monophyly vs. transspecies polymorphism for each exon across different species pairs. Their results, although mostly confirmatory, represent the most comprehensive analyses of primate MHC evolution to date and novel findings or possible discrepancies are clearly pointed out. However, as the authors discuss, the available data are insufficient to fully capture primates' MHC evolution.

      Strengths of the paper include: using appropriate methods and statistically rigorous analyses; very clear figures and detailed description of the results and methods that make it easy to follow despite the complexity of the region and approach; a clever test for TSP that is then complemented by positive selection tests and the protein structures for a quite comprehensive study.

      That said, weaknesses include: lack of information about how many sequences are included and whether uneven sampling across taxa might result in some comparisons without evidence for TSP; frequent reference to the companion paper instead of summarizing (at least some of) the critical relevant information (e.g., how was orthology inferred?); no mention of the quality of sequences in the database and whether there are still potential effects of mismapping or copy number variation affecting the sequence comparison.

      To address these comments, we added Tables 2-4 to allow readers to more readily understand the data we included in each group. We refer to these tables in the introduction (line 95), in the “Data” section of the results (lines 128-129), and the “Data” section of the methods (lines 532-534).  We also added text (lines 216-219 and 250-252) to more explicitly point out that our method is conservative when few sequences are available.

      We also added a paragraph to the discussion which addresses data quality and mismapping issues (lines 473-499).

      We clarified the role of our companion paper (line 49-50) by changing “In our companion paper, we explored the relationships between the different classical and non-classical genes” to “In our companion paper, we built large multi-gene trees to explore the relationships between the different classical and non-classical genes.” We also changed the text in lines 97-99 from “In our companion paper, we compared genes across dozens of species and learned more about the orthologous relationships among them” to “In our companion paper, we built trees to compare genes across dozens of species. When paired with previous literature, these trees helped us infer orthology and assign sequences to genes in some cases.”

      Reviewer #3 (Public review):

      Summary

      The study uses publicly available sequences of classical and non-classical genes from a number of primate species to assess the extent and depth of TSP across the primate phylogeny. The analyses were carried out in a coherent and, in my opinion, robust inferential framework and provided evidence for ancient (even > 30 million years) TSP at several classical class I and class II genes. The authors also characterise evolutionary rates at individual codons, map these rates onto MHC protein structures, and find that the fastest evolving codons are extremely enriched for autoimmune and infectious disease associations.

      Strengths

      The study is comprehensive, relying on a large data set, state-of-the-art phylogenetic analyses and elegant tests of TSP. The results are not entirely novel, but a synthesis and re-analysis of previous findings is extremely valuable and timely.

      Weaknesses

      I've identified weaknesses in several areas (details follow in the next section):

      -  Inadequate description and presentation of the data used

      -  Large parts of the results read like extended figure captions, which breaks the flow.

      -  Older literature on the subject is duly cited, but the authors don't really discuss their findings in the context of this literature.

      -  The potential impact of mechanisms other than long-term maintenance of allelic lineages by balancing selection, such as interspecific introgression and incorrect orthology assessment, needs to be discussed.

      We address these comments in the more detailed section below.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      The abstract could benefit from being sharpened. A personal pet peeve is a common habit of saying we don't know everything about a topic (line 16 - "lack a full picture of primate MHC evolution"); we never know everything on a topic, so this is hardly a strong rationale to do more work on it. This is followed by "to start addressing this gap" - which is vague because you haven't explicitly stated any gap, you simply said we are not yet omniscient on the topic. Please clearly identify a gap in our knowledge, a question that you will be able to answer with this paper.

      That makes sense! We added another sentence to the abstract to make the specific gap clearer. Inserted “In particular, we do not know to what extent genes and alleles are retained across speciation events” in lines 16-17.

      Reviewer #2 (Recommendations for the authors):

      - Some discussion of alternative explanations when certain comparisons were not found to have TSP - is this consistent with genetic drift sometimes leading to lineage loss, or does it suggest that the proposed tradeoff between autoimmunity and pathogen recognition might differ depending on primates' life history and/or exposure to similar pathogens? Could the trade-off of pathogen to self-recognition not be as costly in some species?

      This is consistent with genetic drift, as no lineages are expected to be maintained across these distantly-diverged primates under neutral evolution. The other ideas are certainly possible, but our Bayes Factor test only reveals evidence (or lack thereof) for deviations from the species tree and cannot provide reasons why or why not.

      - It would be interesting to put these results on very long-term balancing selection in the context of what has been reported at the region for shorter term balancing selection. The discussion compares findings of previous genes in the literature but not regarding the time scale.

      Indeed, there is some evidence for the idea of “divergent allele advantage”, in which MHC-heterozygous individuals have a greater repertoire of peptides that they can present, leading to greater resistance against pathogens and greater fitness. This heterozygote advantage thus leads to balancing selection (Pierini and Lenz, 2018; Chowell et al., 2019). Our discussion mentions other time scales of balancing selection across the primates at the MHC and other loci, but we choose to focus more on long-term than short-term balancing selection.

      - Lines 223-226 - how is the difference in BF across exons in MHC-A to be interpreted? The paragraph is about MHC-A, but then the explanation in the last sentence is for when similar BF are observed which is not the case for MHC-A. Is this interpreted as lack of evidence for TSP? Or something about recombination or gene conversion? Or that one exon may be under balancing selection but not the other?

      Thank you for pointing out the confusing logic in this paragraph. 

      Previous: “For MHC-A, Bayes factors vary considerably depending on exon and species pair. Many sequences had to be excluded from MHC-A comparisons because they were identified as gene-converted in the \textit{GENECONV} analysis or were previously identified as recombinants \citep{Hans2017,Gleimer2011,Adams2001}. Importantly, for MHC-A we do not see concordance in Bayes factors across the different exons, whereas we do for the other gene groups. Similar Bayes factors across all exons for a given comparison is thus evidence in favor of TSP being the primary driver of the observed deep coalescence structure (rather than recombination or gene conversion).”

      Current (lines 228-238):

      “For MHC-A, Bayes factors vary considerably depending on exon and species pair. Past work suggests that this gene has had a long history of gene conversion affecting different exons, resulting in different evolutionary histories for different parts of the gene \citep{Hans2017,Gleimer2011,Adams2001}. Indeed, we excluded many MHC-A sequences from our Bayes factor calculations because they were identified as gene-converted in our \textit{GENECONV} analysis or were previously suggested to be recombinants. As shown in \FIG{bayes_factors_classI}, the lack of concordance in Bayes factors across the different exons for MHC-A is evidence for gene conversion, rather than balancing selection, being the most important factor in this gene's evolution. In contrast, the other gene groups generally show concordance in Bayes factors across exons. We interpret this as evidence in favor of TSP being the primary driver of the observed deep coalescence structure for MHC-B and -C (rather than recombination or gene conversion).”

      - In Figures 5C and 6C, the points sometimes show a kind of smile pattern of possibly higher rates further from the peptide. Did authors explore other fits like a polynomial? Or, whether distance only matters in close proximity to the peptide? Out of curiosity, is it possible to map substitution time/branch into the distance to the peptide binding region for each substitution? Is there any pattern with distance to interacting proteins in non-peptide binding MHC proteins like MHC-DOA? Although they don't have a PBR they do interact with other proteins.

      Thank you for these ideas! We did not explore other fits, such as a polynomial, because we wanted to implement the simplest model. Our evolutionary rates are relative, making parameters relatively meaningless. We were mainly concerned with positive or negative slopes and we leave the rest to the protein interaction experts.

      There is most likely a relationship between evolutionary rate and the distance to interacting proteins in the non-peptide-binding molecules MHC-DM and -DO. However, there are few currently available models and it is difficult to determine which residues in these models are actually interacting. Researchers with more experience in protein interactions would be able to undertake such an analysis.

      - How biased is the database towards human alleles? Could this affect some of the analyses, including the coincidence of rapidly evolving sites with associations? Are there more associations than expected under some null model?

      While the database is indeed biased toward human alleles, we included only a small subset of these in order to create a more balanced data set spanning the primates. This is unlikely to affect the coincidence of rapidly-evolving sites with associations; however, we note that there are no such association studies meeting our criteria in other species, meaning the associations are only coming from studies on humans.

      - To this reader, it is unnecessary and distracting to describe the figures within the text; there are frequent sentences in the text that belongs in the figure legend instead (e.g., lines 139-143, 208-211, 214-215, 328-330, etc). It would be better to focus on the results from the figures and then cite the figure, where the colors and exactly what is plotted can be in the figure legend.

      We appreciate these comments on overall flow. We removed lines 139-143 and lengthened the Figure 2 caption (and associated supplementary figure captions) to contain all necessary detail. We removed lines 208-211 and 214-215 and lengthened the captions for Figure 3, Figure 4, and associated supplementary figures. We removed a sentence from lines 303-304.  

      - I'm still concerned that the poor mappability of short-read data is contributing in some ways. Were the sequences in the database mostly from long-reads? Was nucleotide diversity calculated directly from the sequences in the database or from another human dataset? Is missing data at some sites accounted for in the denominator?

      The sequences in the database are mostly from short reads and come from a wide array of labs. We have added a paragraph to the discussion to explain the limitations of this (lines 473-499). However, the nucleotide diversity calculations shown in Figure 1 do not rely on the MHC database; rather, they are calculated from the human genomes in the 1000 Genomes project. Nucleotide diversity would be calculable for other species, but we did not do so for exactly the reason you mention–too much missing data.

      - The Figure 2 and Figure 3 supplements took me a little bit to understand - is it really worth pointing out the top 5 Bayes-factor comparisons when there is no evidence for TSP? A lot of the colored squares are not actually supporting TSP but in the grids you can't see which are and which aren't without looking at the Bayes Factor. I wonder if it would help if only those with BF > 100 were shown? Or if these were marked some other way so that it was easy to see where TSPs are supported.

      Thank you for your perspective on these figures! We initially limited them to only show >100 Bayes factors for each gene group and region, but some gene groups have no high Bayes factors. Additionally, the “summary” tree pictured in these figures is necessarily a simplification of the full space of posterior trees. We felt that showing low Bayes factor comparisons could help readers understand this relationship. For example, allele sets that look non-monophyletic on the summary tree may still have a low Bayes factor, showing that they are generally monophyletic throughout the larger (un-visualizable) space of trees.
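
      The relationship between the posterior tree sample and a Bayes factor can be sketched with the standard identity that a Bayes factor equals posterior odds divided by prior odds. The function below is a generic illustration under that identity; whether the manuscript's test is computed in exactly this way is an assumption, and the probabilities used are hypothetical.

```python
def bayes_factor(posterior_prob, prior_prob):
    """Bayes factor for a hypothesis against its complement.

    posterior_prob: fraction of posterior trees supporting the hypothesis
                    (e.g., non-monophyly of a species' alleles).
    prior_prob:     probability of the same hypothesis under the tree prior.
    Both must lie strictly between 0 and 1 so that the odds are finite."""
    for p in (posterior_prob, prior_prob):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must be strictly between 0 and 1")
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds

# Hypothetical numbers: non-monophyly in 30% of posterior trees, but the prior
# already favoured it at 90%, so the data actually argue against it (BF < 1).
print(bayes_factor(0.30, 0.90))
# Hypothetical numbers: strong posterior support against a sceptical prior.
print(bayes_factor(0.99, 0.01))
```

      The first case mirrors the point made above: a comparison can look non-monophyletic in a single summary tree and still yield a low Bayes factor once the whole posterior sample and the prior are taken into account.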

      Reviewer #3 (Recommendations for the authors):

      Specific comments

      Abstract

      I think the abstract would benefit from some editing. For example, one might get the impression that you equate allele sharing, which would normally be understood as sharing identical sequences, with sharing ancestral allelic lineages. This distinction is important because you can have many TSPs without sharing identical allele sequences. In l. 20 you write about "deep TSP", which requires either definition or reformulation. In l. 21-23 you seem to suggest that long-term retention of allelic lineages is surprising in the light of rapid sequence evolution - it may be, depending on the evolutionary scenarios one is willing to accept, but perhaps it's not necessary to float such a suggestion in the abstract where it cannot be properly explained due to space constraints? The last sentence needs a qualifier like "in some cases".

      Thank you for catching these! For clarity, we changed several words:

      ● “alleles” to “allelic lineages” in line 13

      ● “deep” to “ancient” in line 21

      ● “Despite” to “in addition to” in line 22

      ● Added “in some cases” to line 28

      Results - Overall, parts of the results read like extended figure captions. I understand that the authors want to make the complex figures accessible to the reader. However, including so much information in the text disrupts the flow and makes it difficult to follow what the main findings and conclusions are.

      We appreciate these comments on overall flow. We removed lines 139-143 and lengthened the Figure 2 caption (and associated supplementary figure captions) to contain all necessary detail. We removed lines 208-211 and 214-215 and lengthened the captions for Figure 3, Figure 4, and associated supplementary figures. We removed a sentence from lines 303-304.  

      l. 37-39 such a short sentence on non-classical MHC is necessarily an oversimplification, I suggest it be expanded or deleted.

      There is certainly a lot to say about each of these genes! While we do not have space in this paper’s introduction to get into these genes’ myriad functions, we added a reference to our companion paper in lines 40-41:

      “See the appendices of our companion paper \citep{Fortier2024a} for more detail.”

      These appendices are extensive, and readers can find details and references for literature on each specific gene there. In addition, several genes are mentioned in analyses further on in the results, and their specific functions are discussed in more detail when they arise.

      l. 47 -49 It would be helpful to briefly outline your criteria for selecting these 17 genes, even if this is repeated later.

      Thank you! For greater clarity, we changed the text (lines 50-52) from “Here, we look within 17 specific genes to characterize trans-species polymorphism, a phenomenon characteristic of long-term balancing selection.” to “Here, we look within 17 specific genes---representing classical, non-classical, Class I, and Class II---to characterize trans-species polymorphism, a phenomenon characteristic of long-term balancing selection.”

      l.85-87 I may be completely wrong, but couldn't problems with establishing orthology in some cases lead to false inferences of TSP, even in primates? Or do you think the data are of sufficient quality to ignore such a possibility? (you touch on this in pp. 261-264)

      Yes, problems with establishing orthology can lead to false inferences of TSP, and it has happened before. For example, older studies that used only exon 2 (binding-site-encoding) of the MHC-DRB genes inferred trees that grouped NWM sequences with ape and OWM sequences. Thus, they named these NWM genes MHC-DRB3 and -DRB5 to suggest orthology with ape/OWM MHC-DRB3 and -DRB5, and they also suggested possible TSP between the groups. However, later studies that used non-binding-site-encoding exons or introns noticed that these NWM sequences did not group with ape/OWM sequences (which now shared the same name), providing evidence against orthology. This illustrates that establishing orthology is critical before assessing TSP (as is comparing across regions). This is part of the reason we published a companion paper (https://doi.org/10.7554/eLife.103545.1), which clears up questions of orthology and supports the analyses we did in this paper. In cases where orthology was ambiguous, this also helped us to be conservative in our conclusions here. The problems with ambiguous gene assignment are also discussed in lines 488-499.

      l. 88-93 is the first place (others are pp. 109-118 and 460-484) where a fuller description of the data used would be welcome. It's clear that the amount of data from different species varies enormously, not only in the number of alleles per locus, but also in the loci for which polymorphism data are available. In such a synthesis study, one would expect at least a tabulation of the data used in the appendices and perhaps a summary table in the main article.

      l. 109-118 Again, a more quantitative summary of the data used, with reference to a table, would be useful.

      Thank you! To address these comments, we added Tables 2-4 to allow readers to more readily understand the data we included in each group. We refer to these tables in the introduction (line 95), in the “Data” section of the results (lines 128-129), and the “Data” section of the methods (lines 532-534). Supplementary Files listing the exact alleles and sequences used in each group are also included in the resubmission.

      l. 123-124 here you say that the definition of the "16 gene groups" is in the methods (probably pp. 471-484), but it would be useful to present an informative summary of your rationale in the introduction or here

      Thank you! We agree that it is helpful to outline these groups earlier. We have changed the paragraph in lines 123-135 from: 

      “We considered 16 gene groups and two or three different genic regions for each group: exon 2 alone, exon 3 alone, and/or exon 4 alone. Exons 2 and 3 encode the peptide-binding region (PBR) for the Class I proteins, and exon 2 alone encodes the PBR for the Class II proteins. For the Class I genes, we also considered exon 4 alone because it is comparable in size to exons 2 and 3 and provides a good contrast to the PBR-encoding exons. See the Methods for more detail on how gene groups were defined. Because few intron sequences were available for non-human species, we did not include them in our analyses.”

      To:

      “We considered 16 gene groups spanning MHC classes and functions. These include the classical Class I genes (MHC-A-related, MHC-B-related, MHC-C-related), non-classical Class I genes (MHC-E-related, MHC-F-related, MHC-G-related), classical Class IIA genes (MHC-DRA-related, MHC-DQA-related, MHC-DPA-related), classical Class IIB genes (MHC-DRB-related, MHC-DQB-related, MHC-DPB-related), non-classical Class IIA genes (MHC-DMA-related, MHC-DOA-related), and non-classical Class IIB genes (MHC-DMB-related, MHC-DOB-related). We studied two or three different genic regions for each group: exon 2 alone, exon 3 alone, and (for Class I) exon 4 alone. Exons 2 and 3 encode the peptide-binding region (PBR) for the Class I proteins, and exon 2 alone encodes the PBR for the Class II proteins. For the Class I genes, we also considered exon 4 alone because it is comparable in size to exons 2 and 3 and provides a good contrast to the PBR-encoding exons. Because few intron sequences were available for non-human species, we did not include them in our analyses.”

      l. 100 "alleles" -> "allelic lineages"

      Thank you for catching this. We have changed this language in line 104.

      l. 227-238 it's important to discuss the possible effect of the number of sequences available on the detectability of TSP - this is particularly important as the properties of MHC genealogies may differ considerably from those expected for neutral genealogies.

      This is a good point that may not be obvious to readers. We have added several sentences to clarify this:

      Line 193-194: “In a neutral genealogy, monophyly of each species' sequences is expected.”

      Line 213-219: “Note that the number of sequences available for comparison also affects the detectability of TSP. For example, if the only sequences available are from the same allelic lineage, they will coalesce more recently in the past than they would with alleles from a different lineage and would not show evidence for TSP. This means our method is well-suited to detect TSP when a diverse set of allele sequences are available, but it is conservative when there are few alleles to test. There were few available alleles for some non-classical genes, such as MHC-F, and some species, such as gibbon.”

      Line 244-246: “However, since there are fewer alleles available for the non-classical genes, we note that our method is likely to be conservative here.”

      l. 301 and 624-41 it's been difficult for me to understand the rationale behind using rates at mostly gap positions as the baseline and I'd be grateful for a more extensive explanation

      Normalizing the rates posed a difficult problem. We couldn’t include every single sequence in the same alignment because BEAST’s computational needs scale with the number of sequences. Therefore, we had to run BEAST separately on smaller alignments focused on a single group of genes at a time. We still wanted to be able to compare evolutionary rates across genes, but because of the way SubstBMA is implemented, evolutionary rates are relative, not absolute. Recall that to help us compare the trees, we included a common set of “backbone” sequences in all of the 16 alignments. This set included some highly-diverged genes. Initially, we planned to use 4-fold degenerate sites as the baseline sites for normalization, but there simply weren’t enough of them once we included the “backbone” set on top of the already highly diverse set of sequences in each alignment. This diversity presented an opportunity.  In BEAST, gaps are treated as missing and do not contribute any probability to the relevant branch or site (https://groups.google.com/g/beast-users/c/ixrGUA1p4OM/m/P4R2fCDWMUoJ?pli=1). So, we figured that sites that were “mostly gap” (a gap in all the human backbone sequences but with an insertion in some sequence) were mostly not contributing to the inference of the phylogeny or evolutionary rates. Because the “backbone” sequences are common to all alignments, making the “mostly gap” sites somewhat comparable across sets while not affecting inferred rates, we figured they would be a reasonable choice for the normalization (for lack of a better option).

      We added text to lines 680 and 691-693 to clarify this rationale.
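
      The normalization rationale above can be illustrated with a minimal sketch. It assumes per-site relative rates are already available as a list and that the alignment is a list of equal-length strings with "-" for gaps; the function names and the toy data are hypothetical and are not taken from the BEAST or SubstBMA code.

```python
from statistics import mean


def mostly_gap_sites(alignment, backbone_rows):
    """Return columns that are a gap in every backbone row
    but carry a residue (an insertion) in at least one sequence."""
    n_cols = len(alignment[0])
    sites = []
    for col in range(n_cols):
        backbone_all_gap = all(alignment[r][col] == "-" for r in backbone_rows)
        some_insertion = any(seq[col] != "-" for seq in alignment)
        if backbone_all_gap and some_insertion:
            sites.append(col)
    return sites


def normalize_rates(site_rates, baseline_sites):
    """Divide each relative per-site rate by the mean rate of the baseline sites."""
    if not baseline_sites:
        raise ValueError("need at least one baseline site")
    baseline = mean(site_rates[i] for i in baseline_sites)
    if baseline <= 0:
        raise ValueError("baseline rate must be positive")
    return [rate / baseline for rate in site_rates]


# Toy example: rows 0 and 1 play the role of backbone sequences.
alignment = ["A-CG", "A-CG", "ATCG"]
baseline_cols = mostly_gap_sites(alignment, backbone_rows=[0, 1])
fold_rates = normalize_rates([2.0, 0.5, 1.0, 4.0], baseline_cols)
```

      Because the same backbone sequences appear in every alignment, dividing by the mean rate at these columns puts the rates from separate runs on a roughly common scale, which is the property the response relies on.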

      l. 380-84 this overview seems rather superficial. Would it be possible to provide a more quantitative summary?

      To make this more quantitative, we plotted the number of associations for each amino acid against evolutionary rate, shown in Figure 6 - Figure Supplement 7 (NOTE: this needs to be renamed as Table 1 - Figure Supplement 1, which the template does not allow). This reveals a significant positive slope for the Class I genes, but not for Class II. We also added explanatory text for this figure in lines 400-404.

      Discussion - your approach to detecting TSP is elegant but deserves discussion of its limitations and, in particular, a clear explanation of why detecting TSP rather than quantifying its extent is more important in the context of this work. Another important point for discussion is alternative explanations for the patterns of TSP or, more broadly, gene tree - species tree discordance. Although long-term maintenance of allelic lineages due to long-term balancing selection is probably the most convincing explanation for the observed TSP, interspecific introgression and incorrect orthology assessment may also have contributed, and it would be good to see what the authors think about the potential contribution of these two factors.

      Overall, our goal was to use modern statistical methods and data to more confidently assess how ancient the TSP is at each gene. We have added several lines of text (as noted elsewhere in this document) to more clearly illustrate the limitations of our approach. We also agree that interspecific introgression and incorrect orthology assessment can cause similar patterns to arise. We attempted to minimize the effect of incorrect orthology assessment by creating multi-gene trees and exploring reference primate genomes, as described in our companion paper (https://doi.org/10.7554/eLife.103545.1), but cannot eliminate it completely. We have added a paragraph to the discussion to address this (lines 488-499). Interspecific introgression could also cause gene tree-species tree discordance, but we are not sure about how systematic this would have to be to cause the overall patterns we observe, nor about how likely it would have been for various clades of primates across the world.

      l. 421 -424 A more nuanced discussion distinguishing between positive selection, which facilitates the establishment of a mutation, and directional selection, which leads to its fixation, would be useful here.

      We added clarification to this sentence (line 443-445), from “Indeed, within the phylogeny we find that the most rapidly-evolving codons are substituted at around 2--4-fold the baseline rate.” to “Indeed, within the phylogeny we find that the most rapidly-evolving codons are substituted at around 2--4-fold the baseline rate, generating ample mutations upon which selection may act.”

      l. 432-434 You write here about the shaping of TCR repertoires, but I couldn't find any such information in the paper, including Table 1.

      We did not include a separate column for these, so they can be hard to spot. They take the form of “TCR 𝛽 Interaction Probability >50%”, “TCR Expression (TRAV38-1)”, or “TCR 𝛼 Interaction Probability >50%” and can be found in Table 1.

      l. 436-442 Here a more detailed discussion in the context of divergent allelic advantage and even the evolution of new S-type specificities in plants would be valuable.

      We added an additional citation to a review article to this sentence (lines 438-439).  

      l. 443 The use of the word "training" here is confusing, suggesting some kind of "education" during the lifetime of the animal.

      We agree that “train” is not an entirely appropriate term, and have changed it to “evolve” (line 465).

      489-491 What data were used for these calculations?

      Apologies for missing this citation! We used the 1000 genomes project data, and the citation has been updated (line 541-542).

    1. Author response:

      Reviewer 1:

      Concern 1: Figures 1I, 1J, and the whole of Figure 2 could be placed as supplementary figures. Also, for Figure 3E, it would be preferable to show the percentage of cells expressing cytokines rather than their absolute numbers. In fact, the drop in the numbers of cytokine-producing cells is probably due solely to the drop in total cell numbers and not to a decrease in the proportion of cells expressing cytokines. If this is the case, these data should be shown in supplementary figures. Finally, Figures 4 and 5 could be merged.

      We thank you for your recommendations. As rearranging figures is not critical to convey the data, we have decided to keep the figures and supplemental figures as they are currently presented.

      Concern 2a: It would be important to show the proportion of Treg, Tconv, and CD8 expressing Layilin in healthy skin and in patients developing psoriasis, as well as in the blood of healthy subjects.

      This data is published in a previous manuscript from our group. Please see Figure 1 in “Layilin Anchors Regulatory T Cells in Skin” (PMID: 34470859)

      Concern 2b: We lack information to be convinced that there is enrichment for migration and adhesion genes in Layilin+ Tregs in the GSEA data. The authors should indicate what geneset libraries they used. Indeed, it is tempting to show only the genesets that give results in line with the message you want to get across. If these genesets come from public banks, the bank used should be indicated, and the results of all gene sets shown in an unbiased way. In addition, it should be indicated whether the analyses were performed on untransformed or pseudobulk scRNAseq data analyses. Finally, it would be preferable to confirm the GSEA data with z-score analyses, as Ingenuity does, for example. Indeed, in GSEA-type analyses, there are genes that have activating but also inhibiting effects on a pathway in a given gene set.

      Given that we have already shown that layilin plays a major role in Treg and CD8+ T cell adhesion in tissues, we used a candidate approach for our GSEA. We tested the hypothesis that adhesion and motility pathways are enriched in Layilin-expressing Tregs. There was a statistically significant enrichment for these genes in Layilin+ Tregs compared to Layilin- Tregs, which we feel adequately tests our hypothesis.

      Concern 2c: For all FACS data, the raw data should be shown as histograms or dot plots for representative samples.

      We respect this concern. We omitted these plots because of space constraints.

      Concern 2d: For Figure 5B, the number of samples analyzed is insufficient to draw clear conclusions.

      We respectfully disagree. Three donors were used in a paired fashion (internally controlled), achieving statistical significance.

      Concern 3: For Figs. 4 and 5, the design of the experiment poses a problem. Indeed, the comparison between Layn+ and Layn- cells may, in part, not be directly linked to the expression or absence of expression of this protein. Indeed, Layn+ and Layn- Tregs may constitute populations with different biological properties, beyond the expression of Layn. However, in the experiment design used here, a significant fraction of the sorted Layn- Tregs will be cells belonging to the population that has never expressed this protein. It would have been preferable to sort first the Layn+ Tregs, then knock down this protein and re-sort the Layn- Tregs and Layn+ Tregs. If this experiment is too cumbersome to perform, I agree that the authors should not do it. However, it would be important to mention the point I have just made in the text.

      We agree. However, as the reviewer points out, these experiments are not logistically and practically feasible at this point. We do perform several experiments in this manuscript in which layilin is reduced via gene editing with results supporting our hypotheses.

      Reviewer 2:

      Some of the conclusions drawn by the authors must be treated with caution, as the experimental conditions were not always appropriate, leading to a risk of misinterpretation.

      We have been transparent with all our methods and data. We will leave it to the reader to determine the level of rigor and the robustness of the data.

      Reviewer 3:

      Weaknesses:

      (1) It is not clear that the assays used for functional analysis of the patient samples were optimal. (2) Several conclusions are not fully substantiated. (3) The report is lacking some experimental details.

      We have tried to be as comprehensive and thorough as possible. We feel that the data supports our conclusions. We will leave this to the reader to interpret and conclude.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Aicardi-Goutières Syndrome (AGS) is a genetic disorder that primarily affects the brain and immune system through excessive interferon production. The authors sought to investigate the role of microglia in AGS by first developing bone-marrow-derived progenitors in vitro that carry the estrogen-regulated (ER) Hoxb8 cassette, allowing them to expand indefinitely in the presence of estrogen and differentiate into macrophages when estrogen is removed. When injected into the brains of Csf1r-/- mice, which lack microglia, these cells engraft and resemble wild-type (WT) microglia in transcriptional and morphological characteristics, although they lack Sall1 expression. The authors then generated CRISPR-Cas9 Adar1 knockout (KO) ER-Hoxb8 macrophages, which exhibited increased production of inflammatory cytokines and upregulation of interferon-related genes. This phenotype could be rescued using a Jak-Stat inhibitor or by concurrently mutating Ifih1 (Mda5). However, these Adar1-KO macrophages fail to successfully engraft in the brain of both Csf1r-/- and Cx3cr1-creERT2:Csf1rfl/fl mice. To overcome this, the authors used a mouse model with a patient-specific Adar1 mutation (Adar1 D1113H) to derive ER-Hoxb8 bone marrow progenitors and macrophages. They discovered that Adar1 D1113H ER-Hoxb8 macrophages successfully engraft the brain, although at lower levels than WT-derived ER-Hoxb8 macrophages, leading to increased production of Isg15 by neighboring cells. These findings shed new light on the role of microglia in AGS pathology.

      Strengths:

      The authors convincingly demonstrate that ER-Hoxb8 differentiated macrophages are transcriptionally and morphologically similar to bone marrow-derived macrophages. They also show evidence that when engrafted in vivo, ER-Hoxb8 microglia are transcriptomically similar to WT microglia. Furthermore, ER-Hoxb8 macrophages engraft the Csf1r-/- brain with high efficiency and rapidly (2 weeks), showing a homogenous distribution. The authors also effectively use CRISPR-Cas9 to knock out TLR4 in these cells with little to no effect on their engraftment in vivo, confirming their potential as a model for genetic manipulation and in vivo microglia replacement.

      Weaknesses:

      The robust data showing the quality of this model at the transcriptomic level can be strengthened with confirmation at protein and functional levels. The authors were unable to investigate the effects of Adar1-KO using ER-Hoxb8 cells and instead had to rely on a mouse model with a patient-specific Adar1 mutation (Adar1 D1113H). Additionally, ER-Hoxb8-derived microglia do not express Sall1, a key marker of microglia, which limits their fidelity as a full microglial replacement, as has been rightfully pointed out in the discussion.

      Overall, this paper demonstrates an innovative approach to manipulating microglia using ER-Hoxb8 cells as surrogates. The authors present convincing evidence of the model's efficacy and potential for broader application in microglial research, given its ease of production and rapid brain engraftment potential in microglia-deficient mice. While Adar1-KO macrophages do not engraft well, the success of TLR4-KO line highlights the model's potential for investigating other genes. Using mouse-derived cells for transplantation reduces complications that can come with the use of human cell lines, highlighting the utility of this system for research in mouse models.

      Thank you for this thoughtful and balanced assessment. The major suggestion from Reviewer 1 was that confirmation of RNAseq data with protein or functional studies would add strength.  We provided protein staining by IHC for IBA1 in vivo, as well as protein staining by FACS for CD11B, CD45, and TMEM119 in vitro and in vivo.  For TLR4, we showed successful protein KO and blunted response to LPS (a TLR4 ligand) challenge, which we believe provides some protein and functional data to support the approach.  To bolster these data, we added staining for P2RY12 on brain-engrafted ER-Hoxb8s.

      Regarding the non-engraftment of Adar1 KO cells: because ADAR1 KO mice are embryonically lethal due to hematopoietic failure, we see the health impacts of Adar1 KO on ER-Hoxb8s as a strength of the transplantation model, enabling the assessment of ADAR1 global function in macrophages and microglia-like cells without generation of a transgenic mouse line. In addition, it was a surprise that the health impact occurs at the macrophage and not the progenitor stage, perhaps providing insight for future studies of ADAR1’s role in hematopoiesis. We were able to show a significant impact of complete loss of Adar1 on survival and engraftment, suggesting an important biological function of ADAR1. The macrophage-specific D1113H mutation, which affects part of the deaminase domain, shows that when the RNA deamination (but not the RNA binding) function of ADAR1 is disrupted, brain-wide interferonopathy results. This is very exciting to our group and hopefully the community, as astrocytes are thought to be a major driver of brain interferonopathy in patients with ADAR1 mutations. Instead, this suggests that disruption of brain macrophages is also a major contributor.

      Reviewer #2 (Public review):

      Summary:

      Microglia have been implicated in brain development, homeostasis, and diseases. "Microglia replacement" has gained traction in recent years, using primary microglia, bone marrow or blood-derived myeloid cells, or human iPSC-induced microglia. Here, the authors extended their previous work in the area and provided evidence to support: (1) Estrogen-regulated (ER) homeobox B8 (Hoxb8) conditionally immortalized macrophages from bone marrow can serve as stable, genetically manipulated cell lines. These cells are highly comparable to primary bone marrow-derived (BMD) macrophages in vitro, and, when transplanted into a microglia-free brain, engraft the parenchyma and differentiate into microglia-like cells (MLCs). Taking advantage of this model system, the authors created stable, Adar1-mutated ER-Hoxb8 lines using CRISPR-Cas9 to study the intrinsic contribution of macrophages to the Aicardi-Goutières Syndrome (AGS) disease mechanism.

      Strengths:

      The studies are carefully designed and well-conducted. The imaging data and gene expression analysis are carried out at a high level of technical competence and the studies provide strong evidence that ER-Hoxb8 immortalized macrophages from bone marrow are a reasonable source for "microglia replacement" exercise. The findings are clearly presented, and the main message will be of general interest to the neuroscience and microglia communities.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This is an elegant study, demonstrating both the utility and limitations of ER-Hoxb8 technology as a surrogate model for microglia in vivo. The manuscript is well-designed and clearly written, but authors should consider the following suggestions:

      (1) Validation of RNA hits at the protein level: To strengthen the comparison between ER-Hoxb8 macrophages and WT bone marrow-derived macrophages, validating several RNA hits at the protein level would be beneficial. As many of these hits are surface markers, flow cytometry could be employed for confirmation (e.g., Figure 1D, Figure 3E).

      In vitro, we show protein levels by flow cytometry for CD11B (ITGAM) and CD45 (PTPRC; Figure 1C), as well as TMEM119 (Supplemental Figure 2A) and TLR4 (Supplemental Figure 3C/D). In vivo, we show TMEM119 protein levels by flow cytometry (Figure 3A), as well as their CD11B/CD45 pregates (Supplemental Figure 2C), plus immunostaining for IBA1 (AIF1; Figure 2D). We now provide additional data showing P2RY12 immunostaining in brain-engrafted cells (Supplemental Figure 2B). 

      (2) The authors should consider testing the phagocytic capacity of ER-Hoxb8-derived macrophages to further validate their functionality.

      Thank you for the suggestion. We measured ER-Hoxb8 macrophage ability to engulf phosphatidylserine-coated beads that mimic apoptotic cells, compared with phosphatidylcholine-coated beads, now as new Supplemental Figure 1C/D. This agrees with existing literature showing efficient engulfment/phagocytosis by ER-Hoxb8-derived cells (Elhag et al., 2021).

      (3) For Figure 3E, incorporating a wild-type (WT) microglia reference would be beneficial to establish a baseline for comparison (e.g. including WT microglia data in the graph or performing a ratio analysis against WT expression levels).

      We agree - we now include bars representing our sequenced primary microglia data in Figure 3E as a comparison.  

      (4) Some statistical analyses may require refinement. Specifically, for Figure 4J, where the effects of Adar1 KO and Adar1 KO with Bari are compared, it would be more appropriate to use a two-way ANOVA.

      Thank you for noting this. We have now performed the more appropriate two-way ANOVA and included the updated results in Figure 4J and the corresponding Supplemental Figure 4G. Errors in the figure legends have also been corrected to reflect the statistical tests used.

      (5) Cx3cr1-creERT2 pups injected with tamoxifen: The authors could clarify the depletion ratio in these experiments before the engraftment and assess whether the depletion is global or regional. In comparison to Csf1r-/-, where TLR4-KO ER-Hoxb8 engraft globally, in Cx3cr1-creERT2, the engraftment seems more regional (Figure 5A vs Supplementary Figure 5B); is this due to the differences in depletion efficiency?

      This is an excellent question and observation, and one that we are very interested in, though that finding does not change the conclusions of this particular study. We find some region-specific differences in depletion early after tamoxifen injection, but all brain regions are >95% depleted by P7. For instance, in a recently published manuscript (Bastos et al., 2025) we find some differences in the depletion kinetics in the genetic model: by P3, we find 90% depletion in cortex with 50-60% in thalamus and hippocampus. In other studies, we typically deliver primary monocytes, and this is the first study where we report engraftment of ER-Hoxb8 cells in the inducible model. In this sense, it is possible that depletion kinetics may regionally affect engraftment, but future studies are required to more finely assess this point with ER-Hoxb8s, as it may change how these models are used in the future.

      Bastos et al., Monocytes can efficiently replace all brain macrophages and fetal liver monocytes can generate bonafide SALL1+ microglia, Immunity (2025), https://doi.org/10.1016/j.immuni.2025.04.006

      (6) It would be helpful for the authors to clarify whether Adar1 is predominantly expressed by microglia, especially since the study aims to show its role in dampening the interferon response.

      That’s a wonderful point. Adar1 is expressed by all brain cells, with highest transcript level in some neurons, astrocytes, and oligodendrocytes. It is an interferon-stimulated gene, and mutation itself leads to interferonopathy, we believe, due to poor RNA editing and detection of endogenous RNA as non-self by MDA5. We hope it can dampen the interferon response, but in the case of mutation, Adar1 is probably causal of interferonopathy.  It is induced in microglia upon systemic inflammatory challenge (LPS). We have edited the text to highlight its expression pattern.  See BrainRNAseq.org (Zhang*, Chen*, Sloan*, et al., 2014 and Bennett et al., 2016)

      Reviewer #2 (Recommendations for the authors):

      (1) There appears to be a morphological difference between wt and Adar1/Ifih1 double KO (dKO) cells in the engrafted brains (Figure 5). It would be good if the authors could systematically compare the morphology (e.g., soma size, number, and length of branches) of the engrafted MLCs between the wt and mutant cells.

      We agree. While cells did not differ in branch number or length, engrafted dKO cells had significantly larger somas compared with controls, which we now present in Figure S5A.

      (2) To fully appreciate the extent to which the engrafted ER-Hoxb8 immortalized macrophages resemble primary, engrafted yolk sac-myeloid cells, vs engrafted iPSC-induced microglia, it would be informative to provide a comparison of the RNAseq data derived from the engrafted ER-Hoxb8 immortalized macrophages with published transcriptomic data sets (e.g. Bennett et al. Neuron 2018; Chadarevian et al. Neuron 2024; Schafer et al. Cell 2023).

      Thank you for this suggestion. To address this, we provide our full dataset for additional experiments. To compare with a similar non-immortalized model, we compared top up- and down-regulated genes from our data to those of ICT yolk sac progenitor cells from our previous work (Bennett et al., 2018). We find overlap between brain-engrafted ER-Hoxb8-, bone marrow-, and yolk sac-derived cells (Supplemental Figure 2F, Supplemental Table 3).  

      Minor comments:

      Figure 6C: the red arrows indicating the zoomed-in regions do not match. It might be beneficial to provide bigger images with each channel for C and D as a Supplemental Figure.

      We fixed this in Figure 6C to show areas of interest in the cortex for both conditions. Figure S7A shows intermediate power images to aid in interpretation.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Rühling et al analyzes the mode of entry of S. aureus into mammalian cells in culture. The authors propose a novel mechanism of rapid entry that involves the release of calcium from lysosomes via NAADP-stimulated activation of TPC1, which in turn causes lysosomal exocytosis; exocytic release of lysosomal acid sphingomyelinase (ASM) is then envisaged to convert exofacial sphingomyelin to ceramide. These events not only induce the rapid entry of the bacteria into the host cells but are also described to alter the fate of the intracellular S. aureus, facilitating escape from the endocytic vacuole to the cytosol.

      Strengths:

      The proposed mechanism is novel and could have important biological consequences.

      Weaknesses:

      Unfortunately, the evidence provided is unconvincing and insufficient to document the multiple, complex steps suggested. In fact, there appear to be numerous internal inconsistencies that detract from the validity of the conclusions, which were reached mostly based on the use of pharmacological agents of imperfect specificity.

      We thank the reviewer for the detailed evaluation of our manuscript. We will address the criticism below.

      We agree with the reviewer that many of the experiments presented in our study rely on the usage of inhibitors. However, we want to emphasize that the main conclusion (invasion pathway affects the intracellular fate/phagosomal escape) was demonstrated without the use of inhibitors or genetic ablation in two key experiments (Figure 5D/E). These experiments were in line with the results we obtained with inhibitors (amitriptyline [Figure 4D], ARC39, PCK310 [Figure 4C] and Vacuolin-1 [Figure 4E]). Importantly, the hypothesis was also supported by another key experiment, in which we showed the intracellular fate of bacteria is affected by removal of SM from the plasma membrane before invasion, but not by removal of SM from phagosomal membranes after bacteria internalization (Figure 5A-C). Taken together, we thus believe that the main hypothesis is strongly supported by our data.

      Moreover, we either used different inhibitors for the same molecule (ASM was inhibited by ARC39, amitriptyline and PCK310 with similar outcome) or supported our hypothesis with gene-ablated cell pools (TPC1, Syt7, SARM1), as we will point out in more detail below.

      Firstly, the release of calcium from lysosomes is not demonstrated. Localized changes in the immediate vicinity of lysosomes need to be measured to ascertain that these organelles are the source of cytosolic calcium changes. In fact, 9-phenanthrol, which the authors find to be the most potent inhibitor of invasion and hence of the putative calcium changes, is not a blocker of lysosomal calcium release but instead blocks plasmalemmal TRPM4 channels. On the other hand, invasion is seemingly independent of external calcium. These findings are inconsistent with each other and point to non-specific effects of 9-phenanthrol. The fact that ionomycin decreases invasion efficiency is taken as additional evidence of the importance of lysosomal calcium release. It is not clear how these observations support involvement of lysosomal calcium release and exocytosis; in fact, treatment with the ionophore should itself have induced lysosomal exocytosis and stimulated, rather than inhibited invasion. Yet, manipulations that increase and others that decrease cytosolic calcium both inhibited invasion.

      With respect to lysosomal Ca²⁺ release, we agree with the reviewer that direct visual demonstration of lysosomal Ca²⁺ release upon infection will improve the manuscript. We therefore performed live cell imaging to visualize lysosomal Ca²⁺ release by a previously published method [1]. The approach is based on two dextran-coupled fluorophores that were incubated with host cells. The dyes are endocytosed and eventually stain the lysosomes. One of the dyes, Rhod-2, is Ca²⁺-sensitive and can be used to estimate the lysosomal Ca²⁺ content. The second dye, AF647, is Ca²⁺-insensitive and is used to visualize the lysosomes. A decrease in the Rhod-2/AF647 ratio within the lysosomes indicates lysosomal Ca²⁺ release. We monitored lysosomal Ca²⁺ content during S. aureus infection with this method (Author response image 1 and Author response video 1). However, the lysosomes are very dynamic, and it is challenging to monitor the fluorescence intensities over time. Thus, quantitative measurements are not possible with our methodology, and we decided to not include these data in the main manuscript. However, one could speculate that lysosomal Ca²⁺ content in the selected ROI (Author response image 1 and Author response video 1) is decreased upon attachment of S. aureus to the host cells as indicated by a decrease in Rhod-2/AF647 ratio.

      Author response image 1.

      Lysosomal Ca²⁺ imaging during S. aureus infection. The lysosomes of HuLEC were stained with two dextran-coupled fluorescent dyes: the Ca²⁺-sensitive dye Rhod-2 and the Ca²⁺-insensitive dye AF647. Cells were infected with fluorescent S. aureus JE2 and monitored by live cell imaging (see Author response video 1). The intensity of Rhod-2/AF647 was measured close to a S. aureus-host contact site. The ratio of Rhod-2 vs. AF647 fluorescence intensity was calculated.
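
      The ratiometric readout described above can be sketched in a few lines. This is only an illustration under stated assumptions: the per-frame ROI intensities are taken as plain lists of background-corrected values, the function names are hypothetical, and the numbers in the example are invented for demonstration and are not measurements from this study.

```python
def ratio_trace(rhod2, af647):
    """Per-frame ratio of the Ca2+-sensitive signal (Rhod-2)
    to the Ca2+-insensitive reference signal (AF647)."""
    if len(rhod2) != len(af647):
        raise ValueError("traces must have the same number of frames")
    return [r / a for r, a in zip(rhod2, af647)]


def normalize_to_baseline(ratios, n_baseline):
    """Express each ratio relative to the mean of the first n_baseline frames."""
    if not 0 < n_baseline <= len(ratios):
        raise ValueError("n_baseline must be between 1 and the trace length")
    baseline = sum(ratios[:n_baseline]) / n_baseline
    return [value / baseline for value in ratios]


# Invented intensities for one ROI; the reference channel stays constant
# while the Ca2+-sensitive channel drops after bacterial contact.
rhod2_roi = [100.0, 100.0, 80.0, 60.0]
af647_roi = [50.0, 50.0, 50.0, 50.0]
relative = normalize_to_baseline(ratio_trace(rhod2_roi, af647_roi), n_baseline=2)
```

      Dividing by the reference channel corrects for changes in dye amount within the ROI, such as lysosomes moving in or out of focus, so a falling normalized ratio is what would be read as loss of lysosomal Ca²⁺.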

      As to the TRPM4 involvement in S. aureus host cell internalization, it has been reported that TRPM4 is activated by cytosolic Ca²⁺. However, the channel conducts monovalent cations such as K⁺ or Na⁺ but is impermeable to Ca²⁺ [2, 3]. The following observations support this:

      i) S. aureus invasion is dependent on intracellular Ca²⁺, but is independent of extracellular Ca²⁺ (Figure 1A).

      ii) 9-phenanthrol treatment reduces S. aureus internalization by host cells, illustrating the dependence of this process on TRPM4 (data removed from the manuscript). We therefore hypothesize that TRPM4 is activated by Ca²⁺ released from lysosomes (see above).

      TRPM4 is localized to focal adhesions and is connected to the actin cytoskeleton [4, 5], a requisite of host cell entry of S. aureus [6, 7]. This speaks for an important function of TRPM4 in uptake of S. aureus in general, but TRPM4 does not necessarily have to be involved exclusively in the rapid uptake pathway.

      TRPM4 itself is not permeable to Ca²⁺ but is activated by the cation. Thus, it is unlikely to cause lysosomal exocytosis. The stronger reduction of bacterial uptake by treatment with 9-phenanthrol compared to Ned19 may thus be caused by the involvement of TRPM4 in additional pathways of S. aureus host cell entry involving the association of TRPM4 with focal adhesions or, as pointed out by the reviewer, by unspecific side effects of 9-phenanthrol that we currently cannot exclude. However, we think that experiments with 9-phenanthrol distract from the main story (lysosomal Ca²⁺ and exocytosis) and might be confusing for the reader. We thus removed all data and discussion concerning 9-phenanthrol in the revised manuscript.

      Regarding the reduced S. aureus invasion after ionomycin treatment, we agree with the reviewer that ionomycin is known to lead to lysosomal exocytosis as was previously shown by others8 as well as our laboratory[9}. 

      We hypothesized that pretreatment with ionomycin would trigger lysosomal exocytosis and thus would reduce the pool of lysosomes that can undergo exocytosis before host cells are contacted by S. aureus. As a result, we should observe a marked reduction of S. aureus internalization in such “lysosome-depleted cells”, if the lysosomal exocytosis is coupled to bacterial uptake. Our observation of reduced bacterial internalization after ionomycin treatment supports this hypothesis.

      However, ionomycin treatment and S. aureus infection of host cells are distinct processes.  

      While ionomycin results in strong, global and non-directional exocytosis of all “releasable” lysosomes (~5-10 % of all lysosomes according to previous observations)[8], we hypothesize that lysosomal exocytosis upon contact with S. aureus involves only a small proportion of lysosomes at host-bacteria contact sites. This is supported by experiments demonstrating that ~30% of the lysosomes that are released by ionomycin treatment are exocytosed during S. aureus infection (see below and Figure 2, A-C). We added these new data as well as a corresponding section to the discussion (line 563 ff). Moreover, we moved the data obtained with ionomycin to Figure 2E and described the idea behind this experiment more precisely (line 166 ff).

      The proposed role of NAADP is based on the effects of "knocking out" TPC1 and on the pharmacological effects of Ned-19. It is noteworthy that TPC2, rather than TPC1, is generally believed to be the primary TPC isoform of lysosomes. Moreover, the gene ablation accomplished in the TPC1 "knockouts" is only partial and rather unsatisfactory. Definitive conclusions about the role of TPC1 can only be reached with proper, full knockouts. Even the pharmacological approach is unconvincing because the high doses of Ned-19 used should have blocked both TPC isoforms and presumably precluded invasion. Instead, invasion is reduced by only ≈50%. A much greater inhibition was reported using 9-phenanthrol, the blocker of plasmalemmal calcium channels. How is the selective involvement of lysosomal TPC1 channels justified?

      As to partial gene ablation of TPC1: To avoid clonal variance, we usually perform pool sorting to obtain a cell population that predominantly contains cells deficient in the target gene (here, TPC1) but also a small proportion of wild-type cells, as seen by the residual TPC1 protein on the Western blot. We observe a significant reduction in bacterial uptake in this cell pool, suggesting that the uptake reduction in a pure K.O. population may be even more pronounced.

      As to the inhibition by Ned19: 

      The scale of invasion reduction upon Ned19 treatment (50%, Figure 1B) is comparable to the reduction caused by other compounds that influence the ASM-dependent pathway (such as amitriptyline, ARC39 [Figure 2G], BAPTA-AM [Figure 1A], Vacuolin-1 [Figure 2D], β-toxin [Figure 2L] and ionomycin [Figure 2E]). Further, the partial reduction of invasion is most likely due to the concurrent activity of multiple internalization pathways, not all of which are targeted by the compounds used and which we briefly discuss in the manuscript.

      We agree with the reviewer that Ned19 inhibits both TPC1 and TPC2. Since ablation of TPC1 reduced invasion of S. aureus, we concluded that TPC1 is important for S. aureus host cell invasion. We also agree with the reviewer that a role for TPC2 cannot be excluded. We clarified this in the revised manuscript (line 552). It needs to be noted, however, that deficiency in either TPC1 or TPC2 alone was sufficient to prevent Ebola virus infection[10], which is in line with our observations.

      In order to address the role of TPC2 for this review process, we were kindly provided with TPCN1/TPCN2 double knock-out HeLa cells by Norbert Klugbauer (Freiburg, Germany), which we tested for S. aureus internalization. We found that invasion was reduced in these cell lines, supporting a role of lysosomal Ca<sup>2<sup>+</sup></sup> release in S. aureus host cell entry and a role for both TPC channels (Author response image 2, see end of the document). Since we did not have a single TPCN2 knock-out available, we decided to exclude these data from the main manuscript.

      Author response image 2.

      Invasion efficiency is reduced in TPC1/TPC2 double K.O. HeLa cells. Invasion efficiency of S. aureus JE2 was determined in TPC1/TPC2 double K.O. cells after 10 and 30 min. Results were normalized to the parental HeLa WT cell line (set to 100 %).  
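      The normalization mentioned in this legend can be sketched as follows. The function name, dictionary layout and example numbers are hypothetical and serve only to show the arithmetic of setting the parental cell line to 100 %.

```python
def normalize_to_wt(invasion, wt_key="WT"):
    """Express invasion efficiencies relative to the wild type (WT = 100 %).

    invasion: dict mapping a cell line name to its raw invasion efficiency
              (for example, percent of recovered bacteria; assumed layout).
    wt_key:   name of the parental cell line used as the reference.
    """
    wt_value = invasion[wt_key]
    if wt_value <= 0:
        raise ValueError("wild-type reference must be positive")
    return {name: 100.0 * value / wt_value for name, value in invasion.items()}


# Hypothetical raw values, not data from the experiment
relative = normalize_to_wt({"WT": 8.0, "TPC1/TPC2 dKO": 4.0})
```

      In this illustration the double K.O. value would be reported as 50 % of the WT level.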

      Invoking an elevation of NAADP as the mediator of calcium release requires measurements of the changes in NAADP concentration in response to the bacteria. This was not performed. Instead, the authors analyzed the possible contribution of putative NAADP-generating systems and reported that the most active of these, CD38, was without effect, while the elimination of SARM1, another potential source of NAADP, had a very modest (≈20%) inhibitory effect that may have been due to clonal variation, which was not ruled out. In view of these data, the conclusion that NAADP is involved in the invasion process seems unwarranted.

      Our results from two independent experimental set-ups (Ned19 [Figure 1B] and TPC1 K.O. [Figure 1C & Figure 2N]) indicate the involvement of NAADP in the process. Together with the metabolomics unit at the Biocenter Würzburg, we attempted to measure cellular NAADP levels; however, this proved to be non-trivial and requires further optimization. Nevertheless, we can rule out clonal variation in the SARM1 mutant, since experiments were conducted with a cell pool, as described above, precisely to avoid the variation seen between single clones.

      The mechanism behind the biosynthesis of NAADP is still debated. CD38 was the first enzyme discovered to possess the ability to produce NAADP. However, it requires acidic pH to produce NAADP[11], which does not match the characteristics of a cytosolic NAADP producer. HeLa cells do not express CD38 and hence, it is not surprising that inhibition of CD38 had no effect on S. aureus invasion in HeLa cells. However, NAADP production by HeLa cells was observed in the absence of CD38[12]. Thus, CD38-independent NAADP generation is likely. SARM1 can produce NAADP at neutral pH[13] and is expressed in HeLa cells, thus providing a more promising candidate.

      We agree with the reviewer that the reduction of S. aureus internalization after ablation of SARM1 is less pronounced than in our other experiments. This may be explained by NAADP originating from other enzymes, such as the recently discovered DUOX1, DUOX2, NOX1 and NOX2[14], which, with the exception of DUOX2, are expressed at low levels even in HeLa cells. We added this to the discussion in the revised manuscript (line 579).

      We can, however, rule out clonal variation for the inhibitory effect. As stated above we generated K.O. cell pools specifically to avoid inherent problems of clonality. Thus, we also detect some residual wildtype cells within our cell pools.  

      The involvement of lysosomal secretion is, again, predicated largely on the basis of pharmacological evidence. No direct evidence is provided for the insertion of lysosomal components into the plasma membrane, or for the release of lysosomal contents to the medium. Instead, inhibition of lysosomal exocytosis by vacuolin-1 is the sole source of evidence. However, vacuolin-1 is by no means a specific inhibitor of lysosomal secretion: it is now known to act primarily as a PIKfyve inhibitor and to cause massive distortion of the endocytic compartment, including gross swelling of endolysosomes. The modest (20-25%) inhibition observed when using synaptotagmin 7 knockout cells is similarly not convincing proof of the requirement for lysosomal secretion.

      We agree with the reviewer that the manuscript will benefit from a functional analysis of lysosomal exocytosis and therefore conducted assays to investigate exocytosis for the revised manuscript. We previously i) showed by addition of specific antisera that LAMP1 is transiently exposed on the plasma membrane during ionomycin and pore-forming toxin challenge and ii) demonstrated the release of ASM activity into the culture medium under these conditions.[9] However, neither measurement is compatible with S. aureus infection, since LAMP1 antibodies are also bound non-specifically by protein A and other IgG-binding proteins on the S. aureus surface, which would bias the results. Since protein A may also serve as an adhesin in the investigated pathway, we cannot simply delete the ORF without changing other aspects of staphylococcal virulence. Further, FBS contains an ASM background activity that impedes activity measurements in cell culture medium. We previously removed this background activity by a specific heat-inactivation protocol.[9] However, S. aureus invasion is strongly reduced in culture medium containing this heat-inactivated FBS.

      We therefore developed a luminescence assay based on split NanoLuc luciferase that enables detection of LAMP1 exposed on the plasma membrane without usage of antibodies (Figure 2, A-C). We added a section on the assay in the revised manuscript. Briefly, we generated reporter cells by fusing a short peptide fragment of NanoLuc called HiBiT between the signal peptide and the mature luminal domain of LAMP1 and stably expressed the resulting protein in HeLa cells by lentiviral transduction. The LgBiT protein domain of NanoLuc luciferase (Promega) as well as the substrate Furimazine are added to the culture medium. HiBiT can reconstitute a functional NanoLuc with LgBiT and process Furimazine when lysosomes are exocytosed thereby generating luminescence measurable in a suitable plate reader. 

      With this assay we found that about 30% of the lysosomes that were “releasable” by treatment with ionomycin are exocytosed during S. aureus infection. Lysosomal exocytosis was strongly reduced (even below the levels of untreated controls) if we treated cells with Vacuolin-1 or Ned19.
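      One way such a proportion could be derived from plate-reader values is sketched below. This is an illustration under stated assumptions, not the analysis used for Figure 2, A-C: it assumes that the untreated control defines the baseline and that the ionomycin signal defines the full “releasable” pool; the function name and example numbers are hypothetical.

```python
def fraction_of_releasable(infected, untreated, ionomycin):
    """Return the share of the ionomycin-releasable pool exocytosed on infection.

    All arguments are luminescence readings in arbitrary units.
    Assumption: untreated cells give the baseline (0) and ionomycin-treated
    cells give the maximal releasable signal (1).
    """
    releasable = ionomycin - untreated
    if releasable <= 0:
        raise ValueError("ionomycin signal must exceed the untreated baseline")
    # Values below 0 mean exocytosis below the untreated control
    return (infected - untreated) / releasable


# Hypothetical readings, not data from the experiment
share = fraction_of_releasable(infected=1300.0, untreated=1000.0, ionomycin=2000.0)
```

      With this definition, a treatment that pushes the signal below the untreated control yields a negative value, which matches the way the inhibitor results are described above.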

      We agree with the reviewer that Vacuolin-1 has, to some extent, unspecific side effects, as has been shown by others; we addressed this in the revised version of the manuscript (line 541 ff). However, our new results with the HiBiT reporter cell line clearly demonstrate a reduction of lysosomal exocytosis after Vacuolin-1 treatment. Supported by this and our other results, we hypothesize that Vacuolin-1 decreases S. aureus internalization by inhibiting lysosomal exocytosis.

      As to the involvement of synaptotagmin 7: The effect of Syt7 K.O. on invasion was moderate in initial experiments, likely due to a high culture passage and presumably overgrowth of WT cells. However, reduction of invasion in Syt7 K.O.s was more pronounced in experiments with β-toxin complementation (Figure 2, N) and hence, we combined the two data sets (Figure 2, F). This demonstrates the reduction of bacterial invasion by ~40% in Syt7 K.O. cell pools. Moreover, Syt7 is not the only protein possibly involved in Ca<sup>2<sup>+</sup></sup>-dependent exocytosis. For instance, Syt1 has been shown to possess an overlapping function.[15] This may explain the differences between our Vacuolin-1 and Syt7 ablation experiments. We added this information to the discussion. 

      ASM is proposed to play a central role in the rapid invasion process. As above, most of the evidence offered in this regard is pharmacological and often inconsistent between inhibitors or among cell types. Some drugs affect some of the cells, but not others. It is difficult to reach general conclusions regarding the role of ASM. The argument is made even more complex by the authors' use of exogenous sphingomyelinase (beta-toxin). Pretreatment with the toxin decreased invasion efficiency, a seemingly paradoxical result. Incidentally, the effectiveness of the added toxin is never quantified/validated by directly measuring the generation of ceramide or the disappearance of SM.

      Although pharmacological inhibitors can have unspecific side effects, we want to emphasize that the inhibitors used in our study act on the enzyme ASM by completely different mechanisms. Amitriptyline is a so-called functional inhibitor of ASM (FIASMA), which induces the detachment of ASM from lysosomal membranes, resulting in degradation of the enzyme.[16] By contrast, ARC39 is a competitive inhibitor.[17, 18]

      There are no inconsistencies in our data obtained with ASM inhibitors. Amitriptyline and ARC39 both reduce the invasion of S. aureus in HuLEC, HuVEC and HeLa cells (Figure 2G). ARC39 needs a longer pre-incubation, since its uptake by host cells is slower (to be published elsewhere). We observe a different outcome in 16HBE14o- and Ea.Hy 926 cells, with 16HBE14o- even demonstrating a slightly increased invasion of S. aureus upon ARC39 treatment. Amitriptyline had no effect (Figure 2G). 

      Thus, the ASM-dependent S. aureus internalization is cell type/line specific, which we state in the manuscript. The molecular origin of these differences is unclear and will require further investigation, e.g. in testing cell lines for potential differences in surface receptors. In a separate study we have already developed a biotinylation-based approach to identify potential novel host cell surface interaction partners during S. aureus infection.[19]

      Moreover, both inhibitors affected the invasion dynamics (Figure 3D), phagosomal escape (Figure 4C and Figure 4D) and Rab7 recruitment (Figure 4A and Supp. Figure 4A-C) in a similar fashion. Proper inhibition of ASM by both compounds in all cell lines used was validated by enzyme assays (Supp. Figure 2H), which again suggests that the ASM-dependent pathway exists only in specific cell lines and also supports the conclusion that the observed effects are not unspecific side effects of the compounds. We clarified this in the revised manuscript.

      ASM is a key player in SM degradation and recycling. In a clinical context, deficiency in ASM results in so-called Niemann-Pick disease type A/B. The lipid profile of ASM-deficient cells is massively altered[20], which will result in severe side effects. Short-term inhibition by small molecules therefore has a clear benefit over the use of ASM K.O. cells. In order to satisfy the query of the reviewer, we generated two ASM K.O. cell pools (generated with two different sgRNAs) and tested these for S. aureus invasion efficiency (Figure 2, I). We did not observe differences in bacterial invasion between WT and K.O. cells. However, when we additionally treated the cells with ASM inhibitor, we observed a strongly reduced invasion in WT cells, while invasion efficiency in ASM K.O. cells was only slightly affected (Figure 2, J). We concluded that the reduced invasion observed in inhibitor-treated WT cells is predominantly due to the absence of ASM activity, while the small reduction observed in ARC39-treated ASM K.O.s is likely due to unspecific side effects.

      We performed lipidomics on these cells and demonstrated a strongly altered sphingolipid profile in ASM K.O. cells compared to untreated and inhibitor-treated WT cells (Figure 2, K). We speculate that other ASM-independent bacterial invasion pathways are upregulated in ASM K.O.s, thereby obscuring the effect contributed by the absence of ASM. We discussed this in the revised manuscript (line 518 ff).

      Moreover, we introduced the RFP-CWT escape marker into the ASM K.O. cells and measured phagosomal escape of S. aureus JE2 and Cowan I. The latter strain is non-cytotoxic and serves as a negative control, since it is known to possess a very low escape rate due to its inability to produce toxin. Again, we compared early invaders (infection for 10 min) with early+late invaders (infection for 30 min). As observed for JE2, “early invaders” possess lower escape rates than “early+late invaders”.

      We did not observe differences between WT and ASM K.O. cells if we infected for only 10 min. By contrast, we observed a lower escape rate in ASM K.O. compared to WT cells when we infected for 30 min (Author response image 3, see end of the document).

      However, we usually observe an increased phagosomal escape when we treat host cells with ASM inhibitors (Figure 4C and D). Reduced phagosomal escape of intracellular S. aureus in ASM K.O. cells may be caused by the altered sphingolipid profile (e.g., by interference with binding of bacterial toxins to phagosomal membranes or altered vesicular acidification). We hence think that these data are difficult to interpret, and clarification would require intense additional experimentation. Thus, we did not include these data in the manuscript.

      Author response image 3.

      Phagosomal escape rates were established in either HeLa wild-type or ASM K.O. cells expressing the phagosomal escape reporter RFP-CWT. Host cells were infected with the cytotoxic S. aureus strain JE2 or the non-cytotoxic strain Cowan I for 10 or 30 minutes, and escape rates were determined by microscopy 3 h p.i.

      As to the treatment with a bacterial sphingomyelinase:

      Treatment with the bacterial SMase (bSMase, here: β-toxin) was performed in two different ways:

      i) Pretreatment of host cells with β-toxin to remove SM from the host cell surface before infection. This removes the substrate of ASM from the cell surface prior to addition of the bacteria (Figure 2L, Figure 4A-C). Since SM is not present on the extracellular plasma membrane leaflet after treatment, a release of ASM cannot cause localized ceramide formation at the sites of lysosomal exocytosis. Similar observations were made by others.[21] 

      ii) Addition of bSMase to host cells together with the bacteria to complement for the absence of ASM (Figure 2N).  

      Removal of the ASM substrate before infection (i) prevents localized ASM-mediated conversion of SM to Cer during infection and resulted in a decreased invasion, while addition of the SMase during infection resulted in an increased invasion in TPC1 and Syt7 ablated cells. Thus, both experiments are consistent with each other and in line with our other observations. 

      Removal of SM from the plasma membrane by β-toxin was indirectly demonstrated by the absence of Lysenin recruitment to phagosomes/escaped bacteria when host cells were pretreated with the toxin before infection (Figure 5C). We also added another data set that demonstrates degradation of a fluorescent SM derivative upon β-toxin treatment of host cells (Supp Figure 2, M). In another publication, we recently quantified the effectiveness of β-toxin treatment, albeit with slightly longer treatment times (75 min vs. 3h).[22]

      To clarify our experimental approaches for the readership, we added an explanatory section to the revised manuscript (line 287 ff) and we also added a scheme in Figure 2M describing the experimental settings.

      As to the general conclusions regarding the role of ASM: ASM and lysosomal exocytosis have been shown to be involved in the uptake of a variety of pathogens[21, 23-27], supporting their role in the process.

      The use of fluorescent analogs of sphingomyelin and ceramide is not well justified and it is unclear what conclusions can be derived from these observations. Despite the low resolution of the images provided, it appears as if the labeled lipids are largely in endomembrane compartments, where they would presumably be inaccessible to the secreted ASM. Moreover, considering the location of the BODIPY probe, the authors would be unable to distinguish intact sphingomyelin from its breakdown product, ceramide. What can be concluded from these experiments? Incidentally, the authors report only 10% of BODIPY-positive events after 10 min. What are the implications of this finding? That 90% of the invasion events are unrelated to sphingomyelin, ASM, and ceramide?

      During the experiments with fluorescent SM analogues (Figure 3a,b), S. aureus was added to the samples immediately before the start of video recording. Hence, bacteria slowly trickle onto the host cells, and we can thus image the initial contact between host cells and bacteria. For instance, the bacteria depicted in Figure 3A contact the host cell about 9 min before becoming BODIPY-FL-positive (see Supp. Video 1, 55 min). Hence, in these cases we see the formation of phagosomes around bacteria rather than bacteria in endomembrane compartments. Since generation of phagosomes happens at the plasma membrane, SM is accessible to secreted ASM.

      The “trickling” approach to infection is an experimental difference from our invasion measurements, in which we synchronized the infection by centrifugation. This ensures that all bacteria have contact with host cells and are not just floating in the culture medium. However, live cell imaging of initial bacterial-host contact and synchronization of infection are hard to combine technically.

      In our invasion measurements (with synchronization), we typically see internalization of ~20% of all added bacteria after 30 min. Hence, most bacteria that are visible in our videos are likely still extracellular, and only a small proportion has been internalized. This explains why only 10% of total bacteria are positive for BODIPY-FL-SM after 10 min. The proportion of internalized bacteria that are positive for BODIPY-FL-SM should be considerably higher but cannot be determined with this method.

      We agree with the reviewer that we cannot observe conversion of BODIPY-FL-SM by ASM. In order to do that, we attempted to visualize the conversion of a visible-range SM FRET probe (Supp. Figure 3), but the structure of the probe is not compatible with measurement of conversion on the plasma membrane, since the FITC fluorophore is released into the culture medium by the ASM activity and is thereby lost for imaging. In general, the visualization of SM conversion with subcellular resolution is challenging, and even with novel tools developed in our lab[28], visualization of SM on the plasma membrane is difficult.

      The conclusions we draw from these experiments are that i) S. aureus invasion is associated with SM and ii) SM-associated invasion can be very fast, since bacteria are rapidly engulfed by BODIPY-FL-SM-containing membranes.

      It is also unclear how the authors can distinguish lysenin entry into ruptured vacuoles from the entry of RFP-CWT, used as a criterion of bacterial escape. Surely the molecular weights of the probes are not sufficiently different to prevent the latter one from traversing the permeabilized membrane until such time that the bacteria escape from the vacuole.

      We here want to clarify that both Lysenin as well as the CWT reporter have access to ruptured vacuoles (Figure 4B). We used the Lysenin reporter in these experiments for estimation of SM content of phagosomal membranes. If a vacuole is ruptured, both the bacteria and the luminal leaflet of the phagosomal membrane remnants get in contact with the cytosol and hence with the cytosolically expressed reporters YFP-Lysenin as well as RFP-CWT resulting in “Lysenin-positive escape” when phagosomes contained SM (see Figure 5C). By contrast, either β-toxin expression by S. aureus or pretreatment with the bSMase resulted in absence of Lysenin recruitment suggesting that the phagosomal SM levels were decreased/undetectable (Figure 5C, Supp Figure 6F, G, I, J).

      Although this approach does not enable a quantitative measurement of phagosomal SM, this method is sufficient to show that β-toxin expression and pretreatment result in markedly decreased phagosomal SM levels in the host cells.

      The approach we used here to analyze “Lysenin-positive escape” can clearly be distinguished from Lysenin-based methods that were used by others.[29] There, Lysenin was used to show trans-bilayer movement of SM before rupture of bacteria-containing phagosomes.

      To clarify the function of Lysenin in our approach, we added additional figures (Figure 4F, Supp. Figure 5) and a movie (Supp. Video 4) to the revised manuscript.

      Both SMase inhibitors (Figure 4C) and SMase pretreatment increased bacterial escape from the vacuole. The former should prevent SM hydrolysis and formation of ceramide, while the latter treatment should have the exact opposite effects, yet the end result is the same. What can one conclude regarding the need and role of the SMase products in the escape process?

      As pointed out above, pretreatment of host cells with SMase removes SM from the plasma membrane, so that ASM does not have access to its substrate. Thus, both treatment with ASM inhibitors and pretreatment with bacterial SMase prevent ASM from being active on the plasma membrane and thereby block the ASM-dependent uptake (Figure 2 G, L). Although overall fewer bacteria were internalized by host cells under these conditions, the bacteria that invaded host cells did so in an ASM-independent manner.

      Since blockage of the ASM-dependent internalization pathway (with ASM inhibitor [Figure 4C, D], SMase pretreatment [Figure 5B] and Vacuolin-1 [Figure 4E]) always resulted in enhanced phagosomal escape, we conclude that bacteria that were internalized in an ASM-independent fashion cause enhanced escape. Vice versa, bacteria that enter host cells in an ASM-dependent manner demonstrate lower escape rates.

      This is supported by comparing the escape rates of “early” and “late” invaders [Figure 5D, E], which in our opinion is a key experiment that supports this hypothesis. The “early” invaders are predominantly ASM-dependent (see e.g. Figure 3E) and thus, bacteria that entered host cells in the first 10 min of infection should have been internalized predominantly in an ASM-dependent fashion, while slower entry pathways are active later during infection. The early ASM-dependent invaders possessed lower escape rates, which is in line with the data obtained with inhibitors (e.g. Figure 4C, D).

      We hypothesize that the activity of ASM on the plasma membrane during invasion mediates the recruitment of a specific subset of receptors, which then influences downstream phagosomal maturation and escape. This hypothesis is supported by the fact that the subset of receptors interacting with S. aureus is altered upon inhibition of the ASM-dependent uptake pathway. We describe this in another study that is currently under evaluation elsewhere.  

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Ruhling et al propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca<sup>2<sup>+</sup></sup> and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry.

      The evidence provided is solid, methods used are appropriate and results largely support their conclusions, but can be substantiated further as detailed below. The weakness is a reliance on chemical inhibitors that can be non-specific to delineate critical steps.

      Specific comments:

      A large number of experiments rely on treatment with chemical inhibitors. While this approach is reasonable, many of the inhibitors employed such as amitriptyline and vacuolin1 have other or nondefined cellular targets and pleiotropic effects cannot be ruled out. Given the centrality of ASM for the manuscript, it will be important to replicate some key results with ASM KO cells.

      We thank the reviewer for the critical evaluation of our manuscript and the many constructive comments.

      We agree with the reviewer, that ASM inhibitors such as functional inhibitors of ASM (FIASMA) like amitriptyline used in our study have unspecific side effects given their mode-of-action. FIASMAs induce the detachment of ASM from lysosomal membranes resulting in degradation of the enzyme.[16]  However, we want to emphasize that we also used the competitive inhibitor ARC39 in our study[17, 18] which acts on the enzyme by a completely different mechanism. All phenotypes (reduced invasion [Figure 2G], effect on invasion dynamics [Figure 3D], enhanced escape [Figure 4C, D] and differential recruitment of Rab7 [Supp. Figure 4A-C]) were observed with both inhibitors thereby supporting the role of ASM in the process.  

      We further agree that experiments with genetic evidence usually support and improve scientific findings. However, ASM is a key cellular player in SM degradation and recycling. In a clinical context, deficiency in ASM results in so-called Niemann-Pick disease type A/B. The lipid profile of ASM-deficient cells is massively altered[20], which in itself will result in severe side effects. Thus, the use of inhibitors provides a clear benefit over ASM K.O. cells, since ASM activity can be targeted in a short-term fashion, thereby preventing larger alterations in cellular lipid composition.

      We nevertheless generated two ASM K.O. cell pools (generated with two different sgRNAs) and tested them for invasion efficiency (Figure 2, I). Here, we did not observe differences between WT and mutants. However, if we additionally treated the cells with ASM inhibitor, we observed a strongly reduced invasion in WT cells, while invasion efficiency in ASM K.O. cells was only slightly affected (Figure 2, J). We concluded that the reduced invasion observed in WT cells upon inhibitor treatment is predominantly due to inhibition of ASM, whereas the small reduction observed in ARC39-treated ASM K.O.s is likely due to unspecific side effects. We also demonstrated a strongly altered sphingolipid profile in ASM K.O. cells when compared to untreated and inhibitor-treated WT cells (new Figure 2, K). We speculate that other ASM-independent invasion pathways are upregulated in ASM K.O.s, thereby making up for the absence of ASM. We discuss this in the revised manuscript (line 518 ff).

      We introduced the RFP-CWT escape marker into the ASM K.O. cells and measured phagosomal escape of S. aureus JE2 and Cowan I (Author response image 3). The latter serves as a negative control, since it is known to possess a very low escape rate due to its inability to produce toxin. Again, we compared early invaders (infection for 10 min) with early+late invaders (infection for 30 min). As seen before for JE2, early invaders possess lower escape rates than early+late invaders. We did not observe differences between WT and K.O. cells if we infected for 10 min. By contrast, we observed a lower escape rate in ASM K.O. compared to WT cells when we infected for 30 min. However, we usually observe an increased phagosomal escape when we treat host cells with ASM inhibitors (Figure 4C and D). We think that the reduced phagosomal escape in ASM K.O. cells is caused by the altered sphingolipid profile, which could have versatile effects (e.g., interference with binding of bacterial toxins to phagosomal membranes or changes in acidification). We hence think that these data are difficult to interpret, and clarification would require intense additional experimentation. Thus, we did not include these data in the manuscript.

      Most experiments are done in HeLa cells. Given the pathway is projected as generic, it will be important to further characterize cell type specificity for the process. Some evidence for a similar mechanism in other cell types S. aureus infects, perhaps phagocytic cell type, might be good. 

      Whenever possible we performed the experiments not only in HeLa but also in HuLECs. For example, we refer to experiments concerning the role of Ca<sup>2<sup>+</sup></sup> (Figure 1A/Supp. Figure 1A), lysosomal Ca<sup>2<sup>+</sup></sup>/Ned19 (Figure 1B/Supp. Figure 1C), lysosomal exocytosis/Vacuolin-1 (Figure 2D/Supp. Figure 2D), ASM/ARC39 and amitriptyline (Figure 2G), surface SM/β-toxin (Figure 2L/Supp. Figure 2L), analysis of invasion dynamics (complete Figure 3) and measurement of cell death during infection (Figure 6C+E, Supp. Figure 8A+B).

      HuLECs, however, are not readily amenable to genetic manipulation. Hence, we were not able to generate gene deletions in these cells, and upon introduction of the fluorescent escape reporter the cells do not grow readily.

      As to ASM involvement in phagocytic cells: a role for ASM during the uptake of S. aureus by macrophages was previously reported by others.[25] However, in professional phagocytes S. aureus does not escape from the phagosome and replicates within the phagosome.[30]

      I'm a little confused about the role of ASM on the surface. Presumably, it converts SM to ceramide, as the final model suggests. Overexpression of b-toxin results in the near complete absence of SM on phagosomes (having representative images will help appreciate this), but why is phagosomal SM detected at high levels in untreated conditions? If bacteria are engulfed by SM-containing membrane compartments, what role does ASM play on the surface? If surface SM is necessary for phagosomal escape within the cell, do the authors imply that ASM is tuning the surface SM levels to a certain optimal range? Alternatively, can there be additional roles for ASM on the cell surface? Can surface SM levels be visualized (for example, in Figure 4 E, F)?

      We initially hypothesized that we would detect higher phagosomal SM levels upon inhibition of ASM, since our model suggests SM cleavage by ASM on the host cell surface during bacterial cell entry. However, we did not detect any changes in our experiments (Supp. Figure 4F). We currently favor the following explanation: SM is the most abundant sphingolipid in human cells.[31] If peripheral lysosomes are exocytosed and thereby release ASM, only a localized and relatively small proportion of SM may get converted to Cer, which most likely is below our detection limit. In addition, the detection of cytosolically exposed phagosomal SM by YFP-Lysenin is not quantitative and provides a “Yes or No” measurement. Hence, we think that the rather limited SM to Cer conversion in combination with the high abundance of SM in cellular membranes does not visibly affect the recruitment of the Lysenin reporter.

      In our experiments that employ BODIPY-FL-SM (Figure 3a+b), we cannot distinguish between native SM and downstream metabolites such as Cer. Hence, again we cannot make any assumptions on the extent to which SM is converted on the surface during bacterial internalization. Although our laboratory recently used trifunctional sphingolipid analogs to analyze the SM to Cer conversion[22], the visualization of this process on the plasma membrane is currently still challenging.

      Overall, we hypothesize that the localized generation of Cer on the surface by released ASM leads to generation of Cer-enriched platforms. Subsequently, a certain subset of receptors may be recruited to these platforms and influence the uptake process. These platforms are supposed to be very small, which also would explain that we did not detect changes in Lysenin recruitment.

      Related to that, why is ASM activity on the cell surface important? Its role in non-infectious or other contexts can be discussed.

      ASM release by lysosomal exocytosis is implicated in plasma membrane repair upon injury. We added a short description of the role of extracellular ASM in the introduction (line 35).

      If SM removal is so crucial for uptake, can exocytosis of lysosomes alone provide sufficient ASM for SM removal? How much or to what extent is lysosomal exocytosis enhanced by initial signaling events? Do the authors envisage the early events in their model happening in localized confines of the PM, this can be discussed.

      Ionomycin treatment led to a release of ~10% of all lysosomes and also increased extracellular ASM activity.[8, 9] In the revised manuscript, we developed an assay to determine lysosomal exocytosis during S. aureus infection (Figure 2, A-C). During infection, we detected lysosomal exocytosis of ~30% relative to ionomycin treatment. Since this is only a fraction of the “releasable lysosomes”, we assume that the effects (lysosomal Ca<sup>2+</sup> liberation, lysosomal exocytosis and ASM activity) are very localized and take place only at host-pathogen contact sites (see also above). We discuss this in the revised manuscript (line 563 ff). To our knowledge it is currently unclear to which extent the released ASM affects surface SM levels. We attempted to visualize the local ASM activity on the cell surface by using a visible range FRET probe (Supp. Fig. 3). Cleavage of the probe by ASM on the surface leads to release of FITC into the cell culture medium, which does not contribute a measurable signal at the surface.

      How are inhibitor doses determined? How efficient is the removal of extracellular bacteria at 10 min? It will be good to substantiate the cfu experiments for infectivity with imaging-based methods. Are the roles of TPC1 and TPC2 redundant? If so, why does silencing TPC1 alone result in a decrease in infectivity? For these and other assays, it would be better to show raw values for infectivity. Please show alterations in lysosomal Ca<sup>2+</sup> at the doses of inhibitors indicated. Is lysosomal Ca<sup>2+</sup> released upon S. aureus binding to the cell surface? Will be good to directly visualize this.

      Concerning the inhibitor concentrations, we either used values established in published studies or recommendations of the suppliers (e.g. 2-APB, Ned19, Vacuolin-1). For ASM inhibitors, we determined proper inhibition of ASM by activity assays. Concentrations of ionomycin resulting in Ca<sup>2+</sup> influx and lysosomal exocytosis were determined in earlier studies from our lab.[9, 32]

      As to the removal of bacteria at 10 min p.i.: Lysostaphin is very efficient for removal of extracellular S. aureus and sterilizes the tissue culture supernatant. It significantly lyses bacteria within a few minutes, as determined by turbidity assays.[33]

      As to imaging-based infectivity assays: We performed imaging-based invasion assays to show reduced invasion efficiency with two ASM inhibitors in the revised manuscript with similar results as obtained by CFU counts (Supp. Figure 2, J).

      Regarding the roles of TPC1 and TPC2: from our data we cannot conclude whether the roles of TPC1 and TPC2 are redundant. One could speculate that, since blockage of TPC1 alone is sufficient to reduce internalization of bacteria, both channels may have distinct roles. On the other hand, there might be a Ca<sup>2+</sup> threshold for initiating lysosomal exocytosis that can only be attained if TPC1 and TPC2 are activated in parallel. Thus, our observations are in line with another study that shows reduced Ebola virus infection in absence of either TPC1 or TPC2.[34] In order to address the role of TPC2 for this review process, we were kindly gifted TPCN1/TPCN2 double knock-out HeLa cells by Norbert Klugbauer (Freiburg, Germany), which we tested for S. aureus internalization. We found that invasion was reduced in these double KO cells, which further supports a role of lysosomal Ca<sup>2+</sup> release in S. aureus host cell entry (Author response image 2, see end of the document). Since we did not have a single TPCN2 knockout available, we decided to exclude these data from the main manuscript.

      As to raw CFU counts: whereas the observed effects upon blocking the invasion of S. aureus are stable, the number of internalized bacteria varies between individual biological replicates, for instance, by differences in host cell fitness or growth differences in bacterial cultures, which are prepared freshly for each experiment.

      With respect to visualization of lysosomal Ca<sup>2+</sup> release: we agree with the reviewer that direct visual demonstration of lysosomal Ca<sup>2+</sup> release upon infection would improve the manuscript. We therefore performed live cell imaging to visualize lysosomal Ca<sup>2+</sup> release by a previously published method.[1] The approach is based on two dextran-coupled fluorophores that were incubated with host cells. The dyes are endocytosed and eventually stain the lysosomes. One of the dyes, Rhod-2, is Ca<sup>2+</sup>-sensitive and can be used to estimate the lysosomal Ca<sup>2+</sup> content. The second dye, AF647, is Ca<sup>2+</sup>-insensitive and is used to visualize the lysosomes. If the ratio Rhod-2/AF647 within the lysosomes is decreasing, lysosomal Ca<sup>2+</sup> release is indicated. We monitored lysosomal Ca<sup>2+</sup> content during S. aureus infection with this method (Author response image 1 and Author response video 1). However, the lysosomes are very dynamic, and it is challenging to monitor the fluorescence intensities over time. Thus, quantitative measurements are not possible with our methodology, and we decided to not include these data in the final manuscript. However, one could speculate that lysosomal Ca<sup>2+</sup> content in the selected ROI (Author response image 1 and Author response video 1) is decreased upon attachment of S. aureus to the host cells as indicated by a decrease in Rhod-2/AF647 ratio.
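      The ratiometric readout described above can be illustrated with a minimal sketch. It is not the analysis used for the author response images: the function name, the per-frame ROI intensity lists, and the choice of three baseline frames are assumptions made purely for illustration. The sketch divides the Ca<sup>2+</sup>-sensitive Rhod-2 signal by the Ca<sup>2+</sup>-insensitive AF647 signal frame by frame and normalizes to the mean of the first frames, so that values below 1 indicate a drop in the ratio.

```python
def normalized_ratio(rhod2, af647, baseline_frames=3):
    """Return the Rhod-2/AF647 ratio per frame, normalized to a baseline.

    rhod2, af647: hypothetical per-frame mean ROI intensities (same length).
    baseline_frames: number of initial frames averaged as the baseline
    (an illustrative choice, not a value taken from the study).
    """
    if len(rhod2) != len(af647):
        raise ValueError("both channels need the same number of frames")
    if not 0 < baseline_frames <= len(rhod2):
        raise ValueError("baseline_frames must fit within the recording")
    if any(a <= 0 for a in af647):
        raise ValueError("AF647 reference intensities must be positive")

    # Ratio of the Ca2+-sensitive dye to the Ca2+-insensitive reference dye.
    ratios = [r / a for r, a in zip(rhod2, af647)]
    baseline = sum(ratios[:baseline_frames]) / baseline_frames
    if baseline == 0:
        raise ValueError("baseline ratio is zero; cannot normalize")
    # Values below 1.0 mean the ratio fell relative to the baseline frames.
    return [x / baseline for x in ratios]


# Made-up intensities: Rhod-2 falls while AF647 stays constant.
trace = normalized_ratio([100, 100, 100, 80, 60], [50, 50, 50, 50, 50])
```

      Normalizing to the reference dye corrects for differences in dye loading between lysosomes, but it does not address the tracking problem noted above: if a lysosome moves out of the ROI, both channels change and the ratio is no longer interpretable.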

      The precise identification of cytosolic vs phagosomal bacteria is not very easy to appreciate. The methods section indicates how this distinction is made, but how do the authors deal with partial overlaps and ambiguities generally associated with such analyses? Please show respective images.

      The number of events (individual bacteria) for the live cell imaging data should be clearly mentioned.

      We apologize for not having sufficiently explained the technology to detect escaped S. aureus. The cytosolic location of S. aureus is indicated by recruitment of RFP-CWT.[35] CWT is the cell wall targeting domain of lysostaphin, which efficiently binds to the pentaglycine cross bridge in the peptidoglycan of S. aureus. This reporter is exclusively and homogeneously expressed in the host cytosol. Only upon rupture of phagoendosomal membranes can the reporter be recruited to the cell wall of now cytosolically located bacteria. S. aureus mutants, for instance in the agr quorum sensing system, cannot break down the phagosomal membrane in non-professional phagocytes and thus stay unlabeled by the CWT reporter.[35] We include several images (Figure 4F, Supp. Figure 5) and movies (Supp. Video 4) of escape events in the revised manuscript. The bacteria numbers for live cell experiments are now shown in Supp. Figure 7.

      In the phagosome maturation experiments, what is the proportion of bacteria in Rab5 or Rab7 compartments at each time point? Will the decreased Rab7 association be accompanied by increased Rab5? Showing raw values and images will help appreciate such differences. Given the expertise and tools available in live cell imaging, can the authors trace Rab5 and Rab7 positive compartment times for the same bacteria?

      We included the proportion of Rab7-associated bacteria in the revised manuscript (Supp. Figure 4A and C) and also briefly mention these proportions in the text (line 353). Usually, we observe that Rab5 is only transiently (for a few minutes) present on phagosomes and only afterwards do the phagosomes become positive for Rab7. We do not think that a decrease in Rab7-positive phagosomes would increase the proportion of Rab5-positive phagosomes. However, we cannot exclude this hypothesis with our data.

      We can trace the recruitment of Rab5/Rab7 to individual bacteria only manually, which impedes a quantitative evaluation. However, we included a video (Supp. Video 3) that illustrates the consecutive recruitment of the GTPases.

      The results with longer-term infection are interesting. Live cell imaging suggests that ASM-inhibited cells show accelerated phagosomal escape that reduces by 6 hpi. Where are the bacteria at this time point? Presumably, they should have reached lysosomes. The relationship between cytosolic escape, replication, and host cell death is interesting, but the evidence, as presented, is correlative for the populations. Given the use of live cell imaging, can the authors show these events in the same cell?

      We think that most bacteria-containing phagoendosomes should have fused with lysosomes 6 h p.i. as we have previously shown by acidification to pH of 5 and LAMP1 decoration.[36]

      The correlation between phagosomal escape and replication in the cytosol of non-professional phagocytes has been observed by us and others. In the revised manuscript we also provide images (Supp. Figure 5)/videos (Supp. Video 4) to show this correlation in our experiments.

      Given the inherent heterogeneity in uptake processes and the use of inhibitors in most experiments, the distinction between ASM-dependent and independent pathways might not be as clear-cut as the authors suggest. Some caution here will be good. Can the authors estimate what fraction of intracellular bacteria are taken up ASM-dependent?

      We agree with the reviewer that an overlap between internalization pathways is likely. A clear distinction is therefore certainly non-trivial. Alternative to ASM-dependent and ASM-independent pathways, the ASM activity may also accelerate one or several internalization pathways. We address this limitation in the discussion of the revised manuscript (line 596 ff).

      Early in infection (~10 min after contact with the cells), the proportion of bacteria that enter host cells ASM-dependently is relatively high, amounting to roughly 75-80% in HuLEC. After 30 min, this proportion decreases to about 50%. We included a paragraph in the discussion of the revised manuscript (line 593 ff).

      Reviewer #2 (Recommendations for the authors):

      (1) The experiment in Figure 4H is interesting. Details on what proportion of the cell is double positive, and if only this fraction was used for analysis will be good.

      We used all bacteria found in the images, regardless of whether host cells were infected with only one or both strains. Unfortunately, we cannot properly determine the proportion of cells that are double infected, since i) we record the samples with CLSM and hence cannot exclude that there are intracellular bacteria in higher or lower optical sections, and ii) we visualized cells by staining nuclei and did not stain the cell borders, so we cannot precisely tell to which host cell the bacteria localize.

      (2) Data is sparse for steps 5 and 6 of the model (line 330).

      We apologize for the inconvenience. There is a related study published elsewhere[19], in which we identified NRCAM and PTK7 as putative receptors involved in this invasion pathway. We included a section in the discussion with the corresponding citation (line 569).

      (3) Data for the reduced number of intracellular bacteria upon blocking ASM-dependent uptake (line 235) is not clear. Do they mean decreased invasion efficiency? These two need not be the same.

      We changed “reduced number of intracellular bacteria” to “invasion efficiency”.

      (4) b-toxin added to the surface can get endocytosed. Can its surface effect be delineated from endo/phagosomal effect?

      We attempted to delineate effects contributed by the toxin activity on the surface vs. within phagosomes (Figure 5 A-C). We see an increased phagosomal escape when we pretreated host cells with β-toxin (removal of SM from the surface) and infected either in presence (toxin will be taken up together with the bacteria into the phagosome) or in absence (toxin was washed away shortly before infection) of β-toxin. By contrast, overexpression of β-toxin by S. aureus did not affect phagosomal escape rates. The proper activity of β-toxin was confirmed by absence of Lysenin recruitment during phagosomal escape in all three conditions. We concluded that the activity on the surface and not the activity in the phagosome is important.

      (5) The potential role(s) of bacterial factors in the uptake and subsequent intracellular stages can be discussed.

      Multiple bacterial adhesins are known in S. aureus. These are usually either covalently attached to the bacterial cell wall, such as the sortase-dependently anchored fibronectin-binding proteins A and B, or secreted and “cell wall binding” proteins, and they also include non-proteinaceous factors such as wall teichoic acids. A discussion of these factors would thus be out of the scope of this manuscript, and we refer the reader to specialized reviews on that topic.

      (6) The manuscript is not very easy to read. The abstract could be rephrased for better clarity and succinctness, with a clearly stated problem statement. The introduction is somewhat haphazard, I feel it can be better structured.

      We apologize for the inconvenience. We stated the problem/research question in the abstract and tried to improve the introduction without adding too much unnecessary detail. In general, we tried to improve the readability of the manuscript and hope that our results and conclusions can be more easily understood by the reader in the revised version.

      (7) Typo in Figure 5F. Step 6 should read "accessory receptors"

      The typo was corrected.

      References

      (1) Lloyd-Evans, E. et al. Niemann-Pick disease type C1 is a sphingosine storage disease that causes deregulation of lysosomal calcium. Nature Medicine 14, 1247-1255 (2008).

      (2) Launay, P. et al. TRPM4 Is a Ca<sup>2+</sup>-Activated Nonselective Cation Channel Mediating Cell Membrane Depolarization. Cell 109, 397-407 (2002).

      (3) Nilius, B. et al. The Ca<sup>2+</sup>‐activated cation channel TRPM4 is regulated by phosphatidylinositol 4,5‐biphosphate. The EMBO Journal 25, 467-478 (2006).

      (4) Cáceres, M. et al. TRPM4 Is a Novel Component of the Adhesome Required for Focal Adhesion Disassembly, Migration and Contractility. PLoS One 10, e0130540 (2015).

      (5) Silva, I., Brunett, M., Cáceres, M. & Cerda, O. TRPM4 modulates focal adhesion-associated calcium signals and dynamics. Biophysical Journal 123, 390a (2024).

      (6) Schlesier, T., Siegmund, A., Rescher, U. & Heilmann, C. Characterization of the Atl-mediated staphylococcal internalization mechanism. International Journal of Medical Microbiology 310, 151463 (2020).

      (7) Jevon, M. et al. Mechanisms of Internalization of Staphylococcus aureus by Cultured Human Osteoblasts. Infection and Immunity 67, 2677-2681 (1999).

      (8) Rodriguez, A., Webster, P., Ortego, J. & Andrews, N.W. Lysosomes behave as Ca<sup>2+</sup>-regulated exocytic vesicles in fibroblasts and epithelial cells. J Cell Biol 137, 93-104 (1997).

      (9) Krones & Rühling et al. Staphylococcus aureus alpha-Toxin Induces Acid Sphingomyelinase Release From a Human Endothelial Cell Line. Front Microbiol 12, 694489 (2021).

      (10) Sakurai, Y. et al. Two-pore channels control Ebola virus host cell entry and are drug targets for disease treatment. Science 347, 995-998 (2015).

      (11) Aarhus, R., Graeff, R.M., Dickey, D.M., Walseth, T.F. & Lee, H.C. ADP-ribosyl cyclase and CD38 catalyze the synthesis of a calcium-mobilizing metabolite from NADP. J Biol Chem 270, 30327-30333 (1995).

      (12) Schmid, F., Fliegert, R., Westphal, T., Bauche, A. & Guse, A.H. Nicotinic acid adenine dinucleotide phosphate (NAADP) degradation by alkaline phosphatase. J Biol Chem 287, 32525-32534 (2012).

      (13) Angeletti, C. et al. SARM1 is a multi-functional NAD(P)ase with prominent base exchange activity, all regulated by multiple physiologically relevant NAD metabolites. iScience 25, 103812 (2022).

      (14) Gu, F. et al. Dual NADPH oxidases DUOX1 and DUOX2 synthesize NAADP and are necessary for Ca(2+) signaling during T cell activation. Sci Signal 14, eabe3800 (2021).

      (15) Schonn, J.-S., Maximov, A., Lao, Y., Südhof, T.C. & Sørensen, J.B. Synaptotagmin-1 and -7 are functionally overlapping Ca<sup>2+</sup> sensors for exocytosis in adrenal chromaffin cells. Proceedings of the National Academy of Sciences 105, 3998-4003 (2008).

      (16) Kornhuber, J. et al. Functional Inhibitors of Acid Sphingomyelinase (FIASMAs): a novel pharmacological group of drugs with broad clinical applications. Cell Physiol Biochem 26, 9-20 (2010).

      (17) Naser, E. et al. Characterization of the small molecule ARC39, a direct and specific inhibitor of acid sphingomyelinase in vitro. J Lipid Res 61, 896-910 (2020).

      (18) Roth, A.G. et al. Potent and selective inhibition of acid sphingomyelinase by bisphosphonates. Angew Chem Int Ed Engl 48, 7560-7563 (2009).

      (19) Rühling, M., Schmelz, F., Kempf, A., Paprotka, K. & Fraunholz, M.J. Identification of the Staphylococcus aureus endothelial cell surface interactome by proximity labeling. mBio 0, e03654-03624 (2025).

      (20) Schuchman, E.H. & Desnick, R.J. Types A and B Niemann-Pick disease. Mol Genet Metab 120, 27-33 (2017).

      (21) Miller, M.E., Adhikary, S., Kolokoltsov, A.A. & Davey, R.A. Ebolavirus Requires Acid Sphingomyelinase Activity and Plasma Membrane Sphingomyelin for Infection. Journal of Virology 86, 7473-7483 (2012).

      (22) M. Rühling, L.K., F. Wagner, F. Schumacher, D. Wigger, D. A. Helmerich, T. Pfeuffer, R. Elflein, C. Kappe, M. Sauer, C. Arenz, B. Kleuser, T. Rudel, M. Fraunholz, J. Seibel Trifunctional sphingomyelin derivatives enable nanoscale resolution of sphingomyelin turnover in physiological and infection processes via expansion microscopy. Nat Commun accepted in principle (2024).

      (23) Peters, S. et al. Neisseria meningitidis Type IV Pili Trigger Ca(2+)-Dependent Lysosomal Trafficking of the Acid Sphingomyelinase To Enhance Surface Ceramide Levels. Infect Immun 87 (2019).

      (24) Grassmé, H. et al. Acidic sphingomyelinase mediates entry of N. gonorrhoeae into nonphagocytic cells. Cell 91, 605-615 (1997).

      (25) Li, C. et al. Regulation of Staphylococcus aureus Infection of Macrophages by CD44, Reactive Oxygen Species, and Acid Sphingomyelinase. Antioxid Redox Signal 28, 916-934 (2018).

      (26) Fernandes, M.C. et al. Trypanosoma cruzi subverts the sphingomyelinase-mediated plasma membrane repair pathway for cell invasion. J Exp Med 208, 909-921 (2011).

      (27) Luisoni, S. et al. Co-option of Membrane Wounding Enables Virus Penetration into Cells. Cell Host & Microbe 18, 75-85 (2015).

      (28) Rühling, M. et al. Trifunctional sphingomyelin derivatives enable nanoscale resolution of sphingomyelin turnover in physiological and infection processes via expansion microscopy. Nature Communications 15, 7456 (2024).

      (29) Ellison, C.J., Kukulski, W., Boyle, K.B., Munro, S. & Randow, F. Transbilayer Movement of Sphingomyelin Precedes Catastrophic Breakage of Enterobacteria-Containing Vacuoles. Curr Biol 30, 2974-2983 e2976 (2020).

      (30) Moldovan, A. & Fraunholz, M.J. In or out: Phagosomal escape of Staphylococcus aureus. Cell Microbiol 21, e12997 (2019).

      (31) Slotte, J.P. Biological functions of sphingomyelins. Progress in Lipid Research 52, 424-437 (2013).

      (32) Stelzner, K. et al. Intracellular Staphylococcus aureus Perturbs the Host Cell Ca(2+) Homeostasis To Promote Cell Death. mBio 11 (2020).

      (33) Kunz, T.C. et al. The Expandables: Cracking the Staphylococcal Cell Wall for Expansion Microscopy. Front Cell Infect Microbiol 11, 644750 (2021).

      (34) Sakurai, Y. et al. Ebola virus. Two-pore channels control Ebola virus host cell entry and are drug targets for disease treatment. Science 347, 995-998 (2015).

      (35) Grosz, M. et al. Cytoplasmic replication of Staphylococcus aureus upon phagosomal escape triggered by phenol-soluble modulin alpha. Cell Microbiol 16, 451-465 (2014).

      (36) Giese, B. et al. Staphylococcal alpha-toxin is not sufficient to mediate escape from phagolysosomes in upper-airway epithelial cells. Infect Immun 77, 3611-3625 (2009).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public review):

      Weaknesses:

      While the data generally supports the authors' conclusions, a weakness of this manuscript lies in their analytical approach where EEG feature-space comparisons used the number of spontaneous or evoked seizures as their replicates as opposed to the number of IHK mice; these large data sets tend to identify relatively small effects of uncertain biological significance as being highly statistically significant. Furthermore, the clinical relevance of similarly small differences in EEG feature space measurements between seizure-naïve and epileptic mice is also uncertain.

      In this work, we used a linear mixed effects model to address two levels of variability: between animals and within animals. The interactive linear mixed effects model shows that most (~90%) of the variability in our data comes from within animals (Residual), the random effect that the model accounts for, rather than between animals. Since variability between animals is low, the model identifies common changes in seizure propagation across animals, while accounting for the variability in seizures within each animal. Therefore, the results we find are of changes that happen across animals, not of individual seizures. We made text edits to clarify the use of the linear mixed effects model (page 6, second paragraph, and page 11, first paragraph).
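      To make the between-animal versus within-animal split concrete, the following is a minimal sketch of a variance decomposition. It is not the model fitted in the manuscript: it uses the classical one-way random-effects ANOVA estimators for a balanced design (equal numbers of seizures per animal) rather than a full linear mixed effects fit, and the function name and the example values are hypothetical.

```python
from statistics import mean


def variance_partition(groups):
    """Split variance into between-animal and within-animal components.

    groups: one list of seizure-level feature values per animal; every
    animal must contribute the same number of seizures (balanced design).
    Returns (var_between, var_within, fraction_within).
    """
    k = len(groups)        # number of animals
    n = len(groups[0])     # seizures per animal
    if k < 2 or n < 2:
        raise ValueError("need at least 2 animals and 2 seizures per animal")
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")

    grand_mean = mean(v for g in groups for v in g)
    animal_means = [mean(g) for g in groups]

    # Mean squares from one-way ANOVA.
    ms_between = n * sum((m - grand_mean) ** 2 for m in animal_means) / (k - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, animal_means) for v in g
    ) / (k * (n - 1))

    # Method-of-moments estimates; negative estimates are truncated at zero.
    var_between = max((ms_between - ms_within) / n, 0.0)
    var_within = ms_within
    total = var_between + var_within
    fraction_within = var_within / total if total > 0 else 0.0
    return var_between, var_within, fraction_within


# Hypothetical feature values: two animals, two seizures each.
vb, vw, frac = variance_partition([[1.0, 3.0], [2.0, 4.0]])
```

      When the within-animal fraction is high, as reported above, seizures from one animal differ from each other about as much as seizures from different animals do. A mixed effects model still adds an animal-level term so that repeated seizures from one animal are not treated as independent replicates.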

      Finally, the multiple surgeries and long timetable to generate these mice may limit the value compared to existing models in drug-testing paradigms.

      Thank you for the suggestion. We added a discussion in the ‘Comparison to other seizure models…’ section on pages 15 and 16. In an existing model investigating spontaneous tonic-clonic seizures (such as the intra-amygdala kainate injection model), the time investment is back-loaded, requiring two to three weeks per condition while counting spontaneous seizures, which may occur only once a day. In contrast, our model requires a front-loaded time investment. Once the animals are set up, we can test multiple drugs within a few weeks, providing significant time savings. Additionally, we did not pre-screen animals in our study. Existing models often pre-select mice with high rates of spontaneous seizures, whereas in our model, seizures can be induced even in animals with few spontaneous seizures. We believe that bypassing the need for pre-screening also is a key advantage of our induced seizure model.  

      Reviewer 1 (Recommendations for the authors):

      (1) Address why the EEG data comparisons were performed between seizures and not between animals (as explicitly described in the public review). Further, a discussion of the biological significance (or lack thereof) of the effect size differences observed is warranted. This is especially concerning when the authors make the claim that spontaneous and induced seizures are essentially the same while their analysis shows all evaluated feature space parameters were significantly different in the initial 1/3 of the EEG waveforms.

      We made text edits to clarify the use of the linear mixed effects model (page 6, second paragraph, and page 11, first paragraph).

      (2) The authors place great emphasis on the use of clinically/etiologically relevant epilepsy models in drug discovery research. There is discussion criticizing the time points required to enact kindling and the artificial nature of acute seizure induction methods. However, the combination IHK-opto seizure induction model also requires a lengthy timeline. A more tempered discussion of this novel model's strengths may benefit readers.

      Thank you for the suggestion. We added a discussion in the ‘Comparison to other seizure models…’ section on pages 15 and 16.

      (3) The authors should further emphasize the benefit of having an inducible seizure model of focal epilepsy since other mouse models (e.g., genetic or TBI models) may have superior etiological relevance (construct and face validity) but may not be amenable to their optogenetic stimulation approach.

      Thank you for the suggestion. We revised the manuscript to better emphasize the potential significance of our approach. We added a discussion in the 'Application of Models...' section on page 15, second paragraph. The on-demand seizure model can be applied to address biologically and clinically relevant questions beyond its utility in drug screening. For example, crossing the Thy1-ChR2 mouse line with genetic epilepsy models, such as Scn1a mutants, could reveal how optogenetic stimulation differentially induces seizures in mutant versus non-mutant mice, providing insights into seizure generation and propagation in Dravet syndrome. Due to the cellular specificity of optogenetics, we also envision this approach being used to study circuit-specific mechanisms of seizure generation and propagation.

      (4) Suggestion: Provide immunolabeled imagery demonstrating ChR2 presence in Thy1 cells.

      Thank you for the suggestion. We added a fluorescence image showing ChR2 expression in Fig. 2A.

      (5) It might be prudent to mention any potential effects of laser heat on hippocampal cell damage, although the 10 Hz, ~10 mW, and 6 s stim is unlikely to cause any substantial burns. Without knowing the diameter and material of the optic fiber, this is left up to some interpretation.

      Thank you for the comments. In the Methods section, we listed the optical fiber diameter as 400 microns (page 17, EEG and Fiber Implantation section). Using 5–18 mW laser power with a relatively large fiber diameter of 400 microns, the power density falls within the range of commonly employed channelrhodopsin activation conditions in vivo. That said, we would like to investigate potential heat effects or cell damage in a follow-up study.
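      As a rough illustration of the power density argument, the sketch below converts laser power and fiber diameter into irradiance at the fiber tip. It assumes that the light is spread evenly over the full 400 micron fiber cross-section and ignores numerical aperture, tissue scattering and absorption, so the numbers are an upper-level estimate at the tip only. The function name is hypothetical and the calculation is not taken from the manuscript.

```python
import math


def irradiance_mw_per_mm2(power_mw, fiber_diameter_um):
    """Estimate irradiance at the fiber tip in mW/mm^2.

    Assumes uniform illumination across the fiber cross-section
    (a simplification; real beam profiles and tissue effects differ).
    """
    if power_mw < 0 or fiber_diameter_um <= 0:
        raise ValueError("power must be >= 0 and diameter must be > 0")
    radius_mm = fiber_diameter_um / 1000.0 / 2.0   # microns -> mm, then radius
    area_mm2 = math.pi * radius_mm ** 2            # circular cross-section
    return power_mw / area_mm2


# Power range and fiber diameter as stated in the response (5-18 mW, 400 microns).
low = irradiance_mw_per_mm2(5.0, 400.0)
high = irradiance_mw_per_mm2(18.0, 400.0)
print(f"{low:.1f} to {high:.1f} mW/mm^2 at the fiber tip")
```

      Under these assumptions the stated power range corresponds to roughly 40 to 143 mW/mm² at the tip; irradiance in tissue falls off with distance from the fiber.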

      (6) There are instances in the manuscript where the authors describe experimental and analytical parameters vaguely (e.g. "Seizures were induced several times a day", "stimulation was performed every 1 - 3 hours over many days"). These descriptions can and should be more precise.

      Thank you for the comments. To enhance clarity, we added the stimulation protocol in a flowchart format in Fig. S2A, describing how we determined the threshold and proceeded to the drug test. Following this protocol, there was variability in the number of stimulations per day.

      (7) In the second to last paragraph of the discussion, the authors state "However, HPDs are not generalizable across species - they are specific to the mouse model (55)." This statement is inaccurate. The paper cited comes from Dr. Corinne Roucard's lab at SynapCell. In fact, Dr. Roucard argues the opposite (See Neurochem Res (2017) 42:1919-1925).

      Thank you for pointing out the mistake. On page 16, in the first paragraph, reference 55 (now 58 in the revised version) was intended to refer to 'quickly produce dose-response curves with high confidence.' In the revision, we cited another paper reporting that hippocampal spikes were not reproduced in the rat IHK model. R. Klee, C. Brandt, K. Töllner, W. Löscher, Various modifications of the intrahippocampal kainate model of mesial temporal lobe epilepsy in rats fail to resolve the marked rat-to-mouse differences in type and frequency of spontaneous seizures in this model. Epilepsy Behav. 68, 129–140 (2017).

      (8) In the discussion, Levetiracetam is highlighted as an ASM that would not be detected in acute induced seizure models; the authors point out its lack of effect in MES and PTZ. However, LEV is effective in the 6Hz test (also an acute-induced seizure model). This should be stated.

      Thank you for the comments. We highlighted the discussion on LEV in the 'Application of Model to Testing Multiple Classes of ASMs...' section on page 14.

      (9) The results text indicates that 9 epileptic mice were used to test LEV and DZP. However, the individual data points illustrated in Figure 5B show N=8 mice. Please correct.

      Thank you for the comments. A total of nine epileptic mice were used to assess two drugs, with the animals being re-used as indicated in the schematic. A total of eight assessments were conducted for DZP with six mice and eight assessments for LEV with five mice. Each assessment included hourly ChR2 activations without an ASM and hourly ChR2 activations after ASM injection.

      (10) Figure 4D: Naïve mice are labeled as solid blue circles in the legend while the data points are solid blue triangles. Please correct.

      Thank you. We corrected the marker in Fig.4D.

      Reviewer 2 (Public Review):

      Weaknesses:

      (1) Although the figures provide excellent examples of individual electrographic seizures and compare induced seizures in epileptic and naïve animals, it is unclear which criteria were used to identify an actual seizure induced by the optogenetic stimulus, versus a hippocampal paroxysmal discharge (HPD), an "afterdischarge", an "electrophysiological epileptiform event" (EEE, Ref #36, D'Ambrosio et al., 2010 Epilepsy Currents), or a so-called "spike-wave-discharge" (SWD). Were HPDs or these other non-seizure events ever induced using stimulation in animals with IH-KA? A critical issue is that these other electrical events are not actual seizures, and it is unclear whether they were included in the column showing data on "electrographic afterdischarges" in Figure 5 for the studies on ASDs. This seems to be a problem in other areas of the paper, also.

      Thank you for pointing out the unclear definition of the seizures analyzed. We added sentences at the beginning of the Results section (page 3) to clarify the terminology we used. We analyzed animal behavior during evoked events, and a high percentage of induced electrographic events were accompanied by behavioral seizures with a Racine scale of three or above. We added Supplemental Figure S9, which shows behavioral seizure severity scores observed before and during ASM testing. We hope these changes address the reviewer’s concern and improve the clarity of the manuscript.

      (2) The differences between the optogenetically evoked seizures in IH-KA vs naïve mice are interpreted to be due to the "epileptogenesis" that had occurred, but the lesion from the KA-induced injury would be expected to cause differences in the electrically and behaviorally recorded seizures - even if epileptogenesis had not occurred. This is not adequately addressed.

      Thank you for the comments. IHK-injected mice had spontaneous tonic-clonic seizures before the start of optical stimulation, as shown in Figure S1.

      (3) The authors offer little mention of other research using animal models of TLE to screen ASDs, of which there are many published studies - many of them with other strengths and/or weaknesses. For example, although Grabenstatter and Dudek (2019, Epilepsia) used a version of the systemic KA model to obtain dose-response data on the effects of carbamazepine on spontaneous seizures, that work required use of KA-treated rats selected to have very high rates of spontaneous seizures, which requires careful and tedious selection of animals. The ETSP has published studies with an intra-amygdala kainic acid (IA-KA) model (West et al., 2022, Exp Neurol), where the authors claim that they can use spontaneous seizures to identify ASDs for DRE; however, their lack of a drug effect of carbamazepine may have been a false negative secondary to low seizure rates. The approach described in this paper may help with confounds caused by low or variable seizure rates. These types of issues should be discussed, along with others.

      We appreciate the reviewer’s insights. We added a discussion comparing our model with other existing models in the Discussion section (pages 15 and 16, 'Comparison to Other Seizure Models Used in Pharmacologic Screening' section). In an existing model investigating spontaneous tonic-clonic seizures (such as the intra-amygdala kainate injection model), the time investment is back-loaded, requiring two to three weeks per condition while counting spontaneous seizures, which may occur only once a day. In contrast, our model requires a front-loaded time investment. Once the animals are set up, we can test multiple drugs within a few weeks, providing significant time savings. Additionally, we did not pre-screen animals in our study. Existing models often pre-select mice with high rates of spontaneous seizures, whereas in our model, seizures can be induced even in animals with few spontaneous seizures. We believe that bypassing the need for pre-screening is a key advantage of our induced seizure model.

      (4) The outcome measure for testing LEV and DZP on seizures was essentially the fraction of unsuccessful or successful activations of seizures, where high ASD efficacy is based on showing that the optogenetic stimulation causes fewer seizures when the drug is present. The final outcome measure is thus a percentage, which would still lead to a large number of tests to be assured of adequate statistical power. Thus, there is a concern about whether this proposed approach will have high enough resolution to be more useful than conventional screening methods so that one can obtain actual dose-response data on ASDs.

      Thank you for the comments. In this revision, we added Supplemental Figure S9, showing the severity of behavioral seizures observed before and during ASM testing for each animal. We observed a reduction in behavioral seizure severity for each subject. We would like to explore using behavioral severity as an outcome measure in a follow-up study.

      (5) The authors state that this approach should be used to test for and discover new ASDs for DRE, and also used for various open/closed loop protocols with deep-brain stimulation; however, the paper does not actually discuss rigorously or critically the background literature on other published studies in these areas or how this approach will improve future research for a broader audience than the ETSP and CROs. Thus, it is not clear whether the utility will apply more widely and how extensive a readership will be attracted to this work.

      We appreciate the reviewer’s insights. We revised the manuscript to better emphasize the potential significance of our approach (page 15, second paragraph). The on-demand seizure model can be applied to address biologically and clinically relevant questions beyond its utility in drug screening. For example, crossing the Thy1-ChR2 mouse line with genetic epilepsy models, such as Scn1a mutants, could reveal how optogenetic stimulation differentially induces seizures in mutant versus non-mutant mice, providing insights into seizure generation and propagation in Dravet syndrome. Due to the cellular specificity of optogenetics, we also envision this approach being used to study circuit-specific mechanisms of seizure generation and propagation. Regarding drug-resistant epilepsy (DRE) and anti-seizure drug (ASD) screening, we agree with the reviewer that probing new classes of ASDs for DRE represents a critical goal. However, we believe that a full exploration of additional ASD classes and/or modeling DRE lies outside the scope of this manuscript, and we would like to explore it in a follow-up study.

      Reviewer 2 (Recommendations for the authors):

      (1) The authors should explain why 10 Hz was chosen as the stimulation frequency.

      Thank you for the comment. A frequency of 10 Hz was determined based on previous work using anesthetized animals prepared in an acute in vivo setting. To simplify the paper and avoid confusion, we did not include a discussion on how we determined the frequency. Instead, we added a detailed description of how we optimized the power in a flowchart format in Supplemental Figure S2. We hope this improves reproducibility.

      (2) After micro-injection of KA, morphological changes were observed in the hippocampus, but no comparison of ChR2 expression was made in naïve animals vs KA-injected animals. Presumably, the Thy1-ChR2 mouse expresses GFP in cells that express ChR2. Thus, it may be useful to show the expression of ChR2 in animals with hippocampal sclerosis. This may explain the lack of dramatic difference between stimulation parameters in naïve vs epileptic animals, as shown in supplemental Figure S2.

      Thank you for the suggestion. We added a fluorescence image of ChR2 expression in CA1, ipsilateral to the KA-injected site, in Fig. 2A.

      (3) The authors state that "During epileptogenesis, neural networks in the brain undergo various changes ranging from modification of membrane receptors to the formation of new synapses" and that these changes are critical for successful "on-demand" seizure induction. However, it is not clear or well-discussed whether changes in neuronal cell densities that occur during sclerosis are important for "on-demand" seizure induction as well. Also, the authors showed that naïve animals exhibit a kindling-like effect, but it was unclear whether a similar effect was present in epileptic animals (i.e. do stimulation thresholds to seizure induction change as the animal gets more induction stimulations)? If present, would the secondary kindling affect drug-testing studies (e.g., would the drug effect be different on induced seizure #2 vs induced seizure #20)?

      Thank you for the suggestion. Since this is an important aspect of the model, we would like to address the kindling effect, the secondary kindling effect, and histopathology in a longer-term setting (several weeks) in a follow-up study.

      (4) The authors show that in their model, LEV and DZP were both efficacious. The authors do not seem to mention that, over 25 years ago, LEV was originally missed in the standard ETSP screens; and, it was only discovered outside of the ETSP with the kindling model. The kindling model is now used to screen ASDs. The authors should consider adding this point to the Discussion. It remains unclear, however, if the author's screening strategy shows advantages over kindling and other such approaches in the field.

      Thank you for the suggestion. We added a discussion on LEV in the 'Application of Model to Testing Multiple Classes of ASMs...' section on page 14.

      (5) P8 paragraph 2. The authors state values for naïve animals, but they should also provide values for epileptic animals since they state that the groups were not significantly different (p>0.05). It would be useful to show values for both and state the actual p-value from the test. This issue of stating mean/median values with SD and sample size should be addressed for all data throughout the paper. Additionally, Figure S2 should be added to the manuscript and discussed, as it has data that may be valuable for the reproducibility of the paper.

      Thank you for the suggestion. Figure S2 shows the threshold power required to induce electrographic activity for n = 10 epileptic animals (9.14 ± 4.75 mW) and n = 6 naïve animals (6.17 ± 1.58 mW) (Wilcoxon rank-sum test, p = 0.137). The threshold duration was comparable between the same epileptic animals (6.30 ± 1.64 s) and naïve animals (5.67 ± 1.03 s) (Wilcoxon rank-sum test, p = 0.7133). 
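      The group comparison described above can be sketched as follows. This is a minimal, hypothetical illustration of a two-sided Wilcoxon rank-sum test using the normal approximation; it is not the authors' analysis code. The function name `rank_sum_test` and the sample values are assumptions chosen for illustration and are not data from the study, so the output will not match the reported p-values. With groups as small as n = 10 and n = 6, an exact test may give a different result than this approximation.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p). Ties receive midranks; the variance is not
    tie-corrected, so treat the p-value as approximate.
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)

    # Assign 1-based ranks, averaging ranks across tied values.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1

    n1, n2 = len(x), len(y)
    # Rank sum of the first sample and its null mean and standard deviation.
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p


# Hypothetical threshold powers in mW (placeholders, NOT study data).
epileptic = [4.0, 5.5, 7.0, 8.0, 9.5, 10.0, 11.0, 12.5, 14.0, 16.0]
naive = [4.5, 5.0, 6.0, 6.5, 7.5, 8.5]
z, p = rank_sum_test(epileptic, naive)
print(f"z = {z:.3f}, p = {p:.3f}")
```

      Reporting the statistic alongside the group summaries, as the reviewer requested, lets readers judge a non-significant result against the sample sizes.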

      (6) In addition to the other stated references on synaptic reorganization in the CA1 area, the authors should mention similar studies from Esclapez et al. (1999, J Comp Neurol).

      Thank you. We have included the reference in the revision.

      (7) All of the raw EEG data on the seizures should be accessible to the readers.

      Thank you for the suggestion. We will consider depositing EEG data in a publicly accessible site.

      Reviewer 3 (Public review):

      Weaknesses:

      (1) Evaluation of seizure similarity using the SVM modeling and clustering is not sufficiently explained to show if there are meaningful differences between induced and spontaneous seizures. SVM modeling did not include analysis to assess the overfitting of each classifier since mice were modeled individually for classification.

      Thank you for the comment. We made text edits to clarify the purpose of the SVM analysis. It was not intended to identify meaningful differences between induced and spontaneous seizures. Rather, it was used to classify EEG epochs as 'seizures' based on spontaneous seizures as the training set, demonstrating the gross similarity between induced and spontaneous seizures.

      (2) The difference between seizures and epileptiform discharges or trains of spikes (which are not seizures) is not made clear.

      Thank you for pointing out the unclear definition of the seizures analyzed. We added sentences at the beginning of the Results section (page 3) to clarify the terminology we used. We analyzed animal behavior during evoked events, and a high percentage of induced electrographic events were accompanied by behavioral seizures with a Racine scale of three or above. We added Supplemental Figure S9 to show the types of seizures observed before and during ASM testing. We hope these changes address the reviewer’s concern and improve the clarity of the manuscript.

      (3) The utility of increasing the number of seizures for enhancing statistical power is limited unless the sample size under evaluation is the number of seizures. However, the standard practice is for the sample size to be the number of mice.

      In this work, we used a linear mixed-effects model to address two levels of variability—between animals and within animals. The interactive linear mixed-effects model shows that most (~90%) of the variability in our data comes from within animals (residual), the random effect that the model accounts for, rather than between animals. Since variability between animals is low, the model identifies common changes in seizure propagation across animals while accounting for the variability in seizures within each animal. Therefore, the results we find reflect changes that occur across animals, not individual seizures. We made text edits to clarify the use of the linear mixed-effects model.
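      The variance partition described above can be written out for a simple random-intercept model. This is a sketch under stated assumptions and not the authors' fitted model: the symbols below are introduced here for illustration, with x_{ij} standing in for a generic predictor, and the actual fixed effects and interaction terms are not specified in this response.

```latex
% Random-intercept sketch: seizure j recorded in animal i
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)

% Share of total variance attributable to differences between animals
\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
```

      Under this simplified form, a residual share of about 90% corresponds to an intraclass correlation of roughly 0.1, meaning that seizures from the same animal are only weakly more alike than seizures from different animals.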

      (4) Seizure burden is not easily tested.

      Thank you for the comment. We added Supplemental Figure S9 to summarize the severity of behavioral seizures before and during ASM testing. This addresses the reviewer’s comment on seizure burden. In a follow-up study, we would like to explore this type of outcome measure for drug screening.

      Reviewer 3 (Recommendations for the authors):

      (1) Provide a stronger rationale to use area CA1. For example, the authors mention that CA1 is active during seizure activity, but can seizures originate from CA1? That would make the approach logical and also explain why induced and spontaneous seizures are similar.

      Thank you for the comment. We discussed it in the Discussion section (page 14, first and second paragraphs).

      (2) Explain the use of SVM classifiers so it is more convincing that induced and spontaneous seizures are similar. Or, if they are not similar, explain that this is a limitation.

      We made text edits to clarify the purpose of the SVM analysis. It was not intended to identify meaningful differences between induced and spontaneous seizures. Rather, it was used to classify EEG epochs as 'seizures' based on spontaneous seizures as the training set, demonstrating the gross similarity between induced and spontaneous seizures.

      (3) If feasible, extend the duration over which seizure induction reliability is assessed so that the long-term utility of the model can be demonstrated.

      Thank you for the suggestion. We would like to assess long-term utility in a follow-up study.

      (4) The GitHub link is not yet active. The authors will be required to supply their relevant code for peer evaluation as well as publication.

      Thank you. The GitHub repository is now active.

      (5) State and assess the impacts of sex as a biological variable.

      Thank you for pointing this out. Both female and male animals were included in this study: Epileptic cohort: 7 males, 3 females; Naïve cohort: 3 males, 4 females.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      This work adds another mouse model for LAMA2-MD that reiterates the phenotype of previously published models, such as dy3K/dy3K, dy/dy, and dyW/dyW mice. The phenotype is fully consistent with the data from others.

      Thank you for the valuable comments and good suggestions you have proposed, and we have added information and analysis of another mouse model for LAMA2-MD in the updated version 2 of this manuscript.

      One of the major weaknesses of the manuscript initially submitted was the overinterpretation and the overstatements. The revised version is clearly improved as the authors toned-down their interpretation and now also cite the relevant literature of previous work.

      Thank you for the good comments you have proposed, and we have carefully corrected the overinterpretation and overstatements in the previous updated version.

      Unfortunately, the data on RNA-seq and scRNA-seq are still rather weak. scRNA-seq was conducted with only one mouse resulting in only 8000 nuclei. I am not convinced that the data allow us to interpret them to the extent of the authors. Similar to the first version, the authors infer function by examining expression. Although they are a bit more cautious, they still argue that the BBB is not functional in dy<sup>H</sup>/dy<sup>H</sup> mice without showing leakiness. Such experiments can be done using dyes, such as Evans-blue or Cadaverin. Hence, I would suggest that they formulate the text still more carefully.

      Thank you for the valuable suggestions. We agree that we should perform more related functional experiments, such as Evans-blue or Cadaverin, to confirm the impaired BBB. However, these functional experiments have not been done because the first author has been working in the clinic. We have therefore added a "Limitations" part, which states: "Even though RNA-seq and scRNA-seq have been performed, the data of scRNA-seq are still insufficient due to the limited number of mouse brains. This study has provided potentially important information for the molecular pathogenetic mechanisms of muscular dystrophy and brain dysfunction for LAMA2-CMD, however, some related functional experiments have not been further performed".

      A similar lack of evidence is true for the suggested cobblestone-like lissencephaly of the mice. There is no strong evidence that this is indeed occurring in the mice (might also be a problem because mice die early). Hence, the conclusions need to be formulated in such a way that readers understand that these are interpretations and not facts.

      Thank you for the valuable suggestions. We agree with this comment and have added a statement to the Limitations: "This study has provided potentially important information for the molecular pathogenetic mechanisms of muscular dystrophy and brain dysfunction for LAMA2-CMD, however, some related functional experiments have not been further performed". Also, regarding the cobblestone-like lissencephaly, which was shown in LAMA2-CMD patients but not found in the mouse model, we have added the following discussion: "Though the cortical malformations were not found in the dy<sup>H</sup>/dy<sup>H</sup> brains by MRI analysis probably due to the small volume in within 1 month old, Thus, the changes in transcriptomes and protein levels provided potentially useful data for the hypothesis of the impaired gliovascular basal lamina of the BBB, which might be associated with occipital pachygyria in LAMA2-CMD patients."

      Finally, I am surprised that the only improvement in the main figures is the Western blot for laminin-alpha2. The histology of skeletal muscle still looks rather poor. I do not know what the problems are but suggest that the authors try to make sections from fresh-frozen tissue. I anticipate that the mice were eventually perfused with PFA before muscles were isolated. This often results in the big gaps in the sections.

      Thank you for the valuable suggestions. We agree with this comment and that sections should be made from fresh-frozen tissue. Therefore, we have added a statement to the Limitations: "Moreover, due to making sections with PFA before muscles isolated, and not from fresh-frozen tissue, there have been big gaps in the sections which do affect the histology of skeletal muscle to some extent."

      Overall, the work is improved but still would need additional experiments to make it really an important addition to the literature in the LAMA-MD field.

      Thank you for all your good comments and the valuable suggestions.

      Reviewer #2 (Public Review):

      This revised manuscript describes the production of a mouse model for LAMA2-Related Muscular Dystrophy. The authors investigate changes in transcripts within the brain and blood barrier. The authors also investigate changes in the transcriptome associated with the muscle cytoskeleton. Strengths: (1) The authors produced a mouse model of LAMA2-CMD using CRISPR-Cas9. (2) The authors identify cellular changes that disrupted the blood-brain barrier.

      Thank you for your good comments.

      Weaknesses:

      The authors throughout the manuscript overstate "discoveries" which have been previously described, published and not appropriately cited.

      Thank you for your great suggestion. We have toned-down the interpretations and overstatements throughout the manuscript, and added words such as "potentially", "possible", "some potential clues", "was speculated to probably", and so on.

      Alterations in the blood-brain barrier and in the muscle cell cytoskeleton in LAMA2-CMD have been extensively studied and published in the literature and are not cited appropriately.

      Thank you for your great suggestion. We agree that alterations in the muscle cell cytoskeleton in LAMA2-CMD have been extensively studied and published, and the related literature has been cited in the updated version 2.0. However, alterations in the blood-brain barrier in LAMA2-CMD have not been extensively studied; only a few papers (such as PMID: 25392494, PMID: 32792907) have investigated or discussed this issue.

      The authors have increased the animal number to N=6, but this is still insufficient based on power analysis, which results in statistical errors and conclusions that may be incorrect.

      Thank you for your great suggestion. We do agree that the animal number should be increased for Power analysis, and we have added statements in the Limitations with "Finally, due to the limited number of animal samples for the Power analysis, the statistical errors and conclusions might be affected."

      The use of "novel mouse model" in the manuscript overstates the impact of the study.

      Thank you for your great suggestion. We have changed the statement "novel mouse model" throughout the manuscript except the title.

      All studies presented are descriptive and do not add more to the field except for producing yet another mouse model of LAMA2-CMD, which is the same as all the others produced.

      Thank you for your comment. We do agree that further functional experiments have not been performed to reveal and confirm the pathogenesis. However, the analysis of phenotype was systematic and comprehensive, including survival time, motor function, serum CK, muscle MRI, muscle histopathology in different stages, and brain histopathology. Moreover, RNA-seq and scRNA-seq in LAMA2-CMD have been seldom performed before, and the data in this study could provide potentially important information for the molecular pathogenetic mechanisms of muscular dystrophy and brain dysfunction for LAMA2-CMD.

      Grip strength measurements are considered error prone and do not give an accurate measurement of muscle strength, which is better achieved using ex vivo or in vivo muscle contractility studies.

      Thank you for your great suggestion. We agree that grip strength measurements are considered error prone and do not give an accurate measurement of muscle strength. We have added a related statement to the Limitations: "Grip strength measurements used in this study are considered error prone and do not give an accurate measurement of muscle strength, which would be better achieved using ex vivo or in vivo muscle contractility studies."

      A lack of blinded studies as pointed out by the authors is a concern for the scientific rigor of the study.

      Thank you for your great suggestion. We performed the studies with those scoring outcome measures not blinded to the groups. In practice, it was very easy to distinguish the dy<sup>H</sup>/dy<sup>H</sup> group from the WT/Het mice because the dy<sup>H</sup>/dy<sup>H</sup> mice were much smaller than the other groups from as early as P7.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      There are multiple grammatical errors throughout the manuscript which should be corrected.

      Thank you for your recommendation. We have carefully corrected the grammatical errors within the manuscript.

      The authors mention no changes in intestinal muscles, but it is unclear if they are referring to skeletal or smooth muscle.

      Thank you for your good comment. The intestinal muscles that showed no changes in this study are smooth muscle, and we have changed the description to "intestinal smooth muscles".

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the Reviewers for their constructive comments and the Editor for the possibility to address the Reviewers’ points in this rebuttal. We 

      (1) Conducted new experiments with NP6510-Gal4 and TH-Gal4 lines to address potential behavioral differences due to targeting dopaminergic vs. both dopaminergic and serotonergic neurons

      (2) Conducted novel data analyses to emphasize the strength of sampling distributions of behavioral parameters across trials and individual flies

      (3) Provided Supplementary Movies

      (4) Calculated additional statistics

      (5) Edited and added text to address all points of the Reviewers.

      Please see our point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Translating discoveries from model organisms to humans is often challenging, especially in neuropsychiatric diseases, due to the vast gaps in the circuit complexities and cognitive capabilities. Kajtor et al. propose to bridge this gap in the fly models of Parkinson's disease (PD) by developing a new behavioral assay where flies respond to a moving shadow by modifying their locomotor activities. The authors believe the flies' response to the shadow approximates their escape response to an approaching predator. To validate this argument, they tested several PD-relevant transgenic fly lines and showed that some of them indeed have altered responses in their assay.

      Strengths:

      This single-fly-based assay is easy and inexpensive to set up, scalable, and provides sensitive, quantitative estimates to probe flies' optomotor acuity. The behavioral data is detailed, and the analysis parameters are well-explained.

      We thank the Reviewer for the positive assessment of our study.

      Weaknesses:

      While the abstract promises to give us an assay to accelerate fly-to-human translation, the authors need to provide evidence to show that this is indeed the case. They have used PD lines extensively characterized by other groups, often with cheaper and easier-to-setup assays like negative geotaxis, and do not offer any new insights into them. The conceptual leap from a low-level behavioral phenotype, e.g. changes in walking speed, to recapitulating human PD progression is enormous, and the paper does not make any attempt to bridge it. It needs to be clarified how this assay provides a new understanding of the fly PD models, as the authors do not explore the cellular/circuit basis of the phenotypes. Similarly, they have assumed that the behavior they are looking at is an escape-from-predator response modulated by the central complex- is there any evidence to support these assumptions? Because of their rather superficial approach, the paper does not go beyond providing us with a collection of interesting but preliminary observations.

      We thank the Reviewer for pointing out some limitations of our study. We would like to emphasize that what we perceive as the main advantage of performing single-fly and single-trial analyses is the access to rich data distributions that provide more fine-scale information compared to bulk assays. We think that this is exactly going one step closer to ‘bridging the enormous conceptual leap from a low-level behavioral phenotype, e.g. changes in walking speed, to recapitulating human PD progression’, and we showcase this in our study by comparing the distributions over the entire repertoire of behavioral responses across fly mutants. Nevertheless, we agree with the Reviewer that many more steps in this direction are needed to improve translatability. Therefore, we toned down the corresponding statements in the Abstract and in the Introduction. Moreover, to further emphasize the strength of sampling distributions of behavioral parameters across trials and individual flies, we complemented our comparisons of central tendencies with testing for potential differences in data dispersion, demonstrated in the novel Supplementary Figure S4.

      Looming stimuli have been used to characterize flies’ escape behaviors. These studies uncovered a surprisingly rich behavioral repertoire (Zacarias et al., 2018), which was modulated by both sensory and motor context, e.g. walking speed at time of stimulus presentation (Card and Dickinson, 2008; Oram and Card, 2022; Zacarias et al., 2018). The neural basis of these behaviors was also investigated, revealing loom-sensitive neurons in the optic lobe and the giant fiber escape pathway (Ache et al., 2019; de Vries and Clandinin, 2012). Although less frequently, passing shadows were also employed as threat-inducing stimuli in flies (Gibson et al., 2015). We opted for this variant of the stimulus so that we could ensure that the shadow reached the same coordinates in all linear tracks concurrently, aiding data analysis and scalability. Similar to the cited study, we found the same behavioral repertoire as in studies with looming stimuli, with an equivalent dependence on walking speed, confirming that looming stimuli and passing shadows can both be considered as threat-inducing visual stimuli. We added a discussion on this topic to the main text.

      Reviewer #2 (Public Review):

      In this study, Kajtor et al investigated the use of a single-animal trial-based behavioral assay for the assessment of subtle changes in the locomotor behavior of different genetic models of Parkinson's disease of Drosophila. Different genotypes used in this study were Ddc-GAL4>UAS-Parkin-275W and UAS-α-Syn-A53T. The authors measured Drosophila's response to predator-mimicking passing shadow as a threatening stimulus. Along with these, various dopamine (DA) receptor mutants, Dop1R1, Dop1R2 and DopEcR were also tested.

      The behavior was measured in a custom-designed apparatus that allows simultaneous testing of 13 individual flies in a plexiglass arena. The inter-trial intervals were randomized for 40 trials within 40 minutes duration and fly responses were defined into freezing, slowing down, and running by hierarchical clustering. Most of the mutant flies showed decreased reactivity to threatening stimuli, but the speed-response behavior was genotype invariant.

      These data nicely show that measuring responses to the predator-mimicking passing shadows could be used to assess the subtle differences in the locomotion parameters in various genetic models of Drosophila.

      The understanding of the manifestation of various neuronal disorders is a topic of active research. Many of the neuronal disorders start by presenting subtle changes in neuronal circuits and quantification and measurement of these subtle behavior responses could help one delineate the mechanisms involved. The data from the present study nicely uses the behavioral response to predator-mimicking passing shadows to measure subtle changes in behavior. However, there are a few important points that would help establish the robustness of this study.

      We thank the Reviewer for the constructive comments and the positive assessment of our study.

      (1) The visual threat stimulus for measuring response behavior in Drosophila is previously established for both single and multiple flies in an arena. A comparative analysis of data and the pros and cons of the previously established techniques (for example, Gibson et al., 2015) with the technique presented in this study would be important to establish the current assay as an important advancement.

      We thank the Reviewer for this suggestion. We included the following discussion on measuring response behavior to visual threat stimuli in the revised manuscript.

      Many earlier studies used looming stimuli, that is, concentrically expanding shadows mimicking the approach of a predator from above, to study escape responses in flies (Ache et al., 2019; Card and Dickinson, 2008; de Vries and Clandinin, 2012; Oram and Card, 2022; Zacarias et al., 2018) as well as rodents (Braine and Georges, 2023; Heinemans and Moita, 2024; Lecca et al., 2017). These assays have the advantage of closely resembling naturalistic, ecologically relevant threat-inducing stimuli, and allow a relatively complete characterization of the fly escape behavior repertoire. As a flip side of their large degree of freedom, they do not lend themselves easily to a fully standardized, scalable behavioral assay. Therefore, Gibson et al. suggested a novel threat-inducing assay operating with moving overhead translational stimuli, that is, passing shadows, and demonstrated that they induce escape behaviors in flies akin to those induced by looming discs (Gibson et al., 2015). This assay, coined ReVSA (repetitive visual stimulus-induced arousal) by the authors, had the advantage of scalability, while constraining flies to a walking arena that somewhat restricted the remarkably rich escape types flies otherwise exhibit. Here we carried this idea one step further by using a screen to present the shadows instead of a physically moving paddle and by placing individual flies in linear corridors instead of the common circular fly arena. This ensured that the shadow reached the same coordinates in all linear tracks concurrently and made it easy to accurately determine when individual flies encountered the stimulus, aiding data analysis and scalability. We found the same escape behavioral repertoire as in studies with looming stimuli and ReVSA (Gibson et al., 2015; Zacarias et al., 2018), with a similar dependence on walking speed (Oram and Card, 2022; Zacarias et al., 2018), confirming that looming stimuli and passing shadows can both be considered threat-inducing visual stimuli.

      (2) Parkinson's disease mutants should be validated with other GAL-4 drivers along with Ddc-GAL4, such as NP6510-Gal4 (Riemensperger et al., 2013). This would be important to delineate the behavioral differences due to dopaminergic neurons and serotonergic neurons and establish the Parkinson's disease phenotype robustly.

      We thank the Reviewer for pointing out this limitation. To address this, we repeated our key experiments in Fig. 3 with both TH-Gal4 and NP6510-Gal4 lines and their respective controls. These yielded results largely similar to those of the Ddc-Gal4 lines reported in Fig. 3, reproducing the decreased speed and decreased overall reactivity of PD-model flies. Nevertheless, TH-Gal4 and NP6510-Gal4 mutants showed an increased propensity to stop. Stop duration showed a significant increase not only in α-Syn but also in Parkin fruit flies. These novel results have been added to the text and are shown in Supplementary Figure S3.

      (3) The DopEcR mutant genotype used for behavior analysis is w<sup>1118</sup>; PBac{PB}DopEcR<sup>c02142</sup>/TM6B, Tb<sup>1</sup>. Balancer chromosomes, such as TM6B, Tb<sup>1</sup>, can have undesirable and uncharacterised behavioral effects. This could be addressed by removing the balancer and testing the DopEcR mutant in homozygous (if viable) or heterozygous conditions.

      We appreciate the Reviewer's comment and acknowledge the potential for the DopEcR balancer chromosome to produce unintended behavioral effects. However, given that this mutant was not essential to our main conclusions, we opted not to repeat the experiment. Nevertheless, we now discuss the possible confounds associated with using the PBac{PB}DopEcRc02142 mutant allele over the balancer chromosome. “We recognize a limitation in using PBac{PB}DopEcRc02142 over the  TM6B, Tb<sup>1</sup> balancer chromosome, as the balancer itself may induce behavioral deficits in flies. We consider this unlikely, as the PBac{PB}DopEcRc02142 mutation demonstrates behavioral effects even in heterozygotes (Ishimoto et al., 2013). Additionally, to our knowledge, no studies have reported behavioral deficits in flies carrying the TM6B, Tb<sup>1</sup> balancer chromosome over a wild-type chromosome.”

      (4) The height of the arena is restricted to 1mm. However, for the wild-type flies (Canton-S) and many other mutants, the height is usually more than 1mm. Also, a 1 mm height could restrict the fly movement. For example, it might not allow the flies to flip upside down in the arena easily. This could introduce some unwanted behavioral changes. A simple experiment with an arena of height at least 2.5mm could be used to verify the effect of 1mm height.

      We thank the Reviewer for this comment, which prompted us to reassess the dimensions of the apparatus. The height of the arena was 1.5 mm, which we have now corrected in the text. We observed that the arena did not restrict the flies' walking and that flies could flip in the arena. We now include two Supplementary Movies to demonstrate this.

      (5) The detailed model for Monte Carlo simulation for speed-response simulation is not described. The simulation model and its hyperparameters need to be described in more depth and with proper justification.

      We thank the Reviewer for pointing out a lack of details with respect to the Monte Carlo simulations. We used a nested model built from actual data distributions, without any assumptions. Accordingly, the simulation did not have hyperparameters typical in machine learning applications, the only external parameter being the number of resamplings (3000 for each draw). We made these modeling choices clearer and expanded this part as follows.

      “The effect of movement speed on the distribution of behavioral response types was tested using a nested Monte Carlo simulation framework (Fig. S5). This simulation aimed to model how different movement speeds impact the probability distribution of response types, comparing these simulated outcomes to empirical data. This approach allowed us to determine whether observed differences in response distributions are solely due to speed variations across genotypes or if additional behavioral factors contribute to the differences. First, we calculated the probability of each response type at different specific speed values (outer model). These probabilities were derived from the grand average of all trials across each genotype, capturing the overall tendency at various speeds. Second, we simulated the behavior of virtual flies (n = 3000 per genotype, which falls within the same order of magnitude as the number of experimentally recorded trials from different genotypes) by drawing random velocity values from the empirical velocity distribution specific to the given genotype and then randomly selecting a reaction based on the reaction probabilities associated with the drawn velocity (inner model). Finally, we calculated reaction probabilities for the virtual flies and compared them with real data from animals of the same genotype.

      Differences were statistically tested by Chi-squared test.”
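
      The nested procedure quoted above can be illustrated with a minimal sketch. This is not the authors' code: the speed binning, the three response labels, the synthetic data, and all function names are assumptions made for illustration, and the comparison uses a chi-squared test of homogeneity on response counts as one plausible reading of "Chi-squared test".

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical response coding: 0 = stop, 1 = slow down, 2 = speed up.
N_RESPONSES = 3


def bin_index(speeds, bin_edges):
    """Map each speed to a bin index, clamping values outside the edges."""
    idx = np.digitize(speeds, bin_edges) - 1
    return np.clip(idx, 0, len(bin_edges) - 2)


def response_probabilities(speeds, responses, bin_edges):
    """Outer model: P(response | speed bin), estimated from pooled trials."""
    bins = bin_index(speeds, bin_edges)
    probs = np.zeros((len(bin_edges) - 1, N_RESPONSES))
    for b in range(probs.shape[0]):
        in_bin = responses[bins == b]
        if in_bin.size == 0:
            probs[b] = 1.0 / N_RESPONSES  # assumption: uniform for empty bins
        else:
            counts = np.bincount(in_bin, minlength=N_RESPONSES)
            probs[b] = counts / counts.sum()
    return probs


def simulate_genotype(genotype_speeds, probs, bin_edges, n_virtual, rng):
    """Inner model: resample speeds of one genotype, then draw a response."""
    drawn = rng.choice(genotype_speeds, size=n_virtual, replace=True)
    cdf = np.cumsum(probs[bin_index(drawn, bin_edges)], axis=1)
    u = rng.random(n_virtual)
    simulated = np.minimum((u[:, None] > cdf).sum(axis=1), N_RESPONSES - 1)
    return np.bincount(simulated, minlength=N_RESPONSES)


def synthetic_responses(speeds, rng):
    """Synthetic stand-in for recorded data: faster flies stop more often."""
    p_stop = np.clip(speeds / 20.0, 0.05, 0.9)
    u = rng.random(speeds.size)
    return np.where(u < p_stop, 0, np.where(u < p_stop + (1 - p_stop) / 2, 1, 2))


rng = np.random.default_rng(0)
pooled_speeds = rng.gamma(2.0, 3.0, size=5000)      # all trials, all genotypes
pooled_responses = synthetic_responses(pooled_speeds, rng)
geno_speeds = rng.gamma(2.0, 2.0, size=800)         # one slower genotype
geno_responses = synthetic_responses(geno_speeds, rng)

bin_edges = np.linspace(0.0, pooled_speeds.max(), 11)
probs = response_probabilities(pooled_speeds, pooled_responses, bin_edges)
sim_counts = simulate_genotype(geno_speeds, probs, bin_edges, 3000, rng)
obs_counts = np.bincount(geno_responses, minlength=N_RESPONSES)

# Compare simulated and observed response counts for this genotype.
chi2, p_value, dof, expected = chi2_contingency(np.vstack([obs_counts, sim_counts]))
print(sim_counts, obs_counts, p_value)
```

      In this sketch, a non-significant difference would indicate that the genotype's response distribution is accounted for by its speed distribution alone, which is the question the simulation is meant to address.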

      (6) The statistical analysis in different experiments needs revisiting. It wasn't clear to me if the authors checked if the data is normally distributed. A simple remedy to this would be to check the normality of data using the Shapiro-Wilk test or Kolmogorov-Smirnov test. Based on the normality check, data should be further analyzed using either parametric or non-parametric statistical tests. Further, the statistical test for the age-dependent behavior response needs revisiting as well. Using two-way ANOVA is not justified given the complexity of the experimental design. Again, after checking for the normality of data, a more rigorous statistical test, such as split-plot ANOVA or a generalized linear model could be used.

      We thank the Reviewer for this comment. We performed a Kolmogorov-Smirnov test for normality on the data distributions underlying Figure 3, and normality was rejected for all data distributions at p = 0.05, which justifies the use of the non-parametric Mann-Whitney U-test. Regarding ANOVA, we would like to point out that the ANOVA hypothesis test design is robust to deviations from normality (Knief and Forstmeier, 2021; Mooi et al., 2018). While the Kruskal-Wallis test is considered a reasonable non-parametric alternative to one-way ANOVA, there is no clear consensus on a non-parametric alternative to two-way ANOVA. Therefore, we left the two-way ANOVA for Figure 5 in place; however, to increase the statistical confidence in our conclusions, we performed Kruskal-Wallis tests for the main effect of age and found significant effects in all genotypes in accordance with the ANOVA, confirming the results (Stop frequency, DopEcR, p = 0.0007; Dop1R1, p = 0.004; Dop1R2, p = 9.94 × 10<sup>-5</sup>; w<sup>1118</sup>, p = 9.89 × 10<sup>-13</sup>; y<sup>1</sup> w<sup>67</sup>c<sup>23</sup>, p = 2.54 × 10<sup>-5</sup>; Slowing down frequency, DopEcR, p = 0.0421; Dop1R1, p = 5.77 × 10<sup>-6</sup>; Dop1R2, p = 0.011; w<sup>1118</sup>, p = 2.62 × 10<sup>-5</sup>; y<sup>1</sup> w<sup>67</sup>c<sup>23</sup>, p = 0.0382; Speeding up frequency, DopEcR, p = 0.0003; Dop1R1, p = 2.06 × 10<sup>-7</sup>; Dop1R2, p = 2.19 × 10<sup>-6</sup>; w<sup>1118</sup>, p = 0.0044; y<sup>1</sup> w<sup>67</sup>c<sup>23</sup>, p = 1.36 × 10<sup>-5</sup>). We also changed the post hoc Tukey tests to post hoc Mann-Whitney tests in the text to be consistent with the statistical analyses for Figure 3. These yielded results very similar to those of the Tukey tests. Of note, unlike with Tukey’s ‘honest significance’ approach, there is no straightforward way of correcting for multiple comparisons in this case. We thus report uncorrected p values and suggest evaluating them at a threshold of p = 0.01, which reduces the risk of type I errors. These notes have been added to the ‘Data analysis and statistics’ Methods section.
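
      The decision path described here (normality check, then a parametric or non-parametric comparison, plus a Kruskal-Wallis test across age groups) can be sketched as follows. This is an illustrative sketch using SciPy on synthetic data, not the authors' analysis code; the function names, the Welch t-test fallback, and the exponential test data are assumptions.

```python
import numpy as np
from scipy import stats


def looks_normal(x, alpha=0.05):
    """Kolmogorov-Smirnov test of the standardized sample against N(0, 1).

    Caveat: estimating mean and SD from the same sample makes the plain KS
    test conservative; a Lilliefors correction would be stricter.
    """
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)
    return stats.kstest(z, "norm").pvalue > alpha


def compare_two_groups(a, b, alpha=0.05):
    """Use a Welch t-test if both samples look normal, else Mann-Whitney U."""
    if looks_normal(a, alpha) and looks_normal(b, alpha):
        return "Welch t-test", stats.ttest_ind(a, b, equal_var=False).pvalue
    result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U", result.pvalue


def age_effect(groups):
    """Kruskal-Wallis test for a main effect of age within one genotype."""
    return stats.kruskal(*groups).pvalue


# Synthetic, skewed data standing in for per-trial behavioral measurements.
rng = np.random.default_rng(1)
control = rng.exponential(1.0, size=500)
mutant = rng.exponential(2.0, size=500)
test_name, p_two = compare_two_groups(control, mutant)

age_groups = [rng.exponential(scale, size=300) for scale in (1.0, 1.5, 2.0)]
p_age = age_effect(age_groups)
print(test_name, p_two, p_age)
```

      Because the p values are reported uncorrected, a stricter threshold such as the suggested p = 0.01 would be applied when reading the output of the pairwise comparisons.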

      (7) The dopamine receptor mutants used in this study are well characterized for learning and memory deficits. In the Parkinson's disease model of Drosophila, there is a loss of DA neurons in specific pockets in the central brain. Hence, it would be apt to use whole animal DA receptor mutants as general DA mutants rather than the Parkinson's disease model. The authors may want to rework the title to reflect the same.

      We thank the Reviewer for this comment, which suggests that we were not sufficiently clear on the Drosophila lines with DA receptor mutations. We used Mi{MIC} random insertion lines for dopamine receptor mutants, namely y<sup>1</sup> w<sup>*1</sup>; Mi{MIC}Dop1R1<sup>MI04437</sup> (BDSC 43773), y<sup>1</sup> w<sup>*1</sup>; Mi{MIC}Dop1R2<sup>MI08664</sup> (BDSC 51098) (Harbison et al., 2019; Pimentel et al., 2016), and w<sup>1118</sup>; PBac{PB}DopEcR<sup>c02142</sup>/TM6B, Tb<sup>1</sup> (BDSC 10847) (Ishimoto et al., 2013; Petruccelli et al., 2020, 2016). These lines carried reported mutations in dopamine receptors, most likely generating partial knock down of the respective receptors. We made this clearer by including the full names at the first occurrence of the lines in Results (beyond those in Methods) and adding references to each of the lines.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please think about focusing the manuscript either on the escape response or the PD pathology and provide additional evidence to demonstrate that you indeed have a novel system to address open questions in the field.

      As detailed above, we now emphasize more that the main advantage of our single-trial-based approach lies in the appropriate statistical comparison of rich distributions of behavioral data. Please see our response to the ‘Weaknesses’ section for more details.

      (2) Please explain the rationale for choosing the genetic lines and provide appropriate genetic controls in the experiments, e.g. trans-heterozygotes. Why use Ddc-Gal4 instead of TH or other specific Split-Gal4 lines?

      We thank the Reviewer for this suggestion. We repeated our key experiments with TH-Gal4 and NP6510-Gal4 lines. Please see our response to Point #2 of Reviewer #2 for details.

      (3) Please proofread the manuscript for omissions. e.g. there's no legend for Fig 4b.

      We respectfully point out that the legend is there, and it reads “b, Proportion of a given response type as a function of average fly speed before the shadow presentation. Top, Parkin and α-Syn flies. Bottom, Dop1R1, Dop1R2 and DopEcR mutant flies.”

      Reviewer #2 (Recommendations For The Authors):

      (1) In figure 2(c), representing the average walking speed data for different mutants would be useful to visually correlate the walking differences.

      We thank the Reviewer for this suggestion. The average walking speed was added in a scatter plot format, as suggested in the next point of the Reviewer. 

      (2) The data could be represented more clearly using scatter plots. Also, the color scheme could be more color-blindness friendly.

      We thank the Reviewer for this suggestion. We added scatter plots to Fig.2c that indeed represent the distribution of behavioral responses better. We also changed the color scheme and removed red/green labeling.

      (3) The manuscript should be checked for typos such as in line 252, 449, 484.

      Thank you. We fixed the typos.

      References

      Ache JM, Polsky J, Alghailani S, Parekh R, Breads P, Peek MY, Bock DD, von Reyn CR, Card GM. 2019. Neural Basis for Looming Size and Velocity Encoding in the Drosophila Giant Fiber Escape Pathway. Curr Biol 29:1073-1081.e4. doi:10.1016/j.cub.2019.01.079

      Braine A, Georges F. 2023. Emotion in action: When emotions meet motor circuits. Neurosci Biobehav Rev 155:105475. doi:10.1016/j.neubiorev.2023.105475

      Card G, Dickinson MH. 2008. Visually Mediated Motor Planning in the Escape Response of Drosophila. Curr Biol 18:1300–1307. doi:10.1016/j.cub.2008.07.094

      de Vries SEJ, Clandinin TR. 2012. Loom-Sensitive Neurons Link Computation to Action in the Drosophila Visual System. Curr Biol 22:353–362. doi:10.1016/j.cub.2012.01.007

      Gibson WT, Gonzalez CR, Fernandez C, Ramasamy L, Tabachnik T, Du RR, Felsen PD, Maire MR, Perona P, Anderson DJ. 2015. Behavioral Responses to a Repetitive Visual Threat Stimulus Express a Persistent State of Defensive Arousal in Drosophila. Curr Biol 25:1401–1415. doi:10.1016/j.cub.2015.03.058

      Harbison ST, Kumar S, Huang W, McCoy LJ, Smith KR, Mackay TFC. 2019. Genome-Wide Association Study of Circadian Behavior in Drosophila melanogaster. Behav Genet 49:60–82. doi:10.1007/s10519-018-9932-0

      Heinemans M, Moita MA. 2024. Looming stimuli reliably drive innate defensive responses in male rats, but not learned defensive responses. Sci Rep 14:21578. doi:10.1038/s41598-024-70256-2

      Ishimoto H, Wang Z, Rao Y, Wu C, Kitamoto T. 2013. A Novel Role for Ecdysone in Drosophila Conditioned Behavior: Linking GPCR-Mediated Non-canonical Steroid Action to cAMP Signaling in the Adult Brain. PLoS Genet 9:e1003843. doi:10.1371/journal.pgen.1003843

      Knief U, Forstmeier W. 2021. Violating the normality assumption may be the lesser of two evils. Behav Res Methods 53:2576–2590. doi:10.3758/s13428-021-01587-5

      Lecca S, Meye FJ, Trusel M, Tchenio A, Harris J, Schwarz MK, Burdakov D, Georges F, Mameli M. 2017. Aversive stimuli drive hypothalamus-to-habenula excitation to promote escape behavior. Elife 6:1–16. doi:10.7554/eLife.30697

      Mooi E, Sarstedt M, Mooi-Reci I. 2018. Market Research, Springer Texts in Business and Economics. Singapore: Springer Singapore. doi:10.1007/978-981-10-5218-7

      Oram TB, Card GM. 2022. Context-dependent control of behavior in Drosophila. Curr Opin Neurobiol 73:102523. doi:10.1016/j.conb.2022.02.003

      Petruccelli E, Lark A, Mrkvicka JA, Kitamoto T. 2020. Significance of DopEcR, a G-protein coupled dopamine/ecdysteroid receptor, in physiological and behavioral response to stressors. J Neurogenet 34:55–68. doi:10.1080/01677063.2019.1710144

      Petruccelli E, Li Q, Rao Y, Kitamoto T. 2016. The Unique Dopamine/Ecdysteroid Receptor Modulates Ethanol-Induced Sedation in Drosophila. J Neurosci 36:4647–4657. doi:10.1523/JNEUROSCI.3774-15.2016

      Pimentel D, Donlea JM, Talbot CB, Song SM, Thurston AJF, Miesenböck G. 2016. Operation of a homeostatic sleep switch. Nature 536:333–337. doi:10.1038/nature19055

      Zacarias R, Namiki S, Card GM, Vasconcelos ML, Moita MA. 2018. Speed dependent descending control of freezing behavior in Drosophila melanogaster. Nat Commun 9:1–11. doi:10.1038/s41467-018-05875-1

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Recommendations for the authors):

      The authors have done an impressive job in responding to the previous critique and even gone beyond what was asked. I have only very minor comments on this excellent manuscript. The manuscript also needs some light editing for grammar and readability.

      We have worked to improve the grammar and readability of the manuscript.

      Comments:

      Lines 227-234: At what age was tamoxifen administered to the various CreERTM mice?

      We have updated the ages of the mice used in this study in the methods sections.

      UMAP in Figure 5A is missing label for cluster 19.

      The UMAP in Figure 5A has the label for cluster 19 at the center-bottom of the image.

      Supplement Figure 6: Cluster 10 seems to be separate from the other AdvC clusters, and it includes some expression of Myh11 and Notch3. Further, there is low expression of Pdgfra in this cluster, which can be seen in panel B and panels D-I. Are the Pdgfra negative cells in the pie charts from cluster 10? Could the cells in this cluster be more LMC-like than AdvC-like?

      We agree with the reviewer that subcluster 10 of the fibroblasts is intriguing, even if it is only a minor population. Of the 77 cells in this subcluster (out of 2261 total), 40 were Pdgfra+; of the 37 remaining Pdgfra- cells, 11 were still CD34+. Thus, at least half of these cells could be expected to express PdgfraCreERTM. Only 8 of the 37 were Pdgfra-Notch3+, while 12 cells were Pdgfra+Notch3+; only 3 were Pdgfra-Myh11+, while 3 were Pdgfra+Myh11+. 26 of the 77 cells were Pdgfra+Pdgfrb+ double positive, while 12 of the 37 Pdgfra- cells were still Pdgfrb+. Additionally, within the 77 cells of subcluster 10, 17 were positive for Scn3a (Nav1.3), 21 were positive for Kcnj8 (Kir6.1), and 33 were positive for Cacna1c (Cav1.2). These are typically LMC markers, which would support the reviewer's thinking that this group contains a fibroblast-LMC transitional cell type. Only 2 of the 77 cells were positive for the BK subunit (Kcnma1), which is a classic smooth muscle marker. Another possibility is that this population represents the Pdgfra+Pdgfrb+ valve interstitial cells we identified in our IF staining and in our reporter mice. Of note, almost all cells in this cluster were Col3a1+ and Vim+. Even though we performed QC analysis to remove doublets, it is also possible that some of these cells represent doublets or contaminants; however, the low percentage of cells expressing Myh11, a gene that is very highly expressed in LMCs, especially compared with ion channels, suggests this is less likely. Assessing the presence of this particular cell cluster in future RNAseq or spatial transcriptomics experiments will be enlightening.

      Line 360. Proofread section title.

      We have simplified this title to read “Optogenetic Stimulation of iCre-driven Channel Rhodopsin 2”

      Lines 370-371. Are the length units supposed to be microns or millimeters?

      We have corrected this to microns as was intended. Thank you for catching this error.

      The resolution for each UMAP analysis should be stated, particularly for the identification of subclusters. How was the resolution chosen?

      To select the optimal cluster resolution, we used Clustree with various resolutions. We examined the resulting tree to identify a resolution where the clusters were well-separated and biologically meaningful, ensuring minimal merging or splitting at higher resolutions. Our goal was to find a resolution that captures relevant cell subpopulations while maintaining distinct clusters without excessive fragmentation. We used a resolution of 0.5 for sub-clustering LMCs, 0.87 for LECs, and 1.0 for fibroblasts, and these values are now stated in the manuscript. We have also added greater detail to the methods sections regarding the total number of cells, the QC analysis, and the marker identification criteria used.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This important work advances our understanding of the impact of malnutrition on hematopoiesis and subsequently infection susceptibility. Support for the overall claims is convincing in some respects and incomplete in others as highlighted by reviewers. This work will be of general interest to those in the fields of hematopoiesis, malnutrition, and dietary influence on immunity.

      We would like to thank the editors for agreeing to review our work at eLife. We greatly appreciate them assessing this study as important and of general interest to multiple fields, as well as the opportunity to respond to reviewer comments. Please find our responses to each reviewer below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors used a chronic murine dietary restriction model to study the effects of chronic malnutrition on controls of bacterial infection and overall immunity, including cellularity and functions of different immune cell types. They further attempted to determine whether refeeding can revert the infection susceptibility and immunodeficiency. Although refeeding here improves anthropometric deficits, the authors of this study show that this is insufficient to recover the impairments across the immune cell compartments.

      Strengths:

      The manuscript is well-written and conceived around a valid scientific question. The data supports the idea that malnutrition contributes to infection susceptibility and causes some immunological changes. The malnourished mouse model also displayed growth and development delays. The work's significance is well justified. Immunological studies in the malnourished cohort (human and mice) are scarce, so this could add valuable information.

      Weaknesses:

      The assays on myeloid cells are limited, and the study is descriptive and overstated. The authors claim that "this work identifies a novel cellular link between prior nutritional state and immunocompetency, highlighting dysregulated myelopoiesis as a major." However, after reviewing the entire manuscript, I found no cellular mechanism defining the link between nutritional state and immunocompetency.

      We thank the reviewer for deeming our work significant and noting the importance of the study. We appreciate the referee’s point regarding the lack of specific cellular functional data for innate immune cells and have modified the conclusions stated in text to more accurately reflect the results presented.

      Reviewer #2 (Public review):

      Summary:

      Sukhina et al. use a chronic murine dietary restriction model to investigate the cellular mechanisms underlying nutritionally acquired immunodeficiency as well as the consequences of a refeeding intervention. The authors report a substantial impact of undernutrition on the myeloid compartment, which is not rescued by refeeding despite rescue of other phenotypes including lymphocyte levels, and which is associated with maintained partial susceptibility to bacterial infection.

      Strengths:

      Overall, this is a nicely executed study with appropriate numbers of mice, robust phenotypes, and interesting conclusions, and the text is very well-written. The authors' conclusions are generally well-supported by their data.

      Weaknesses:

      There is little evaluation of known critical drivers of myelopoiesis (e.g. PMID 20535209, 26072330, 29218601) over the course of the 40% diet, which would be of interest with regard to comparing this chronic model to other more short-term models of undernutrition.

      Further, the microbiota, which is well-established to be regulated by undernutrition (e.g. PMID 22674549, 27339978, etc.), and also well-established to be a critical regulator of hematopoiesis/myelopoiesis (e.g. PMID 27879260, 27799160, etc.), is completely ignored here.

      We thank the reviewer for agreeing that the data presented support the stated conclusions and noting the experimental rigor.  The referee highlights two important areas for future mechanistic investigation that we agree are of great importance and relevant to the submitted study. We have included further discussion of the potential role cytokines and the microbiota might play in our model.

      Reviewer #3 (Public review):

      Summary:

      Sukhina et al are trying to understand the impacts of malnutrition on immunity. They model malnutrition with a diet switch from ad libitum to 40% caloric restriction (CR) in post-weaned mice. They test impacts on immune function with listeriosis. They then test whether re-feeding corrects these defects and find aspects of emergency myelopoiesis that remain defective after a precedent period of 40% CR. Overall, this is a very interesting observational study on the impacts of sudden prolonged exposure to less caloric intake.

      Strengths:

      The study is rigorously done. The observation of lasting defects after a bout of 40% CR is quite interesting. Overall, I think the topic and findings are of interest.

      Weaknesses:

      While the observations are interesting, in this reviewer's opinion, there is both a lack of mechanistic understanding of the phenomena and also some lack of resolution/detail about the phenomena themselves. Addressing the following major issues would be helpful towards aspects of both:

      (1) Is it calories, per se, or macro/micronutrients that drive these phenotypes observed with 40% CR. At the least, I would want to see isocaloric diets (primarily protein, fat, or carbs) and then some of the same readouts after 40% CR. Ie does low energy with relatively more eg protein prevent immunosuppression (as is commonly suggested)? Micronutrients would be harder to test experimentally and may be out of the scope of this study. However, it is worth noting that many of the malnutrition-associated diseases are micronutrient deficiencies.

      (2) Is immunosuppression a function of a certain weight loss threshold? Or something else? Some idea of either the tempo of immunosuppression (happens at 1, in which weight loss is detected; vs 2-3, when body length and condition appear to diverge; or 5 weeks), or grade of CR (40% vs 60% vs 80%) would be helpful since the mechanism of immunosuppression overall is unclear (but nailing it may be beyond the scope of this communication).

      (3) Does an obese mouse that gets 40% CR also become immunodeficient? As it stands, this ad libitum --> 40% CR model perhaps best models problems in the industrial world (as opposed to always being 40% CR from weaning, as might be more common in the developing world), and so modeling an obese person losing a lot of weight from CR (like would be achieved with GLP-1 drugs now) would be valuable to understanding generalizability.

      (4) Generalizing this phenomenon as "bacterial" with listeriosis, which is more like a virus in many ways (intracellular phase, requires type I IFN, etc.) and cannot be given by the natural route of infection in mice, may not be most accurate. I would want to see an experiment with E. coli, or some other bacteria, to test the statement of generalizability (ie is it bacteria, or type I IFN-pathway dominant infections, like viruses). If this is unique to listeriosis, it doesn't undermine the story as it is at all, but it would just require some word-smithing.

      (5) Previous reports (which the authors cite) implicate Leptin, the levels of which scale with fat mass, as "permissive" of a larger immune compartment (immune compartment as "luxury function" idea). Is their phenotype also leptin-mediated (ie leptin AAV)?

      (6) The inability of re-feeding to "rescue" the myeloid compartment is really interesting. Can the authors do a bone marrow transplantation (CR-->ad libitum) to test if this effect is intrinsic to the CR-experienced bone marrow?

      (7) Is the defect in emergency myelopoiesis a defect in G-CSF? Ie if the authors injected G-CSF in CR animals, do they equivalently mobilize neutrophils? Does G-CSF supplementation (as one does in humans) rescue host defense against Listeria in the CR or re-feeding paradigms?

      We thank the reviewer for considering our work of interest and noting the rigor with which it was conducted. The referee raises several excellent mechanistic hypotheses and follow-up studies to perform. We agree that defining the specific dietary deficiency driving the phenotypes is of great interest. The relative contribution of calories versus macro- and micronutrients is an area we are interested in exploring in future studies, especially given the literature on the role of micronutrients in malnutrition-driven wasting, as the referee notes. We also agree that whether non-hematopoietic cells contribute, and what role soluble factors such as G-CSF and leptin play in mediating the immunodeficiency, both warrant further study. Likewise, it will be important to evaluate how malnutrition impacts other models of infection to determine how generalizable these phenomena are. We have added these points to the discussion section as limitations of this study.

      Regarding how the phenotypes correspond to the timing of the immunosuppression relative to weight loss, we have performed new kinetics studies to provide some insight into this area. We now find that neutropenia in peripheral blood can be detected after as little as one week of dietary restriction, with neutrophil numbers continuing to decline after prolonged restriction. These findings indicate that the impact on myeloid cell production is indeed rapid and precedes maximum weight loss, though the severity of these phenotypes does increase as malnutrition persists. We wholeheartedly agree with the reviewer that it will be interesting to explore whether starting weight impacts these phenotypes and whether similar findings can be made in obese animals as they are treated for weight loss.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In this study, the authors used a chronic murine dietary restriction model to study the effects of chronic malnutrition on controls of bacterial infection and overall immunity, including cellularity and functions of different immune cell types. They further attempted to determine whether refeeding can revert the infection susceptibility and immunodeficiency. Although refeeding here improves anthropometric deficits, the authors of this study show that this is insufficient to recover the impairments across the immune cell compartments. The authors claim that "this work identifies a novel cellular link between prior nutritional state and immunocompetency, highlighting dysregulated myelopoiesis as a major." However, after reviewing the entire manuscript, I could not find any cellular mechanism defining the link between nutritional state and immunocompetency. The assays on myeloid cells are limited, and the study is descriptive and overstated.

      Major concerns:

      (1) Malnutrition has entirely different effects on adults and children. In this study, 6-8 weeks old C57/Bl6 mice were used that mimic adult malnutrition. I do not understand then why the refeeding strategy for inpatient treatment of severely malnourished children was utilized here.

      (2) Figure 1g shows BM cellularity is reduced, but the authors claim otherwise in the text.

      (3) What is the basis of the body condition score in Figure 1d? It will be good to have it in the supplement.

      (4) Listeria monocytogenes causes systemic infection, yet bioload was not determined in tissues beyond the liver.

      (5) Figure 3; T cell functional assays were limited to CD8 T cells and lymphocytes isolated from the spleen.

      (6) Why was peripheral cell count not considered? Discrepancies exist with the absolute cell number and relative abundance data, except for the neutrophil and monocyte data, which makes the data difficult to interpret. For example, for B cells, CD4 and CD8 cells.

      (7) Also, if mice exhibit thymic atrophy, why does % abundance data show otherwise? Overall, the data is confusing to interpret.

      (8) No functional tests for neutrophil or monocyte function exist to explain the higher bacterial burden in the liver or to connect the numbers with the overall pathogen load.

      The rationale for examining both innate and adaptive immunity is not clear; it is even more unclear since the same timelines for examining both innate and adaptive immunity (D0 and D5) were used.

      (9) Figure 2e doesn't make sense - why is spleen cellularity measured when bacterial load is measured in the liver?

      (10) Although it is claimed that emergency myelopoiesis is affected, no specific marker for emergency myelopoiesis other than cell numbers was studied.

      (11) I suggest including neutrophil effector functions and looking for real markers of granulopoiesis, such as Cebp-b. Since the authors attempted to examine the entirety of immune responses, it is better to measure cell abundance, types, and functions beyond the spleen. Consider the systemic spread of L. monocytogenes while measuring bioload.

      (12) Minor grammatical errors - please re-read the entire text and correct grammatical errors to improve the flow of the text.

      (13) Sample size details missing

      (14) Be clear on which markers were used to identify monocytes. Using just CD11b and Ly6G is insufficient for neutrophil quantification.

      (15) Also, instead of saying "undernourished patients," say "patients with undernutrition" - change throughout the text. I would recommend numbering citations (as is done for Nature citations) to make the text easier to follow, as there are areas where there are more than ten citations with author names.

      (16) No line numbers are provided

      (17) Abstract

      -  What does accelerated contraction mean?

      -  "In" is repeated in a sentence

      -  Be clear that the study is done in a mouse model - saying just "animals" is not sufficient

      -  Indicate how malnutrition is induced in these mice

      (18) Introduction

      -  "restriction," "immune organs," - what is this referring to?

      -  You mention lymphoid tissue and innate and adaptive immunity, which doesn't make sense. Please correct this.

      -  You mention a lot of lymphoid tissues, i.e. lymphoid mass gain, but how about the bone marrow and spleen, which are responsible for most innate immune compartments?

      (19) Results

      a) Figure 1

      -  Why 40% reduced diet?

      -  It would be interesting to report if the organs are smaller relative to body weight. It makes sense that the organ weight is lower in the 40RD mice, especially since they are smaller, so the novelty of this data is not apparent (Figure 1f).

      -  You say, "We observed a corresponding reduction in the cellularity of the spleen and thymus, while the cellularity of the bone marrow was unaffected (Fig. 1g)." however, your BM data is significant, so this statement doesn't reflect the data you present, please correct.

      b) Figure 2

      - Figure 2d - what tissue is this from, mentioned in the figure? And measure cellularity there. The rationale for why you look only at the spleen here is weak. Also, we would benefit from including the groups without infection here for comparison purposes.

      c) Figure 3

      - The rationale for why you further looked at T cells is weak, mainly because of the following sentence. "Despite this overall loss in lymphocyte number, the relative frequency of each population was either unchanged or elevated, indicating that while malnutrition leads to a global reduction in immune cell numbers, lymphocytes are less impacted than other immune cell populations (Supplemental 1)." Please explain in the main text.

      d) Figure 4

      -  You say the peak of the adaptive immune response, but you never looked at the peak of adaptive immune - when is this? If you have the data, please show it. You also only show d0 and d5 post-infection data for adaptive immunity, so I am unsure where this statement comes from.

      -  How did you identify neutrophils and monocytes through flow cytometry? Indicate the markers used. Also, your text does not match your data; please correct it. i.e. monocyte numbers reduced, and relative abundance increased, but your text doesn't say this.

      -  Show the flow plot first, followed by the quantification.

      -  The study would benefit from examining markers of emergency myelopoiesis such as Cebpb through qPCR.

      -  Although the number of neutrophils is lower in the BM and spleen, how does this relate to increased bacterial load in the liver? This is especially true since you did not quantify neutrophil numbers in the liver.

      e) Figure 6

      -  Some figures are incorrectly labelled.

      -  For the refeeding data, also include the data from the 40RD group to compare the level of recovery in the outcome measures.

      (20) Discussion

      -  You claim that monocytes are reduced to the same extent as neutrophils, but this is not true. Please correct.

      -  Indicate some limitations of your work.

      We thank the reviewer for offering these recommendations and the constructive comments. 

      Several comments raised concerns over the rationale or reasoning behind aspects of the experimental design or the data presented, which we would like to clarify:

      • Regarding the refeeding protocol, we apologize for the confusion regarding the rationale. We based our methodology on the general guidelines for refeeding protocols for malnourished people. We elected to increase food intake 10% daily to avoid risk of refeeding syndrome or other complications. Our method by no means replicates the administration of specific vitamins, minerals, electrolytes, or precise caloric content as would be given to a human patient. The citation provided offers information from the WHO regarding the complications that can arise during refeeding. Although it comes from a document on pediatric care, we did not mean to imply that our method modeled refeeding intervention for children. We have modified the text to avoid this confusion.

      • The reviewer requested more clarity on why we studied both the innate and adaptive immune system as well as why we chose the time points studied. As referenced in the manuscript, prior work has observed that caloric restriction, fasting, and malnutrition all can impact the adaptive immune system. Given these previous findings, we felt it important to evaluate how malnutrition affected adaptive immune cell populations in our model. To this end, we provide data tracking the course of T-cell responses from the start of infection through day 14, the time at which the response undergoes contraction. However, since we find that bacterial burden is not properly controlled at earlier time points (day 5), when it is understood the innate immune system is more critical for mediating pathogen clearance, we elected to better characterize the effect malnutrition had on innate immune populations, something less well described in the literature. As phenotypes both in bacterial burden and within innate immune populations were observable as early as day 5, we chose to focus on that time point rather than later time points, when readouts could be further confounded by secondary or compounding effects caused by the lack of early control of infection. We have tried to make this rationale clear in the text and have made changes to further emphasize this reasoning.

      • The reviewer also requested an explanation of why bacterial burden was measured in the liver and the immune response was measured in the spleen. While the reviewer is correct that our model is a systemic infection, it is well appreciated that bacteria rapidly disseminate to the liver and spleen and that these organs serve as major sites of infection. Given the central role the spleen plays in organizing both the innate and adaptive immune response in this model, it is common practice in the field to phenotype immune cell populations in the spleen, while using the liver to quantify bacterial burden (see PMID: 37773751 as one example of many). We acknowledge this does not provide the full scope of bacterial infection or the immune response in every potentially affected tissue, but nonetheless believe the interpretation that malnourished and previously malnourished animals do not properly control infection and their immune responses are blunted compared to controls still stands.

      The reviewer raised several points about differences in the results for cell frequency and absolute number and why these may deviate in some circumstances. For example, the reviewer notes that we observe thymic atrophy yet the frequency of peripheral T-cells does not decline. It should be noted that absolute number can change when frequency does not and vice versa, due to changes in other cell types within the studied population of cells. As in the case of peripheral lymphocytes in our study, the frequency can stay the same or even increase when the absolute number declines (Supplemental 1). This can occur if other populations of cells decrease further, which is indeed the case as the loss of myeloid cells is greater than that of lymphocytes. Hence, we find that the frequency of T and B cells is unchanged or elevated, despite the loss in absolute number of peripheral cells, which is our stated interpretation. We believe this is consistent with our overall observations and is why it is important to report both frequency and absolute number, as we have done.
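
      The arithmetic behind this point can be sketched with a small worked example. The counts below are hypothetical values in arbitrary units, chosen only to illustrate the relationship between frequency and absolute number; they are not measurements from the study, and the function name `frequencies` is introduced here purely for illustration.

```python
# Hypothetical cell counts (arbitrary units), chosen only to illustrate the
# arithmetic; these are NOT measurements from the study.

def frequencies(counts):
    """Return each population's share of the total cell count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

# Assumed scenario: lymphocytes are halved, myeloid cells fall to a quarter.
control = {"lymphocytes": 60.0, "myeloid": 40.0}
restricted = {"lymphocytes": 30.0, "myeloid": 10.0}

control_freq = frequencies(control)
restricted_freq = frequencies(restricted)

for name in control:
    print(
        f"{name}: absolute {control[name]:.0f} -> {restricted[name]:.0f}, "
        f"frequency {control_freq[name]:.2f} -> {restricted_freq[name]:.2f}"
    )
```

      In this assumed scenario the absolute number of lymphocytes falls by half while their frequency rises, because the myeloid compartment shrinks proportionally more.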

      We have made the requested changes to the text to address the reviewer's concerns as noted, to improve clarity and accuracy in the description of experiments, results, and overall conclusions drawn in the manuscript. We have also included a discussion of the limitations of our work as well as additional areas for future investigation that remain open.

      Reviewer #2 (Recommendations for the authors):

      Regarding the known drivers of myelopoiesis, can the authors quantify circulating levels of relevant immune cytokines (e.g. type I and II IFNs, GM-CSF, etc.)?

      Regarding the microbiota (point #2), how dramatically does this undernutrition modulate the microbiota both in terms of absolute load and community composition, and how effectively/quickly is this rescued by refeeding?

      We thank the reviewer for raising these recommendations. We agree that the role of circulating factors like cytokines and growth factors in contributing to the defects in myelopoiesis is of interest and is the focus of future work. Similarly, the impact of malnutrition on the microbiota is of great interest and has been evaluated by other groups in separate studies. How the known impact of malnutrition on the microbiota affects the phenotypes we observe in myelopoiesis is unclear and warrants future investigation. We have added these points to the discussion section as limitations of this study.

    1. Author Response:

      In the Weaknesses, Reviewer 3 suggests that in the Discussion, we comment upon whether WRN ATPase/3’-5’ helicase and WRNIP1 ATPase work on Y-family Pols additively or synergistically to raise fidelity. However, in the Discussion on page 20, we do comment on the role of WRN and WRNIP1 ATPase activities in conferring an additive increase in the fidelity of TLS by Y-family Pols.

    1. Author Response:

      We thank the reviewers for their thoughtful feedback and appreciate their recognition of the value of our findings. In response, we are refining the manuscript to clarify key terminology, more clearly describe our image analysis workflows, and temper the interpretation of our results where appropriate. We are planning to perform additional experiments to further investigate the specificity of mRNA co-localization between BK and CaV1.3 channels. We acknowledge the importance of understanding ensemble trafficking dynamics and the functional role of pre-assembly at the plasma membrane, and we plan to explore these questions in future work. We look forward to submitting a revised manuscript that addresses the reviewers’ comments in detail.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Desingu et al. show that JEV infection reduces SIRT2 expression. Upon JEV infection, 10-day-old SIRT2 KO mice showed increased viral titer, more severe clinical outcomes, and reduced survival. Conversely, SIRT2 overexpression reduced viral titer, improved clinical outcomes, and improved survival. Transcriptional profiling shows dysregulation of NF-κB and expression of inflammatory cytokines. Pharmacological NF-κB inhibition reduced viral titer. The authors conclude that SIRT2 is a regulator of JEV infection.

      This paper is novel because sirtuins have been primarily studied for aging, metabolism, stem cells/regeneration. Their role in infection has not been explored until recently. Indeed, Barthez et al. showed that SIRT2 protects aged mice from SARS-CoV-2 infection (Barthez, Cell Reports 2025). Therefore, this is a timely and novel research topic. Mechanistically, the authors showed that SIRT2 suppresses the NF-κB pathway. Interestingly, SIRT2 has also been shown recently to suppress other major inflammatory pathways, such as cGAS-STING (Barthez, Cell Reports 2025) and the NLRP3 inflammasome (He, Cell Metabolism 2020; Luo, Cell Reports 2019). Together, these findings support the emerging concept that SIRT2 is a master regulator of inflammation.

      Weaknesses:

      (1) Figures 2 and 3. Although SIRT2 KO mice showed increased viral titer, more severe clinical outcomes, and reduced survival upon JEV infection, the difference is modest because even WT mice exhibited very severe disease at this viral dose. The authors should perform the experiment using a sub-lethal viral dose for WT mice, to allow the assessment of increased clinical outcomes and reduced survival in KO mice.

      (2) Figure 5K-N, the authors examined the expression of inflammatory cytokines in WT and SIRT2 KO cells upon JEV infection, in line with the dysregulation of NF-κB. It has been shown recently that SIRT2 also regulates the cGAS-STING pathway (Barthez, Cell Reports 2025) and the NLRP3 inflammasome (He, Cell Metabolism 2020; Luo, Cell Reports 2019). Do you also observe increased IFNb, IL1b, and IL18 in SIRT2 KO cells upon JEV infection? This may indicate that SIRT2 regulates systemic inflammatory responses and represents a potent protection upon viral infection. This is particularly important because in Figure 7F, the authors showed that SIRT2 overexpression reduced viral load even when NF-κB is inhibited, suggesting that NF-κB is not the only mediator of SIRT2 to suppress viral infection.

      We thank the reviewer for the valuable recommendation. We are willing to conduct an experiment using a sub-lethal viral dose in wild-type (WT) mice to assess increased clinical outcomes and reduced survival in knockout (KO) mice, as recommended.

      Furthermore, we acknowledge the reviewer's comments that SIRT2 regulates systemic inflammatory responses and provides potent protection against viral infection. Additionally, NF-κB is not the only mediator of SIRT2's suppression of viral infection; other possible molecular mechanisms are also involved in this process.

      Reviewer #2 (Public review):

      The manuscript by Desingu et al., explores the role of SIRT2 in regulating Japanese Encephalitis Virus (JEV) replication and disease progression in rodent models. Using both an in vitro and an in vivo approach, the authors demonstrate that JEV infection leads to decreased SIRT2 expression, which they hypothesize is exploited by JEV for viral replication. To test this hypothesis, the authors utilize SIRT2 inhibition (via AGK2 or genetic knockout) and demonstrate that it leads to increased viral load and worsens clinical outcomes in JEV-infected mice. Conversely, SIRT2 overexpression via an AAV delivery system reduces viral replication and improves survival among infected mice. The study proposes a mechanism in which SIRT2 suppresses JEV-induced autophagy and inflammation by deacetylating NF-κB, thereby reducing Beclin-1 expression (an NF-κB-dependent gene) and autophagy, which the authors consider a pathway that JEV exploits for replication. Transcriptomic analysis further supports that SIRT2 deficiency leads to NF-κB-driven cytokine hyperactivation. Additionally, pharmacological inhibition of NF-κB using Bay 11 (an IKK inhibitor) results in reduced viral load and improved clinical pathology in WT and SIRT2 KO mice. Overall, the findings from Desingu et al. are generally supported by the data and suggest that targeting SIRT2 may serve as a promising therapeutic approach for JEV infection and potentially other RNA viruses that SIRT2 helps control. However, the paper does fall short in some areas. Please see below for our comments to help improve the paper.

      We thank the reviewer for the valuable recommendation. We are willing to measure NF-κB acetylation in AdSIRT2 JEV-infected cells compared to WT-infected cells, to verify that the acetylation of NF-κB is truly linked to SIRT2 expression levels, as per the reviewer's suggestion.

      We are willing to conduct an experiment using a sub-lethal viral dose in wild-type (WT) mice to assess increased clinical outcomes and reduced survival in knockout (KO) mice, as recommended.

      We accept the reviewer's point that AGK2 can also inhibit other sirtuins. Thus, to test the contribution of other sirtuins, we are willing to repeat the AGK2 experiment using JEV-infected wild-type and Sirt2 knockout mice.

    1. Author response:

      Reviewer #1 (Public Review):

      Fombellida-Lopez and colleagues describe the results of an ART intensification trial in people with HIV infection (PWH) on suppressive ART to determine the effect of increasing the dose of one ART drug, dolutegravir, on viral reservoirs, immune activation, exhaustion, and circulating inflammatory markers. The authors hypothesize that ART intensification will provide clues about the degree to which low-level viral replication is occurring in circulation and in tissues despite ongoing ART, which could be identified if reservoirs decrease and/or if immune biomarkers change. The trial design is straightforward and well-described, and the intervention appears to have been well tolerated. The investigators observed an increase in dolutegravir concentrations in circulation, and to a lesser degree in tissues, in the intervention group, indicating that the intervention has functioned as expected (ART has been intensified in vivo). Several outcome measures changed during the trial period in the intervention group, leading the investigators to conclude that their results provide strong evidence of ongoing replication on standard ART. The results of this small trial are intriguing, and a few observations in particular are hypothesis-generating and potentially justify further clinical trials to explore them in depth. However, I am concerned about over-interpretation of results that do not fully justify the authors' conclusions.

      We thank Reviewer #1 for their thoughtful and constructive comments, which will help us clarify and improve the manuscript. Below, we address each of the reviewer’s points and describe the changes that we intend to implement in the revised version. We acknowledge the reviewer’s concern regarding potential over-interpretation of certain findings, and we will take particular care to ensure that all conclusions are supported by the data and framed within the exploratory nature of the study.

      (1) Trial objectives: What was the primary objective of the trial? This is not clearly stated. The authors describe changes in some reservoir parameters and no changes in others. Which of these was the primary outcome? No a priori hypothesis / primary objective is stated, nor is there explicit justification (power calculations, prior in vivo evidence) for the small n, unblinded design, and lack of placebo control. In the abstract (line 36, "significant decreases in total HIV DNA") and conclusion (lines 244-246), the authors state that total proviral DNA decreased as a result of ART intensification. However, in Figures 2A and 2E (and in line 251), the authors indicate that total proviral DNA did not change. These statements are confusing and appear to be contradictory. Regarding the decrease in total proviral DNA, I believe the authors may mean that they observed transient decrease in total proviral DNA during the intensification period (day 28 in particular, Figure 2A), however this level increases at Day 56 and then returns to baseline at Day 84, which is the source of the negative observation. Stating that total proviral DNA decreased as a result of the intervention when it ultimately did not is misleading, unless the investigators intended the day 28 timepoint as a primary endpoint for reservoir reduction - if so, this is never stated, and it is unclear why the intervention would then be continued until day 84? If, instead, reservoir reduction at the end of the intervention was the primary endpoint (again, unstated by the authors), then it is not appropriate to state that the total proviral reservoir decreased significantly when it did not.

      We agree with the reviewer that the primary objective of the study was not explicitly stated in the submitted manuscript. We will clarify this in the revised manuscript. As registered on ClinicalTrials.gov (NCT05351684), the primary outcome was defined as “To evaluate the impact of treatment intensification at the level of total and replication-competent reservoir (RCR) in blood and in tissues”, with a time frame of 3 months. Accordingly, our aim was to explore whether any measurable reduction in the HIV reservoir (total or replication-competent) occurred during the intensification period, including at day 28, 56, or 84. The protocol did not prespecify a single time point for this effect to occur, and the exploratory design allowed for detection of transient or sustained changes within the intensification window.

      We recognize that this scope was not clearly articulated in the original text and may have led to confusion in interpreting the transient drop in total HIV DNA observed at day 28. While total DNA ultimately returned to baseline by the end of intensification, the presence of a transient reduction during this 3-month window still fits within the framework of the study’s registered objective. Moreover, although the change in total HIV DNA was transient, it aligns with the consistent direction of changes observed across the multiple independent measures, including CA HIV RNA, RNA/DNA ratio and intact HIV DNA, collectively supporting a biological effect of intensification.

      We would also like to stress that this is the first clinical trial in which ART intensification is performed not by adding an extra drug but by increasing the dosage of an existing drug. Therefore, we were more interested in the overall cumulative effect of intensification throughout the entire trial period than in differences between groups at individual time points. We will clarify in the manuscript that this was a proof-of-concept phase 2 study, designed to generate biological signals rather than confirm efficacy in a powered comparison. The absence of a pre-specified statistical endpoint or sample size calculation reflects the exploratory nature of the trial.

      (2) Intervention safety and tolerability: The results section lacks a specific heading for participant safety and tolerability of the intervention. I was wondering about clinically detectable viremia in the study. Were there any viral blips? Was the increased DTG well tolerated? This drug is known to cause myositis, headache, CPK elevation, and hepatotoxicity. Were any of these observed? What is the authors' interpretation of the CD4:8 ratio change (line 198)? Is this a significant safety concern for a longer duration of intensification? Was there also a change in CD4% or only in absolute counts? Was there relative CD4 depletion observed in the rectal biopsy samples between days 0 and 84? Interestingly, T cells dropped at the same timepoints that reservoirs declined... how do the authors rule out that reservoir decline reflects transient T cell decline that is non-specific (not due to additional blockade of replication)?

      We will improve the Methods section to clarify how safety and tolerability were assessed during the study. Safety evaluations were conducted on day 28 and day 84 and included a clinical examination and routine laboratory testing (liver function tests, kidney function, and complete blood count). Medication adherence was also monitored through pill counts performed by the study nurses.

      No virological blips above 50 copies/mL were observed and no adverse events were reported by participants during the 3-month intensification period. Although CPK levels were not included in the routine biological monitoring, no participant reported muscle pain or other symptoms suggestive of muscle toxicity.

      The CD4:CD8 ratio decrease noted during intensification was not associated with significant changes in absolute CD4 or CD8 counts, as shown in Figure 5. We interpret this ratio change as a transient redistribution rather than an immunological risk; therefore, we do not consider it to represent a safety concern.

      We would like to clarify that CD4+ T-cell counts did not significantly decrease in any of the treatment groups, as shown in Figure 5. The apparent decline observed concerns the CD4/CD8 ratio, which transiently dropped, but not the absolute number of CD4+ T cells.

      (3) The investigators describe a decrease in intact proviral DNA after 84 days of ART intensification in circulating cells (Figure 2D), but no changes to total proviral DNA in blood or tissue (Figures 2A and 2E; IPDA does not appear to have been done on tissue samples). It is not clear why ART intensification would result in a selective decrease in intact proviruses and not in total proviruses if the source of these reservoir cells is due to ongoing replication. These reservoir results have multiple interpretations, including (but not limited to) the investigators' contention that this provides strong evidence of ongoing replication. However, ongoing replication results in the production of both intact and mutated/defective proviruses that both contribute to reservoir size (with defective proviruses vastly outnumbering intact proviruses). The small sample size and well-described heterogeneity of the HIV reservoir (with regard to overall size and composition) raise the possibility that the study was underpowered to detect differences over the 84-day intervention period. No power calculations or prior studies were described to justify the trial size or the duration of the intervention. Readers would benefit from a more nuanced discussion of reservoir changes observed here.

      We sincerely thank the reviewer for this insightful comment. We fully agree that the reservoir dynamics observed in our study raise several possible interpretations, and that its complexity, resulting from continuous cycles of expansion and contraction, reflects the heterogeneity of the latent reservoir.

      Total HIV DNA in PBMCs showed a transient decline during intensification (notably at day 28), ultimately returning to baseline by day 84. This biphasic pattern may reflect the combined effects of suppression of ongoing low-level replication by an increased DTG dosage, followed by the expansion of infected cell clones (mostly harboring defective proviruses). In other words, the transient decrease in total (intact + defective) DNA at day 28 may be due to an initial decrease in newly infected cells upon ART intensification; however, at the subsequent time points this effect may have been masked by proliferation (clonal expansion) of infected cells with defective proviruses. This would explain why the intact proviruses decreased, but the total proviruses did not change, between days 0 and 84.

      Importantly, we observed a significant decrease in intact proviral DNA between day 0 and day 84 in the intensification group (Figure 2D). We will highlight this result more clearly in the revised manuscript, as it directly addresses the study’s primary objective: assessing the impact of intensification on the replication-competent reservoir. In comparison, as the reviewer rightly points out, total HIV DNA includes over 90% defective genomes, which limits its interpretability as a biomarker of biologically relevant reservoir changes.
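
      A minimal arithmetic sketch of why a change in intact proviruses can be invisible in total HIV DNA is given below. All copy numbers are hypothetical and were chosen only so that defective genomes make up more than 90% of the total, as described above; they are not study data, and the helper name `summarize` is an illustrative assumption.

```python
# Hypothetical proviral copy numbers per million cells; illustrative only,
# NOT data from the trial.

def summarize(intact, defective):
    """Return total proviral copies and the intact fraction of that total."""
    total = intact + defective
    return total, intact / total

# Assumed scenario: intact proviruses fall while defective clones expand.
day0_intact, day0_defective = 50, 950
day84_intact, day84_defective = 30, 970

day0_total, day0_intact_fraction = summarize(day0_intact, day0_defective)
day84_total, day84_intact_fraction = summarize(day84_intact, day84_defective)

intact_change = (day84_intact - day0_intact) / day0_intact
total_change = (day84_total - day0_total) / day0_total

print(f"intact fraction at day 0: {day0_intact_fraction:.1%}")
print(f"change in intact copies: {intact_change:.0%}")
print(f"change in total copies: {total_change:.0%}")
```

      Under these assumed numbers, a large relative drop in intact copies leaves the total unchanged, because a small expansion of the much larger defective pool is enough to offset it.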

      In addition, other reservoir markers, such as cell-associated unspliced RNA and RNA/DNA ratios, also showed consistent trends supporting a modest but biologically relevant effect of intensification. Even in the absence of sustained changes in total HIV DNA, the coherence across these independent measures suggests a signal indicative of ongoing replication in at least some individuals, and at specific timepoints.

      Regarding tissue reservoirs, the lack of substantial change in total HIV DNA between days 0 and 84 is also in line with the predominance of defective sequences in these compartments. Moreover, the limited increase in rectal tissue dolutegravir levels during intensification (from 16.7% to 20% of plasma concentrations) may have limited the efficacy of the intervention in this site.

      As for the IPDA on rectal biopsies, we attempted the assay using two independent DNA extraction methods (Promega Reliaprep and Qiagen Puregene), but both yielded high DNA Shearing Index values, and intact proviral detection was successful in only 3 of 40 samples. Given the poor DNA integrity and weak signals, these results were not interpretable.

      That said, we fully acknowledge the limitations of our study, especially the small sample size, and we agree with the reviewer that caution is needed when interpreting these findings. In the revised manuscript, we will adopt a more measured tone in the discussion, clearly stating that these observations are exploratory and hypothesis-generating, and require confirmation in larger, more powered studies. Nonetheless, we believe that the convergence of multiple reservoir markers pointing in the same direction constitutes a potentially meaningful biological signal that deserves further investigation.

      (4) While a few statistically significant changes occurred in immune activation markers, it is not clear that these are biologically significant. Lines 175-186 and Figure 3: The change in CD4 cells + for TIGIT looks as though it declined by only 1-2%, and at day 84, the confidence interval appears to widen significantly at this timepoint, spanning an interquartile range of 4%. The only other immune activation/exhaustion marker change that reached statistical significance appears to be CD8 cells + for CD38 and HLA-DR, however, the decline appears to be a fraction of a percent, with the control group trending in the same direction. Despite marginal statistical significance, it is not clear there is any biological significance to these findings; Figure S6 supports the contention that there is no significant change in these parameters over time or between groups. With most markers showing no change and these two showing very small changes (and the latter moving in the same direction as the control group), these results do not justify the statement that intensifying DTG decreases immune activation and exhaustion (lines 38-40 in the abstract and elsewhere).

      We agree with the reviewer that the observed changes in immune activation and exhaustion markers were modest. We will revise the manuscript to reflect this more accurately. We will also note that these differences, while statistically significant (e.g., in TIGIT+ CD4+ T cells and CD38+HLA-DR+ CD8+ T cells), were limited in magnitude. We will explicitly acknowledge these limitations and interpret the findings with appropriate caution.

      (5) There are several limitations of the study design that deserve consideration beyond those discussed at line 327. The study was open-label and not placebo-controlled, which may have led to some medication adherence changes that confound results (authors describe one observation that may be evidence of this; lines 146-148). Randomized/blinded / cross-over design would be more robust and help determine signal from noise, given relatively small changes observed in the intervention arm. There does not seem to be a measurement of key outcome variables after treatment intensification ceased - evidence of an effect on replication through ART intensification would be enhanced by observing changes once intensification was stopped. Why was intensification maintained for 84 days? More information about the study duration would be helpful. Table 1 indicates that participants were 95% male. Sex is known to be a biological variable, particularly with regard to HIV reservoir size and chronic immune activation in PWH. Worldwide, 50% of PWH are women. Research into improving management/understanding of disease should reflect this, and equal participation should be sought in trials. Table 1 shows differing baseline reservoir sizes between the control and intervention groups. This may have important implications, particularly for outcomes where reservoir size is used as the denominator.

      We will expand the limitations section to address several key aspects raised by the reviewer: the absence of blinding and placebo control, the predominantly male study population, and the lack of post-intervention follow-up. While we acknowledge that open-label designs can introduce behavioral biases, including potential changes in adherence, we will now explicitly state that placebo-controlled, blinded trials would provide a more robust assessment and are warranted in future research.

      The 84-day duration of intensification was chosen based on previous studies and provided sufficient time for observing potential changes in viral transcription and reservoir dynamics. However, we agree that including post-intervention follow-up would have strengthened the conclusions, and we will highlight this limitation and future direction in the revised manuscript.

      The sex imbalance is now clearly acknowledged as a limitation in the revised manuscript, and we fully support ongoing efforts to promote equitable recruitment in HIV research. We would like to add that, in our study, rectal biopsies were coupled with anal cancer screening through HPV testing. This screening is specifically recommended for younger men who have sex with men (MSM), as outlined in the current EACS guidelines (see: https://eacs.sanfordguide.com/eacs-part2/cancer/cancer-screening-methods). As a result, MSM participants had both a clinical incentive and medical interest to undergo this procedure, which likely contributed to the higher proportion of male participants in the study.

      Lastly, although baseline total HIV DNA was higher in the intensified group, our statistical approach is based on a within-subject (repeated-measures) design, in which the longitudinal change of a parameter within the same participant during the study was the main outcome. In other words, we are not comparing absolute values of any marker between the groups; we are looking at changes of parameters from baseline within participants, and these are not expected to be affected by baseline imbalances.

      (6) Figure 1: the increase in DTG levels is interesting - it is not uniform across participants. Several participants had lower levels of DTG at the end of the intervention. Though unlikely to be statistically significant, it would be interesting to evaluate if there is a correlation between change in DTG concentrations and virologic / reservoir / inflammatory parameters. A positive relationship between increasing DTG concentration and decreased cell-associated RNA, for example, would help support the hypothesis that ongoing replication is occurring.

      We agree with the reviewer that assessing correlations between DTG concentrations and virological, immunological, or inflammatory markers would be highly informative. In fact, we initially explored this question in a preliminary way by examining whether individuals who showed a marked increase in DTG levels after intensification also demonstrated stronger changes in the viral reservoir. While this exploratory analysis did not reveal any clear associations, we would like to emphasize that correlating biological effects with DTG concentrations measured at a single timepoint may have limited interpretability. A more comprehensive understanding of the relationship between drug exposure and reservoir dynamics would ideally require multiple pharmacokinetic measurements over time, including pre-intensification baselines. This is particularly important given that DTG concentrations vary across individuals and over time, depending on adherence, metabolism, and other individual factors. We will clarify these points in the revised manuscript.

      (7) Figure 2: IPDA in tissue- was this done? scRNA in blood (single copy assay) - would this be expected to correlate with usCaRNA? The most unambiguous result is the decrease in cell-associated RNA - accompanying results using single-copy assay in plasma would be helpful to bolster this result.

      As mentioned in our response to point 3, we attempted IPDA on tissue samples, but technical limitations prevented reliable detection of intact proviruses. Regarding residual viremia, we did perform ultra-sensitive plasma HIV RNA quantification, but due to a technical issue (an inadvertent PBMC contamination during plasma separation) that affected the reliability of the results, we felt uncomfortable including these data in the manuscript.

      The use of the US RNA / Total DNA ratio is not helpful/difficult to interpret since the control and intervention arms were unmatched for total DNA reservoir size at study entry.

      We respectfully disagree with this comment. The US RNA / Total DNA ratio is commonly used to assess the relative transcriptional activity of the viral reservoir, rather than its absolute size. While we acknowledge that the total HIV-1 DNA levels differed at baseline between the two groups, the US RNA / Total DNA ratio specifically reflects the relationship between transcriptional activity and reservoir size within each individual, and is therefore not directly confounded by baseline differences in total DNA alone.

      Moreover, our analyses focus on within-subject longitudinal changes from baseline, not on direct between-group comparisons of absolute marker values. As such, the observed changes in the US RNA / Total DNA ratio over time are interpreted relative to each participant's baseline, mitigating concerns related to baseline imbalances between groups.
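
      As a purely illustrative sketch of the within-participant normalization described above (not the analysis code used in the study), the fold change from baseline of the US RNA / total DNA ratio can be computed as follows. The function name and all numbers are hypothetical. The example only shows the arithmetic: a participant whose total DNA is ten-fold higher at every visit has a ten-fold lower ratio but the same fold change from baseline.

```python
def ratio_fold_change(us_rna, total_dna):
    """Fold change from baseline of the US RNA / total DNA ratio for one participant.

    us_rna and total_dna are lists of measurements ordered by visit;
    index 0 is the baseline (day 0) visit.
    """
    ratios = [rna / dna for rna, dna in zip(us_rna, total_dna)]
    baseline = ratios[0]
    return [r / baseline for r in ratios]


# Hypothetical participants (made-up numbers, arbitrary units).
# Participant B has the same US RNA as A but ten-fold higher total DNA at every visit.
participant_a = ratio_fold_change([200.0, 120.0, 50.0], [1000.0, 900.0, 1000.0])
participant_b = ratio_fold_change([200.0, 120.0, 50.0], [10000.0, 9000.0, 10000.0])

print(participant_a)
print(participant_b)
```

      This cancels only a constant multiplicative difference between participants; it does not by itself address other ways in which baseline differences could matter.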

      Reviewer #2 (Public Review):

      Summary:

      An intensification study with a double dose of a 2nd-generation integrase inhibitor on a background of nucleoside analog inhibitors of the HIV reverse transcriptase in 20 individuals randomized with controls, with an impact on the levels of viral reservoirs and inflammation markers. Viral reservoirs in HIV are the main impediment to an HIV cure, and inflammation is associated with the development of co-morbidities.

      Strengths:

      The intervention that leads to a decrease of viral reservoirs and inflammation is quite straightforward, as a doubling of the INSTI is used in some individuals with INSTI resistance, with good tolerability.

      This is a very well documented study, both in blood and tissues, which is a great achievement due to the difficulty of body sampling in well-controlled individuals on antiretroviral therapy. The laboratory assays are performed by specialists in the field with state-of-the-art quantification assays. Both the introduction and the discussion are remarkably well presented and documented.

      The findings also have a potential impact on the management of chronic HIV infection.

      Weaknesses:

      I do not think that the size of the study can be considered a weakness, nor the fact that it is open-label either.

      We thank Reviewer #2 for their constructive and supportive comments. We appreciate their positive assessment of the study design, the translational relevance of the intervention, and the technical quality of the assays. We also take note of their perspective regarding sample size and study design, which supports our positioning of this trial as an exploratory, hypothesis-generating phase 2 study.

      Reviewer #3 (Public Review):

      The introduction does a very good job of discussing the issue around whether there is ongoing replication in people with HIV on antiretroviral therapy. Sporadic, non-sustained replication likely occurs in many PWH on ART related to adherence, drug interactions and possibly penetration of antivirals into sanctuary areas of replication and, as the authors point out, proving it does not occur is likely not possible, and proving it does occur is likely very dependent on the population studied and the design of the intervention. Whether the consequences of this replication, in the absence of evolution toward resistance, have clinical significance is a challenging question to address.

      It is important to note that INSTI-based therapy may have a different impact on HIV replication events that result in differences in virus release for specific cell types (those responsible for "second phase" decay) by blocking integration in cells that have completed reverse transcription prior to ART initiation but have yet to be fully activated. In a PI or NNRTI-based regimen, those cells will release virus, whereas with an INSTI-based regimen, they will not.

      Given the very small sample size, there is a substantial risk of imbalance between the groups in important baseline measures. Unfortunately, with the small sample size, a non-significant P value is not helpful when comparing baseline measures between groups. One suggestion would be to provide the full range as opposed to the inter-quartile range (essentially only 5 or 6 values). The authors could also report the proportion of participants with baseline HIV RNA target not detected in the two groups.

      We thank Reviewer #3 for their thoughtful and balanced review. We are grateful for the recognition of the strength of the Introduction, the complexity of evaluating residual replication, and the technical execution of the assays. We also appreciate the insightful suggestions for improving the clarity and transparency of our results and discussion.

      We will revise the manuscript to address several of the reviewer’s key concerns. We agree that the small sample size increases the risk of baseline imbalances. We will acknowledge these limitations in the revised manuscript. We will provide both the full range and the IQR in Table 1 in the revised manuscript.

      A suggestion that there is a critical imbalance between groups is that the control group has significantly lower total HIV DNA in PBMC, despite the small sample size. The control group also has numerically longer time of continuous suppression, lower unspliced RNA, and lower intact proviral DNA. These differences may have biased the ability to see changes in DNA and US RNA in the control group.

      We acknowledge the significant baseline difference in total HIV DNA between groups, which we have clearly reported. However, the other variables mentioned, duration of continuous viral suppression, unspliced RNA levels, and intact proviral DNA, did not differ significantly between groups at baseline, despite differences in the median values. These numerical differences do not necessarily indicate a critical imbalance.

      Notably, there was no significant difference in the change in US RNA/DNA between groups (Figure 2C).

      The nonsignificant difference in the change in US RNA/DNA between groups is not unexpected, given the significant between-group differences for both US RNA and total DNA changes. Since the ratio combines both markers, it is likely to show attenuated between-group differences compared to the individual components. However, while the difference did not reach statistical significance (p = 0.09), we still observed a trend towards a greater reduction in the US RNA/Total DNA ratio in the intervention group.

      The fact that the median relative change appears very similar in Figure 2C, yet there is a substantial difference in P values, is also a comment on the limits of the current sample size.

      Although we agree that, in general, the limited sample size impacts statistical power, we would like to point out that in Figure 2C, while the medians may appear similar, the ranges do differ between groups. At days 56 and 84, the median fold changes from baseline are indeed close, but the full interquartile range in the DTG group stays below 1, while in the control group the interquartile range is wider and covers approximately equal distance above and below 1. This explains the difference in p values between the groups.
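
      The point that similar medians can yield very different p values can be illustrated with a small sketch. It assumes, for illustration only, an exact two-sided Wilcoxon signed-rank test of log fold changes against zero; the test actually used for Figure 2C is not stated in this response, and the fold changes below are invented rather than taken from the study.

```python
from itertools import product
from math import log2


def signed_rank_p(fold_changes):
    """Exact two-sided Wilcoxon signed-rank p-value for fold changes versus 1.

    Works on log2 fold changes, so 1 (no change) maps to 0.
    Assumes no fold change is exactly 1; tied magnitudes share their average rank.
    Enumerates all sign patterns, so it is only practical for small samples.
    """
    logs = [log2(fc) for fc in fold_changes]
    magnitudes = sorted(abs(x) for x in logs)

    def average_rank(value):
        positions = [i + 1 for i, m in enumerate(magnitudes) if m == value]
        return sum(positions) / len(positions)

    ranks = [average_rank(abs(x)) for x in logs]
    observed = sum(r for r, x in zip(ranks, logs) if x > 0)
    expected = sum(ranks) / 2
    deviation = abs(observed - expected)

    # Count sign patterns whose positive-rank sum is at least as far from expected.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w_plus = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w_plus - expected) >= deviation - 1e-12:
            extreme += 1
    return extreme / 2 ** len(ranks)


# Hypothetical fold changes from baseline (not study data); medians are 0.65 and 0.70.
intensified = [0.50, 0.55, 0.60, 0.65, 0.70, 0.80, 0.90]  # all below 1
control = [0.30, 0.50, 0.65, 0.70, 1.50, 2.20, 3.00]      # spread around 1

p_intensified = signed_rank_p(intensified)  # 2 / 2**7 = 0.015625
p_control = signed_rank_p(control)
print(p_intensified, p_control)
```

      With seven values all on the same side of 1, the exact test reaches its smallest possible two-sided p value, whereas a spread that straddles 1 gives a large p value even when the median is nearly the same.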

      The text should report the median change in US RNA and US RNA/DNA when describing Figures 2A-2C.

      These data are already reported in the Results section (lines 164–166): "By day 84, US RNA and US RNA/total DNA ratio had decreased from day 0 by medians (IQRs) of 5.1 (3.3–6.4) and 4.6 (3.1–5.3) fold, respectively (p = 0.016 for both markers)."

      This statistical comparison of changes in IPDA results between groups should be reported. The presentation of the absolute values of all the comparisons in the supplemental figures is a strength of the manuscript.

      In the assessment of ART intensification on immune activation and exhaustion, the fact that none of the comparisons between randomized groups were significant should be noted and discussed.

      We would like to point out that a statistically significant difference between the randomized groups was observed for the frequency of CD4+ T cells expressing TIGIT, as shown in Figure 3A and reported in the Results section (p = 0.048).

      The changes in CD4:CD8 ratio and sCD14 levels appear counterintuitive to the hypothesis and are commented on in the discussion.

      Overall, the discussion highlights the significant changes in the intensified group, which are suggestive. There is limited discussion of the comparisons between groups where the results are less convincing.

      We will temper the language accordingly and add commentary on the limited and modest nature of these changes. Similarly, we will expand our discussion of counterintuitive findings such as the CD4:CD8 ratio and sCD14 changes.

      The limitations of the study should be more clearly discussed. The small sample size raises the possibility of imbalance at baseline. The supplemental figures (S3-S5) are helpful in showing the differences between groups at baseline, and the variability of measurements is more apparent. The lack of blinding is also a weakness, though the PK assessments do help (note 3TC levels rise substantially in both groups for most of the time on study (Figure S2)).

      The many assays and comparisons are listed as a strength. The many comparisons raise the possibility of finding significance by chance. In addition, if there is an imbalance at baseline, outcomes measuring related parameters will move in the same direction.

      We agree that the multiple comparisons raise the possibility of chance findings but would like to stress that in an exploratory study like this it is very important to avoid a type II error. In addition, the consistent directionality of the most relevant outcomes (US RNA and intact DNA) lends biological plausibility to the observed effects.

      The limited impact on activation and inflammation should be addressed in the discussion, as they are highlighted as a potentially important consequence of intermittent, not sustained replication in the introduction.

      The study is provocative and well executed, with the limitations listed above. Pharmacokinetic analyses help mitigate the lack of blinding. The major impact of this work is if it leads to a much larger randomized, controlled, blinded study of a longer duration, as the authors point out.

      Finally, we fully endorse the reviewer’s suggestion that the primary contribution of this study lies in its value as a proof-of-concept and foundation for future randomized, blinded trials of greater scale and duration. We will highlight this more clearly in the revised Discussion.

    1. Author response:

      We thank the editors and the reviewers for their positive comments regarding our manuscript and the methodological approach we have taken to understand the historical demographic response of endemic island birds to climate change. We acknowledge the issues of uneven sample sizes and plan to include additional species of island endemic birds for which genomic data is now available. As requested by reviewer 1, we will also address the issues related to the PSMC analysis in the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present an interesting study using RL and Bayesian modelling to examine differences in learning rate adaptation in conditions of high and low volatility and noise respectively. Through "lesioning" an optimal Bayesian model, they reveal that apparently a suboptimal adaptation of learning rates results from incorrectly detecting volatility in the environment when it is not in fact present.

      Strengths:

      The experimental task used is cleverly designed and does a good job of manipulating both volatility and noise. The modelling approach takes an interesting and creative approach to understanding the source of apparently suboptimal adaptation of learning rates to noise, through carefully "lesioning" an optimal Bayesian model to determine which components are responsible for this behaviour.

      We thank the reviewer for this assessment.

      Weaknesses:

      The study has few substantial weaknesses; the data and modelling both appear robust and informative, and it tackles an interesting question. The model space could potentially have been expanded, particularly with regard to the inclusion of alternative strategies such as those that estimate latent states and adapt learning accordingly.

      We thank the reviewer for this suggestion. We agree that it would be interesting to assess the ability of alternative models to reproduce the sub-optimal choices of participants in this study. The Bayesian Observer Model described in the paper is a form of Hierarchical Gaussian Filter, so we will assess the performance of a different class of models that are able to track uncertainty: RL-based models that are able to capture changes in uncertainty (the Kalman filter, and the model described by Cochran and Cisler, PLoS Comput Biol 2019). We will assess the ability of the models to recapitulate the core behaviour of participants (in terms of learning rate adaptation) and, if possible, assess their ability to account for the pupillometry response.
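
      To make concrete why a Kalman filter is a natural comparison model, the following minimal sketch computes the steady-state Kalman gain for a scalar random-walk process. Treating the process variance as "volatility" and the observation variance as "noise" is an analogy adopted here for illustration; the function and parameter names are hypothetical, and this is neither the Bayesian Observer Model nor the implementation used in the paper.

```python
def steady_state_gain(process_var, obs_var, n_steps=200):
    """Steady-state Kalman gain for a scalar random walk observed with noise.

    process_var: variance of the random-walk step (loosely, volatility).
    obs_var: variance of the observation noise (loosely, noise).
    The gain plays the role of a learning rate: estimate += gain * prediction_error.
    """
    posterior_var = 1.0  # arbitrary starting uncertainty
    gain = 0.0
    for _ in range(n_steps):
        prior_var = posterior_var + process_var   # uncertainty grows between trials
        gain = prior_var / (prior_var + obs_var)  # weight given to the new outcome
        posterior_var = (1.0 - gain) * prior_var  # uncertainty shrinks after the update
    return gain


baseline = steady_state_gain(process_var=0.05, obs_var=1.0)
more_volatile = steady_state_gain(process_var=0.5, obs_var=1.0)
more_noisy = steady_state_gain(process_var=0.05, obs_var=4.0)
print(baseline, more_volatile, more_noisy)
```

      In this sketch the gain rises when the process variance increases and falls when the observation variance increases, which is the normative pattern of learning rate adaptation against which participant behaviour is compared.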

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors aimed to investigate how humans learn and adapt their behavior in dynamic environments characterized by two distinct types of uncertainty: volatility (systematic changes in outcomes) and noise (random variability in outcomes). Specifically, they sought to understand how participants adjust their learning rates in response to changes in these forms of uncertainty.

      To achieve this, the authors employed a two-step approach:

      (1) Reinforcement Learning (RL) Model: They first used an RL model to fit participants' behavior, revealing that the learning rate was context-dependent. In other words, it varied based on the levels of volatility and noise. However, the RL model showed that participants misattributed noise as volatility, leading to higher learning rates in noisy conditions, where the optimal strategy would be to be less sensitive to random fluctuations.

      (2) Bayesian Observer Model (BOM): To better account for this context dependency, they introduced a Bayesian Observer Model (BOM), which models how an ideal Bayesian learner would update their beliefs about environmental uncertainty. They found that a degraded version of the BOM, where the agent had a coarser representation of noise compared to volatility, best fit the participants' behavior. This suggested that participants were not fully distinguishing between noise and volatility, instead treating noise as volatility and adjusting their learning rates accordingly.

      The authors also aimed to use pupillometry data (measuring pupil dilation) as a physiological marker to arbitrate between models and understand how participants' internal representations of uncertainty influenced both their behavior and physiological responses. Their objective was to explore whether the BOM could explain not just behavioral choices but also these physiological responses, thereby providing stronger evidence for the model's validity.

      Overall, the study sought to reconcile approximate rationality in human learning by showing that participants still follow a Bayesian-like learning process, but with simplified internal models that lead to suboptimal decisions in noisy environments.

      Strengths:

      The generative model presented in the study is both innovative and insightful. The authors first employ a Reinforcement Learning (RL) model to fit participants' behavior, revealing that the learning rate is context-dependent; specifically, it varies based on the levels of volatility and noise in the task. They then introduce a Bayesian Observer Model (BOM) to account for this context dependency, ultimately finding that a degraded BOM - in which the agent has a coarser representation of noise compared to volatility - provides the best fit for the participants' behavior. This suggests that participants do not fully distinguish between noise and volatility, leading to the misattribution of noise as volatility. Consequently, participants adopt higher learning rates even in noisy contexts, where an optimal strategy would involve being less sensitive to new information (i.e., using lower learning rates). This finding highlights a rational but approximate learning process, as described in the paper.

      We thank the reviewer for their assessment of the paper.

      Weaknesses:

      While the RL and Bayesian models both successfully predict behavior, it remains unclear how to fully reconcile the two approaches. The RL model captures behavior in terms of a fixed or context-dependent learning rate, while the BOM provides a more nuanced account with dynamic updates based on volatility and noise. Both models can predict actions when fit appropriately, but the pupillometry data offers a promising avenue to arbitrate between the models. However, the current study does not provide a direct comparison between the RL framework and the Bayesian model in terms of how well they explain the pupillometry data. It would be valuable to see whether the RL model can also account for physiological markers of learning, such as pupil responses, or if the BOM offers a unique advantage in this regard. A comparison of the two models using pupillometry data could strengthen the argument for the BOM's superiority, as currently, the possibility that RL models could explain the physiological data remains unexplored.

      We thank the reviewer for this suggestion. In the current version of the paper, we use an extremely simple reinforcement learning model to measure the learning rate in each task block (as this is the key behavioural metric we are interested in). As the reviewer highlights, this simple model doesn’t estimate uncertainty or adapt to it. Given this, we don’t think we can directly compare this model to the Bayesian Observer Model. For example, in the current analysis of the pupillometry data we classify individual trials based on the BOM’s estimate of uncertainty and show that participants adapt their learning rate as expected to the reclassified trials; this analysis would not be possible with our current RL model. However, there are more complex RL-based models that do estimate uncertainty (as discussed above in response to Reviewer #1) and so may more directly be compared to the BOM. We will attempt to apply these models to our task data and describe their ability to account for participant behaviour and physiological response as suggested by the Reviewer.
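
      For readers less familiar with this model class, a fixed-learning-rate model can be sketched as a delta-rule update. The exact specification of the RL model is not given in this response, so the form below (a Rescorla-Wagner-style update with hypothetical names and made-up outcomes) is an assumption about the general class rather than the fitted model.

```python
def delta_rule(outcomes, learning_rate, initial_value=0.5):
    """Return the value estimate after each outcome under a fixed learning rate.

    Each update moves the estimate toward the outcome by a fraction
    (learning_rate) of the prediction error. There is no estimate of
    uncertainty, so the learning rate cannot adapt within a block.
    """
    value = initial_value
    estimates = []
    for outcome in outcomes:
        prediction_error = outcome - value
        value += learning_rate * prediction_error
        estimates.append(value)
    return estimates


# Hypothetical binary outcomes for one block (1 = rewarded, 0 = not rewarded).
block_outcomes = [1, 1, 0, 1, 1, 1, 0, 1]
slow = delta_rule(block_outcomes, learning_rate=0.1)
fast = delta_rule(block_outcomes, learning_rate=0.6)
print(slow[-1], fast[-1])
```

      Fitting one such learning rate per block yields a behavioural summary of how quickly a participant updates in that block, but the model has no internal quantity that could be used to classify individual trials by uncertainty.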

      The model comparison between the Bayesian Observer Model and the self-defined degraded internal model could be further enhanced. Since different assumptions about the internal model's structure lead to varying levels of model complexity, using a formal criterion such as Bayesian Information Criterion (BIC) or Akaike Information Criterion (AIC) would allow for a more rigorous comparison of model fit. Including such comparisons would ensure that the degraded BOM is not simply favored due to its flexibility or higher complexity, but rather because it genuinely captures the participants' behavioral and physiological data better than alternative models. This would also help address concerns about overfitting and provide a clearer justification for using the degraded BOM over other potential models.

      Thank you, we will add this.
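
      The criteria mentioned by the reviewer follow the standard definitions, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of free parameters, n the number of observations, and ln L the maximized log-likelihood. A minimal sketch is given below; the model names, parameter counts, and log-likelihoods are invented for illustration and are not results from the study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: (maximized log-likelihood, number of free parameters).
fits = {
    "full_model": (-410.0, 2),
    "degraded_model": (-395.0, 3),
}
n_trials = 720  # hypothetical number of choices entering the likelihood

scores = {
    name: {"aic": aic(ll, k), "bic": bic(ll, k, n_trials)}
    for name, (ll, k) in fits.items()
}
best_by_bic = min(scores, key=lambda name: scores[name]["bic"])
print(scores, best_by_bic)
```

      Because BIC penalizes each additional parameter by ln(n) rather than 2, it favours simpler models more strongly than AIC whenever n exceeds about 7 observations.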

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      For clarity, the methods would benefit from further detail of task framing to participants. I.e. were there explicit instructions regarding volatility/task contingencies? Or were participants told nothing?

      We have added in the following explanatory text to the methods section (page 20), clarifying the limited instructions provided to participants:

      “Participants were informed that the task would be split into 6 blocks, that they had to learn which was the best option to choose, and that this option may change over time. They were not informed about the different forms of uncertainty we were investigating or of the underlying structure of the task (that uncertainty varied between blocks).”

      In the results, it would be useful to report the general task behavior of participants to get a sense of how they performed across different parts of the task. Also, were participants excluded if they didn't show evidence of learning adaptation to volatility?

      We have added the following text reporting overall performance to the results (page 6):

      “Participants were able to learn the best option to choose in the task, selecting the most highly rewarded option on an average of 71% of trials (range 65% - 74%).”

      And the following text to the methods, confirming that participants were not excluded if they didn’t respond to volatility/noise (the failure in this adaptation is the focus of the current study) (page 19):

      “No exclusion criteria related to task performance were used.”

      The results would benefit from a more intuitive explanation of what the lesioning is trying to recapitulate; this can get quite technical and the objective is not necessarily clear, especially for the less computationally-minded reader.

      We have amended the relevant section of the results to clarify this point (page 9):

      “Having shown that an optimal learner adjusts its learning rate to changes in volatility and noise as expected, we next sought to understand the relative noise insensitivity of participants. In these analyses we “lesion” the BOM, to reduce its performance in some way, and then assess whether doing so recapitulates the pattern of learning rate adaptation observed for participants (Fig 3e). In other words, we damage the model so it performs less well and then assess whether this damage makes the behaviour of the BOM (shown in Fig 3f) more closely resemble that seen in participants (Fig 3e).”

      The modelling might be improved by the inclusion of another class of model. Specifically, models that adapt learning rates in response to the estimation of latent states underlying the current task outcomes would be very interesting to see. In a sense, these are also estimating volatility through changeability of latent states, and it would be interesting to explore whether the findings could also be explained by an incorrect assumption that the latent state has changed when outcomes are noisy.

      Thank you for this suggestion. We have added additional sections to the supplementary materials in which we use a general latent state model and a simple RL model to try to recapitulate the behaviour of participants (and to compare with the BOM). These additional sections are extensive, so are not reproduced here. We have also added in a section to the discussion in the main paper covering this interesting question in which we confirm that we were unable to reproduce participant behaviour (or the normative effect of the lesioned BOMs) using these models but suggest that alternative latent state formulations would be interesting to explore in future work (page 18):

      “A related question is whether other, non-Bayesian model formulations may be able to account for participants’ learning adaptation in response to volatility and noise. Of note, the reinforcement learning model used to measure learning rates in separate blocks does not achieve this goal, as this model is fitted separately to each block rather than adapting between blocks (NB the simple reinforcement learning model that is fitted across all blocks does not capture participant behaviour, see supplementary information). One candidate class of model that has potential here is latent-state models (Cochran & Cisler, 2019), in which the variance and unexpected changes in the process being learned (which have a degree of similarity with noise and volatility respectively) are estimated and used to alter the model’s rates of updating as well as the estimated number of states being considered. Using the model described by Cochran and Cisler, we were unable to replicate the learning rate adaptation demonstrated by participants in the current study (see supplementary information) although it remains possible that other latent state formulations may be more successful.”

      The discussion may benefit from a little more discussion of where this work leads us - what is the next step?

      As above, we have added in a suggestion about future modelling work. We have also added in a section about the outstanding interesting questions concerning the neural representation of these quantities, reproduced in response to the suggestion by reviewer #2 below.

      Reviewer #2 (Recommendations for the authors):

      The study presents an opportunity to explore potential neural coding models that could account for the cognitive processes underlying the task. In the field of neural coding, noise correlation is often measured to understand how a population of neurons responds to the same stimulus, which could be related to the noise signal in this task. Since the brain likely treats the stimulus as the same, with noise representing minor changes, this aspect could be linked to the participants' difficulty distinguishing noise from volatility. On the other hand, signal correlation is used to understand how neurons respond to different stimuli, which can be mapped to the volatility signal in the task. It would be highly beneficial if the authors could discuss how these established concepts from neural population coding might relate to the Bayesian behavior model used in the study. For instance, how might neurons encode the distinction between noise and volatility at a population level? Could noise correlation lead to the misattribution of noise as volatility at a neural level, mirroring the behavioral findings? Discussing possible neural models that could explain the observed behavior and relating it to the existing literature on neural population coding would significantly enrich the discussion. It would also open up avenues for future research, linking these behavioral findings to potential neural mechanisms.

      We thank the reviewer for this interesting suggestion. We have added the following paragraph to the discussion section, which we hope does justice to this interesting question (page 18):

      “Previous work examining the neural representations of uncertainty has tended to report correlations between brain activity and some task-based estimate of one form of uncertainty at a time (Behrens et al., 2007; Walker et al., 2020, 2023). We are not aware of work that has, for example, systematically varied volatility and noise and reported distinct correlations for each. An interesting possibility as to how different forms of uncertainty may be encoded is suggested by parallels with the neuronal decoding literature. One question addressed by this literature is how the brain decodes changes in the world from the distributed, noisy neural responses to those changes, with a particular focus on the influence of different forms of between-neuron correlation (Averbeck et al., 2006; Kohn et al., 2016). Specifically, signal-correlation, the degree to which different neurons represent similar external quantities (required to track volatility), is distinguished from, and often limited by, noise-correlation, the degree to which the activity of different neurons covaries independently of these external quantities. One possibility relevant to the current study, which resembles the underlying logic of the BOM, is that a population of neurons represents the estimated mean of the generative process that produces task outcomes. In this case, volatility would be tracked as the signal-correlation across this population, whereas noise would be analogous to the noise-correlation and, crucially, misestimation of noise as volatility might arise as misestimation of these two forms of correlation. While the current study clearly cannot adjudicate on the neural representation of these processes, our finding of distinct behavioural and physiological responses to the two forms of uncertainty does suggest that separable neural representations of uncertainty are maintained.”
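
      The two quantities contrasted in this paragraph can be illustrated with a small worked example. The sketch below assumes the conventional definitions: signal correlation is the correlation between two neurons' mean responses across stimuli, and noise correlation is the correlation between their trial-to-trial deviations from those means. The function names and the toy responses are hypothetical and are not data from the study.

```python
from math import sqrt
from typing import List, Sequence

# responses[s][t] is one neuron's response on trial t of stimulus s.
Responses = Sequence[Sequence[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def stimulus_means(responses: Responses) -> List[float]:
    return [sum(trials) / len(trials) for trials in responses]


def signal_correlation(a: Responses, b: Responses) -> float:
    """Correlation of the two neurons' mean responses across stimuli."""
    return pearson(stimulus_means(a), stimulus_means(b))


def noise_correlation(a: Responses, b: Responses) -> float:
    """Correlation of trial-to-trial residuals after removing each stimulus mean."""
    def residuals(responses: Responses) -> List[float]:
        out = []
        for trials, mean in zip(responses, stimulus_means(responses)):
            out.extend(r - mean for r in trials)
        return out
    return pearson(residuals(a), residuals(b))


# Toy data: both neurons increase with the stimulus, but fluctuate in opposition.
neuron_a = [[1, 3], [5, 7], [9, 11]]
neuron_b = [[4, 2], [6, 4], [8, 6]]
```

      In this toy case the two neurons share stimulus tuning but have opposed trial-to-trial fluctuations, so the two correlations take opposite signs. This shows only that they are separable quantities, not how they are estimated from real recordings.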

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an excellent study by a superb investigator who discovered and is championing the field of migrasomes. This study contains a hidden "gem" - the induction of migrasomes by hypotonicity and how that happens. In summary, an outstanding fundamental phenomenon (migrasomes) en route to becoming translationally highly significant.

      Strengths:

      Innovative approach at several levels. Migrasomes - discovered by Dr Yu's group - are an outstanding biological phenomenon of fundamental interest and now of potentially practical value.

      Weaknesses:

      I feel that the overemphasis on practical aspects (vaccine), however important, eclipses some of the fundamental aspects that may be just as important and actually more interesting. If this can be expanded, the study would be outstanding.

      We sincerely thank the reviewer for the encouraging and insightful comments. We fully agree that the fundamental aspects of migrasome biology are of great importance and deserve deeper exploration.

      In line with the reviewer’s suggestion, we have expanded our discussion on the basic biology of engineered migrasomes (eMigs). A recent study by the Okochi group at the Tokyo Institute of Technology demonstrated that hypoosmotic stress induces the formation of migrasome-like vesicles, involving cytoplasmic influx and requiring cholesterol for their formation (DOI: 10.1002/1873-3468.14816, February 2024). Building on this, our study provides a detailed characterization of hypoosmotic stress-induced eMig formation, and further compares the biophysical properties of natural migrasomes and eMigs. Notably, the inherent stability of eMigs makes them particularly promising as a vaccine platform.

      Finally, we would like to note that our laboratory continues to investigate multiple aspects of migrasome biology. In collaboration with our colleagues, we recently completed a study elucidating the mechanical forces involved in migrasome formation (DOI: 10.1016/j.bpj.2024.12.029), which further complements the findings presented here.

      Reviewer #2 (Public review):

      Summary:

      The authors' report describes a novel vaccine platform derived from a newly discovered organelle called a migrasome. First, the authors address a technical hurdle in using migrasomes as a vaccine platform. Natural migrasome formation occurs at low levels and is labor intensive; however, by understanding the molecular underpinning of migrasome formation, the authors have designed a method to make engineered migrasomes from cultured cells at higher yields utilizing a robust process. These engineered migrasomes behave like natural migrasomes. Next, the authors immunized mice with migrasomes that either expressed a model peptide or the SARS-CoV-2 spike protein. Antibodies against the spike protein were raised that could be boosted by a 2nd vaccination and these antibodies were functional as assessed by an in vitro pseudoviral assay. This new vaccine platform has the potential to overcome obstacles such as cold chain issues for vaccines like messenger RNA that require very stringent storage conditions.

      Strengths:

      The authors present very robust studies detailing the biology behind migrasome formation and this fundamental understanding was used to form engineered migrasomes, which makes it possible to utilize migrasomes as a vaccine platform. The characterization of engineered migrasomes is thorough and establishes comparability with naturally occurring migrasomes. The biophysical characterization of the migrasomes is well done including thermal stability and characterization of the particle size (important characterizations for a good vaccine).

      Weaknesses:

      With a new vaccine platform technology, it would be nice to compare them head-to-head against a proven technology. The authors would improve the manuscript if they made some comparisons to other vaccine platforms such as a SARS-CoV-2 mRNA vaccine or even an adjuvanted recombinant spike protein. This would demonstrate a migrasome-based vaccine could elicit responses comparable to a proven vaccine technology.

      We thank the reviewer for the thoughtful evaluation and constructive suggestions, which have helped us strengthen the manuscript. 

      Comparison with proven vaccine technologies:

      In response to the reviewer’s comment, we now include a direct comparison of the antibody responses elicited by eMig-Spike and a conventional recombinant S1 protein vaccine formulated with Alum. As shown in the revised manuscript (Author response image 1), the levels of S1-specific IgG induced by the eMig-based platform were comparable to those induced by the S1+Alum formulation. This comparison supports the potential of eMigs as a competitive alternative to established vaccine platforms. 

      Author response image 1.

      eMigrasome-based vaccination showed similar efficacy compared with adjuvanted recombinant spike protein. The amount of S1-specific IgG in mouse serum was quantified by ELISA on day 14 after immunization. Mice were either intraperitoneally (i.p.) immunized with recombinant Alum/S1 or intravenously (i.v.) immunized with eM-NC, eM-S or recombinant S1. The administered doses were 20 µg/mouse for eMigrasomes, 10 µg/mouse (i.v.) or 50 µg/mouse (i.p.) for recombinant S1 and 50 µl/mouse for Aluminium adjuvant.

      Assessment of antigen integrity on migrasomes:

      To address the reviewer’s suggestion regarding antigen integrity, we performed immunoblotting using antibodies against both S1 and mCherry. Two distinct bands were observed: one at the expected molecular weight of the S-mCherry fusion protein, and a higher molecular weight band that may represent oligomerized or higher-order forms of the Spike protein (Figure 5b in the revised manuscript).

      Furthermore, we performed confocal microscopy using a monoclonal antibody against Spike (anti-S). Co-localization analysis revealed strong overlap between the mCherry fluorescence and anti-Spike staining, confirming the proper presentation and surface localization of intact S-mCherry fusion protein on eMigs (Figure 5c in the revised manuscript). These results confirm the structural integrity and antigenic fidelity of the Spike protein expressed on eMigs.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      I feel that the overemphasis on practical aspects (vaccine), however important, eclipses some of the fundamental aspects that may be just as important and actually more interesting. If this can be expanded, the study would be outstanding.

      I know that the reviewers always ask for more, and this is not the case here. Can the abstract and title be changed to emphasize the science behind migrasome formation, and possibly add a few more fundamental aspects on how hypotonic shock induces migrasomes?

      Alternatively, if the authors desire to maintain the emphasis on vaccines, can immunological mechanisms be somewhat expanded in order to - at least to some extent - explain why migrasomes are a better vaccine vehicle?

      One way or another, this reviewer is highly supportive of this study and it is really up to the authors and the editor to decide whether my comments are of use or not.

      My recommendation is to go ahead with publishing after some adjustments as per above.

      We’d like to thank the reviewer for the suggestion. We have changed the title of the manuscript and modified the abstract, emphasizing the fundamental science behind the development of eMigrasome. To gain some immunological information on eMig-elicited antibody responses, we characterized the type of IgG induced by eM-OVA in mice, and compared it to that induced by Alum/OVA. The IgG response to Alum/OVA was dominated by IgG1. In contrast, eM-OVA induced an even distribution of IgG subtypes, including IgG1, IgG2b, IgG2c, and IgG3 (Figure 4i in the revised manuscript). The ratio between IgG1 and IgG2a/c indicates a Th1 or Th2 type humoral immune response. Thus, eM-OVA immunization induces a balance of Th1/Th2 immune responses.

      Reviewer #2 (Recommendations For The Authors):

      The study is a very nice exploration of a new vaccine platform. This reviewer believes that a more head-to-head comparison to the current SARS-CoV-2 vaccine platform would improve the manuscript. This comparison is done with OVA antigen, but this model antigen is not as exciting as a functional head-to-head with a SARS-CoV-2 vaccine.

      I think that two other discussion points should be included in the manuscript. First, was the host-cell protein evaluated? If not, I would include that point on how issues of host cell contamination of the migrasome could play a role in the responses and safety of a vaccine. Second, I would discuss antigen incorporation and localization into the platform. For example, the full-length spike being expressed has a native signal peptide and transmembrane domain. The authors point out that a transmembrane domain can be added to display an antigen that does not have one natively expressed, however, without a signal peptide this would not be secreted and localized properly. I would suggest adding a discussion of how a non-native signal peptide would be necessary in addition to a transmembrane domain.

      We thank the reviewer for these thoughtful suggestions and fully agree that the points raised are important for the translational development of eMig-based vaccines.

      (1) Host cell proteins and potential immunogenicity:

      We appreciate the reviewer’s suggestion to consider host cell protein contamination. Considering potential clinical application of eMigrasomes in the future, we will use human cells with low immunogenicity such as HEK-293 or embryonic stem cells (ESCs) to generate eMigrasomes. We will also follow QC procedures that meet the standards of validated EV-based vaccination techniques.

      (2) Antigen incorporation and localization—signal peptide and transmembrane domain:

      We also agree with the reviewer’s point that proper surface display of antigens on eMigs requires both a transmembrane domain and a signal peptide for correct trafficking and membrane anchoring. For instance, in the case of full-length Spike protein, the native signal peptide and transmembrane domain ensure proper localization to the plasma membrane and subsequent incorporation into eMigs. In the case of OVA, a secretory protein that contains a native signal peptide yet lacks a transmembrane domain, an engineered transmembrane domain is required. For antigens that do not naturally contain these features, both a non-native signal peptide and an artificial transmembrane domain are necessary. We have clarified this point in the revised discussion and explicitly noted the requirement for a signal peptide when engineering antigens for surface display on migrasomes.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The chromophore molecule of animal and microbial rhodopsins is retinal which forms a Schiff base linkage with a lysine in the 7-th transmembrane helix. In most cases, the chromophore is positively charged by protonation of the Schiff base, which is stabilized by a negatively charged counterion. In animal opsins, three sites have been experimentally identified, Glu94 in helix 2, Glu113 in helix 3, and Glu181 in extracellular loop 2, where a glutamate acts as the counterion by deprotonation. In this paper, Sakai et al. investigated molecular properties of anthozoan-specific opsin II (ASO-II opsins), as they lack these glutamates. They found an alternative candidate, Glu292 in helix 7, from the sequences. Interestingly, the experimental data suggested that Glu292 is not the direct counterion in ASO-II opsins. Instead, they found that ASO-II opsins employ a chloride ion as the counterion. In the case of microbial rhodopsin, a chloride ion serves as the counterion of light-driven chloride pumps. This paper reports the first observation of a chloride ion as the counterion in animal rhodopsin. Theoretical calculation using a QM/MM method supports their experimental data. The authors also revealed the role of Glu292, which serves as the counterion in the photoproduct, and is involved in G protein activation.

      The conclusions of this paper are well supported by data, while the following aspects should be considered for the improvement of the manuscript.

      We thank the reviewer for carefully reading the manuscript and providing important suggestions. Below, we address the specific comments.

      (1) Information on sequence alignment only appears in Figure S2, not in the main figures. Figure S2 is too complicated by so many opsins and residue positions. It will be difficult for general readers to follow the manuscript because of such an organization. I recommend the authors show key residues in Figure 1 by picking up from Figure S2.

      We thank the reviewer for pointing this out. As suggested, we have selected key residues (potential counterion sites) from Fig. S2 and show them now as Fig. 1B in the revised manuscript. Fig. S2 has also been simplified by showing only the most important residues.

      (2) Halide size dependence. The authors observed spectral red-shift for larger halides. Their observation is fully coincident with the chromophore molecule in solution (Blatz et al. Biochemistry 1972), though the isomeric states are different (11-cis vs all-trans). This suggests that a halide ion is the hydrogen-bonding acceptor of the Schiff base N-H group in solution and ASO-II opsins. A halide ion is not the hydrogen-bonding acceptor in the structure of halorhodopsin, whose halide size dependence is not clearly correlated with absorption maxima (Scharf and Engelhard, Biochemistry 1994). These results support their model structure (Figure 4), and help QM/MM calculations.

      We appreciate the comment, which provides a deeper insight into our results and reinforces our conclusions. We have revised the discussion of the effect of halide size on the λ<sub>max</sub> shift to cite the prior work mentioned by the reviewer.

      (3) QM/MM calculations. According to Materials and Methods, the authors added water molecules to the structure and performed their calculations. However, Figure 4 does not include such water molecules, and no information was given in the manuscript. In addition, no information was given for the chloride binding site (contact residues) in Figure 4. More detailed information should be shown with additional figures in Figure SX.

      We thank the reviewer for making us realize that Fig. 4 was oversimplified.

      We have added the following text in the “Structural modelling and QM/MM calculations of the dark state of Antho2a” section:

      Lines 220 – 223

      “The chloride ion is also coordinated by two water molecules and the backbone of Cys187 which is part of a conserved disulfide bridge (Fig. S2). The retinylidene Schiff base region also includes polar (Ser186, Tyr91) and non-polar (Ala94, Leu113) residues (Fig. 4).”

      We have updated Fig. 4 and its legend to show a more detailed environment of the protonated Schiff base and the chloride ion, including water molecules and other nearby residues.

      (4) Figure 5 clearly shows much lower activity of E292A than that of WT, whose expression levels are unclear. How did the authors normalize (or not normalize) expression levels in this experiment?

      We thank the reviewer for this valuable comment. In the previous version of the manuscript, we did not normalize the activity based on expression levels. We have considered this in the amended version.

      First, we evaluated the expression levels of wild type and E292A Antho2a by comparing absorbances at λ<sub>max</sub> (± 5 nm) of these pigments that were expressed and purified under the same conditions. Assuming that their molar absorption coefficients at the absorption maximum wavelengths are approximately the same, this allows us to roughly compare their expression levels. The relative expression of the E292A mutant compared to the wild type (set as 1) was 0.81 at pH 6.5 and 140 mM NaCl, in which 94.0% (for E292A) and 99.8% (for wild type) of the Schiff base is protonated (Fig. 3A and B). As we conducted the live cell Ca<sup>2+</sup> assay in media at pH 7.0, we estimated the proportion of the protonated states of wild type and E292A mutant at the same pH. The relative amounts of the protonated states to the wild type at pH 6.5 (set as 1) were estimated to be 0.99 for wild type and 0.84 for E292A. Together, the protonated pigment of the E292A mutant was calculated to be about 73% of that of the wild type at pH 7.0. From Fig. 5, the amplitude of Ca<sup>2+</sup> response of the E292A mutant was 12.1% of the wild type, showing that even after normalizing the expression levels, the Ca<sup>2+</sup> response amplitude was lower in the E292A mutant than in the wild type. This leads to our conclusion that the E292A mutation can also influence the G protein activation efficiency.
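
      The normalization described above can be followed step by step. The sketch below is one reading of the arithmetic, not the authors' own calculation: it assumes that the 0.81 absorbance ratio reflects protonated pigment at pH 6.5, that total pigment is recovered by dividing out the protonated fractions at that pH, and that the pH 7.0 protonated amounts then scale by 0.84 and 0.99. All variable names are hypothetical; the numbers are those quoted in the response.

```python
# Values quoted in the response (E292A relative to wild type, WT).
absorbance_ratio_ph65 = 0.81       # E292A / WT absorbance at lambda_max, pH 6.5
protonated_frac_e292a_ph65 = 0.940  # fraction of E292A Schiff base protonated, pH 6.5
protonated_frac_wt_ph65 = 0.998     # fraction of WT Schiff base protonated, pH 6.5
rel_protonated_wt_ph70 = 0.99       # WT protonated amount at pH 7.0 (WT at pH 6.5 = 1)
rel_protonated_e292a_ph70 = 0.84    # E292A protonated amount at pH 7.0 (same scale)
raw_response_pct = 12.1             # E292A Ca2+ response amplitude, % of WT

# Step 1 (assumption): convert the protonated-pigment ratio into a total-pigment ratio.
total_pigment_ratio = (absorbance_ratio_ph65
                       * protonated_frac_wt_ph65 / protonated_frac_e292a_ph65)

# Step 2: protonated E292A pigment relative to protonated WT pigment at pH 7.0.
protonated_ratio_ph70 = (total_pigment_ratio
                         * rel_protonated_e292a_ph70 / rel_protonated_wt_ph70)

# Step 3: response amplitude per unit of protonated pigment, as % of WT.
normalized_response_pct = raw_response_pct / protonated_ratio_ph70
```

      Under these assumptions the protonated-pigment ratio rounds to 0.73 and the normalized response to about 17% of wild type, which matches the figures quoted in the response and in the revised discussion text.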

      We have added Fig. S11 showing the comparison of expression levels between the wild type and E292A of Antho2a (Fig. S11A) and maximum Ca<sup>2+</sup> responses after normalizing the expression levels (Fig. S11B).

      We have also revised the discussion section as follows:

      Lines 324 – 335

      “The relative expression level of the E292A mutant of Antho2a was approximately 0.81 of the wild type (set as 1), as determined by comparing absorbances at λ<sub>max</sub> for both pigments expressed and purified under identical conditions (Fig. S11A). Additionally, the fraction of protonated pigment relative to the wild type (set as 1 at pH 6.5) was estimated to be 0.94 for the E292A mutant at pH 6.5, and 0.99 and 0.84 for the wild type and the E292A mutant at pH 7.0, respectively (Fig. 3A and B). Since pH 7.0 corresponds to the conditions used in the live cell Ca<sup>2+</sup> assays, the effective amount of protonated pigment for the E292A mutant was approximately 73% of the wild type. Nevertheless, even after normalization for these differences, the Ca<sup>2+</sup> response amplitude of the E292A mutant remained significantly lower (~ 17% of wild type, compared to the observed 12% prior to normalization; Fig. 5 and Fig. S11B). These observations suggest that Glu292 serves not only as a counterion in the photoproduct but also plays an allosteric role in influencing G protein activation.”

      (5) The authors propose the counterion switching from a chloride ion to E292 upon light activation. A schematic drawing on the chromophore, a chloride ion, and E292 (and possible surroundings) in Antho2a and the photoproduct will aid readers' understanding.

      We thank the reviewer for this excellent suggestion. We have prepared a new figure with a schematic drawing of the environment of the protonated Schiff base depicting the counterion switch in Fig. S10.

      Reviewer #2 (Public review):

      Summary:

      This work reports the discovery of a new rhodopsin from reef-building corals that is characterized experimentally, spectroscopically, and by simulation. This rhodopsin lacks a carboxylate-based counterion, which is typical for this family of proteins. Instead, the authors find that a chloride ion stabilizes the protonated Schiff base and thus serves as a counterion.

      Strengths:

      This work focuses on the rhodopsin Antho2a, which absorbs in the visible spectrum with a maximum at 503 nm. Spectroscopic studies under different pH conditions, including the mutant E292A and different chloride concentrations, indicate that chloride acts as a counterion in the dark. In the photoproduct, however, the counterion is identified as E292.

      These results lead to a computational model of Antho2a in which the chloride is modeled in addition to the Schiff base. This model is improved using the hybrid QM/MM simulations. As a validation, the absorption maximum is calculated using the QM/MM approach for the protonated and deprotonated E292 residue as well as the E292A mutant. The results are in good agreement with the experiment. However, there is a larger deviation for ADC(2) than for sTD-DFT. Nevertheless, the trend is robust since the wt and E292A mutant models have similar excitation energies. The calculations are performed at a high level of theory that includes a large QM region.

      Weaknesses:

      I have a couple of questions about this study:

      We thank the reviewer for providing critical comments, particularly on the QM/MM calculations. We have carefully considered all comments and have addressed them as detailed below. Corresponding revisions have been made to the manuscript.

      (1) I find it suspicious that the absorption maximum is so close to that of rhodopsin when the counterion is very different. Is it possible that the chloride creates an environment for the deprotonated E292, which is the actual counterion?

      We think it is unlikely that the chloride ion merely facilitates deprotonation of Glu292 in such a way that it acts as the counterion of the dark state Antho2a. This conclusion is based on two results from our study. (1) λ<sub>max</sub> of wild type Antho2a in the dark is positively correlated with the ionic radius of the halide in the solution; the λ<sub>max</sub> is red shifted in the order Cl- < Br- < I- (Fig. 2E and F in the revised manuscript). This tendency is observed when the halide anion acts as a counterion of the protonated Schiff base (Blatz et al. Biochemistry 11: 848–855, 1972). (2) The QM/MM models of the dark state of Antho2a show that the calculated λ<sub>max</sub> of Antho2a with a protonated (neutral) Glu292 is much closer to the experimentally observed λ<sub>max</sub> than with a deprotonated (negatively charged) Glu292 (Fig. 4), suggesting that the Glu292 is likely to be protonated even in the presence of chloride ion. Therefore, we conclude that a solute anion, and not Glu292, acts as the counterion of the protonated Schiff base in the dark state of Antho2a. We have discussed this in the revised manuscript as follows:

      Lines 274 – 291

      “We found that the type of halide anions in the solution has a small but noticeable effect on the λ<sub>max</sub> values of the dark state of Antho2a. This is consistent with the effect observed in a counterion-less mutant of bovine rhodopsin, in which halide ions serve as surrogate counterions (Nathans, 1990; Sakmar et al., 1991). Similarly, our results align with earlier observations that the λ<sub>max</sub> of a retinylidene Schiff base in solution increases with the ionic radius of halides acting as hydrogen bond acceptors (i.e., I− > Br− > Cl−) (Blatz et al., 1972). In contrast, the λ<sub>max</sub> of halorhodopsin from Natronobacterium pharaonis does not clearly correlate with halide ionic radius (Scharf and Engelhard, 1994), as the halide ion in this case is not a hydrogen-bonding acceptor of the protonated Schiff base (Kouyama et al., 2010; Mizuno et al., 2018). Altogether, these findings support our hypothesis that in Antho2a, a solute halide ion forms a hydrogen bond with the Schiff base, thereby serving as the counterion in the dark state. Moreover, QM/MM calculations for the dark state of Antho2a suggest that Glu292 is protonated and neutral, further supporting the hypothesis that Glu292 does not serve as the counterion in the dark state. However, unlike in the dark state, Cl− has little to no effect on the visible light absorption of the photoproduct (Fig. S5). Therefore, we conclude that Cl− and Glu292, respectively, act as counterions for the protonated Schiff base of the dark state and photoproduct of Antho2a. This represents a unique example of counterion switching from an exogenous anion to a specific amino acid residue upon light irradiation (Fig. S10).”

      (2) The computational protocol states that water molecules have been added to the predicted protein structure. Are there water molecules next to the Schiff base, E292, and Cl-? If so, where are they located in the QM region?

      We have updated Fig. 4 to show amino acids and water molecules near the Schiff base, E292, and the chloride ion. These include Ser186, Tyr91, Ala94, Leu113, Cys187, and two water molecules coordinating the chloride ion. We have added the following text in the “Structural modelling and QM/MM calculations of the dark state of Antho2a” section of the revised manuscript.

      Lines 220 – 223

      “The chloride ion is also coordinated by two water molecules and the backbone of Cys187 which is part of a conserved disulfide bridge (Fig. S2). The retinylidene Schiff base region also includes polar (Ser186, Tyr91) and non-polar (Ala94, Leu113) residues (Fig. 4).”

      Water molecules, which have been modelled by homology to other GPCR structures, were not included in the QM region. In the revised version of the manuscript, we clarify this point in the “Computational modelling and QM/MM calculations” section as follows.

      Lines 515 – 517

      “The retinal-binding pocket also contains predicted water molecules (modelled based on homologous GPCR structures) close to the Schiff base and the chloride ion which were not included in the QM region.”

      (3) If the E292 residue is the counterion in the photoproduct state, I would expect the retinal Schiff base to rotate toward this side chain upon isomerization. Can this be modeled based on the recent XFEL results on rhodopsin?

      The recent XFEL studies of rhodopsin reveal that at very early stages (1 ps after photoactivation), structural changes in retinal are limited primarily to the isomerization around the C11=C12 bond of the polyene chain, without significant rotation of the Schiff base.

      Although modelling of a later active state with planar retinal and a rotated Schiff base is feasible—e.g., guided by high-resolution structures of bovine rhodopsin’s Meta II state such as PDB ID: 3PQR, see Author response image 1 below—active states of GPCRs typically exhibit substantial conformational flexibility and heterogeneity, making the generation of precise structural models suitable for accurate QM/MM calculations challenging. Despite these uncertainties, this preliminary modelling does indicate that upon isomerization to the all-trans configuration, the retinal Schiff base would rotate towards E292, supporting our hypothesis that E292 serves as the counterion in the Antho2a photoproduct. This is now shown better in the revised Fig. S10.

      Author response image 1.

      Reviewer #3 (Public review):

      Summary:

      The paper by Sakai et al. studies the properties of anthozoan-specific opsins (ASO-II) from organisms found in reef-building coral. Their goal was to test if ASO-II opsins can absorb visible light, and if so, what the key factors involved are.

      The most exciting aspect of this work is their discovery that ASO-II opsins do not have a counterion residue (Asp or Glu) located at any of the previously known sites found in other animal opsins.

      This is very surprising. Opsins are only able to absorb visible (long wavelength light) if the retinal Schiff base is protonated, and the latter requires (as the name implies) a "counter ion". However, the authors clearly show that some ASO-II opsins do absorb visible light.

      To address this conundrum, they tested if the counterion could be provided by exogenous chloride ions (Cl-). Their results find compelling evidence supporting this idea, and their studies of ASO-II mutant E292A suggest E292 also plays a role in G protein activation and is a counterion for a protonated Schiff base in the light-activated form.

      Strengths:

      Overall, the methods are well-described and carefully executed, and the results are very compelling.

      Their analysis of seven ASO-II opsin sequences undoubtedly shows they all lack a Glu or Asp residue at "normal" (previously established) counter-ion sites in mammalian opsins (typically found at positions 94, 113, or 181). The experimental studies clearly demonstrate the necessity of Cl- for visible light absorbance, as do their studies of the effect of altering the pH.

      Importantly, the authors also carried out careful QM/MM computational analysis (and corresponding calculation of the expected absorbance effects), thus providing compelling support for the Cl- acting directly as a counterion to the protonated retinal Schiff base, and thus limiting the possibility that the Cl- is simply altering the absorbance of ASO-II opsins through some indirect effect on the protein.

      Altogether, the authors achieved their aims, and the results support their conclusions. The manuscript is carefully written, and refreshingly, the results and conclusions are not overstated.

      This study is impactful for several reasons. There is increasing interest in optogenetic tools, especially those that leverage G protein-coupled receptor systems. Thus, the authors' demonstration that ASO-II opsins could be useful for such studies is of interest.

      Moreover, the finding that visible light absorbance by an opsin does not absolutely require a negatively charged amino acid to be placed at one of the expected sites (94, 113, or 181) typically found in animal opsins is very intriguing and will help future protein engineering efforts. The argument that the Cl- counterion system they discover here might have been a preliminary step in the evolution of amino acid based counterions used in animal opsins is also interesting.

      Finally, given the ongoing degradation of coral reefs worldwide, the focus on these curious opsins is very timely, as is the authors' proposal that the lower Schiff base pKa they discovered here for ASO-II opsins may cause them to change their spectral sensitivity and G protein activation due to changes in their environmental pH.

      We thank the reviewer for the comprehensive summary of the manuscript and for finding it well-described and impactful.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      (1) p. 5, l. 102: The authors obtained three absorption spectra out of seven. Did the authors examine the reasons for no absorption spectra for the remaining four proteins?

      We have not identified the reasons for the absence of detectable absorption spectra for the remaining four opsins. We speculate that this could result from poor retinal binding under detergent-solubilized conditions, but we have not directly tested this possibility.

      (2) p. 7, l. 141: The pH value is 7.5 in the text and 7.4 in Figure S4B.

      We thank the reviewer for finding this mistake. The correct value is 7.4 and we have revised the text accordingly.

      Reviewer #2 (Recommendations for the authors):

      The structures and the simulations should be made available to the reader by providing them in a repository.

      We have deposited the Antho2a models in Zenodo (https://zenodo.org/; an open-access repository for research data). We have added the following description in the “Data and materials availability” section of the revised manuscript.

      Lines 559 – 560

      “The structural models of wild type Antho2a with a neutral or charged Glu292 and the Antho2a E292A mutant are available in Zenodo (10.5281/zenodo.15064942).”

      Reviewer #3 (Recommendations for the authors):

      (1) In the homology models for the ASO-II opsins, are there any other possible residues that could act as counter-ion residues outside of the "normal" positions at 94, 113, or 181?

      We have updated Fig. 4 to show all residues near the retinylidene Schiff base region, which include Cl−, Glu292, Ser186, Tyr91, Ala94, Leu113, Cys187, and two water molecules.

      Apart from Cl− and Glu292, the homology models of the ASO-II opsins do not reveal any other candidate for the counterion of the Schiff base. This is also suggested by the sequence alignment between opsins of the ASO-II group and other animal opsins in Fig. S2, where we show amino acid residues near the Schiff base (in addition to key motifs important for G protein activation).

      (2) It is mentioned that the ASO-II opsins do not appear to be bistable opsins in detergents - do these opsins show any ability to photo-switch back and forth when in cellular membranes?

      We have not directly tested whether Antho2a exhibits photo-switching in cellular membranes due to technical limitations associated with high light scattering in spectroscopic measurements. Instead, we recorded absorption spectra from crude extracts of detergent-solubilized cell membranes expressing Antho2a wild type (without purification) in the dark and after sequential light irradiation (Fig. S3C). This approach, which retains cellular lipids, can better preserve the photochemical properties of opsins, such as thermal stability and photoreactivity of their photoproducts, similar to intact cellular membranes. The first irradiation with green light (500 nm) led to a decrease in absorbance around the 550 nm region and an increase around the 450 nm region, indicating the formation of a photoproduct, consistent with observations using purified Antho2a.

      However, subsequent irradiation with violet light (420 nm) did not reverse these spectral changes but resulted in only a slight decrease in absorbance around 400 nm. Re-exposure to green light produced no further spectral changes aside from baseline distortions. These findings suggest that the Antho2a photoproduct has limited ability to revert to its original dark state under these conditions. Nevertheless, because detergent solubilization may influence these observations, further studies in intact cellular membranes using live-cell assays will be required to conclusively assess bistability or photo-switching properties.

      (3) The idea that E292 acts as a counterion for the protonated active state is intriguing - do the authors think the retinal decay process after light activation occurs with hydrolysis of the non-protonated form with subsequent retinal release?

      We thank the reviewer for raising this important question. We first examined whether the increased UV absorbance observed after incubating the photoproduct for 20 hours in the dark (Fig. S3D, E, violet curves) originated from free retinal released from the opsin pigment. Acid denaturation (performed at pH 1.9) of this photoproduct resulted in a main product absorbing around 400 nm (Fig. S3G). Typically, when retinal binds opsin via the Schiff base (whether protonated or deprotonated), acid denaturation traps the retinal chromophore as a protonated Schiff base, yielding an absorption spectrum with a λ<sub>max</sub> at approximately 440 nm, as observed in the dark state of Antho2a (Fig. S3F). Our results thus indicate that the UV absorbance in the photoproduct did not result from a deprotonated Schiff base but rather from retinal released during incubation. We have not directly tested whether the protonated or deprotonated form is more prone to retinal release. However, the decay of visible absorbance (associated with the protonated photoproduct) occurred more rapidly under alkaline conditions (pH 8.0), which generally favors deprotonation of the Schiff base (Fig. S3H). Thus, it is possible that the deprotonated photoproduct releases retinal more rapidly than the protonated form, but further studies are necessary to confirm this hypothesis.

      To answer the comments (2) and (3) by the reviewer, we have added new panels (C and F–H) to Fig. S3.

      We have revised the Results section as follows:

      Lines 136 – 141

      “The photoproduct remained stable for at least 5 minutes (Fig. S3A, curves 2 and 3) but did not revert to the original dark state upon subsequent irradiation (Fig. S3A and C). Instead, it underwent gradual decay accompanied by retinal release over time (Fig. S3D–G). These findings indicate that purified Antho2a is neither strictly bleach resistant nor bistable (see also Fig. S3 legend). We also observed that the protonated photoproduct decayed more rapidly at pH 8.0 (Fig. S3H) than at pH 6.5 (Fig. 3A, D, E).”

      Text:

      (4) Page 3, line 38. Consider defining eumetazoan (for lay readers).

      As suggested, we have defined eumetazoans and revised the sentence as follows:

      Lines 38 – 40

      “Opsins are present in the genomes of all eumetazoans (i.e., all animal lineages except sponges), and based on their phylogenetic relationships, they can be classified into eight groups…”

      (5) Page 3, line 42. "But, furthermore, ..." should be changed to either word alone.

      Revised as suggested.

      (6) Page 18, line 447. The HPLC method is well-described and helpful. If possible, please add a Reference, or indicate if this is a new variation of the method.

      This is a well-established method for analyzing the composition of retinal isomers bound to different states of rhodopsin pigments. We have now cited a reference describing the methodology (Terakita et al. Vision Res. 6: 639–652, 1989).

      (7) Page 11, line 267. "..type of halide anions in the solution affected the λ<sub>max</sub> values of the dark state of".

      Since the changes are not large (but clearly occur), consider changing this sentence to "..type of halide anions in the solution has a small but visible effect on the λ<sub>max</sub> values of the dark state ..."

      We have revised this sentence as suggested.

      Figures:

      (9) Consider combining Figure S6 with Figure 2 (effect of anions on visible absorbance).

      As suggested, the previous Fig. S6 has been included in the main text as Fig. 2E and F in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This useful work extends a prior study from the authors to observe distance changes within the CNBD domains of a full-length CNG channel based on changes in single photon lifetimes due to tmFRET between a metal at an introduced chelator site and a fluorescent non-canonical amino acid at another site. The data are excellent and convincingly support the authors' conclusions. The methodology is of general use for other proteins. The authors also show that coupling of the CNBDs to the rest of the channel stabilizes the CNBDs in their active state, relative to an isolated CNBD construct.

      Strengths:

      The manuscript is very well written and clear.

      Reviewer #2 (Public review):

      The manuscript "Domain Coupling in Allosteric Regulation of SthK Measured Using Time-Resolved Transition Metal Ion FRET" by Eggan et al. investigates the energetics of conformational transitions in the cyclic nucleotide-gated (CNG) channel SthK. This lab pioneered transition metal FRET (tmFRET), which has previously provided detailed insights into ion channel conformational changes. Here, the authors analyze tmFRET fluorescence lifetime measurements in the time domain, yielding detailed insights into conformational transitions within the cyclic nucleotide binding domains (CNBDs) of the channel. The integration of tmFRET with time-correlated single-photon counting (TCSPC) represents an advancement of this technique.

      The results summarize known conformational transitions of the C-helix and provide distance distributions that agree with predicted values based on available structures. The authors first validated their TCSPC approach using the isolated CNBD construct previously employed for similar experiments. They then study the more complex full-length SthK channel protein. The findings agree with earlier results from this group, demonstrating that the C-helix is more mobile in the closed state than static structures reflect. Upon adding the activating ligand cAMP, the C-helix moves closer to the bound ligand, as indicated by a reduced fluorescence lifetime, suggesting a shorter distance between the donor and acceptor. The observed effects depend on the cAMP concentration, with affinities comparable to functional measurements. Interestingly, a substantial amount of CNBDs appear to be in the activated state even in the absence of cAMP (Figure 6E and F, fA2 ~ 0.4).

      This may be attributed to cooperativity among the CNBDs, which the authors could elaborate on further. In this context, the major limitation of this study is that distance distributions are observed only in one domain. While inter-subunit FRET is detected and accounted for, the results focus exclusively on movements within one domain. Thus, the resulting energetic considerations must be assessed with caution. In the absence of the activator, the closed state is favored, while the presence of cAMP favors the open state. This quantifies the standard assumption; otherwise, an activator would not effectively activate the channel. However, the numerical values of approximately 3 kcal/mol are limited by the fact that only one domain is observed in the experiment, and only one distance (C-helix relative to the CNBD) is probed. Additional conformational changes leading to pore opening (including rotation and upward movement of the CNBD, and radial dilation of the tetrameric assembly) are not captured by the current experiments. These limitations should be taken into account when interpreting the results.

      We agree that these are important limitations to consider in interpreting our results. These limitations and future directions are now largely covered in our discussion. We believe measurements in individual domains provide unique insights into the contributions of different parts of the protein and future work will continue to address conformational energetics in other parts of the protein and subunit cooperativity. 
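
The reviewer's inference that a reduced donor lifetime implies a shorter donor-acceptor distance follows from the standard Förster relation. The sketch below is only an illustration of that relation for a single fixed distance; it is not the authors' analysis, which fits distance distributions. The function names and all numerical values (donor-only lifetime, R0, distances) are hypothetical and chosen for illustration.

```python
def fret_efficiency(r, r0):
    """Forster transfer efficiency for donor-acceptor distance r and Forster distance r0."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def donor_lifetime(tau_d, r, r0):
    """Donor lifetime in the presence of the acceptor: tau_DA = tau_D * (1 - E)."""
    return tau_d * (1.0 - fret_efficiency(r, r0))


# Hypothetical values, not taken from the manuscript.
TAU_D_NS = 4.0      # assumed donor-only lifetime (ns)
R0_ANGSTROM = 15.0  # assumed Forster distance (angstrom)

for r in (12.0, 15.0, 18.0):
    tau = donor_lifetime(TAU_D_NS, r, R0_ANGSTROM)
    print(f"r = {r:4.1f} A -> donor lifetime = {tau:.2f} ns")
```

At r equal to R0 the efficiency is 0.5, so the lifetime is halved; moving the acceptor closer shortens the lifetime further, which is the direction of change the reviewer describes upon cAMP binding.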

      Reviewer #3 (Public review):

      Summary:

      This is a lucidly written manuscript describing the use of transition-metal FRET to assess distance changes during functional conformational changes in a CNG channel.

      The experiments were performed on an isolated C-terminal nucleotide binding domain (CNBD) and on a purified full-length channel, with FRET partners placed at two positions in the CNBD.

      Strengths:

      The data and quantitative analysis are exemplary, and they provide a roadmap for use of this powerful approach in other proteins.

      Weaknesses/Comments:

      A ~3x lower Kd for nucleotide is seen for the detergent-solubilized full-length channel, compared to electrophysiological experiments. This is worth a comment in the Discussion, particularly in the context of the effect of the pore domain on the CNBD energetics.

      We are cautious in interpreting our K<sub>D</sub> values given the high affinity for cAMP and the challenges of accurately determining the total protein concentrations in our experiments. We now state this explicitly in the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The manuscript is very well written and clear. Congrats to the authors.

      Minor comment: In "Measuring tmFRET in Full-Length SthK", 3rd paragraph: "... FRET model with both intersubunit and intersubunit FRET." Should read "intersubunit and intrasubunit".

      Thank you for the comment, this is now corrected.  

      Reviewer #2 (Recommendations for the authors):

      Overall, the manuscript is well-written and clearly explained. However, I recommend that the authors discuss the limitations more critically.

      The revised manuscript now largely addresses these limitations. Additional comments are addressed in short below:  

      A) Only one distance is measured.

      We believe that validating a single distance is an important first step in establishing the use of this technique and in beginning to quantify the allosteric mechanism in SthK. Future studies aim to make additional measurements.

      B) Measurements are confined to a single domain in the cooperative tetrameric assembly.

      Isolating conformational changes in individual domains allows us to determine how different parts of the protein contribute to activation upon ligand binding.

      C) The change in distance upon activation mirrors what is observed in the closed state, which casts doubt on whether these conformational changes actually lead to channel opening or merely reflect the upward swinging of the C-helix that contributes to coordinating cAMP in the binding pocket.

      Future studies aim to detect conformational changes in the pore and other parts of the protein.

      D) Rigid body movements, rotations, and dilations are not captured by the measurements. 

      Our measurements combine energetic information with some, although more limited, structural information.   

      E) Cooperativity is not considered in the interpretation of the results.

      It is currently unclear where in SthK cooperativity arises upon ligand activation (i.e., at the level of the CNBD, C-linker, or pore). Our results do not provide evidence of cooperativity in the CNBD upon ligand binding.

      Additionally, the authors directly correlate their results with the functional states of SthK previously reported, but it remains open whether the modified protein for tmFRET behaves similarly to WT SthK. Functional experiments with the protein used for tmFRET, which demonstrate comparable open probabilities and cAMP potency, would considerably strengthen the manuscript.

      Further optimization is needed to express the full-length protein used in tmFRET experiments in spheroplasts to enable electrophysiological recordings from these constructs. 

      Reviewer #3 (Recommendations for the authors):

      In the final paragraph of the Discussion, the sentence "In our experiments, we assumed that deleting the pore and transmembrane domains eliminates the coupling of these regions to the CNBD" seems trivial. Perhaps it would help to add "simply" before eliminates?

      We have taken the advice and added ‘simply’ in this sentence.  

      Can a statement be made about the magnitude of the effect in the C-terminal deletion experiments in refs 27-29?

      Due to the different channels used in the C-terminal deletion experiments in refs 27-29 (HCN1 and spHCN), compared to the channel we used (SthK), it is challenging to compare the magnitude of energetic changes between these studies. Additionally, the HCN experiments measured changes in the pore domain, compared to the conformational changes in the CNBD domain measured here.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this useful narrative, the authors attempt to capture their experience of the success of team projects for the scientific community.

      Strengths:

      The authors are able to draw on a wealth of real-life experience reviewing, funding, and administering large team projects, and assessing how well they achieve their goals.

      Weaknesses:

      The utility of the RCR as a measure is questionable. I am not sure if this really makes the case for the success of these projects. The conclusions do not depend on Figure 1.

      We respectfully disagree about the utility of the RCR, particularly because it is a metric that is normalized by both year and topical area. We have added a more detailed description of how the RCR is calculated on pages 6-7. Please note that Figure 1 is intended to highlight the funding opportunities, investments, and number of awards associated with small-lab (exploratory) versus team (elaborated, mature) research, rather than to describe publication metrics.

      Reviewer #2 (Public review):

      Summary:

      The authors review the history of the team projects within the BRAIN Initiative and analyze their success in progression to additional rounds of funding and their bibliographic impact.

      Strengths:

      The history of the team projects and the fact that many had renewed funding and produced impactful papers is well documented.

      Weaknesses:

      The core bibliographic and funding impact results have largely been reported in the companion manuscript and so represent "double dipping". I presume the slight disagreement in the number of grants (by one) represents a single grant that was not deemed to address systems/computational neuroscience. The single figure is relatively uninformative. The domains of study are sufficiently large and overlapping that there seems to be little information gained from the graphic, and the Sankey plot could be simply summarized by rates of competing success.

      While we sincerely appreciate the feedback, we chose to retain these plots on domains and models to provide a sense of the broad spectrum of research topics contained in our TeamBCP awards. Further details on the awards can be derived from the award links provided in the text. Additionally, we retained the Sankey plots because these are a visual depiction of how awards transition from one mechanism to another, evolve in their funding sources, and advance in their research trajectories. The plot is an example of our continuity analysis which is only reported in the text and not visually shown for the remaining BCP programs.

      Recommendations for the authors:

      Editorial note:

      In the discussion, the reviewers agreed that the present manuscript does not make a sufficient independent contribution and so would be more profitably combined with the companion manuscript. Both reviewers noted that there was not much insight that relied on the single figure. Since neither manuscript is long, and they have overlapping authors (including the same first and last authors), this should not be a difficult merger to achieve.

      Thank you for the recommendation to merge. We have combined both manuscripts into one in this version.

      Reviewer #1 (Recommendations for the authors):

      The jargon of the grant programs could be described as a nightmare. Wellcome is spelled wrong.

      We have attempted to limit the use of jargon and to define acronyms in this version. We have corrected the spelling of Wellcome.

      Reviewer #2 (Recommendations for the authors):

      I suggest that the two manuscripts be combined into a single paper. Although the other manuscript could stand on its own, this one does not.

      The idea of culture change surrounding teams is useful but really forms more of a policy-focused opinion piece than a quantitative analysis of funding impact.

      If the authors insist on keeping these separate, it is critical to remove the team data from the other manuscript.

      We have combined both manuscripts and decided to retain the description of culture change but have edited and condensed this section and will use the supplemental report for qualitative assessments.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Authors' experimental designs have some caveats to definitely support their claims. Authors claimed that aged LT-HSCs have no myeloid-biased clone expansion using transplantation assays. In these experiments, authors used 10 HSCs and young mice as recipients. Given the huge expansion of old HSC by number and known heterogeneity in immunophenotypically defined HSC populations, it is questionable how 10 out of so many old HSCs (an average of 300,000 up to 500,000 cells per mouse; Mitchell et al., Nature Cell Biology, 2023) can faithfully represent old HSC population. The Hoxb5+ old HSC primary and secondary recipient mice data (Fig. 2C and D) support this concern. In addition, they only used young recipients. Considering the importance of inflammatory aged niche in the myeloid-biased lineage output, transplanting young vs old LT-HSCs into aged mice will complete the whole picture. 

      We sincerely appreciate your insightful comment regarding the existence of approximately 500,000 HSCs per mouse in older mice. To address this, we have conducted a statistical analysis to determine the appropriate sample size needed to estimate the characteristics of a population of 500,000 cells with a 95% confidence level and a ±5% margin of error. This calculation was performed using the finite population correction applied to Cochran’s formula.

      For our calculations, we used a proportion of 50% (p = 0.5), as it has been reported that approximately 50% of HSCs are myeloid-biased[1,2]. The formula was applied with the following parameters:

      N = 500,000 (total population size)

      Z = 1.96 (Z-score for a 95% confidence level)

      p = 0.5 (expected proportion)

      e = 0.05 (margin of error)

      Applying this formula, we determined that the required sample size is approximately 384 cells. This sample size ensures that the observed proportion in the sample will reflect the characteristics of the entire population. In our study, we have conducted functional experiments across Figures 2, 3, 5, 6, S3, and S6, with a total sample size of n = 126, which corresponds to over 1260 cells. While it would be ideal to analyze all 500,000 cells, this would necessitate the use of 50,000 recipient mice, which is not feasible. We believe that the number of cells analyzed is reasonable from a statistical standpoint. 
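
The sample-size figure quoted above can be reproduced with a short calculation. This is a minimal sketch that assumes the textbook form of Cochran's formula with the finite population correction; the response does not reproduce the exact expression the authors used, and the function name is hypothetical.

```python
import math


def cochran_sample_size(N, Z=1.96, p=0.5, e=0.05):
    """Return (n0, n): Cochran's infinite-population size and its finite-population correction.

    Assumed textbook forms:
        n0 = Z^2 * p * (1 - p) / e^2
        n  = n0 / (1 + (n0 - 1) / N)
    """
    n0 = Z ** 2 * p * (1 - p) / e ** 2
    n = n0 / (1 + (n0 - 1) / N)
    return n0, n


# Parameters as listed in the response.
n0, n = cochran_sample_size(N=500_000)
required = math.ceil(n)  # round up to a whole number of cells
print(f"n0 = {n0:.2f}, corrected n = {n:.2f}, required = {required}")
```

With a population this large, the finite population correction changes the result by less than one cell, so the required size is driven almost entirely by Z, p, and e.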

      References

      (1) Dykstra, Brad et al. “Clonal analysis reveals multiple functional defects of aged murine hematopoietic stem cells.” The Journal of experimental medicine vol. 208,13 (2011): 2691-703. doi:10.1084/jem.20111490

      (2) Beerman, Isabel et al. “Functionally distinct hematopoietic stem cells modulate hematopoietic lineage potential during aging by a mechanism of clonal expansion.” Proceedings of the National Academy of Sciences of the United States of America vol. 107,12 (2010): 5465-70. doi:10.1073/pnas.1000834107

      (2) Authors' molecular data analyses need more rigor with unbiased approaches. They claimed that neither aged LT-HSCs nor aged ST-HSCs exhibited myeloid or lymphoid gene set enrichment but aged bulk HSCs, which are just a sum of LT-HSCs and ST-HSCs by their gating scheme (Fig. 4A), showed the "tendency" of enrichment of myeloid-related genes based on the selected gene set (Fig. 4D). Although the proportion of ST-HSCs is reduced in bulk HSCs upon aging, since ST-HSCs do not exhibit lymphoid gene set enrichment based on their data, it is hard to understand how aged bulk HSCs have more myeloid gene set enrichment compared to young bulk HSCs. This bulk HSC data rather suggest that there could be a trend toward certain lineage bias (although not significant) in aged LT-HSCs or ST-HSCs. Authors need to verify the molecular lineage priming of LT-HSCs and ST-HSCs using another comprehensive dataset.

      Thank you for your thoughtful feedback regarding the lack of myeloid or lymphoid gene set enrichment in aged LT-HSCs and aged ST-HSCs, despite the observed tendency for myeloid-related gene enrichment in aged bulk HSCs.

      First, we acknowledge that the GSEA results vary among the different myeloid gene sets analyzed (Fig. 4, D–F; Fig. S4, C–D). Additionally, a comprehensive analysis of mouse HSC aging using multiple RNA-seq datasets reported that nearly 80% of differentially expressed genes show poor reproducibility across datasets[1]. These factors highlight the challenges of interpreting lineage bias in HSCs based solely on previously published transcriptomic data.

      Given these points, we believe that emphasizing functional experimental results is more critical than incorporating an additional dataset to support our claim. In this regard, we have confirmed that young and aged LT-HSCs have similar differentiation capacity (Figure 3), while myeloid-biased hematopoiesis is observed in aged bulk HSCs (Figure S3). These findings are further corroborated by independent functional experiments. We sincerely appreciate your insightful comments.

      Reference

      (1) Flohr Svendsen, Arthur et al. “A comprehensive transcriptome signature of murine hematopoietic stem cell aging.” Blood vol. 138,6 (2021): 439-451. doi:10.1182/blood.2020009729

      (3) Although authors could not find any molecular evidence for myeloid-biased hematopoiesis from old HSCs (either LT or ST), they argued that the ratio between LT-HSC and ST-HSC causes myeloid-biased hematopoiesis upon aging based on young HSC experiments (Fig. 6). However, old ST-HSC functional data showed that they barely contribute to blood production unlike young Hoxb5- HSCs (ST-HSC) in the transplantation setting (Fig. 2). Is there any evidence that in unperturbed native old hematopoiesis, old Hoxb5- HSCs (ST-HSC) still contribute to blood production?

      If so, what are their lineage potential/output? Without this information, it is hard to argue that the different ratio causes myeloid-biased hematopoiesis in aging context. 

      Thank you for the insightful and important question. The post-transplant chimerism of ST-HSCs was low in Fig. 2, indicating that transplantation induced a short-term loss of hematopoietic potential due to hematopoietic stress per cell. 

      To reduce this stress, we increased the number of HSCs in the transplantation setting. In Fig. S6, old LT-HSCs and old ST-HSCs were transplanted in a 50:50 or 20:80 ratio, respectively. As shown in Fig. S6D, the 20:80 group, which had a higher proportion of old ST-HSCs, exhibited a statistically significant increase in the lymphoid percentage in the peripheral blood post-transplantation.

      These findings suggest that old ST-HSCs contribute to blood production following transplantation. 

      Reviewer #2 (Public review):

      While aspects of their work are fascinating and might have merit, several issues weaken the overall strength of the arguments and interpretation. Multiple experiments were done with a very low number of recipient mice, showed very large standard deviations, and had no statistically detectable difference between experimental groups. While the authors conclude that these experimental groups are not different, the displayed results seem too variable to conclude anything with certainty. The sensitivity of the performed experiments (e.g. Fig 3; Fig 6C, D) is too low to detect even reasonably strong differences between experimental groups and is thus inadequate to support the author's claims. This weakness of the study is not acknowledged in the text and is also not discussed. To support their conclusions the authors need to provide higher n-numbers and provide a detailed power analysis of the transplants in the methods section. 

      Response #2-1:

      Thank you for your important remarks. The power analysis for this experiment shows that power = 0.319, suggesting that a larger sample size may be needed. On the other hand, our method for determining the sample size in Figure 3 is as follows:

      (1) First, we checked whether a myeloid-biased change is detected in the bulk-HSC fraction (Figure S3). The results showed that the difference in myeloid output at 16 weeks after transplantation was statistically significant (young vs. aged = 7.2 ± 8.9% vs. 42.1 ± 35.5%, p = 0.01), even though n = 10.

      (2) Next, myeloid-biased HSCs have been reported to be a fraction with high self-renewal ability (2004, Blood). If myeloid-biased HSCs increased with aging, the increase would therefore be detected with higher sensitivity in the LT-HSC fraction than in the bulk-HSC fraction used in Figure S3.

      (3) However, in Figure 3 the means themselves were nearly identical (young vs. aged = 51.4 ± 31.5% vs. 47.4 ± 39.0%, p = 0.82; n = 8), so the absence of a difference is not only a matter of the p-value. Since the means do not differ, it is highly likely that no difference would be detected even if n were further increased.

      Regarding Figure 6, we obtained a statistically significant difference and consider the sample size to be sufficient. In addition, we have performed various functional experiments (Figures 2, 5, 6 and S6), and have obtained consistent results showing that expansion of myeloid-biased HSCs does not occur with aging in the Hoxb5+ HSC fraction. Based on the above, we conclude that the LT-HSC fraction does not differ in myeloid differentiation potential with aging.
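
      For readers who want to gauge how much sensitivity the Figure 3 comparison carries, the following is a minimal sketch and not the calculation the authors ran. It assumes two independent groups of equal size, a pooled standard deviation, and a normal approximation to the t-test; the function names are hypothetical. The co-transplantation design is arguably paired, which this sketch ignores.

```python
from math import sqrt
from statistics import NormalDist


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference using the pooled SD of two equal-sized groups."""
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return abs(mean_a - mean_b) / pooled_sd


def approx_power(d, n_per_group, alpha=0.05):
    """Two-sided power under a normal approximation (optimistic for small n)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Summary statistics quoted in the response for Figure 3 (myeloid output, %).
d_fig3 = cohens_d(51.4, 31.5, 47.4, 39.0)
power_fig3 = approx_power(d_fig3, n_per_group=8)
print(f"d = {d_fig3:.2f}, approximate power = {power_fig3:.2f}")
```

      The result will not match the reported power of 0.319, because the effect size and test assumed for that figure are not stated in the response. The sketch only shows that, under these assumptions, a difference as small as the observed one would rarely reach significance with n = 8.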

      As the authors attempt to challenge the current model of the age-associated expansion of myeloid-biased HSCs (which has been observed and reproduced by many different groups), ideally additional strong evidence in the form of single-cell transplants is provided. 

      Response #2-2:

      Thank you for the comments. As the reviewer pointed out, we hope to reconfirm our results using single-cell technology in the future.

      On the other hand, we have reported that the ratio of myeloid to lymphoid cells in the peripheral blood changes when the number of HSCs transplanted, or the number of supporting cells transplanted with HSCs, is varied[1-2]. Therefore, single-cell transplant data need to be interpreted very carefully to determine differentiation potential.

      From this viewpoint, future experiments will combine the Hoxb5 reporter system with a lineage tracing system that can track HSCs at the single-cell level over time. This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells. We have reflected this comment by adding the following sentences in the manuscript.

      [P19, L451] “In contrast, our findings should be considered in light of some limitations. In this report, we primarily performed ten to twenty cell transplantation assays. Therefore, the current theory should be revalidated using single-cell technology with lineage tracing system[3-4]. This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells.” 

      It is also unclear why the authors believe that the observed reduction of ST-HSCs relative to LT-HSCs explains the myeloid-biased phenotype observed in the peripheral blood. This point seems counterintuitive and requires further explanation. 

      Response #2-3:

      Thank you for your comment. We apologize for the insufficient explanation. Our data, as shown in Figures 3 and 4, demonstrate that the differentiation potential of LT-HSCs remains unchanged with age. Therefore, rather than suggesting that an increase in LT-HSCs with a consistent differentiation capacity leads to myeloid-biased hematopoiesis, it seems more accurate to highlight that the relative decrease in the proportion of ST-HSCs, whose progeny remain in peripheral blood as lymphocytes, leads to a relative increase in myeloid cells in peripheral blood and thus causes myeloid-biased hematopoiesis.

      However, if we focus on the increase in the ratio of LT-HSCs, it is also plausible to explain that “with aging, the proportion of LT-HSCs capable of long-term myeloid hematopoiesis increases. As a result, from 16 weeks after transplantation, the influence of LT-HSCs maintaining the long-term ability to produce myeloid cells becomes relatively more significant, leading to an increase in the ratio of myeloid cells in the peripheral blood and causing myeloid-biased hematopoiesis.”

      Based on my understanding of the presented data, the authors argue that myeloid-biased HSCs do not exist, as 

      a) they detect no difference between young/aged HSCs after transplant (mind low n-numbers and large std!!!); b) myeloid progenitors downstream of HSCs only show minor or no changes in frequency and c) aged LT-HSCs do not outperform young LT-HSCs in myeloid output in competitive transplants (mind low n-numbers and large std!!!). 

      However, given the low n-numbers and high variance of the results, the argument seems weak and the presented data does not support the claims sufficiently. That the number of downstream progenitors does not change could be explained by other mechanisms, for instance, the frequently reported differentiation short-cuts of HSCs and/or changes in the microenvironment. 

      Response #2-4:

      We appreciate the comments. As mentioned above, we will correct the manuscript regarding the sample size. Regarding the interpretation of the lack of increase in the percentage of myeloid progenitor cells in the bone marrow with age, it is possible that various confounding factors, such as differentiation shortcuts or changes in the microenvironment, are involved.

      However, even when aged LT-HSCs and young LT-HSCs are transplanted into the same recipient mice, the timing of the appearance of different cell fractions in peripheral blood is similar (Figure 3 of this paper). Therefore, we have not obtained data suggesting that clear shortcuts exist in the differentiation process of aged HSCs into neutrophils or monocytes. Additionally, it is currently generally accepted that myeloid cells, including neutrophils and monocytes, differentiate from GMPs[1]. Since there are no changes in the proportion of GMPs in the bone marrow with age, we concluded that the differentiation potential into myeloid cells remains consistent with aging.

      "Then, we found that the myeloid lineage proportions from young and aged LT-HSCs were nearly comparable during the observation period after transplantation (Fig. 3, B and C)." 

      [Comment to the authors]: Given the large standard deviation and low n-numbers, the power of the analysis to detect differences between experimental groups is very low. Experimental groups with too large standard deviations (as displayed here) are difficult to interpret and might be inconclusive. The absence of clearly detectable differences between young and aged transplanted HSCs could thus simply be a false-negative result. The shown experimental results hence do not provide strong evidence for the author's interpretation of the data. The authors should add additional transplants and include a detailed power analysis to be able to detect differences between experimental groups with reasonable sensitivity. 

      Response #2-5:

      Thank you for providing these insights. Regarding the sample size, we have addressed this in Response #2-1.

      Line 293: "Based on these findings, we concluded that myeloid-biased hematopoiesis observed following transplantation of aged HSCs was caused by a relative decrease in ST-HSC in the bulk-HSC compartment in aged mice rather than the selective expansion of myeloid-biased HSC clones." 

      Couldn't that also be explained by an increase in myeloid-biased HSCs, as repeatedly reported and seen in the expansion of CD150+ HSCs? It is not intuitively clear why a reduction of ST-HSC clones would lead to a myeloid bias. The author should try to explain more clearly where they believe the increased number of myeloid cells comes from. What is the source of myeloid cells if the authors believe they are not derived from the expanded population of myeloid-biased HSCs? 

      Response #2-6:

      Thank you for pointing this out. We apologize for the insufficient explanation. We will explain using Figure 8 from the paper.

      First, our data show that LT-HSCs maintain their differentiation capacity with age, while ST-HSCs lose their self-renewal capacity earlier, so that only long-lived memory lymphocytes remain in the peripheral blood after the loss of self-renewal capacity in ST-HSCs (Figure 8, upper panel). In mouse bone marrow, the proportion of LT-HSCs increases with age, while the proportion of ST-HSCs relatively decreases (Figure 8, lower panel and Figure S5). 

      Our data show that merely reproducing the ratio of LT-HSCs to ST-HSCs observed in aged mice using young LT-HSCs and ST-HSCs can replicate myeloid-biased hematopoiesis. This suggests that the increase in LT-HSCs and the relative decrease in ST-HSCs within the HSC compartment with aging are likely to contribute to myeloid-biased hematopoiesis.

      As mentioned earlier, since the differentiation capacity of LT-HSCs remains unchanged with age, it seems more accurate to state that the relative decrease in the proportion of ST-HSCs, which retain long-lived memory lymphocytes in peripheral blood, leads to a relative increase in myeloid cells in peripheral blood and thus causes myeloid-biased hematopoiesis.

      However, focusing on the increase in the proportion of LT-HSCs, it is also possible to explain that “with aging, the proportion of LT-HSCs capable of long-term myeloid hematopoiesis increases. As a result, from 16 weeks after transplantation, the influence of LT-HSCs maintaining the long-term ability to produce myeloid cells becomes relatively more significant, leading to an increase in the ratio of myeloid cells in the peripheral blood and causing myeloid-biased hematopoiesis.”
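
      The compositional argument above can be illustrated with a toy calculation. This is a sketch under strong simplifying assumptions, not a model from the manuscript: each LT-HSC is assumed to contribute a fixed myeloid and lymphoid output at late time points, each ST-HSC is assumed to leave only residual lymphoid cells, and all numeric values (including the 80:20 ratio) are hypothetical.

```python
def myeloid_fraction(lt_share, lt_myeloid=1.0, lt_lymphoid=1.0, st_lymphoid=1.0):
    """Late-time myeloid fraction in peripheral blood for a given LT-HSC share.

    lt_share: fraction of transplanted HSCs that are LT-HSCs (0..1).
    The per-cell outputs are arbitrary units and purely illustrative.
    """
    st_share = 1.0 - lt_share
    myeloid = lt_share * lt_myeloid
    lymphoid = lt_share * lt_lymphoid + st_share * st_lymphoid
    return myeloid / (myeloid + lymphoid)


# Per-cell LT-HSC output is identical in every scenario; only the LT:ST ratio changes.
for label, lt_share in [("20:80 LT:ST", 0.2), ("50:50 LT:ST", 0.5), ("80:20 LT:ST", 0.8)]:
    print(f"{label}: myeloid fraction = {myeloid_fraction(lt_share):.2f}")
```

      In this toy setting the myeloid fraction rises with the LT-HSC share even though the output of each LT-HSC never changes, which is the qualitative point of the argument. It does not by itself rule out the alternative explanations raised by the reviewer.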

      Recommendations for the authors: 

      Reviewer #2 (Recommendations for the authors):

      Summary: 

      Comment #2-1: While aspects of their work are fascinating and might have merit, several issues weaken the overall strength of the arguments and interpretation. Multiple experiments were done with a very low number of recipient mice, showed very large standard deviations, and had no statistically detectable difference between experimental groups. While the authors conclude that these experimental groups are not different, the displayed results seem too variable to conclude anything with certainty. The sensitivity of the performed experiments (e.g. Figure 3; Figure 6C, D) is too low to detect even reasonably strong differences between experimental groups and is thus inadequate to support the author's claims. This weakness of the study is not acknowledged in the text and is also not discussed. To support their conclusions the authors need to provide higher n-numbers and provide a detailed power analysis of the transplants in the methods section. 

      Response #2-1

      Thank you for your important remarks. The power analysis for this experiment shows that power = 0.319, suggesting that a larger sample size may be needed. On the other hand, our method for determining the sample size in Figure 3 is as follows: 

      (1) First, we checked whether a myeloid-biased change is detected in the bulk-HSC fraction (Figure S3). The results showed that the difference in myeloid output at 16 weeks after transplantation was statistically significant (young vs. aged = 7.2 ± 8.9 vs. 42.1 ± 35.5%, p = 0.01), even though n = 10. 

      (2) Next, myeloid-biased HSCs have been reported to be a fraction with high self-renewal ability (2004, Blood). If myeloid-biased HSCs increase with aging, the increase in myeloid-biased HSCs in the LT-HSC fraction would be detected with higher sensitivity than in the bulk-HSC fraction used in Figure S3. 

      (3) However, in Figure 3 there was no difference in either the p-value or the means themselves (young vs. aged = 51.4 ± 31.5% vs. 47.4 ± 39.0%, p = 0.82, n = 8). Since the means themselves did not differ, it is highly likely that no difference would be detected even if n were increased further. 

      Regarding Figure 6, we obtained a statistically significant difference and consider the sample size to be sufficient. In addition, we have performed various functional experiments (Figures 2, 5, 6 and S6), and have obtained consistent results showing that expansion of myeloid-biased HSCs does not occur with aging in the Hoxb5+ HSC fraction. Based on the above, we conclude that the LT-HSC fraction does not differ in myeloid differentiation potential with aging. 

      [Comment for authors]  

      Paradigm-shifting extraordinary claims require extraordinary data. Unfortunately, the authors do not provide additional data to further support their claims. Instead, the authors argue the following: Because they were able to find significant differences between experimental groups in some experiments, the absence of significant differences in the results of other experiments must be correct, too. 

      This logic is in my view flawed. Any assay/experiment with highly variable data has a very low sensitivity to detect significant differences between groups. If, as in this case, the variance is as large as the entire dynamic range of the readout, it becomes impossible to be able to detect any difference. In these cases, it is not surprising and actually expected that the mean of the group is located close to the center of the dynamic range as is the case here (center of dynamic range: 50%). In other words, this means that the experiments are simply not reproducible. It is absolutely critical to remember that any experiment and its associated statistical analysis has 3 (!!!) instead of 2 possible outcomes: 

      (1) There is a statistically significant difference 

      (2) There is no statistically significant difference 

      (3) The results of the experiment are inconclusive because the replicates are too variable and the results are not reproducible.  

      While most of us are inclined to think about outcomes (1) or (2), outcome (3) cannot be neglected. While it might be painful to accept, the only way to address concerns about data reproducibility is to provide additional data, improve reproducibility, and raise the power of the analysis to an acceptable level (e.g. able to detect a difference of 5-10% between groups). 

      Without going into the technical details, the example graph from the link below illustrates that with a power of 0.319 as stated by the authors, approx. 25 transplants, instead of 8, would be required. 

      Typically, however, a power of 0.8 is a reasonable value for any power analysis (although it's not a very strong power either). Even if we are optimistic and assume that there might be a reasonably large difference between experimental groups (in the example above P2 = 0.6, which is actually not that large) we can estimate that we would need over 10 transplants per group to say with confidence that two experimental groups likely do not differ. With smaller differences, these numbers increase quickly to 20+ transplants per group as can be seen in the example graph using an Alpha of 0.1 above. 

      Further reading can be found here and in many textbooks or other online resources: https://power-analysis.com/effect_size.htm and https://tss.awf.poznan.pl/pdf-188978-110207?filename=Using%20power%20analysis%20to.pdf 
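
      As a rough companion to the point about sample size, the sketch below estimates how many transplants per group a two-group comparison would need to reach a target power. It is an illustration under stated assumptions rather than the calculation behind the linked graph: it uses the standard normal-approximation formula for comparing two means, an assumed common standard deviation of 35 percentage points (chosen to resemble the spread quoted for Figure 3), and hypothetical target differences.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical differences in myeloid output (percentage points), assumed SD of 35.
for delta in (5, 10, 20, 40):
    print(f"difference of {delta} points: about {n_per_group(delta, sd=35)} per group")
```

      The required group size grows with the inverse square of the difference to be detected, so halving the target difference roughly quadruples the number of recipients. A paired analysis of co-transplanted recipients could need fewer animals, which this sketch does not model.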

      Response:

      Thank you for your feedback. We fully agree with the reviewer that paradigm-shifting claims must be supported by equally robust data. It has been well documented that the frequency of myeloid-biased HSCs increases with age, with reports indicating that over 50% of the HSC compartment in aged mice consists of myeloid-biased HSCs[1,2]. Based on this, we believe that if aged LT-HSCs were substantially myeloid-biased, the difference should be readily detectable.

      To further validate our findings, we present a similar preliminary experiment. The resulting data are shown below (n = 8). 

      Author response image 1.

      (A) Experimental design for competitive co-transplantation assay. Ten CD45.2<sup>+</sup> young LT-HSCs and ten CD45.2<sup>+</sup> aged LT-HSCs were transplanted with 2 × 10<sup>5</sup> CD45.1<sup>+</sup>/CD45.2<sup>+</sup> supporting cells into lethally irradiated CD45.1<sup>+</sup> recipient mice (n = 8). (B) Lineage output of young or aged LT-HSCs at 4, 8, 12, 16 weeks after transplantation. Each bar represents an individual mouse. *P < 0.05. **P < 0.01.

      While a slight increase in myeloid-biased hematopoiesis was observed in the aged LT-HSC fraction, the difference was not statistically significant. These new results are presented alongside the original Figure 3, which was generated using a larger sample size (n = 16).

      Author response image 2.

      (A) Experimental design for competitive co-transplantation assay. Ten CD45.2<sup>+</sup> young LT-HSCs and ten CD45.2<sup>+</sup> aged LT-HSCs were transplanted with 2 × 10<sup>5</sup> CD45.1<sup>+</sup>/CD45.2<sup>+</sup> supporting cells into lethally irradiated CD45.1<sup>+</sup> recipient mice (n = 16). (B) Lineage output of young or aged LT-HSCs at 4, 8, 12, 16 weeks after transplantation. Each bar represents an individual mouse. 

      Consistent with the original data, aged LT-HSCs exhibited a lineage output that was nearly identical to that of young LT-HSCs. Nonetheless, as the reviewer rightly pointed out, we cannot completely exclude the possibility that subtle differences may exist but remain undetected. To address this, we have added the following sentence to the manuscript:  

      [P9, L200] “These findings unmistakably demonstrated that mixed/bulk-HSCs showed myeloid skewed hematopoiesis in PB with aging. In contrast, LT-HSCs maintained a consistent lineage output throughout life, although subtle differences between aged and young LT-HSCs may exist and cannot be entirely ruled out.”

      References

      (1) Dykstra, Brad et al. “Clonal analysis reveals multiple functional defects of aged murine hematopoietic stem cells.” The Journal of experimental medicine vol. 208,13 (2011): 2691-703. doi:10.1084/jem.20111490

      (2) Beerman, Isabel et al. “Functionally distinct hematopoietic stem cells modulate hematopoietic lineage potential during aging by a mechanism of clonal expansion.” Proceedings of the National Academy of Sciences of the United States of America vol. 107,12 (2010): 5465-70. doi:10.1073/pnas.1000834107

      Comment #2-3: It is also unclear why the authors believe that the observed reduction of ST-HSCs relative to LT-HSCs explains the myeloid-biased phenotype observed in the peripheral blood. This point seems counterintuitive and requires further explanation. 

      Response #2-3:  

      Thank you for your comment. We apologize for the insufficient explanation. Our data, as shown in Figures 3 and 4, demonstrate that the differentiation potential of LT-HSCs remains unchanged with age. Therefore, rather than suggesting that an increase in LT-HSCs with a consistent differentiation capacity leads to myeloid-biased hematopoiesis, it seems more accurate to highlight that the relative decrease in the proportion of ST-HSCs, whose progeny remain in peripheral blood as lymphocytes, leads to a relative increase in myeloid cells in peripheral blood and thus causes myeloid-biased hematopoiesis. However, if we focus on the increase in the ratio of LT-HSCs, it is also plausible to explain that "with aging, the proportion of LT-HSCs capable of long-term myeloid hematopoiesis increases. As a result, from 16 weeks after transplantation, the influence of LT-HSCs maintaining the long-term ability to produce myeloid cells becomes relatively more significant, leading to an increase in the ratio of myeloid cells in the peripheral blood and causing myeloid-biased hematopoiesis." 

      [Comment for authors] 

      While this interpretation of the data might make sense, the shown data do not exclude alternative explanations. The authors do not exclude the possibility that LT-HSCs expand with age and that this expansion in combination with an aging microenvironment drives myeloid bias. The authors should quantify the frequency [%] and absolute number of LT-HSCs and ST-HSCs in young vs. aged animals. Especially analyzing the abs. numbers of cells will be important to support their claims as % can be affected by changes in the frequency of other populations. 

      Thank you for your very important point. As this reviewer pointed out, we do not exclude the possibility that LT-HSC expansion in combination with an aged microenvironment drives myeloid bias. Additionally, we acknowledge that myeloid-biased hematopoiesis with age is a complex process likely influenced by multiple factors. We would like to discuss the mechanism mentioned as a future research direction. Thank you for the insightful feedback. Regarding the point about the absolute cell numbers mentioned in the latter half of the paragraph, we will address this in detail in our subsequent response (Response #2-4).

      Comment #2-4: Based on my understanding of the presented data, the authors argue that myeloid-biased HSCs do not exist, as a) they detect no difference between young/aged HSCs after transplant (mind low n-numbers and large std!); b) myeloid progenitors downstream of HSCs only show minor or no changes in frequency and c) aged LT-HSCs do not outperform young LT-HSCs in myeloid output in competitive transplants (mind low n-numbers and large std!). However, given the low n-numbers and high variance of the results, the argument seems weak and the presented data does not support the claims sufficiently. That the number of downstream progenitors does not change could be explained by other mechanisms, for instance, the frequently reported differentiation short-cuts of HSCs and/or changes in the microenvironment. 

      Response #2-4:  

      We appreciate the comments. As mentioned above, we will correct the manuscript regarding the sample size. Regarding the interpretation of the lack of increase in the percentage of myeloid progenitor cells in the bone marrow with age, it is possible that various confounding factors, such as differentiation shortcuts or changes in the microenvironment, are involved. However, even when aged LT-HSCs and young LT-HSCs are transplanted into the same recipient mice, the timing of the appearance of different cell fractions in peripheral blood is similar (Figure 3 of this paper). Therefore, we have not obtained data suggesting that clear shortcuts exist in the differentiation process of aged HSCs into neutrophils or monocytes. Additionally, it is currently generally accepted that myeloid cells, including neutrophils and monocytes, differentiate from GMPs[1]. Since there are no changes in the proportion of GMPs in the bone marrow with age, we concluded that the differentiation potential into myeloid cells remains consistent with aging. 

      Reference 

      (1) Akashi K and others, 'A Clonogenic Common Myeloid Progenitor That Gives Rise to All Myeloid Lineages', Nature, 404.6774 (2000), 193-97. 

      [Comment for authors] 

      As the relative frequency of a cell population can be misleading, the authors should compare the absolute numbers of progenitors in young vs. aged mice to strengthen their argument. It would also be helpful to quantify the absolute numbers and relative frequencies in WT mice to exclude the possibility that the HoxB5-trimcherry mouse model suffers from unexpected aging phenotypes and that its hematopoietic system differs from wild-type animals.

      Thank you for your valuable feedback. We understand the importance of comparing the absolute numbers of progenitors in young versus aged mice to provide a more accurate representation of the changes in cell populations.

      Therefore, we quantified the absolute cell count of hematopoietic cells in the bone marrow using flow cytometry data. 

      Author response image 3.

      As previously reported, we observed a 10-fold increase in the number of pHSCs in aged mice compared to young mice. Additionally, our analysis revealed a statistically significant decrease in the number of Flk2+ progenitors and CLPs in aged mice. On the other hand, there was no statistically significant change in the number of myeloid progenitors between the two age groups. We appreciate the suggestion and hope that this additional information strengthens our argument and addresses your concerns.
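
      The distinction between relative frequency and absolute number raised in this exchange can be made concrete. The sketch below uses entirely hypothetical cell counts (not the data behind Author response image 3) to show that a population's frequency can fall while its absolute number stays constant, simply because another population in the same gate expands.

```python
def absolute_count(frequency_pct, total_cells):
    """Convert a flow cytometry frequency (%) into an absolute cell number."""
    return frequency_pct / 100.0 * total_cells


def frequency_pct(count, total_cells):
    """Convert an absolute cell number into a frequency (%) of the parent gate."""
    return 100.0 * count / total_cells


# Hypothetical parent gate: population A expands 10-fold, population B is unchanged.
young = {"A": 1_000, "B": 4_000, "other": 95_000}
aged = {"A": 10_000, "B": 4_000, "other": 95_000}

for label, counts in (("young", young), ("aged", aged)):
    total = sum(counts.values())
    print(f"{label}: B = {counts['B']} cells, {frequency_pct(counts['B'], total):.2f}% of gate")
```

      Reporting both quantities side by side, as done in the response, avoids attributing a frequency shift to a population whose absolute size has not changed.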

      Comment #2-5:  

      "Then, we found that the myeloid lineage proportions from young and aged LT-HSCs were nearly comparable during the observation period after transplantation (Figure 3, B and C)." Given the large standard deviation and low n-numbers, the power of the analysis to detect differences between experimental groups is very low. Experimental groups with too large standard deviations (as displayed here) are difficult to interpret and might be inconclusive. The absence of clearly detectable differences between young and aged transplanted HSCs could thus simply be a false-negative result. The shown experimental results hence do not provide strong evidence for the author's interpretation of the data. The authors should add additional transplants and include a detailed power analysis to be able to detect differences between experimental groups with reasonable sensitivity. 

      Response #2-5:  

      Thank you for providing these insights. Regarding the sample size, we have addressed this in Response #2-1. 

      [Comment for authors]  

      As explained in detail in the response to #2-1 the provided arguments are not convincing. As the authors pointed out, the power of these experiments is too low to make strong claims. If the author does not intend to provide new data, the language of the manuscript needs to be adjusted to reflect this weakness. A paragraph discussing the limitations of the study mentioning the limited power of the data should be included beyond the above-mentioned rather vague statement that the data should be validated (which is almost always necessary anyway). 

      Thank you for your valuable comment. We agree with the importance of discussing potential limitations in our experimental design. In response to the reviewer’s suggestion, we have revised the manuscript to include the following sentences:

      [P19, L434] "In the co-transplantation assay shown in Figure 3, the myeloid lineage output derived from young and aged LT-HSCs was comparable (Young LT-HSC: 51.4 ± 31.5% vs. Aged LT-HSC: 47.4 ± 39.0%, p = 0.82). Although no significant difference was detected, the small sample size (n = 8) may limit the sensitivity of the assay to detect subtle myeloid-biased phenotypes."

      This addition acknowledges the potential limitations of our analysis and highlights the need for further investigation with larger cohorts.

      Comment #2-6:

      Line 293: "Based on these findings, we concluded that myeloid-biased hematopoiesis observed following transplantation of aged HSCs was caused by a relative decrease in ST-HSC in the bulk-HSC compartment in aged mice rather than the selective expansion of myeloid-biased HSC clones." Couldn't that also be explained by an increase in myeloid-biased HSCs, as repeatedly reported and seen in the expansion of CD150+ HSCs? It is not intuitively clear why a reduction of ST-HSC clones would lead to a myeloid bias. The author should try to explain more clearly where they believe the increased number of myeloid cells comes from. What is the source of myeloid cells if the authors believe they are not derived from the expanded population of myeloid-biased HSCs?

      Response #2-6:

      Thank you for pointing this out. We apologize for the insufficient explanation. We will explain using Figure 8 from the paper. First, our data show that LT-HSCs maintain their differentiation capacity with age, while ST-HSCs lose their self-renewal capacity earlier, so that only long-lived memory lymphocytes remain in the peripheral blood after the loss of self-renewal capacity in ST-HSCs (Figure 8, upper panel). In mouse bone marrow, the proportion of LT-HSCs increases with age, while the proportion of ST-HSCs relatively decreases (Figure 8, lower panel and Figure S5).

      Our data show that merely reproducing the ratio of LT-HSCs to ST-HSCs observed in aged mice using young LT-HSCs and ST-HSCs can replicate myeloid-biased hematopoiesis. This suggests that the increase in LT-HSCs and the relative decrease in ST-HSCs within the HSC compartment with aging are likely to contribute to myeloid-biased hematopoiesis.

      As mentioned earlier, since the differentiation capacity of LT-HSCs remains unchanged with age, it seems more accurate to state that the relative decrease in the proportion of ST-HSCs, which retain long-lived memory lymphocytes in peripheral blood, leads to a relative increase in myeloid cells in peripheral blood and thus causes myeloid-biased hematopoiesis. However, focusing on the increase in the proportion of LT-HSCs, it is also possible to explain that "with aging, the proportion of LT-HSCs capable of long-term myeloid hematopoiesis increases. As a result, from 16 weeks after transplantation, the influence of LT-HSCs maintaining the long-term ability to produce myeloid cells becomes relatively more significant, leading to an increase in the ratio of myeloid cells in the peripheral blood and causing myeloid-biased hematopoiesis."

      [Comment for authors]

      While I can follow the logic of the argument, my concerns about the interpretation remain as I see discrepancies with other findings in the published literature. For instance, what the authors call ST-HSCs differs from the classical functional definition of ST-HSCs. It is thus difficult to relate the described observations to previous reports. ST-HSCs typically can contribute significantly to multiple lineages for several weeks (see for example PMID: 29625072). It is somewhat surprising that the ST-HSCs in this study don't show this potential and lose their potential much quicker.

      The authors should thus provide a more comprehensive depth of immunophenotypic and molecular characterization to compare their LT-HSCs to ST-HSCs. For instance, are LT-HSCs CD41- HSCs? How do ST-HSCs differ in their surface marker expression from previously used definitions of ST-HSCs? A list of differentially expressed genes between young and old LT-HSCs and ST-HSCs should be done and will likely provide important insights into the molecular programs/markers (beyond the provided GO analysis, which seems superficial).

      Thank you for your valuable feedback. As the reviewer noted, there are indeed multiple definitions of ST-HSCs. We appreciate the opportunity to clarify our definitions of ST-HSCs. We define ST-HSCs functionally, rather than by surface antigens, which we believe is the most classical and widely accepted definition [1]. In our study, we define long-term hematopoietic stem cells (LT-HSCs) as those HSCs that continue to contribute to hematopoiesis after a second transplantation and possess long-term self-renewal potential. Conversely, we define short-term hematopoietic stem cells (ST-HSCs) as those HSCs that do not contribute to hematopoiesis after a second transplantation and only exhibit self-renewal potential in the short term. 

      Next, in the paper referenced by the reviewer[2], the chimerism of each fraction of ST-HSCs also peaked at 4 weeks and then decreased to approximately 0.1% after 12 weeks post-transplantation. Author response image 5 illustrates our ST-HSC donor chimerism in Figure 2. We believe that the data in the paper referenced by the reviewer[2] are consistent with our own observations of the hematopoietic pattern following ST-HSC transplantation, indicating a characteristic loss of hematopoietic potential 4 weeks after the transplantation. Furthermore, as shown in Figures 2D and 2F, the fraction of ST-HSCs does not exhibit hematopoietic activity after the second transplantation. Therefore, we consider this fraction to be ST-HSCs.

      Author response image 4.

      Additionally, the RNAseq data presented in Figures 4 and S4 revealed that the GSEA results vary among the different myeloid gene sets analyzed (Fig. 4, D–F; Fig. S4, C–D). Moreover, a comprehensive analysis of mouse HSC aging using multiple RNA-seq datasets reported that nearly 80% of differentially expressed genes show poor reproducibility across datasets[3]. From the above, while RNAseq data is indeed helpful, we believe that emphasizing functional experimental results is more critical than incorporating an additional dataset to support our claim. Thank you once again for your insightful feedback.

      References

      (1) Kiel, Mark J et al. “SLAM family receptors distinguish hematopoietic stem and progenitor cells and reveal endothelial niches for stem cells.” Cell vol. 121,7 (2005): 1109-21. doi:10.1016/j.cell.2005.05.026

      (2) Yamamoto, Ryo et al. “Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment.” Cell stem cell vol. 22,4 (2018): 600-607.e4. doi:10.1016/j.stem.2018.03.013

      (3) Flohr Svendsen, Arthur et al. “A comprehensive transcriptome signature of murine hematopoietic stem cell aging.” Blood vol. 138,6 (2021): 439-451. doi:10.1182/blood.2020009729

      Reviewer #3 (Public review): 

      Although the topic is appropriate and the new model provides a new way to think about lineage-biased output observed in multiple hematopoietic contexts, some of the experimental design choices, as well as some of the conclusions drawn from the results could be substantially improved. Also, they do not propose any potential mechanism to explain this process, which reduces the potential impact and novelty of the study. 

      The authors have satisfactorily replied to some of my comments. However, there are multiple key aspects that still remain unresolved.

      Reviewer #3 (Recommendations for the authors): 

      Comment #3-1,2:  

      Although the additional details are much appreciated the core of my original comments remains unanswered. There are still no details about the irradiation dose for each particular experiment. Is any transplant performed using a 9.1 Gy dose? If yes, please indicate it in text or figure legend. If not, please remove this number from the corresponding method section. 

      Again, 9.5 Gy (split in two doses) is commonly reported as sublethal. The fact that the authors used a methodology that deviates from the "standard" for the field makes it difficult to put these results in context with previous studies. It is not possible to know if the direct and indirect effects of this conditioning method on the hematopoietic system have any consequences for the presented results. 

      Thank you for your clarification. We confirm that none of the transplantation experiments described were performed using a 9.1 Gy irradiation dose. We have therefore removed the mention of "9.1 Gy" from the relevant section of the Materials and Methods. We appreciate the helpful suggestion to improve the clarity of the manuscript.

      [P22, L493] “12-24 hours prior to transplantation, C57BL/6-Ly5.1 mice, or aged C57BL/6J recipient mice were lethally irradiated with single doses of 8.7 Gy.”

      Regarding the reviewer’s concern about the radiation dose used in our experiments, we will address this point in more detail in our subsequent response (see Response #3-4).

      Comment #3-4 (Original): When representing the contribution to PB from transplanted cells, the authors show the % of each lineage within the donor-derived cells (Figures 3B-C, 5B, 6B-D, 7C-E, and S3 B-C). To have a better picture of total donor contribution, total PB and BM chimerism should be included for each transplantation assay. Also, for Figures 2C-D and Figures S2A-B, do the graphs represent 100% of the PB cells? Are there any radioresistant cells?

      Response #3-4 (Original): Thank you for highlighting this point. Indeed, donor contribution to total peripheral blood (PB) is important information. We have included the donor contribution data for each figure above mentioned.

      In Figure 2C-D and Figure S2A-B, the percentage of donor chimerism in PB was defined as the percentage of CD45.1-CD45.2+ cells among total CD45.1-CD45.2+ and CD45.1+CD45.2+ cells, as described in the Methods section.
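
      The chimerism definition above can be written out as a short calculation. The following is a minimal sketch, not the authors' analysis code: the function names and the example cell counts are hypothetical, and the second function simply shows how the percentage would change if recipient-derived (CD45.1+CD45.2-) cells were added to the denominator.

```python
def donor_chimerism(donor: float, supporting: float) -> float:
    """Percent donor (CD45.1-CD45.2+) cells among donor plus
    supporting (CD45.1+CD45.2+) cells, as defined in the response."""
    total = donor + supporting
    if total <= 0:
        raise ValueError("no donor or supporting cells counted")
    return 100.0 * donor / total


def donor_chimerism_incl_recipient(donor: float, supporting: float,
                                   recipient: float) -> float:
    """Hypothetical variant: the denominator also includes
    recipient-derived (CD45.1+CD45.2-) cells."""
    total = donor + supporting + recipient
    if total <= 0:
        raise ValueError("no cells counted")
    return 100.0 * donor / total


if __name__ == "__main__":
    # Illustrative event counts only; these are not data from the study.
    print(donor_chimerism(300, 700))
    print(donor_chimerism_incl_recipient(300, 700, 1000))
```

      With residual recipient-derived cells present, the two definitions diverge, which is why the choice of denominator matters when comparing conditioning regimens.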

      Comment for our #3-4 response:  

      Thanks for sharing these data. These graphs should be included in their corresponding figures along with donor contribution to BM. 

      Regarding Figure 2C-D, as currently shown, the graphs only account for CD45.1-CD45.2+ (donor-derived) and CD45.1+CD45.2+ (supporting-derived). What is the percentage of CD45.1+CD45.2- (recipient-derived)? Since the irradiation regimen is atypical, including this information would help to know more about the effects of this conditioning method. 

      Thank you for your insightful comment regarding Figure 2C-D. To address the concern that the reviewer pointed out, we provide the kinetics of the percentage of CD45.1+CD45.2- (recipient-derived) in Author response image 7.

      Author response image 5.

      As the reviewer pointed out, we observed the persistence of recipient-derived cells, particularly in the secondary transplant. As noted, this suggests that our conditioning regimen may have been suboptimal. In response, we will include the donor chimerism analysis in the total cells and add the following statement in the study limitations section to acknowledge this point:

      [P19, L439] “Additionally, in this study, we purified LT-HSCs using the Hoxb5 reporter system and employed a moderate conditioning regimen (8.7 Gy). To have a better picture of total donor contribution, total PB chimerism are presented in Figure S7 and we cannot exclude the possibility that these factors may have influenced the results. Therefore, it would be ideal to validate our findings using alternative LT-HSC markers and different conditioning regimens.”

      Comment #3-5: For BM progenitor frequencies, the authors present the data as the frequency of cKit+ cells. This normalization might be misleading as changes in the proportion of cKit+ between the different experimental conditions could mask differences in these BM subpopulations. Representing this data as the frequency of BM single cells or as absolute numbers (e.g., per femur) would be valuable.

      Response #3-5:

      We appreciate the reviewer's comment on this point. 

      Firstly, as shown in Supplemental Figures S1B and S1C, we analyze the upstream (HSC, MPP, Flk2+) and downstream (CLP, MEP, CMP, GMP) fractions in different panels. Therefore, normalization is required to assess the differentiation of HSCs from upstream to downstream.

      Additionally, the reason for normalizing by c-Kit+ is that the bone marrow analysis was performed after enrichment using the Anti-c-Kit antibody for both upstream and downstream fractions. Based on this, we calculated the progenitor populations as a frequency within the c-Kit positive cells. Next, the results of normalizing the whole bone marrow cells (live cells) are shown below. 

      Author response image 6.

      Similar to the results of normalizing c-Kit+ cells, myeloid progenitors remained unchanged, including a statistically significant decrease in CMP in aged mice. Additionally, there were no significant differences in CLP. In conclusion, similar results were obtained between the normalization with c-Kit and the normalization with whole bone marrow cells (live cells).

      However, as the reviewer pointed out, it is necessary to explain the reason for normalization with c-Kit. Therefore, we will add the following description.

      [P21, L502] For the combined analysis of the upstream (HSC, MPP, Flk2+) and downstream (CLP, MEP, CMP, GMP) fractions in Figures 1B, we normalized by cKit+ cells because we performed a c-Kit enrichment for the bone marrow analysis.

      Comment for our #3-5 response:

      I understand that normalization is necessary to compare across different BM populations. However, the best way would be to normalize to single cells. As I mentioned in my original comment, normalizing to cKit+ cells could be misleading, as the proportion of cKit+ cells could be different across the experimental conditions. Further, enriching for cKit+ cells when analyzing BM subpopulation frequencies could introduce similar potential errors. The enrichment would depend on the level of expression of cKit for each of these populations, which would alter the final quantification. Indeed, CLP are typically defined as cKit-med/low. Thus, cKit enrichment would not be a great method to analyze the frequency of these cells. 

      The graph in the authors' response to my comment shows a similar trend to what is represented in Figure 1B for some populations. However, there are multiple statistically significant changes that disappear in this new version. This supports my original concern and, in consequence, I would encourage the authors to represent this data as the frequency of BM single cells or as absolute numbers (e.g., per femur). 

      Thank you for your thoughtful follow-up comment. In response to the reviewer’s suggestion, we will represent the data as the frequency among total BM single cells. These revised graphs have been incorporated into the updated Figure 7F, and the corresponding figure legend has been revised to accurately reflect these representations. We appreciate your valuable input, which has helped us improve the clarity and rigor of our data presentation.

      Comment #3-6: Regarding Figure 1B, the authors argue that if myeloid-biased HSC clones increase with age, they should see increased frequency of all components of the myeloid differentiation pathway (CMP, GMP, MEP). This would imply that their results (no changes or reduction in these myeloid subpopulations) suggest the absence of myeloid-biased HSC clones expansion with age. This reviewer believes that differentiation dynamics within the hematopoietic hierarchy can be more complex than a cascade of sequential and compartmentalized events (e.g., accelerated differentiation at the CMP level could cause exhaustion of this compartment and explain its reduction with age and why GMP and MEP are unchanged) and these conclusions should be considered more carefully.

      Response #3-6:

      We wish to thank the reviewer for this comment. We agree that the differentiation pathway may not be a cascade of sequential events but could be influenced by various factors, such as extrinsic factors.

      In Figure 1B, we hypothesized that there may be other mechanisms causing myeloid-biased hematopoiesis besides the age-related increase in myeloid-biased HSCs, given that the percentage of myeloid progenitor cells in the bone marrow did not change with age. However, we do not discuss the presence or absence of myeloid-biased HSCs based on the data in Figure 1B. 

      Our newly proposed theories—that the differentiation capacity of LT-HSCs remains unchanged with age and that age-related myeloid-biased hematopoiesis is due to changes in the ratio of LT-HSCs to ST-HSCs—are based on functional experiment results. As the reviewer pointed out, to discuss the presence or absence of myeloid-biased HSCs based on the data in Figure 1B, it is necessary to apply a system that can track HSC differentiation at single-cell level. The technology would clarify changes in the self-renewal capacity of individual HSCs and their differentiation into progenitor cells and peripheral blood cells. The authors believe that those single-cell technologies will be beneficial in understanding the differentiation of HSCs. Based on the above, the following statement has been added to the text.

      [P19, L440] In contrast, our findings should be considered in light of some limitations. In this report, we primarily performed ten to twenty cell transplantation assays. Therefore, the current theory should be revalidated using single-cell technology with lineage tracing system[1-2]. This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells. 

      Comment for our #3-6 response:

      Thanks for the response. My original comments referred to the statement "On the other hand, in contrast to what we anticipated, the frequency of GMP was stable, and the percentage of CMP actually decreased significantly with age, defying our prediction that the frequency of components of the myeloid differentiation pathway, such as CMP, GMP, and MEP would increase in aged mice if myeloid-biased HSC clones increase with age (Fig. 1 B)" (lines #129-133). Again, the absence of an increase in CMP, GMP and MEP with age does not mean the absence of an increase in myeloid-biased HSC clones. This statement should be considered more carefully. 

      Thank you for the insightful comment. We agree that the absence of an increase in CMP, GMP and MEP with age does not mean the absence of an increase in myeloid-biased HSC clones. In our revised manuscript, we have refined the statement to acknowledge this nuance more clearly. The updated text now reads as follows:

      [P6, L129] On the other hand, in contrast to what we anticipated, the frequency of GMP was stable, and the percentage of CMP actually decreased significantly with age, defying our prediction that the frequency of components of the myeloid differentiation pathway, such as CMP, GMP, and MEP may increase in aged mice, if myeloid-biased HSC clones increase with age. 

      Comment #3-7: Within the few recipients showing good donor engraftment in Figure 2C, there is a big proportion of T cells that are "amplified" upon secondary transplantation (Figure 2D). Is this expected?

      Response #3-7:

      We wish to express our deep appreciation to the reviewer for the insightful comment on this point. As the reviewer pointed out, in Figure 2D, a few recipients show a very high percentage of T cells. The authors had the same question and considered this phenomenon as follows:

      (1) One reason for the very high percentage of T cells is that we used 1 x 10<sup>7</sup> whole bone marrow cells in the secondary transplantation. Consequently, the donor cells in the secondary transplantation contained more T-cell progenitor cells, leading to a greater increase in T cells compared to the primary transplantation.

      (2) We also consider that this phenomenon may be influenced by the reduced self-renewal capacity of aged LT-HSCs, resulting in decreased sustained production of myeloid cells in the secondary recipient mice. As a result, long-lived memory-type lymphocytes may preferentially remain in the peripheral blood, increasing the percentage of T cells in the secondary recipient mice.

      We have discussed our hypothesis regarding this interesting phenomenon. To further clarify the characteristics of the increased T-cell count in the secondary recipient mice, we will analyze TCR clonality and diversity in the future.

      Comment for our #3-7 response:

      Thanks for the potential explanations to my question. This fact is not commonly reported in previous transplantation studies using aged HSCs. Could Hoxb5 label a fraction of HSCs that is lymphoid/T-cell biased upon secondary transplantation? The number of recipients with a high frequency of lymphoid cells in the peripheral blood (even from young mice) is remarkable. 

      Response:

      Thank you for your insightful suggestion. Based on this comment, we calculated the percentage of lymphoid cells in the donor fraction at 16 weeks following the secondary transplantation, which was 56.1 ± 25.8% (L/M = 1.27). According to the Müller-Sieburg criteria, lymphoid-biased hematopoiesis is defined as having an L/M ratio greater than 10. 

      Given our findings, we concluded that the Hoxb5-labeled fraction does not specifically indicate lymphoid-biased hematopoiesis. We sincerely appreciate the valuable input, which helped us to further clarify the interpretation of our results.
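
      The L/M comparison above can be reproduced with a short calculation. This is a minimal sketch under stated assumptions, not the authors' analysis: it assumes the donor fraction splits entirely into lymphoid and myeloid cells, so that the myeloid share is 100% minus the lymphoid share, and the function names are hypothetical. Only the "greater than 10" cutoff quoted in the response is used.

```python
# L/M cutoff for lymphoid-biased hematopoiesis, as quoted in the response.
LYMPHOID_BIASED_CUTOFF = 10.0


def lm_ratio(lymphoid_pct: float) -> float:
    """Lymphoid-to-myeloid ratio, assuming myeloid % = 100 - lymphoid %."""
    if not 0.0 <= lymphoid_pct < 100.0:
        raise ValueError("lymphoid_pct must be in [0, 100)")
    return lymphoid_pct / (100.0 - lymphoid_pct)


def is_lymphoid_biased(lymphoid_pct: float) -> bool:
    """True when the L/M ratio exceeds the quoted cutoff."""
    return lm_ratio(lymphoid_pct) > LYMPHOID_BIASED_CUTOFF


if __name__ == "__main__":
    mean_lymphoid_pct = 56.1  # mean value reported in the response
    print(round(lm_ratio(mean_lymphoid_pct), 2))
    print(is_lymphoid_biased(mean_lymphoid_pct))
```

      Applied to the reported mean of 56.1%, this gives a ratio of roughly 1.28, close to the 1.27 stated above (the small difference may reflect rounding or a per-mouse calculation). Under the same assumption, a donor fraction would need to be more than about 91% lymphoid to cross the cutoff of 10.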

      Comment #3-8: Do the authors have any explanation for the high level of variability within the recipients of Hoxb5+ cells in Figure 2C?

      Response #3-8:

      We appreciate the reviewer's comment on this point. As noted in our previous report, transplantation of a sufficient number of HSCs results in stable donor chimerism, whereas a small number of HSCs leads to increased variability in donor chimerism[1]. Additionally, other studies have observed high variability when fewer than 10 HSCs are transplanted[2-3]. Based on this evidence, we consider that the transplantation of a small number of cells (10 cells) is the primary cause of the high level of variability observed.

      Comment for our #3-8 response:

      I agree that transplanting a low number of HSCs increases the mouse-to-mouse variability. For that reason, a larger cohort of recipients for this kind of experiment would be ideal. 

      Response:

      Thank you for the insightful comment. We agree that a larger cohort of recipients would be ideal for this type of experiment. In Figure 2, the difference between Hoxb5<sup>+</sup> and Hoxb5⁻ cells is robust, allowing for a clear statistical distinction despite the cohort size. However, we also recognize that a larger cohort would be necessary to detect more subtle differences, particularly in Figure 3. In response, we have added the following statement to the main text to acknowledge this limitation.

      [P9, L200] These findings unmistakably demonstrated that mixed/bulk-HSCs showed myeloid skewed hematopoiesis in PB with aging. In contrast, LT-HSCs maintained a consistent lineage output throughout life, although subtle differences between aged and young LT-HSCs may exist and cannot be entirely ruled out.

      Comment #3-10: Is Figure 2G considering all primary recipients or only the ones that were used for secondary transplants? The second option would be a fairer comparison.

      Response #3-10:

      We appreciate the reviewer's comment on this point. We considered all primary recipients in Figure 2G to ensure a fair comparison, given the influence of various factors such as the radiosensitivity of individual recipient mice[1]. Comparing only the primary recipients used in the secondary transplantation would result in n = 3 (primary recipient) vs. n = 12 (secondary recipient). Including all primary recipients yields n = 11 vs. n = 12, providing a more balanced comparison. Therefore, we analyzed all primary recipient mice to ensure the reliability of our results.

      Comment for our #3-10 response:

      I respectfully disagree. Secondary recipients are derived from only 3 of the primary recipients. Therefore, the BM composition is determined by the composition of their donors. Including primary recipients that are not transplanted into secondary recipients is not the fairest comparison for this analysis. 

      Thank you for your comment and for highlighting this important issue. We acknowledge the concern that including primary recipients that are not transplanted into secondary recipients is not the fairest comparison for this analysis. In response, we have reanalyzed the data using only the primary recipients whose bone marrow was actually transplanted into secondary recipients. 

      Author response image 7.

      Importantly, the reanalysis confirmed that the kinetics of myeloid cell proportions in peripheral blood were consistent between primary and secondary transplant recipients. We sincerely appreciate your thoughtful feedback, which has helped us improve the clarity.

      Comment #3-11: When discussing the transcriptional profile of young and aged HSCs, the authors claim that genes linked to myeloid differentiation remain unchanged in the LT-HSC fraction while there are significant changes in the ST-HSCs. However, 2 out of the 4 genes shown in Figure S4B show ratios higher than 1 in LT-HSCs.

      Response #3-11:

      Thank you for highlighting this important point. As the reviewer pointed out, when we analyze the expression of myeloid-related genes, some genes are elevated in aged LT-HSCs compared to young LT-HSCs. However, the GSEA analysis using myeloid-related gene sets, which include several hundred genes, shows no significant difference between young and aged LT-HSCs (see Figure S4C in this paper). Furthermore, functional experiments using the co-transplantation system show no difference in differentiation capacity between young and aged LT-HSCs (see Figure 3 in this paper). Based on these results, we conclude that LT-HSCs do not exhibit any change in differentiation capacity with aging.

      Comment for our #3-11 response:

      The authors used the data in Figure S4 to claim that "myeloid genes were tended to be enriched in aged bulk-HSCs but not in aged LT-HSCs compared to their respective controls" (this is the title of the figure; line # 1326). This is based on an increase in gene expression of CD150, vWF, Selp, Itgb3 in aged cells compared to young cells (Figure S4B). However, an increase in Selp and Itgb3 is also observed for LT-HSCs (lower magnitude, but still an increase). 

      Also, regarding the GSEA, the only term showing statistical significance in bulk HSCs is "Myeloid gene set", which does not reach significance in LT-HSCs but presents a trend for enrichment (q = 0.077). None of the terms shown in this panel reaches statistical significance in ST-HSCs. 

      Thank you for your valuable point. As the reviewer noted, the current title may cause confusion. Therefore, we propose changing it to the following:

      [P52, L1331] “Figure S4. Compared to their respective young controls, aged bulk-HSCs exhibit greater enrichment of myeloid gene expression than aged LT-HSCs”

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Taber et al report the biochemical characterization of 7 mutations in PHD2 that induce erythrocytosis.

      Their goal is to provide a mechanism for how these mutations cause the disease. PHD2 hydroxylates HIF1a in the presence of oxygen at two distinct proline residues (P564 and P402) in the "oxygen degradation domain" (ODD). This leads to the ubiquitylation of HIF1a by the VHL E3 ligase and its subsequent degradation. Multiple mutations have been reported in the EGLN1 gene (coding for PHD2), which are associated with pseudohypoxic diseases that include erythrocytosis. Furthermore, 3 mutations in PHD2 also cause pheochromocytoma and paraganglioma (PPGL), a neuroendocrine tumour. These mutations likely cause elevated levels of HIF1a, but their mechanisms are unclear. Here, the authors analyze mutations from 152 case reports and map them on the crystal structure. They then focus on 7 mutations, which they clone in a plasmid and transfect into PHD2-KO to monitor HIF1a transcriptional activity via a luciferase assay. All mutants show impaired activation. Some mutants also showed impaired stability in pulse chase turnover assays (except A228S, P317R, and F366L). In vitro purified PHD2 mutants display a minor loss in thermal stability and some propensity to aggregate. Using MST technology, they show that P317R is strongly impaired in binding to HIF1a and HIF2a, whereas other mutants are only slightly affected. Using NMR, they show that the PHD2 P317R mutation greatly reduces hydroxylation of P402 (HIF1a NODD), as well as P564 (HIF1a CODD), but to a lesser extent. Finally, BLI shows that the P317R mutation reduces affinity for CODD by 3-fold, but not NODD.  

      Strengths: 

      (1) Simple, easy-to-follow manuscript. Generally well-written. 

      (2) Disease-relevant mutations are studied in PHD2 that provide insights into its mechanism of action. 

      (3) Good, well-researched background section. 

      Weaknesses: 

      (1) Poor use of existing structural data on the complexes of PHD2 with HIF1a peptides and various metals and substrates. A quick survey of the impact of these mutations (as well as analysis by Chowdhury et al, 2016) on the structure and interactions between PHD2 and peptides of HIF1a shows that the P317R mutation interferes with peptide binding. By contrast, F366L will affect the hydrophobic core, and A228S is on the surface, and it's not obvious how it would interfere with the stability of the protein. 

      Thank you for the comment.  We will further analyze the mutations on the available PHD2 crystal structures in complex with HIFa to discern how these substitution mutations may impact PHD2 structure and function.  

      (2) To determine aggregation and monodispersity of the PHD2 mutants using size-exclusion chromatography (SEC), equal quantities of the protein must be loaded on the column. This is not what was done. As an aside, the colors used for the SEC are very similar and nearly indistinguishable. 

      Agreed.  We will perform additional experiments as suggested by the reviewer to further assess aggregation and hydrodynamic size.  The colors used in the graph will be changed for a clearer differentiation between samples.

      (3) The interpretation of some mutants remains incomplete. For A228S, what is the explanation for its reduced activity? It is not substantially less stable than WT and does not seem to affect peptide hydroxylation. 

      We agree with the reviewer that the causal mechanism for some of the tested disease-causing mutants remains unclear.  The negative findings also raise the notion, perhaps considered controversial, that there may be other substrates of PHD2 that are impacted by certain mutations, which contribute to disease pathogenesis.  We will expand our discussion accordingly. 

      (4) The interpretation of the NMR prolyl hydroxylation is tainted by the high concentrations used here. First of all, there is likely a typo in the method section; the final concentration of ODD is likely 0.18 mM, and not 0.18 uM (PNAS paper by the same group in 2024 reports using a final concentration of 230 uM). Here, I will assume the concentration is 180 uM. Flashman et al (JBC 2008) showed that the affinity of the NODD site (P402; around 10 uM) for PHD2 is 10-fold weaker than CODD (P564, around 1 uM). This likely explains the much faster kinetics of hydroxylation towards the latter. Now, using the MST data, let's say the P317R mutation reduces the affinity by 40-fold; the affinity becomes 400 uM for NODD (above the protein concentration) and 40 uM for CODD (below the protein concentration). Thus, CODD would still be hydroxylated by the P317R mutant, but not NODD. 

      The HIF1α concentration was indeed an oversight, which will be corrected to 0.18 mM.  The study by Flashman et al.[1] showing PHD2 having a lower affinity to the NODD than CODD likely contributes to the differential hydroxylation rates via PHD2 WT.  We showed here via MST that PHD2 P317R had a Kd of 320 ± 20 uM for HIF1αCODD, which should have led to a severe enzymatic defect, even at the high concentrations used for NMR (180 uM).  However, we observed only a subtle reduction in hydroxylation efficiency in comparison to PHD2 WT.  Thus, we performed another binding method using BLI that showed a mild binding defect on CODD by PHD2 P317R, consistent with NMR data.  The perplexing result is the WT-like binding to the NODD by PHD2 P317R, which appears inconsistent with the severe defect in NODD hydroxylation via PHD2 P317R as measured via NMR.  These results suggest that there are supporting residues within the PHD2/NODD interface that help maintain binding to NODD but compromise the efficiency of NODD hydroxylation upon PHD2 P317R mutation. We will perform additional binding experiments to further interrogate and validate the binding affinity of PHD2 P317R to NODD and CODD.
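
      The affinity argument in this exchange can be made concrete with a simple occupancy estimate. The sketch below is illustrative only and rests on assumptions that are not part of the manuscript: a 1:1 binding model, substrate in large excess over enzyme (so free substrate is approximately the total 180 uM), and the fraction of enzyme bound used as a rough stand-in for expected activity, which ignores catalytic rate. The Kd values are the approximate figures quoted in the review and response, and the function name is hypothetical.

```python
def fraction_bound(substrate_um: float, kd_um: float) -> float:
    """Fraction of enzyme bound at equilibrium for 1:1 binding,
    assuming free substrate is approximately the total substrate."""
    if substrate_um < 0 or kd_um <= 0:
        raise ValueError("substrate must be >= 0 and Kd must be > 0")
    return substrate_um / (kd_um + substrate_um)


SUBSTRATE_UM = 180.0  # ODD concentration assumed by the reviewer for NMR

# Approximate Kd values (uM) as quoted in the review and the response.
KD_UM = {
    "WT / NODD (approx., Flashman et al.)": 10.0,
    "WT / CODD (approx., Flashman et al.)": 1.0,
    "P317R / NODD (reviewer's 40-fold scenario)": 400.0,
    "P317R / CODD (reviewer's 40-fold scenario)": 40.0,
    "P317R / CODD (MST value in the response)": 320.0,
}

if __name__ == "__main__":
    for label, kd in KD_UM.items():
        print(f"{label}: {fraction_bound(SUBSTRATE_UM, kd):.2f}")
```

      Under these simplifying assumptions, a Kd of 40 uM leaves most of the enzyme substrate-bound at 180 uM, whereas a Kd of 320 to 400 uM leaves roughly a third bound, which is the contrast both the reviewer and the authors are weighing against the observed hydroxylation.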

      (5) The discrepancy between the MST and BLI results does not make sense, especially regarding the P317R mutant. Based on the crystal structures of PHD2 in complex with the ODD peptides, the P317R mutation should have a major impact on the affinity, which is what is reported by MST. This suggests that the MST is more likely to be valid than BLI, and the latter is subject to some kind of artefact. Furthermore, the BLI results are inconsistent with previous results showing that PHD2 has a 10-fold lower affinity for NODD compared to CODD. 

      The reviewer’s structural prediction that P317R mutation should cause a major binding defect, while agreeable with our MST data, is incongruent with our NMR and the data from Chowdhury et al.[2] that showed efficient hydroxylation of CODD via PHD2 P317R.  Moreover, we have attempted to model NODD and CODD on apo PHD2 P317R structure and found that the mutation had no major impact on CODD while the mutated residue could clash with NODD, causing a shifting of peptide positioning on the protein.  However, these modeling predictions, like any in silico projections, would need experimental validation.  As mentioned in our preceding response, we also performed BLI, which showed that PHD2 P317R had a minor binding defect for CODD, consistent with the NMR results and findings by Chowdhury et al[2].  NODD binding was also measured with BLI as purified NODD peptides were not amenable for soluble-based MST assay, which showed similar K<sub>d</sub>’s for PHD2 WT and P317R.  Considering the absence of NODD hydroxylation via PHD2 P317R as measured by NMR and modeling on apo PHD2 P317R, we posit that P317R causes deviation of NODD from its original orientation that may not affect binding due to the other interactions from the surrounding elements but unfortunately disallows NODD from turnover.  Further study would be required to validate such notion, which we feel is beyond the scope of this manuscript.  However, we will perform additional binding experiments to further interrogate PHD2 P317R binding to NODD.   

      (6) Overall, the study provides some insights into mutants inducing erythrocytosis, but the impact is limited. Most insights are provided on the P317R mutant, but this mutant had already been characterized by Chowdhury et al (2016). Some mutants affect the stability of the protein in cells, but then no mechanism is provided for A228S or F366L, which have stabilities similar to WT, yet have impaired HIF1a activation. 

      We thank the reviewer for raising these and other limitations.  We will expand on the shortcomings of the present study but would like to underscore that the current work using the recently described NMR assay along with other biophysical analyses suggests a previously under-appreciated role of NODD hydroxylation in the normal oxygen-sensing pathway.  

      Reviewer #2 (Public review): 

      Summary: 

      Mutations in the prolyl hydroxylase, PHD2, cause erythrocytosis and, in some cases, can result in tumorigenesis. Taber and colleagues test the structural and functional consequences of seven patient-derived missense mutations in PHD2 using cell-based reporter and stability assays, and multiple biophysical assays, and find that most mutations are destabilizing. Interestingly, they discover a PHD2 mutant that can hydroxylate the C-terminal ODD, but not the N-terminal ODD, which suggests the importance of N-terminal ODD for biology. A major strength of the manuscript is the multidisciplinary approach used by the authors to characterize the functional and structural consequences of the mutations. However, the manuscript had several major weaknesses, such as an incomplete description of how the NMR was performed, a justification for using neighboring residues as a surrogate for looking at prolyl hydroxylation directly, or a reference to the clinical case studies describing the phenotypes of patient mutations. Additionally, the experimental descriptions for several experiments are missing descriptions of controls or validation, which limits their strength in supporting the claims of the authors. 

      Strengths: 

      (1) This manuscript is well-written and clear. 

      (2) The authors use multiple assays to look at the effects of several disease-associated mutations, which support the claims. 

      (3) The identification of P317R as a mutant that loses activity specifically against NODD, which could be a useful tool for further studies in cells. 

      Weaknesses: 

      Major: 

      (1) The source data for the patient mutations (Figure 1) in PHD2 is not referenced, and it's not clear where this data came from or if it's publicly available. There is no section describing this in the methods.

      Clinical and patient information on disease-causing PHD2 mutants was compiled from various case reports and summarized in an Excel sheet found in the Supplementary Information.  The case reports are cited in this Excel file.  A reference to the supplementary data will be added to the Figure 1 legend and to the introduction.

      (2) The NMR hydroxylation assay. 

      A. The description of these experiments is really confusing. The authors have published a recent paper describing a method using 13C-NMR to directly detect prolyl-hydroxylation over time, and they refer to this manuscript multiple times as the method used for the studies under review. However, it appears the current study is using 15N-HSQC-based experiments to track the CSP of neighboring residues to the target prolines, so not the target prolines themselves. The authors should make this clear in the text, especially on page 9, 5th line, where they describe proline cross-peaks and refer to the 15N-HSQC data in Figure 5B. 

      As the reviewer mentioned, the assay that we developed directly measures the target proline residues.  This assay is ideal when mutations near the prolines are studied, such as A403, Y565 (He et al[3]).  In this previous work, we observed that the shifting of the target proline cross-peaks due to change in electronegativity on the pyrrolidine ring of proline in turn impacted the neighboring residues[3], which meant that the neighboring residues can be used as reporter residues for certain purposes.  In this study, we focused on investigating the mutations on PHD2 while leaving the sequence of the HIF-1α unchanged by using solely 15N-HSQC-based experiments without the need for double-labeled samples.  Nonetheless, we thank the reviewer for pointing out the confusion in the text and we will correct and clarify our description of this assay.

      B. The authors are using neighboring residues as reporters for proline hydroxylation, without validating this approach. How well do CSPs of A403 and I566 track with proline hydroxylation? Have the authors confirmed this using their 13C-NMR data or mass spec? 

      For previous studies, we performed intercalated 15N-HSQC and 13C-CON experiments for the kinetic measurements of wild-type HIF-1α and mutants.  We observed that the shifting pattern of A403 and I566 in the 15N-HSQC spectra aligned well with the ones of P402 and P564, respectively, in the 13C-CON spectra.  Representative data will be added to Supplemental Data.

      C. Peak intensities. In some cases, the peak intensities of the end point residue look weaker than the peak intensities of the starting residue (5B, PHD2 WT I566, 6 ct lines vs. 4 ct lines). Is this because of sample dilution (i.e., should happen globally)? Can the authors comment on this? 

      This is an astute observation by the reviewer.  We checked and confirmed that, for all kinetic datasets, the peak intensities of the end point residue are always slightly lower than those of the starting residue.  This includes PHD2 A228S and P317R in 5B, although the difference is less obvious there than for PHD2 WT.  We agree with the reviewer that sample dilution is a factor, as a total volume of 16 microliters of reaction components was added to the solution to trigger the reaction after the first spectrum was acquired.  It is also likely that the rate of prolyl hydroxylation becomes extremely slow when only a low amount of substrate remains in the system.  The reaction would therefore not reach 100% completion, and this was detected by the sensitive NMR experiment.

      (3) Data validating the CRISPR KO HEK293A cells is missing. 

      We thank the reviewer for noting this oversight.  Western blots validating PHD2 KO in HEK293A cells will be added to the Supplementary Data file.

      (4) The interpretation of the SEC data for the PHD2 mutants is a little problematic. Subtle alterations in the elution profiles may hint at different hydrodynamic radii, but as the samples were not loaded at equal concentrations or volumes, these data seem more anecdotal, rather than definitive. Repeating this multiple times, using matched samples, followed by comparison with standards loaded under identical buffer conditions, would significantly strengthen the conclusions one could make from the data. 

      Agreed.  We will perform additional experiments as suggested with equal volume and concentration of each PHD2 construct loaded onto the SEC column for better assessment of aggregation.

      Minor: 

      (1) Justification for picking the seven residues is not clearly articulated. The authors say they picked 7 mutants with "distinct residue changes", but no further rationale is provided. 

      Additional justification for the selection of the mutants will be added to the ‘Mutations across the PHD2 enzyme induce erythrocytosis’ section.  Briefly, some mutants were chosen based on their frequency in the clinical data and their presence in potential mutational hot spots.  Various mutations were noted at W334 and R371, while F366L was identified in multiple individuals.  Additionally, 9 cases of PHD2-driven disease were reported to be caused by mutations located between residues 200 and 210, while 13 cases were reported between residues 369 and 379, so G206C and R371H were chosen to represent potential hot spots.  To examine a potential genotype-phenotype relationship, two of the mutants responsible for neuroendocrine tumor development, A228S and H374R, were also selected.  Finally, mutations located close to or on catalytic core residues (P317R, R371H, and H374R) were chosen to test for suspected defects.

      (2) A major finding of the paper is that a disease-associated mutation, P317R, can differentially affect HIF1 prolyl-hydroxylation; however, additional follow-up studies have not been performed to test this in cells or to validate the mutant in another method. Is it the position of the proline within the catalytic core, or the identity of the mutation that accounts for the selectivity? 

      This is the very question that we are currently addressing but as a part of a follow-up study.  Indeed, one thought is that the preferential defect observed could be the result of the loss of proline, an exceptionally rigid amino acid that makes contact with the backbone twice, or the addition of a specific amino acid, namely arginine, a flexible amino acid with an added charge at this site.  Although beyond the scope of this manuscript, we will investigate whether such and other characteristics in this region of PHD2/HIF1α interface contribute to the differential hydroxylation. 

      Reviewer #3 (Public review): 

      Summary: 

      This is an interesting and clinically relevant in vitro study by Taber et al., exploring how mutations in PHD2 contribute to erythrocytosis and/or neuroendocrine tumors. PHD2 regulates HIFα degradation through prolyl-hydroxylation, a key step in the cellular oxygen-sensing pathway. 

      Using a time-resolved NMR-based assay, the authors systematically analyze seven patient-derived PHD2 mutants and demonstrate that all exhibit structural and/or catalytic defects. Strikingly, the P317R variant retains normal activity toward the C-terminal proline but fails to hydroxylate the N-terminal site. This provides the first direct evidence that N-terminal prolyl-hydroxylation is not dispensable, as previously thought. 

      The findings offer valuable mechanistic insight into PHD2-driven effects and refine our understanding of HIF regulation in hypoxia-related diseases. 

      Strengths: 

      The manuscript has several notable strengths. By applying a novel time-resolved NMR approach, the authors directly assess hydroxylation at both HIF1α ODD sites, offering a clear functional readout. This method allows them to identify the P317R variant as uniquely defective in NODD hydroxylation, despite retaining normal activity toward CODD, thereby challenging the long-held view that the N-terminal proline is biologically dispensable. The work significantly advances our understanding of PHD2 function and its role in oxygen sensing, and might help in the future interpretation and clinical management of associated erythrocytosis. 

      Weaknesses: 

      (1) There is a lack of in vivo/ex vivo validation. This is actually required to confirm whether the observed defects in hydroxylation-especially the selective NODD impairment in P317R-are sufficient to drive disease phenotypes such as erythrocytosis. 

      We thank the reviewer for this comment, and while we agree with this statement, the objective of this study per se was to elucidate the structural and/or functional defect caused by the various disease-associated mutations on PHD2. The subsequent study would be to validate whether the identified defects, in particular the selective NODD impairment, would lead to erythrocytosis in vivo.  However, we feel that such a study would be beyond the scope of this manuscript.

      (2) The reliance on HRE-luciferase reporter assays may not reliably reflect the PHD2 function and highlights a limitation in the assessment of downstream hypoxic signaling. 

      Agreed.  All experimental assays and systems have limitations. The HRE-luciferase assay used in the present manuscript also has limitations, such as the continuous expression of exogenous PHD2 mutants driven by a CMV promoter. Thus, we applied several additional biophysical methodologies to interrogate the disease-causing PHD2 mutants. The discussion of the limitations of the luciferase assay will be expanded in the revised manuscript. 

      (3) The study clearly documents the selective defect of the P317R mutant, but the structural basis for this selectivity is not addressed through high-resolution structural analysis (e.g., cryo-EM). 

      We thank the reviewer for the comment.  While solving the structure of PHD2 P317R in complex with HIFα substrate is beyond the scope of this study, a structure of PHD2 P317R in complex with a clinically used inhibitor has been solved (PDB:5LAT).  In analyzing this structure and that of PHD2 WT in complex with NODD, Chowdhury et al[2] stated that P317 makes hydrophobic contacts with the LXXLAP motif on HIFα and that R317 is predicted to interact differently with this motif. While this analysis does not directly elucidate the reason for the preferential NODD defect, it supports the possibility that the P317R substitution may be more detrimental for enzymatic activity on NODD than on CODD. We will discuss this notion in the revised manuscript. 

      (4) Given the proposed central role of HIF2α in erythrocytosis, direct assessment of HIF2α hydroxylation by the mutants would have strengthened the conclusions. 

      We thank the reviewer for this comment, but we feel that such a study would be beyond the scope of the present study. We observed that the PHD2 binding patterns to HIF1α and HIF2α were similar, and we have previously assigned >95% of the amino acids in HIF1α ODD for NMR study[3]. Thus, we first focused on the elucidation of possible defects in disease-associated PHD2 mutants using HIF1α as the substrate, with the supposition that an identified deregulation on HIF1α could be extended to the HIF2α paralog. 

      However, we agree with the reviewer that future studies should examine the impact of PHD2 mutants directly on HIF2α.  

      References:

      (1) Flashman, E. et al. Kinetic rationale for selectivity toward N- and C-terminal oxygen-dependent degradation domain substrates mediated by a loop region of hypoxia-inducible factor prolyl hydroxylases. J Biol Chem 283, 3808-3815 (2008).

      (2) Chowdhury, R. et al. Structural basis for oxygen degradation domain selectivity of the HIF prolyl hydroxylases. Nat Commun 7, 12673 (2016).

      (3) He, W., Gasmi-Seabrook, G.M.C., Ikura, M., Lee, J.E. & Ohh, M. Time-resolved NMR detection of prolyl-hydroxylation in intrinsically disordered region of HIF-1alpha. Proc Natl Acad Sci U S A 121, e2408104121 (2024).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors have investigated the role of FMRP in the formation and function of RNA granules in mouse brain/cultured hippocampal neurons. Most of their results indicate that FMRP does not have a role in the formation or function of RNA granules with specific mRNAs, but may have some role in distal RNA granules in neurons and their response to synaptic stimulation. This is an important work (though the results are mostly negative) in understanding the composition and function of neuronal RNA granules. The last part of the work in cultured neurons is disjointed from the rest of the manuscript, and the results are neither convincing nor provide any mechanistic insight.

      Strengths:

      (1) The study is quite thorough, the methods and analysis used are robust, and the conclusion and interpretation are diligent.

      (2) The comparative study of Rat and Mouse RNA granules is very helpful for future studies.

      (3) The conclusion that the absence of FMRP does not affect the RNA granule composition and many of its properties in the system the authors have chosen to study is well supported by the results.

      (4) The difference in the response to DHPG stimulation concerning RNA granules described here is very interesting and could provide a basis for further studies, though it has some serious technical issues.

      Weaknesses:

      (1) The system used for the study (P5 mouse brain or DIV 8-10 cultured neuron) is surprising, as the majority of defects in the absence of FMRP are reported in later stages (P30+ brain and DIV 14+ neurons). It is important to test if the conclusions drawn here hold good at different developmental stages.

      (2) The term 'distal granules' is very vague. Since there is no structural or biochemical characterization of these granules, it is difficult to understand how they are different from the proximal granules and why FMRP has an effect only on these granules.

      (3) Since the manuscript does not find any effect of FMRP on neuronal RNA granules, it does not provide any new molecular insight with respect to the function of FMRP

      Thank you for your comments and for pointing out the strengths of the manuscript. Unfortunately, we will not be able to respond to point #1. The protocol for purification of the ribosomes from RNA granules does not work in older brains (see Khandjian et al, 2004 PNAS 101:13357), presumably due to the presence of large concentrations of myelin. While it would be possible to repeat our results later in culture, we have no expectation that they would be different, since we do observe DHPG induction of elongation-dependent, initiation-independent mGLUR-LTD in later cultures (Graber et al, 2017 J. Neuroscience 37:9116). We will strengthen the caveat in the discussion that our results are only a snapshot of development and that it is certainly possible that different results may be seen at different times. We agree with point #2 that ‘distal granules’ is a vague term. We will remove the term and clarify that we only quantified granules larger than 50 microns from the cell soma. We do not know if these granules are distinct. We would respectfully disagree with point #3 that the study does not provide molecular insight into the function of FMRP: disproving that FMRP is important for stalling and for determining the position of stalling removes a major hypothesis about the function of FMRP, and showing that something is not true is, at least to me, providing insight.

      Reviewer #2 (Public review):

      In the present manuscript, Li et al. use biochemical fractionation of "RNA granules" from P5 wildtype and FMR1 knock-out mouse brains to analyze their protein/RNA content, determine a single particle cryo-EM structure of contained ribosomes, and perform ribo-seq analysis of ribosome-protected RNA fragments (RPFs). The authors conclude from these that neither the composition of the ribosome granules, nor the state of their contained ribosomes, nor the mRNA positions with high ribosome occupancy change significantly. Besides minor changes in mRNA occupancy, the one change the authors identified is a decrease in puromycylated punctae in distal neurites of cultured primary neurons of the same mice, and their enhanced resistance to different pharmacological treatments. These results directly build on their earlier work (Anadolu et al., 2023) using analogous preparations of rat brains; the authors now perform a very similar study using WT and FMR1-KO mouse brains. This is an important topic, aiming to identify the molecular underpinnings of the FMRP protein, which is the basis of a major neurological disease. Unfortunately, several limitations of this study prevent it from being more convincing in its present form.

      In order to improve this study, our main suggestions are as follows:

      (1) The authors equate their biochemically purified "RG" fraction with their imaging-based detection of puromycin-positive punctae. They claim essentially no differences in RGs, but detect differences in the latter (mostly their abundance and sensitivity to DHPG/HHT/Aniso). In the discussion the authors acknowledge the inconsistency between these two modalities: "An inconsistency in our findings is the loss of distal RPM puncta coupled with an increase in the immunoreactivity for S6 in the RG." and "Thus, it may be that the RG is not simply made up of ribosomes from the large liquid-liquid phase RNA granules."

      How can the authors be sure that they are analysing the same entities in both modalities? A more parsimonious explanation of their results would be that, while there might be some overlap, two different entities are analyzed. Much of the main message rests on this equivalence, and I believe the authors should show its validity.

      (2) The authors show that increased nuclease digestion (and magnesium concentration) led to a reduction of their RPF sizes down to levels also seen by other researchers. Analyzing these now properly digested RPFs, the authors state that the CDS coverage and periodicity drastically improved, and that spurious enrichments of secretory mRNAs, which made up one of the major fractions in their previous work, are now reduced. In my opinion, this would be more appropriately communicated as a correction to their previous work, not as a main Figure in another manuscript.

      (3) The fold changes reported in Figure 7 (ranging between log2(-0.2) and log2(+0.25)) are all extremely small and in my opinion should not be used to derive claims such as "The loss of FMRP significantly affected the abundance and occupancy of FMRP-Clipped mRNAs in WT and FMR1-KO RG (Fig 7A, 7B), but not their enrichment between RG and RCs".

      (4) Figure 8 / S8-1 - The authors show that ~2/3 of their reads stem from PCR duplicates, but that even after removing those, the majority of peaks remain unaltered. At the same time, Figure S8-1 shows the total number of peaks to be 615 compared with 1392 before duplicate removal. Can the authors comment on this discrepancy? In addition, the dataset with properly removed artefacts should be used for their main display item instead of the current Figure 8.

      (5) Figure 9 / S9-1, the density of punctae in both WT and FMR1-KO actually increases after treatment of HHT or Anisomycin (Figure S9-1 B-C). Even if a large fraction would now be "resistant to run-off", there should not be an increase. While this effect is deemed not significant, a much smaller effect in Figure 9C is deemed significant. Can the authors explain this? Given how vastly different the sample sizes are (ranging from 23 neurites in Figures S9-1 to 5,171 neurites in Figure 9), the authors should (randomly) sample to the same size and repeat their statistical analysis again, to improve their credibility.

      Thank you for your comments. We agree with point #1 that the equivalence of RPM puncta with the RG fraction is an issue. While we believe that we show in a number of ways that the two are related (anisomycin-resistant puromycylation, puromycylation only at high concentrations consistent with the hybrid state, etc.), we would respectfully disagree that our main message rests on the equivalence of the RPM-labeled RNA granules in neurites and the ribosomes isolated by sedimentation. We will make this point clearer in our revision. For point #2, we agree that the changes with increased nuclease are somewhat out of place in a narrative sense, but they are clearly relevant to this work. Whether one sees this as a ‘correction’ or an interesting point will depend on a better characterization of the structures of the stalled polysomes. My personal view is that the nuclease resistance of cleavage near the RNA entrance site is quite interesting. Since we reproduce our results with a similar nuclease treatment in mice, as reported in our previous publication, I believe the comparison could be of interest in the future and would like to retain it. We agree with point #3 and will temper these claims in our revised version. For point #4, we will determine more carefully why the number of peaks differs and switch the main and supplemental figures. For point #5, we apologize for the typo in the Figure 9 legend: the sample size is 171, not 5171. The box plot line shows the median, not the average, and the data are clearly skewed such that the median and average differ (i.e. there is a two-fold decrease in the average density of distal puncta between WT and FMRP, but the average density is actually slightly decreased with HHT and A, although the median increases slightly). We will now report the results in distinct modalities to clarify this, and we will reexamine the statistics to better address the skewed distribution of values in the revised version.
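
The distinction drawn above between the median and the average of a skewed distribution, and the reviewer's request to subsample groups to a common size, can be illustrated with a minimal sketch. All values and names below are hypothetical and are not data from the manuscript; the sketch only assumes that each neurite contributes one density value.

```python
import random
import statistics


def summarize(values):
    """Return the mean and median of a list of per-neurite densities."""
    return {"mean": statistics.mean(values), "median": statistics.median(values)}


def subsample_to_equal_size(groups, seed=0):
    """Randomly draw, without replacement, the same number of values from each group.

    The common size is the size of the smallest group.
    """
    rng = random.Random(seed)
    n = min(len(values) for values in groups.values())
    return {name: rng.sample(values, n) for name, values in groups.items()}


# Hypothetical, right-skewed densities: one extreme neurite dominates the control mean.
control = [0, 0, 1, 1, 2, 2, 3, 30]
treated = [1, 1, 2, 2, 3, 3, 4, 7]

# The median rises with treatment while the mean falls.
print(summarize(control))
print(summarize(treated))
```

With skewed data of this kind, a box plot (median) and a bar of averages can point in opposite directions, which is why reporting both, and testing on equally sized random subsamples, may help.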

      Summary:

      Li et al describe a set of experiments to probe the role of FMRP in ribosome stalling and RNA granule composition. The authors are able to recapitulate findings from a previous study performed in rats (this one is in mice).

      Strengths:

      (1) The work addresses an important and challenging issue, investigating mechanisms that regulate stalled ribosomes, focusing on the role of FMRP. This is a complicated problem, given the heterogeneity of the granules and the challenges related to their purification. This work is a solid attempt at addressing this issue, which is widely understudied.

      (2) The interpretation of the results could be interesting, if supported by solid data. The idea that FMRP could control the formation and release of RNA granules, rather than the elongation by stalled ribosomes is of high importance to the field, offering a fresh perspective into translational regulation by FMRP.

      (3) The authors focused on recapitulating previous findings, published elsewhere (Anadolu et al., 2023) by the same group, but using rat tissue, rather than mouse tissue. Overall, they succeeded in doing so, demonstrating, among other findings, that stalled ribosomes are enriched in consensus mRNA motifs that are linked to FMRP. These interesting findings reinforce the role of FMRP in formation and stabilization of RNA granules. It would be nice to see extensive characterization of the mouse granules as performed in Figure 1 of Anadolu and colleagues, 2023.

      (4) Some of the techniques incorporated aid in creating novel hypotheses, such as the ribopuromycylation assay and the cryo-EM of granule ribosomes.

      Weaknesses:

      (1) The RNA granule characterization needs to be more rigorous. Coomassie is not proper for this type of characterization, simply because protein weight says little about its nature. The enrichment of key proteins is not robust and seems to not reach significance in multiple instances, including S6 and UPF1. Furthermore, S6 is the only proxy used for ribosome quantification. Could the authors include at least 3 other ribosomal proteins (2 from small, 2 from large subunit)?

      (2) Page 12-13 - The Gene Ontology analysis is performed incorrectly. First, one should not rank genes by their RPKM levels. It is well known that housekeeping genes such as those related to actin dynamics, molecular transport and translation are highly enriched in sequencing datasets. It is usually more informative when significantly different genes are ranked by p adjust or log2 Fold Change, then compared against a background to verify enrichment of specific processes. However, the authors found no DEGs. I would suggest the removal of this analysis and the incorporation of a gene set enrichment analysis (ranked by p adjust). I further suggest that the authors incorporate a dimensionality reduction analysis to demonstrate that the lack of significance stems from biology and not experimental artifacts, such as poor reproducibility across biological replicates.

      Thank you for your comments on the strengths of the manuscript. We agree with point #1 that the mouse RNA granule characterization needs to be more rigorous and we plan to accomplish this in our revised version. Similarly, we will incorporate the additional statistical analysis suggested by the reviewer in a revised version.
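
As a rough illustration of the ranking the reviewer describes (ordering genes by adjusted p-value rather than by RPKM), the sketch below applies the Benjamini-Hochberg adjustment and sorts the result. It is not the authors' pipeline; the gene names and p-values are hypothetical placeholders, and a real gene set enrichment analysis would take such a ranked list as its input.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_genes(p_by_gene):
    """Return (gene, adjusted p) pairs sorted by adjusted p, then by gene name."""
    genes = list(p_by_gene)
    padj = benjamini_hochberg([p_by_gene[g] for g in genes])
    return sorted(zip(genes, padj), key=lambda pair: (pair[1], pair[0]))


# Hypothetical raw p-values for four placeholder genes.
example = {"geneA": 0.01, "geneB": 0.04, "geneC": 0.03, "geneD": 0.005}
for gene, value in rank_genes(example):
    print(gene, round(value, 4))
```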

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors report a study on how stimulation of receptive-field surround of V1 and LGN neurons affects their firing rates. Specifically, they examine stimuli in which a grey patch covers the classical RF of the cell and a stimulus appears in the surround. Using a number of different stimulus paradigms they find a long latency response in V1 (but not the LGN) which does not depend strongly on the characteristics of the surround grating (drifting vs static, continuous vs discontinuous, predictable grating vs unpredictable pink noise). They find that population responses to simple achromatic stimuli have a different structure that does not distinguish so clearly between the grey patch and other conditions and the latency of the response was similar regardless of whether the center or surround was stimulated by the achromatic surface. Taken together they propose that the surround-response is related to the representation of the grey surface itself. They relate their findings to previous studies that have put forward the concept of an ‘inverse RF’ based on strong responses to small grey patches on a full-screen grating. They also discuss their results in the context of studies that suggest that surround responses are related to predictions of the RF content or figure-ground segregation.

      Strengths:

      I find the study to be an interesting extension of the work on surround stimulation and the addition of the LGN data is useful showing that the surround-induced responses are not present in the feedforward path. The conclusions appear solid, being based on large numbers of neurons obtained through Neuropixels recordings. The use of many different stimulus combinations provides a rich view of the nature of the surround-induced responses.

      Weaknesses:

      The statistics are pooled across animals, which is less appropriate for hierarchical data. There is no histological confirmation of placement of the electrode in the LGN and there is no analysis of eye or face movements which may have contributed to the surround-induced responses. There are also some missing statistics and methods details which make interpretation more difficult.

      We thank the reviewer for their positive and constructive comments, and have addressed these specific issues in response to the minor comments. For the statistics across animals, we refer to “Reviewer 1 recommendations”, point 1. For the histological analysis, we refer to “Reviewer 1 recommendations”, point 2. For the eye and facial movements, we refer to “Reviewer 1 recommendations”, point 5. Concerning missing statistics and methods details, we refer to various responses to “Reviewer 1 recommendations”. We thoroughly reviewed the manuscript and included all missing statistical and methodological details.
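
The reviewer's concern about pooling statistics across animals can be made concrete with a small sketch. It contrasts a pooled mean over all neurons with a mean of per-animal means; the animal labels and response values are hypothetical, and this is only one simple way to respect the hierarchy, not necessarily the approach used in the revised manuscript.

```python
import statistics
from collections import defaultdict


def pooled_mean(records):
    """Mean over all neurons, ignoring which animal each neuron came from."""
    return statistics.mean(value for _, value in records)


def per_animal_means(records):
    """Mean response for each animal, computed from (animal, value) pairs."""
    by_animal = defaultdict(list)
    for animal, value in records:
        by_animal[animal].append(value)
    return {animal: statistics.mean(values) for animal, values in by_animal.items()}


def mean_of_animal_means(records):
    """Treat each animal as one observation, regardless of its neuron count."""
    return statistics.mean(per_animal_means(records).values())


# Hypothetical data: one animal contributes four neurons, another only one.
records = [("m1", 10), ("m1", 10), ("m1", 10), ("m1", 10), ("m2", 0)]
print(pooled_mean(records))           # dominated by the animal with more neurons
print(mean_of_animal_means(records))  # each animal weighted equally
```

When neuron counts differ strongly between animals, the two summaries can diverge, which is the situation in which hierarchical statistics matter most.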

      Reviewer #2 (Public review):

      Cuevas et al. investigate the stimulus selectivity of surround-induced responses in the mouse primary visual cortex (V1). While classical experiments in non-human primates and cats have generally demonstrated that stimuli in the surround receptive field (RF) of V1 neurons only modulate activity to stimuli presented in the center RF, without eliciting responses when presented in isolation, recent studies in mouse V1 have indicated the presence of purely surround-induced responses. These have been linked to prediction error signals. In this study, the authors build on these previous findings by systematically examining the stimulus selectivity of surround-induced responses.

      Using neuropixels recordings in V1 and the dorsal lateral geniculate nucleus (dLGN) of head-fixed, awake mice, the authors presented various stimulus types (gratings, noise, surfaces) to the center and surround, as well as to the surround only, while also varying the size of the stimuli. Their results confirm the existence of surround-induced responses in mouse V1 neurons, demonstrating that these responses do not require spatial or temporal coherence across the surround, as would be expected if they were linked to prediction error signals. Instead, they suggest that surround-induced responses primarily reflect the representation of the achromatic surface itself.

      The literature on center-surround effects in V1 is extensive and sometimes confusing, likely due to the use of different species, stimulus configurations, contrast levels, and stimulus sizes across different studies. It is plausible that surround modulation serves multiple functions depending on these parameters. Within this context, the study by Cuevas et al. makes a significant contribution by exploring the relationship between surround-induced responses in mouse V1 and stimulus statistics. The research is meticulously conducted and incorporates a wide range of experimental stimulus conditions, providing valuable new insights regarding center-surround interactions.

      However, the current manuscript presents challenges in readability for both non-experts and experts. Some conclusions are difficult to follow or not clearly justified.

      I recommend the following improvements to enhance clarity and comprehension:

      (1) Clearly state the hypotheses being tested at the beginning of the manuscript.

      (2) Always specify the species used in referenced studies to avoid confusion (esp. Introduction and Discussion).

      (3) Briefly summarize the main findings at the beginning of each section to provide context.

      (4) Clearly define important terms such as “surface stimulus” and “early vs. late stimulus period” to ensure understanding.

      (5) Provide a rationale for each result section, explaining the significance of the findings.

      (6) Offer a detailed explanation of why the results do not support the prediction error signal hypothesis but instead suggest an encoding of the achromatic surface.

      These adjustments will help make the manuscript more accessible and its conclusions more compelling.

      We thank the reviewer for their constructive feedback and for highlighting the need for improved clarity regarding the hypotheses and their relation to the experimental findings.

      • We have strongly improved the Introduction and Discussion section, explaining the different hypotheses and their relation to the performed experiments.

      • In the Introduction, we have clearly outlined each hypothesis and its predictions, providing a structured framework for understanding the rationale behind our experimental design.

      • In the Discussion, we have been more explicit in explaining how the experimental findings inform these hypotheses.

      • We explicitly mentioned the species used in the referenced studies.

      • We provided a clearer rationale for each experiment in the Results section.

      We have also consistently stated the species used in previous studies, in both the Introduction and Discussion sections.

      Reviewer #3 (Public review):

      Summary:

      This paper explores the phenomenon whereby some V1 neurons can respond to stimuli presented far outside their receptive field. It introduces three possible explanations for this phenomenon and it presents experiments that it argues favor the third explanation, based on figure/ground segregation.

      Strengths:

      I found it useful to see that there are three possible interpretations of this finding (prediction error, interpolation, and figure/ground). I also found it useful to see a comparison with LGN responses and to see that the effect there is not only absent but actually the opposite: stimuli presented far outside the receptive field suppress rather than drive the neurons. Other experiments presented here may also be of interest to the field.

      Weaknesses:

      The paper is not particularly clear. I came out of it rather confused as to which hypotheses were still standing and which hypotheses were ruled out. There are numerous ways to make it clearer.

      We thank the reviewer for their constructive feedback and for highlighting the need for improved clarity regarding the hypotheses and their relation to the experimental findings.

      • We have strongly improved the Introduction and Discussion sections, explaining the different hypotheses and their relation to the performed experiments.

      • In the Introduction, we have clearly outlined each hypothesis and its predictions, providing a structured framework for understanding the rationale behind our experimental design.

      • In the Discussion, we have been more explicit in explaining how the experimental findings inform these hypotheses.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      (1) Given the data is hierarchical with neurons clustered within 6 mice (how many recording sessions per animal?) I would recommend the use of Linear Mixed Effects models. Simply pooling all neurons increases the risk of false alarms.

      To clarify: We used the standard method for analyzing single-unit recordings, by comparing the responses of a population of single neurons between two different conditions. This means that the responses of each single neuron were measured in the different conditions, and the statistics were therefore based on the pairwise differences computed for each neuron separately. This is a common and standard procedure in systems neuroscience, and was also used in the previous studies on this topic (Keller et al., 2020; Kirchberger et al., 2023). We were not concerned with comparing two groups of animals, for which hierarchical analyses are recommended. To address the reviewer’s concern, we did examine whether differences between baseline and the gray/drift condition, as well as the gray/drift compared to the grating condition, were consistent across sessions, which was indeed the case. These findings are presented in Supplementary Figure 6.
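
      The paired, per-neuron comparison described above can be sketched as follows. This is a minimal illustration on synthetic firing rates, assuming a Wilcoxon signed-rank test (the test named below for Figure 1h). The array names, neuron count, and rates are hypothetical and are not the authors' data or code.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Hypothetical data: one mean firing rate per neuron and condition (Hz).
n_neurons = 50
baseline = rng.gamma(shape=2.0, scale=2.0, size=n_neurons)
# Simulate a consistent within-neuron increase for the gray/drift condition.
gray_drift = baseline + rng.normal(loc=1.0, scale=0.5, size=n_neurons)

# The test operates on the pairwise difference computed for each neuron,
# so every neuron serves as its own control.
diff = gray_drift - baseline
stat, p_value = wilcoxon(diff)

print(f"median difference = {np.median(diff):.2f} Hz, p = {p_value:.2g}")
```

      Note that this sketch pools neurons across animals and sessions, which is the point the reviewer raises; a mixed-effects model would add animal or session as a grouping factor.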

      (2) Line 432: “The study utilized three to eight-month-old mice of both genders”. This is confusing, I assume they mean six mice in total, please restate. What about the LGN recordings, were these done in the same mice? Can the authors please clarify how many animals, how many total units, how many included units, how many recording sessions per animal, and whether the same units were recorded in all experiments?

      We have now clarified the information regarding the animals used in the Methods section.

      • We state that “We included female and male mice (C57BL/6), a total of six animals for V1 recordings between three and eight months old. In two of those animals, we recorded simultaneously from LGN and V1.”

      • We state that “For each animal, we recorded around 2-3 sessions from each hemisphere, and we recorded from both hemispheres.”

      • We noted that the number of neurons was not mentioned in each figure caption. We apologize for this omission. We have now added the number for all of the figures and protocols to the revised manuscript. We note that the same neurons were recorded for the different conditions within each protocol; however, because a few sessions were short, we recorded more units for the grating protocol. Note that we did not make statistical comparisons between protocols.

      (3) I see no histology for confirmation of placement of the electrode in the LGN, how can they be sure they were recording from the LGN? There is also little description of the LGN experiments in the methods.

      For better clarity, we have included a reconstruction of the electrode track from histological sections of one animal post-experiment (Figure S4). The LGN was targeted via stereotactical surgery, and the visual responses in this area are highly distinct. In addition, we used a flash protocol to identify the early-latency responses typical for the LGN, which is described in the Methods section: “A flash stimulus was employed to confirm the locations of LGN at the beginning of the recording sessions, similar to our previous work in which we recorded from LGN and V1 simultaneously (Schneider et al., 2023). This stimulus consisted of a 100 ms white screen and a 2 s gray screen as the inter-stimulus interval, designed to identify visually responsive areas. The responses of multi-unit activity (MUA) to the flash stimulus were extracted and a CSD analysis was then performed on the MUA, sampling every two channels. The resulting CSD profiles were plotted to identify channels corresponding to the LGN. During LGN recordings, simultaneous recordings were made from V1, revealing visually responsive areas interspersed with non-responsive channels.”
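
      The CSD step in the quoted Methods text can be illustrated with a short sketch. It assumes the common estimate of CSD as the negative second spatial difference across channels, computed on every second channel. The channel spacing, the sign convention, and the synthetic depth profile are assumptions for illustration only, not details taken from the manuscript.

```python
import numpy as np

def csd_estimate(signal, step=2, spacing_um=20.0):
    """Negative second spatial difference across every `step`-th channel.

    signal: array of shape (n_channels, n_samples), ordered by depth.
    spacing_um: assumed distance between neighbouring channels (hypothetical).
    """
    sub = signal[::step]          # sample every `step`-th channel
    h = step * spacing_um         # spacing between the sampled channels
    return -(sub[2:] - 2.0 * sub[1:-1] + sub[:-2]) / h**2

# Synthetic MUA response: a bump centred on channel 16 with a smooth time course.
n_channels, n_samples = 32, 50
depth_profile = np.exp(-0.5 * ((np.arange(n_channels) - 16) / 3.0) ** 2)
time_course = np.hanning(n_samples)
mua = depth_profile[:, None] * time_course[None, :]

csd = csd_estimate(mua)
# Row i of the output corresponds to sampled channel i + 1, i.e. channel (i + 1) * step.
peak_row = int(np.argmax(csd[:, n_samples // 2]))
peak_channel = (peak_row + 1) * 2
print(csd.shape, peak_channel)
```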

      (4) Many statements are not backed up by statistics, for example, each time the authors report that the response at 90° is higher than baseline (Line 121 amongst other places) there is no test to support this. Also Line 140 (negative correlation), Line 145, Line 180.

      For comparison purposes, we only presented statistical analyses across conditions. However, we have now added information to the figure captions stating that all conditions show values higher than the baseline.

      (5) As far as I can see there is no analysis of eye movements or facial movements. This could be an issue, for example, if the onset of the far surround stimuli induces movements this may lead to spurious activations in V1 that would be interpreted as surround-induced responses.

      To address this point, we have included a supplementary figure analyzing facial movements across different sessions and comparing them between conditions (Supplementary Figure 5). A detailed explanation of this analysis has been added to the Methods section. Overall, we observed no significant differences in face movements between trials with gratings, trials with the gray patch, and trials with the gray screen presented during baseline. Animals exhibited similar face movements across all three conditions, supporting the conclusion that the observed neural firing rate increases for the gray-patch condition are not related to face movements.

      (6) The experiments with the rectangular patch (Figure 3) seem to give a slightly different result as the responses for large sizes (75, 90) don’t appear to be above baseline. This condition is also perceptually the least consistent with a grey surface in the RF, the grey patch doesn’t appear to occlude the surface in this condition. I think this is largely consistent with their conclusions and it could merit some discussion in the results/discussion section.

      While the effect is maybe a bit weaker, the total surround stimulated also covers a smaller area because of the large rectangular gray patch. Furthermore, the early responses are clearly elevated above baseline, and the responses up to 70 degrees are still higher than baseline. Hence we think this data point for 90 degrees does not warrant a strong interpretation.

      Minor points:

      (1) Figure 1h: What is the statistical test reported in the panel (I guess a signed rank based on later figures)? Figure 4d doesn’t appear to be significantly different but is reported as so. Perhaps the median can be indicated on the distribution?

      We explained that we used a signed rank test for Figure 1h and have now included the median of the distributions in Figure 4d.

      (2) What was the reason for having the gratings only extend to half the x-axis of the screen, rather than being full-screen? This creates a percept (in humans at least) that is more consistent with the grey patch being a hole in the grating as the grey patch has the same luminance as the background outside the grating.

      We explained in the Methods section that “We presented only half of the x-axis due to the large size of our monitor, in order to avoid over-stimulation of the animals with very large grating stimuli.”. Perceptually speaking, the gray patch appears as something occluding the grating, not as a “hole”.

      (3) Line 103: “and, importantly, had less than 10° (absolute) distance to the grating stimulus’ RF center.” Re-phrase, a stimulus doesn’t have an RF center.

      We corrected this to “We included only single units into the analysis that met several criteria in terms of visual responses (see Methods) and, importantly, the RF center had less than 10° (absolute) distance to the grating stimulus’ center.”

      (4) Line 143: “We recorded single neurons LGN” - should be “single LGN neurons”.

      We corrected this to “we recorded single LGN neurons”.

      (5) Line 200: They could spell out here that the latency is consistent with the latency observed for the grey patch conditions in the previous experiments.

      (6) Line 465: This is very brief. What criteria did they use for single-unit assignation? Were all units well-isolated or were multi-units included?

      We clarified in the Methods section that “We isolated single units with Kilosort 2.5 (Steinmetz et al., 2021) and manually curated them with Phy2 (Rossant et al., 2021). We included only single units with a maximum contamination of 10 percent.”

      (7) Line 469: “The experiment was run on a Windows 10”. Typo.

      We corrected this to “The experiment was run on Windows 10”.

      (8) Line 481: “We averaged the response over all trials and positions of the screen”. What do they mean by ’positions of the screen’?

      We changed this to “We computed the response for each position separately, by averaging the response across all the trials where a square was presented at a given position.”

      (9) Line 483: “We fitted an ellipse in the center of the response”. How?

      We additionally explain how we performed the detection of the RF using an ellipse fitting: “A heatmap of the response was computed. This heatmap was then smoothed, and we calculated the location of the peak response. From the heatmap we calculated the centroid of the response using the function regionprops.m that finds unique objects; we then selected the biggest area detected, using the centroids provided as output. We then fitted an ellipse centered on this peak response location to the smoothed heatmap using the MATLAB function ellipse.m.”
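
      The first steps of this RF-detection procedure (smoothing, peak location, largest connected region, centroid) can be sketched in Python. The authors used MATLAB (regionprops.m, ellipse.m); the SciPy calls below are a stand-in, and the smoothing width, the threshold, and the synthetic heatmap are hypothetical choices. The ellipse fit itself is omitted.

```python
import numpy as np
from scipy import ndimage

def rf_center(heatmap, sigma=1.0, rel_threshold=0.5):
    """Return (peak, centroid) of the largest supra-threshold region.

    sigma and rel_threshold are illustrative values, not the authors' settings.
    """
    smoothed = ndimage.gaussian_filter(heatmap, sigma=sigma)
    peak = np.unravel_index(np.argmax(smoothed), smoothed.shape)
    # Label connected regions above a fraction of the maximum response.
    mask = smoothed > rel_threshold * smoothed.max()
    labels, n_regions = ndimage.label(mask)
    sizes = np.bincount(labels.ravel())[1:]   # skip background label 0
    largest = int(np.argmax(sizes)) + 1
    centroid = ndimage.center_of_mass(smoothed, labels, largest)
    return peak, centroid

# Synthetic response map: a blob centred at row 12, column 7, plus weak noise.
yy, xx = np.mgrid[0:20, 0:20]
heatmap = np.exp(-((yy - 12) ** 2 + (xx - 7) ** 2) / (2 * 2.0 ** 2))
heatmap += np.random.default_rng(0).normal(scale=0.01, size=heatmap.shape)

peak, centroid = rf_center(heatmap)
print(peak, centroid)
```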

      (10) Line 485 “...and positioned the stimulus at the response peak previously found”. Unclear wording, do you mean the center of the ellipse fit to the MUA response averaged across channels or something else?

      (11) Line 487: “We performed a permutation test of the responses inside the RF detected vs a circle from the same area where the screen was gray for the same trials.”. The wording is a bit unclear here, can they clarify what they mean by the ’same trials’, what is being compared to what here?

      We used a permutation test to compare the neuron’s responses to black and white squares inside the RF to the condition where there was no square in the RF (i.e. the RF was covered by the gray background).
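
      A permutation test of this kind can be sketched as follows. This is a generic label-shuffling test on the difference in mean response, written under the assumption of two sets of trial-wise spike counts; the test statistic, the number of permutations, and the synthetic counts are illustrative and may differ from what the authors used.

```python
import numpy as np

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in means of a and b."""
    rng = np.random.default_rng(seed)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)          # shuffle condition labels
        diff = perm[:a.size].mean() - perm[a.size:].mean()
        count += abs(diff) >= abs(observed)
    # Add-one correction so the p-value is never exactly zero.
    return observed, (count + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
# Hypothetical spike counts: square inside the RF vs gray background in the RF.
square_in_rf = rng.poisson(8.0, size=40).astype(float)
gray_in_rf = rng.poisson(3.0, size=40).astype(float)

observed, p_value = permutation_test(square_in_rf, gray_in_rf)
print(f"difference = {observed:.2f} spikes, p = {p_value:.4f}")
```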

      (12) Was the pink noise background regenerated on each trial or was the same noise pattern shown on each trial?

      We explain that “We randomly presented one of two different pink noise images”

      (13) Line 552: “...used a time window of the Gaussian smoothing kernel from-.05 to .05”. Missing units.

      We explained that “we used a time window of the Gaussian smoothing kernel from -.05 s to .05 s, with a standard deviation of 0.0125 s.”
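
      For concreteness, a kernel with these parameters can be built and applied as in the sketch below. The window (-0.05 s to 0.05 s) and standard deviation (0.0125 s) are taken from the response above; the 1 ms bin width and the toy spike train are assumptions.

```python
import numpy as np

dt = 0.001                                   # assumed bin width: 1 ms
t = np.arange(-50, 51) * dt                  # kernel support: -0.05 s to 0.05 s
kernel = np.exp(-0.5 * (t / 0.0125) ** 2)    # Gaussian with SD = 0.0125 s
kernel /= kernel.sum()                       # normalise to unit area

# Toy spike train: 1 s of 1 ms bins with three spikes.
spikes = np.zeros(1000)
spikes[[200, 210, 600]] = 1

# Smoothed firing rate in spikes/s.
rate = np.convolve(spikes, kernel, mode="same") / dt
print(kernel.size, rate.max())
```

      The window spans four standard deviations on each side, so truncating the Gaussian there discards a negligible fraction of its mass.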

      (14) Line 565: “Additionally, for the occluded stimulus, we included patch sizes of 70° and larger.”. Not sure what they’re referring to here.

      We changed this to: “For the population analyses, we analyzed the conditions in which the gray patch sizes were 70 degrees and 90 degrees”.

      (15) Line 569: What is perplexity, and how does changing it affect the t-SNE embeddings?

      Note that t-SNE is only used for visualization purposes. In the revised manuscript, we have expanded our explanation regarding the use of t-SNE and the choice of perplexity values. Specifically, we have clarified that we used a perplexity value of 20 for the Gratings with circular and rectangular occluders and 100 for the black-and-white condition. These values were empirically selected to ensure that the groups in the data were clearly separable while maintaining the balance between local and global relationships in the projected space. This choice allowed us to visually distinguish the different groups while preserving the meaningful structure encoded in the dissimilarity matrices. In particular, varying the perplexity values would not alter the conclusions drawn from the visualization, as t-SNE does not affect the underlying analytical steps of our study.
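
      As an illustration of where perplexity enters, the sketch below embeds a precomputed Euclidean dissimilarity matrix with scikit-learn's TSNE. The perplexity of 20 follows the value stated above for the grating conditions; the synthetic firing rates, the group sizes, and the remaining settings are assumptions, and the authors' exact t-SNE implementation is not specified here.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical firing-rate vectors: two groups of trials, 30 neurons each.
n_per_group, n_neurons = 40, 30
rates = rng.normal(size=(2 * n_per_group, n_neurons))
rates[n_per_group:] += 1.0

# Euclidean dissimilarity between all pairs of trials.
dissimilarity = np.linalg.norm(rates[:, None, :] - rates[None, :, :], axis=-1)

# With metric="precomputed", scikit-learn requires a non-PCA initialisation.
tsne = TSNE(n_components=2, perplexity=20, metric="precomputed",
            init="random", random_state=0)
embedding = tsne.fit_transform(dissimilarity)
print(embedding.shape)
```

      Perplexity roughly sets the effective number of neighbours each point attends to, and it must be smaller than the number of trials.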

      (16) Line 572: “We trained a C-Support Vector Classifier based on dissimilarity matrices”. This is overly brief, please describe the construction of the dissimilarity matrices and how the training was implemented. Was this binary, multi-class? What conditions were compared exactly?

      In the revised manuscript, we have expanded our explanation regarding the construction of the dissimilarity matrices and the implementation of the C-Support Vector Classification (C-SVC) model (See Methods section).

      The dissimilarity matrices were calculated using the Euclidean distance between firing rate vectors for all pairs of trials (as shown in Figure 6a-b). These matrices were used directly as input for the classifier. It is important to note that t-SNE was not used for classification but only for visualization purposes. The classifier was binary, distinguishing between two classes (e.g., Dr vs St). We trained the model using 60% of the data for training and used 40% for testing. The C-SVC was implemented using sklearn, and the classification score corresponds to the average accuracy across 20 repetitions.
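
      The classification procedure described above (Euclidean dissimilarity matrix, binary C-SVC, 60/40 split, accuracy averaged over 20 repetitions) might look like the following sketch. How the matrix is passed to the classifier is an assumption: here each trial is represented by its distances to the training trials. The linear kernel, C = 1, and the synthetic data are also illustrative choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical firing-rate vectors for two classes (e.g. Dr vs St).
n_trials, n_neurons = 60, 30
labels = np.repeat([0, 1], n_trials)
rates = rng.normal(size=(2 * n_trials, n_neurons))
rates[labels == 1] += 0.8

# Euclidean dissimilarity between all pairs of trials.
dissim = np.linalg.norm(rates[:, None, :] - rates[None, :, :], axis=-1)

scores = []
for rep in range(20):
    train_idx, test_idx = train_test_split(
        np.arange(labels.size), train_size=0.6,
        stratify=labels, random_state=rep)
    clf = SVC(C=1.0, kernel="linear")
    # Features: distances from each trial to the training trials only,
    # so no information about test trials leaks into training.
    clf.fit(dissim[np.ix_(train_idx, train_idx)], labels[train_idx])
    scores.append(clf.score(dissim[np.ix_(test_idx, train_idx)],
                            labels[test_idx]))

mean_accuracy = float(np.mean(scores))
print(f"mean accuracy over 20 repetitions: {mean_accuracy:.2f}")
```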

      Reviewer #2 (Recommendations for the Authors):

      The relationship between the current paper and Keller et al. is challenging to understand. It seems like the study is critiquing the previous study but rather implicitly and not directly. I would suggest either directly stating the criticism or presenting the current study as a follow-up investigation that further explores the observed effect or provides an alternative function. Additionally, defining the inverse RF versus surround-induced responses earlier than in the discussion would be beneficial. Some suggestions:

      (1) The introduction is well-written, but it would be helpful to clearly define the hypotheses regarding the function of surround-induced responses and revisit these hypotheses one by one in the results section.

      Indeed, we have generally improved the Introduction of the manuscript, and stated the hypotheses and their relationships to the Experiments more clearly.

      (2) Explicitly mention how you compare classic grating stimuli of varying sizes with gray patch stimuli. Do the patch stimuli all come with a full-field grating? For the full-field grating, you have one size parameter, while for the patch stimuli, you have two (size of the patch and size of the grating).

      We now clearly describe how we compare grating stimuli of varying sizes with gray patch stimuli.

      (3) The third paragraph in the introduction reads more like a discussion and might be better placed there.

      We have moved content from the third paragraph of the Introduction to the Discussion, where it fits more naturally.

      (4) Include 1-2 sentences explaining how you center RFs and detail the resolution of your method.

      We have added an explanation to the Methods: “To center the visual stimuli during the recording session, we averaged the multiunit activity across the responsive channels and positioned the stimulus at the center of the ellipse fit to the MUA response averaged across channels.”.

      (5) Motivate the use of achromatic stimuli. This section is generally quite hard to understand, so try to simplify it.

      We explained better in the Introduction why we performed this particular experiment.

      (6) The decoding analysis is great, but it is somewhat difficult to understand the most important results. Consider summarizing the key findings at the beginning of this section.

      We now provide a clearer motivation at the start of the Decoding section.

      Reviewer #3 (Recommendations for the Authors):

      I have a few suggestions to improve the clarity of the presentation.

      Abstract: it lists a series of observations and it ends with a conclusion (“based on these findings...”). However, it provides little explanation for how this conclusion would arise from the observations. It would be more helpful to introduce the reasoning at the top and show what is consistent with it.

      We have improved the abstract of the paper incorporating this feedback.

      To some extent, this applies to Results too. Sometimes we are shown the results of some experiment just because others have done a similar experiment. Would it be better to tell us which hypotheses it tests and whether the results are consistent with all 3 hypotheses or might rule one or more out? I came out of the paper rather confused as to which hypotheses were still standing and which hypotheses were ruled out.

      We have strongly improved our explanation of the hypotheses and the relationships to the experiments in the Introduction.

      It would be best if the Results section focused on the results of the study, without much emphasis on what previous studies did or did not measure. Here, instead, in the middle of Results we are told multiple times what Keller et al. (2020) did or did not measure, and what they did or did not find. Please focus on the questions and on the results. Where they agree or disagree with previous papers, tell us briefly that this is the case.

      We have revised the Results section in the revised manuscript, and ensured that there is much less focus on what previous studies did in the Results. Differences to previous work are now discussed in the Discussion section.

      The notation is extremely awkward. For instance “Gc” stands for two words (Gray center) but “Gr” stands for a single word (Grating). The double meaning of G is one of many sources of confusion.

      This notation needs to be revised. Here is one way to make it simpler: choose one word for each type of stimulus (e.g. Gray, White, Black, Drift, Stat, Noise) and use it without abbreviations. To indicate the configuration, combine two of those words (e.g. Gray/Drift for Gray in the center and Drift in the surround).

      We have corrected the notation in the figures and text to enhance readability and improve the reader’s understanding.

      Figure 1e and many subsequent ones: it is not clear why the firing rate is shown in a logarithmic scale. Why not show it in a linear scale? Anyway, if the logarithmic scale is preferred for some reason, then please give us ticks at numbers that we can interpret, like 0.1,1,10,100... or 0.5,1,2,4... Also, please use the same y-scale across figures so we can compare.

      To clarify: to pool across neurons, firing rates must be normalized relative to baseline. A simple divisive normalization is problematic on its own, however: a doubling (ratio of 2) and a halving (ratio of 0.5) are changes of equal magnitude, yet they are not symmetric around 1 on a linear scale. Such ratios are also highly sensitive to outliers. For these reasons, taking the logarithm (base 10) of the ratio is an appropriate transformation. We changed the tick labels to 1, 2, 4, as the reviewer suggested.
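
      A small numerical example makes the symmetry argument concrete. The firing rates are made up for illustration.

```python
import numpy as np

# Hypothetical baseline and stimulus-driven rates (Hz) for three neurons.
baseline = np.array([1.0, 1.0, 4.0])
response = np.array([2.0, 0.5, 4.0])

ratio = response / baseline        # doubling, halving, no change
log_ratio = np.log10(ratio)        # symmetric around 0

print(ratio)       # distances from 1 differ: +1.0 vs -0.5
print(log_ratio)   # equal magnitude, opposite sign

# Tick positions on the log axis that are labelled 1, 2, 4 in ratio units.
tick_positions = np.log10([1, 2, 4])
```

      On the logarithmic axis the ticks 1, 2, 4 are evenly spaced, which is why they are easy to read as successive doublings.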

      Figure 3: it is not clear what “size” refers to in the stimuli where there is no gray center. Is it the horizontal size of the overall stimulus? Some cartoons might help. Or just some words to explain.

      Figure 3: if my understanding of “size” above is correct, the results are remarkable: there is no effect whatsoever of replacing the center stimulus with a gray rectangle. Shouldn’t this be remarked upon?

      We have added a paragraph under Figure 3 and in the Methods section explaining that the sizes represent the varying horizontal dimensions of the rectangular patch. In this protocol, the classical condition (i.e. without gray patch) was shown only as full-field gratings, which is depicted in the plot as size 0, indicating no rectangular patch was present.

      DETAILS The word “achromatic” appears many times in the paper and is essentially uninformative (all stimuli in this study are achromatic, including the gratings). It could be removed in most places except a few, where it is actually used to mean “uniform”. In those cases, it should be replaced by “uniform”.

      Ditto for the word “luminous”, which appears twice and has no apparent meaning. Please replace it with “uniform”.

      We have replaced the words achromatic and luminous with “uniform” stimuli to improve the clarity when we refer to only black or white stimuli.

      Page 3, line 70: “We raise some important factors to consider when describing responses to only surround stimulation.” This sentence might belong in the Discussion but not in the middle of a paragraph of Results.

      We removed this sentence.

      Neuropixel - Neuropixels (plural)

      “area LGN” - LGN

      We corrected these misspellings.

      References

      Keller, A.J., Roth, M.M., Scanziani, M., 2020. Feedback generates a second receptive field in neurons of the visual cortex. Nature 582, 545–549. doi:10.1038/s41586-020-2319-4.

      Kirchberger, L., Mukherjee, S., Self, M.W., Roelfsema, P.R., 2023. Contextual drive of neuronal responses in mouse V1 in the absence of feedforward input. Science Advances 9, eadd2498. doi:10.1126/sciadv.add2498.

      Rossant, C., et al., 2021. phy: Interactive analysis of large-scale electrophysiological data. https://github.com/cortex-lab/phy.

      Schneider, M., Tzanou, A., Uran, C., Vinck, M., 2023. Cell-type-specific propagation of visual flicker. Cell Reports 42.

      Steinmetz, N.A., Aydin, C., Lebedeva, A., Okun, M., Pachitariu, M., Bauza, M., Beau, M., Bhagat, J., Böhm, C., Broux, M., Chen, S., Colonell, J., Gardner, R.J., Karsh, B., Kloosterman, F., Kostadinov, D., Mora-Lopez, C., O’Callaghan, J., Park, J., Putzeys, J., Sauerbrei, B., van Daal, R.J.J., Vollan, A.Z., Wang, S., Welkenhuysen, M., Ye, Z., Dudman, J.T., Dutta, B., Hantman, A.W., Harris, K.D., Lee, A.K., Moser, E.I., O’Keefe, J., Renart, A., Svoboda, K., Häusser, M., Haesler, S., Carandini, M., Harris, T.D., 2021. Neuropixels 2.0: A miniaturized high-density probe for stable, long-term brain recordings. Science 372, eabf4588. doi:10.1126/science.abf4588.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work aims to improve our understanding of the factors that influence female-on-female aggressive interactions in gorilla social hierarchies, using 25 years of behavioural data from five wild groups of two gorilla species. Researchers analysed aggressive interactions between 31 adult females, using behavioural observations and dominance hierarchies inferred through Elo-rating methods. Aggression intensity (mild, moderate, severe) and direction (measured as the rank difference between aggressor and recipient) were used as key variables. A linear mixed-effects model was applied to evaluate how aggression direction varied with reproductive state (cycling, trimester-specific pregnancy, or lactation) and sex composition of the group. This study highlights the direction of aggressive interactions between females, with most interactions being directed from higher- to lower-ranking adult females close in social rank. However, the results show that 42% of these interactions are directed from lower- to higher-ranking females. Particularly, lactating and pregnant females targeted higher-ranking individuals, which the authors suggest might be due to higher energetic needs, which increase risk-taking in lactating and pregnant females. Sex composition within the group also influenced which individuals were targeted. The authors suggest that male presence buffers female-on-female aggression, allowing females to target higher-ranking females than themselves. In contrast, females targeted lower-ranking females than themselves in groups with a larger ratio of females, which supposes a lower risk for the females since the pool of competitors is larger. 
The findings provide an important insight into aggression heuristics in primate social systems and the social and individual factors that influence these interactions, providing a deeper understanding of the evolutionary pressures that shape risk-taking, dominance maintenance, and the flexibility of social strategies in group-living species.

      The authors achieved their aim by demonstrating that aggression direction in female gorillas is influenced by factors such as reproductive condition and social context, and their results support the broader claim that aggression heuristics are flexible. However, some specific interpretations require further support. Despite this, the study makes a valuable contribution to the field of behavioural ecology by reframing how we think about intra-sexual competition and social rank maintenance in primates.

      Strengths:

      One of the study's major strengths is the use of an extensive dataset that compiles 25 years of behavioural data and 6871 aggressive interactions between 31 adult females in five social groups, which allows for a robust statistical analysis. This study uses a novel approach to the study of aggression in social groups by including factors such as the direction and intensity of aggressive interactions, which offers a comprehensive understanding of these complex social dynamics. In addition, this study incorporates ecological and physiological factors such as the reproductive state of the females and the sex composition of the group, which allows an integrative perspective on aggression within the broader context of body condition and social environment. The authors successfully integrate their results into broader evolutionary and ecological frameworks, enriching discussions around social hierarchies and risk sensitivity in primates and other animals.

      Thank you for the positive assessment of our work and the nice summary of the manuscript!

      Weaknesses:

      Although the paper has a novel approach by studying the effect of reproductive state and social environment on female-female aggression, the use of observational data without experimental manipulation limits the ability to establish causation. The authors suggest that the difference observed in female aggression direction between groups with different sex composition might be indicative of male presence buffering aggression, which seems speculative, as no direct evidence of male intervention or support was reported. Similarly, the use of reproductive state as a proxy for energetic need is an indirect measure and does not account for actual energy expenditure or caloric intake, which weakens the authors' claims that female energetic need induces risk-taking. Overall, this paper would benefit from stronger justification and empirical support to strengthen the conclusions of the study about the mechanisms driving female aggression in gorillas.

      We agree that experimental manipulation would allow us to extend our work. Unfortunately, this is not possible with wild, endangered gorillas.

      We have now added more references (Watts 1994; Watts 1997) and enriched our arguments regarding male presence buffering aggression. Previous research suggests that male gorillas may support lower-ranking females and may intervene in female-female conflicts (Sicotte 2002). Unfortunately, our dataset did not allow us to test for male protection. We conduct proximity scans every 10 minutes, and these scans are not tied to individual interactions, meaning that we cannot reliably test whether proximity to a male influences the likelihood of receiving aggression.

      We have now clearly stated that reproductive state is an indirect proxy for energetic needs. We agree with your point about energy intake and expenditure, but unfortunately, we do not have data on energy expenditure or caloric intake to allow us to delve into more fine-grained analyses.

      Overall, we have tried to enrich the justification and empirical support to strengthen our conclusions by clarifying the text and adding more examples and references.

      Reviewer #2 (Public review):

      Summary:

      The authors' aim in this study is to assess the factors that can shift competitive incentives against higher- or lower-ranking groupmates in two gorilla species.

      Strengths:

      This is a relevant topic, where important insights could be gained. The authors brought together a substantial dataset: a long-term behavioral dataset representing two gorilla species from five social groups.

      Weaknesses:

      The authors have not fully shown the data used in the model and explored the potential of the model. Therefore, I remain cautious about the current results and conclusions.

      Some specific suggestions that require attention are

      (1) The authors described how group size can affect aggression patterns in some species (line 54), using a whole paragraph, but did not include it as an explanatory variable in their model, even though they stated that overall group size can "conflate opposing effects of females and males" (line 85). I suggest underlining the effects of the numbers of males and/or females here and de-emphasizing the effect of group size in the Introduction.

      We did not use group size as a main predictor, as has been commonly done in other species, because of potentially conflating opposing effects of males and females. To further stress this point, we have specifically added in the introduction: “group size, the overall number of individuals in the group, might not be a good predictor of aggression heuristics, as it can conflate the effects of different kinds of individuals on aggression (see Smit & Robbins 2024 for an example of opposing effects of the number of females and number of males on female gorilla aggression).”

      We also “ran our analysis testing for group size (number of weaned individuals in the group), instead of the numbers of females and males, [and] its influence on interaction score was not significant (estimate=-0.001, p-value=0.682).”

      (2) There should be more details given about how the authors calculated individual Elo-ratings (line 98). It seems that authors pooled all avoidance/displacement behaviors throughout the study period. But how often was the Elo-rating they included in the model calculated? By the day or by the month? I guess it was by the day, as they "estimate female reproductive state daily" (line 123). If so, it should be made clear in the text.

      We rephrased accordingly: “We used all avoidance and displacement interactions throughout the study period and we used the function elo.seq from R package EloRating to infer daily individual female Elo-scores”. We also clarified that “This method takes into account the temporal sequence of interactions and updates an individual’s Elo-scores each day the individual interacted with another...”
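
      The daily updating described in this reply can be illustrated with a minimal sketch. It is not the EloRating implementation of elo.seq: the logistic win-probability formula, the starting value of 1000, the constant k = 100, and the function names are all assumptions made for illustration. Each tuple records the day, the winner (the individual that was avoided or that displaced the other), and the loser.

```python
from collections import defaultdict


def expected_win(r_winner, r_loser):
    """Logistic expectation that the observed winner would win (assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))


def daily_elo(interactions, start=1000.0, k=100.0):
    """Return {day: {individual: rating}} holding end-of-day ratings.

    interactions: iterable of (day, winner, loser), sorted by day.
    Ratings change only on days when an individual interacts.
    """
    ratings = defaultdict(lambda: start)
    by_day = {}
    for day, winner, loser in interactions:
        p = expected_win(ratings[winner], ratings[loser])
        delta = k * (1.0 - p)          # larger gain for an unexpected win
        ratings[winner] += delta
        ratings[loser] -= delta
        by_day[day] = dict(ratings)    # snapshot overwritten until the day ends
    return by_day


# Hypothetical observations: A displaces B and C on day 1, B displaces A on day 3.
observations = [(1, "A", "B"), (1, "A", "C"), (3, "B", "A")]
for day, scores in daily_elo(observations).items():
    print(day, {name: round(score, 1) for name, score in scores.items()})
```

      Because every update adds to the winner exactly what it subtracts from the loser, the sum of ratings in a group stays constant, which is a quick sanity check on any such sequence.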

      In addition, all groups were studied long term, and the group composition seems to fluctuate based on Table 1 in Reference 11. When an individual enters/leaves a group with a stable hierarchy, it takes time before the hierarchy becomes stable again. If the avoidance/displacement behaviors used for the rank relationship were not common, it would take a few days or maybe longer. Also, were the aggressive behaviors more common during rank fluctuations? In other words, if avoidance/displacement behaviors and aggressive behaviors occur simultaneously during rank fluctuations, how did the authors deal with it and take it into consideration in the analysis?

      We have shown in Reference 25 (Smit & Robbins 2025), following Reference 11 (Smit & Robbins 2024), that females form highly stable hierarchies, and that dyadic dominance relationships are not influenced by dispersal or death of third individuals. Notably, new immigrant females usually start at and remain low ranking, without large fluctuations in rank. Therefore, any fluctuation periods have limited influence on the aggressive interactions in our study system.

      The authors emphasized several times in the text that gorillas "form highly stable hierarchical relationships". Also, in Reference 25, they found very high stabilities of each group's hierarchy. However, the number of females involved in that analysis was different from that used here. They need to provide more basic info on each group's dominance hierarchy and verify their statement. I strongly suggest that the authors display Elo-rating trajectories and necessary relevant statistics for each group throughout the study period as part of the supplementary materials.

      In fact, the females involved in the present analysis and the analysis of Smit & Robbins 2025 are the same. Our present analysis is based on the hierarchies of Smit & Robbins 2025. Note that female gorillas disperse and occasionally immigrate to another study group. This is why some females may appear in the hierarchies of more than one group, giving the impression that there are more females involved in the analysis of Smit & Robbins 2025 (e.g. by counting the lines in the Elo-rating plots). We now specifically state that “We present these interactions and hierarchies in detail in Smit & Robbins 2025”, to clarify that the hierarchies are the same.

      (3) The authors stated why they differentiated the different stages based on female reproductive status. They also referred to the differences in energetic needs between stages of pregnancy and lactation (lines 127-128). However, in the mixed model, they only compared the interaction score between the female cycling stage and other stages. The model was not well explained, and the results could be expanded. I suggest conducting more pairwise comparisons in the model and presenting the statistics in the text, if there are significant results. If all three pregnancy stages differed significantly from cycling and lactating stages but not from each other, they may be merged as one pregnancy stage. More in-depth analysis would help provide better answers to the research questions.

      Thank you for pointing this out. First, when we considered one pregnancy stage, pregnant females did indeed show a significantly greater interaction score than females in other reproductive stages. We have now included that in the manuscript. However, we still find it relevant to test for the different stages of pregnancy, given the differences in energetic needs across these stages. We have now included the pairwise comparisons in a new table (Table 2).

      Reviewer #3 (Public review):

      Smit and Robbins' manuscript investigates the dynamics of aggression among female groupmates across five gorilla groups. The authors utilize longitudinal data to examine how reproductive state, group size, presence of males, and resource availability influence patterns of aggression and overall dominance rankings as measured by Elo scores. The findings underscore the important role of group composition and reproductive status, particularly pregnancy, in shaping dominance relationships in wild gorillas. While the study addresses a compelling and understudied topic, I have several comments and suggestions that may enhance clarity and improve the reader's experience.

      (1) Clarification of longitudinal data - The manuscript states that 25 years of behavioral data were used, but this number appears unclear. Based on my calculations, the maximum duration of behavioral observation for any one group appears to be 18 years. Specifically:

      • ATA: 6 years

      • BIT: 8 years

      • KYA: 18 years

      • MUK: 6 years

      • ORU: 8 years

      I recommend that the authors clarify how the 25-year duration was derived.

      Indeed, none of the five study “groups” has been studied for 25 years in a row. However, MUK emerged from a fission of group KYA in early 2016. So, from the start of group KYA in October 1998 to the end of group MUK in December 2023, there are 25 years and 2 months. We have now rephrased to “...starting in 1998 in one of the mountain gorilla groups” in the introduction, and to “We use a long-term behavioural dataset on five wild groups of the two gorilla species, starting in 1998” in the abstract.
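
      The span quoted in this reply can be checked with simple month arithmetic. The sketch below uses only the start and end months given above (October 1998 and December 2023); the function name is hypothetical.

```python
def span_years_months(start_year, start_month, end_year, end_month):
    """Whole years and remaining months between two calendar months."""
    total_months = (end_year - start_year) * 12 + (end_month - start_month)
    return divmod(total_months, 12)


# Start of group KYA (October 1998) to end of group MUK (December 2023).
years, months = span_years_months(1998, 10, 2023, 12)
print(years, months)  # 25 2
```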

      (2) Consideration of group size - The authors mention that group size was excluded from analyses to avoid conflating the opposing effects of female and male group members. While this is understandable, it may still be beneficial to explore group size effects in supplementary analyses. I suggest reporting statistics related to group size and potentially including a supplementary figure. Additionally, given that the study includes both mountain and western gorillas, it would be helpful to examine whether any interspecies differences are apparent.

      We have now added the suggested extra test: “When we ran our analysis testing for group size (number of weaned individuals in the group), instead of the numbers of females and males, its influence on interaction score was not significant (estimate=-0.001, p-value=0.682).”

      Regarding species differences: In our analysis, we test for species (mountain vs western) and we find no significant differences between the two. This is stated in the results.

      (3) Behavioral measures clarification - Lines 112-116 describe the types of aggressive behaviors observed. It would be helpful to clarify how these behaviors differ from those used to calculate Elo scores, or whether they overlap. A brief explanation would improve transparency regarding the methodology.

      We have now added short explanations in brackets for behaviours that are not obvious. We also added a sentence in the text to clarify the difference from the behaviours used to calculate Elo scores: “These two behaviours [avoidance and displacement] are ritualized, occurring in absence of aggression, they are considered a more reliable proxy of power relationships over aggression, and they are typically used to infer gorilla hierarchical relationships”.

      (4) Aggression rates versus Elo scores - The manuscript uses aggression rates rather than dominance rank (as measured by Elo scores) as the main outcome variable, but there is no explanation of why. How would the results differ if aggression rates were replaced or supplemented with Elo scores? The current justification for prioritizing aggression rates over dominance rank needs to be more clearly supported.

      The sentence we added above (“These two behaviours [avoidance and displacement] are ritualized, occurring in absence of aggression, they are considered a more reliable proxy of power relationships over aggression, and they are typically used to infer gorilla hierarchical relationships”) and the first paragraph of the results hopefully clarify that ritualized agonistic interactions are generally directionally consistent and more reliably capture the highly stable dominance relationships of female gorillas. This approach has been used to calculate dominance rank in gorillas in all studies that have considered it, dating back to the 1970s (namely in studies by Harcourt and Watts). On the other hand, aggression can be context dependent (we now clearly note that in the beginning of the Methods paragraph on aggressive interactions). Therefore, we use Elo-scores inferred from ritualized interactions as a baseline and a reliable proxy of power relationships; we then test whether the direction of aggression within these relationships is also driven by energetic needs or the social environment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to elucidate the molecular mechanisms underlying HIV-1 persistence and host immune dysfunction in CD4+ T cells during early infection (<6 months). Using single-cell multi-omics technologies (including scRNA-seq, scATAC-seq, and single-cell multiome analyses), they characterized the transcriptional and epigenomic landscapes of HIV-1-infected CD4+ T cells. They identified key transcription factors (TFs), signaling pathways, and T cell subtypes involved in HIV-1 persistence, particularly highlighting KLF2 and Th17 cells as critical regulators of immune suppression. The study provides new insights into immune dysregulation during early HIV-1 infection and reveals potential epigenetic regulatory mechanisms in HIV-1-infected T cells.

      Strengths:

      The study excels through its innovative integration of single-cell multi-omics technologies, enabling detailed analysis of gene regulatory networks in HIV-1-infected cells. Focusing on early infection stages, it fills a crucial knowledge gap in understanding initial immune responses and viral reservoir establishment. The identification of KLF2 as a key transcription factor and Th17 cells as major viral reservoirs, supported by comprehensive bioinformatics analyses, provides robust evidence for the study's conclusions. These findings have immediate clinical relevance by identifying potential therapeutic targets for HIV-1 reservoir eradication.

      We sincerely appreciate the reviewer’s positive evaluation of our work.

      Weaknesses:

      Despite its strengths, the study has several limitations. By focusing exclusively on CD4+ T cells, the study overlooks other relevant immune cells such as CD14+ monocytes, NK cells, and B cells. Additionally, while the authors generated their own single-cell datasets, they need to validate their findings using other publicly available single-cell data from HIV-1-infected PBMCs.

      Thank you to Reviewer #1 for your feedback on our work. In response, we have examined cell-cell interactions between HIV-1-infected CD4+ T cells and other innate immune cells, including monocytes and NK cells. We identified altered interaction signaling patterns (e.g., MIF, ICAM2, CCL5, CLEC2B) that contribute to immune dysfunction and viral persistence (page 9, Supplementary Fig. 5). In addition, we validated the expression of KLF2 and its target genes using a publicly available scRNA-seq dataset from HIV-1-infected PBMCs [1], which includes both healthy donors and individuals with chronic HIV-1 infection. The upregulation of key KLF2 targets in HIV-1-infected CD4+ T cells from this dataset supports the reproducibility of our findings. We have incorporated these results into the revised Results, Discussion, and Supplementary Materials (page 8, page 12 and Supplementary Fig. 4A).

      Reviewer #2 (Public review):

      Summary:

      The authors observed gene ontologies associated with upregulated KLF2 target genes in HIV-1 RNA+ CD4 T Cells using scRNA-seq and scATAC-seq datasets from the PBMCs of early HIV-1-infected patients, showing immune responses contributing to HIV pathogenesis and novel targets for viral elimination.

      Strengths:

      The authors carried out detailed transcriptomic profiling with scRNA-seq and scATAC-seq datasets to conclude that KLF2 target genes are upregulated in HIV-1 RNA+ CD4 T cells.

      We thank the reviewer for highlighting the strengths of our work.

      Weaknesses:

      This key observation of upregulation of the KLF2-associated gene family might be important in the HIV field for early diagnosis and viral clearance. However, with the limited sample size and in vivo study model, it will be hard to draw firm conclusions. I highly recommend increasing the sample size of early HIV-1-infected patients.

      Thank you to Reviewer #2 for this important comment. We acknowledge the limitations of our modest sample size, which reflects the challenges of recruiting well-characterized individuals in early HIV-1 infection (<6 months) and obtaining high-quality PBMCs for single-cell multi-omic profiling. To strengthen our findings, we validated the upregulation of KLF2 target genes using a publicly available scRNA-seq dataset from HIV-1-infected PBMCs [1], which showed similar expression patterns in HIV-1 RNA+ CD4+ T cells (page 8 and Supplementary Fig. 4A).

      Reviewer #3 (Public review):

      Summary:

      This manuscript studies intracellular changes and immune processes during early HIV-1 infection with an additional focus on the small CD4+ T cell subsets. The authors used single-cell omics to achieve high resolution of transcriptomic and epigenomic data on the infected cells which were verified by viral RNA expression. The results add to understanding of transcriptional regulation which may allow progression or HIV latency later in infected cells. The biosamples were derived from early HIV infection cases, providing particularly valuable data for the HIV research field.

      Strengths:

      The authors examined the heterogeneity of infected cells within CD4 T cell populations, identified a significant and unexpected difference between naive and effector CD4 T cells, and highlighted the differences in Th2 and Th17 cells. Multiple methods were used to show the role of the increased KLF2 factor in infected cells. This is a valuable finding of a new role for the major transcription factor in further disease progression and/or persistence.

      The methods employed by the authors are robust. Single-cell RNA-Seq from PBMC samples was followed by a comprehensive annotation of immune cell subsets, 16 in total. This manuscript presents to the scientific community a valuable multi-omics dataset of good quality, which could be further analyzed in the context of larger studies.

      We sincerely thank the reviewer for the insightful and concise summary of our work.

      Weaknesses:

      Methods and Supplementary materials

      Some technical aspects could be described in more detail. For example, it is unclear how the authors filtered out cells that did not pass quality control, such as doublets and cells with low transcript/UMI content. Next, in cell annotation, what is the variability in cell types between donors? This information is important to include in the supplementary materials, especially with such a small sample size. Without this, it is difficult to determine whether the differences between subsets at the transcriptomic level, viral RNA expression level, and chromatin assessment are due to cell type variations or individual patient-specific variations. For the DEG analysis, did the authors exclude the most variable genes?

      Thank you to Reviewer #3 for these detailed comments and observations. In the revised Methods section (page 16), we have added information on our quality control filtering process. Specifically, we excluded cells with fewer than 200 detected genes, high mitochondrial content (>30%), or low UMI counts. Doublets were identified and removed using DoubletFinder.
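
      The per-cell thresholds stated in this reply can be written as a simple filter. The sketch below is illustrative only and is not the authors' pipeline: the 200-gene and 30% mitochondrial cut-offs come from the reply, whereas the minimum UMI count (500 here) is a hypothetical placeholder because the reply does not give a value, and the field names are assumptions. Doublet removal with DoubletFinder is a separate step and is not shown.

```python
def passes_qc(cell, min_genes=200, max_mito_pct=30.0, min_umis=500):
    """Keep a cell only if it clears all three per-cell thresholds.

    min_umis is a hypothetical placeholder; the source gives no UMI cut-off.
    """
    return (
        cell["n_genes"] >= min_genes            # drop cells with fewer than 200 genes
        and cell["mito_pct"] <= max_mito_pct    # drop cells with >30% mitochondrial reads
        and cell["n_umis"] >= min_umis          # drop cells with low UMI counts
    )


# Hypothetical per-cell summary metrics.
cells = [
    {"id": "c1", "n_genes": 1500, "mito_pct": 5.0, "n_umis": 4000},
    {"id": "c2", "n_genes": 150, "mito_pct": 4.0, "n_umis": 900},
    {"id": "c3", "n_genes": 2200, "mito_pct": 42.0, "n_umis": 6000},
    {"id": "c4", "n_genes": 800, "mito_pct": 10.0, "n_umis": 300},
]
kept = [cell["id"] for cell in cells if passes_qc(cell)]
print(kept)  # ['c1']
```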

      To address inter-donor variability, we included a new supplementary figure (Supplementary Fig. 1B) showing the distribution of major immune cell types across individual donors. While we observed some variation in cell-type composition between individuals, this likely reflects natural biological heterogeneity in early HIV-1 infection. Additionally, we applied fastMNN batch correction to mitigate donor-specific technical variation. After correction, the overall patterns of gene expression within each major CD4+ T cell subset were consistent across individuals (Supplementary Fig. 1C).

      Regarding the DEG analysis, we used the ‘FindMarkers’ function in Seurat (v.3.2.1), which does not exclude highly variable genes. These details have been clarified in the updated Methods section (page 18).

      The annotation of 16 cell types from PBMC samples is impressive and of good quality, however, not all cell types get attention for further analysis. It’s natural to focus primarily on the CD4 T cells according to the research objectives. The authors also study potential interactions between CD4 and CD8 T cells by cell communication inference. It would be interesting to ask additional questions for other underexplored immune cell subsets, such as: 1) Could viral RNA be detected in monocytes or macrophages during early infection? 2) What are the inferred interactions between NK cells and infected CD4 T cells, are interactions similar to CD4-CD8 results? 3) What are the inferred interactions between monocytes or macrophages and infected CD4 T cells?

      In line with our study objectives, we initially focused on CD4+ T cells as primary HIV-1 targets. However, in response to the reviewer’s comment, we examined the inferred communications between HIV-1-infected CD4+ T cells and other immune cells.

      (1) With regard to the presence of viral RNA in monocytes or macrophages, we observed negligible HIV-1 RNA signal in these cell types in our dataset, consistent with their low permissiveness in early-stage infection [2]. However, we acknowledge the limitations of detecting rare infected cells at the single-cell level.

      (2) We identified increased MIF and ICAM2 signaling between NK cells and HIV-1-infected CD4+ T cells, which are associated with KLF2-mediated immune modulation. These patterns are consistent with the CD4–CD8 interaction results observed in our dataset (Supplementary Fig. 5A).

      (3) Through the cell-cell interaction analysis with differential expression analysis, we inferred reduced CCL5 and CD55 signaling between monocytes and HIV-1-infected CD4+ T cells (Supplementary Fig. 5B). These reductions may potentially impair immune responses and antiviral defense.

      We appreciate the reviewer’s suggestions and believe that the analysis of underexplored immune subsets strengthens the relevance of our findings. These results have been incorporated into the revised Results (page 9).

      Discussion

      It would be interesting to see more discussion of the observation of how naïve T cells produce more viral RNA compared to effector T cells. It seems counterintuitive according to general levels of transcriptional and translational activity in subsets.

      Another discussion block could be added regarding the results and conclusion comparison with Ashokkumar et al. paper published earlier in 2024 (10.1093/gpbjnl/qzae003). This earlier publication used both a cell line-based HIV infection model and primary infected CD4 T cells and identified certain transcription factors correlated with viral RNA expression.

      Thank you to Reviewer #3 for the insightful suggestions. We observed that the proportion of HIV-1-infected naïve CD4 T cells is higher than that of effector T cells. Although effector CD4 T cells are generally more active, previous studies have suggested that naïve CD4 T cells are susceptible to HIV-1 during early infection, which may be associated with initial expansion and rapid progression [3, 4]. This may be due to less restriction by antiviral signaling or more accessible chromatin states in resting cells. We have added this context and cited relevant papers to address this observation (page 11).

      In addition, we have incorporated a comparative discussion with the recent study [5], which identified FOXP1 and GATA3 as transcriptional regulators associated with HIV-1 RNA expression. While these TFs were not significantly differentially expressed in our dataset, we discuss potential reasons for this discrepancy—including differences in infection model (in vitro vs. ex vivo), infection stage (latency vs. acute), and T cell subset composition—and emphasize that both studies highlight the importance of transcriptional regulation in HIV-1 persistence (page 12 and Supplementary Fig. 4B).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study has several notable limitations.

      First, it was restricted to early-stage HIV-1 infection (<6 months) without longitudinal data, preventing the authors from capturing temporal changes in immune cell populations, gene expression profiles, and epigenetic landscapes throughout disease progression.

      Thank you to Reviewer #1 for pointing out this important limitation. As noted, our study focused exclusively on early-stage HIV-1 infection (<6 months) to capture the initial immune dysregulation and epigenetic alterations. We agree that longitudinal analysis would provide valuable insights into disease progression. However, due to the limited availability of early-infection patient samples suitable for multi-omics profiling, we prioritized capturing a detailed snapshot at this early stage. To address this limitation, future studies incorporating longitudinal sampling (including chronic infection and long-term non-progressors) will be essential to fully elucidate the temporal dynamics of HIV-1 pathogenesis.

      Second, while the bioinformatic analysis compared "Uninfected" and "HIV-1-infected" cells from patients, the authors could have strengthened their findings by incorporating publicly available single-cell data from healthy donors and chronically infected HIV-1 patients to validate their arguments across all figures.

      To support the robustness of our findings, we incorporated a publicly available single-cell RNA-seq dataset [1], which includes both healthy donors and individuals with chronic HIV-1 infection. In this dataset, we validated the upregulation of KLF2 and its target genes in HIV-1-infected CD4+ T cells and observed expression patterns generally consistent with those in our early-infection cohort (page 8, page 12 and Supplementary Fig. 4). While not all gene-level trends were identical, likely reflecting differences in infection stage and immune activation status, this external comparison reinforces the reproducibility of key observations and highlights the unique transcriptional features associated with early HIV-1 infection.

      Third, although the study focused on CD4+ T cells as primary HIV-1 targets, it overlooked other important immune cells such as CD8+ T cells, monocytes, and NK cells, which may contribute to viral persistence and immune dysfunction through cell-cell interactions.

      In the revised manuscript, we expanded our analysis to include predicted ligand–receptor interactions between HIV-1-infected and uninfected CD4+ T cells with innate and cytotoxic immune cells using CellChat v.2.1.1. Specifically, we evaluated interactions with NK cells and monocytes and identified altered signaling pathways such as MIF, ICAM2, CCL5, and CLEC2B, which are associated with immune modulation (Supplementary Fig. 5A). We have added these results to the revised Results (page 9).

      Lastly, comparing these findings with other chronic viral infections (e.g., HBV, HCV) would have positioned this work more effectively within the broader field of viral immunology and enhanced its impact.

      We agree that broader comparisons with other chronic viral infections could enhance the impact of our findings. In the current discussion, we noted similarities in interferon signaling disruption with viruses such as HCV and HSV (page 11). Our observation that HIV-1-infected CD4+ T cells exhibit impaired interferon responses is consistent with immune evasion mechanisms reported in HCV and HSV infections. These results underscore both the shared and specific features of immune modulation and persistence during early HIV-1 infection.

      Reviewer #3 (Recommendations for the authors):

      Supplementary Table S1 should indicate which technique was used for sequencing. However, the current version of the table marks no protocol applied to the majority of the samples, which is confusing and needs to be corrected.

      Thank you to Reviewer #3 for pointing out this important oversight. We have revised Supplementary Table S1 to clearly indicate the sequencing method used for each sample. Separate columns for scRNA-seq, scATAC-seq, and sc-Multiome now specify whether each technique was applied (“Yes” or “No”) to improve clarity and transparency.

      (1) Wang, S., et al., An atlas of immune cell exhaustion in HIV-infected individuals revealed by single-cell transcriptomics. Emerg Microbes Infect, 2020. 9(1): p. 2333-2347.

      (2) Arfi, V., et al., Characterization of the early steps of infection of primary blood monocytes by human immunodeficiency virus type 1. J Virol, 2008. 82(13): p. 6557-65.

      (3) Douek, D.C., et al., HIV preferentially infects HIV-specific CD4+ T cells. Nature, 2002. 417(6884): p. 95-8.

      (4) Jiao, Y., et al., Higher HIV DNA in CD4+ naive T-cells during acute HIV-1 infection in rapid progressors. Viral Immunol, 2014. 27(6): p. 316-8.

      (5) Ashokkumar, M., et al., Integrated Single-cell Multiomic Analysis of HIV Latency Reversal Reveals Novel Regulators of Viral Reactivation. Genomics Proteomics Bioinformatics, 2024. 22(1).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Summary:

      The authors sought to elucidate the mechanism by which infections increase sleep in Drosophila. Their work is important because it further supports the idea that the blood-brain barrier is involved in brain-body communication, and because it advances the field of sleep research. Using knock-down and knock-out of cytokines and cytokine receptors specifically in the endocrine cells of the gut (cytokines) as well as in the glia forming the blood-brain barrier (BBB) (cytokine receptors), the authors show that cytokines, upd2 and upd3, secreted by entero-endocrine cells in response to infections increase sleep through the Dome receptor in the BBB. They also show that gut-derived Allatostatin (Alst) A promotes wakefulness by inhibiting Alst A signaling that is mediated by Alst receptors expressed in BBB glia. Their results suggest there may be additional mechanisms that promote elevated sleep during gut inflammation.

      The authors suggest that upd3 is more critical than upd2, which is not sufficiently addressed or explained. In addition, the study uses the gut's response to reactive oxygen molecules as a proxy for infection, which is not sufficiently justified. Finally, further verification of some fundamental tools used in this paper would solidify these findings, making them more convincing.

      Strengths:

      (1) The work addresses an important topic and proposes an intriguing mechanism that involves several interconnected tissues. The authors place their research in the appropriate context and reference related work, such as literature about sickness-induced sleep, ROS, the effect of nutritional deprivation on sleep, sleep deprivation and sleep rebound, upregulated receptor expression as a compensatory mechanism in response to low levels of a ligand, and information about Alst A.

      (2) The work is, in general, supported by well-performed experiments that use a variety of different tools, including multiple RNAi lines, CRISPR, and mutants, to dissect both signal-sending and receiving sides of the signaling pathway.

      (3) The authors provide compelling evidence that shows that endocrine cells from the gut are the source of the upd cytokines that increase daytime sleep, that the glial cells of the BBB are the targets of these upds, and that upd action causes the downregulation of Alst receptors in the BBB via the Jak/Stat pathways.

      We are pleased that the reviewers recognized the strength and significance of our findings describing a gut-to-brain cytokine signaling mechanism involving the blood-brain barrier (BBB) and its role in regulating sleep, and we thank them for their comments.

      Weaknesses:

      (1) There is a limited characterization of cell types in the midgut which are classically associated with upd cytokine production.

      We thank the reviewer for raising this point. Although several midgut cell types (including the absorptive enterocytes) may indeed produce Unpaired (Upd) cytokines, our study specifically focused on enteroendocrine cells (EECs), which are well-characterized as secretory endocrine cells capable of exerting systemic effects. As detailed in our response to Results point #2 (please see below), we show that EEC-specific manipulation of Upd signaling is both necessary and sufficient to regulate sleep in response to intestinal oxidative stress. These findings support the role of EECs as a primary source of gut-derived cytokine signaling to the brain. To acknowledge the possible involvement of other sources, we have also added a statement to the Discussion in the revised manuscript noting that other, non-endocrine gut cell types may contribute to systemic Unpaired signaling that modulates sleep.

      (2) Some of the main tools used in this manuscript to manipulate the gut while not influencing the brain (e.g., Voilà and Voilà + R57C10-GAL80), are not directly shown to not affect gene expression in the brain. This is critical for a manuscript delving into intra-organ communication, as even limited expression in the brain may lead to wrong conclusions.

      We agree with the reviewer that this is an important point. To address it, we performed additional validation experiments to assess whether the voilà-GAL4 driver in combination with R57C10-GAL80 (EEC>) influences upd2 or upd3 expression in the brain. Our results show that manipulation using EEC> alters upd2 and upd3 expression in the gut (Fig. 1a,b), with new data showing that this does not affect their expression levels in neuronal tissues (Fig. S1a), supporting the specificity of our approach. These new data are now included in the revised manuscript and described in the Results section. This additional validation strengthens our conclusion that the observed sleep phenotypes result from gut-specific cytokine signaling, rather than from effects on Unpaired cytokines produced in the brain.

      (1) >(3) The model of gut inflammation used by the authors is based on the increase in reactive oxygen species (ROS) obtained by feeding flies food containing 1% H2O2. The use of this model is supported by the authors rather weakly in two papers (refs. 26 and 27 ): The paper by Jiang et al. (ref. 26) shows that the infection by Pseudomonas entomophila induces cytokine responses upd2 and 3, which are also induced by the Jnk pathway. In addition, no mention of ROS could be found in Buchon et al. (ref 27); this is a review that refers to results showing that ROS are produced by the NADPH oxidase DUOX as part of the immune response to pathogens in the gut. Thus, there is no strong support for the use of this model.

      We thank the reviewer for raising this point. We agree that the references originally cited did not sufficiently justify the use of H<sub>2</sub>O<sub>2</sub> feeding as a model of gut inflammation. To address this, we have revised the Results section to clarify that we use H<sub>2</sub>O<sub>2</sub> feeding as a controlled method to elevate intestinal ROS levels, rather than as a general model of inflammation. This approach allows us to investigate the specific effects of ROS-induced cytokine signaling in the gut. We have also added additional citations to support the physiological relevance of this model. For instance, Tamamouna et al. (2021) demonstrated that H<sub>2</sub>O<sub>2</sub> feeding induces intestinal stem-cell proliferation – a response also observed during bacterial infection – and Jiang et al. (2009) showed that enteric infections increase upd2 and upd3 expression, which we similarly observe following H<sub>2</sub>O<sub>2</sub> feeding (Fig. 3a). These findings support the use of H<sub>2</sub>O<sub>2</sub> as a tool to mimic specific ROS-linked responses in the gut. We believe this targeted and tractable model is a strength of our study, enabling us to dissect how intestinal ROS modulates systemic physiology through cytokine signaling.

      Additionally, we have included a statement in the Discussion acknowledging that ROS generated during infection may activate signaling mechanisms distinct from those triggered by chemically induced oxidative stress, and that exploring these differences in future studies may yield important insights into gut–brain communication. These revisions provide a stronger justification for our model while more accurately conveying both its relevance and its limitations.

      (4) Likewise, there is no support for the use of ROS in the food instead of a direct infection by pathogenic bacteria. Furthermore, it is known that ROS damages the gut epithelium, which in turn induces the expression of the cytokines studied. Thus the effects observed may not reflect the response to infection. In addition, Majcin Dorcikova et al. (2023). Circadian clock disruption promotes the degeneration of dopaminergic neurons in male Drosophila. Nat Commun. 2023 14(1):5908. doi: 10.1038/s41467-023-41540-y report that the feeding of adult flies with H2O2 results in neurodegeneration if associated with circadian clock defects. Thus, it would be important to discuss or present controls that show that the feeding of H2O2 does not cause neuronal damage.

      We thank the reviewer for this thoughtful follow-up point. We would like to clarify that we do not claim that the effects observed in our study directly reflect the full response to enteric infection. As outlined in our revised response to comment 3, we have updated the manuscript to more precisely describe the H<sub>2</sub>O<sub>2</sub>-feeding paradigm as a model that induces local intestinal ROS responses comparable to, but not equivalent to, those observed during pathogenic challenges. This revised framing highlights both the potential similarities and differences between chemically induced oxidative stress and infection-induced responses. Indeed, in the revised Discussion, we now explicitly acknowledge that ROS generated during infection may engage distinct signaling mechanisms compared to exogenous H<sub>2</sub>O<sub>2</sub> and emphasize the value of future studies in delineating these pathways. We are currently pursuing this direction in an independent ongoing study investigating the effects of enteric infections. However, for the present work, we chose to focus on the effects of ROS-induced responses in isolation, as this provides a clean and well-controlled context to dissect the specific contribution of oxidative stress to cytokine signaling and sleep regulation.

      To further address the reviewer’s concern, we have also included new data (a TUNEL stain for apoptotic DNA fragmentation) in the revised manuscript showing that H<sub>2</sub>O<sub>2</sub> feeding does not damage neuronal tissues under our experimental conditions (Fig. S3f,g). This addresses the point raised regarding the potential neurotoxicity of H<sub>2</sub>O<sub>2</sub>, as described by Majcin Dorcikova et al. (2023), and supports the specificity of the sleep phenotypes observed in our study. We believe these revisions and clarifications strengthen the manuscript and make our interpretation more precise.

      (5) The novelty of the work is difficult to evaluate because of the numerous publications on sleep in Drosophila. Thus, it would be very helpful to read from the authors how this work is different and novel from other closely related works such as: Li et al. (2023) Gut AstA mediates sleep deprivation-induced energy wasting in Drosophila. Cell Discov. 23;9(1):49. doi: 10.1038/s41421-023-00541-3.

      Our work highlights a distinct role for gut-derived AstA in sleep regulation compared to findings by Lin et al. (Cell Discovery, 2023)[1], who showed that gut AstA mediates energy wasting during sleep deprivation. Their study focused on the metabolic consequences of sleep loss, proposing that sleep deprivation increases ROS in the gut, which then promotes the release of the glucagon-like hormone adipokinetic hormone (AKH) through gut AstA signaling, thereby triggering energy expenditure.

      In contrast, our study addresses the inverse question – how ROS in the gut influences sleep. In our model, intestinal ROS promotes sleep, raising the intriguing possibility – cleverly pointed out by the reviewers – that ROS generated during sleep deprivation might promote sleep by inducing Unpaired cytokine signaling in the gut. According to our findings, this suppresses wake-promoting AstA signaling in the BBB, providing a mechanism to promote sleep as a restorative response to gut-derived oxidative stress and potentially limiting further ROS accumulation. Importantly, our findings support a wake-promoting role for EEC-derived AstA, demonstrated by several lines of evidence. First, EEC-specific knockdown of AstA increases sleep. Second, activation of AstA<sup>+</sup> EECs using the heat-sensitive cation channel Transient Receptor Potential A1 (TrpA1) reduces sleep, and this effect is abolished by simultaneous knockdown of AstA, indicating that the sleep-suppressing effect is mediated by AstA and not by other peptides or secreted factors released by these cells. Third, downregulation of AstA receptor expression in BBB glial cells increases sleep, further supporting the existence of a functional gut AstA–glia arousal pathway. We have now included new data in the revised manuscript showing that AstA release from EECs is downregulated during intestinal oxidative stress (Fig. 7k,l,m). This suggests that this wake-promoting signal is suppressed both at its source (the gut endocrine cells), by unknown means, and at its target, the BBB, via Unpaired cytokine signaling that downregulates AstA receptor expression. This coordinated downregulation may serve to efficiently silence this arousal-promoting pathway and facilitate sleep during intestinal stress. These new data, along with an expanded discussion, provide further mechanistic insight into gut-derived AstA signaling and strengthen our proposed model.

      This contrasts with the interpretation by Lin et al., who observed increased AstA peptide levels in EECs after antioxidant treatment and interpreted this as peptide retention. However, peptide accumulation may result from either increased production or decreased release, and peptide levels alone are insufficient to distinguish between these possibilities. To resolve this, we examined AstA transcript levels, which can serve as a proxy for production. Following oxidative stress (24 h of 1% H<sub>2</sub>O<sub>2</sub> feeding and the following day), when animals show increased sleep (Fig. 7e), we observed a decrease in AstA transcript levels followed by an increase in peptide levels (Fig. 7k,l,m), suggesting that oxidative stress leads to reduced gut AstA production and release. Furthermore, we recently found that a class of EECs that produce the hormone Tachykinin (Tk) and are distinct from the AstA<sup>+</sup> EECs express the ROS-sensitive cation channel TrpA1 (Ahrentløv et al., 2025, Nature Metabolism[2]). In these Tk<sup>+</sup> EECs, TrpA1 mediates ROS-induced Tk hormone release. In contrast, single-cell RNA-seq data[3] do not support TrpA1 expression in AstA<sup>+</sup> EECs, consistent with our findings that ROS does not promote AstA release – an effect that would be expected if TrpA1 were functionally expressed in AstA<sup>+</sup> EECs. This contradicts the findings of Lin et al., who reported TrpA1 expression in AstA<sup>+</sup> EECs. We have now included relevant single-cell data in the revised manuscript (Fig. S6f) showing that TrpA1 is specifically expressed in Tk<sup>+</sup> EECs, but not in AstA<sup>+</sup> EECs, and we have expanded the discussion to address discrepancies in TrpA1 expression and AstA regulation.

      Taken together, our results reveal a dual-site regulatory mechanism in which Unpaired cytokines released from the gut act at the BBB to downregulate AstA receptor expression, while AstA release from EECs is simultaneously suppressed. We thank the reviewers for raising this important point. We have also included a discussion of the other point raised by the reviewers – the possibility that ROS generated during sleep deprivation may engage the same signaling pathways described here, providing a mechanistic link between sleep deprivation, intestinal stress, and sleep regulation.

      Recommendations for the authors:

      A- Material and Methods:

      (1) Feeding Assay: The cited publication (doi.org:10.1371/journal.pone.0006063) states: "For the amount of label in the fly to reflect feeding, measurements must therefore be confined to the time period before label egestion commences, about 40 minutes in Drosophila, a time period during which disturbance of the flies affects their feeding behavior. There is thus a requirement for a method of measuring feeding in undisturbed conditions." Was blue fecal matter already present on the tube when flies were homogenized at 1 hour? If so, the assay may reflect gut capacity rather than food passage (as a proxy for food intake). In addition, was the variability of food intake among flies in the same tube tested (to make sure that 1-2 flies are a good proxy for the whole population)?

      We agree that this is an important point for feeding experiments. We are aware of the methodological considerations highlighted in the cited study and have extensive experience using a range of feeding assays in Drosophila, including both short- and long-term consumption assays (e.g., dye-based and CAFE assays), as well as automated platforms such as FLIC and FlyPAD (Nature Communications, 2022; Nature Metabolism, 2022; and Nature Metabolism, 2025)[2,4,5].

      For the dye-based assay, we carefully selected a 1-hour feeding window based on prior optimization. Since animals were not starved prior to the assay, shorter time points (e.g., 30 minutes) typically result in insufficient ingestion for reliable quantification. A 1-hour period provides a robust readout while remaining within the timeframe before significant label excretion occurs under our experimental conditions. To support the robustness of our findings, we complemented the dye-based assay with data from FLIC, which enables automated, high-resolution monitoring of feeding behavior in undisturbed animals over extended periods. The FLIC results were consistent with the dye-based data, strengthening our confidence in the conclusions. To minimize variability and ensure consistency across experiments, all feeding assays were performed at the same circadian time – Zeitgeber Time 0 (ZT0), corresponding to 10:00 AM when lights are turned on in our incubators. This time point coincides with the animals' natural morning feeding peak, allowing for reproducible comparisons across conditions. Regarding variability among flies within tubes, each biological replicate in the dye assay consisted of 1–2 flies, and results were averaged across multiple replicates. We observed good consistency across samples, suggesting that these small groups reliably reflect group-level feeding behavior under our conditions.

      (2) Biological replicates: whereas the number of samples is clearly reported in each figure, the number of biological replicates is not indicated. Please include this information either in Material and methods or in the relevant figure legends. Please also include a description of what was considered a biological replicate.

      We have now clarified in the Materials and Methods section under Statistics that all replicates represent independent biological samples, as suggested by the reviewers.

      (3) Control Lines: please indicate which control lines were used instead of citing another publication. If preferred, this information could be supplied as a supplementary table.

      We now provide a clear description of the control lines used in the Materials and Methods section. Specifically, all GAL4 and GAL80 lines used in this study were backcrossed for several generations into a shared w<sup>1118</sup> background and then crossed to the same w<sup>1118</sup> strain used as the genetic background for the UAS-RNAi, CRISPR, or overexpression lines. This approach ensures, to a strong approximation, that the only difference between control and experimental animals is the presence or absence of the UAS transgene.

      (4) Statistical analyses: for some results (e.g., those shown in Figure 3d), it could be useful to test the interaction between genotype and treatment.

      We thank the reviewer for this helpful suggestion. In response, we have now performed two-way ANOVA analyses to assess genotype × treatment (diet) interaction effects for the relevant data, including those shown in Figure 3d as well as additional panels where animals were exposed to oxidative stress and sleep phenotypes were measured. We have added the corresponding interaction p-values in the updated figure legends for Figures 3d, 3k, 5a–c, 5f, 5h, 5i, 6c, 6e, and 7e. All of these tests revealed significant interaction effects, supporting the conclusion that the observed differences in sleep phenotypes are specifically dependent on the interaction between genetic manipulation (e.g., cytokine or receptor knockdown) and oxidative stress. These additions reinforce the interpretation that Unpaired cytokine signaling, glial JAK-STAT pathway activity, and AstA receptor regulation functionally interact with intestinal ROS exposure to modulate sleep. We thank the reviewer for suggesting this improvement.
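      As an illustration of what a genotype × treatment interaction test measures, the following is a minimal sketch of the interaction F statistic for a balanced two-way design. It is not the analysis pipeline used for the manuscript (those tests were run in Prism); the function name `interaction_f` and the sleep values are hypothetical and chosen only to show the arithmetic. The sketch returns the F statistic and its degrees of freedom; converting F to a p-value requires the F distribution, which is left to a statistics package.

```python
from statistics import mean


def interaction_f(cells):
    """Interaction F statistic for a balanced two-way ANOVA.

    cells maps (genotype, treatment) -> list of measurements.
    Every cell must hold the same number of replicates (balanced design).
    Returns (F, df_interaction, df_within).
    """
    genotypes = sorted({g for g, _ in cells})
    treatments = sorted({t for _, t in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("design must be balanced")

    grand = mean([x for v in cells.values() for x in v])
    g_mean = {g: mean([x for t in treatments for x in cells[(g, t)]])
              for g in genotypes}
    t_mean = {t: mean([x for g in genotypes for x in cells[(g, t)]])
              for t in treatments}
    c_mean = {k: mean(v) for k, v in cells.items()}

    # Interaction: what remains of each cell mean after removing both main effects.
    ss_inter = n * sum(
        (c_mean[(g, t)] - g_mean[g] - t_mean[t] + grand) ** 2
        for g in genotypes for t in treatments
    )
    # Within-cell (residual) variation.
    ss_within = sum((x - c_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_inter = (len(genotypes) - 1) * (len(treatments) - 1)
    df_within = len(genotypes) * len(treatments) * (n - 1)
    f_stat = (ss_inter / df_inter) / (ss_within / df_within)
    return f_stat, df_inter, df_within


# Hypothetical daily sleep (minutes): controls sleep more on H2O2, knockdowns do not.
example = {
    ("control", "normal"): [400, 410, 420],
    ("control", "H2O2"): [500, 510, 520],
    ("knockdown", "normal"): [480, 490, 500],
    ("knockdown", "H2O2"): [470, 480, 490],
}
f_stat, df1, df2 = interaction_f(example)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

      A large interaction F relative to its degrees of freedom indicates that the effect of treatment differs between genotypes, which is the pattern expected if a knockdown abolishes the treatment-induced change.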

      (5) Reporting of p values. Some are reported as specific values whereas others are reported as less than a specific value. Please make this reporting consistent across different figures.

      All p-values reported in the manuscript are exact, except in cases where values fall below 0.0001. In those instances, we use the inequality (p < 0.0001) because the Prism software package (GraphPad, version 10), which was used for all statistical analyses, does not report more precise values. We believe this reporting approach reflects standard practice in the field.
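      The reporting rule described above can be stated compactly. The following is a small sketch under the assumption that values are shown to four decimal places; the helper name `format_p` and the `floor` parameter are hypothetical and are not part of Prism or of the manuscript's methods.

```python
def format_p(p, floor=0.0001):
    """Report an exact p-value, or an inequality when it falls below the floor."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p < floor:
        return f"p < {floor:.4f}"
    return f"p = {p:.4f}"


print(format_p(0.0312))
print(format_p(0.00002))
```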

      (6) Please include the color code used in each figure, either in the figure itself or in the legend.

      We have now clarified the color coding in all relevant figures. In particular, we acknowledge that the meaning of the half-colored circles used to indicate H<sub>2</sub>O<sub>2</sub> treatment was not previously explained. These have now been clearly labeled in each figure to indicate treatment conditions.

      (7) The scheme describing the experimental conditions and the associated chart is confusing. Please improve.

      We have improved the schematic by replacing “ROS” with “H<sub>2</sub>O<sub>2</sub>” to more clearly indicate the experimental condition used. Additionally, we have added the corresponding circle annotations so that they now also appear consistently above the relevant charts. This revised layout enhances clarity and helps readers more easily interpret the experimental conditions. We believe these changes address the reviewer’s concern and make the figure significantly more intuitive.

      (8) Please indicate which line was used for upd-Gal4 and the evidence that it faithfully reflects upd3 expression.

      We have now clarified in the Materials and Methods section that the upd3-GAL4 line used in our study is Bloomington stock #98420, which drives GAL4 expression under the control of approximately 2 kb of sequence upstream of the upd3 start codon. This line has previously been used as a transcriptional reporter for upd3 activity. The only use of this line was to illustrate reporter expression in the EECs. To support this aspect of Upd3 expression, we now include new data in the revised manuscript using fluorescent in situ hybridization (FISH) against upd3, which confirms the presence of upd3 transcripts in prospero-positive EECs of the adult midgut (Fig. S1b). Additionally, we show that upd3 transcript levels are significantly reduced in dissected midguts following EEC-specific knockdown using multiple independent RNAi lines driven by voilà-GAL4, both alone and in combination with R57C10-GAL80, consistent with endogenous expression in these cells (Fig. 1a,b).

      To further address the reviewer’s concern and provide additional support for the endogenous expression of upd3 in EECs, we performed targeted knockdown experiments focusing on molecularly defined EEC subpopulations. The adult Drosophila midgut contains two major EEC subtypes characterized by their expression of Allatostatin C (AstC) or Tachykinin (Tk), which together encompass the vast majority of EECs. To selectively manipulate these populations, we used AstC-GAL4 and Tk-GAL4 drivers – both knock-in lines in which GAL4 is inserted at the respective endogenous hormone loci. This design enables precise GAL4 expression in AstC- or Tk-expressing EECs based on their native transcriptional profile. To eliminate confounding neuronal expression, we combined these drivers with R57C10-GAL80, restricting GAL4 activity to the gut and generating AstC<sup>Gut</sup>> and Tk<sup>Gut</sup>> drivers. Using these tools, we knocked down upd2 and upd3 selectively in the AstC- or Tk-positive EECs. Knockdown of either cytokine in AstC-positive EECs significantly increased sleep under homeostatic conditions, recapitulating the phenotype observed with knockdown in all EECs (Fig. 1m-o). In contrast, knockdown of upd2 or upd3 in Tk-positive EECs had no effect on sleep (Fig. 1p-r). Furthermore, we show in the revised manuscript that selective knockdown of upd2 or upd3 in AstC-positive EECs abolishes the H<sub>2</sub>O<sub>2</sub>-induced increase in sleep (Fig. 3f–h). These findings demonstrate that Unpaired cytokine signaling from AstC-positive EECs is essential for mediating the sleep response to intestinal oxidative stress, highlighting this specific EEC subtype as a key source of cytokine-driven regulation in this context. These new results indicate that AstC-positive EECs are a primary source of the Unpaired cytokines that regulate sleep, while Tk-positive EECs do not appear to contribute to this function. Importantly, upd3 transcript levels were significantly reduced in dissected midguts following AstC<sup>Gut</sup>-driven knockdown (Fig. S1r), further confirming that upd3 is endogenously expressed in AstC-positive EECs. Thus, we have bolstered our confidence that upd3 is indeed expressed in EECs, as illustrated by the reporter line, through several means.

      (9) Please indicate which GFP line was used with upd-Gal4 (CD8, NLS, un-tagged, etc). The Material and Methods section states that it was "UAS-mCD8::GFP (#5137);", however, the stain does not seem to match a cell membrane pattern but rather a nuclear or cytoplasmic pattern. This information would help the interpretation of Figure 1C.

      We confirm that the GFP reporter line used with upd3-GAL4 was obtained from Bloomington stock #98420. As noted by the Bloomington Drosophila Stock Center, “the identity of the UAS-GFP transgene is a guess,” and the subcellular localization of the GFP fusion is therefore uncertain. We agree with the reviewer that the signal observed in Figure 1c does not display clear membrane localization and instead appears diffuse, consistent with cytoplasmic or partially nuclear localization. In any case, what we find most salient is the reporter’s labeling of Prospero-positive EECs in the adult midgut, consistent with upd3 expression in these cells. This conclusion is further supported by multiple lines of evidence presented in the revised manuscript, as mentioned above in response to question #8: (1) fluorescent in situ hybridization (FISH) for upd3 confirms expression in EECs (Fig. S1b), (2) EEC-specific RNAi knockdown of upd3 reduces transcript levels in dissected midguts, and (3) publicly available single-cell RNA sequencing datasets[3] also indicate that upd3 is expressed at low levels in a subset of adult midgut EECs under normal conditions. We have also clarified in the revised Materials and Methods section that GFP localization is undefined in the upd3-GAL4 line, to guide interpretation of the reporter signal.

      B- Results

      (1) Figure 1: According to previous work (10.1016/j.celrep.2015.06.009, http://flygutseq.buchonlab.com/data?gene=upd3%0D%0A), in basal conditions upd3 is expressed as follows: ISC (35 RPKM), EB (98 RPKM), EC (57 RPKM), and EEC (8 RPKM). Accordingly, even complete KO in EECs should eliminate only a small fraction of upd3 from whole guts, even less considering the greater abundance of other cell types such as ECs compared to EECs. It would be useful to understand where this discrepancy comes from, in case it is affecting the conclusion of the manuscript. While this point per se does not affect the main conclusions of the manuscript, it makes the interpretation of the results more difficult.

      We acknowledge the previously reported low expression of upd3 in EECs. However, the FlyGut-seq site appears to be no longer available, so we could not directly compare other related genes. Nonetheless, our data – based on in situ hybridization, reporter expression, and multiple RNAi knockdowns – consistently support upd3 expression in EECs. These complementary approaches strengthen the conclusion that EECs are an important source of systemic upd3 under the conditions tested.

      (2) Figure 1: The upd2-3 mutants show sleep defects very similar to those of EEC>RNAi and >Cas9. It would thus be helpful to try to KO upd3 with other midgut drivers (An EC driver like Myo1A or 5966GS and a progenitor driver like Esg or 5961GS) to validate these results. Such experiments might identify precisely which cells are involved in the gut-brain signaling reported here.

      We appreciate the reviewer’s suggestion and agree that exploring other potential sources of Upd3 in the gut is an interesting direction. In this study, we have focused on EECs, which are the primary hormone-secreting cells in the intestine and thus the most likely candidates for mediating systemic effects such as gut-to-brain signaling. While it is possible that other gut cell types – such as enterocytes (e.g., Myo1A<sup>+</sup>) or intestinal progenitors (e.g., Esg<sup>+</sup>) – also contribute to Upd3 production, these cells are not typically endocrine in nature. Demonstrating their involvement in gut-to-brain communication would therefore require additional, extensive validation beyond the scope of the current study. Importantly, our data show that manipulating Upd3 specifically in EECs is both necessary and sufficient to modulate sleep in response to intestinal ROS, strongly supporting the conclusion that EEC-derived cytokine signaling underlies the observed phenotype. In contrast, manipulating cytokines in other gut cells could produce indirect effects – such as altered proliferation, epithelial integrity, or immune responses – that complicate the interpretation of behavioral outcomes like sleep. For these reasons, we chose to focus on EECs as the source of endocrine signals mediating gut-to-brain communication. However, to address this point raised by the reviewer, we have now included a statement in the Discussion acknowledging that other non-endocrine gut cell types may also contribute to the systemic Unpaired signaling that modulates sleep in response to intestinal oxidative stress.

      (3) Figure 3: "This effect mirrored the upregulation observed with EEC-specific overexpression of upd3, indicating that it reflects physiologically relevant production of upd3 by the gut in response to oxidative stress." Please add (Figure 3a) at the end of this sentence.

      We have now added “(Figure 3a)” at the end of the sentence to clearly reference the relevant data.

      (4) For Figure 3b, do you have data showing that the increased amount of sleep was due to the addition of H2O2 per se, rather than the procedure of adding it?

      We have added new data to address this point. To ensure that the observed sleep increase was specifically due to the presence of H<sub>2</sub>O<sub>2</sub> and not an effect of the food replacement procedure, we performed a control experiment in which animals were fed standard food prepared using the same protocol and replaced daily, but without H<sub>2</sub>O<sub>2</sub>. These animals did not exhibit increased sleep, confirming that the sleep effect is attributable to intestinal ROS rather than the supplementation procedure itself (Fig. S3a). Thanks for the suggestion.

      (5) In the text it is stated that "Since 1% H2O2 feeding induced robust responses both in upd3 expression and in sleep behavior, we asked whether gut-derived Unpaired signaling might be essential for the observed ROS-induced sleep modulation. Indeed, EEC-specific RNAi targeting upd2 or upd3 abolished the sleep response to 1% H2O2 feeding." While it is indeed true that there is no additional increase in sleep time due to EEC>upd3 RNAi, it is also true that EEC>upd3 RNAi flies, without any treatment, have already increased their sleep in the first place. It is then possible that rather than unpaired signaling being essential, an upper threshold for maximum sleep allowed by manipulation of these processes was reached. It would be useful to discuss this point.

      Several findings argue against a ceiling effect and instead support a requirement for Unpaired signaling in mediating ROS-induced sleep. Animals with EEC-specific upd2 or upd3 knockdown or null mutation not only fail to increase sleep following H<sub>2</sub>O<sub>2</sub> treatment but actually exhibit reduced sleep during oxidative stress (Fig. 3e, k, l; Fig. 5e, f), suggesting that Unpaired signaling is required to sustain sleep under these conditions. Similarly, animals with glial dome knockdown also show reduced sleep under oxidative stress, closely mirroring the phenotype of EEC-specific upd3 RNAi animals (Fig. 5a–c, g–i). These results support the conclusion that gut-to-glia Unpaired cytokine signaling is necessary for maintaining elevated sleep during oxidative stress. In the absence of this signaling, animals exhibit increased wakefulness. We identify AstA as one such wake-promoting signal that is suppressed during intestinal stress. In the revised manuscript, we present new data showing that this pathway is downregulated not only via Unpaired-JAK/STAT signaling in glial cells but also through reduced AstA release from the gut. This model, in which Unpaired cytokines promote sleep during intestinal stress by suppressing arousal pathways, is discussed throughout the manuscript to address the reviewer’s point.

      (6) In Figure 3k, the dots highlighting the experiment show an empty profile, a full one, and a half one. Please define what the half dots represent.

      We have now clarified the color coding in all relevant figures. Specifically, we acknowledge that the meaning of the half-colored circles indicating H<sub>2</sub>O<sub>2</sub> treatment was not previously defined – it indicates washout or recovery time. In the revised version, these symbols are now clearly labeled in each figure to indicate the treatment condition, ensuring consistent and intuitive interpretation across all panels.

      (7) The authors used appropriate GAL4 and RNAi lines to knock down dome, a upd2/3 JAK-STAT-linked receptor, specifically in neurons and glia, respectively, in order to identify the CNS targets of upd2/3 cytokines produced by enteroendocrine cells (EECs). Pan-neuronal dome knockdown did not alter daytime sleep in adult females, yet pan-glial dome knockdown phenocopied effects of upd2/3 knockdown in EECs. They also observed that EEC-specific knockdown of upd2 and upd3 led to a decrease in JAK-STAT reporter activity in repo-positive glial cells. This supports the authors' conclusion that glial cells, not neurons, are the targets by which unpaired cytokines regulate sleep via JAK-STAT signaling. However, they do not show nighttime sleep data of pan-neuronal and pan-glial dome knockdowns. It would strengthen their conclusion if the nighttime sleep of pan-glial dome knockdown phenocopied the upd2/3 knockdowns as well, provided the pan-neuronal dome knockdown did not alter nighttime sleep.

      We have now added nighttime sleep data for both pan-glial and pan-neuronal domeless knockdowns in the revised manuscript (Fig. 2a). Glial knockdown increased nighttime sleep, similar to EEC-specific upd2/3 knockdown, while neuronal knockdown had no effect. These results further support glial cells as the relevant target of gut-derived Unpaired signaling.

      (8) The authors only used one method to induce oxidative stress (hydrogen peroxide feeding). It would strengthen their argument to test multiple methods of inducing oxidative stress, such as lipopolysaccharide (LPS) feeding. In addition, it would be useful to use a direct bacterial infection to confirm that in flies, the infection promotes sleep. Additionally, flies deficient in Dome in the BBB and infected should not be affected in their sleep by the infection. These experiments would provide direct support for the mechanism proposed. Finally, the authors should add a primary reference for using ROS as a model of bacterial infection and justify their choice better.

      We agree that directly comparing different models of intestinal stress, such as bacterial infection or LPS feeding, would provide valuable insight into how gut-derived signals influence sleep in response to infection. As noted in our detailed responses above, we now include an expanded rationale for our use of H<sub>2</sub>O<sub>2</sub> feeding as a controlled and well-established method for inducing intestinal ROS – one of the key physiological responses to enteric infection and inflammation. In the revised Discussion, we explicitly acknowledge that pathogenic infections – which trigger both intestinal ROS and additional immune pathways – may engage distinct or complementary mechanisms compared to chemically induced oxidative stress. We emphasize the importance of future studies aimed at dissecting these differences. In fact, we are actively pursuing this direction in ongoing work examining sleep responses to enteric infection. For the purposes of the present study, however, we chose to focus on a tractable and specific model of ROS-induced stress to define the contribution of Unpaired cytokine signaling to gut-brain communication and sleep regulation. This approach allowed us to isolate the effect of oxidative stress from other confounding immune stimuli and identify a glia-mediated signaling mechanism linking gut epithelial stress to changes in sleep behavior.

      (9) To confirm that animals lacking EEC Unpaired signaling are not more susceptible to ROS-induced damage, the authors assessed the survival of upd2 and upd3 knockdowns on 1% H2O2 and concluded they display no additional sensitivity to oxidative stress compared to controls. It may be useful to include other tests of sensitivity to oxidative stress, in addition to survival.

      We appreciate the reviewer’s suggestion. In our view, survival is a highly informative and stringent readout, as it reflects the overall physiological capacity of the animal to withstand oxidative stress. Importantly, our data show that animals lacking EEC-derived Unpaired signaling do not exhibit reduced survival following H<sub>2</sub>O<sub>2</sub> exposure, indicating that their oxidative stress resistance is not compromised. Furthermore, we previously confirmed that feeding behavior is unaffected in these animals, suggesting that their ability to ingest food (and thus the stressor) is not impaired. As a molecular complement to these assays in response to this point and others, we have also performed an assessment of neuronal apoptosis (a TUNEL assay, Fig. S3f,g). This assay did not identify an increase in cell death in the brains of animals fed peroxide-containing medium. Thus, gross neurological health, behavior, and overall survival appear to be resilient to the environmental treatment regime we apply here, suggesting that the outcomes we observe arise from signaling per se.

      (10) The authors confirmed that animals lacking EEC-derived upd3 displayed sleep suppression similar to controls in response to starvation. These results led the authors to conclude that there is a specific requirement for EEC-derived Unpaired signaling in responding to intestinal oxidative stress. However, they previously showed that EEC-specific knockdown of upd3 and upd2 led to increased daytime sleep under normal feeding conditions. Their interpretations of their data are inconsistent.

      We appreciate the reviewer’s comment. While animals lacking EEC-derived Unpaired signaling show increased baseline sleep under normal feeding conditions, they still exhibit a robust reduction in sleep when subjected to starvation – comparable to that of control animals (Fig. S3h–j). This demonstrates that they retain the capacity to appropriately modulate sleep in response to metabolic stress. Thus, the sleep-promoting phenotype under normal conditions does not reflect a generalized inability to adjust sleep behavior. Rather, it highlights a specific role for Unpaired signaling in mediating sleep responses to intestinal oxidative stress, not in broadly regulating all sleep-modulating stimuli.

      (11) The authors report a significant increase in JAK-STAT activity in surface glial cells at ZT0 in animals fed 1% H2O2-containing food for 20 hours. This response was abolished in animals with EEC-specific knockdown of upd2 or upd3. The authors confirmed there were no unintended neuronal effects on upd2 or upd3 expression in the heads. They also observed an upregulation of dome transcript levels in the heads of animals with EEC-specific knockdown of upd3 fed 1% H2O2-containing food for 15 hours, which they interpret to be a compensatory mechanism in response to low levels of the ligand. This assay is inconsistent with previous experiments in which animals were fed hydrogen peroxide for 20 hours.

      We thank the reviewer for identifying this discrepancy. The inconsistency arose from a labeling error in the manuscript. Both the JAK-STAT reporter assays in glial cells and the dome expression measurements were performed following 15 hours of H<sub>2</sub>O<sub>2</sub> feeding, not 20 hours as previously stated. We have now corrected this in the revised manuscript.

      (12) The authors show that animals with glia-specific dome knockdown did not have decreased survival on H2O2-containing food, and displayed normal rebound sleep in the morning following sleep deprivation. These results potentially undermine the significance of the paper. If the normal sleep response to oxidative stress is an important protective mechanism, why would oxidative stress not decrease survival in dome knockdown flies (that don't have the normal sleep response to oxidative stress)? This suggests that the proposed mechanism is not important for survival. The authors conclude that Dome-mediated JAK-STAT signaling in the glial cells specifically regulates ROS-induced sleep responses, which their results support.

      We agree that our survival data show that glial dome knockdown does not reduce survival under continuous oxidative stress. However, we believe this does not undermine the importance of the sleep response as an adaptive mechanism. In our survival assay, animals were continuously exposed to 1% H<sub>2</sub>O<sub>2</sub> without the opportunity to recover. In contrast, under natural conditions, oxidative stress is likely to be intermittent, and the ability to mount a sleep response may be particularly important for promoting recovery and maintaining homeostasis during or after transient stress episodes. Thus, while the JAK-STAT-mediated sleep response may not directly enhance survival under constant oxidative challenge, it likely plays a critical role in adaptive recovery under natural conditions.

      (13) Altogether, the authors conclude that enteric oxidative stress induces the release of Unpaired cytokines which activate the JAK-STAT pathway in subperineurial glia of the BBB, which leads to the glial downregulation of receptors for AstA, which is a wake-promoting factor also released by EECs. This mechanism is supported by their results, however, this research raises some intriguing questions, such as the role of upd2 versus upd3, the role of AstA-R1 versus AstA-R2, the importance of this mechanism in terms of survival, the sex-specific nature of this mechanism, and the role that nutritional availability plays in the dual functionality of Unpaired cytokine signaling in regards to sleep.

      We thank the reviewer for highlighting these important questions. Our data suggest that Upd2 and Upd3, while often considered partially redundant, both contribute to sleep regulation, with stronger effects observed for Upd3. This is consistent with prior studies indicating overlapping but non-identical roles for these cytokines. Similarly, although AstA-R1 and AstA-R2 can both be activated by AstA, knockdown of AstA-R2 consistently produces more robust sleep phenotypes, suggesting a predominant role in mediating this effect. The possibility of sex-specific regulation is indeed compelling. While our study focused on females, many gut hormones show sex-dependent activity, and we recognize this as an important avenue for future research. Finally, we have included new data in the revised manuscript showing that gut-derived AstA is downregulated under oxidative stress, further supporting our model in which Unpaired signaling suppresses arousal pathways during intestinal stress.

      (14) Data Availability: It is indicated that: "Reasonable data requests will be fulfilled by the lead author". However, eLife's guidelines for data sharing require that all data associated with an article be made freely and widely available.

      We thank the reviewer for pointing this out. We have revised the Data Availability section of the manuscript to clarify that all data will be made freely available from the lead contact without restriction, in accordance with eLife’s open data policy.

      References

      (1) Li, Y., Zhou, X., Cheng, C., Ding, G., Zhao, P., Tan, K., Chen, L., Perrimon, N., Veenstra, J.A., Zhang, L., and Song, W. (2023). Gut AstA mediates sleep deprivation-induced energy wasting in Drosophila. Cell Discov 9, 49. 10.1038/s41421-023-00541-3.

      (2) Ahrentlov, N., Kubrak, O., Lassen, M., Malita, A., Koyama, T., Frederiksen, A.S., Sigvardsen, C.M., John, A., Madsen, P., Halberg, K.A., et al. (2025). Protein-responsive gut hormone Tachykinin directs food choice and impacts lifespan. Nature Metabolism. 10.1038/s42255-025-01267-0.

      (3) Li, H., Janssens, J., De Waegeneer, M., Kolluru, S.S., Davie, K., Gardeux, V., Saelens, W., David, F.P.A., Brbic, M., Spanier, K., et al. (2022). Fly Cell Atlas: A single-nucleus transcriptomic atlas of the adult fruit fly. Science 375, eabk2432. 10.1126/science.abk2432.

      (4) Kubrak, O., Koyama, T., Ahrentlov, N., Jensen, L., Malita, A., Naseem, M.T., Lassen, M., Nagy, S., Texada, M.J., Halberg, K.V., and Rewitz, K. (2022). The gut hormone Allatostatin C/Somatostatin regulates food intake and metabolic homeostasis under nutrient stress. Nature Communications 13, 692. 10.1038/s41467-022-28268-x.

      (5) Malita, A., Kubrak, O., Koyama, T., Ahrentlov, N., Texada, M.J., Nagy, S., Halberg, K.V., and Rewitz, K. (2022). A gut-derived hormone suppresses sugar appetite and regulates food choice in Drosophila. Nature Metabolism 4, 1532-1550. 10.1038/s42255-022-00672-z.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In "Changes in wing morphology..." Roy et al investigate the potential allometric scaling in wing morphology and wing kinematics in 8 different hoverfly species. Their study nicely combines different new and classic techniques, investigating flight in an important, yet understudied alternative pollinator. I want to emphasize that I have been asked to review this from a hoverfly biology perspective, as I do not work on flight kinematics. I will thus not review that part of the work.

      Strengths:

      The paper is well-written and the figures are well laid out. The methods are easy to follow, and the rationale and logic for each experiment are easy to follow. The introduction sets the scene well, and the discussion is appropriate. The summary sentences throughout the text help the reader.

      We thank the reviewer for these positive comments on our study.

      Weaknesses:

      The ability to hover is described as useful for either feeding or mating. However, several of the North European species studied here would not use hovering for feeding, as they tend to land on the flowers that they feed from. I would therefore argue that the main selection pressure for hovering ability could be courtship and mating. If the authors disagree with this, they could back up their claims with the literature.

      We thank the reviewer for this insight on potential selection pressures on hovering flight. As suggested, we now place the main emphasis on selection related to mating flight (lines 106–111).

      On that note, a weakness of this paper is that the data for both sexes are merged. If we agree that hovering may be a sexually dimorphic behaviour, then merging flight dynamics from males and females could be an issue in the interpretation. I understand that separating males from females in the movies is difficult, but this could be addressed in the Discussion, to explain why you do not (or do) think that this could cause an issue in the interpretation.

      We acknowledge that not distinguishing the sexes in the flight experiment prevents us from investigating the hypothesis that selection may act especially on male flight. This weakness was not addressed in our first manuscript and is now discussed in the revised Discussion section. We have nuanced the interpretation and suggest further investigation of flight dimorphism (lines 726–729).

      The flight arena is not very big. In my experience, it is very difficult to get hoverflies to fly properly in smaller spaces, and definitely almost impossible to get proper hovering. Do you have evidence that they were flying "normally" and not just bouncing between the walls? How long was each 'flight sequence'? You selected the parts with the slowest flight speed, presumably to get as close to hovering as possible, but how sure are you that this represented proper hovering and not a brief slowdown of thrust?

      We very much agree with the reviewer that flight studied in laboratory conditions does not perfectly reflect natural flight behavior. Moreover, having individual hoverflies perform stable hovering in the flight arena, in the intersecting field of view of all three cameras, is quite challenging. Therefore, we do not claim that we studied “true” hovering (i.e. flight speed = 0 m/s), but that we attempted to get as close as possible to true hovering by selecting the flight sections with the lowest flight speeds for our analysis.

      In most animal flight studies, hovering is defined as flight with advance ratios J<0.1, i.e. when the forward flight speed is less than 10% of the wingbeat-induced speed of the wingtip (Ellington, 1984a; Fry et al., 2005; Liu and Sun, 2008). By selecting the low flight-speed wingbeats for our analysis, the mean advance ratio in our experiment was 0.08±0.02 (mean±sd), providing evidence that the hoverflies were operating close to a hovering flight mode. This is explained in both the methods and results sections (lines 228–231 and 467–469, respectively).
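
      The hovering criterion above can be illustrated with a minimal sketch. It assumes the common form of the advance ratio, J = V / (2ΦfR), where V is forward flight speed, Φ the stroke amplitude in radians, f the wingbeat frequency, and R the wing length; the function names and the numerical values below are hypothetical and are not taken from the manuscript.

```python
import math


def advance_ratio(flight_speed, stroke_amplitude_deg, wingbeat_frequency, wing_length):
    """Advance ratio J = V / (2 * Phi * f * R), with Phi converted to radians.

    Assumed definition: forward speed divided by the mean wingtip speed
    induced by the wingbeat.
    """
    phi = math.radians(stroke_amplitude_deg)
    mean_wingtip_speed = 2.0 * phi * wingbeat_frequency * wing_length
    return flight_speed / mean_wingtip_speed


def is_hovering(j, threshold=0.1):
    """Apply the J < 0.1 criterion for hovering quoted in the text."""
    return j < threshold


# Hypothetical wingbeat: 0.3 m/s forward speed, 90 degree stroke amplitude,
# 200 Hz wingbeat frequency, 8 mm wing length (illustrative values only).
j = advance_ratio(
    flight_speed=0.3,
    stroke_amplitude_deg=90.0,
    wingbeat_frequency=200.0,
    wing_length=0.008,
)
print(f"J = {j:.3f}, hovering: {is_hovering(j)}")
```

      With these illustrative inputs the wingbeat falls below the threshold; the same check could be applied per wingbeat when selecting the slowest flight sections.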

      We nevertheless acknowledge that this definition of hovering, although generally accepted, is not perfect. We edited the manuscript to clarify that our experiment does not quantify perfect hovering (lines 186–188). We also added the mean±sd duration of the recorded flight sequences from which the slowest wingbeat was selected (line 179), as this information was missing, and we further describe the behaviour of the hoverflies during the experiment (lines 168–169).

      Your 8 species are evolutionarily well-spaced, but as they were all selected from a similar habitat (your campus), their ecology is presumably very similar. Can this affect your interpretation of your data? I don't think all 6000 species of hoverflies could be said to have similar ecology - they live across too many different habitats. For example, on line 541 you say that wingbeat kinematics were stable across hoverfly species. Could this be caused by their similar habitat?

      We agree with the reviewer that similarity in habitat and ecology might partially explain the similarity in the wingbeat kinematics that we observe. But this similarity in ecology between the eight studied species is in fact a design feature of our study. Here, we aim to study the effect of size on hoverfly flight, and so we designed our study such that we maximize size differences and phylogenetic spread among the eight species, while minimizing variations in habitat, ecology and flight behavior (~hovering). This allows us to best test for the effect of differences in size on the morphology, kinematics and aerodynamics of hovering flight.

      Despite this, we agree with the reviewer that it would be interesting to test whether the observed allometric morphological scaling and kinematic similarity is also present beyond the species that we studied. In our revision, we therefore extended our analysis to address this question. Performing additional flight experiments and fluid mechanics simulations was beyond the scope of our current study, but extending the morphological scaling analyses was certainly possible.

      In our revised study, we therefore extended our morphological scaling analysis by including the morphology of twenty additional hoverfly species. This extended dataset includes wing morphology data of 74 museum specimens from Naturalis Biodiversity Centre (Leiden, the Netherlands), including two males and two females per species, whenever possible (4.2±1.7 individuals per species (mean±sd)). This extended analysis shows that the allometric scaling of wing morphology with size is robust across the larger sample of species, from a wider range of habitats and ecologies. Nevertheless, we advocate for additional flight measurements in species from different habitats to ascertain the generality of our results (lines 729–732).

      Reviewer #2 (Public review):

      Summary

      Le Roy et al quantify wing morphology and wing kinematics across eight hoverfly species that differ in body mass; the aim is to identify how weight support during hovering is ensured. Wing shape and relative wing size vary significantly with body mass, but wing kinematics are reported to be size-invariant. On the basis of these results, it is concluded that weight support is achieved solely through size-specific variations in wing morphology and that these changes enabled hoverflies to decrease in size throughout their phylogenetic history. Adjusting wing morphology may be preferable compared to the alternative strategy of altering wing kinematics, because kinematics may be under strong evolutionary and ecological constraints, dictated by the highly specialised flight and ecology of the hoverflies.

      Strengths

      The study deploys a vast array of challenging techniques, including flight experiments, morphometrics, phylogenetic analysis, and numerical simulations; it so illustrates both the power and beauty of an integrative approach to animal biomechanics. The question is well motivated, the methods appropriately designed, and the discussion elegantly and convincingly places the results in broad biomechanical, ecological, evolutionary, and comparative contexts.

      We thank the reviewer for appreciating the strengths of our study.

      Weaknesses

      (1) In assessing evolutionary allometry, it is key to identify the variation expected from changes in size alone. The null hypothesis for wing morphology is well-defined (isometry), but the equivalent predictions for kinematic parameters remain unclear. Explicit and well-justified null hypotheses for the expected size-specific variation in angular velocity, angle-of-attack, stroke amplitude, and wingbeat frequency would substantially strengthen the paper, and clarify its evolutionary implications.

      We agree with the reviewer that the expected scaling of wingbeat kinematics with size was indeed unclear in our initial version of the manuscript. In our revised manuscript (and supplement), we now explicitly define how all kinematic parameters should scale with size under kinematic similarity, and how they should scale for maintaining weight support across various sizes. These are explained in the introduction (lines 46–78), method section (lines 316–327), and dedicated supplementary text (see Supplementary Info section “Geometric and kinematic similarity and scaling for weight support”). Here, we now also provide a thorough description of the isometric scaling of morphology, and scaling of the kinematics parameters under kinematic similarity.

      (2) By relating the aerodynamic output force to wing morphology and kinematics, it is concluded that smaller hoverflies will find it more challenging to support their body mass - a scaling argument that provides the framework for this work. This hypothesis appears to stand in direct contrast to classic scaling theory, where the gravitational force is thought to present a bigger challenge for larger animals, due to their disadvantageous surface-to-volume ratios. The same problem ought to occur in hoverflies, for wing kinematics must ultimately be the result of the energy injected by the flight engine: muscle. Much like in terrestrial animals, equivalent weight support in flying animals thus requires a positive allometry of muscle force output. In other words, if a large hoverfly is able to generate the wing kinematics that suffice to support body weight, an isometrically smaller hoverfly should be, too (but not vice versa). Clarifying the relation between the scaling of muscle force input, wing kinematics, and weight support would resolve the conflict between these two contrasting hypotheses, and considerably strengthen the biomechanical motivation and interpretation.

      The reviewer highlights a crucial aspect of our study: our perspective on the aerodynamic challenges associated with becoming smaller or larger. This comment made us realize that our viewpoint might be unconventional regarding general scaling literature and requires further clarification.

      Our approach is focused on the disadvantage of a reduction in size, in contrast with classic scaling theory, which focuses on the disadvantage of an increase in size. As correctly stated by the reviewer, producing an upward-directed force to maintain weight support is often considered the main challenge constrained by size. Here, researchers often focus on the limitations of the motor system, and specifically muscle force: as animals increase in size, the ability to achieve weight support is limited by muscle force availability. Isometric growth in muscle cannot sustain the increased weight, due to the disadvantageous surface-to-volume ratio.

      In animal flight, this detrimental effect of size on the muscular motor system is also present, particularly for large flying birds. But for natural flyers, there is also a detrimental effect of size on the propulsion system, namely the flapping wings. The aerodynamic forces produced by a beating wing scale linearly with the second-moment-of-area of the wing. Under isometry, this second-moment-of-area decreases at a higher rate than body mass, and thus producing enough lift for weight support becomes more challenging with decreasing size. Because we study tiny insects, our study focuses precisely on this constraint on the wing-based propulsion system, and not on the muscular motor system.
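
      The scaling argument in this paragraph can be made concrete with a short sketch. It assumes strict isometry (all lengths scale with mass to the power 1/3, so the second-moment-of-area scales with length to the power 4) and size-invariant wingbeat kinematics, so that aerodynamic force is simply proportional to the second-moment-of-area. The function name is hypothetical, and these assumptions are a simplification rather than the model used in the manuscript.

```python
def isometric_force_to_weight(mass_ratio):
    """Relative force-to-weight ratio of an isometrically scaled flyer.

    mass_ratio is the mass of the scaled animal divided by the mass of a
    reference animal. Kinematics are assumed identical for both (assumption).
    """
    length_ratio = mass_ratio ** (1.0 / 3.0)  # lengths scale as m^(1/3)
    s2_ratio = length_ratio ** 4              # second-moment-of-area scales as L^4
    force_ratio = s2_ratio                    # force proportional to S2 at fixed kinematics
    return force_ratio / mass_ratio           # weight scales linearly with mass


# An animal with one eighth of the reference mass, scaled isometrically.
ratio = isometric_force_to_weight(0.125)
print(f"force-to-weight relative to reference: {ratio:.2f}")
```

      Under these assumptions the force-to-weight ratio scales as mass to the power 1/3, so a flyer with one eighth of the mass would generate only about half the weight-normalized force unless its wing morphology or kinematics deviate from isometry.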

      We revised the manuscript to better explain how physical scaling laws differentially affect force production by the muscular flight motor system and the wingbeat-induced propulsion system (lines 46–78).

      (3) The main conclusion - that evolutionary miniaturization is enabled by changes in wing morphology - is only weakly supported by the evidence. First, although wing morphology deviates from the null hypothesis of isometry, the difference is small, and hoverflies about an order of magnitude lighter than the smallest species included in the study exist. Including morphological data on these species, likely accessible through museum collections, would substantially enhance the confidence that size-specific variation in wing morphology occurs not only within medium-sized but also in the smallest hoverflies, and has thus indeed played a key role in evolutionary miniaturization.

      We thank the reviewer for the suggestion to add specimens from museum collections to strengthen the conclusions of our work. In our revised study, we did so by adding the morphology of 20 additional hoverfly species from the Naturalis Biodiversity Centre (Leiden, the Netherlands). This extended dataset includes wing morphology data of 74 museum specimens, and whenever possible we sampled at least two males and two females (4.2±1.7 individuals per species (mean±sd)). This extended analysis shows that the allometric scaling of wing morphology with size is robust across the larger sample of species, including smaller ones. We now discuss these additional results explicitly in the revised manuscript (see Discussion).

      Second, although wing kinematics do not vary significantly with size, clear trends are visible; indeed, the numerical simulations revealed that weight support is only achieved if variations in wing beat frequency across species are included. A more critical discussion of both observations may render the main conclusions less clear-cut, but would provide a more balanced representation of the experimental and computational results.

      We agree with the reviewer that variations in wingbeat kinematics between species, and specifically wingbeat frequency, are important and non-negligible. As mentioned by the reviewer, this is most apparent from the fact that weight support is only achieved with the species-specific wingbeat frequency. To address this in a more balanced and thorough way, we revised the final section of our analysis approach by including changes in wingbeat kinematics in that analysis. By doing so, we now explicitly show that allometric changes in wingbeat frequency are important for maintaining weight support across the sampled size range, but that allometric scaling of morphology has a stronger effect. In fact, the relative contributions of morphology and kinematics to maintaining weight support across sizes are 81% and 22%, respectively (Figure 7). We now discuss this new analysis and its results thoroughly in the revised manuscript (lines 621–629, 650–664), resulting in a more balanced discussion and conclusion about the outcome of our study. We sincerely thank the reviewer for suggesting that we look more closely into the effect of variations in wingbeat kinematics on aerodynamic force production, as the revised analysis strengthened the study and its results.

      In many ways, this work provides a blueprint for work in evolutionary biomechanics; the breadth of both the methods and the discussion reflects outstanding scholarship. It also illustrates a key difficulty for the field: comparative data is challenging and time-consuming to procure, and behavioural parameters are characteristically noisy. Major methodological advances are needed to obtain data across large numbers of species that vary drastically in size with reasonable effort, so that statistically robust conclusions are possible.

      We thank the reviewer for their encouraging words about the scholarship of our work. We will continue to improve our methods and techniques for performing comparative evolutionary biomechanics research, and are happy to jointly develop this emerging field of research.

      Reviewer #3 (Public review):

      The paper by Le Roy and colleagues seeks to ask whether wing morphology or wing kinematics enable miniaturization in an interesting clade of agile flying insects. Isometry argues that insects cannot maintain both the same kinematics and the same wing morphology as body size changes. This raises a long-standing question of which varies allometrically. The authors do a deep dive into the morphology and kinematics of eight specific species across the hoverfly phylogeny. They show broadly that wing kinematics do not scale strongly with body size, but several parameters of wing morphology do in a manner different from isometry leading to the conclusion that these species have changed wing shape and size more than kinematics. The authors find no phylogenetic signal in the specific traits they analyze and conclude that they can therefore ignore phylogeny in the later analyses. They use both a quasi-steady simplification of flight aerodynamics and a series of CFD analyses to attribute specific components of wing shape and size to the variation in body size observed. However, the link to specific correlated evolution, and especially the suggestion of enabling or promoting miniaturization, is fraught and not as strongly supported by the available evidence.

      We thank the reviewer for the accurate description of our work, and the time and energy put into reviewing our paper. We regret that the reviewer found our conclusions with respect to miniaturization fraught and not strongly supported by the evidence. In our revision, we addressed this by no longer focusing primarily on miniaturization, by extending our morphology analysis to 20 additional species (Figures 4 and 5), improving our analysis of both the kinematics and morphology data (Figure 7), and by discussing our results in a more balanced way (see Discussion). We hope that the reviewer finds the revised manuscript of sufficient quality for publication in eLife.

      The aerodynamic and morphological data collection, modeling, and interpretation are very strong. The authors do an excellent job combining a highly interpretable quasi-steady model with CFD and geometric morphometrics. This allows them to directly parse out the effects of size, shape, and kinematics.

      We thank the reviewer for assessing our experimental and modelling approach as very strong.

      Despite the lack of a relationship between wing kinematics and size, there is a large amount of kinematic variation across the species and individual wing strokes. The absolute differences in Figure 3F - I could have a very large impact on force production but they do indeed not seem to change with body size. This is quite interesting and is supported by aerodynamic analyses.

      We agree with the reviewer that there are important and non-negligible variations in wingbeat kinematics between species. As mentioned by the reviewer, although these kinematics do not scale significantly with body mass, the interspecific variations are important for maintaining weight support during hovering flight. We thus also agree with the reviewer that these kinematic variations are interesting and deserve further investigation.

      In our revised study, we did so by including these wingbeat kinematic variations in our analysis of the effect of variations in morphology and kinematics on aerodynamic force production for maintaining in-flight weight support across the sampled size range (lines 422–444, Figure 7). By doing so, we now explicitly show that variations in wingbeat kinematics are important for maintaining weight support across sizes, but that allometric scaling of morphology has a stronger effect. In fact, the relative contributions of adaptations in morphology and kinematics to maintaining weight support across sizes are 81% and 22%, respectively (Figure 7). We now discuss this new analysis and its results in the revised manuscript (lines 621–629, 650–664), resulting in a more balanced discussion about the relative importance of adaptations in morphology and kinematics. We hope the reviewer appreciates this newly added analysis.

      The authors switch between analyzing their data based on individuals and based on species. This creates some pseudoreplication concerns in Figures 4 and S2 and it is confusing why the analysis approach is not consistent between Figures 4 and 5. In general, the trends appear to be robust to this, although the presence of one much larger species weighs the regressions heavily. Care should be taken in interpreting the statistical results that mix intra- and inter-specific variation in the same trend.

      We agree that it was sometimes unclear whether our analysis was performed at the individual or species level. To improve clarity and avoid pseudoreplication, we now analyze all data at the species level, using phylogenetically informed analyses. Because we think that showing within-species variation is nonetheless informative, we added dedicated figures to the supplement (Figures S3 and S5) in which we show data at the individual level, as equivalents of Figures 4 and 5, which show data at the species level. Note that this cannot be done for flight data due to our experimental procedure. Because we performed flight experiments with multiple individuals in a single experimental setup, pseudoreplication is possible for these flight data. This is explained in the manuscript (lines 167–175). All morphological measurements were, however, made on a carefully organized series of specimens, so pseudoreplication is not possible there.

      The authors based much of their analyses on the lack of a statistically significant phylogenetic signal. The statistical power for detecting such a signal is likely very weak with 8 species. Even if there is no phylogenetic signal in specific traits, that does not necessarily mean that there is no phylogenetic impact on the covariation between traits. Many comparative methods can test the association of two traits across a phylogeny (e.g. a phylogenetic GLM) and a phylogenetic PCA would test if the patterns of variation in shape are robust to phylogeny.

      After extending our morphological dataset from 8 to 28 species, by including 20 additional species from a museum collection, we increased statistical power and found a significant phylogenetic signal in all morphological traits, except for the second moment of area (lines 458–460, Table S2). Although we do not detect an effect of phylogeny on flight traits, likely due to the limited number of species for which flight was quantified (n=8), we agree with the reviewer’s observation that the absence of a phylogenetic signal does not rule out the potential influence of phylogeny on the covariation between traits. This is now explicitly discussed in the manuscript (lines 599–608). As mentioned in the previous comment, we now test all relationships between body mass and other traits using phylogenetic generalized least squares (PGLS) regressions, therefore accounting for the impact of phylogeny everywhere. The revised analyses produce results broadly similar to those of our initial study, and so the main conclusions remain valid. We sincerely thank the reviewer for their suggestion to revise our statistical analysis, because the revised phylogenetic analysis strengthens our study as a whole.
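
      As a rough illustration of what a PGLS regression computes, the following sketch fits a generalized least squares line in which the residual covariance between species is proportional to a phylogenetic covariance matrix, as expected under a Brownian motion model. This is not the authors' pipeline: the function name, the four-taxon covariance matrix, and the trait values are hypothetical, and a real analysis would typically use a dedicated comparative-methods package.

```python
import numpy as np


def pgls_fit(x, y, phylo_cov):
    """Return [intercept, slope] of a GLS regression of y on x.

    phylo_cov is the expected covariance among species (shared branch
    lengths under Brownian motion). Estimate:
    beta = (X' C^-1 X)^-1 X' C^-1 y
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    cinv_design = np.linalg.solve(phylo_cov, design)  # C^-1 X
    cinv_y = np.linalg.solve(phylo_cov, y)            # C^-1 y
    return np.linalg.solve(design.T @ cinv_design, design.T @ cinv_y)


# Hypothetical tree ((A,B),(C,D)): sister species share half of the total depth.
phylo_cov = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
])

# Hypothetical log10 body mass and a log10 wing trait that follows an exact
# allometric line with intercept 1.0 and slope 0.75.
log_mass = np.array([0.0, 0.5, 1.0, 2.0])
log_trait = 1.0 + 0.75 * log_mass

beta = pgls_fit(log_mass, log_trait, phylo_cov)
print("intercept, slope:", beta)
```

      With an identity covariance matrix the estimate reduces to ordinary least squares, which is one way to see how the phylogenetic covariance reweights closely related species.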

      The analysis of miniaturization on the broader phylogeny is incomplete. The conclusion that hoverflies tend towards smaller sizes is based on an ancestral state reconstruction. This is difficult to assess because of some important missing information. Specifically, such reconstructions depend on branch lengths and the model of evolution used, which were not specified. It was unclear how the tree was time-calibrated. Most often ancestral state reconstructions utilize a maximum likelihood estimate based on a Brownian motion model of evolution but this would be at odds with the hypothesis that the clade is miniaturizing over time. Indeed such an analysis will be biased to look like it produces a lot of changes towards smaller body size if there is one very large taxa because this will heavily weight the internal nodes. Even within this analysis, there is little quantitative support for the conclusion of miniaturization, and the discussion is restricted to a general statement about more recently diverged species. Such analyses are better supported by phylogenetic tests of directedness in the trait over time, such as fitting a model with an adaptive peak or others.

      We thank the reviewer for their expert insight into our ancestral state estimate of body size. We agree that the accuracy of this estimate is rather low. Based on the reviewer's comments, we have revised our main analysis and results so that they are no longer based on the apparent evolutionary miniaturization of hoverflies, but instead on the observed variation in size among our studied hoverfly species. As a result, we removed the figure mapping ancestral state estimates (called Figure S1 in the first version) from the manuscript. We now explicitly mention that ascertaining the evolutionary directedness of body size is beyond the scope of our work, but that we nonetheless focus on the aerodynamic challenge of size reduction (lines 609–615).

      Setting aside whether the clade as a whole tends towards smaller size, there is a further concern about the correlation of variation in wing morphology and changes in size (and the corresponding conclusion about lack of co-evolution in wing kinematics). Showing that there is a trend towards smaller size and a change in wing morphology does not test explicitly that these two are correlated with the phylogeny. Moreover, the subsample of species considered does not appear to recapitulate the miniaturization result of the larger ancestral state reconstruction.

      As also mentioned above, we agree with the reviewer that we cannot ascertain the trajectory of body size evolution in the diversification of hoverflies. We therefore revised our manuscript such that we no longer focus explicitly on miniaturization; instead, we discuss how morphology and kinematics scale with size, independently of potential trends over the phylogeny. To do so, we revised the title, abstract, results and discussion accordingly.

      Given the limitations of the phylogenetic comparative methods presented, the authors did not fully support the general conclusion that changes in wing morphology, rather than kinematics, correlate with or enable miniaturization. The aerodynamic analysis across the 8 species does however hold significant value and the data support the conclusion as far as it extends to these 8 species. This is suggestive but not conclusive that the analysis of consistent kinematics and allometric morphology will extend across the group and extend to miniaturization. Nonetheless, hoverflies face many shared ecological pressures on performance and the authors summarize these well. The conclusions of morphological allometry and conserved kinematics are supported in this subset and point to a clade-wide pattern without having to support an explicit hypothesis about miniaturization.

      The reviewer is entirely correct that we should be careful about extending our analysis based on eight species to hoverflies in general, and especially about extending it to miniaturization in this family of insects. As mentioned above, we therefore no longer focus specifically on miniaturization. Moreover, we extended our analysis by including the morphology of 20 additional species of hoverflies, sampled from a museum collection. We hope that the reviewer agrees with this more balanced and focused discussion of our study.

      The data and analyses on these 8 species provide an important piece of work on a group of insects that are receiving growing attention for their interesting behaviors, accessibility, and ecologies. The conclusions about morphology vs. kinematics provide an important piece to a growing discussion of the different ways in which insects fly. Sometimes morphology varies, and sometimes kinematics depending on the clade, but it is clear that morphology plays a large role in this group. The discussion also relates to similar themes being investigated in other flying organisms. Given the limitations of the miniaturization analyses, the impact of this study will be limited to the general question of what promotes or at least correlates with evolutionary trends towards smaller body size and at what phylogenetic scale body size is systematically decreasing.

      We thank the reviewer for their encouraging words about the importance of our work on hoverfly flight. As suggested by the reviewer, we narrowed down the main question of our study by no longer focusing on apparent miniaturization, but instead on the correlation between wing morphology, wingbeat kinematics and variations in size.

      In general, there is an important place for work that combines broad phylogenetic comparison of traits with more detailed mechanistic studies on a subset of species, but a lot of care has to be taken about how the conclusions generalize. In this case, since the miniaturization trend does not extend to the 8 species subsample of the phylogeny and is only minimally supported in the broader phylogeny, the paper warrants a narrower conclusion about the connection between conserved kinematics and shared life history/ecology.

      We truly appreciate the reviewer’s positive assessment of the importance of our work and study. We also thank the reviewer for their advice to generalize the outcome of our work in a more balanced way. Based on the above comments and suggestions of the reviewer, we did so by revising several aspects of our study, including adding species to our dataset, amending the analysis, and revising the title, abstract, results and discussion sections. We hope that the reviewer finds the revised manuscript of sufficient quality for final publication in eLife.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations for the authors):

      Figure S1 is lovely. I would recommend merging it with Figure 1 so that it does not disappear.

      We appreciate the reviewer's comment. However, reviewer 3 had several points of concern about the underlying analysis, which made us realize that our ancestral state estimation analysis does not conclusively support a miniaturization trend. We therefore no longer focus on miniaturization when interpreting our results.

      Figure 4 is beautiful. The consistent color coding throughout is very helpful.

      We thank the reviewer for this comment.

      Sometimes spaces are missing before brackets, and sometimes there are double brackets, or random line break.

      We did our best to remove these typos.

      Should line 367 refer to Table S2?

      Table S2 is now referred to when mentioning the result of the phylogenetic signal analysis (line 460 in the revised manuscript).

      Can you also refer to Figure 2 on line 377?

      Good suggestion, and so we now do so (line 462 in the revised manuscript).

      Lines 497-512: Please refer to relevant figures.

      We now refer to figure 4, and its panels (lines 621–629 in the revised manuscript).

      Figure legend 1: Do you need to say that the second author took the photos?

      We removed this reference.

      Figure legend 4: "(see top of A and B)" is not aligned with the figure layout.

      We corrected this.

      Figure 5 seems to have a double legend, A, B then A, B. Panel A says it's color-coded for body mass, but the figure seems to be color-coded for species.

      Thank you for noting this. We corrected this in the figure legend.

      Figure 6 legend: Can you confidently say that they were hovering, or do you need to modify this to flying?

      The CFD simulations were performed in full hovering (U<sub>∞</sub>=0 m/s), but freely flying hoverflies will by definition never hover perfectly. As explained in our manuscript, we define a hovering flight mode as flying with advance ratios smaller than 0.1 (Ellington, 1984a). Based on this, we can state that our hoverflies were flying in a hovering mode. We hope that the reviewer agrees with this approach.
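
      The hovering criterion above can be illustrated with a minimal sketch. It assumes the common definition of the advance ratio as flight speed divided by twice the product of stroke amplitude (in radians), wingbeat frequency and wing length; the function names and the example values are hypothetical and are not taken from the manuscript.

```python
import math

# Threshold below which flight is treated as hovering (value quoted in the response).
HOVER_THRESHOLD = 0.1


def advance_ratio(flight_speed, stroke_amplitude, wingbeat_frequency, wing_length):
    """Advance ratio J = U / (2 * Phi * f * R).

    Assumed definition: flight speed U (m/s) relative to the mean wingtip
    speed, with stroke amplitude Phi in radians, wingbeat frequency f in Hz
    and wing length R in metres.
    """
    return flight_speed / (2.0 * stroke_amplitude * wingbeat_frequency * wing_length)


def is_hovering(flight_speed, stroke_amplitude, wingbeat_frequency, wing_length):
    """True if the advance ratio is below the hovering threshold."""
    j = advance_ratio(flight_speed, stroke_amplitude, wingbeat_frequency, wing_length)
    return j < HOVER_THRESHOLD


# Hypothetical example: a slow drift of 0.1 m/s still counts as hovering.
slow = is_hovering(0.1, math.radians(120), 180.0, 0.010)
fast = is_hovering(2.0, math.radians(120), 180.0, 0.010)
```

      With these made-up values, a slow drift stays well below the threshold, whereas forward flight at 2 m/s does not.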

      Reviewer #2 (Recommendations for the authors):

      Below, I provide more details on the arguments made in the public review, as well as a few additional comments and observations; further detailed comments are provided in the word document of the manuscript file, which was shared with the authors via email (I am not expecting a point-by-point reply to all comments in the word document!).

      We thank the reviewer for this detailed list of additional comments, here and in the manuscript. As suggested by the reviewer, we did not provide a point-by-point response to all comments in the manuscript file, but did take them into account when improving our revised manuscript. Most importantly, we now explicitly define kinematic similarity as the equivalent of morphological similarity (isometry), we added a null hypothesis and the proposed references, and we revised the figures based on the reviewer's suggestions.

      Null hypotheses for kinematic parameters.

      Angular amplitudes should be size-invariant under isometry. The angular velocity is more challenging to predict, and two reasonable options exist. Conservation of energy implies:

      W = 1/2 I ω<sup>2</sup>

      where I is the mass moment of inertia and W is the muscle work output (I note that this result is approximate, for it ignores external forces; this is likely not a bad assumption to first order. See the reference provided below for a more detailed discussion and more complicated calculations). From this expression, two reasonable hypotheses may be derived.

      First, in line with classic scaling theory (Hill, Borelli, etc), it may be assumed that W ∝ m; isometry implies that I ∝ m<sup>5/3</sup>, from which ω ∝ m<sup>-1/3</sup> follows at once. Note well the implication with respect to eq. 1: isometry now implies F ∝ m<sup>2/3</sup>, so that weight support presents a bigger challenge for larger animals; this result is completely analogous to the same problem in terrestrial animals, which has received much attention, but in strong contrast to the argument made by the authors: weight support is more challenging for larger animals, not for smaller animals.

      Second, in line with recent arguments, one may surmise that the work output is limited by the muscle shortening speed instead, which, assuming isometry and isophysiology, implies ω ∝ m<sup>0</sup> = constant; smaller animals would then indeed be at a seeming disadvantage, as suggested by the authors (but see below).
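
      The exponent arithmetic behind these two hypotheses can be checked with a short sketch. It assumes that eq. 1 has the quasi-steady form F ∝ ω<sup>2</sup> S<sub>2</sub>, with the second moment of area scaling as m<sup>4/3</sup> under isometry; that form, and all function names, are assumptions made for illustration only.

```python
from fractions import Fraction

# Mass exponent of the moment of inertia under isometry (I ∝ m^(5/3)).
INERTIA_EXP = Fraction(5, 3)
# Assumed mass exponent of the second moment of wing area under isometry.
SECOND_MOMENT_EXP = Fraction(4, 3)


def omega_exponent(work_exp, inertia_exp=INERTIA_EXP):
    """Mass exponent of angular velocity from W = 1/2 I ω², i.e. ω ∝ (W / I)^(1/2)."""
    return (work_exp - inertia_exp) / 2


def force_exponent(omega_exp, second_moment_exp=SECOND_MOMENT_EXP):
    """Mass exponent of aerodynamic force, assuming F ∝ ω² S2."""
    return 2 * omega_exp + second_moment_exp


# Hypothesis 1: work output proportional to mass (W ∝ m^1).
omega_h1 = omega_exponent(Fraction(1))
force_h1 = force_exponent(omega_h1)

# Hypothesis 2: shortening-speed limit, angular velocity independent of mass.
omega_h2 = Fraction(0)
force_h2 = force_exponent(omega_h2)

# Weight scales as m^1, so the force-to-weight exponent is the force exponent minus 1.
force_to_weight_h1 = force_h1 - 1
force_to_weight_h2 = force_h2 - 1
```

      Under the first hypothesis the force-to-weight exponent is negative, so larger animals are at a disadvantage; under the second it is positive, so smaller animals are.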

      The following references contain a more detailed discussion of the arguments for and against these two possibilities:

      Labonte, D. A theory of physiological similarity for muscle-driven motion. PNAS, 2023, 120, e2221217120

      Labonte, D.; Bishop, P.; Dick, T. & Clemente, C. J. Dynamics similarity and the peculiar allometry of maximum running speed. Nat Comms., 2024, 15, 2181

      Labonte, D. & Holt, N. Beyond power limits: the kinetic energy capacity of skeletal muscle. bioRxiv doi: 10.1101/2024.03.02.583090, 2024

      Polet, D. & Labonte, D. Optimising the flow of mechanical energy in musculoskeletal systems through gearing. bioRxiv doi: 10.1101/2024.04.05.588347, 2024

      Labonte et al 2024 also highlight that, due to force-velocity effects, the scaling of the velocity that muscle can impart will fall somewhere in between the extremes presented by the two hypotheses introduced above, so that, in general, the angular velocity should decrease with size with a slope of around -1/6 to -2/9 --- very close to the slope estimated in this manuscript, and to data on other flying animals.

      We greatly appreciate the reviewer's detailed insights on null hypotheses for kinematics, along with the accompanying references. As noted in the Public Review section (comment/reply 2.3), our study primarily explores how small-sized insects adapt to constraints imposed by the wing-based propulsion system, rather than by the muscular motor system.

      In this context, we chose to contrast the observed scaling of morphology and flight traits with a hypothetical scenario of geometric similarity (isometry) and kinematic similarity, where all size-independent kinematic parameters remain constant with body mass. While isometric expectations for morphological traits are well-defined (i.e., ), those for kinematic traits are more debatable (as pointed out by the reviewer). For this reason, we believe that adopting a simple approach based on kinematic similarity across sizes (f~m<sup>0</sup>, etc.) enhances the interpretability of our results and strengthens the overall narrative.

      Size range

      The study would significantly benefit from a larger size range; it is unreasonable to ask for kinematic measurements, as these experiments become insanely challenging as animals get smaller; but it should be quite straightforward for wing shape and size, as this can be measured with reasonable effort from museum specimens. In particular, if a strong point on miniaturization is to be made, I believe it is imperative to include data points for or close to the smallest species.

      We appreciate that the reviewer recognizes the difficulty of performing additional kinematic measurements. Collecting additional morphological data to extend the size range was, however, feasible. In our revised study, we therefore extended our morphological scaling analysis by including the morphology of twenty additional hoverfly species. This extended dataset includes wing morphology data of 74 museum specimens (4.2±1.7 individuals per species (mean±sd)) from Naturalis Biodiversity Centre (Leiden, the Netherlands). This increased the studied mass range of our hoverfly species from 5–100 mg to 3–132 mg, and strengthened our results and conclusions on the morphological scaling in hoverflies.

      Is weight support the main problem?

      Phrasing scaling arguments in terms of weight support is consistent with the classic literature, but I am not convinced this is appropriate (neither here nor in the classic scaling literature): animals must be able to move, and so, by strict physical necessity, muscle forces must exceed weight forces; balancing weight is thus never really a concern for the vast majority of animals. The only impact of the differential scaling may be a variation in peak locomotor speed (this is unpacked in more detail in the reference provided above). In other words, the very fact that these hoverfly species exist implies that their muscle force output is sufficient to balance weight, and the arguably more pertinent scaling question is how the differential scaling of muscle and weight force influences peak locomotor performance. I appreciate that this is beyond the scope of this study, but it may well be worth it to hedge the language around the presentation of the scaling problem to reflect this observation, and to, perhaps, motivate future work.

      We agree with the reviewer that a question focused on muscle force would be inappropriate for this study, as muscle force and power availability are not under selection in the context of hovering flight, but rather in situations where producing increased output is advantageous (for example, during take-off or rapid evasive maneuvers). As explained in our revised manuscript (lines 81-85), we do not focus here on the scaling of the muscular motor with size and throughout the phylogeny, but instead on the scaling of the flapping wing-based propulsion system. For this system there are known physical scaling laws that predict how the propulsion system should scale with size (in morphology and kinematics) to maintain weight support across sizes. In our study, we test how hoverflies achieve this weight support in hovering flight.

      Of course, it would be interesting to also test how peak thrust is produced by the propulsion system, for example during evasive maneuvers. In the revised manuscript, we now explicitly mention this as potential future research (lines 733–735).

      Other relevant literature

      Taylor, G. & Thomas, A. Evolutionary biomechanics: selection, phylogeny, and constraint, Oxford University Press, 2014

      This book has quite detailed analyses of the allometry of wing size and shape in birds in an explicit phylogenetic context. It was a while ago that I read it, but I think it may provide much relevant information for the discussion in this work.

      Schilder, R. J. & Marden, J. H. A hierarchical analysis of the scaling of force and power production by dragonfly flight motors J. Exp. Biol., 2004, 207, 767

      This paper also addresses the question of allometry of flight forces (if in dragonflies). I believe it is relevant for this study, as it argues that positive allometry of forces is partially achieved through variation of the mechanical advantage, in remarkable resemblance to Biewener's classic work on EMA in terrestrial animals (this is discussed and unpacked in more detail also in Polet and Labonte, cited above). Of course, the authors should not measure the mechanical advantage of this work, but perhaps this is an interesting avenue for future work.

      We thank the reviewer for these valuable literature suggestions and the insights they offer for future work.

      More generally, I thought the introduction misses an opportunity to broaden the perspective even further, by making explicit that running and flying animals face an analogous problem (with swimming likely being a curious exception!); some other references related to the role of phylogeny in biomechanical scaling analyses are provided in the comments in the word file.

      The introduction has been revised to better emphasize the generality of the scaling question addressed in our study. Specifically, we now explicitly highlight the similar constraints associated with increasing or decreasing size in both terrestrial and flying animals (lines 53–59). We thank the reviewer for this suggestion, which has improved our manuscript.

      Numerical results vs measurements

      I felt that the paper did not make the strongest possible use of the very nice numerical simulations. Part of the motivation, as I understood it, was to conduct more complex simulations to also probe the validity of the quasi-steady aerodynamics assumption on which eq. 1 is based. All parameters in eq. 1 are known (or can be approximated within reasonable bounds) - if the force output is evaluated analytically, what is the result? Is it comparable to the numerical simulations in magnitude? Is it way off? Is it sufficient to support body mass? The interplay between experiments and numerics is a main potential strength of the paper, which in my opinion is currently sold short.

      We agree with the reviewer that we did not make full use of the numerical simulations results. In fact, we did so deliberately because we aim to focus more on the fluid mechanics of hoverfly flight in a future study. That said, we thank the reviewer for suggesting to use the CFD for validating our quasi-steady model. We now do so by correlating the vertical aerodynamic force with variations in morphology and kinematics (revised Figure 7A). The striking similarity between the predicted and empirical fit shows that the quasi-steady model captures the aerodynamic force production during hovering flight surprisingly well.

      Statistics

      There are errors in the Confidence Intervals in Tab 2 (and perhaps elsewhere). Please inspect all tables carefully, and correct these mistakes. The disagreement between confidence intervals and p-values suggests a significant problem with the statistics; after a brief consultation with the authors, it appears that this result arises because Standard Major Axis regression was used (and not Reduced Major Axis regression, as stated in the manuscript). This is problematic because SMA confidence intervals become unreliable if the variables are uncorrelated, as appears to be the case for some parameters here (see https://cran.r-project.org/web/packages/lmodel2/vignettes/mod2user.pdf for more details on this point). I strongly recommend that the authors avoid SMA, and use MA, RMA or OLS instead. My recommendation would be to use RMA and OLS to inspect if the conclusions are consistent, in which case one can be shown in the SI; this is what I usually do in scaling papers, as there are some colleagues who have very strong and diverging opinions about which technique is appropriate. If the results differ, further critical analysis may be required.

      The reviewer correctly identified an error in the statistical approach: a Standard Major Axis regression was indeed used under inappropriate conditions. Following Reviewer #3’s comments, the expanded sample size and the resulting increase in statistical power to detect phylogenetic signal, our revised analysis now accounts for phylogenetic effects in these regressions. We therefore now report the results from Phylogenetic Generalized Least Squares (PGLS) regressions (the phylogenetic equivalent of an OLS).
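
      As a rough illustration of why PGLS is described as the phylogenetic equivalent of OLS, the sketch below solves the generalized least squares normal equations for a single predictor. It assumes a fixed phylogenetic covariance matrix (for example, shared branch lengths under Brownian motion); the matrix, data and function names are hypothetical, and this is not the authors' implementation.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def pgls(x, y, cov):
    """Return [intercept, slope] of y ~ x given a symmetric covariance matrix cov.

    Solves (X^T C^-1 X) beta = X^T C^-1 y with X = [1, x].
    """
    ones = [1.0] * len(x)
    ci_1 = solve(cov, ones)  # C^-1 1
    ci_x = solve(cov, x)     # C^-1 x
    a11 = sum(ci_1)                                # 1^T C^-1 1
    a12 = sum(ci_x)                                # 1^T C^-1 x
    a22 = sum(xi * c for xi, c in zip(x, ci_x))    # x^T C^-1 x
    b1 = sum(c * yi for c, yi in zip(ci_1, y))     # 1^T C^-1 y
    b2 = sum(c * yi for c, yi in zip(ci_x, y))     # x^T C^-1 y
    return solve([[a11, a12], [a12, a22]], [b1, b2])


# Hypothetical four-species tree: two pairs of sister species sharing half their history.
cov = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
log_mass = [0.0, 1.0, 2.0, 3.0]
log_trait = [1.0, 3.0, 5.0, 7.0]  # exactly 1 + 2 * log_mass

intercept, slope = pgls(log_mass, log_trait, cov)
```

      Replacing `cov` with an identity matrix makes every species independent, and the estimate reduces to ordinary least squares.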

      Figures

      Please plot 3E-F in log space, add trendlines, and the expectation from isometry/isophysiology, to make the presentation consistent, and comparison of effect strengths across results more straightforward.

      The reviewer probably meant Figure 3F-I and not E-F (the four panels depicting the relationships between kinematic variables and body mass). As requested, we added the expectation for kinematic similarity to the revised figure, but prefer not to show the non-significant PGLS fits, as they are not used in any analysis. For completeness, we did add the requested figure in log-space with all trendlines to the supplement (Figure S2), and refer to it in the figure legend.

      The visual impression of the effect strength in D is a bit misleading, due to the very narrow y-axis range; it took me a moment to figure this out. I suggest either increasing the y-range to avoid this incorrect impression or to notify the reader explicitly in the caption.

      We believe the reviewer is referring to Figure 4D. As rightly pointed out, variation in the non-dimensional second moment of area is very low among species, which is consistent with the literature (Ellington, 1984b). We agree that the small range on the y-axis might be confusing, and thus we increased it somewhat. More importantly, we now show, next to the trend line, the scaling for isometry (~m<sup>0</sup>) and for single-metric weight support. The steepness of the latter trend line in particular shows the relatively small effect of the non-dimensional second moment of area on aerodynamic force production. This is further highlighted by the newly added pie charts of the relative allometric scaling factor, in which variations in this parameter contribute only 5% to maintaining weight support across sizes.

      Despite this small variation, these adaptations in wing shape are still significant and are highly interesting in the context of our work. We now discuss this in more detail in the revised manuscript (lines 645–649).

      In Figure 7b, one species appears as a very strong outlier, driving the regression result. Data of the same species seems to be consistent with the other species in 7a, c, and d - where does this strong departure come from? Is this data point flagged as an outlier by any typical regression metric (Cook's distance etc) for the analysis in 7b?

      We agree with the reviewer: the species in dark green (Eristalis tenax) appears as an outlier in Figure 7B (non-dimensional second moment of area vs. vertical force) of our original manuscript. This is most likely due to the narrow range of variation in this parameter, as the reviewer pointed out in the previous comment, which amplifies differences among species. We expanded the y-axis range in the revised Figure 7, so that the point no longer appears as an outlier (see updated graph, now in Figure 7F).

      In Figure 1, second species from the top, it reads "Eristalix tenax" when it is "Eristalis tenax" (relayed info by the Editor).

      Corrected.

      Reviewer #3 (Recommendations for the authors):

      I really like the biomechanical and aerodynamic analyses and think that these alone make for a strong paper, albeit with narrower conclusions. I think it is perfectly valid and interesting to analyze these questions within the scope of the species studied and even to say that these patterns may therefore extend to the hoverflies as a whole group given the great discussion about the shared ecology and behavior of much of the clade. However, the extension to miniaturization is too tenuous. This would need much more support, especially from the phylogenetic methods which are not rigorously presented and likely need additional tests.

      We thank the reviewer for the positive words about our study. We agree that our attempt to infer the directedness of size evolution was too simplistic, and thus the miniaturization aspect of our study would need more support. As suggested by the reviewer, we therefore no longer focus on miniaturization, and have removed these aspects from the title, abstract and main conclusion of our revised manuscript.

      There is a lot of missing data about the tree and the parameters used for the phylogenetic methods that should be added (especially branch lengths and models of evolution). Phylogenetic tests for the relationships of traits should go beyond the analysis of phylogenetic signals in the specific traits. My understanding is also that phylogenetic signal is not properly interpreted as a "control" on the effect of phylogeny. The PCA should probably be a phylogenetic PCA with a corresponding morphospace reconstruction.

      We agree with the reviewer that our phylogenetic approach, which was based on phylogenetic signal only, was incomplete. In our revised manuscript, we not only test for phylogenetic signal but also account for phylogeny in all regressions between traits and body mass using Phylogenetic Generalized Least Squares (PGLS) regressions. Additionally, we have provided more details about the model of evolution and the parameter estimation method in the Methods section (lines 275–278).

      Following the reviewer's suggestion, in our revised study we now also performed a phylogenetic PCA instead of a traditional PCA on the superimposed wing shape coordinates. The resulting morphospace was, however, almost identical to that of the traditional PCA (Figure S4). We nonetheless included it in the revised manuscript for completeness. We thank the reviewer for this suggestion, as the revised phylogenetic analysis strengthens our study as a whole.

      For the miniaturization conclusion, my suggestion is a more rigorous phylogenetic analysis of directionality in the change in size across the larger phylogeny. However, even given this, I think the conclusion will be limited because it appears this trend does not hold up under the 8 species subsample. To support that morphology is evolutionarily correlated with miniaturization would for me require an analysis of how the change in body size relates to the change in wing shape and kinematics which is beyond what a scaling relationship does. In other words, you would need to test if the changes in body morphology occur in the same location phylogenetically with a shrinking of body size. I think even more would be required to use the words "enable" or "promote" when referring to the relationship of morphology to miniaturization because those imply evolutionary causality to me. To me, this wording would at least require an analysis that shows something like an increase in the ability of the wing morphological traits preceding the reduction in body size. Even that would likely be controversial. Both seem to be beyond the scope of what you could analyze with the given dataset.

      As mentioned in reply 3.1, we agree with the reviewer that the miniaturization aspect of our study would need more support. As suggested by the reviewer, we therefore no longer focus primarily on miniaturization, and have removed these aspects from the title, abstract and main conclusion of our revised manuscript.

      The pseudoreplication should be corrected. You can certainly report the data with all individuals, but you should also indicate in all cases if the analysis is consistent if only species are considered.

      As mentioned in the Public Review section, our revised approach avoids pseudoreplication by analyzing all data at the species level. Nonetheless, we have included supplementary figures (Figures S3 and S5) to visualize within-species variation.

      My overall suggestion is to remove the analysis of miniaturization and cast the conclusions with respect to the sampling you have. Add a basic phylogenetic test for the correlated trait analysis (like a phylogenetic GLM) which will likely still support your conclusions over the eight species and emphasize the specific conclusion about hoverflies' scaling relationships. I think that is still a very good study better supported by the extent of the data.

      We thank the reviewer for the positive assessment of our study, and their detailed and constructive feedback. As suggested by the reviewer, miniaturization is no longer the primary focus of our study, and we revised our analysis by extending the morphology dataset to more species, and by using phylogenetic regressions.

      References

      Ellington C. 1984a. The aerodynamics of hovering insect flight. III. Kinematics. Philosophical Transactions of the Royal Society of London B: Biological Sciences 305:41–78.

      Ellington C. 1984b. The aerodynamics of hovering insect flight. II. Morphological parameters. Philosophical Transactions of the Royal Society of London B: Biological Sciences 305:17–40.

      Fry SN, Sayaman R, Dickinson MH. 2005. The aerodynamics of hovering flight in Drosophila. Journal of Experimental Biology 208:2303–2318. doi:10.1242/jeb.01612

      Liu Y, Sun M. 2008. Wing kinematics measurement and aerodynamics of hovering droneflies. Journal of Experimental Biology 211:2014–2025. doi:10.1242/jeb.016931

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This research group has consistently performed cutting-edge research aiming to understand the role of hormones in the control of social behaviors, specifically by utilizing the genetically tractable teleost fish, medaka, and the current work is no exception. The overall claim they make, that estrogens modulate social behaviors in males and females is supported, with important caveats. For one, there is no evidence these estrogens are generated by "neurons" as would be assumed by their main claim that it is NEUROestrogens that drive this effect. While indeed the aromatase they have investigated is expressed solely in the brain, in most teleosts, brain aromatase is only present in glial cells (astrocytes, radial glia). The authors should change this description so as not to mislead the reader. Below I detail more specific strengths and weaknesses of this manuscript.

      We thank the reviewer for this very positive evaluation of our work and greatly appreciate their helpful comments and suggestions for improving the manuscript. We agree with the comment that the term “neuroestrogens” is misleading. Therefore, we have replaced “neuroestrogens” with “brain-derived estrogens” or “brain estrogens” throughout the manuscript, including the title.

      In the following sections, “neuroestrogens” has been revised to align with the surrounding context.

      Line 21: “in the brain, also known as neuroestrogens,” → “in the brain.”

      Line 28: “neuroestrogens” → “these estrogens.”

      Line 30: “mechanism of action of neuroestrogens” → “mode of action of brain-derived estrogens.”

      Line 43: “brain-derived estrogens, also called neuroestrogens,” → “estrogens.”

      Line 74: “neuroestrogen synthesis is selectively impaired while gonadal estrogen synthesis remains intact” → “estrogen synthesis in the brain is selectively impaired while that in the gonads remains intact.”

      Line 77: “neuroestrogens” → “these estrogens.”

      Line 335: “levels of neuroestrogens” → “brain estrogen levels.”

      Line 338: “neuroestrogens” → “these estrogens.”

      Line 351: “neuroestrogens” → “these estrogens.”

      Line 357: “neuroestrogen action” → “the action of brain-derived estrogens.”

      Line 359: “neuroestrogens” → “estrogen synthesis in the brain.”

      Line 390: “active synthesis of neuroestrogens” → “active estrogen synthesis in the brain.”

      Line 431: “neuroestrogens” → “estrogens in the brain.”

      Line 431: “neuroestrogen action” → “the action of brain-derived estrogens.”

      Line 433: “neuroestrogen action” → “their action.”

      Strengths:

      Excellent use of the medaka model to disentangle the control of social behavior by sex steroid hormones.

      The findings are strong for the most part because deficits in the mutants are restored by the molecule (estrogens) that was no longer present due to the mutation.

      Presentation of the approach and findings are clear, allowing the reader to make their own inferences and compare them with the authors'.

      Includes multiple follow-up experiments, which lead to tests of internal replication and an impactful mechanistic proposal.

      Findings are provocative not just for teleost researchers, but for other species since, as the authors point out, the data suggest mechanisms of estrogenic control of social behaviors may be evolutionarily ancient.

      We again thank the reviewer for their positive evaluation of our work.

      Weaknesses:

      (1) As stated in the summary, the authors attribute the estrogen source to neurons and there isn't evidence this is the case. The impact of the findings doesn't rest on this either.

      As noted in Response to reviewer #1’s summary comment, we have replaced “neuroestrogens” with “brain-derived estrogens” or “brain estrogens” throughout the manuscript.

      Line 63: We have also added the text “In teleost brains, including those of medaka, aromatase is exclusively localized in radial glial cells, in contrast to its neuronal localization in rodent brains (18–20).” Following this addition, “This observation suggests” in the subsequent sentence has been replaced with “These observations suggest.”

      The following references (#18–20), cited in the newly added text above, have been included in the reference list, with other references renumbered accordingly:

      P. M. Forlano, D. L. Deitcher, D. A. Myers, A. H. Bass, Anatomical distribution and cellular basis for high levels of aromatase activity in the brain of teleost fish: aromatase enzyme and mRNA expression identify glia as source. J. Neurosci. 21, 8943–8955 (2001).

      N. Diotel, Y. Le Page, K. Mouriec, S. K. Tong, E. Pellegrini, C. Vaillant, I. Anglade, F. Brion, F. Pakdel, B. C. Chung, O. Kah, Aromatase in the brain of teleost fish: expression, regulation and putative functions. Front. Neuroendocrinol. 31, 172–192 (2010).

      A. Takeuchi, K. Okubo, Post-proliferative immature radial glial cells female-specifically express aromatase in the medaka optic tectum. PLoS One 8, e73663 (2013).

      (2) The d4 versus d8 esr2a mutants showed different results for aggression. The meaning and implications of this finding are not discussed, leaving the reader wondering.

      Line 282: As the reviewer correctly noted, circles were significantly reduced in mutant males of the Δ8 line, whereas no significant reduction was observed in those of the Δ4 line. However, a tendency toward reduction was evident in the Δ4 line (P = 0.1512), and both lines showed significant differences in fin displays. Based on these findings, we believe our conclusion that esr2a<sup>−/−</sup> males exhibit reduced aggression remains valid. To clarify this point and address potential reader concerns, we have revised the text as follows: “esr2a<sup>−/−</sup> males from both the Δ8 and Δ4 lines exhibited significantly fewer fin displays than their wildtype siblings (P = 0.0461 and 0.0293, respectively). Circles followed a similar pattern, with a significant reduction in the Δ8 line (P = 0.0446) and a comparable but non-significant decrease in the Δ4 line (P = 0.1512) (Fig. 5L; Fig. S8E), showing less aggression.”

      (3) Lack of attribution of previously published work from other research groups that would provide the proper context of the present study.

      In response to this and other comments from this reviewer, we have revised the Introduction and Discussion sections as follows.

      Line 56: “solely responsible” in the Introduction has been modified to “largely responsible”.

      Line 57: “This is consistent with the recent finding in medaka fish (Oryzias latipes) that estrogens act through the ESR subtype Esr2b to prevent females from engaging in male-typical courtship (10)” has been revised to “This is consistent with recent observations in a few teleost species that genetic ablation of AR severely impairs male-typical behaviors (13–16) and with findings in medaka fish (Oryzias latipes) that estrogens act through the ESR subtype Esr2b to prevent females from engaging in male-typical courtship (12)” to include previous studies on the behavior of AR mutant fish (Yong et al., 2017; Alward et al., 2020; Ogino et al., 2023; Nishiike and Okubo, 2024) in the Introduction.

      Line 65: “It is worth mentioning that systemic administration of estrogens and an aromatase inhibitor increased and decreased male aggression, respectively, in several teleost species, potentially reflecting the behavioral effects of brain-derived estrogens (21–24)” has been added to the Introduction. This addition provides an overview of previous studies on the effects of estrogens and aromatase on male fish aggression (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015).

      Line 367: “treatment of males with an aromatase inhibitor reduces their male-typical behaviors (31–33)” has been edited to read “treatment of males with an aromatase inhibitor reduces their male-typical behaviors, while estrogens exert the opposite effect (21–24).”

      After the revisions described above, the following references (#13, 14, and 22) have been added to the reference list, with other references renumbered accordingly:

      L. Yong, Z. Thet, Y. Zhu, Genetic editing of the androgen receptor contributes to impaired male courtship behavior in zebrafish. J. Exp. Biol. 220, 3017–3021 (2017).

      B. A. Alward, V. A. Laud, C. J. Skalnik, R. A. York, S. A. Juntti, R. D. Fernald, Modular genetic control of social status in a cichlid fish. Proc. Natl. Acad. Sci. U.S.A. 117, 28167–28174 (2020).

      L. A. O’Connell, H. A. Hofmann, Social status predicts how sex steroid receptors regulate complex behavior across levels of biological organization. Endocrinology 153, 1341–1351 (2012).

      (4) There are a surprising number of citations not included; some of the ones not included argue against the authors' claims that their findings were "contrary to expectation".

      Line 68: As detailed in Response to reviewer #1’s comment 3 on weaknesses, we have cited previous studies on the effects of estrogens and aromatase on male fish aggression (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015) in the Introduction.

      The following revisions have also been made to avoid phrases such as “contrary to expectation” and “unexpected.”

      Line 76: “Contrary to our expectations” → “Remarkably.”

      Line 109: “Contrary to this expectation, however” → “Nevertheless.”

      Line 135: “Again, contrary to our expectation, cyp19a1b<sup>−/−</sup> males” → “cyp19a1b<sup>−/−</sup> males.”

      Line 333: “unexpected” → “noteworthy.”

      Line 337: “unexpected” → “notable.”

      (5) The experimental design for studying aggression in males has flaws. A standard test like a resident intruder test should be used.

      We agree that the resident-intruder test is the most commonly used method for assessing aggression. However, medaka form shoals and lack strong territoriality, and even slight dominance differences between the resident and the intruder can increase variability in the results, compromising data consistency. Therefore, in this study, we adopted an alternative approach: placing four unfamiliar males together in a tank and quantifying aggressive interactions in total. This method allows for the assessment of aggression regardless of territorial tendencies, making it more appropriate for our investigation.

      (6) While they investigate males and females, there are fewer experiments and explanations for the female results, making it feel like a small addition or an aside.

      We agree that the data and discussion for females are less extensive than for males. However, we have previously elucidated the mechanism by which estrogen/Esr2b signaling promotes female mating behavior (Nishiike et al., 2021, Curr Biol, 1699–1710). Accordingly, it follows that the new insights into female behavior gained from the cyp19a1b knockout model are more limited than those for males. Nevertheless, when combined with our prior findings, the female data in this study offer valuable insights, and the overall mechanism through which estrogens promote female mating behavior is becoming clearer. Therefore, we do not consider the female data in this study to be incomplete or merely supplementary.

      (7) The statistics comparing "experimental to experimental" and "control to experimental" aren't appropriate.

      The reviewer raises concerns about the statistical analysis used for Figures 4C and 4E, suggesting that Bonferroni’s test should be used instead of Dunnett’s test. However, Dunnett’s test is commonly used to compare treatment groups to a reference group that receives no treatment, as in our study. Since we do not compare the treated groups with each other, we believe Dunnett’s test is the most appropriate choice.

      Line 619: The reviewer’s concern may have arisen from the phrase “comparisons between control and experimental groups” in the Materials and Methods. We have revised it to “comparisons between untreated and E2-treated groups in Fig. 4, C and D” for clarity.
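      To make the distinction between the two comparison designs concrete, the following minimal Python sketch enumerates the comparisons made in a many-to-one (Dunnett-style) design versus an all-pairs design. It is an illustration only, not the analysis code used in the study; the group labels are hypothetical, and the sketch counts comparisons rather than computing any test statistic.

```python
from itertools import combinations


def many_to_one(groups, control):
    """Comparisons in a Dunnett-style design: each treated group vs. the control."""
    return [(control, g) for g in groups if g != control]


def all_pairs(groups):
    """Comparisons in an all-pairs design: every group vs. every other group."""
    return list(combinations(groups, 2))


# Hypothetical group labels, assumed for illustration only.
groups = ["untreated", "E2_dose1", "E2_dose2", "E2_dose3"]

dunnett_family = many_to_one(groups, "untreated")
pairwise_family = all_pairs(groups)

print(len(dunnett_family), "many-to-one comparisons")
print(len(pairwise_family), "all-pairs comparisons")
```

      With one reference group and three treated groups, the many-to-one family contains three comparisons, whereas the all-pairs family contains six. A correction sized for the larger family is more conservative than the many-to-one design requires.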

      Reviewer #2 (Public Review):

      Summary:

      The novelty of this study stems from the observations that neuro-estrogens appear to interact with brain androgen receptors to support male-typical behaviors. The study provides a step forward in clarifying the somewhat contradictory findings that, in teleosts and unlike other vertebrates, androgens regulate male-typical behaviors without requiring aromatization, but at the same time estrogens appear to also be involved in regulating male-typical behaviors. They manipulate the expression of one aromatase isoform, cyp19a1b, that is purported to be brain-specific in teleosts. Their findings are important in that brain estrogen content is sensitive to the brain-specific cyp19a1b deficiency, leading to alterations in both sexual behavior and aggressive behavior. Interestingly, these males have relatively intact fertility rates, despite the effects on the brain.

      We thank this reviewer for their positive evaluation of our work and constructive comments, which we found very helpful in improving the manuscript.

      That said, the framing of the study, the relevant context, and several aspects of the methods and results raise concerns. Two interpretations need to be addressed/tempered:

      (1) that the rescue of cyp19a1b deficiency by tank-applied estradiol is not necessarily a brain/neuroestrogen mode of action, and

      Line 155: cyp19a1b-deficient males exhibited a severe reduction in brain E2 levels, yet their peripheral E2 levels remained comparable to those in wild-type males. Given this hormonal milieu and the lack of behavioral change in wild-type males following E2 treatment, the observed recovery of mating behavior in cyp19a1b-deficient males following E2 treatment can be best explained by the restoration of brain E2 levels. However, as the reviewer pointed out, we cannot rule out the possibility that bath-immersed E2 influenced behavior through an indirect peripheral mechanism. To address this concern, we have modified the text as follows: “These results suggest that reduced E2 in the brain is the primary cause of the mating defects, highlighting a pivotal role of brain-derived estrogens in male mating behavior. However, caution is warranted, as an indirect peripheral effect of bath-immersed E2 on behavior cannot be ruled out, although this is unlikely given the comparable peripheral E2 levels in cyp19a1b-deficient and wild-type males. In contrast to mating.”

      (2) the large increases in peripheral and brain androgen levels in the cyp19a1b deficient animals imply some indirect/compensatory effects of lifelong cyp19a1b deficiency.

      As stated in line 151, androgen/AR signaling has a strong facilitative effect on male-typical behaviors in teleosts. If increased androgen levels in the periphery and brain affected behavior, the expected effect would be facilitative. However, cyp19a1b-deficient males exhibited impaired male-typical behaviors, suggesting that elevated androgen levels were unlikely to be responsible. Although chronic androgen elevation could cause androgen receptor desensitization, which could lead to behavioral suppression, our long-term androgen treatments have consistently promoted, rather than inhibited, male-typical behaviors (e.g., Nishiike et al., Proc Natl Acad Sci USA 121:e2316459121). Hence, this possibility is also highly unlikely.

      Reviewer #3 (Public Review):

      Summary:

      Taking advantage of the existence in fish of two genes coding for estrogen synthase, the enzyme aromatase, one mostly expressed in the brain (Cyp19a1b) and the other mostly found in the gonads (Cyp19a1a), this study investigates the role of neuro-estrogens in the control of sexual and aggressive behavior in teleost fish. The constitutive deletion of Cyp19a1b reduced brain estrogen content by 87% in males and about 50% in females. It led to reduced sexual and aggressive behavior in males and reduced sexual behavior in females. These effects are reversed by adult treatment with estradiol thus indicating that they are activational in nature. The deletion of Cyp19a1b is associated with a reduced expression of the genes coding for the two androgen receptors, ara, and arb, in brain regions involved in the regulation of social behavior. The analysis of the gene expression and behavior of mutants of estrogen receptors indicates that these effects are likely mediated by the activation of the esr1 and esr2a isoforms. These results provide valuable insight into the role of neuro-estrogens in social behavior in the most abundant vertebrate taxa. While estrogens are involved in the organization of the brain and behavior of some birds and rodents, neuro-estrogens appear to play an activational role in fish through a facilitatory action of androgen signaling.

      We thank this reviewer for their positive evaluation of our work and comments that have improved the manuscript.

      Strengths:

      Evaluation of the role of brain "specific" Cyp19a1 in male teleost fish, which as a taxa are more abundant and yet proportionally less studied than the most common birds and rodents. Therefore, evaluating the generalizability of results from higher vertebrates is important. This approach also offers great potential to study the role of brain estrogen production in females, an understudied question in all taxa.

      Results obtained from multiple mutant lines converge to show that estrogen signaling drives aspects of male sexual behavior.

      The comparative discussion of the age-dependent abundance of brain aromatase in fish vs mammals and its role in organization vs activation is important beyond the study of the targeted species.

      We again thank the reviewer for their positive evaluation of our work.

      Weaknesses:

      (1) The new transgenic lines are under-characterized. There is no evaluation of the mRNA and protein products of Cyp19a1b and ESR2a.

      We did not directly assess the function of cyp19a1b and esr2a in our mutant fish. However, the observed reduction in brain E2 levels, with no change in peripheral E2 levels, in cyp19a1b-deficient fish strongly supports the loss of cyp19a1b function. This is stated in the Results section (line 97) as follows: “These results show that cyp19a1b-deficient fish have reduced estrogen levels coupled with increased androgen levels in the brain, confirming the loss of cyp19a1b function.”

      Line 473: A previous study reported that female medaka lacking esr2a fail to release eggs due to oviduct atresia (Kayo et al., 2019, Sci Rep 9:8868). Similarly, in this study, some esr2a-deficient females exhibited spawning behavior but were unable to release eggs, although the sample size was limited (Δ8 line: 2/3; Δ4 line: 1/1). In contrast, this was not observed in wild-type females (Δ8 line: 0/12; Δ4 line: 0/11). These results support the effective loss of esr2a function. To incorporate this information into the manuscript, the following text has been added to the Materials and Methods: “A previous study reported that esr2a-deficient female medaka cannot release eggs due to oviduct atresia (59). Likewise, some esr2a-deficient females generated in this study, despite the limited sample size, exhibited spawning behavior but were unable to release eggs (Δ8 line: 2/3; Δ4 line: 1/1), while such failure was not observed in wild-type females (Δ8 line: 0/12; Δ4 line: 0/11). These results support the effective loss of esr2a function.”

      The following reference (#59), cited in the newly added text above, has been included in the reference list:

      D. Kayo, B. Zempo, S. Tomihara, Y. Oka, S. Kanda, Gene knockout analysis reveals essentiality of estrogen receptor β1 (Esr2a) for female reproduction in medaka. Sci. Rep. 9, 8868 (2019).

      (2) The stereotypic sequence of sexual behavior is poorly described, in particular, the part played by the two sexual partners, such that the conclusions are not easily understandable, notably with regards to the distinction between motivation and performance.

      Line 103: To provide a more detailed description of medaka mating behavior, we have revised the text from “The mating behavior of medaka follows a stereotypical pattern, wherein a series of followings, courtship displays, and wrappings by the male leads to spawning” to “The mating behavior of medaka follows a stereotypical sequence. It begins with the male approaching and closely following the female (following). The male then performs a courtship display, rapidly swimming in a circular pattern in front of the female. If the female is receptive, the male grasps her with his fins (wrapping), culminating in the simultaneous release of eggs and sperm (spawning).”

      (3) The behavior of females is only assessed from the perspective of the male, which raises questions about the interpretation of the reduced behavior of the males.

      In medaka, female mating behavior is largely passive, except for rejecting courtship attempts and releasing eggs. Therefore, its analysis relies on measuring the latency to receive following, courtship displays, or wrappings from the male and the frequency of courtship rejection or wrapping refusal. We understand the reviewer’s perspective that cyp19a1b-deficient females might not be less receptive but instead less attractive to males, potentially leading to reduced male mating efforts. However, since these females are approached and followed by males at levels comparable to wild-type females, this possibility appears unlikely. Moreover, cyp19a1b-deficient females tend to avoid males and exhibit a slightly female-oriented sexual preference. While these traits are closely associated with reduced sexual receptivity, they do not readily align with reduced sexual attractiveness. Therefore, it is more plausible to conclude that these females have decreased receptivity rather than being less attractive to males.

      (4) At no point do the authors seem to consider that a reduced behavior of one sex could result from a reduced sensory perception from this sex or a reduced attractivity or sensory communication from the other sex.

      Line 112: As noted above, the impaired mating behavior of cyp19a1b-deficient females is unlikely to be due to reduced attractiveness to males. Similarly, mating behavior tests using esr2b-deficient females as stimulus females suggest that the impaired mating behavior of cyp19a1b-deficient males cannot be attributed to reduced attractiveness to females. However, the possibility that their impaired mating behavior could be attributed to altered cognition or sexual preference cannot be ruled out. To reflect this in the manuscript, we have revised the text “, suggesting that they are less motivated to mate” to “. These results suggest that they are less motivated to mate, though an alternative interpretation that their cognition or sexual preference may be altered cannot be dismissed.”

      (5) Aspects of the methods are not detailed enough to allow proper evaluation of their quality or replication of the data.

      In response to this and other specific comments from this reviewer, we have revised the Materials and Methods section to include more detailed descriptions of the methods.

      Line 469: The following text has been added to describe the method for domain identification in medaka Esr2a: “The DNA- and ligand-binding domains of medaka Esr2a were identified by sequence alignment with yellow perch (Perca flavescens) Esr2a, for which these domain locations have been reported (58).”

      The following reference (#58), cited in the newly added text above, has been included in the reference list:

      S. G. Lynn, W. J. Birge, B. S. Shepherd, Molecular characterization and sex-specific tissue expression of estrogen receptor α (esr1), estrogen receptor βa (esr2a) and ovarian aromatase (cyp19a1a) in yellow perch (Perca flavescens). Comp. Biochem. Physiol. B Biochem. Mol. Biol. 149, 126–147 (2008).

      Line 540: The text “, and the total area of signal in each brain nucleus was calculated using Olyvia software (Olympus)” has been revised to include additional details on the single ISH method as follows: “. The total area of signal across all relevant sections, including both hemispheres, was calculated for each brain nucleus using Olyvia software (Olympus). Images were converted to a 256-level intensity scale, and pixels with intensities from 161 to 256 were considered signals. All sections used for comparison were processed in the same batch, without corrections between samples.”
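      The thresholding rule described above can be sketched in a few lines of Python. This is a hedged illustration, not the Olyvia workflow itself: it assumes intensities are expressed on a 1 to 256 scale (consistent with the stated 161 to 256 signal range), represents each section as a nested list of pixel values, and uses a hypothetical `pixel_area` calibration factor and toy data.

```python
SIGNAL_MIN, SIGNAL_MAX = 161, 256  # signal range stated in the revised Methods text


def signal_pixel_count(section):
    """Count pixels in one section (rows of intensities, assumed 1-256 scale)
    whose intensity falls within the signal range."""
    return sum(
        1
        for row in section
        for value in row
        if SIGNAL_MIN <= value <= SIGNAL_MAX
    )


def total_signal_area(sections, pixel_area=1.0):
    """Sum the signal area over all sections of one brain nucleus.

    `pixel_area` is a hypothetical calibration factor (area per pixel)."""
    return pixel_area * sum(signal_pixel_count(s) for s in sections)


# Toy data: two small "sections" with made-up intensities.
sections = [
    [[10, 161, 200], [160, 256, 90]],  # signal pixels: 161, 200, 256
    [[170, 30], [255, 161]],           # signal pixels: 170, 255, 161
]

total = total_signal_area(sections, pixel_area=0.25)
print(total)
```

      Because no correction is applied between samples, the same fixed threshold is used for every section, which is why processing all compared sections in one batch matters.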

      Line 596: The following text has been added to include additional details on the double ISH method: “Cells were identified as coexpressing the two genes when Alexa Fluor 555 and fluorescein signals were clearly observed in the cytoplasm surrounding DAPI-stained nuclei, with intensities markedly stronger than the background noise.”

      (6) It seems very dangerous to use the response to a mutant abnormal behavior (ESR2-KO females) as a test, given that it is not clear what is the cause of the disrupted behavior.

      esr2b-deficient females have fully developed ovaries, a normal sex steroid milieu, and sexual attractiveness to males comparable to wild-type females, yet they are completely unreceptive to male courtship (Nishiike et al., 2021, Curr Biol, 1699–1710). Although, as the reviewer noted, the detailed mechanisms underlying this phenotype remain unclear, it is evident that the loss of estrogen/Esr2b signaling in the brain severely impairs sexual receptivity. Therefore, using esr2b-deficient females as stimulus females in the mating behavior test eliminates the influence of female sexual receptivity and male attractiveness to females, enabling the exclusive assessment of male mating motivation. This rationale is already presented in the Results section (lines 116–120), and we believe this experimental design offers a robust framework for assessing male mating motivation.

      Additionally, the mating behavior test with esr2b-deficient females complemented the test with wild-type females, and its results were not the sole basis for our discussion of the male mating behavior phenotype. The results of both tests were largely concordant, and we believe that the conclusions drawn from them are highly reliable.

      Meanwhile, in the test with esr2b-deficient females, cyp19a1b-deficient males were courted more frequently by these females than wild-type males. As the reviewer noted, this may suggest an anomaly in the test. Accordingly, we have confined our discussion to the possibility that “Perhaps cyp19a1b<sup>−/−</sup> males are misidentified as females by esr2b-deficient females because they are reluctant to court or they exhibit some female-like behavior” (line 131).

      (7) Most experiments are weakly powered (low sample size) and analyzed by multiple T-tests while 2 way ANOVA could have been used in several instances. No mention of T or F values, or degrees of freedom.

      Histological analysis was conducted with a relatively small sample size, as our previous experience suggested that interindividual variability in the results would not be substantial. As significant differences were detected in many analyses, further increasing the sample size is unnecessary.

      Although two-way ANOVA could be used instead of multiple T-tests for analyzing the data in Figures 4D, 4F, 6D, S4A, and S4B, we applied the Bonferroni–Dunn correction to control for multiple pairwise comparisons in multiple T-tests. As this comparison method is equivalent to the post hoc test following two-way ANOVA, the statistical results are identical regardless of whether T-tests or two-way ANOVA are used.

      For the data in Figures 4D, 4F, S4A, and S4B, the primary focus is on whether relative luciferase activity differs between E2-treated and untreated conditions for each mutant construct. Therefore, two-way ANOVA is not particularly relevant, as assessing the main effect of construct type or its interaction with E2 treatment does not provide meaningful insights. Similarly, in Figure 6D, the focus is solely on whether wild-type and mutant females differ in time spent at each distance. Given this, two-way ANOVA is unnecessary, as analyzing the main effect of distance is not meaningful.

      Accordingly, two-way ANOVA was not employed in this study, and therefore, its corresponding F values were not included. As the figure legends specify the sample sizes for all analyses, specifying degrees of freedom separately was deemed unnecessary.
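      For readers unfamiliar with the Bonferroni–Dunn correction mentioned above, a minimal Python sketch of the adjustment is given below. It is an illustration under stated assumptions, not the authors' analysis: the raw P values are hypothetical, and the adjustment shown is the standard one of multiplying each P value by the number of comparisons in the family and capping the result at 1.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted P values: each raw P value is multiplied by
    the number of comparisons in the family and capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw P values from three pairwise t-tests (not from the study).
raw = [0.004, 0.03, 0.2]

adjusted = bonferroni_adjust(raw)
significant = [p < 0.05 for p in adjusted]

for r, a, s in zip(raw, adjusted, significant):
    print(f"raw={r:.3f} adjusted={a:.3f} significant={s}")
```

      In this toy example only the first comparison remains below 0.05 after adjustment, showing how the correction controls the family-wise error rate at the cost of power.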

      (8) The variability of the mRNA content for the same target gene between experiments (genotype comparison vs E2 treatment comparison) raises questions about the reproducibility of the data (apparent disappearance of genotype effect).

      As the reviewer pointed out, the overall area of ara expression is larger in Figure 2J than in Figure 2F. However, the relative area ratios of ara expression among brain nuclei are consistent between the two figures, indicating the reproducibility of the results. Thus, this difference is unlikely to affect the conclusions of this study.

      Additionally, the differences in ara expression in pPPp and arb expression in aPPp between wild-type and cyp19a1b-deficient males appear less pronounced in Figures 2J and 2K than in Figures 2F and 2H. This is likely attributable to the smaller sample size used in the experiments for Figures 2J and 2K, resulting in less distinct differences. However, as the same genotype-dependent trends are observed in both sets of figures, the conclusion that ara and arb expression is reduced in cyp19a1b-deficient male brains remains valid.

      (9) The discussion confuses the effects of estrogens on sexual differentiation (developmental programming = permanent) and activation (= reversible activation of brain circuits in adulthood) of the brain and behavior. Whether sex differences in the circuits underlying social behaviors exist is not clear.

      We recognize that the effects of adult steroids are sometimes not considered to be sexual differentiation, as they do not differentiate the neural substrate, but rather transiently activate the already masculinized or feminized substrate. Arnold (2017, J Neurosci Res 95:291–300) contends that all factors that cause sex differences, including the transient effects of adult steroids, should be incorporated into a theory of sexual differentiation, and indeed, these effects may be the most potent proximate factors that make males and females different. We concur with this perspective and have adopted it as a foundation for our manuscript.

      In teleosts, early developmental exposure to steroids has minimal impact, and sexual differentiation relies primarily on steroid action in adulthood (Okubo et al., 2022, Spectrum of Sex, pp. 111–133). This is evidenced by the effective reversal of sex-typical behaviors through experimental hormonal manipulation in adult teleosts and the absence of transient early-life steroid surges observed in mammals and birds. Accordingly, our discussion on brain sexual differentiation, including the statement in line 347, “This variation among species may represent the activation of neuroestrogen synthesis at life stages critical for sexual differentiation of behavior that are unique to each species”, remains well-supported. Additionally, given these considerations, while sex differences in neural circuit activation are evident in teleosts, substantial structural differences in these circuits are unlikely.

      (10) Overall, the claims regarding the activational role of neuro-estrogens on male sexual behavior are supported by converging evidence from multiple mutant lines. The role of neuroestrogens on gene expression in the brain is mostly solid too. The data for females are comparatively weaker. Conclusions regarding sexual differentiation should be considered carefully.

      We agree that the data for females are less extensive than for males. However, we have previously elucidated the mechanism by which estrogen/Esr2b signaling promotes female mating behavior (Nishiike et al., 2021). Accordingly, it follows that the new insights into female behavior gained from the cyp19a1b knockout model are more limited than those for males. Nevertheless, when integrated with our prior findings, the data on females in this study provide significant insights, and the overall mechanism through which estrogens promote female mating behavior is becoming clearer. Therefore, we do not consider the female data in this study to be incomplete or merely supplementary.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors set out to answer an intriguing question regarding the hormonal control of innate social behaviors in medaka. Specifically, they wanted to test the effects of cyp19a1b mutation on mating and aggression in males. They also test these effects in females. Their approach takes them down several distinct experimental pathways, including one investigating how cyp19a1a function is related to androgen receptor expression and how estrogens themselves may act on the androgen receptor to modulate its expression, as well as how different esr genes may be involved. The study and its results are valuable and a clear, general conclusion of a pathway from brain aromatase>estrogens>esr genes>androgen receptor can be made. This is important, novel, and impactful. However, there are issues with how the study logic is set up, the approach for assessing certain behaviors, the statistics used, the interpretation of findings, and placing the findings in the proper context based on previous work, which manifests as a general failure to properly attribute previous work.

      Thank you for your thoughtful review. We have carefully addressed each specific comment, as detailed below.

      Major comments:

      (1) The background for the rationale of the current study is misleading and lacks proper context. The authors root the logic of their experiment in determining whether estrogens regulate male-typical behaviors because the current assumption is androgens are "solely responsible" for male-typical behaviors in teleosts. This is not the case. Previous studies have shown aromatase/estrogens are involved in male-typical aggression in teleosts. For example, to name a couple:

      Huffman, L. S., O'Connell, L. A., & Hofmann, H. A. (2013). Aromatase regulates aggression in the African cichlid fish Astatotilapia burtoni. Physiology & behavior, 112, 77-83.

      O'Connell, L. A., & Hofmann, H. A. (2012). Social status predicts how sex steroid receptors regulate complex behavior across levels of biological organization. Endocrinology, 153(3), 1341-1351.

      And even a recent paper sheds light on a possible AR>aromatase>estradiol hypothesis of male-typical behaviors:

      Lopez, M. S., & Alward, B. A. (2024). Androgen receptor deficiency is associated with reduced aromatase expression in the ventromedial hypothalamus of male cichlids. Annals of the New York Academy of Sciences.

      Interestingly, the authors cite Huffman et al in the discussion, so I don't understand why they make the claims they do about estrogens and male-typical behavior.

      Related to this, is an issue of proper attribution to published work. Indeed, missing are key references from lab groups using AR mutant teleosts. Here are a couple:

      Yong, L., Thet, Z., & Zhu, Y. (2017). Genetic editing of the androgen receptor contributes to impaired male courtship behavior in zebrafish. Journal of Experimental Biology, 220(17), 3017-3021.

      Alward, B. A., Laud, V. A., Skalnik, C. J., York, R. A., Juntti, S. A., & Fernald, R. D. (2020). Modular genetic control of social status in a cichlid fish. Proceedings of the National Academy of Sciences, 117(45), 28167-28174.

      Ogino, Y., Ansai, S., Watanabe, E., Yasugi, M., Katayama, Y., Sakamoto, H., ... & Iguchi, T. (2023). Evolutionary differentiation of androgen receptor is responsible for sexual characteristic development in a teleost fish. Nature communications, 14(1), 1428.

      As noted in Response to reviewer #1’s comment 3 on weaknesses, we have revised the Introduction and Discussion sections as follows.

      Line 56: “solely responsible” in the Introduction has been modified to “largely responsible”.

      Line 57: The text “This is consistent with the recent finding in medaka fish (Oryzias latipes) that estrogens act through the ESR subtype Esr2b to prevent females from engaging in male-typical courtship (10)” has been revised to “This is consistent with recent observations in a few teleost species that genetic ablation of AR severely impairs male-typical behaviors (13–16) and with findings in medaka fish (Oryzias latipes) that estrogens act through the ESR subtype Esr2b to prevent females from engaging in male-typical courtship (12)” to include previous studies on the behavior of AR mutant fish (Yong et al., 2017; Alward et al., 2020; Ogino et al., 2023; Nishiike and Okubo, 2024) in the Introduction.

      Line 65: “It is worth mentioning that systemic administration of estrogens and an aromatase inhibitor increased and decreased male aggression, respectively, in several teleost species, potentially reflecting the behavioral effects of brain-derived estrogens (21–24)” has been added to the Introduction, providing an overview of previous studies on the effects of estrogens and aromatase on male fish aggression (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015).

      Line 367: “treatment of males with an aromatase inhibitor reduces their male-typical behaviors (31–33)” has been edited to read “treatment of males with an aromatase inhibitor reduces their male-typical behaviors, while estrogens exert the opposite effect (21–24).”

      After the revisions described above, the following references (#13, 14, and 22) have been added to the reference list:

      L. Yong, Z. Thet, Y. Zhu, Genetic editing of the androgen receptor contributes to impaired male courtship behavior in zebrafish. J. Exp. Biol. 220, 3017–3021 (2017).

      B. A. Alward, V. A. Laud, C. J. Skalnik, R. A. York, S. A. Juntti, R. D. Fernald, Modular genetic control of social status in a cichlid fish. Proc. Natl. Acad. Sci. U.S.A. 117, 28167–28174 (2020).

      L. A. O’Connell, H. A. Hofmann, Social status predicts how sex steroid receptors regulate complex behavior across levels of biological organization. Endocrinology 153, 1341–1351 (2012).

      While Lopez and Alward (2024) provide valuable insights into the regulation of cyp19a1b expression by androgens, our study focuses specifically on the functional aspects of cyp19a1b. Expanding the discussion to include expression regulation would divert from the primary focus of our manuscript. For this reason, we have opted not to cite this reference.

      (2) As it is now, the authors are only citing a book chapter/review from their own group. This is a serious issue as it does not provide the proper context for the work. The authors need to fix their issues of attribution to previously published work and the proper interpretation of the work that they are aware of as it pertains to ideas proposed on the roles of androgens and estrogens in the control of male-typical behaviors. This is also important to get the citations right because the common use of "contrary to expectations" when describing their results is actually not correct. Many of the observations are expected to a degree. However, this doesn't take away from a generally stellar experimental design and mostly clear results. The authors do not need to rely on enhancing the impact of their paper by making false claims of unexpected findings. The depth and clarity of your findings are where the impact of your work is.

      As detailed in Response to reviewer #1’s comment 3 on weaknesses, we have cited previous studies on the effects of estrogens and aromatase on male fish aggression (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015) in the Introduction.

      Additionally, as noted in Response to reviewer #1’s comment 4 on weaknesses, we have made the following revisions to avoid phrases such as “contrary to expectation” and “unexpected.”

      Line 76: “Contrary to our expectations” → “Remarkably.”

      Line 109: “Contrary to this expectation, however” → “Nevertheless.”

      Line 135: “Again, contrary to our expectation, cyp19a1b<sup>−/−</sup> males” → “cyp19a1b<sup>−/−</sup> males.”

      Line 333: “unexpected” → “noteworthy.”

      Line 337: “unexpected” → “notable.”

      (3) The experimental design for studying aggression in males has flaws. A standard test like a resident-intruder test should be used. An assay in which only male mutants are housed together? I do not understand the logic there and the logic for the approach isn't even explained. Too many confounds that are not controlled for. It makes it seem like an aspect of the study that was thrown in as an aside.

      As noted in Response to reviewer #1’s comment 5 on weaknesses, medaka form shoals and lack strong territoriality. As a result, even slight differences in dominance between the resident and intruder can substantially impact the outcomes of the resident-intruder test. Therefore, we adopted an alternative approach in this study.

      (4) Hormonal differences in the mutants seem to vary based on sex, and the meaning of these differences, or how they affect interpreting the findings, wasn't discussed. There was no acknowledgement of the fact that female central E2 was still at 50%, meaning the "rescue" experiments using peripheral injections are not given the proper context. For example, this is different than giving a fish with only 16% of their normal central E2 an E2 injection. Missing as well is a clear hypothesis for why E2 injections did not rescue aggression deficits in cyp19a1b mutant males.

      Line 385: As the reviewer pointed out, the degree of brain estrogen reduction in cyp19a1b-deficient fish differs greatly between males and females. This is likely because females receive a large supply of estrogens from the ovaries. Given that estrogen levels in cyp19a1b-deficient females were 50% of those in wild-type females, it can be inferred that half of their brain estrogens are synthesized locally, while the other half originates from the ovaries. This is an important finding, and we have already noted in the Discussion that “females have higher brain levels of estrogens, half of which are synthesized locally in the brain (i.e., neuroestrogens).” However, as this explanation was not sufficiently clear, we have revised it to “females have higher brain levels of estrogens, with half being synthesized locally and the other half supplied by the ovaries.”
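
      The inference above can be sketched as simple arithmetic. This is an illustration only, not part of the authors' analysis: it assumes that the knockout removes all local synthesis, that nothing compensates for the loss, and that whatever estrogen remains in the brain therefore comes from peripheral sources. The function name is hypothetical; the 50% figure is the female value stated above, and the 16% figure is the value quoted in the reviewer's comment.

```python
def locally_synthesized_fraction(residual_fraction: float) -> float:
    """Estimate the share of brain estrogen made locally.

    residual_fraction is the brain E2 level in knockout fish relative to
    wild type (0.0 to 1.0). Assumption: the knockout abolishes all local
    synthesis, so the residual is entirely of peripheral origin.
    """
    if not 0.0 <= residual_fraction <= 1.0:
        raise ValueError("residual_fraction must lie between 0 and 1")
    return 1.0 - residual_fraction


# 50% residual brain E2 in knockout females (stated in the response above).
female_local = locally_synthesized_fraction(0.50)

# 16% residual brain E2, the value quoted in the reviewer's comment.
male_local = locally_synthesized_fraction(0.16)

print(f"females: {female_local:.0%} local, males: {male_local:.0%} local")
```

      Under these assumptions, the smaller the residual fraction, the larger the share attributed to local synthesis, which is the sense in which the reduction differs between the sexes.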

      The reviewer raised a concern that conducting the estrogen rescue experiment in females, where 50% of brain estrogens remain, might be inappropriate. However, as this experiment was conducted exclusively in males, this concern is not applicable.

      Line 377: As noted in the reviewer’s subsequent comment, the failure of aggression recovery in E2-treated cyp19a1b-deficient males could be due to insufficient induction of ara/arb expression in aggression-relevant brain regions. To address this concern, we have inserted the following statement into the Discussion after “the development of male behaviors may require moderate neuroestrogen levels that are sufficient to induce the expression of ara and arb, but not esr2b, in the underlying neural circuitry”: “This may account for the lack of aggression recovery in E2-treated cyp19a1b-deficient males in this study.”

      (5) In relation to that, the "null" results may have some of the most interesting implications, but they are barely discussed. For example, what does it mean that E2 didn't restore aggression in male cyp19 mutants? Is this a brain region factor? Could this relate to findings from Lopez et al NYAS, where male and female Ara mutants show different effects on brain-region-specific aromatase expression? And maybe this relates to the different impact of estrogens on ar expression. Were the different effects impacted in aggression areas? Maybe this is why E2 injection didn't restore aggression in males. You could make the argument that: (1) E2 doesn't restore ar expression in aggression regions and that's why there was no rescue. Or (2) that the circuits in adulthood that regulate aggression are NOT dependent on aggression but in early development they are. Another null finding not expanded on is why the two esr2a mutant lines showed differences. There is no reason to trust one line over the other, meaning we still don't know whether esr2a is required for latency to follow.

      As stated in our response to the previous comment, we have added the following text to the Discussion (line 377): “This may account for the lack of aggression recovery in E2-treated cyp19a1b-deficient males in this study.” Meanwhile, as discussed in lines 341–342, it is highly unlikely that the neural circuits regulating aggression are primarily influenced by early-life estrogen exposure, because androgen administration in adulthood alone is sufficient to induce high levels of aggression in both sexes. This notion is further supported by previous observations that cyp19a1b expression in the brain is minimal during embryonic development (Okubo et al., 2011, J Neuroendocrinol, 23:412–423).

      The findings of Lopez and Alward (2024) pertain to the regulation of cyp19a1b expression by androgen receptors. While this represents an important aspect of neuroendocrine regulation, it does not appear to be directly relevant to our discussion on cyp19a1b-mediated regulation of androgen receptor expression.

      To ensure the reliability of behavioral analyses in mutant fish, we consider a phenotype valid only when it is consistently observed in two independent mutant lines. In the mating behavior test examining esr2a-deficient males using esr2b-deficient females as stimulus females, Δ8 line males exhibited a shorter latency to initiate following than wild-type males, whereas Δ4 line males did not. This discrepancy led us to refrain from drawing conclusions about the role of esr2a in mating behavior, even though the mating behavior test using wild-type females as stimulus females yielded consistent results in the Δ8 and Δ4 lines. Therefore, we do not consider the reviewer’s concern to be a significant issue.

      (6) Not sure what's going on with the statistics, but it is not appropriate here to treat a "control" group as special. All groups are "experimental" groups. There is nothing special about the control group in this context. all should be Bonferroni post-hoc tests.

      Line 619: As detailed in Response to reviewer #1’s comment 7 on weaknesses, we consider Dunnett’s test the most appropriate choice for the experiments presented in Figures 4C and 4E. We acknowledge that the reviewer’s concern may stem from the phrase “comparisons between control and experimental groups” in the Materials and Methods section. To clarify this point, we have revised it to “comparisons between untreated and E2-treated groups in Fig. 4, C and D.”
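
      The practical difference between the two approaches debated here can be sketched by counting comparisons. This is an illustration only and does not reproduce the analysis in the manuscript: a many-to-one design such as Dunnett’s test compares each treated group with a single reference group, whereas a Bonferroni correction over all pairs divides the significance level across every pairwise comparison. The group count of four and the function name are hypothetical.

```python
from math import comb


def comparison_counts(n_groups: int, alpha: float = 0.05) -> dict:
    """Count comparisons for many-to-one versus all-pairwise designs.

    n_groups includes the reference (untreated) group.
    """
    if n_groups < 2:
        raise ValueError("at least two groups are required")
    many_to_one = n_groups - 1      # each treated group vs the reference
    all_pairs = comb(n_groups, 2)   # every group vs every other group
    return {
        "many_to_one": many_to_one,
        "all_pairs": all_pairs,
        # Per-comparison threshold if Bonferroni is applied to all pairs.
        "bonferroni_alpha_all_pairs": alpha / all_pairs,
    }


# Hypothetical example: one untreated group plus three treated groups.
counts = comparison_counts(n_groups=4)
print(counts)
```

      With more groups the all-pairs count grows quadratically, so the per-comparison threshold tightens faster than the many-to-one count would require. Which design is appropriate depends on whether comparisons among the treated groups are themselves of interest.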

      Minor comments:

      Line 47: then how can you say the aromatization hypothesis is "correct"? it only applies to a few species so far. Need to change the framing, not state so strongly such a vague thing as a hypothesis being "correct".

      Line 45: To address this concern, we have modified “widely accepted as correct” to “widely acknowledged”, ensuring a more precise characterization.

      Figure 1: looks like a dosage effect in males but not females. this should be discussed at some point, even if just to mention a dosage effect exists and put it in context.

      Line 91: We have revised the sentence “In males, brain E2 in heterozygotes (cyp19a1b<sup>+/−</sup>) was also reduced to 45% of the level in wild-type siblings (P = 0.0284) (Fig. 1A)” by adding “, indicating a dosage effect of cyp19a1b mutation” to make this point explicit.

      Were male cyp19 KO aggressive towards females?

      We have not observed cyp19a1b-deficient males exhibiting aggressive behavior towards females in our experiments. Therefore, we do not consider them aggressive toward females.

      Please explain how infertility would lead to reduced mating.

      Line 142: As the reviewer has questioned, even if cyp19a1b-deficient males exhibit infertility due to efferent duct obstruction, it is difficult to imagine that this directly leads to reduced mating. However, the inability to release sperm could indirectly affect behavior. To address this, we have added “, possibly due to the perception of impaired sperm release” after “If this is also the case in medaka, the observed behavioral defects might be secondary to infertility.”

      Describe something about the timing of the treatment here. How can peripheral E2 injections restore it when peripheral levels are normal? Did these injections restore central levels? This needs to be shown experimentally.

      Line 517: As described in the Materials and Methods, E2 treatment was conducted by immersing fish in E2-containing water for 4 days. However, we had not explicitly stated that the water was changed daily to maintain the nominal concentration. To clarify this and address reviewer #2’s comment 9, we have revised “males were treated with 1 ng/ml of E2 (Fujifilm Wako Pure Chemical, Osaka, Japan) or vehicle (ethanol) alone by immersion in water for 4 days” to “males were treated with 1 ng/ml of E2 (Fujifilm Wako Pure Chemical, Osaka, Japan), which was first dissolved in 100% ethanol (vehicle), or with the vehicle alone by immersion in water for 4 days, with daily water changes to maintain the nominal concentration.”

      Line 522: The treatment effectively restored mating activity and ara/arb expression in the brain, suggesting a sufficient increase in brain E2 levels. However, we did not measure the actual increase, and its extent remains uncertain. To reflect this in the manuscript, we have now added the following sentence: “Although the exact increase in brain E2 levels following E2 treatment was not quantified, the observed positive effects on behavior and gene expression suggest that it was sufficient.”

      I know the nomenclature differs among those who study teleosts, but it's ARa and the gene is ar1 (as an example; arb would be ar2). The following citation is recommended for consistency:

      Munley, K. M., Hoadley, A. P., & Alward, B. A. (2023). A phylogenetics-based nomenclature system for steroid receptors in teleost fishes. General and Comparative Endocrinology, 114436.

      Paralogous genes resulting from the third round of whole-genome duplication in teleosts are typically designated by adding the suffixes “a” and “b” to their gene symbols. This convention also applies to the two androgen receptor genes, commonly referred to as ara and arb. While the alternative names ar1 and ar2 may gain broader acceptance in the future, ara and arb remain more widely used at present. Therefore, we have chosen to retain ara and arb in this manuscript.

      Line 268: how is this "suggesting" less aggression? They literally showed fewer aggressive displays, so it doesn't suggest it - it literally shows it.

      Line 285: Following this thoughtful suggestion, we have changed “suggesting less aggression” to “showing less aggression.”

      Line 317: how can you still call it the primary driver?

      The stimulatory effects of aromatase/estrogens on male-typical behaviors are exerted through the potentiation of androgen/AR signaling. Thus, we still believe that androgens—specifically 11KT in teleosts—serve as the primary drivers of these behaviors.

      Line 318: not all deficits, like aggression, were rescued.

      Line 334: To address this comment, “These behavioral deficits were rescued by estrogen administration, indicating that reduced levels of neuroestrogens are the primary cause of the observed phenotypes: in other words, neuroestrogens are pivotal for male-typical behaviors in teleosts” has been modified and now reads “Deficits in mating were rescued by estrogen administration, indicating that reduced brain estrogen levels are the primary cause of the observed mating impairment; in other words, brain-derived estrogens are pivotal at least for male-typical mating behaviors in teleosts.”

      Line 324: what do you mean by "sufficient"? To show that, you'd have to castrate the male and only give estrogen back. the authors continue to overstate virtually every aspect of their study, seemingly in an unnecessary manner.

      Line 341: Our intention was to convey that brain-derived estrogens early in life are not essential for the expression of male-typical behaviors in teleosts. However, we recognize that the term “sufficient” could be misinterpreted as implying that estrogens alone are adequate, without contributions from other factors such as androgens. To clarify this, we have revised the text from “neuroestrogen activity in adulthood is sufficient for the execution of male-typical behaviors, while that in early in life is not requisite. Thus, while” to “brain-derived estrogens early in life is not essential for the execution of male-typical behaviors. While.”

      Line 329: so? in adult mice, amygdala aromatase neurons still regulate aggression. The amount in adulthood seems less important compared to site-specific functions.

      Line 346: We do not intend to suggest that brain aromatase activity in adulthood plays a negligible role in male behaviors in rodents, as we have already acknowledged its necessity in the Introduction (lines 42–43). To enhance clarity and prevent misinterpretation, we have added “, although it remains important for male behavior in adulthood” to the end of the sentence: “brain aromatase activity in rodents reaches its peak during the perinatal period and thereafter declines with age.”

      Line 351: This contradicts what you all have been saying.

      Line 65: As mentioned in Response to reviewer #1’s comment 3 on weaknesses, the following text has been added to the Introduction: “It is worth mentioning that systemic administration of estrogens and an aromatase inhibitor increased and decreased male aggression, respectively, in several teleost species, potentially reflecting the behavioral effects of brain-derived estrogens (21–24)”, providing an overview of previous studies on the effects of estrogens and aromatase on male fish aggression (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015). With this revision, we believe the inconsistency has been addressed.

      Line 367: Additionally, we have revised the sentence from “treatment of males with an aromatase inhibitor reduces their male-typical behaviors (31–33)” to “treatment of males with an aromatase inhibitor reduces their male-typical behaviors, while estrogens exert the opposite effect (21–24).”

      Line 360: change to "...possibility that is not mutually exclusive,"

      Line 378: We have revised the phrase as suggested from “Another possibility, not mutually exclusive,” to “Another possibility that is not mutually exclusive.”

      Line 363: but it didn't rescue aggression

      Line 381: In response, we have revised the sentence from “This possibility is supported by the present observation that estrogen treatment facilitated mating behavior in cyp19a1b-deficient males but not in their wild-type siblings” to “This possibility is at least likely for mating behavior, as estrogen treatment facilitated mating behavior in cyp19a1b-deficient males but not in their wild-type siblings.”

      Line 367: on average

      To explain the sex differences in the role of aromatase, what about the downstream molecular or neural targets? In mammals, hodology is related to sex differences. there could be convergent sex differences in regulating the same type of behaviors as well.

      Our findings demonstrate that brain-derived estrogens promote the expression of ara, arb, and their downstream target genes vt and gal in males, while enhancing the expression of npba, a downstream target of Esr2b signaling, in females. The identity of additional target genes and their roles in specific neural circuits remain to be elucidated, and we aim to address these in future research.

      Lines 378-382: this doesn't logically follow. pgf2a could be the target of estrogens which in the intact animal do regulate female sexual receptivity. And how can you say this given that your lab has shown in esr2b mutants females don't mate?

      We agree that PGF2α signaling may be activated by estrogen signaling, as stated in lines 404–407: “the present finding provides a likely explanation for this apparent contradiction, namely, that neuroestrogens, rather than or in addition to ovarian-derived circulating estrogens, may function upstream of PGF2α signaling to mediate female receptivity.” The observation that esr2b-deficient females do not accept male courtship is also stated in lines 401–403: “we recently challenged it by showing that female medaka deficient for esr2b are completely unreceptive to males, and thus estrogens play a critical role in female receptivity.”

      Line 396-397: or the remaining estrogens are enough to activate esr2b-dependent female-typical mating behaviors.

      We agree that cyp19a1b deficiency did not completely preclude female mating behavior, most likely because residual estrogens in the brains of cyp19a1b-deficient females enable weak activation of Esr2b signaling. However, the relevant section in the Discussion is not focused on examining why mating behavior persisted, but rather on considering the implications of this finding for the neural circuits regulating mating behavior. Therefore, incorporating the suggested explanation here would shift the focus and would not be appropriate.

      Line 420-421: this is a lot of variation. Was age controlled for?

      The time required for medaka to reach sexual maturity varies with rearing density and food availability. Due to space constraints, we adjust these parameters as needed, which led to variation in the ages of the experimental fish. However, since all experiments were conducted using sibling fish of the same age that had just reached sexual maturity, we believe this does not affect our conclusions.

      Line 457: have these kits been validated in medaka?

      Although we have not directly validated its applicability in medaka, its extensive use in this species suggests that it is unlikely to pose any issues (e.g., Ussery et al., 2018, Aquat Toxicol, 205:58–65; Lee et al., 2019, Ecotoxicol Environ Saf, 173:174–181; Kayo et al., 2020, Gen Comp Endocrinol, 285:113272; Fischer et al., 2021, Aquat Toxicol, 236:105873; Royan et al., 2023, Endocrinology, 164:bqad030).

      Line 589, re fish that spawned: how many times did this happen? Please note it is based on genotype and experiment. This could be important.

      Line 627: In response to this comment, we have added the following details: “Specifically, 7/18 cyp19a1b<sup>+/+</sup>, 11/18 cyp19a1b<sup>+/−</sup>, and 6/18 cyp19a1b<sup>−/−</sup> males were excluded in Fig. 1D; 6/10 cyp19a1b<sup>+/+</sup>, 3/10 cyp19a1b<sup>+/−</sup>, and 6/10 cyp19a1b<sup>−/−</sup> females were excluded in Fig. 6B; 2/23 esr1<sup>+/+</sup> and 5/24 esr1<sup>−/−</sup> males were excluded in Fig. S7; 2/24 esr2a<sup>+/+</sup> and 3/23 esr2a<sup>−/−</sup> males were excluded in Fig. S8A; 0/23 esr2a<sup>+/+</sup> and 0/23 esr2a<sup>−/−</sup> males were excluded in Fig. S8B.”

      Reviewer #2 (Recommendations For The Authors):

      Abstract:

      (A1) The framing of neuroestrogens being important for male-typical rodents, and not for other vertebrate lineages, does not account for other groups (birds) in which this is true (the authors can consult their cited work by Balthazart (Reference 6) for extensive accounting of this). This makes the novelty clause in the abstract "indicating that neuro-estrogens are pivotal for male-typical behaviors even in nonrodents" less surprising and should be acknowledged by the authors by amending or omitting this novelty clause. The findings regarding androgen receptor transcription (next sentence) are more important and pertinent.

      Line 27: We recognize that the aromatization hypothesis applies to some birds, including zebra finches, as stated in the Introduction (lines 48–49) and Discussion (lines 432–433). However, this was not reflected in the Abstract. Following the reviewer’s suggestion, we have changed “in non-rodents” to “in teleosts.”

      (A2) The medaka line that has been engineered to have aromatase absent in the brain is presented briefly in the abstract, but can be misinterpreted as naturally occurring. This should be amended, by including something like "engineered" or "directed mutant" before 'male medaka fish'.

      Line 24: We have added “mutagenesis-derived” before “male medaka fish” in response to this comment.

      Introduction:

      (I1) The paragraph on teleost brain aromatase should acknowledge that while the capacity for estrogen synthesis in the brain is 100-1000 fold higher in teleosts as compared to rodents and other vertebrates, the majority of this derives from glial and not neural sources. This can be confusing for readers since the term 'neuroestrogens' often refers to the neuronal origin and signalling. And this observation includes the exclusive radial glial expression of cyp19a1b in medaka (Diotel et al., 2010), and first discovered in midshipman (Forlano et al., 2001), each of which should also be cited here. In addition, the authors expend much text comparing teleosts and rodents, but it is worth expanding these kinds of comparisons, especially by pointing out that parts of the primate brain are found to densely express aromatase (see work by Ei Terasawa and others).

      In response to this comment and a similar comment from reviewer #1, we have replaced “neuroestrogens” with “brain-derived estrogens” or “brain estrogens” throughout the manuscript.

      Line 63: We have also added the text “In teleost brains, including those of medaka, aromatase is exclusively localized in radial glial cells, in contrast to its neuronal localization in rodent brains (18–20).” As a result of this addition, we have changed “This observation suggests” to “These observations suggest” in the subsequent sentence.

      Line 51: Additionally, to include information on aromatase in the primate brain, we have added the following text: “In primates, the hypothalamic aromatization of androgens to estrogens plays a central role in female gametogenesis (10) but is not essential for male behaviors (7, 8).”

      The following references (#10 and 18–20), cited in the newly added text above, have been included in the reference list, with other references renumbered accordingly:

      E. Terasawa, Neuroestradiol in regulation of GnRH release. Horm. Behav. 104, 138–145 (2018).

      P. M. Forlano, D. L. Deitcher, D. A. Myers, A. H. Bass, Anatomical distribution and cellular basis for high levels of aromatase activity in the brain of teleost fish: aromatase enzyme and mRNA expression identify glia as source. J. Neurosci. 21, 8943–8955 (2001).

      N. Diotel, Y. Le Page, K. Mouriec, S. K. Tong, E. Pellegrini, C. Vaillant, I. Anglade, F. Brion, F. Pakdel, B. C. Chung, O. Kah, Aromatase in the brain of teleost fish: expression, regulation and putative functions. Front. Neuroendocrinol. 31, 172–192 (2010).

      A. Takeuchi, K. Okubo, Post-proliferative immature radial glial cells female-specifically express aromatase in the medaka optic tectum. PLoS One 8, e73663 (2013).

      (I2) It is difficult to resolve from the introduction and work cited how restricted cyp19a1b is to the medaka brain. Important for the results of this study, it is not clear whether it is more of a bias in the brain vs other tissues, or if the cyp19a1b deficiency is restricted to the brain, and gonadal/peripheral cyp19 expression persists. The authors need to improve their consideration of the alternatives, i.e., that this manipulation is not somehow affecting: 1) peripheral aromatase expression (either cyp19a1a or cyp19a1b) in the gonad or elsewhere, 2) compensatory processes, such as other steroidogenic genes (are androgen synthesizing enzymes increasing?).

      Our previous study demonstrated that cyp19a1b is expressed in the gonads, but at levels tens to hundreds of times lower than those in the brain (Okubo et al., 2011, J Neuroendocrinol 23:412–423). Additionally, a separate study in medaka reported that cyp19a1b expression in the ovary is considerably lower than that of cyp19a1a (Nakamoto et al., 2018, Mol Cell Endocrinol 460:104–122). Given these observations, any potential effect of cyp19a1b knockout on peripheral estrogen synthesis is likely negligible. Indeed, Figures S1C and S1D confirm that cyp19a1b knockout does not alter peripheral E2 levels.

      Line 72: To incorporate this information into the Introduction and address the following comment, we have added the following text: “In medaka, cyp19a1b is also expressed in the gonads, but only at a level tens to hundreds of times lower than in the brain and substantially lower than that of cyp19a1a (26, 27).”

      The following references (#26 and 27), cited in the newly added text above, have been included in the reference list, with other references renumbered accordingly:

      K. Okubo, A. Takeuchi, R. Chaube, B. Paul-Prasanth, S. Kanda, Y. Oka, Y. Nagahama, Sex differences in aromatase gene expression in the medaka brain. J. Neuroendocrinol. 23, 412–423 (2011).

      M. Nakamoto, Y. Shibata, K. Ohno, T. Usami, Y. Kamei, Y. Taniguchi, T. Todo, T. Sakamoto, G. Young, P. Swanson, K. Naruse, Y. Nagahama, Ovarian aromatase loss-of-function mutant medaka undergo ovary degeneration and partial female-to-male sex reversal after puberty. Mol. Cell. Endocrinol. 460, 104–122 (2018).

      We have not assessed whether the expression of other steroidogenic enzymes is altered in cyp19a1b-deficient fish, and this may be investigated in future studies.

      (I3) Related, there are documented sex differences in the brain expression of cyp19a1b especially in adulthood (Okubo et al 2011) and this study should be cited here for context.

      Line 72: As stated in our previous response, we have cited Okubo et al. (2011) by adding the following sentence: “In medaka, cyp19a1b is also expressed in the gonads, but only at a level tens to hundreds of times lower than in the brain and substantially lower than that of cyp19a1a (26, 27).”

      Methods

      (M1) The rationale for using mutagen screening for cyp19a1b while using CRISPR for esr2a is unclear as presented. Are there methodological/biochemical reasons why the authors chose not to use the same method for both?

      At the time we generated the cyp19a1b knockouts, genome editing was not yet available, and the TILLING-based screening was the only method for obtaining mutants in medaka. In contrast, by the time we generated the esr2a knockouts, CRISPR/Cas9 had become available, enabling a more efficient and convenient generation of knockout lines. This is why the two knockout lines were generated using different methods.

      (M2) Measurement of steroids in biological matrices is not straightforward, and it is good that the authors use multiple extraction steps (organic followed by C18 columns) before loading samples on the ELISA plates, which are notoriously sensitive. Even though these methods have been published previously by this group of authors, the quality control and ELISA performance values (recovery, parallelism, etc.) should be presented for readers to evaluate.

      Thank you for appreciating our sample purification method. Unfortunately, we have not evaluated the recovery rate or parallelism, but we recognize this as a subject for future studies.

      (M3) Mating behavior - E2 treated males were not co-housed with social partners for the full 24 hr before testing, but instead a few hours (?) prior to testing. The rationale for this should be spelled out explicitly.

      Line 494: In response to this comment, we have added “to ensure the efficacy of E2 treatment” to the end of the sentence “The set-up was modified for E2-treated males, which were kept on E2 treatment and not introduced to the test tanks until the day of testing.”

      (M4) The E2 treatment is listed as 1ng/ml vs. vehicle (ethanol). Is the E2 dissolved in 100% ethanol for administration to the tank water? Clarification is needed.

      Line 517: As the reviewer correctly assumed, E2 was first dissolved in 100% ethanol before being added to the tank water. To provide this information and address reviewer #1’s minor comment 5, we have revised “males were treated with 1 ng/ml of E2 (Fujifilm Wako Pure Chemical, Osaka, Japan) or vehicle (ethanol) alone by immersion in water for 4 days” to “males were treated with 1 ng/ml of E2 (Fujifilm Wako Pure Chemical, Osaka, Japan), which was first dissolved in 100% ethanol (vehicle), or with the vehicle alone by immersion in water for 4 days, with daily water changes to maintain the nominal concentration.”

      (M5) The authors exclude fish from the analysis of courtship display behavior for those individuals that spawned immediately at the start of the testing (and therefore it was impossible to register courtship display behaviors). How often did fish in the various treatment groups exhibit this "fast spawning" behavior? Was the occurrence rate different by treatment group? It is unlikely that these omissions from the data set drove large-scale patterns, but an indication of how often this occurred would be reassuring.

      Line 627: In response to this comment, we have included the following details: “Specifically, 7/18 cyp19a1b<sup>+/+</sup>, 11/18 cyp19a1b<sup>+/−</sup>, and 6/18 cyp19a1b<sup>−/−</sup> males were excluded in Fig. 1D; 6/10 cyp19a1b<sup>+/+</sup>, 3/10 cyp19a1b<sup>+/−</sup>, and 6/10 cyp19a1b<sup>−/−</sup> females were excluded in Fig. 6B; 2/23 esr1<sup>+/+</sup> and 5/24 esr1<sup>−/−</sup> males were excluded in Fig. S7; 2/24 esr2a<sup>+/+</sup> and 3/23 esr2a<sup>−/−</sup> males were excluded in Fig. S8A; 0/23 esr2a<sup>+/+</sup> and 0/23 esr2a<sup>−/−</sup> males were excluded in Fig. S8B.” These data indicate that the proportion of excluded males is nearly constant within each trial and is independent of the genotype of the focal fish.
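
      For readers who wish to compare the exclusion counts across groups, they can be tabulated as proportions. The sketch below is illustrative only: the dictionary layout, the plain-text genotype labels, and the function name `exclusion_rates` are assumptions made for this example and do not come from the manuscript; the counts are those quoted above.

```python
# Excluded / tested counts as quoted in the response.
# Genotype labels are plain-text stand-ins chosen for this sketch.
EXCLUDED = {
    "Fig. 1D, cyp19a1b males": {"+/+": (7, 18), "+/-": (11, 18), "-/-": (6, 18)},
    "Fig. 6B, cyp19a1b females": {"+/+": (6, 10), "+/-": (3, 10), "-/-": (6, 10)},
    "Fig. S7, esr1 males": {"+/+": (2, 23), "-/-": (5, 24)},
    "Fig. S8A, esr2a males": {"+/+": (2, 24), "-/-": (3, 23)},
    "Fig. S8B, esr2a males": {"+/+": (0, 23), "-/-": (0, 23)},
}


def exclusion_rates(groups):
    """Return the excluded fraction for each genotype in one experiment."""
    return {
        genotype: n_excluded / n_tested
        for genotype, (n_excluded, n_tested) in groups.items()
    }


if __name__ == "__main__":
    # Print one summary line per experiment, with rates as rounded percentages.
    for experiment, groups in EXCLUDED.items():
        rates = exclusion_rates(groups)
        summary = ", ".join(f"{g}: {r:.0%}" for g, r in rates.items())
        print(f"{experiment}: {summary}")
```

      The sketch only converts counts to fractions; it does not test whether the rates differ between genotypes.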

      Results

      (R1) It is striking to see the genetic-'dose' dependent suppression of brain E2 content by heterozygous and homozygous cyp19a1b deficiency, indicating that, as the authors point out, the majority of E2 in the male medaka brain (and 1/2 in the female brain) have a brain-derived origin. It is important also for the interpretation that there are large compensatory increases in brain levels of androgens, when E2 levels drop in the cyp19a1b mutant homozygotes. This latter point should receive more attention.

      Also, there are large increases in peripheral androgen levels in the homozygote mutants for cyp19a1b in both males and females. This indicates a peripheral effect in addition to the clear brain knockdown of E2 synthesis. These nuances need to be addressed.

      In response to this comment, we have revised the Results section as follows:

      Line 91: “, indicating a dosage effect of cyp19a1b mutation” has been added to the end of the sentence “In males, brain E2 in heterozygotes (cyp19a1b<sup>+/−</sup>) was also reduced to 45% of the level in wild-type siblings (P = 0.0284) (Fig. 1A).”

      Line 94: To draw more attention to the increase in brain androgen levels caused by cyp19a1b deficiency, “Brain levels of testosterone” has been modified to “Strikingly, brain levels of testosterone.”

      Line 100: “Their peripheral 11KT levels also increased 3.7- and 1.8-fold, respectively (P = 0.0789, males; P = 0.0118, females) (Fig. S1, C and D)” has been modified and now reads “In addition, peripheral 11KT levels in cyp19a1b<sup>−/−</sup> males and females increased 3.7- and 1.8-fold, respectively (P = 0.0789, males; P = 0.0118, females) (Fig. S1, C and D), indicating peripheral influence in addition to central effects.”

      (R2) The interpretation on page 4 that cyp19a1b deficient males are 'less motivated' to mate is premature, given the behavioral measures used in this study. There are several competing explanations for these findings (e.g., alterations in motivation, sensory discrimination, preference, etc.) that could be followed up in future work, but the current results are not able to distinguish among these possibilities.

      Line 112: We agree that the possibility of altered cognition or sexual preference cannot be dismissed. To incorporate this perspective, we have revised the text “, suggesting that they are less motivated to mate” to “These results suggest that they are less motivated to mate, though an alternative interpretation that their cognition or sexual preference may be altered cannot be dismissed.”

      (R3) On page 5, the authors present that peripheral E2 manipulation (delivery to the fish tank) restores courtship behavior in males, and then go on to erroneously conclude that this demonstrates "that reduced E2 in the brain was the primary cause of the mating defects, indicating a pivotal role of neuroestrogens in male mating behavior." Because this is a peripheral E2 treatment, there can be manifold effects on gonadal physiology or other endocrine events that can have indirect effects on the brain and behavior. Without manipulation of E2 directly to the brain to 'rescue' the cyp19a1b deficiency, the authors cannot conclude that these effects are directly on the central nervous system. Tellingly, the tank E2 treatment did not rescue aggressive behavior, suggestive of the potential for indirect effects.

      Line 155: As detailed in Response to reviewer #2’s specific comment 1, we have revised the text from “These results demonstrated that reduced E2 in the brain was the primary cause of the mating defects, indicating a pivotal role of neuroestrogens in male mating behavior. In contrast” to “These results suggest that reduced E2 in the brain is the primary cause of the mating defects, highlighting a pivotal role of brain-derived estrogens in male mating behavior. However, caution is warranted, as an indirect peripheral effect of bath-immersed E2 on behavior cannot be ruled out, although this is unlikely given the comparable peripheral E2 levels in cyp19a1b-deficient and wild-type males. In contrast to mating.”

      (R4) The downregulation of androgen-dependent gene expression (vasotocin in pNVT and galanin in pPMp) in the cyp19a1b deficient males (Figure 3) could be due to exceedingly high levels of brain androgens in the cyp19a1b deficient males. The best way to test the idea that estrogens can restore the expression to be more wild-type directly (like what is happening for ara and arb) is to look at these same markers (vasotocin and galanin) in these same brain areas in the brains of E2-treated males. The authors should have these brains from Figure 2. Unless I missed something, those experiments were not performed/reported here. It is clear that the ara and arb receptors have EREs and are 'rescued' by E2 treatment, but in principle, there could be indirect actions for reasons stated above for the behavior due to the peripheral E2 tank application.

      Thank you for your insightful comment. We agree that the current results cannot exclude the possibility that excessive androgen levels caused the downregulation of vt and gal. However, our previous studies showed that excessive 11KT administration to gonadectomized males and females increased the expression of these genes to levels comparable to wild-type males (Yamashita et al., 2020, eLife, 9:e59470; Kawabata-Sakata et al., 2024, Mol Cell Endocrinol 580:112101), making this scenario unlikely. That said, testing whether estrogen treatment restores vt and gal expression in cyp19a1b-deficient males would be informative, and we see this as an important direction for future research.

      Discussion

      (D1) The authors need to clarify whether EREs are found in other vertebrate AR introns, or is this unique to the teleost genome duplication?

      We have identified multiple ERE-like sequences within intron 1 of the mouse AR gene. However, sequence data alone do not provide sufficient evidence of their functionality, rendering this information of limited relevance. Therefore, we have chosen not to include this discussion in the current paper.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors are strongly encouraged to report information regarding the effect of Cyp19a1b deletion on the brain content of aromatase protein (ideally both isoforms investigated separately) as the two isoforms are mostly but not completely brain vs gonad specific. The analysis of other tissues would also strengthen the characterization of this model.

      We agree that measuring aromatase protein levels in the brain of our fish would be valuable for confirming the loss of cyp19a1b function. However, as no suitable method is currently available, this issue will need to be addressed in future studies. While this constitutes indirect evidence, the observed reduction in brain E2 levels, with no change in peripheral E2 levels, in cyp19a1b-deficient fish strongly suggests the loss of cyp19a1b function, as noted in Response to reviewer #3’s comment 1 on weaknesses.

      (2) As presented, this study reads as niche work. A better description of the behavior and reproductive significance of the different aspects of the behavioral sequence would allow a better understanding of the results and would thus allow the non-specialist to appreciate the significance of the observations.

      Line 103: In response to this comment and Reviewer #3’s comment 2 on weaknesses, we have revised the sentence from “The mating behavior of medaka follows a stereotypical pattern, wherein a series of followings, courtship displays, and wrappings by the male leads to spawning” to “The mating behavior of medaka follows a stereotypical sequence. It begins with the male approaching and closely following the female (following). The male then performs a courtship display, rapidly swimming in a circular pattern in front of the female. If the female is receptive, the male grasps her with his fins (wrapping), culminating in the simultaneous release of eggs and sperm (spawning)” in order to provide a more detailed description of medaka mating behavior.

      (3) The data regarding female behavior are limited and incomplete. It is suggested to keep this for another manuscript unless data on the behavior of the female herself is added. Indeed, analyzing female's behavior from the male's perspective complicates the interpretation of the results while a description of what the females do would provide valuable and interpretable information.

      We thank the reviewer for this thoughtful suggestion and agree that the data and discussion for females are less extensive than for males. However, we have previously elucidated the mechanism by which estrogen/Esr2b signaling promotes female mating behavior (Nishiike et al., 2021). Accordingly, it follows that the new insights into female behavior gained from the cyp19a1b knockout model are more limited than those for males. Nevertheless, when combined with our prior findings, the female data in this study offer valuable insights, and the overall mechanism through which estrogens promote female mating behavior is becoming clearer. Therefore, we do not consider the female data in this study to be incomplete or merely supplementary.

      (4) In Figure 2, the validity to run multiple T-tests rather than a two-way ANOVA comparing TRT and genotype is questionable. Moreover, why are the absolute values in CTL higher than in the initial experiment comparing genotypes for ara in PPa, pPPp, and NVT as well as for arb in aPPp. More importantly, these graphs do not seem to reproduce the genotype effects for ara in pPPp and NVT and for arb in aPPp.

      The data in Figures 2J and 2K were analyzed with an exclusive focus on the difference between vehicle-treated and E2-treated males, without considering genotype differences. Therefore, the use of t-tests for significance testing is appropriate.

      As the reviewer noted, the overall ara expression area is larger in Figure 2J than in Figure 2F. However, as detailed in Response to reviewer #3’s comment 8 on weaknesses, the relative area ratios of ara expression among brain nuclei are consistent between the two figures, indicating the reproducibility of the results. Thus, we consider this difference unlikely to affect the conclusions of this study.

      Additionally, the differences in ara expression in pPPp and arb expression in aPPp between wild-type and cyp19a1b-deficient males appear smaller in Figures 2J and 2K compared to Figures 2F and 2H. This is likely due to the smaller sample size used in the experiments for Figures 2J and 2K, which makes the differences less distinct. However, since the same genotype-dependent trends are observed in both sets of figures, the conclusion that ara and arb expression is reduced in cyp19a1b-deficient male brains remains valid.

      (5) More information is required regarding the analysis of single ISH - How was the positive signal selected from the background in the single ISH analyses? How was this measure standardized across animals? How many sections were imaged per region? Do the values represent unilateral or bilateral analysis?

      Line 540: Following this comment, we have provided additional details on the single ISH method in the manuscript. Specifically, “, and the total area of signal in each brain nucleus was calculated using Olyvia software (Olympus)” has been revised to “The total area of signal across all relevant sections, including both hemispheres, was calculated for each brain nucleus using Olyvia software (Olympus). Images were converted to a 256-level intensity scale, and pixels with intensities from 161 to 256 were considered signals. All sections used for comparison were processed in the same batch, without corrections between samples.”
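
      The thresholding rule quoted above can be sketched in a few lines. This is a minimal illustration, not the Olyvia implementation: it assumes that the manuscript's 256-level scale runs from 1 to 256 and therefore maps to 0 to 255 in an 8-bit array (so "161 to 256" becomes 160 to 255), and the function name `signal_area` and the `pixel_area_um2` parameter are hypothetical.

```python
import numpy as np

# Assumption: intensities 161..256 on a 1-based scale correspond to
# 160..255 in a 0-based 8-bit image.
SIGNAL_MIN = 160


def signal_area(sections, pixel_area_um2=1.0):
    """Total supra-threshold area across all sections of one brain nucleus.

    `sections` is an iterable of 2-D grayscale arrays (both hemispheres
    included); `pixel_area_um2` converts pixel counts to an area.
    """
    total_pixels = 0
    for image in sections:
        image = np.asarray(image)
        # Count pixels at or above the fixed signal threshold.
        total_pixels += int(np.count_nonzero(image >= SIGNAL_MIN))
    return total_pixels * pixel_area_um2


# Toy data standing in for two sections of the same nucleus.
left = np.array([[0, 159, 160], [200, 255, 10]], dtype=np.uint8)
right = np.array([[161, 30], [40, 50]], dtype=np.uint8)
area = signal_area([left, right], pixel_area_um2=0.25)
```

      Because a single fixed threshold is applied without per-sample correction, comparisons are meaningful only for sections processed in the same batch, as the revised text states.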

      (6) More information should be provided in the methods regarding the image analysis of double ISH. In particular, what were the criteria to consider a cell as labeled are not clear. This is not clear either from the representative images.

      Line 596: To provide additional details on the double ISH method in the manuscript, we have added the following sentence: “Cells were identified as coexpressing the two genes when Alexa Fluor 555 and fluorescein signals were clearly observed in the cytoplasm surrounding DAPI-stained nuclei, with intensities markedly stronger than the background noise.”

      (7) There is no description of the in silico analyses run on ESR2a in the methods.

      The method for identifying estrogen-responsive element-like sequences in the esr2a locus is described in line 549: “Each nucleotide sequence of the 5′-flanking region of ara and arb was retrieved from the Ensembl medaka genome assembly and analyzed for potential canonical ERE-like sequences using Jaspar (version 5.0_alpha) and Match (public version 1.0) with default settings.”

      However, the method for domain identification in Esr2a was not described. Therefore, we have added the following text in line 469: “The DNA- and ligand-binding domains of medaka Esr2a were identified by sequence alignment with yellow perch (Perca flavescens) Esr2a, for which these domain locations have been reported (58).”

      The following reference (#58), cited in the newly added text above, has been included in the reference list: S. G. Lynn, W. J. Birge, B. S. Shepherd, Molecular characterization and sex-specific tissue expression of estrogen receptor α (esr1), estrogen receptor βa (esr2a) and ovarian aromatase (cyp19a1a) in yellow perch (Perca flavescens). Comp. Biochem. Physiol. B Biochem. Mol. Biol. 149, 126–147 (2008).

      (8) Information about the validation steps of the EIA that were carried out as well as the specificity of the antibody the steroids and the extraction efficacy should be provided.

      We have not directly validated the applicability of the EIA kit, but its extensive use in medaka suggests that it is unlikely to pose any issues (e.g., Ussery et al., 2018, Aquat Toxicol, 205:58–65; Lee et al., 2019, Ecotoxicol Environ Saf, 173:174–181; Kayo et al., 2020, Gen Comp Endocrinol, 285:113272; Fischer et al., 2021, Aquat Toxicol, 236:105873; Royan et al., 2023, Endocrinology, 164:bqad030).

      The specificity (cross-reactivity) of the antibodies is detailed as follows.

      (1) Estradiol ELISA kits: estradiol, 100%; estrone, 1.38%; estriol, 1.0%; 5α-dihydrotestosterone, 0.04%; androstenediol, 0.03%; testosterone, 0.03%; aldosterone, <0.01%; cortisol, <0.01%; progesterone, <0.01%.

      (2) Testosterone ELISA kits: testosterone, 100%; 5α-dihydrotestosterone, 27.4%; androstenedione, 3.7%; 11-ketotestosterone, 2.2%; androstenediol, 0.51%; progesterone, 0.14%; androsterone, 0.05%; estradiol, <0.01%.

      (3) 11-Keto Testosterone ELISA kits: 11-ketotestosterone, 100%; adrenosterone, 2.9%; testosterone, <0.01%.

      As this information is publicly available on the manufacturer’s website, we deemed it unnecessary to include it in the manuscript.

      Unfortunately, we have not evaluated the extraction efficacy of the samples, but we recognize this as a subject for future studies.

      (9) I wonder whether the evaluation of the impact of the mutation by comparing the behavior of a group of wild-type males to a group of mutated males is the most appropriate. Justifying this approach against testing the behavior of one mutated male facing one or several wild-type males would be appreciated.

      We agree that the resident-intruder test, in which a single focal resident is confronted with one or more stimulus intruders, is the most commonly used method for assessing aggression. However, medaka form shoals and lack strong territoriality, and even slight dominance differences between the resident and the intruder can increase variability in the results, compromising data consistency. Therefore, in this study, we adopted an alternative approach: placing four unfamiliar males together in a tank and quantifying aggressive interactions in total. This method allows for the assessment of aggression regardless of territorial tendencies, making it more appropriate for our investigation.

      (10) Lines 329-331: this sentence should be rephrased as it contributes to the confusion between sexual differentiation and activation of circuits. The restoration of sexual behavior by adult estrogen treatment pleads in favor of an activational role of neuro-estrogens on behavior rather than an organizational role. Therefore, referring to sexual differentiation is misleading, even more so that the study never compares sexes.

      As detailed in Response to reviewer #3’s comment 9 on weaknesses, we consider that all factors that cause sex differences, including the transient effects of adult steroids, need to be incorporated into a theory of sexual differentiation. In teleosts, since steroids during early development have little effect and sexual differentiation primarily relies on steroid action in adulthood, our discussion on brain sexual differentiation remains valid, including the statement in line 347: “This variation among species may represent the activation of neuroestrogen synthesis at life stages critical for sexual differentiation of behavior that are unique to each species.”

      (11) Lines 384-386: I may have missed something but I do not see data supporting the notion that neuroestrogens may function upstream of PGF2a signaling to mediate female receptivity.

      Line 403: We acknowledge that our explanation was insufficient and apologize for any confusion. To clarify this point, “Given that estrogen/Esr2b signaling feminizes the neural substrates that mediate mating behavior, while PGF2α signaling triggers female sexual receptivity,” has been added before the sentence “The present finding provides a likely explanation for this apparent contradiction, namely, that neuroestrogens, rather than or in addition to ovarian-derived circulating estrogens, may function upstream of PGF2α signaling to mediate female receptivity.”

      Additional alteration

      Reference list (line 682): a preprint article has now been published in a peer-reviewed journal, and the information has been updated accordingly as follows: “bioRxiv doi: 10.1101/2024.01.10.574747 (2024)” to “Proc. Natl. Acad. Sci. U.S.A. 121, e2316459121 (2024).”

    1. Author response:

      Reviewer #1 (Public review):

      (1) Some details are not described for experimental procedures. For example, what were the pharmacological drugs dissolved in, and what vehicle control was used in experiments? How long were pharmacological drugs added to cells?

      We apologise for the oversight. These details have now been added to the methods section of the manuscript as well as to the relevant figure legends.

      Briefly, latrunculin was used at a final concentration of 250 nM and Y27632 at a final concentration of 50 μM. Both drugs were dissolved in DMSO. Vehicle controls were performed with the higher of the two final DMSO concentrations used for the drugs.

      The details of the drug treatments and their duration were added to the methods and to figures 6, S10, and S12.

      (2) Details are missing from the Methods section and Figure captions about the number of biological and technical replicates performed for experiments. Figure 1C states the data are from 12 beads on 7 cells. Are those same 12 beads used in Figure 2C? If so, that information is missing from the Figure 2C caption. Similarly, this information should be provided in every figure caption so the reader can assess the rigor of the experiments. Furthermore, how heterogenous would the bead displacements be across different cells? The low number of beads and cells assessed makes this information difficult to determine.

      We apologise for the oversight. We have now added this data to the relevant figure panels.

      To gain a further understanding of the heterogeneity of bead displacements across cells, we have replotted the relevant graphs using different colours to indicate different cells. This reveals that different cells appear to behave similarly and that the behaviour appears controlled by distance to the indentation or the pipette tip rather than cell identity.

      We agree with the reviewer that the number of cells examined is low. This reflects the challenging nature of the experiments, in which many attempts are needed to obtain a single successful measurement.

      The experiments in Fig 1C are a verification of a behaviour documented in a previous publication [1]. Here, we just confirm the same behaviour and therefore we decided that only a small number of cells was needed.

      The experiments in Fig 2C (that allow for a direct estimation of the cytoplasm’s hydraulic permeability) require formation of a tight seal between the glass micropipette and the cell, something known as a gigaseal in electrophysiology. The success rate of this first step is 10-30% of attempts for an experienced experimenter. The second step is forming a whole cell configuration, in which a hydraulic link is formed between the cell and the micropipette. This step has a success rate of ~ 50%. Whole cell links are very sensitive to any disturbance. After reaching the whole cell configuration, we applied relatively high pressures that occasionally resulted in loss of link between the cell and the micropipette. In summary, for the 12 successful measurements, hundreds of unsuccessful attempts were carried out.
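
      The success rates quoted above imply a rough lower bound on the number of attempts behind the 12 measurements. The sketch below is a back-of-the-envelope estimate only: it assumes the two steps are independent, and it ignores the further losses during pressure application, which the response mentions but does not quantify. The function name `expected_attempts` is hypothetical.

```python
def expected_attempts(successes, p_seal, p_whole_cell):
    """Expected number of attempts needed to obtain `successes` recordings.

    Assumes each attempt succeeds independently with probability
    p_seal * p_whole_cell (gigaseal formation, then whole-cell access).
    """
    p_success = p_seal * p_whole_cell
    if not 0.0 < p_success <= 1.0:
        raise ValueError("combined success probability must be in (0, 1]")
    return successes / p_success


# Best case: 30% gigaseal success; worst case: 10%. Whole-cell step ~50%.
best_case = expected_attempts(12, p_seal=0.30, p_whole_cell=0.50)
worst_case = expected_attempts(12, p_seal=0.10, p_whole_cell=0.50)
print(f"roughly {best_case:.0f} to {worst_case:.0f} attempts")
```

      Under these assumptions the combined per-attempt success rate is 5 to 15%, so any additional loss of the whole-cell link under pressure pushes the attempt count higher still.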

      (3) The full equation for displacement vs. time for a poroelastic material is not provided. Scaling laws are shown, but the full equation derived from the stress response of an elastic solid and viscous fluid is not shown or described.

      We thank the reviewer for this comment. Based on our experiments, we found that the cytoplasm behaves as a poroelastic material. However, to understand the displacements of the cell surface in response to localised indentation, we show that we also need to take the tension of the sub membranous cortex into account. In summary, the interplay between cell surface tension generated by the cortex and the poroelastic cytoplasm controls the cell behaviour. To our knowledge, no simple analytical solutions to this type of problem exist.

      In Fig 1, we show that the response of the cell to local indentation is biphasic with a short time-scale displacement followed by a longer time-scale one. In Figs 2 and 3, we directly characterise the kinetics of cell surface displacement in response to microinjection of fluid. These kinetics are consistent with the long time-scale displacement but not the short time-scale one. Scaling considerations led us to propose that tension in the cortex may play a role in mediating the short time-scale displacement. To verify this hypothesis, we have now added new data showing that the length-scale of an indentation created by an AFM probe depends on tension in the cortex (Fig S5).

      In a previous publication [2], we derived the temporal dynamics of cell surface displacement for a homogenous poroelastic material in response to a change in osmolarity. In the current manuscript, the composite nature of the cell (membrane, cortex, cytoplasm) needs to be taken into account as well as a realistic cell shape. Therefore, we did not attempt to provide an analytical solution for the displacement of the cell surface versus time in the current work. Instead, we turned to finite element modelling to show that our observations are qualitatively consistent with a cell that comprises a tensed sub membranous actin cortex and a poroelastic cytoplasm (Fig 4). We have now added text to make this clearer for the reader.

      Reviewer #2 (Public review):

      Comments & Questions:

      The authors state, "Next, we sought to quantitatively understand how the global cellular response to local indentation might arise from cellular poroelasticity." However, the evidence presented in the following paragraph appears more qualitative than strictly quantitative. For instance, the length scale estimate of ~7 μm is only qualitatively consistent with the observed ~10 μm, and the timescale 𝜏𝑧 ≈ 500 ms is similarly described as "qualitatively consistent" with experimental observations. Strengthening this point would benefit from more direct evidence linking the short timescale to cell surface tension. Have you tried perturbing surface tension and examining its impact on this short-timescale relaxation by modulating acto-myosin contractility with Y-27632, depolymerizing actin with Latrunculin, or applying hypo/hyperosmotic shocks?

      Upon rereading our manuscript, we agree with the reviewer that some of our statements are too strong. We have now moderated these and clarified the goal of that section of the text.

      The reviewer asks if we have examined the effect of various perturbations on the short time-scale displacements. In our experimental conditions, we cannot precisely measure the time-scale of the fast relaxation because its duration is comparable to the frame rate of our image acquisition. However, we examined the amplitude of the displacement of the first phase in response to sucrose treatment and we have carried out new experiments in which we treat cells with 250nM Latrunculin to partially depolymerise cellular F-actin. Neither of these treatments had an impact on the amplitude of vertical displacements (Author response image 1).

      The absence of change in response to Latrunculin may be because the treatment decreases both the elasticity of the cytoplasm E and the cortical tension γ. As the length-scale l of the deformation of the surface depends on both γ and E, the two effects of latrunculin treatment may compensate one another and result in only small changes in l. We have now added these data to the supplementary information and comment on this in the text.

      Author response image 1:

      Amplitude of the short time-scale displacements of beads in response to AFM indentation at δx=0µm for control cells, sucrose treated cells, and cells treated with Latrunculin B. n indicates the number of cells examined and N the number of beads.

      The reviewer’s comment also made us want to determine how cortical tension affects the length-scale of the cell surface deformation created by localised micro indentation. To isolate the role of the cortex from that of cell shape, we decided to examine rounded mitotic cells. In our experiments, we indented a mitotic cell expressing a membrane targeted GFP with a sharp AFM tip (Author response image 2).

      In our experiments, we adjusted force to generate a 2μm depth indentation and we imaged the cell profile with confocal microscopy before and during indentation. Segmentation of this data allowed us to determine the cell surface displacement resulting from indentation and measure a length scale of deformation. In control conditions, the length scale created by deformation is on the order of 1.2μm. When we inhibited myosin contractility with blebbistatin, the length-scale of deformation decreased significantly to 0.8 μm, as expected if we decrease the surface tension γ without affecting the cytoplasmic elasticity. We have now added this data to our manuscript.

      Author response image 2.

      (a) Overlay of the zx profiles of a mitotic cell before (green) and during indentation (red). The cell membrane is labelled with CellMask DeepRed. The arrowhead indicates the position of the AFM tip. Scale bar 10µm. (b) Position of the membrane along the top half of the cell before (green) and during (red) indentation. The membrane position is derived from segmentation of the data in (a). Deformation is highly localised and membrane profiles overlap at the edges. The tip position is marked by an *. (c) The difference in membrane height between pre-indentation and indentation profiles plotted in (b) with the tip located at x=0. (d) Schematic of the cell surface profile during indentation and the corresponding length scale of the deformation induced by indentation. (e) Measured length scale for an indentation of ~2µm for DMSO control l=1.2±0.2µm (n=8 cells) and with blebbistatin treatment (100µM) l=0.8±0.4µm (n=9 cells) (p=0.016).

      The authors demonstrate that the second relaxation timescale increases (Figure 1, Panel D) following a hyperosmotic shock, consistent with cytoplasmic matrix shrinkage, increased friction, and consequently a longer relaxation timescale. While this result aligns with expectations, is a seven-fold increase in the relaxation timescale realistic based on quantitative estimates given the extent of volume loss?

      We thank the reviewer for this interesting question. Upon re-examining our data, we realised that the numerical values in the text related to the average rather than the median of our measurements. The median of the poroelastic time constant increases from ~0.4s in control conditions to 1.4s in sucrose, representing approximately a 3.5-fold increase.

      Previous work showed that HeLa cell volume decreases by ~40% in response to hyperosmotic shock [3]. The fluid volume fraction in cells is ~65-75%. If we assume that the water is contained in N pores of volume , we can express the cell volume as with V<sub>s</sub> the volume of the solid fraction. We can rewrite with ϕ = 0.42 -0.6. As V<sub>s</sub> does not change in response to osmotic shock, we can rewrite the volume change to obtain the change in pore size .

      The poroelastic diffusion constant scales as and the poroelastic timescale scales as . Therefore, the measured change in volume leads to a predicted increase in poroelastic diffusion time of 1.7-1.9-fold, smaller than observed in our experiments. This suggests that some intuition can be gained in a straightforward manner assuming that the cytoplasm is a homogenous porous material.
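
      The scaling argument above can be sketched numerically. This is an illustrative reconstruction under stated assumptions, not the calculation used in the manuscript: it assumes that the cell volume is V = N ξ³ + V_s (ξ is a symbol introduced here for the pore size), that N and V_s are unchanged by the shock so that all lost volume comes from the fluid-filled pores, and that the poroelastic time scales as 1/ξ² with all other quantities held fixed. The function names are hypothetical.

```python
def pore_size_ratio(fluid_fraction: float, volume_loss: float) -> float:
    """Ratio of pore size after the shock to pore size before it.

    Assumes V = N * xi**3 + V_s with N and V_s unchanged by the shock,
    so all of the lost volume comes out of the fluid-filled pores.
    """
    solid_fraction = 1.0 - fluid_fraction
    # Fluid volume after the shock, in units of the initial cell volume.
    fluid_after = (1.0 - volume_loss) - solid_fraction
    if fluid_after <= 0.0:
        raise ValueError("volume loss exceeds the available fluid volume")
    return (fluid_after / fluid_fraction) ** (1.0 / 3.0)


def timescale_fold_change(fluid_fraction: float, volume_loss: float) -> float:
    """Fold change in poroelastic time, assuming t_p ~ 1 / xi**2."""
    return pore_size_ratio(fluid_fraction, volume_loss) ** -2


if __name__ == "__main__":
    # Fluid fractions of 65-75% and a 40% volume loss, as quoted in the text.
    for f in (0.65, 0.75):
        print(f"fluid fraction {f:.2f}: {timescale_fold_change(f, 0.40):.2f}-fold")
```

      With a 40% volume loss and fluid fractions of 65-75%, this sketch gives fold changes of roughly 1.7 to 1.9, which is in line with the range quoted above, although the intermediate expressions may differ from those in the manuscript.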

      However, the reality is more complex and the hydraulic pore size is distinct from the entanglement length of the cytoskeleton mesh, as we discussed in a previous publication [4]. When the fluid fraction becomes sufficiently small, macromolecular crowding will impact diffusion further and non-linearities will arise. We have now added some of these considerations to the discussion.

      If the authors' hypothesis is correct, an essential physiological parameter for the cytoplasm could be the permeability k and how it is modulated by perturbations, such as volume loss or gain. Have you explored whether the data supports the expected square dependency of permeability on hydraulic pore size, as predicted by simple homogeneity assumptions?

      We thank the reviewer for this comment. As discussed above, we have explored such considerations in a previous publication (see discussion in [4]). Briefly, we find that the entanglement length of the F-actin cytoskeleton does play a role in controlling the hydraulic pore size but is distinct from it. Membrane bounded organelles could also contribute to setting the pore size. In our previous publication, we derived a scaling relationship that indicates that four different length-scales contribute to setting cellular rheology: the average filament bundle length, the size distribution of particles in the cytosol, the entanglement length of the cytoskeleton, and the hydraulic pore size. Many of these length-scales can be dynamically controlled by the cell, which gives rise to complex rheology. We have now added these considerations to our discussion.

      Additionally, do you think that the observed decrease in k in mitotic cells compared to interphase cells is significant? I would have expected the opposite naively as mitotic cells tend to swell by 10-20 percent due to the mitotic overshoot at mitotic entry (see Son Journal of Cell Biology 2015 or Zlotek Journal of Cell Biology 2015).

      We thank the reviewer for this interesting question. Based on the same scaling arguments as above, we would expect that a 10-20% increase in cell volume would give rise to a 10-20% increase in diffusion constant. However, we also note that metaphase leads to a dramatic reorganisation of the cell interior and in particular of membrane-bounded organelles. In summary, we do not know why such a decrease could take place. We now highlight this as an interesting question for further research.

      Based on your results, can you estimate the pore size of the poroelastic cytoplasmic matrix? Is this estimate realistic? I wonder whether this pore size might define a threshold above which the diffusion of freely diffusing species is significantly reduced. Is your estimate consistent with nanobead diffusion experiments reported in the literature? Do you have any insights into the polymer structures that define this pore size? For example, have you investigated whether depolymerizing actin or other cytoskeletal components significantly alters the relaxation timescale?

      We thank the reviewer for this comment. We cannot directly estimate the hydraulic pore size from the measurements performed in the manuscript. Indeed, while we understand the general scaling laws, the pre-factors of such relationships are unknown.

      We carried out experiments aiming at estimating the hydraulic pore size in previous publications [3,4] and others have shown spatial heterogeneity of the cytoplasmic pore size [5]. In our previous experiments, we examined the diffusion of PEGylated quantum dots (14nm in hydrodynamic radius). In isosmotic conditions, these diffused freely through the cell but when the cell volume was decreased by a hyperosmotic shock, they no longer moved [3,4]. This gave an estimate of the pore radius of ~15nm.

      Previous work has suggested that F-actin plays a role in dictating this pore size but microtubules and intermediate filaments do not [4].

      There are no quantifications in Figure 6, nor is there a direct comparison with the model. Based on your model, would you expect the velocity of bleb growth to vary depending on the distance of the bleb from the pipette due to the local depressurization? Specifically, do blebs closer to the pipette grow more slowly?

      We apologise for the oversight. The quantifications are presented in Fig S10 and Fig S12. We have now modified the figure legends accordingly.

      Blebs are very heterogeneous in size and growth velocity within a cell and across cells in the population in normal conditions [6]. Other work has shown that bleb size is controlled by a competition between pressure driving growth and actin polymerisation arresting it [7]. Therefore, we did not attempt to determine the impact of depressurisation on bleb growth velocity or size.

      In experiments in which we suddenly increased pressure in blebbing cells, we did notice a change in the rate of growth of blebs that occurred after we increased pressure (Author response image 3). However, the experiments are technically challenging and we decided not to perform more.

      Author response image 3:

      A. A hydraulic link is established between a blebbing cell and a pipette. At time t>0, a step increase in pressure is applied. B. Kymograph of bleb growth in a control cell (top) and in a cell subjected to a pressure increase at t=0s (bottom). Top: In control blebs, the rate of growth is slow and approximately constant over time. The black arrow shows the start of blebbing. Bottom: The black arrow shows the start of blebbing. The dashed line shows the timing of pressure application and the red arrow shows the increase in growth rate of the bleb when the pressure increase reaches the bleb. This occurs with a delay δt.

      I find it interesting that during depressurization of the interphase cells, there is no observed volume change, whereas in pressurization of metaphase cells, there is a volume increase. I assume this might be a matter of timescale, as the microinjection experiments occur on short timescales, not allowing sufficient time for water to escape the cell. Do you observe the radius of the metaphase cells decreasing later on? This relaxation could potentially be used to characterize the permeability of the cell surface.

      We thank the reviewer for this comment.

      First, we would like to clarify that both metaphase and interphase cells increase their volume in response to microinjection. The effect is easier to quantify in metaphase cells because we assume spherical symmetry and just monitor the evolution of the radius (Fig 3). However, the displacement of the beads in interphase cells (Fig 2) clearly shows that the cell volume increases in response to microinjection. For both interphase and metaphase cells, when the injection is prolonged, the membrane eventually detaches from the cortex and large blebs form until cell lysis. In contrast to the reviewer’s intuition, we never observe a relaxation in cell volume, probably because we inject fluid faster than the cell can compensate volume change through regulatory mechanisms involving ion channels.

      When we depressurise metaphase cells, we do not observe any change in volume (Fig S10). This contrasts with the increase that we observe upon pressurisation. The main difference between these two experiments is the pressure differential. During depressurisation experiments, this is the hydraulic pressure within the cell ~500Pa (Fig 6A); whereas during pressurisation experiments, this is the pressure in the micropipette, ranging from 1.4-10 kPa (Fig 3). We note in particular that, when we used the lowest pressures in our experiments, the increase in volume was very slow (see Fig 3C). Therefore, we agree with the reviewer that it is likely the magnitude of the pressure differential that explains these differences.

      I am curious about the saturation of the time lag at 30 microns from the pipette in Figure 4, Panel E for the model's prediction. A saturation which is not clearly observed in the experimental data. Could you comment on the origin of this saturation and the observed discrepancy with the experiments (Figure E panel 2)? Naively, I would have expected the time lag to scale quadratically with the distance from the pipette, as predicted by a poroelastic model and the diffusion of displacement. It seems weird to me that the beads start to move together at some distance from the pipette or else I would expect that they just stop moving. What model parameters influence this saturation? Does membrane permeability contribute to this saturation?

      We thank the reviewer for pointing this out. In our opinion, the saturation occurring at 30 microns arises from the geometry of the model. At the largest distance away from the micropipette, the cortex becomes dominant in the mechanical response of the cell because it represents an increasing proportion of the cellular material.

      To test this hypothesis, we will rerun our finite element models with a range of cell sizes. This will be added to the manuscript at a later date.

      Reviewer #3 (Public review):

      Weaknesses: I have two broad critical comments:

      (1) I sense that the authors are correct that the best explanation of their results is the passive poroelastic model. Yet, to be thorough, they have to try to explain the experiments with other models and show why their explanation is parsimonious. For example, one potential explanation could be some mechanosensitive mechanism that does not involve cytoplasmic flow; another could be viscoelastic cytoskeletal mesh, again not involving poroelasticity. I can imagine more possibilities. Basically, be more thorough in the critical evaluation of your results. Besides, discuss the potential effect of significant heterogeneity of the cell.

      We thank the reviewer for these comments and we agree with their general premise.

      Some observations could qualitatively be explained in other ways. For example, if we considered the cell as a viscoelastic material, we could define a time constant τ = η/E with η the viscosity and E the elasticity of the material. The increase in relaxation time with sucrose treatment could then be explained by an increase in viscosity. However, work by others has previously shown that, in the exact same conditions as our experiment, viscoelasticity cannot account for the observations [1]. In its discussion, this study proposed poroelasticity as an alternative mechanism but did not investigate that possibility. This was consistent with our work that showed that the cytoplasm behaves as a poroelastic material and not as a viscoelastic material [4]. Therefore, we decided not to consider viscoelasticity as a possibility. We now explain this reasoning better and have added a sentence about a potential role for mechanotransductory processes in the discussion.

      (2) The study is rich in biophysics but a bit light on chemical/genetic perturbations. It could be good to use low levels of chemical inhibitors for, for example, Arp2/3, PI3K, myosin etc, and see the effect and try to interpret it. Another interesting question - how adhesive strength affects the results. A different interesting avenue - one can perturb aquaporins. Etc. At least one perturbation experiment would be good.

      We agree with the reviewer. In our previous studies, we already examined what biological structures affect the poroelastic properties of cells [2,4]. Therefore, the most interesting aspect to examine in our current work would be perturbations to the phenomenon described in Fig 6G and, in particular, to investigate what volume regulation mechanisms enable sustained intracellular pressure gradients. However, these experiments are particularly challenging and with very low throughput. Therefore, we feel that these are out of the scope of the present report and we mention these as promising future directions.

      References:

      (1) Rosenbluth, M. J., Crow, A., Shaevitz, J. W. & Fletcher, D. A. Slow stress propagation in adherent cells. Biophys J 95, 6052-6059 (2008). https://doi.org/10.1529/biophysj.108.139139

      (2) Esteki, M. H. et al. Poroelastic osmoregulation of living cell volume. iScience 24, 103482 (2021). https://doi.org/10.1016/j.isci.2021.103482

      (3) Charras, G. T., Mitchison, T. J. & Mahadevan, L. Animal cell hydraulics. J Cell Sci 122, 3233-3241 (2009). https://doi.org/10.1242/jcs.049262

      (4) Moeendarbary, E. et al. The cytoplasm of living cells behaves as a poroelastic material. Nat Mater 12, 253-261 (2013). https://doi.org/10.1038/nmat3517

      (5) Luby-Phelps, K., Castle, P. E., Taylor, D. L. & Lanni, F. Hindered diffusion of inert tracer particles in the cytoplasm of mouse 3T3 cells. Proc Natl Acad Sci U S A 84, 4910-4913 (1987). https://doi.org/10.1073/pnas.84.14.4910

      (6) Charras, G. T., Coughlin, M., Mitchison, T. J. & Mahadevan, L. Life and times of a cellular bleb. Biophys J 94, 1836-1853 (2008). https://doi.org/10.1529/biophysj.107.113605

      (7) Tinevez, J. Y. et al. Role of cortical tension in bleb growth. Proc Natl Acad Sci U S A 106, 18581-18586 (2009). https://doi.org/10.1073/pnas.0903353106

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript entitled 'The domesticated transposon protein L1TD1 associates with its ancestor L1 ORF1p to promote LINE-1 retrotransposition', Kavaklıoğlu and colleagues delve into the role of L1TD1, an RNA binding protein (RBP) derived from a LINE1 transposon. L1TD1 proves crucial for maintaining pluripotency in embryonic stem cells and is linked to cancer progression in germ cell tumors, yet its precise molecular function remains elusive. Here, the authors uncover an intriguing interaction between L1TD1 and its ancestral LINE-1 retrotransposon.

      The authors delete the DNA methyltransferase DNMT1 in a haploid human cell line (HAP1), inducing widespread DNA hypo-methylation. This hypomethylation prompts abnormal expression of L1TD1. To scrutinize L1TD1's function in a DNMT1 knock-out setting, the authors create DNMT1/L1TD1 double knock-out cell lines (DKO). Curiously, while the loss of global DNA methylation doesn't impede proliferation, additional depletion of L1TD1 leads to DNA damage and apoptosis.

      To unravel the molecular mechanism underpinning L1TD1's protective role in the absence of DNA methylation, the authors dissect L1TD1 complexes in terms of protein and RNA composition. They unveil an association with the LINE-1 transposon protein L1-ORF1 and LINE-1 transcripts, among others.

      Surprisingly, the authors note fewer LINE-1 retro-transposition events in DKO cells than in DNMT1 KO alone.

      Strengths:

      The authors present compelling data suggesting the interplay of a transposon-derived human RNA binding protein with its ancestral transposable element. Their findings spur interesting questions for cancer types, where LINE1 and L1TD1 are aberrantly expressed.

      Weaknesses:

      Suggestions for refinement:

      The initial experiment, inducing global hypo-methylation by eliminating DNMT1 in HAP1 cells, is intriguing and warrants a more detailed description. How many genes experience misregulation or aberrant expression? What phenotypic changes occur in these cells?

      The transcriptome analysis of DNMT1 KO cells showed hundreds of deregulated genes upon DNMT1 ablation. As expected, the majority were up-regulated and gene ontology analysis revealed that among the strongest up-regulated genes were gene clusters with functions in “regulation of transcription from RNA polymerase II promoter” and “cell differentiation” and genes encoding proteins with KRAB domains. In addition, the de novo methyltransferases DNMT3A and DNMT3B were up-regulated in DNMT1 KO cells suggesting the set-up of compensatory mechanisms in these cells. We will include this data set in the revised version of the manuscript.

      Why did the authors focus on L1TD1? Providing some of this data would be helpful to understand the rationale behind the thorough analysis of L1TD1.

      We have previously discovered that conditional deletion of the maintenance DNA methyltransferase DNMT1 in the murine epidermis results not only in the up-regulation of mobile elements, such as IAPs, but also in the induced expression of L1TD1 (Beck et al, 2021; Suppl. Table 1 and Author response image 1). Similarly, L1TD1 expression was induced by treatment of primary human keratinocytes or squamous cell carcinoma cells with the DNMT inhibitor aza-deoxycytidine (Author response image 2 and 3). These findings are in accordance with the observation that inhibition of DNA methyltransferase activity by aza-deoxycytidine in human non-small cell lung cancer cells (NSCLCs) results in upregulation of L1TD1 (Altenberger et al, 2017). Our interest in L1TD1 was further fueled by reports on a potential function of L1TD1 as a prognostic tumor marker. We will include this information in the revised manuscript.

      Author response image 1.

      RT-qPCR of L1TD1 expression in cultured murine control and Dnmt1 Δ/Δker keratinocytes. mRNA levels of L1td1 were analyzed in keratinocytes isolated at P5 from conditional Dnmt1 knockout mice (Beck et al., 2021). Hprt expression was used for normalization of mRNA levels and wildtype control was set to 1. Data represent means ±s.d. with n=4. **P < 0.01 (paired t-test).

      Author response image 2.

      RT-qPCR analysis of L1TD1 expression in primary human keratinocytes. Cells were treated with 5-aza-2'-deoxycytidine for 24 hours or 48 hours, with PBS for 48 hours or were left untreated. 18S rRNA expression was used for normalization of mRNA levels and PBS control was set to 1. Data represent means ±s.d. with n=3. **P < 0.01 (paired t-test).

      Author response image 3.

      Induced L1TD1 expression upon DNMT inhibition in squamous cell carcinoma cell lines SCC9 and SCCO12. Cells were treated with 5-aza-2'-deoxycytidine for 24 hours, 48 hours or 6 days. (A) Western blot analysis of L1TD1 protein levels using beta-actin as loading control. (B) Indirect immunofluorescence microscopy analysis of L1TD1 expression in SCC9 cells. Nuclear DNA was stained with DAPI. Scale bar: 10 µm. (C) RT-qPCR analysis of L1TD1 expression in primary human keratinocytes. Cells were treated with 5-aza-2'-deoxycytidine for 24 hours or 48 hours, with PBS for 48 hours or were left untreated. 18S rRNA expression was used for normalization of mRNA levels and PBS control was set to 1. Data represent means ±s.d. with n=3. *P < 0.05, **P < 0.01 (paired t-test).

      The finding that L1TD1/DNMT1 DKO cells exhibit increased apoptosis and DNA damage but decreased L1 retro-transposition is unexpected. Considering the DNA damage associated with retro-transposition and the DNA damage and apoptosis observed in L1TD1/DNMT1 DKO cells, one would anticipate the opposite outcome. Could it be that the observation of fewer transposition-positive colonies stems from the demise of the most transposition-positive colonies? Further exploration of this phenomenon would be intriguing.

      This is an important point and we were aware of this potential problem. Therefore, we calibrated the retrotransposition assay by transfection with a blasticidin resistance gene vector to take into account potential differences in cell viability and blasticidin sensitivity. Thus, the observed reduction in L1 retrotransposition efficiency is not an indirect effect of reduced cell viability.
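
      The calibration described above can be illustrated with a short sketch. It is a minimal example, assuming that calibration means dividing the number of blasticidin-resistant colonies obtained with the L1 reporter by the number obtained with the control blasticidin resistance vector in the same cell line; the function names and all colony counts are hypothetical and are not data from the manuscript.

```python
def calibrated_efficiency(l1_colonies: int, control_colonies: int) -> float:
    """Reporter colonies divided by control-vector colonies for one cell line.

    The control transfection captures differences in viability, transfection
    efficiency and blasticidin sensitivity between cell lines.
    """
    if control_colonies <= 0:
        raise ValueError("control transfection yielded no colonies")
    return l1_colonies / control_colonies


def relative_efficiency(sample: float, reference: float) -> float:
    """Calibrated efficiency of a sample relative to a reference line."""
    if reference <= 0:
        raise ValueError("reference efficiency must be positive")
    return sample / reference


if __name__ == "__main__":
    # Hypothetical colony counts, for illustration only.
    reference_line = calibrated_efficiency(l1_colonies=120, control_colonies=400)
    test_line = calibrated_efficiency(l1_colonies=30, control_colonies=250)
    print(f"relative retrotransposition: "
          f"{relative_efficiency(test_line, reference_line):.2f}")
```

      Because each line is divided by its own control, a line that simply forms fewer colonies overall does not appear to have a lower retrotransposition efficiency for that reason alone.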

      Based on previous studies with hESCs, it is likely that, in addition to its role in retrotransposition, L1TD1 has additional functions in the regulation of cell proliferation and differentiation. L1TD1 might therefore attenuate the effect of DNMT1 loss in KO cells generating an intermediate phenotype (as pointed out by Reviewer 2) and simultaneous loss of both L1TD1 and DNMT1 results in more pronounced effects on cell viability.

      Reviewer #2 (Public Review):

      In this study, Kavaklıoğlu et al. investigated and presented evidence for the role of domesticated transposon protein L1TD1 in enabling its ancestral relative, L1 ORF1p, to retrotranspose in HAP1 human tumor cells. The authors provided insight into the molecular function of L1TD1 and shed some clarifying light on previous studies that showed somewhat contradictory outcomes surrounding L1TD1 expression. Here, L1TD1 expression was correlated with L1 activation in a hypomethylation-dependent manner, due to DNMT1 deletion in the HAP1 cell line. The authors then identified L1TD1-associated RNAs using RIP-Seq, which displays a disconnect between transcript and protein abundance (via Tandem Mass Tag multiplex mass spectrometry analysis). The one exception was for L1TD1 itself, which is consistent with a model in which the RNA transcripts associated with L1TD1 are not directly regulated at the translation level. Instead, the authors found the L1TD1 protein associated with L1-RNPs, and this interaction is associated with increased L1 retrotransposition, at least in the context of HAP1 cells. Overall, these results support a model in which L1TD1 is restrained by DNA methylation, but in the absence of this repressive mark, L1TD1 is expressed and collaborates with L1 ORF1p (either directly or through interaction with L1 RNA, which remains unclear based on current results), leading to enhanced L1 retrotransposition. These results establish the feasibility of this relationship existing in vivo in either development, disease, or both.

    1. Author response:

      eLife Assessment

      Alignment and sequencing errors are a major concern in molecular evolution, and this valuable study represents a welcome improvement for genome-wide scans of positive selection. This new method seems to perform well and is generally convincing, although the evidence could be made more direct and more complete through additional simulations to determine the extent to which alignment errors are being properly captured.

      We thank the editors for their positive assessment and for highlighting the core strength and a key area for improvement. The main request (also echoed by both reviewers) is for us to conduct additional simulation studies where true alignment errors are known and assess the performance of BUSTED-E. We plan to conduct several simulations (on the order of 100,000 individual alignments in total) in response to that request, with the caveat that we are not aware of any tools that simulate realistic alignment errors, so these simulations are likely only a pale reflection of biological reality.

      (1) Ad hoc small local edits of alignments similar to what was implemented in the HMMCleaner paper. These local edits would include operations like replacement of codons or small stretches of sequences with random data, local transposition, inversion.

      (a) Using parametrically simulated alignments (under BUSTED models).

      (b) Using empirical alignments.

      (2) Simulations under model misspecification, specifically to address the point of reviewer 2. For example, we would simulate under models that allow for multi-nucleotide substitutions, and then apply error filtering under models which do not.
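
      As a minimal sketch of what the local edits in (1) might look like, the example below applies one edit to a single in-frame coding sequence. It is an illustration under assumptions, not the planned simulation code: the function names are hypothetical, "inversion" is taken to mean reversing the nucleotide order within the window (not reverse complementing), "transposition" is taken to mean swapping the two halves of the window, and gaps and stop codons are not handled.

```python
import random

CODON = 3
NUCLEOTIDES = "ACGT"


def random_codons(n_codons: int, rng: random.Random) -> str:
    """Draw n_codons codons of uniformly random nucleotides (stops allowed)."""
    return "".join(rng.choice(NUCLEOTIDES) for _ in range(n_codons * CODON))


def corrupt_sequence(seq: str, start_codon: int, n_codons: int,
                     mode: str, rng: random.Random) -> str:
    """Return a copy of seq with one local edit spanning n_codons codons.

    The sequence length is preserved, so the edited sequence can be put
    back into the alignment without realigning.
    """
    i = start_codon * CODON
    j = (start_codon + n_codons) * CODON
    if i < 0 or j > len(seq):
        raise ValueError("edit window falls outside the sequence")
    segment = seq[i:j]
    if mode == "random":
        segment = random_codons(n_codons, rng)
    elif mode == "inversion":
        segment = segment[::-1]
    elif mode == "transposition":
        half = (n_codons // 2) * CODON  # split on a codon boundary
        segment = segment[half:] + segment[:half]
    else:
        raise ValueError(f"unknown edit mode: {mode}")
    return seq[:i] + segment + seq[j:]


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the edits are reproducible
    original = "ATGAAACCCGGGTTTGCA"
    for mode in ("random", "inversion", "transposition"):
        print(mode, corrupt_sequence(original, 1, 4, mode, rng))
```

      Recording the window and edit type for every corrupted sequence would give the ground truth against which the error class can later be scored.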

      We will also run several new large-scale screens of existing alignments, to directly and indirectly address the reviewers' comments. These will include:

      (a) A Drosophila dataset (from https://academic.oup.com/mbe/article/42/4/msaf068/8092905)

      (b) Current Selectome data (https://selectome.org/), both filtered and unfiltered. Here the filtering procedure refers to what Selectome does to obtain what its authors think are high quality alignments.

      (c) Current OrthoMam data (https://orthomam.mbb.cnrs.fr/), both filtered and unfiltered. Here the filtering procedure refers to what OrthoMam does to obtain what its authors think are high quality alignments.

      Reviewer #1:

      We are grateful to Reviewer #1 for their positive and encouraging review. We are pleased they found our analyses convincing and recognized BUSTED-E as a "simple, efficient, and computationally fast" improvement for evolutionary scans.

      Strengths:

      As a side note, I found it particularly interesting how the authors tested the statistical support for the new method compared to the simpler version without the error class. In many cases, the simpler model could not be statistically rejected in favor of the more complex model, despite producing biologically incorrect results in terms of parameter inference. This highlights a broader issue in molecular evolution and phylogenomics, where model selection often relies too heavily on statistical tests, potentially at the expense of biological realism.

      We agree that this observation touches upon a critical issue in phylogenomics. A statistically "good" fit does not always equate to a biologically accurate model. We believe our work serves as a useful case study in this regard. We will add discussion of the importance of considering biological realism alongside statistical adequacy in model selection.

      Weaknesses:

      Regarding the structure of the manuscript, the text could be clearer and more precise.

      We appreciate this feedback. We will perform a thorough revision of the entire manuscript to improve its clarity, flow, and precision. We will focus on streamlining the language and ensuring that our methodological descriptions and results are as unambiguous as possible.

      Clear, practical recommendations for users could also be provided in the Results section.

      To make our method more accessible and its application more straightforward, we will add a new section that provides clear, practical recommendations for users. This includes guidance on when to apply BUSTED-E, how to interpret its output, and best practices for distinguishing potential errors from strong selection.

      Additionally, the simulation analyses could be further developed to include scenarios with both alignment errors and positive selection, in order to better assess the method's performance.

      Additional simulations will be conducted (see above).

      Finally, the model is evaluated only in the context of site models, whereas the widely used branch-site model is mentioned as possible but not assessed.

      BUSTED class models support branch-site variation in dN/dS, so technically all of our analyses are already branch-site. However, we interpret the reviewer’s comment as describing use cases when a method is used to test for selection on a subset of tree branches (as opposed to the entire tree). BUSTED-E already supports this ability, and we will add a section in the manuscript describing how this type of testing can be done, including examples. However, we do not plan to conduct additional extensive data analyses or simulations, as this would probably bloat the manuscript too much.

      Reviewer #2:

      We thank Reviewer #2 for their detailed and thought-provoking comments, and for their enthusiasm for modeling alignment issues directly within the codon modeling framework. The criticisms raised are challenging and we will work on improving the justification, testing, and contextualization of our method.

      Weaknesses:

      The definition of alignment error by a very large ω is not justified anywhere in the paper... I would suggest characterising a more specific error model. E.g., radical amino-acid "changes" clustered close together in the sequence, proximity to gaps in the alignment, correlation of apparent ω with genome quality... Also concerning this high ω, how sensitive is its detection to computational convergence issues?

      This is a fundamental point that we are grateful to have the opportunity to clarify. Our intention with the high ω category is not to provide a mechanistic or biological definition of an alignment error. Rather, its purpose is to serve as a statistical "sink" for codons exhibiting patterns of divergence so extreme that they are unlikely to have resulted from a typical selective process. It is phenomenological and ad hoc. The reviewer makes sensible suggestions for other ad hoc/empirical approaches to alignment quality filtering, but most of those have already been implemented in existing (excellent) alignment filtering tools. BUSTED-E is not meant to replace them, but rather to catch what is left over. Importantly, error detection is not even the primary goal of BUSTED-E; errors are treated as a statistical nuisance. With all due respect, all of the reviewer's suggestions are similarly ad hoc: there is no rigorous quantitative justification for any of them, but they are all sensible and plausible, and usually work in practice.

      Computational convergence issues can never be fully dismissed, but we do not consider this to be a major issue. Our approach already pays careful attention to proper initialization, performs convergence checks, and considers multiple starting points. We also don’t need to estimate large ω with any degree of precision; it just needs to be “large”.

      The authors should clarify the relation between the "primary filter for gross or large-scale errors" and the "secondary filter" (this method). Which sources of error are expected to be captured by the two scales of filters?

      We will add discussion and examples to explicitly define the distinct and complementary roles of these filtering stages.

      The benchmarking of the method could be improved both for real and simulated data... I suggest comparing results with e.g. Drosophila genomes... For simulations, the authors should present simulations with or without alignment errors... and with or without positive selection... I also recommend simulating under more complex models, such as multinucleotide mutations or strong GC bias...

      We will add more simulations as suggested (see above). We will also analyze a Drosophila gene alignment from previously published papers.

      It would be interesting to compare to results from the widely used filtering tool GUIDANCE, as well as to the Selectome database pipeline... Moreover, the inconsistency between BUSTED-E and HMMCleaner, and BMGE is worrying and should be better explained.

      Some of the alignments we have analyzed had already been filtered by GUIDANCE. We’ll also run the Selectome data through BUSTED-E: both filtered and unfiltered. We consider it beyond the scope of this manuscript to conduct detailed filtering pipeline instrumentation and side-by-side comparison.

      For a new method such as this, I would like to see p-value distributions and q-q plots, to verify how unbiased the method is, and how well the chi-2 distribution captures the statistical value.

      We will report these values for new null simulations.
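      As a rough illustration of the kind of check requested here, the sketch below computes p-values from likelihood-ratio statistics and pairs them with uniform quantiles for a q-q plot. It is a minimal sketch, not the BUSTED-E procedure: the chi-squared reference distribution with 2 degrees of freedom, the function names, and the simulated "null" statistics are all assumptions chosen for illustration.

```python
import math
import random


def lrt_pvalue_df2(stat):
    """Upper-tail p-value for a chi-squared(2) statistic.

    Assumption: 2 degrees of freedom, for which the survival
    function has the closed form exp(-x / 2).
    """
    return math.exp(-max(stat, 0.0) / 2.0)


def qq_points(pvalues):
    """Pair each sorted p-value with its expected uniform quantile i / (n + 1)."""
    n = len(pvalues)
    observed = sorted(pvalues)
    expected = [(i + 1) / (n + 1) for i in range(n)]
    return list(zip(expected, observed))


def max_qq_deviation(pvalues):
    """Largest vertical distance between the q-q points and the diagonal."""
    return max(abs(e - o) for e, o in qq_points(pvalues))


def false_positive_rate(pvalues, alpha=0.05):
    """Fraction of null replicates rejected at level alpha."""
    return sum(p <= alpha for p in pvalues) / len(pvalues)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stand-in for null simulations: a chi-squared(2)
    # variable is exponential with mean 2, i.e. rate 0.5.
    null_stats = [rng.expovariate(0.5) for _ in range(2000)]
    pvals = [lrt_pvalue_df2(s) for s in null_stats]
    print(max_qq_deviation(pvals), false_positive_rate(pvals))
```

      For a well-calibrated test, the q-q points should lie close to the diagonal and the rejection rate at a given level should be close to that level; systematic departures in either direction would indicate a liberal or conservative test.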

      I disagree with the motivation expressed at the beginning of the Discussion... Our goal should not be to find a few impressive results, but to measure accurately natural selection, whether it is frequent or rare.

      That’s a philosophical point; at some level, given enough time, every single gene likely experiences some positive selection at some point in the evolutionary past. The practically important question is how to improve the sensitivity of the methods while controlling for ubiquitous noise. We do agree with the sentiment that the ultimate goal is to “measure accurately natural selection, whether it is frequent or rare”. However, we also must be pragmatic about what is possible with dN/dS methods on available genomic data.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank all reviewers for the highly detailed review and the time and effort which has been invested in this review. It is clear from the reviews that we’ve had the privilege to have our work extensively and thoroughly checked by knowledgeable experts, for which we are very grateful. We have read their perspectives, questions and suggested improvements with great interest. We have reflected on the public review in detail and have included detailed responses below. First, we would like to respond to four main issues pointed out by the editor and reviewers:

      (1) Lack of yield data in the manuscript: Yield data has been collected in most of the sites and years of our study, and these have already been published and cited in our manuscript. In the appendix of our manuscript, we included a table with yield data for the sites and years in which the beetle diversity was studied. These data show that strip cropping does not cause a systematic yield reduction.

      (2) Sampling design clarification: Our paper combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases this resulted in variations in how data were collected or processed (e.g. taxonomic level of species identification). We have added more details to the sections on sampling design and data analysis to increase clarity and transparency.

      (3) Additional data analysis: In the revised manuscript we present an analysis on the responses of abundances of the 12 most common ground beetle genera to strip cropping. This gives better insight in the variation of responses among ground beetle taxa.

      (4) Restrict findings to our system: We nuanced our findings further and focused more on the implications of our data on ground beetle communities, rather than on agrobiodiversity in a broader sense.

      Below we also respond to the editor and reviewers in more detail.

      Reviewing Editor Comments:

      (1) You only have analyzed ground beetle diversity, it would be important to add data on crop yields, which certainly must be available (note that in normal intercropping these would likely be enhanced as well).

      Most yield data have been published in three previous papers, which we already cited or cite now (one was not yet published at the time of submission). Our argumentation is based on these studies. We had also already included a table in the appendix that showed the yield data that relates specifically to our locations and years of measurement. The finding that strip cropping does not majorly affect yield is based on these findings. We revised the title of our manuscript to remove the explicit focus on yield.

      (2) Considering the heterogeneous data involving different experiments it is particularly important to describe the sampling design in detail and explain how various hierarchical levels were accounted for in the analysis.

      We agree that some important details to our analysis were not described in sufficient detail. Especially reviewer 2 pointed out several relevant points that we did account for in our analyses, but which were not clear from the text in the methods section. We are convinced that our data analyses are robust and that our conclusions are supported by the data. We revised the methods section to make our approach clearer and more transparent.

      (3) In addition to relative changes in richness and density of ground beetles you should also present the data from which these have been derived. Furthermore, you could also analyze and interpret the response of the different individual taxa to strip cropping.

      With our heterogeneous dataset it was quite complicated to show overall patterns of absolute changes in ground beetle abundance and richness, especially for the field-level analyses. As the sampling design was not always the same and occasionally samples were missing, the number of year series that made up a datapoint differed among locations and years. However, for each comparison of a paired monoculture and strip cropping field, we made sure that the number of year series was made equal through rarefaction. That is, the numbers of ground beetles and of ground beetle species are always expressed as the number per 2 to 6 samples. Therefore, we prefer to stick to relative changes as we are convinced that this gives a fairer representation of our complex dataset.
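      The idea of equalizing sample numbers before comparing a paired monoculture and strip cropping field can be sketched as follows. This is a minimal sketch of sample-based rarefaction by random subsampling, not the authors' actual code: the function names, the number of random draws, and the definition of relative change as (strip - monoculture) / monoculture are assumptions made for illustration.

```python
import random


def richness(samples):
    """Number of distinct taxa with at least one individual across pooled samples."""
    return len({taxon for s in samples for taxon, count in s.items() if count > 0})


def rarefied_richness(samples, k, draws=200, seed=0):
    """Mean richness over random subsets of k samples drawn without replacement."""
    if k > len(samples):
        raise ValueError("k cannot exceed the number of samples")
    rng = random.Random(seed)
    total = 0
    for _ in range(draws):
        total += richness(rng.sample(samples, k))
    return total / draws


def relative_change(strip_samples, mono_samples, draws=200, seed=0):
    """Relative richness change after rarefying both fields to the smaller sample number.

    Assumed definition: (strip - monoculture) / monoculture.
    Undefined (division by zero) if the monoculture richness is zero.
    """
    k = min(len(strip_samples), len(mono_samples))
    strip = rarefied_richness(strip_samples, k, draws, seed)
    mono = rarefied_richness(mono_samples, k, draws, seed)
    return (strip - mono) / mono


if __name__ == "__main__":
    # Hypothetical counts per sample, keyed by placeholder taxon labels.
    strip = [{"taxon_a": 3, "taxon_b": 1}, {"taxon_a": 2, "taxon_c": 4}, {"taxon_b": 5}]
    mono = [{"taxon_a": 6}, {"taxon_a": 1, "taxon_b": 2}]
    print(relative_change(strip, mono))
```

      Rarefying to the smaller of the two sample numbers keeps the comparison within a pair fair even when the absolute number of samples differs between locations and years, which is why the result is reported as a relative rather than an absolute change.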

      We agree with the second point that both the editor and several reviewers pointed out. The indicator species analyses that we used were biased by rare species, and we now omit this analysis. Instead, we included a GLM analysis on the responses of abundances of the 12 most common ground beetle genera to strip cropping. We chose genera here (and not species) as we could then include all locations and years within the analyses, and in most cases a genus was dominated by a single species (notable exceptions were Amara and Harpalus, which were often made up of several species). We still illustrate these analyses in a similar fashion as we did for the indicator species analysis.

      (4) Keep to your findings and don't overstate them but try to better connect them to basic ecological hypotheses potentially explaining them.

      After careful consideration of the important points that reviewers point out, we decided to nuance our reasoning about biodiversity conservation along two key lines: (1) the extent to which ground beetles can be indicators of wider biodiversity changes; and (2) the fact that our findings are not as straightforwardly positive as our narrative suggested. We still believe that strip cropping contributes positively to carabid communities, and have carefully checked the text to avoid overstatements.

      Reviewer #1 (Public review):

      Summary:

      This study demonstrates that strip cropping enhances the taxonomic diversity of ground beetles across organically-managed crop systems in the Netherlands. In particular, strip cropping supported 15% more ground beetle species and 30% more individuals compared to monocultures.

      Strengths:

      A well-written study with well-analyzed data of a complex design. The data could have been analyzed differently e.g. by not pooling samples, but there are pros and cons for each type of analysis and I am convinced this will not affect the main findings. A strong point is that data were collected for 4 years. This is especially strong as most data on biodiversity in cropping systems are only collected for one or two seasons. Another strong point is that several crops were included.

      We thank reviewer 1 for their kind words and agree with this strength of the paper. The paper combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight variations in how data were collected or processed (e.g. taxonomic level of species identification).

      Weaknesses:

      This study focused on the biodiversity of ground beetles and did not examine crop productivity. Therefore, I disagree with the claim that this study demonstrates biodiversity enhancement without compromising yield. The authors should present results on yield or, at the very least, provide a stronger justification for this statement.

      We acknowledge that we indeed did not formally analyze yield in our study, but we have good reason for this. The claim that strip cropping does not compromise yield comes from several extensive studies (Juventia & van Apeldoorn, 2024; Ditzler et al., 2023; Carillo-Reche et al., 2023) that were conducted in nearly all the sites and years that we included in our study. We chose not to include formal analyses of productivity for two key reasons: (1) a yield analysis would duplicate already published analyses, and (2) we prefer to focus on the ecology of ground beetles and the effect of strip cropping on biodiversity, rather than also diverting our focus towards crop productivity. Nevertheless, we have shown the results on yield in Table S6 and refer extensively to the studies that have previously analyzed these data (line 203-207, 217-221).

      Reviewer #1 (Recommendations for the authors):

      This is a well-written study on the effects of strip cropping on ground-beetle diversity. As stated above the study is well analyzed, presented, and written but you should not pretend that you analyzed yield e.g. lines 25-27 "We show that strip cropping...enhance ground beetle biodiversity without incurring major yield loss."

      We understand the confusion caused by this sentence, and it was never our intention to give the impression that we analyzed yield losses. These findings were based on previous research by ourselves and colleagues, and we have now changed the sentence to reflect this (line 25-27).

      I think you assume that yield does not differ between strip cropping and monoculture. I am not sure this is correct as one crop might attract pests or predators spilling over to the other crop. I am also not sure if the sowing and harvest of the crop will come with the same costs. So if you assume this, you should only do it in the main manuscript and not the abstract, to justify this better.

      With three peer-reviewed papers on the same fields as we studied, we can convincingly state that strip cropping in organic agriculture generally does not result in major yield loss, although exceptions exist, which we refer to in the discussion.

      In the introduction lines 28-43, you refer to insect biomass decline. I wonder if you would like to add the study of Loboda et al. 2017 in Ecography. It seems not fitting as it is from the Arctic but also the other studies you cite are not only coming from agricultural landscapes and this study is from the same time as the Hallmann et al. 2017 study and shows a decline in flies of 80%.

      We have removed the sentence that this comment refers to, to streamline the introduction more.

      Lines 50-51. You only have one citation for biodiversity strategies in agricultural systems. I suggest citing Mupepele et al. 2021 in TREE. This study refers to management but also the policies and societal pressures behind it.

      We have added this citation and a recent paper by Cozim-Melges et al. (2024) here (line 49-52).

      In the methods, I am missing a section on species identifications. This would help to understand why you used "taxonomic richness".

      Thanks for pointing this out. We have now included a new section on ground beetle identification (line 304-309 in methods).

      Figure 1 is great and I like that you separated the field and crop-level data, although there is no statistical power for the crop-specific data. I personally would move k to the supplements. It is very detailed and small and therefore hard to read

      We chose to keep figure 1k, as in our view it gives a good impression of the scale of the experiment, the number of crops included and the absolute numbers of caught species.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate the effects of organic strip cropping on carabid richness and density as well as on crop yields. They find on average higher carabid richness and density in strip cropping and organic farming, but not in all cases.

      We did not intend to investigate the effect of strip cropping on crop yields, but rather place our work in the framework of earlier studies that already studied yield. All the monocultures and strip cropping fields were organic farms. Our findings thus compare crop diversity effects within the context of organic farming.

      Strengths:

      Based on highly resolved species-level carabid data, the authors present estimates for many different crop types, some of them rarely studied, at the same time. The authors did a great job investigating different aspects of the assemblages (although some questions remain concerning the analyses) and they present their results in a visually pleasing and intuitive way.

      We appreciate the kind words of reviewer 2 and their acknowledgement of the extensiveness of our dataset. In our opinion, the inclusion of many different crops is indeed a strength, rarely seen in similar studies; and we are happy that the figures are appreciated.

      Weaknesses:

      The authors used data from four different strip cropping experiments and there is no real replication in space as all of these differed in many aspects (different crops, different areas between years, different combinations, design of the strip cropping (orientation and width), sampling effort and sample sizes of beetles (differing more than 35 fold between sites; L 100f); for more differences see L 237ff). The reader gets the impression that the authors stitched data from various places together that were not made to fit together. This may not be a problem per se but it surely limits the strength of the data as results for various crops may only be based on small samples from one or two sites (it is generally unclear how many samples were used for each crop/crop combination).

      The paper indeed combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight differences in the experimental design. At the time that we did our research, there were only a handful of farmers that were employing strip cropping within the Netherlands, which greatly reduced the number of fields for our study. Therefore, we worked in the sites that were available and studied as many crops as possible on these sites. Since there was variation in the crops grown in the sites, for some crops we have limited replication. In the revision we have explained this more clearly (line 297-300).

      One of my major concerns is that it is completely unclear where carabids were collected. As some strips were 3m wide, some others were 6m and the monoculture plots large, it can be expected that carabids were collected at different distances from the plot edge. This alone, however, was conclusively shown to affect carabid assemblages dramatically and could easily outweigh the differences shown here if not accounted for in the models (see e.g. Boetzl et al. (2024) or Knapp et al. (2019) among many other studies on within field-distributions of carabids).

      Point well taken. Samples were always taken at least 10 meters into the field, and always in the middle of the strip. This would indeed mean that there is a small difference between the 3 m and 6 m wide strips regarding distance from another strip, but this was then only a difference of 1.5 to 3 meters from the edge. Based on our own extensive experience with ground beetle communities, such a difference will not have a large impact on the findings for ground beetles. The distance from field/plot edges was similar between monocultures and strip cropped fields. We present a more detailed description of the sampling design in the methods of the revised manuscript (line 294-297).

      The authors hint at a related but somewhat different problem in L 137ff - carabid assemblages sampled in strips were sampled in closer proximity to each other than assemblages in monoculture fields which is very likely a problem. The authors did not check whether their results are spatially autocorrelated and this shortcoming is hard to account for as it would have required a much bigger, spatially replicated design in which distances are maintained from the beginning. This limitation needs to be stated more clearly in the manuscript.

      To be clear, this limitation relates to the comparison that we did for the community compositions of ground beetles in two crops either in strip cropping or monocultures. In this case, it was impossible to avoid potential autocorrelation due to our field design. We also acknowledge this limitation in the results section (line 130-133). However, for our other analyses we corrected for spatial autocorrelation by including variables per location, year and crop. This grouped samples that were spatially autocorrelated. Therefore, we don’t see this as a shortcoming of our other analyses.

      Similarly, we know that carabid richness and density depend strongly on crop type (see e.g. Toivonen et al. (2022)) which could have biased results if the design is not balanced (this information is missing but it seems to be the case, see e.g. Celeriac in Almere in 2022).

      We agree and acknowledge that crop type can influence carabid richness and density, which is why we have included variables to account for differences caused by crops. However, we did not observe consistent differences between crops in how strip cropping affected ground beetle richness and density. Therefore, we don’t think that crop types would have influenced our conclusions on the overall effect of strip cropping.

      A more basic problem is that the reader neither learns where traps were located, how missing traps were treated for analyses, how many samples there were per crop or crop combination (in a simple way, not through Table S7 - there has to have been a logic in each of these field trials) or why there are differences in the number of samples from the same location and year (see Table S7). This information needs to be added to the methods section.

      Point well taken. We have clarified this further in the revised manuscript (line 294-301, 318-322). As we combined data from several experimental designs that originally had slightly different research questions, this in part caused differences between numbers of rounds or samples per crop, location or year.

      As carabid assemblages undergo rapid phenological changes across the year, assemblages that are collected at different phenological points within and across years cannot easily be compared. The authors would need to standardize for this and make sure that the assemblages they analyze are comparable prior to analyses. Otherwise, I see the possibility that the reported differences might simply be biased by phenology.

      We agree and we dealt with this issue by using year series instead of using individual samples of different rounds. This approach allowed us to get a good impression of the entire ground beetle community across seasons. For our analyses we had the choice to only include data from sampling rounds that were conducted at the same time, or to include all available data. We chose to analyze all data, and made sure that the number of samples between strip cropping and monoculture fields per location, year and crop was always the same by pooling and rarefaction.

      Surrounding landscape structure is known to affect carabid richness and density and could thus also bias observed differences between treatments at the same locations (lower overall richness => lower differences between treatments). Landscape structure has not been taken into account in any way.

      We did not include landscape structure as there are only 4 sites, which does not allow a meaningful analysis of potential effects of landscape structure. Studying how landscape interacts with strip cropping to influence insect biodiversity would require at least, say, 15 to 20 sites, which was not feasible for this study. However, such an analysis may be possible in an ongoing project (CropMix) which includes many farms that work with strip cropping.

      In the statistical analyses, it is unclear whether the authors used estimated marginal means (as they should) - this needs to be clarified.

      In the revised manuscript we further clarified this point (line 365-366, 373-374).

      In addition, and as mentioned by Dr. Rasmann in the previous round (comment 1), the manuscript, in its current form, still suffers from simplified generalizations that 'oversell' the impact of the study and should be avoided. The authors restricted their analyses to ground beetles and based their conclusions on a design with many 'heterogeneities' - they should not draw conclusions for farmland biodiversity but stick to their system and report what they found. Although I understand the authors have previously stated that this is 'not practically feasible', the reason for this comment is simply to say that the authors should not oversell their findings.

      In the revised manuscript, we nuanced our findings by explaining that strip cropping is a potentially useful tool to support ground beetle biodiversity in agricultural fields (line 33-35).

      Reviewer #2 (Recommendations for the authors):

      In addition to the points stated under 'Weaknesses' above, I provide smaller comments and recommendations:

      Overall comments:

      (i) The carabid images used in the figures were created by Ortwin Bleich and are copyrighted. I could not find him accredited in the acknowledgements; the figure legends simply state that the images were taken from his webpage. Was his permission obtained? This should be stated.

      We have received written permission from Ortwin Bleich for using his pictures in our figures, and have accredited him for this in the acknowledgements (line 455-456).

      (ii) There is a great confusion in the field concerning terminology. The authors here use intercropping and strip cropping, a specific form of intercropping, interchangeably. I advise the authors to stick to strip cropping as it is more precise and avoids confusion with other forms of intercropping.

      We agree with the definitions given by reviewer 2 and had already used them as such in the text. We defined strip cropping in the first paragraph of the introduction and do not use the term “intercropping” after this definition to avoid confusion.

      Comments to specific lines:

      Line 19: While this is likely true, there is so far not enough compelling evidence for such a strong statement blaming agriculture. Please rephrase.

      Changed the sentence to indicate more clearly that it is one of the major drivers, but that the “blame” is not solely on agriculture (line 18-19).

      Line 22: Is this the case? I am aware of strip cropping being used in other countries, many of them in Europe. Why the focus on 'Dutch'?

      Indeed, strip cropping is now being pioneered by farmers throughout Europe. However, in the Netherlands some farmers have been pioneering strip cropping since 2014. We have added this information to indicate that our setting is in the Netherlands, and because in our opinion it gives a bit more context to our manuscript.

      Line 24: I would argue that carabids are actually not good indicators for overall biodiversity in crop fields as they respond in a very specific way, contrasting with other taxa. It is commonly observed that carabids prefer more disturbed habitats and richness often increases with management intensity and in more agriculturally dominated landscapes - in stark contrast to other taxa like wild bees or butterflies.

      We have reworded this sentence to reflect that they are not necessarily indicators of wide agricultural biodiversity, but that they do hold keystone positions within food webs in agricultural systems (line 23-25).

      Line 31: This statement here is also too strong - carabids are not overall biodiversity and patterns found for carabids likely differ strongly from patterns that would be observed in other taxa. This study is on carabids and the conclusion should thus also refer to these in order to avoid such over-simplified generalizations.

      We agree and have nuanced this sentence to indicate that our findings are only on ground beetles (line 33-35). However, we would like to point out that the statement that “patterns found for carabids likely differ strongly from patterns that would be observed in other taxa” assumes a disassociation between carabids and other taxa.

      Line 41: I am sure the authors are aware of the various methodological shortcomings of the dataset used in Hallmann et al. (2017) which likely led to an overestimation of the actual decline. Analysing the same data, Müller et al. (2023) found that weather can explain fluctuations in biomass just as well as time. I thus advise not putting too much focus on these results here as they seem questionable.

      We have removed this sentence to streamline the introduction, thus no longer mentioning the percentages given by Hallmann et al. (2017).

      Line 46: Surely likely but to my knowledge this is actually remarkably hard to prove. Instead of using the IPBES report here that simply states this as a fact, it would be better to see some actual evidence referenced.

      We removed IPBES as a source and replaced it with Dirzo et al. (2014), a review that shows the consequences of biodiversity decline on a range of different ecosystem services and ecological functions (line 45-47).

      Line 52ff: I am not sure whether this old land-sparing vs. land-sharing debate is necessary here. The authors could simply skip it and directly refer to the need of agricultural areas, the dominating land-use in many regions, to become more biodiversity-friendly. It can be linked directly to Line 61 in my opinion which would result in a more concise and arguably stronger introduction.

      After reconsidering, we agree with reviewer 2 that this section was redundant and we have removed the lines on land-sparing vs land-sharing.

      Line 59: Just a note here: this argument is not meaningful when talking about strip cropping in the Netherlands as there is virtually no land left that could be converted (if anything, agricultural land is lost to construction). The debate on land-use change towards agriculture is nowadays mostly focused on the tropics and the Global South.

      We argue that strip cropping could play an important role as a measure that does not necessarily follow the trade-off between biodiversity and agriculture for a context beyond the Netherlands (line 52-58).

      Line 69: Does this statement really need 8 references?

      Line 71: ... and this one 5 additional ones?

      We have removed excess references in these two lines (line 62-66).

      Line 74: But also likely provides the necessary crop continuity for many crop pests - the authors should keep in mind that when practitioners read agricultural biodiversity, they predominantly think of weeds and insect pests.

      We agree with reviewer 2 that agricultural biodiversity is still a controversial topic. However, as the focus in this manuscript is more on biodiversity conservation, rather than pest management, we prefer to keep this sentence as is. In other published papers and future work we focus more on the role of strip cropping for pest management.

      Line 83: Consider replacing 'moments' maybe - phenological stages or development stages?

      Although we understand the point of reviewer 2, we prefer to keep it at moments, as we did not focus on phenological stages and we only wanted to say that we set pitfall traps at several moments throughout the year. However, by placing the pitfall traps at several moments throughout the year, we did capture several phenological stages.

      Line 86: Not only farming practices - there are also massive fluctuations between years in the same crop with the same management due to effects of the weather in the previous reproductive season. Interpreting carabid assemblage changes is therefore not straightforward.

      We absolutely agree that interpreting carabid assemblage is not straightforward, but as we did not study year or crop legacy effects we chose to keep this sentence to maintain focus on our research goals.

      Line 88: 'ecolocal'?

      Typo, should have been ecological. Changed (line 81).

      Line 90: 'As such, they are often used as indicator group for wider insect diversity in agroecosystems' - this is the third repetition of this statement and the second one in this paragraph - please remove. Having worked on carabids extensively myself, I also think that this is not the true reason - they are simply easy to collect passively.

      We agree with the reviewer and have removed this sentence.

      Line 141: I have doubts about the value of the ISA looking at the results. Anchomenus dorsalis is a species extremely common in cereal monoculture fields in large parts of Europe, especially in warmer and drier conditions (H. griseus was likely only returned as it is generally rare and likely only occurred in few plots that, by chance, were strip-cropped). It can hardly be considered an indicator for diverse cropping systems but it was returned as one here (which I do not doubt). This often happens with ISA in my experience as they are very sensitive to the specific context of the data they are run on. The returned species are, however, often not really useable as indicators in other contexts. I thus believe they actually have very limited value. Apart from this, we see here that both monocultures and strip cropping have their indicators, as would likely all crop types. I wonder what message we would draw from this ...

      On close reconsideration, we agree with the reviewer that the ISAs might have been too sensitive to rare species that by chance occur in one of two crop configurations. To still get an idea of what happens with specific ground beetle groups, we chose to replace the ISAs with analyses on the 12 most common ground beetle genera. For this purpose we have added new sections to the methods (line 368-374) and results (line 135-143), replaced figure 2 and table S5, and updated the discussion (line 182-200).

      Line 165: Carabid activity is high when carabids are more active. Carabids can be more active either when (i) there are simply more carabid individuals or /and (ii) when they are starved and need to search more for prey. More carabid activity does thus not necessarily indicate more individuals, it can indicate that there is less prey. This aspect is missing here and should be discussed. It is also not true that crop diversification always increases prey biomass - especially strip cropping has previously been shown to decrease pest densities (Alarcón-Segura et al., 2022). Of course, this is a chicken-egg problem (less pests => less carabids or more carabids => less pests ?) ... this should at least be discussed.

      We have rewritten this paragraph to further discuss activity density in relation to food availability (line 175-185).

      Line 178: These species are not exclusively granivorous - this speculation may be too strong here.

      Line 185: true for all but C. melanocephalus - this species is usually more associated with hedgerows, forests etc.

      After removing the ISAs, we also chose to remove this paragraph and replace it with a paragraph that is linked to the analyses on the 12 most common genera (line 182-200).

      Line 202: These statements are too strong for my taste - the authors should add an 'on average' here. The data show that they likely do not always enhance richness by 15 % and as the authors state, some monocultures still had higher richness and densities.

      “on average” added (line 211)

      Line 203: 'can lead' - the authors cannot tell based on their results if this is always true for all taxa.

      Changed to “can lead” (line 213)

      Line 205: What is 'diversification' here?

      This concerns measures like hedgerows or flower strips. We altered the sentence to make this clearer (line 215-216).

      Line 208: Does this statement need 5 references? (as in the introduction, the reader gets the impression the authors aimed to increase the citation count of other articles here).

      We have removed excess references (line 219-221).

      Line 222: How many are 'a few'? Maybe state a proportion.

      We only found two species; we have changed the sentence accordingly (line 232-233).

      Line 224: As stated above, I would not overstress the results of the ISAs - the authors stated themselves that the result for A. dorsalis is likely only based on one site ...

      We removed this sentence after removing the ISAs.

      Line 305: I think there is an additional nested random level missing - the transect or individual plot the traps were located in (or was there only one replicate for each crop/strip in each experiment)? Hard to tell as the authors provide no information on the actual sample sizes.

      Indeed, there was one field or plot per cropping system per crop per location per year from which all the samples were taken. Therefore the analysis does not miss a nested random level. We provided information on sample sizes in Table S7.

      Line 314ff: The authors describe that they basically followed a (slightly extended) Chao-Hill approach (species richness, Shannon entropy & inverse Simpson) without the sampling effort / sample completeness standardization implemented in this approach and as a reader I wonder why they did not simply just use the customary Chao-Hill approach.

      We were not aware of the Chao-Hill approach, and we see it as a compliment that we independently came up with an approach similar to a now accepted approach.
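      For readers unfamiliar with the three diversity measures the reviewer lists, the following is a minimal sketch of how they relate as Hill numbers of order q = 0, 1 and 2. It is an illustration only, not the authors' analysis code: the function name `hill_number` and the example counts are hypothetical, and it omits the sample-completeness standardization that the reviewer notes is part of the Chao-Hill approach.

```python
import math

def hill_number(counts, q):
    """Effective number of species (Hill number) of order q.

    q = 0 gives species richness, q = 1 the exponential of Shannon
    entropy, and q = 2 the inverse Simpson index.
    """
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if q == 1:
        # The general formula is undefined at q = 1; its limit is exp(H).
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))

# Hypothetical pitfall catch: one dominant species and three scarcer ones.
catch = [80, 10, 5, 5]
for order in (0, 1, 2):
    print(order, hill_number(catch, order))
```

      In a perfectly even community all three orders return the species count, while increasing dominance pulls the higher orders down, which is why the three measures are usually reported together.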

      Line 329: Unclear what was nested in what here - location / year / crop or year / location / crop ?

      For the crop-level analyses, the nested structure was location > year > crop. This nested structure was chosen as every location was sampled across different years and (for some locations) the crops differed among years. However, as we pooled the samples from the same field in the field-level analyses, using the same random structure would have resulted in each individual sampling unit being distinguished as a group. Therefore, the random structure here was only location > year. We explain this now more clearly in lines 329 and 355-357.

      Line 334: I can see why the authors used these distributions but it is presented here without any justification. As a side note: Gamma (with log link) would likely be better for the Shannon model as well (I guess it cannot be 0 or negative ...).

      We explain this now better in lines 360-364.

      Line 341: Why Hellinger and not simply proportions?

      We used Hellinger transformation to give more weight to rarer species. Our pitfall traps were often dominated by large numbers of a few very abundant / active species. If we had used proportions, these species would have dominated the community analyses. We clarified this in the text (line 379-381).
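      As a minimal sketch of the difference between proportions and the Hellinger transformation (the square root of each species' relative abundance within a sample), consider the example below. The function name `hellinger` and the catch numbers are hypothetical and only illustrate the weighting argument; this is not the authors' script.

```python
import math

def hellinger(counts):
    """Hellinger-transform one sample: square root of relative abundances."""
    total = sum(counts)
    if total == 0:
        # An empty trap has no composition; return zeros rather than divide by zero.
        return [0.0 for _ in counts]
    return [math.sqrt(c / total) for c in counts]

# Hypothetical trap dominated by one very active species.
catch = [98, 1, 1]
proportions = [c / sum(catch) for c in catch]
transformed = hellinger(catch)

# Ratio of the dominant species to a rare one, before and after transformation.
print(proportions[0] / proportions[1])
print(transformed[0] / transformed[1])
```

      Because the square root compresses large values more than small ones, the dominant species outweighs a rare one far less after transformation than it does with plain proportions.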

      Line 348: An RDA is constrained by the assumptions / model the authors proposed and "forces" the data into a spatial ordination that resembles this model best. As the authors previously used an unconstrained PERMANOVA, it would be better to also use an NMDS that goes along with the PERMANOVA.

      The initial goal of the RDA was not to directly visualize the results of the PERMANOVA, but to show whether an overall crop configuration effect occurred, both for the whole dataset and per location. We have now added NMDS figures to link them to the PERMANOVA and added these to the supplementary figures (fig S6-S8). We also mention this approach in the methods section (line 387-390).

      Line 355f: This is also a clear indication of the strong annual fluctuations in carabid assemblages as mentioned above.

      Indeed.

      Line 361: 'pairwise'.

      Typo, we changed this.

      Line 362: reference missing.

      Reference added (line 405)

      References

      Alarcón-Segura, V., Grass, I., Breustedt, G., Rohlfs, M., Tscharntke, T., 2022. Strip intercropping of wheat and oilseed rape enhances biodiversity and biological pest control in a conventionally managed farm scenario. J. Appl. Ecol. 59, 1513-1523.

      Boetzl, F.A., Sponsler, D., Albrecht, M., Batáry, P., Birkhofer, K., Knapp, M., Krauss, J., Maas, B., Martin, E.A., Sirami, C., Sutter, L., Bertrand, C., Baillod, A.B., Bota, G., Bretagnolle, V., Brotons, L., Frank, T., Fusser, M., Giralt, D., González, E., Hof, A.R., Luka, H., Marrec, R., Nash, M.A., Ng, K., Plantegenest, M., Poulin, B., Siriwardena, G.M., Tscharntke, T., Tschumi, M., Vialatte, A., Van Vooren, L., Zubair-Anjum, M., Entling, M.H., Steffan-Dewenter, I., Schirmel, J., 2024. Distance functions of carabids in crop fields depend on functional traits, crop type and adjacent habitat: a synthesis. Proceedings of the Royal Society B: Biological Sciences 291, 20232383.

      Hallmann, C.A., Sorg, M., Jongejans, E., Siepel, H., Hofland, N., Schwan, H., Stenmans, W., Müller, A., Sumser, H., Hörren, T., Goulson, D., de Kroon, H., 2017. More than 75 percent decline over 27 years in total flying insect biomass in protected areas. PLoS One 12, e0185809.

      Knapp, M., Seidl, M., Knappová, J., Macek, M., Saska, P., 2019. Temporal changes in the spatial distribution of carabid beetles around arable field-woodlot boundaries. Scientific Reports 9, 8967.

      Müller, J., Hothorn, T., Yuan, Y., Seibold, S., Mitesser, O., Rothacher, J., Freund, J., Wild, C., Wolz, M., Menzel, A., 2023. Weather explains the decline and rise of insect biomass over 34 years. Nature.

      Toivonen, M., Huusela, E., Hyvönen, T., Marjamäki, P., Järvinen, A., Kuussaari, M., 2022. Effects of crop type and production method on arable biodiversity in boreal farmland. Agriculture, Ecosystems & Environment 337, 108061.

      Reviewer #3 (Public review):

      Summary:

      In this paper, the authors made a sincere effort to show the effects of strip cropping, a technique of alternating crops in small strips of several meters wide, on ground beetle diversity. They state that strip cropping can be a useful tool for bending the curve of biodiversity loss in agricultural systems as strip cropping shows a relative increase in species diversity (i.e. abundance and species richness) of the ground beetle communities compared to monocultures. Moreover, strip cropping has the added advantage of not having to compromise on agricultural yields.

      Strengths:

      The article is well written; it has an easily readable tone of voice without too much jargon or overly complicated sentence structure. Moreover, as far as reviewing the models in depth without raw data and R scripts allows, the statistical work done by the authors looks good. They have thought out well how to handle heterogeneous, yet spatially and temporally correlated field data. The models applied and the model checks performed are appropriate for the data at hand. Combining RDA and PCA axes together is a nice touch.

      We thank reviewer 3 for their kind words and appreciation for the simple language and analysis that we used.

      Weaknesses:

      The evidence for strip cropping bringing added value for biodiversity is mixed at best. Yes, there is an increase in relative abundance and species richness at the field level, but it is not convincingly shown this difference is robust or can be linked to clear structural and hypothesised advantages of the strip cropping system. The same results could have been used to conclude that there are only very limited signs of real added value of strip cropping compared to monocultures.

      Point well taken. We agree that the effects of strip cropping on carabid beetle communities are subtle, and we nuanced the text in the revised version to reflect this. See below for more details on how we revised the manuscript to reflect this point.

      There are a number of reasons for this:

      (1) Significant differences disappear at crop level, as the authors themselves clearly acknowledge, meaning that there are no differences between pairs of similar crops in the strip cropping fields and their respective monoculture. This would mean the strips effectively function as "mini-monocultures".

      This is indeed in line with our conclusions. Based on our data and results, the advantages of strip cropping seem mostly to occur because crops with different communities are now on the same field, rather than that within the strips you get mixtures of communities related to different crops. We discussed this in the first paragraph of the discussion in the original submission (line 161-164).

      The significant relative differences at the field level could be an artifact of aggregation instead of structural differences between strip cropping and monocultures; with enough data points things tend to get significant despite large variance. This should have been elaborated further upon by the authors with additional analyses, designed to find out where differences originate and what it tells about the functioning of the system. Or it should have provided ample reason for cautioning in drawing conclusions about the supposed effectiveness of strip cropping based on these findings.

      We believe that this is a misunderstanding of our approach. In the field-level analyses we pooled samples from the same field (i.e. pseudo-replicates were pooled), resulting in a relatively small sample size of 50 samples. We revised the methods section to better explain this (line 318-322). Therefore, the statement “with enough data points things tend to get significant” is not applicable here.

      (2) The authors report percentages calculated as relative change of species richness and abundance in strip cropping compared to monocultures after rarefaction. This is in itself correct, however, it can be rather tricky to interpret because the perspective on actual species richness and abundance in the fields and treatments is completely lost; the reported percentages are dimensionless. The authors could have provided the average cumulative number of species and abundance after rarefaction. Also, range and/or standard error would have been useful to provide information as to the scale of differences between treatments. This could provide a new perspective on the magnitude of differences between the two treatments which a dimensionless percentage cannot.

      We agree that this would be the preferred approach if we had a perfectly balanced dataset. However, this approach is not feasible with our unbalanced design and differences in sampling effort. While we acknowledge the limitations of interpreting percentages, they do allow reporting relative changes for each combination of location, year and crop. The number of samples on which the percentages were based was always kept equal (through rarefaction) between the cropping systems (for each combination of location, year and crop), but not among crops, years and locations. This approach allowed us to make a better estimation whenever more samples were available, as we did not always have an equal number of samples available between both cropping systems. For example, sometimes we had 2 samples from a strip cropped field and 6 from the monoculture; here we would use rarefaction to 2 samples (where we would just have a better estimation from the monoculture). In other cases, we had 4 samples in both strip cropped and monoculture fields, and we chose to use rarefaction to 4 samples to get a better estimation altogether. Adding a value for actual richness or abundance to the figures would have distorted these findings, as the variation would be huge (as it would represent the number of ground beetle species per 2 to 6 pitfall samples). Furthermore, the dimension that reviewer 3 describes would thus be “the number of ground beetle species / individuals per 2 to 6 samples”, which is not a very informative unit either.
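      The rarefaction logic described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the function names, the use of exact averaging over all sample subsets, and the formula (strip - monoculture) / monoculture for the relative change are all assumptions made for this example, and the species lists are invented.

```python
from itertools import combinations

def rarefied_richness(samples, n):
    """Mean cumulative species count over every subset of n samples.

    `samples` is a list of sets, one set of species names per pitfall sample.
    """
    subsets = list(combinations(samples, n))
    return sum(len(set().union(*subset)) for subset in subsets) / len(subsets)

def relative_change(strip_samples, mono_samples):
    """Relative richness difference at the largest sample number both systems share."""
    n = min(len(strip_samples), len(mono_samples))
    strip = rarefied_richness(strip_samples, n)
    mono = rarefied_richness(mono_samples, n)
    return (strip - mono) / mono

# Hypothetical pair: 2 samples from strip cropping, 3 from the monoculture.
strip = [{"a", "b"}, {"b", "c"}]
mono = [{"a"}, {"a"}, {"a", "b"}]
print(relative_change(strip, mono))
```

      In this sketch the monoculture estimate averages over all three ways of drawing 2 of its 3 samples, which mirrors the point that extra samples improve the estimate for one system without changing the sample number on which the comparison is made.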

      (3) The authors appear to not have modelled the abundance of any of the dominant ground beetle species themselves. Therefore it becomes impossible to assess which important species are responsible (if any) for the differences found in activity density between strip cropping and monocultures and the possible life history traits related reasons for the differences, or lack thereof, that are found. A big advantage of using ground beetles is that many life history traits are well studied and these should be used whenever there is reason, as there clearly is in this case. Moreover, it is unclear which species are responsible for the difference in species richness found at the field level. Are these dominant species or singletons? Do the strip cropping fields contain species that are absent in the monoculture fields and are not the cause of random variation or sampling? Unfortunately, the authors do not report on any of these details of the communities that were found, which makes the results much less robust.

      Thank you for raising this point. We have reconsidered our indicator species analysis and found that it is rather sensitive to rare species and insensitive to changes in common species. Therefore, we have replaced the indicator species analyses with a GLM analysis for the 12 most common genera of ground beetles in the revised manuscript. This allows us to go more in depth on specific traits of the genera whose abundances change depending on the cropping system. In the revised manuscript, we also discuss these common genera more in depth, rather than focusing on rarer species (line 135-143, 182-200 in discussion). Furthermore, we have added information on rarity and habitat preference to the table that shows species abundances per location (Table S2), and mention these aspects briefly in the results (line 145-153).

      (4) In the discussion they conclude that there is only a limited amount of interstrip movement by ground beetles. Otherwise, the results of the crop-level statistical tests would have shown significant deviation from corresponding monocultures. This is a clear indication that the strips function more like mini-monocultures instead of being more than the sum of its parts.

      This is in line with our point in the first paragraph of the discussion and an important message of our manuscript.

      (5) The RDA results show a modelled variable of differences in community composition between strip cropping and monoculture. Percentages of explained variation of the first RDA axis are extremely low, and even then, the effects of location and/or year appear to peek through (Figure S3), even though these are not part of the modelling. Moreover, there is no indication of clustering of strip cropping on the RDA axis, or in fact on the first principal component axis in the larger RDA models. This means the explanatory power of different treatments is also extremely low. The crop-level RDAs show some clustering, but hardly any consistent pattern in either communities of crops or species correlations, indicating that differences between strip cropping and monocultures are very small.

      We agree and we make a similar point in the first paragraph of the discussion (line 160-162).

      Furthermore, there are a number of additional weaknesses in the paper that should be addressed:

      The introduction lacks focus on the issues at hand. Too much space is taken up by facts on insect decline and land sharing vs. land sparing and not enough attention is spent on the scientific discussion underlying the statements made about crop diversification as a restoration strategy. They are simply stated as facts or as hypotheses with many references that are not mentioned or linked to in the text. An explicit link to the results found in the large number of references should be provided.

      We revised the introduction by omitting the land sharing vs. land sparing topic and better linking references to our research findings.

      The mechanistic understanding of strip cropping is what is at stake here. Does strip cropping behave similarly to intercropping, a technique that has been proven to be beneficial to biodiversity because of added effects due to increased resource efficiency and greater plant species richness? This should be the main testing point and agenda of strip cropping. Do the biodiversity benefits that have been shown for intercropping also work in strip cropping fields? The ground beetles are one way to test this. Hypotheses should originate from this and should be stated clearly and mechanistically.

      We agree with the reviewer and clarified this research direction in the introduction of the revised manuscript (line 66-72).

      One could question how useful indicator species analysis (ISA) is for a study in which predominantly highly eurytopic species are found. These are by definition uncritical of their habitat. Is there any mechanistic hypothesis underlying a suspected difference to be found in preferences for either strip cropping or monocultures of the species that were expected to be caught? In other words, did the authors have any a priori reasons to suspect differences, or has this been an exploratory exercise from which unexplained significant results should be used with great caution?

      Point well taken. We agree that the indicator species analysis has limitations and therefore now replaced this with GLM analysis for the 12 most common ground beetle genera.

      However, setting these objections aside there are in fact significant results with strong species associations both with monocultures and strip cropping. Unfortunately, the authors do not dig deeper into the patterns found a posteriori either. Why would some species associate so strongly with strip cropping? Do these species show a pattern of pitfall catches that deviate from other species, in that they are found in a wide range of strips with different crops in one strip cropping field and therefore may benefit from an increased abundance of food or shelter? Also, why would so many species associate with monocultures? Is this in any way logical? Could it be an artifact of the data instead of a meaningful pattern? Unfortunately, the authors do not progress along these lines in the methods and discussion at all.

      We thank reviewer 3 for these valuable perspectives. In the revised manuscript, we further explored the species/genera that respond to cropping systems and discuss these findings in more detail (line 182-200 in discussion).

      A second question raised in the introduction is whether the arable fields that form part of this study contain rare species. Unfortunately, the authors do not elaborate further on this. Do they expect rare species to be more prevalent in the strip cropping fields? Why? Has it been shown elsewhere that intercropping provides room for additional rare species?

      The answer is simply no, we did not find more rare species in strip cropping. In the revised manuscript, we added a column for rarity (according to waarneming.nl) in the table showing abundances of species per location (table S2). We only found two rare species: one of which we found only a single individual, and one that was more related to the open habitat created by a failed wheat field. We discuss this more in depth in the revised results (line 145-153).

      Considering the implications the results of this research can have on the wider discussion of bending the curve and the effects of agroecological measures, bold claims should be made with extreme restraint and be based on extensive proof and robust findings. I am not convinced by the evidence provided in this article that the claim made by the authors that strip cropping is a useful tool for bending the curve of biodiversity loss is warranted.

      We believe that strip cropping can be a useful tool because farmers readily adopt it and it can result in modest biodiversity gains without yield loss. However, strip cropping is indeed not a silver bullet (which we also don’t claim). We nuanced the implications of our study in the revised manuscript (line 30-35, 232-237).

      Reviewer #3 (Recommendations for the authors):

      General comments:

      (1) I am missing the R script and data files in the manuscript. This is a serious drawback in assessing the quality of the work.

      Datasets and R scripts will be made available upon completion of the manuscript.

      (2) I have doubts about the clarity of the title. It more or less states that strip cropping is designed in order to maintain productivity. However, the main objective of strip cropping is to achieve ecological goals without losing productivity. I suggest a rethink of the title and what it is the authors want to convey.

      As the title led to false expectations for multiple reviewers regarding analyses on yield, we chose to alter the title and removed any mention of yield from it.

      (3) Line 22: I would add something along the lines of: "As an alternative to intercropping, strip cropping is pioneered by Dutch farmers... " This makes the distinction and the connection between the two more clear.

      In our opinion, strip cropping is a form of intercropping. We have changed this sentence to reflect this point better. (line 21-22)

      (4) Line 24: "these" should read "they"

      After changing this sentence, this typo is no longer there (line 24).

      (5) Line 34-48. I think this introduction is too long. The paper is not directly about insect decline, so the authors could consider starting with line 43 and summarising 34-42 in one or two sentences.

      Removed a sentence on insect declines here to make the introduction more streamlined.

      (6) Line 51-59. I am not convinced the land sparing - land sharing idea adds anything to the paper. It is not used in the discussion and solicits much discussion in and of itself unnecessary in this paper. The point the authors want to make is not arable fields compared to natural biodiversity, but with increases in biodiversity in an already heavily degraded ecosystem; intensive agriculture. I think the introduction should focus on that narrative, instead of the land sparing-sharing dichotomy, especially because too little attention is spent on this narrative.

      We removed the section on land-sparing vs land-sharing as it was indeed off-topic.

      (7) Line 85. Dynamics is not correctly used here. It should read Ground beetle communities are sensitive.

      Changed accordingly (line 78-79).

      (8) Line 90-91. Here, it should be added that ground beetles are used as indicators for ground-dwelling insect diversity, not wider insect diversity in agricultural systems. In fact, Gerlach et al., the reference included, clearly warn against using indicator groups in a context that is too wide for a single indicator group to cover and Van Klink (2022) has recently shown in a meta-analysis that the correlation between trends in insect groups is often rather poor.

      We removed the sentence that claimed ground beetles to be indicators of general biodiversity, and have focused the text in general more on ground beetle biodiversity, rather than general biodiversity.

      (9) Line 178: was there a high weed abundance measured in the stripcropping fields? Or has there been reports on higher weed abundance in general? The references provided do not appear to support this claim.

      To our knowledge, there is only one paper on the effect of strip cropping on weeds (Ditzler et al., 2023). This paper shows strip cropping (and more diverse cropping systems) reduce weed cover, but increase weed richness and diversity. We mistakenly mentioned that crop diversification increases weed seed biomass, but have changed this accordingly to weed seed richness. The paper from Carbonne et al. (2022) indeed doesn’t show an effect of crop diversification on weeds. However, it does show a positive relation between weed seed richness and ground beetle activity density. We have moved this citation to the right place in the sentence (line 172-175).

      (10) Line 279-288. The description of sampling with pitfalls is inadequate. Please follow the guidelines for properly incorporating sufficient detail on pitfall sampling protocols as described in Brown & Matthews 2016.

      We were sadly not aware of this paper prior to the experiments, but have at least added information on all characteristics of the pitfall traps as mentioned in the paper (line 290-294).

      (11) Lines 307-310. What reasoning lies behind the choice to focus on the most beetle-rich monocultures? Do the authors have references for this way of comparing treatments? Is there much variation in the monocultures that solicits this approach? It would be preferable if the authors could elaborate on why this method is used, provide references that it is a generally accepted statistical technique and provide additional assessments of the variation in the data so it can be properly related to more familiar exploratory data analysis techniques.

      We ran two analyses for the field-level richness and abundance. First we used all combinations of monocultures and strip cropping. However, as strip cropping is made up of (at least) 2 crops, we had 2 constituent monocultures. As we would count a comparison with the same strip cropped field twice when we included both monocultures, we also chose to run the analyses again with only those monocultures that had the highest richness and abundance. This choice was made to get a conservative estimate of ground beetle richness increases through strip cropping. We explained this methodology further in the statistical analysis section (line 329-335).

      In Figure S6 the order of crop combinations is altered between 2021 on the left and 2022 on the right. This is not helpful to discover any possible patterns.

      We originally chose this order as it represented also the crop rotations, but it is indeed not helpful without that context. Therefore, we chose to change the order to have the same crop combinations within the rows.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Recent work has demonstrated that the hummingbird hawkmoth, Macroglossum stellatarum, like many other flying insects, uses ventrolateral optic flow cues for flight control. However, unlike in other flying insects, the same stimulus presented in the dorsal visual field elicits a directional response. Bigge et al. use behavioral flight experiments to set these two pathways in conflict in order to understand whether these two pathways (ventrolateral and dorsal) work together to direct flight and, if so, how. The authors characterize the visual environment (the amount of contrast and translational optic flow) of the hawkmoth and find that different regions of the visual field are matched to relevant visual cues in their natural environment, and that the integration of the two pathways reflects a prioritization for generating behavior that supports hawkmoth safety rather than the prevalence of a particular visual cue in the environment.

      Strengths:

      This study creatively utilizes previous findings that the hawkmoth partitions their visual field as a way to examine parallel processing. The behavioral assay is well-established and the authors take the extra steps to characterize the visual ecology of the hawkmoth habitat to draw exciting conclusions about the hierarchy of each pathway as it contributes to flight control.

      Reviewer #2 (Public review):

      Summary

      Bigge and colleagues use a sophisticated free-flight setup to study visuo-motor responses elicited in different parts of the visual field in the hummingbird hawkmoth. Hawkmoths have been previously shown to rely on translational optic flow information for flight control exclusively in the ventral and lateral parts of their visual field. Dorsally presented patterns elicit a formerly completely unknown response - instead of using dorsal patterns to maintain straight flight paths, hawkmoths fly, more often, in a direction aligned with the main axis of the pattern presented (Bigge et al, 2021). Here, the authors go further and put ventral/lateral and dorsal visual cues into conflict. They found that the different visuomotor pathways act in parallel, and they identified a 'hierarchy': the avoidance of dorsal patterns had the strongest weight and optic flow-based speed regulation the lowest weight. The authors linked their behavioral results to visual scene statistics in the hawkmoths' natural environment. The partition of ventral and dorsal visuomotor pathways is well in line with differences in visual cue frequencies. The response hierarchy, however, seems to be dominated by dorsal features that are less frequent but presumably highly relevant for the animals' flight safety.

      Strengths

      The data are very interesting and unique. The manuscript provides a thorough analysis of free-flight behavior in a non-model organism that is extremely interesting for comparative reasons (and on its own). These data are both difficult to obtain and very valuable to the field.

      Weaknesses

      While the present manuscript clearly goes beyond Bigge et al, 2021, the advance could have perhaps been even stronger with a more fine-grained investigation of the visual responses in the dorsal visual field. Do hawkmoths, for example, show optomotor responses to rotational optic flow in the dorsal visual field?

      I find the majority of the data, which are also the data supporting the main claims of the paper, compelling. However, the measurements of flight height are less solid than the rest and I think these data should be interpreted more carefully.

      Reviewer #3 (Public review):

      The authors have significantly improved the paper in revising to make its contributions distinct from their prior paper. They have also responded to my concerns about quantification and parameter dependency of the integration conclusion. While I think there is still more that could be done in this capacity, especially in terms of the temporal statistics and quantification of the conflict responses, they have made a case for the conclusions as stated. The paper still stands as an important paper with solid evidence a bit limited by these concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The edits have significantly improved the clarity of the manuscript. A few small notes:

      Figure 2B legend - describe what the orange dashed line represents

      We added a description.

      Figure 2B legend - references Table 1 but I believe this should reference Table S1. There are other places in the manuscript where Table 1 is referenced and it should reference S1

      We changed this for all instances in the main paper and supplement, where the reference was wrong.

      Figure S1 legend - some figure panel letters are in parentheses while others are not

      We unified the notation to not use parentheses for any of the panel letters.

      Reviewer #2 (Recommendations for the authors):

      I couldn't find the l, r, d, v indications in Fig. 1a. This was just a suggestion, but since you wrote you added them, I was wondering if this is the old figure version.

      We added them to what is now Fig. 2, which was originally part of Fig. 1. After restructuring, we did indeed not add an additional set to Fig. 1, which we have now adjusted.

      Fig. 2: Adding 'optic flow' and 'edges' to the y-axis in panels E and F, would make it faster for me to parse the figure. Maybe also add the units for the magnitudes? Same for Figure 6B

      We added 'optic flow' and 'edges' to the panels E and F in Fig. 2 and Fig. 6.

      Fig. 2: Very minor - could you use the same pictograms in D and E&F (i.e. all circles for example, instead of switching to "tunnels" in EF)?

      We used the tunnel pictograms, because we associated those with the short notations for the different conditions summarised in Table S1. Because we wanted to keep this consistent across the paper, we used the “tunnel” pictograms here too.

      In the manuscript, you still draw lots of conclusions based on these area measurements (L132-142, L204-209 etc). This does not fully reflect what you wrote in your reply to the reviewers. If you think of these measurements as qualitative rather than quantitative, I would say so in the manuscript and not use quantitative statistics etc. My suggestion would be to be more specific about potential issues that can influence the measurement (you mentioned body size, image contrast, motion blur, pitch across conditions etc) and give that data not the same weight as the rest of the measurements.

      We do express explicit caution with this measure in the methods section (l. 657-659) and the results section (l. 135-137). Nevertheless, as the trends in the data are consistent with optic flow responses in the other planes, and with responses reported in the literature, we felt that it is valuable to report the data, as well as the statistics, for all readers, who can – given our cautionary statement – assess the data accordingly.

      The area measurements suggest that moths fly lower with unilateral vertical gratings (Fig. S1, G1 and G2 versus the rest). If you leave the data in can you speculate why that would be? (Sorry if I missed that)

      We agree, this seems quite consistent, but we do not have a good explanation for this observation. It would certainly require some additional experiments and variable conditions to understand what causes this phenomenon.

      Fig.4 - is panel B somehow flipped? Shouldn't the flight paths start out further away from the grating and then be moved closer to midline (as in A). That plot shows the opposite.

      Absolutely right, thank you for spotting this, it was indeed an intermediate and not the final figure which was uploaded to the manuscript. It also had outdated letter-number identifiers, which we now updated.

      L198 - should be "they avoided"

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Why was V1 separated from the rest of the visual cortex, and why the rest of the areas were simply lumped into an EVC ROI? It would be helpful to understand the separation into ROIs.

      We thank the reviewer for raising the concerns regarding the definition of ROI. Our approach to analyze V1 separately was based on two key considerations. First, previous studies consistently identify V1 as the main locus of sensory-like templates during feature-specific preparatory attention (Kok et al., 2014; Aitken et al., 2020). Second, V1 shows the strongest orientation selectivity within the visual hierarchy (Priebe, 2016). In contrast, the extrastriate visual cortex (EVC; comprising V2, V3, V3AB and V4) demonstrates broader selectivity, for example to complex features such as contour and texture (Grill-Spector & Malach, 2004). Thus, we think it would be particularly informative to analyze V1 data separately as our experiment examines orientation-based attention. We should also note that we conducted MVPA separately for each visual ROI (V2, V3, V3AB and V4). After observing similar patterns of results across these regions, we averaged the decoding accuracies into a single value and labeled it as EVC. This approach allowed us to simplify data presentation while preserving the overall data pattern in decoding performance. We have now added the related explanations on the ROI definition in the revised texts (Page 26; Line 576-581).

      (2) It would have been helpful to have a behavioral measure of the "attended" orientation to show that participants in fact attended to a particular orientation and were faster in the cued condition. The cue here was 100% valid, so no such behavioral measure of attention is available here.

      We thank the reviewer for the comments. We agree that including valid and neutral cue trials would have provided valuable behavioral measures of attention; yet, our current design was aimed at maximizing the number of trials for decoding analysis due to fMRI time constraints. Thus, we could not fit additional conditions to measure the behavioral effects of attention. However, we note that in our previous studies using a similar feature cueing paradigm, we observed benefits of attentional cueing on behavioral performance when comparing valid and neutral conditions (Liu et al., 2007; Jigo et al., 2018). Furthermore, our neural data indeed demonstrated attention-related modulation (as indicated by MVPA results, Fig. 2 in the main texts), so we are confident that on average participants followed the instruction and deployed their attention accordingly. We have now added the related explanations on this point in the revised texts (Page 23; Line 492-498).

      (3) As I was reading the manuscript I kept thinking that the word attention in this manuscript can be easily replaced with visual working memory. Have the authors considered what it is about their task or cognitive demand that makes this investigation about attention or working memory?

      We thank the reviewer for this comment. We added the following extensive discussion on this point in the revised texts (Page 18; Line 363-381).

      “It could be argued that preparatory attention relies on the same mechanisms as working memory maintenance. While these functions are intuitively similar and likely overlap, there is also evidence indicating that they can be dissociated (Battistoni et al., 2017). In particular, we note that in our task, attention is guided by symbolic cues (color-orientation associations), while working memory tasks typically present the actual visual stimulus as the memorandum. A central finding in working memory studies is that neural signals during WM maintenance are sensory in nature, as demonstrated by generalizable neural activity patterns from stimulus encoding to maintenance in visual cortex (Harrison & Tong, 2009; Serences et al., 2009; Rademaker et al., 2019). However, in our task, neural signals during preparation were nonsensory, as demonstrated by a lack of such generalization in the No-Ping session (see also Gong et al., 2022). We believe that the differences in cue format and task demand in these studies may account for such differences. In addition to the difference in the sensory nature of the preparatory versus delay-period activity, our ping-related results also exhibited divergence from working memory studies (Wolff et al., 2017; 2020). While these studies used the visual impulse to differentiate active and latent representations of different items (e.g., attended vs. unattended memory item), our study demonstrated the active and latent representations of a single item in different formats (i.e., non-sensory vs. sensory-like). Moreover, unlike our study, the impulse did not evoke sensory-like neural patterns during memory retention (Wolff et al., 2017). These observations suggest that the cognitive and neural processes underlying preparatory attention and working memory maintenance could very well diverge. Future studies are necessary to delineate the relationship between these functions both at the behavioral and neural level.”

      (4) If I understand correctly, the only ROI that showed a significant difference for the cross-task generalization is V1. Was it predicted that only V1 would have two functional states? It should also be made clear that the only region where the two states differ is V1.

      We thank the reviewer for this comment. We would like to clarify that our analyses revealed similar patterns of preparatory attentional representations in V1 and EVC. During the Ping session, the cross-task generalization analyses revealed decodable information in both V1 and EVC (ps < 0.001), significantly higher than that in the No-Ping session for V1 (independent t-test: t(38) = 3.145, p = 0.003; Cohen’s d = 0.995) and EVC (independent t-test: t(38) = 2.153, p = 0.038, Cohen’s d = 0.681) (Page 10; Line 194-196). While both areas maintained similar representations, additional measures (Mahalanobis distance, neural-behavior relationship and connectivity changes) showed more robust ping-evoked changes in V1 compared to EVC. This differential pattern likely reflects the primary role of V1 in orientation processing, with EVC showing a similar but weaker response profile. We have revised the text to clarify this point (Page 16; Line 327-329).
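
      As a consistency check on the effect sizes quoted above, Cohen's d for an independent-samples t-test can be recovered from the t statistic and the two group sizes as d = t * sqrt(1/n1 + 1/n2). The sketch below assumes 20 participants per session; that number is inferred from the reported degrees of freedom (t(38), F(1,19)) and is not stated in this response. The function name is illustrative only.

```python
import math

def cohens_d_from_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d for two independent groups, recovered from the t statistic."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)

# Assumption: 20 participants per session (inferred from df = 38).
d_v1 = cohens_d_from_t(3.145, 20, 20)   # V1, Ping vs. No-Ping
d_evc = cohens_d_from_t(2.153, 20, 20)  # EVC, Ping vs. No-Ping

print(round(d_v1, 3), round(d_evc, 3))
```

      Under that assumption, both values round to the effect sizes reported in the response (0.995 and 0.681).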

      (5) My primary concern about the interpretation of the finding is that the result, differences in cross-task decoding within V1 between the ping and no-ping condition might simply be explained by the fact that the ping condition refocuses attention during the long delay thus "resharpening" the template. In the no-ping condition during the 5.5 to 7.5 seconds long delay, attention for orientation might start getting less "crisp." In the ping condition, however, the ping itself might simply serve to refocus attention. So, the result is not showing the difference between the latent and non-latent stages, rather it is the difference between a decaying template representation and a representation during the refocused attentional state. It is important to address this point. Would a simple tone during the delay do the same? If so, the interpretation of the results will be different.

      We thank the reviewer for this comment. The reviewer proposed an alternative account suggesting that visual pings may function to refocus attention, rather than reactivate latent information during the preparatory period. If this account holds (i.e., attention became weaker in the no-ping condition and it was strengthened by the ping due to re-focusing), we would expect to observe a general enhancement of attentional decoding during the preparatory period. However, our data reveal no significant differences in overall attention decoding between two conditions during this period (ps > 0.519; BF<sub>excl</sub> > 3.247), arguing against such a possibility.

      The reviewer also raised an interesting question about whether an auditory tone during preparation could produce effects similar to those observed with visual pings. Although our study did not directly test this possibility, existing literature provides some relevant evidence. In particular, prior studies have shown that latent visual working memory contents are selectively reactivated by visual impulses, but not by auditory stimuli (Wolff et al., 2020). This finding supports the modality-specificity for visually encoded contents, suggesting that sensory impulses must match the representational domain to effectively access latent visual information, which also argues against the refocusing hypothesis above. However, we do think that this is an important question that merits direct investigation in future studies. We have now added the related discussion on this point in the revised texts (Page 10, Line 202-203; Page 19, Line 392-395).

      (6) The neural pattern distances measured using Mahalanobis values are really great! Have the authors tried to use all of the data, rather than the high AMI and low AMI to possibly show a linear relationship between response times and AMI?

      We thank the reviewer for this comment. We took the reviewer’s suggestion to explore the relationship between attentional modulation index (AMI) and RTs across participants for each session (see Figure 3). In the No-Ping session, we observed no significant correlation between AMI and RT (r = -0.366, p = 0.113). By contrast, the same analysis in the Ping session revealed a significantly negative correlation (r = -0.518, p = 0.019). These results indicate that the attentional modulations evoked by the visual impulse were associated with faster RTs, supporting the functional relevance of activating sensory-like representations during preparation. We have now included these inter-subject correlations in the main texts (Page 13, Line 258-264; Fig 3D and 3E) along with within-subject correlations in the Supplementary Information (Page 6, Line 85-98; S3 Fig).
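
      For readers who want to see how the two correlations map onto significance, a Pearson r can be converted to a t statistic with n - 2 degrees of freedom via t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below assumes n = 20 participants per session, which is inferred from the degrees of freedom reported elsewhere in this response rather than stated here; the function name is illustrative.

```python
import math

def t_from_r(r: float, n: int) -> float:
    """t statistic (df = n - 2) for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Assumption: n = 20 participants per session.
t_no_ping = t_from_r(-0.366, 20)
t_ping = t_from_r(-0.518, 20)

# Two-tailed critical value of t for df = 18 at alpha = 0.05.
T_CRIT_DF18 = 2.101

print(round(t_no_ping, 2), abs(t_no_ping) > T_CRIT_DF18)
print(round(t_ping, 2), abs(t_ping) > T_CRIT_DF18)
```

      Under this assumption, only the Ping-session correlation exceeds the critical value, in line with the reported p-values.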

      (7) After reading the whole manuscript I still don't understand what the authors think the ping is actually doing, mechanistically. I would have liked a more thorough discussion, rather than referencing previous papers (all by the co-author).

      We thank the reviewer for this comment regarding the mechanistic basis of visual pings. We agree that this warrants deeper discussion. One possibility, as informed by theoretical studies of working memory, is that the sensory-like template could be maintained via an “activity-silent” mechanism through short-term changes in synaptic weights (Mongillo et al., 2008). In this framework, a visual impulse may function as a nonspecific input that momentarily converts latent traces into detectable activity patterns (Rademaker & Serences, 2017). Related to our findings, it is unlikely that the orientation-specific templates observed during the Ping session emerged from purely non-sensory representations and were entirely induced by an exogenous ping, which was devoid of any orientation signal. Instead, the more parsimonious explanation is that the visual impulse reactivated pre-existing latent sensory signals. To our knowledge, the detailed circuit-level mechanism of such reactivation is still unclear; existing evidence only suggests a relationship between ping-evoked inputs and the neural output (Wolff et al., 2017; Fan et al., 2021; Duncan et al., 2023). We have now included the discussion on this point in the main texts (Page 19, Line 383-401).

      Reviewer #2 (Public review):

      (1) The origin of the latent sensory-like representation. By 'pinging' the neural activity with a high-contrast, task-irrelevant visual stimulus during the preparation period, the authors identified the representation of the attentional feature target that contains the same information as perceptual representations. The authors interpreted this finding as a 'sensory-like' template is inherently hosted in a latent form in the visual system, which is revealed by the pinging impulse. However, I am not sure whether such a sensory-like template is essentially created, rather than revealed, by the pinging impulses. First, unlike the classical employment of the pinging technique in working memory studies, the (latent) representation of the memoranda during the maintenance period is undisputed because participants could not have performed well in the subsequent memory test otherwise. However, this appears not to be the case in the present study. As shown in Figure 1C, there was no significant difference in behavioral performance between the ping and the no-ping sessions (see also lines 110-125, pg. 5-6). In other words, it seems to me that the subsequent attentional task performance does not necessarily rely on the generation of such sensory-like representations in the preparatory period and that the emergence of such sensory-like representations does not facilitate subsequent attentional performance either. In such a case, one might wonder whether such sensory-like templates are really created, hosted, and eventually utilized during the attentional process. Second, because the reference orientations (i.e. 45 degrees and 135 degrees) have remained unchanged throughout the experiment, it is highly possible that participants implicitly memorized these two orientations as they completed more and more trials. In such a case, one might wonder whether the 'sensory-like' templates are essentially latent working memory representations activated by the pinging as was reported in Wolff et al. (2017), rather than a functional signature of the attentional process.

      We thank the reviewer for this comment. We agree that the question of whether the sensory-like template is created or merely revealed by visual pinging is crucial for understanding our findings. First, we acknowledge that our task may not be optimized for detecting changes in accuracy, as the task difficulty was controlled using individually adjusted thresholds (i.e., angular difference). Nevertheless, we observed some evidence supporting the neural-behavioral relationships. In particular, the impulse-driven sensory-like template in V1 contributed to faster RTs during stimulus selection (Page 12, Fig. 3D and 3E in the main texts; also see our response to R1, Point 6).

      Second, the reviewer raised an important concern about whether the attended feature might be stored in the memory system due to the trial-by-trial repetition of attention conditions (attend 45º or attend 135º). Although this is plausible, we don’t think it is likely. We note that neuroimaging evidence shows that attended working memory contents maintain sensory-like representations in visual cortex (Harrison & Tong, 2009; Serences et al., 2009; Rademaker et al., 2019), with generalizable neural activity patterns from perception to working memory delay-period, whereas unattended items in multi-item working memory tasks are stored in a latent state for prospective use (Wolff et al., 2017). Importantly, our task only required maintaining a single attentional template at a time. Thus, there was no need to store it via latent representations, if participants simply used a working memory mechanism for preparatory attention. Had they done so, we should expect to find evidence for a sensory template, i.e., generalizable neural pattern between perception and preparation in the No-Ping condition, which was not what we found. We have mentioned this point in the main texts (Page 18, Line 367-372).

      (2) The coexistence of the two types of attentional templates. The authors interpreted their findings as the outcome of a dual-format mechanism in which 'a non-sensory template' and a latent 'sensory-like' template coexist (e.g. lines 103-106, pg. 5). While I find this interpretation interesting and conceptually elegant, I am not sure whether it is appropriate to term it 'coexistence'. First, it is theoretically possible that there is only one representation in either session (i.e. a non-sensory template in the no-ping session and a sensory-like template in the ping session) in any of the brain regions considered. Second, it seems that there is no direct evidence concerning the temporal relationship between these two types of templates, provided that they commonly emerge in both sessions. Besides, due to the sluggish nature of fMRI data, it is difficult to tell whether the two types of templates temporally overlap.

      We thank the reviewer for the comment regarding our interpretation of the ‘coexistence’ of non-sensory and sensory-like attentional templates. While we acknowledge the limitations of fMRI in resolving temporal relationships between these two types of templates, several aspects of our data support a dual-format interpretation.

      First, our key findings remained consistent for the subset of participants (N=14) who completed both No-Ping and Ping sessions in counterbalanced order. It thus seems improbable that participants systematically switched cognitive strategies (e.g., using non-sensory templates in the No-Ping session versus sensory-like templates in the Ping session) in response to the task-irrelevant, uninformative visual impulse. Second, while we agree with the reviewer that the temporal dynamics between these two templates remain unclear, it is difficult to imagine that orientation-specific templates observed during the Ping session emerged de novo from purely non-sensory templates and an exogenous ping. In other words, if there is no orientation information at all to begin with, how does it come into being from an orientation-less external ping? It seems to us that the more parsimonious explanation is that there was already some orientation signal in a latent format, and it was activated by the ping, in line with the models of “activity-silent” working memory. To address these concerns, we have added the related discussion of these alternative interpretations in the main texts (Page 19, Line 387-391).

      (3) The representational distance. The authors used Mahalanobis distance to quantify the similarity of neural representation between different conditions. According to the authors' hypothesis, one would expect greater pattern similarity between 'attend leftward' and 'perceived leftward' in the ping session in comparison to the no-ping session. However, this appears not to be the case. As shown in Figures 3B and C, there was no major difference in Mahalanobis distance between the two sessions in either ROI and the authors did not report a significant main effect of the session in any of the ANOVAs. Besides, in all the ANOVAs, the authors reported only the statistic term corresponding to the interaction effect without showing the descriptive statistics related to the interaction effect. It is strongly advised that these descriptive statistics related to the interaction effect should be included to facilitate a more effective and intuitive understanding of their data.

      We thank the reviewer for this comment. We expected greater pattern similarity between 'attend leftward' and 'perceived leftward' in the Ping session in comparison to the No-Ping session. This prediction was supported by a significant three-way interaction effect between session × attended orientation × perceived orientation (F(1,38) = 5.00, p = 0.031, η<sub>p</sub><sup>2</sup> = 0.116). In particular, there was a significant interaction between attended orientation × perceived orientation (F(1,19) = 9.335, p = 0.007, η<sub>p</sub><sup>2</sup> = 0.329) in the Ping session, but not in the No-Ping session (F(1,19) = 0.017, p = 0.898, η<sub>p</sub><sup>2</sup> = 0.001). These above-mentioned statistical results were reported in the original texts. In addition, this three-way mixed ANOVA (session × attended orientation × perceived orientation) on Mahalanobis distance in V1 revealed no significant main effects (session: F(1,38) = 0.009, p = 0.923, η<sub>p</sub><sup>2</sup> < 0.001; attended orientation: F(1,38) = 0.116, p = 0.735, η<sub>p</sub><sup>2</sup> = 0.003; perceived orientation: F(1,38) = 1.106, p = 0.300, η<sub>p</sub><sup>2</sup> = 0.028). We agree with the reviewer that a complete reporting of analyses enhances understanding of the data. Therefore, we have now included the main effects in the main texts (Page 11, Line 233).
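
      The effect sizes quoted above follow from the F statistics and their degrees of freedom through the standard identity partial eta squared = (F * df_effect) / (F * df_effect + df_error). The short sketch below applies that identity to the reported values; the function name and the row labels are illustrative, and the inputs are copied from the response rather than recomputed from data.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)

# (label, F, df_effect, df_error), taken from the response text.
reported = [
    ("session x attended x perceived", 5.00, 1, 38),
    ("attended x perceived, Ping", 9.335, 1, 19),
    ("attended x perceived, No-Ping", 0.017, 1, 19),
    ("main effect: session", 0.009, 1, 38),
    ("main effect: attended", 0.116, 1, 38),
    ("main effect: perceived", 1.106, 1, 38),
]

for label, f, df1, df2 in reported:
    print(f"{label}: {partial_eta_squared(f, df1, df2):.3f}")
```

      Each recomputed value agrees with the reported effect size to three decimal places.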

      We thank the reviewer for the suggestion regarding the inclusion of descriptive statistics for interaction effects. However, since the data were already visualized in Fig. 3B and 3C in the main texts, to maintain conciseness and consistency with the reporting style of other analyses in the texts, we have opted to include these statistics in the Supplementary Information (Page 5, Table 1).

      Reviewer #3 (Public review):

      (1) The title is "Dual-format Attentional Template," yet the supporting evidence for the nonsensory format and its guiding function is quite weak. The author could consider conducting further generalization analysis from stimulus selection to preparation stages to explore whether additional information emerges.

      We thank the reviewer for this comment. Our approach to investigate whether preparatory attention is encoded in sensory or non-sensory format - by training classifiers on separate runs of the perception task - closely followed methods from previous studies (Stokes et al., 2009; Peelen et al., 2011; Kok et al., 2017). Following the reviewer’s suggestion, we performed generalization analyses by training classifiers on activity during the stimulus selection period and testing them on preparatory activity. However, we observed no significant generalization effects in either the No-Ping or the Ping session (ps > 0.780). This null result may stem from a key difference in the neural representations: classifiers trained on neural activity from the stimulus selection period necessarily encode both target and distractor information, thus relying on somewhat different information than classifiers trained exclusively on isolated target information in the perception task.

      (2) In Figure 2, the author did not find any decodable sensory-like coding in IPS and PFC, even during the impulse-driven session, indicating that these regions do not represent sensory-like information. However, in the final section, the author claimed that the impulse-driven sensory-like template strengthens informational connectivity between sensory and frontoparietal areas. This raises a question: how can we reconcile the lack of decodable coding in these frontoparietal regions with the reported enhancement in network communication? It would be helpful if the author provided a clearer explanation or additional evidence to bridge this gap.

      We thank the reviewer for this comment. We would like to clarify that although we did not observe sensory-like coding during preparation in frontoparietal areas, we did observe attentional signals in these regions, as evidenced by the above-chance within-task attention decoding performance (Fig. 2 in the main texts). This could reflect different neural codes in different areas, and suggests that inter-regional communication does not necessarily require identical representational formats. It seems plausible that the representation of a non-sensory attentional template in frontoparietal areas supports top-down attentional control, consistent with theories suggesting increasing abstraction as the cortical hierarchy ascends (Badre, 2008; Brincat et al., 2018), and that its interaction with the sensory representation in the visual areas is enhanced by the visual impulse.

      (3) Given that the impulse-driven sensory-like template facilitated behavior, the author proposed that it might also enhance network communication. Indeed, they observed changes in informational connectivity. However, it remains unclear whether these changes in network communication have a direct and robust relationship with behavioral improvements.

      We thank the reviewer for the suggestion. To examine how network communication relates to behavior, we performed a correlation analysis between information connectivity (IC) and RTs across participants (see Figure S5). We observed a trend of correlations between V1-PFC connectivity and RTs in the Ping session (r = -0.394, p = 0.086), but not in the No-Ping session (r = -0.046, p = 0.846). No significant correlations were found between V1-IPS and RTs (ps > 0.400) or between ICs and accuracy (ps > 0.399). These results suggest that ping-enhanced connectivity might have contributed to facilitated responses. Although we may not have sufficient statistical power to warrant a strong conclusion, we think this result is still highly suggestive, so we have now added the texts in the Supplementary Information (Page 8, Line 116-121; S5 Fig) and mentioned this result in the main texts (Page 14, Line 292-293).

      (4) I'm uncertain about the definition of the sensory-like template in this paper. Is it referring to the Ping impulse-driven condition or the decodable performance in the early visual cortex? If it is the former, even in working memory, whether pinging identifies an activity-silent mechanism is currently debated. If it's the latter, the authors should consider whether a causal relationship - such as "activating the sensory-like template strengthens the informational connectivity between sensory and frontoparietal areas" - is reasonable.

      We apologize for the confusion. The sensory-like template by itself does not directly refer to representations in the Ping session or the attentional decoding in early visual cortex. Instead, it pertains to the representational format of attentional signals during preparation. Specifically, its existence is inferred from cross-task generalization, where neural patterns from a perception task (perceive 45º or perceive 135º) generalize to an attention task (attend 45º or attend 135º). We think this is a reasonable and accepted operational definition of the representational format. Our findings suggest that the sensory-like template likely existed in a latent state and was reactivated by visual pings, aligning more closely with the first account raised by the reviewer.
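
      The logic of cross-task generalization can be illustrated with a toy simulation: a classifier is fit on perception-task patterns and then tested on attention-task patterns. The sketch below is not the authors' analysis pipeline. It uses synthetic data and a simple nearest-centroid classifier as a stand-in, and all names (`fit`, `predict`, `simulate`, the two voxel codes, the signal strengths) are hypothetical. When the attention patterns share the perceptual code, generalization succeeds; when they carry orientation information in a different code, it should fall to roughly chance.

```python
import random

def centroid(patterns):
    """Voxel-wise mean of a list of patterns."""
    return [sum(col) / len(patterns) for col in zip(*patterns)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit(patterns, labels):
    """Nearest-centroid 'classifier': one mean pattern per label."""
    return {lab: centroid([p for p, l in zip(patterns, labels) if l == lab])
            for lab in set(labels)}

def predict(model, pattern):
    return min(model, key=lambda lab: sq_dist(model[lab], pattern))

def accuracy(model, patterns, labels):
    return sum(predict(model, p) == l for p, l in zip(patterns, labels)) / len(labels)

def simulate(rng, code, strength, n_trials=40):
    """Synthetic trials for labels 45 and 135; `code` marks the informative voxels."""
    patterns, labels = [], []
    for i in range(n_trials):
        lab = 45 if i % 2 == 0 else 135
        sign = 1.0 if lab == 45 else -1.0
        patterns.append([sign * strength * c + rng.gauss(0, 1) for c in code])
        labels.append(lab)
    return patterns, labels

rng = random.Random(0)
sensory_code = [1.0] * 15 + [0.0] * 15   # voxels carrying the perceptual code
other_code = [0.0] * 15 + [1.0] * 15     # a different, non-overlapping code

# Train on the (synthetic) perception task.
model = fit(*simulate(rng, sensory_code, strength=1.0))

# Test on attention-task patterns with a shared vs. a different code.
acc_shared = accuracy(model, *simulate(rng, sensory_code, strength=0.5))
acc_other = accuracy(model, *simulate(rng, other_code, strength=0.5))
print(acc_shared, acc_other)
```

      In this toy setting, above-chance accuracy in the shared-code case plays the role of a sensory-like template, while chance-level generalization alongside decodable within-task information would correspond to a non-sensory format.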

      We agree with the reviewer that whether ping identifies an activity-silent mechanism is currently debated (Schneegans & Bays, 2017; Barbosa et al., 2021). It is possible that visual impulse amplified a subtle but active representation of the sensory template during attentional preparation and resulted in decodable performance in visual cortex. Distinguishing between these two accounts likely requires neurophysiological measurements, which are beyond the scope of the current study. We have explicitly addressed this limitation in our Discussion (Page 19, Line 395-399).

      Nevertheless, the latent sensory-like template account remains plausible for three reasons. First, our interpretation aligns with theoretical frameworks proposing that the brain maintains more veridical, detailed target templates than those typically utilized for guiding attention (Wolfe, 2021; Yu et al., 2023). Second, this explanation is consistent with the proposed utility of latent working memory for prospective use, as maintaining a latent sensory-like template during preparation would be useful for subsequent stimulus selection. Third, this latter point was further supported by the analysis prompted by the reviewer’s question of whether “activating the sensory-like template strengthens the informational connectivity between sensory and frontoparietal areas” is reasonable. Our additional analyses (also refer to our response to Reviewer 3, Point 3) suggested that impulse-enhanced V1-PFC connectivity was associated with a trend of faster behavioral responses (r = -0.394, p = 0.086; see Supplementary Information, Page 8, Line 116-121; S5 Fig). Considering these findings in totality, we think it is reasonable to suggest that the visual impulse may strengthen information flow among areas to enhance attentional control.

      Recommendation for the Authors:

      Reviewer #1 (Recommendation for the authors):

      I hate to suggest another fMRI experiment, but in order to make strong claims about two states, I would want to see the methodological and interpretation confounds addressed. Ping condition - would a tone lead to the same result of sharpening the template? If so, then why? Can a ping be manipulated in its effectiveness? That would be an excellent manipulation condition.

      We thank the reviewer for the comments. Please refer to our reply to Reviewer 1, Point 5 for detailed explanation.

      Reviewer #2 (Recommendation for the authors):

      It is strongly advised that these descriptive statistics related to the interaction effect should be included to facilitate a more effective understanding of their data.

      We thank the reviewer for the comments. We have now included the relevant descriptive statistics in the Supplementary Information, Table 1.

      Reviewer #3 (Recommendation for the authors):

      In addition to p-values, I see many instances of 'ps'. Does this indicate the plural form of p?

      We used ‘ps’ to denote the minimal p-value across multiple statistical analyses, such as when applying identical tests to different region groups.

      References

      Aitken, F., Menelaou, G., Warrington, O., Koolschijn, R. S., Corbin, N., Callaghan, M. F., & Kok, P. (2020). Prior expectations evoke stimulus-specific activity in the deep layers of the primary visual cortex. PLoS Biology, 18(12), e3001023.

      Badre, D. (2008). Cognitive control, hierarchy, and the rostro–caudal organization of the frontal lobes. Trends in Cognitive Sciences, 12(5), 193-200.

      Barbosa, J., Lozano-Soldevilla, D., & Compte, A. (2021). Pinging the brain with visual impulses reveals electrically active, not activity-silent, working memories. PLoS Biology, 19(10), e3001436.

      Battistoni, E., Stein, T., & Peelen, M. V. (2017). Preparatory attention in visual cortex. Annals of the New York Academy of Sciences, 1396(1), 92-107.

      Brincat, S. L., Siegel, M., von Nicolai, C., & Miller, E. K. (2018). Gradual progression from sensory to task-related processing in cerebral cortex. Proceedings of the National Academy of Sciences, 115(30), E7202-E7211.

      Duncan, D. H., van Moorselaar, D., & Theeuwes, J. (2023). Pinging the brain to reveal the hidden attentional priority map using encephalography. Nature Communications, 14(1), 4749.

      Grill-Spector, K., & Malach, R. (2004). The human visual cortex. Annual Review of Neuroscience, 27(1), 649-677.

      Gong, M., Chen, Y., & Liu, T. (2022). Preparatory attention to visual features primarily relies on nonsensory representation. Scientific Reports, 12(1), 21726.

      Fan, Y., Han, Q., Guo, S., & Luo, H. (2021). Distinct Neural Representations of Content and Ordinal Structure in Auditory Sequence Memory. Journal of Neuroscience, 41(29), 6290–6303.

      Harrison, S. A., & Tong, F. (2009). Decoding reveals the contents of visual working memory in early visual areas. Nature, 458(7238), 632-635.

      Jigo, M., Gong, M., & Liu, T. (2018). Neural determinants of task performance during feature-based attention in human cortex. eNeuro, 5(1).

      Kok, P., Failing, M. F., & de Lange, F. P. (2014). Prior expectations evoke stimulus templates in the primary visual cortex. Journal of Cognitive Neuroscience, 26(7), 1546-1554.

      Kok, P., Mostert, P., & De Lange, F. P. (2017). Prior expectations induce prestimulus sensory templates. Proceedings of the National Academy of Sciences, 114(39), 10473-10478.

      Liu, T., Stevens, S. T., & Carrasco, M. (2007). Comparing the time course and efficacy of spatial and feature-based attention. Vision Research, 47(1), 108-113.

      Mongillo, G., Barak, O., & Tsodyks, M. (2008). Synaptic theory of working memory. Science, 319(5869), 1543-1546.

      Peelen, M. V., & Kastner, S. (2011). A neural basis for real-world visual search in human occipitotemporal cortex. Proceedings of the National Academy of Sciences, 108(29), 12125-12130.

      Priebe, N. J. (2016). Mechanisms of orientation selectivity in the primary visual cortex. Annual Review of Vision Science, 2(1), 85-107.

      Rademaker, R. L., & Serences, J. T. (2017). Pinging the brain to reveal hidden memories. Nature Neuroscience, 20(6), 767-769.

      Rademaker, R. L., Chunharas, C., & Serences, J. T. (2019). Coexisting representations of sensory and mnemonic information in human visual cortex. Nature Neuroscience, 22(8), 1336-1344.

      Serences, J. T., Ester, E. F., Vogel, E. K., & Awh, E. (2009). Stimulus-specific delay activity in human primary visual cortex. Psychological Science, 20(2), 207-214.

      Schneegans, S., & Bays, P. M. (2017). Restoration of fMRI decodability does not imply latent working memory states. Journal of Cognitive Neuroscience, 29(12), 1977-1994.

      Stokes, M., Thompson, R., Nobre, A. C., & Duncan, J. (2009). Shape-specific preparatory activity mediates attention to targets in human visual cortex. Proceedings of the National Academy of Sciences, 106(46), 19569-19574.

      Wolfe, J. M. (2021). Guided Search 6.0: An updated model of visual search. Psychonomic Bulletin & Review, 28(4), 1060-1092.

      Wolff, M. J., Jochim, J., Akyürek, E. G., & Stokes, M. G. (2017). Dynamic hidden states underlying working-memory-guided behavior. Nature Neuroscience, 20(6), 864-871.

      Wolff, M. J., Kandemir, G., Stokes, M. G., & Akyürek, E. G. (2020). Unimodal and bimodal access to sensory working memories by auditory and visual impulses. Journal of Neuroscience, 40(3), 671-681.

      Yu, X., Zhou, Z., Becker, S. I., Boettcher, S. E., & Geng, J. J. (2023). Good-enough attentional guidance. Trends in Cognitive Sciences, 27(4), 391-403.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study is part of an ongoing effort to clarify the effects of cochlear neural degeneration (CND) on auditory processing in listeners with normal audiograms. This effort is important because ~10% of people who seek help for hearing difficulties have normal audiograms and current hearing healthcare has nothing to offer them.

      The authors identify two shortcomings in previous work that they intend to fix. The first is a lack of cross-species studies that make direct comparisons between animal models in which CND can be confirmed and humans for which CND must be inferred indirectly. The second is the low sensitivity of purely perceptual measures to subtle changes in auditory processing. To fix these shortcomings, the authors measure envelope following responses (EFRs) in gerbils and humans using the same sounds, while also performing histological analysis of the gerbil cochleae, and testing speech perception while measuring pupil size in the humans.

      The study begins with a comprehensive assessment of the hearing status of the human listeners. The only differences found between the young adult (YA) and middle-aged (MA) groups are in thresholds at frequencies > 10 kHz and DPOAE amplitudes at frequencies > 5 kHz. The authors then present the EFR results, first for the humans and then for the gerbils, showing that amplitudes decrease more rapidly with increasing envelope frequency for MA than for YA in both species. The histological analysis of the gerbil cochleae shows that there were, on average, 20% fewer IHC-AN synapses at the 3 kHz place in MA relative to YA, and the number of synapses per IHC was correlated with the EFR amplitude at 1024 Hz.

      The study then returns to the humans to report the results of the speech perception tests and pupillometry. The correct understanding of keywords decreased more rapidly with decreasing SNR in MA than in YA, with a noticeable difference at 0 dB, while pupillary slope (a proxy for listening effort) increased more rapidly with decreasing SNR for MA than for YA, with the largest differences at SNRs between 5 and 15 dB. Finally, the authors report that a linear combination of audiometric threshold, EFR amplitude at 1024 Hz, and a few measures of pupillary slope is predictive of speech perception at 0 dB SNR.

      I only have two questions/concerns about the specific methodologies used:

      (1) Synapse counts were made only at the 3 kHz place on the cochlea. However, the EFR sounds were presented at 85 dB SPL, which means that a rather large section of the cochlea will actually be excited. Do we know how much of the EFR actually reflects AN fibers coming from the 3 kHz place? And are we sure that this is the same for gerbils and humans given the differences in cochlear geometry, head size, etc.?

      Thank you for raising this important point. The frequency regions that contribute to the generation of EFRs, especially at the suprathreshold sound levels presented here, are expected to be broad, with a greater leaning towards higher frequencies and reaching up to one octave above the center frequency. We have investigated this phenomenon in earlier published articles using both low/high pass masking noise and computational models using data from rodent models and humans (Encina-Llamas et al. 2017; Parthasarathy, Lai, and Bartlett 2016). So, the expectation here is that the EFRs reflect a wider frequency region centered at 3 kHz. The differences in cochlear activation regions between humans and gerbils for EFRs have not been systematically studied to our knowledge, but given the general agreement between humans and other rodent models stated above, we expect this to hold for gerbils as well. Additionally, all current evidence points to cochlear synapse loss with age being flat across frequencies, in contrast to cochlear synapse loss with noise, which is dependent on the bandwidth of the noise exposure.

      Histological evidence for this flat loss across frequencies is found in mice and human temporal bones (Parthasarathy and Kujawa 2018; Sergeyenko et al. 2013; Wu et al. 2018). We find this to be true in our gerbils as well. Author response image 1 shows the patterns of synapse loss as a function of cochlear place. We focused on synapse loss at 3 kHz to keep the analysis focused on the center frequency of the stimulus and minimize compounding errors due to averaging synapse counts across multiple frequency regions. We have now added some explanatory language in the discussion.

      Author response image 1.

      Cochlear synapse counts per inner hair cell (IHC) in young and middle-aged gerbils as a function of cochlear frequency.

      (2) Unless I misunderstood, the predictive power of the final model was not tested on held-out data. The standard way to fit and test such a model would be to split the data into two segments, one for training and hyperparameter optimization, and one for testing. But it seems that the only split was for training and hyperparameter optimization.

      The goal of the analysis in this current manuscript was inference, rather than prediction, i.e., to find the important/significant variables that contribute to speech intelligibility in noise, rather than predicting the behavioral deficit of speech performance in a yet-unforeseen sample of adults.

      Additionally, we used a repeated 10-fold cross-validation approach for our model building exercise, as detailed in the Elastic Net Regression section of the methods. This repeated cross-validation calculates the mean square error on a held-out fold and averages it repeatedly to reduce the inherent variability of randomly choosing a validation set. The repeated 10-fold CV approach is both more stable and more efficient than a validation set approach (i.e., splitting the data into two segments, training and test), and provides a better estimate of the test error by utilizing more observations for training (see Chapter 5 of James et al. 2021). These predictive MSEs, along with the R-squared for the final model, give us a good idea of the predictive performance because, for the linear model, the R-squared is the squared correlation between the observed and the predicted response. Future studies with a larger sample size can facilitate having a designated test set and still have enough statistical power to perform predictive analyses.
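
      The repeated k-fold procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the manuscript: a one-predictor least-squares fit stands in for the elastic net, the data are synthetic, and the function names (`fit_line`, `repeated_kfold_mse`) are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def repeated_kfold_mse(xs, ys, k=10, repeats=5, seed=0):
    """Average held-out mean squared error over `repeats` random k-fold splits."""
    rng = random.Random(seed)
    n = len(xs)
    repeat_mses = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all indices
        fold_mses = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in idx if i not in held]
            b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
            errs = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in held_out]
            fold_mses.append(sum(errs) / len(errs))
        repeat_mses.append(sum(fold_mses) / k)
    return sum(repeat_mses) / repeats

# Synthetic data, for illustration only
rng = random.Random(1)
xs = [rng.uniform(0.0, 1.0) for _ in range(60)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
cv_mse = repeated_kfold_mse(xs, ys)
print(f"repeated 10-fold CV estimate of test MSE: {cv_mse:.4f}")
```

      Every observation is held out exactly once per repeat, so each repeat uses all of the data for both fitting and error estimation; averaging across repeats reduces the dependence on any single random partition.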

      While I find the study to be generally well executed, I am left wondering what to make of it all. The purpose of the study with respect to fixing previous methodological shortcomings was clear, but exactly how fixing these shortcomings has allowed us to advance is not. I think we can be more confident than before that EFR amplitude is sensitive to CND, and we now know that measures of listening effort may also be sensitive to CND. But where is this leading us? I think what this line of work is eventually aiming for is to develop a clinical tool that can be used to infer someone's CND profile. That seems like a worthwhile goal but getting there will require going beyond exploratory association studies. I think we're ready to start being explicit about what properties a CND inference tool would need to be practically useful. I have no idea whether the associations reported in this study are encouraging or not because I have no idea what level of inferential power is ultimately required.

      Studies with CND have so far been largely inferential in humans, since currently we cannot confirm CND in vivo. Hence any measures of putative CND in humans can only be interpreted based on evidence from other animal studies. Our translational approach is partly meant to address this deficit, as mentioned in the Introduction section. By using identical stimuli, recording, acquisition and analysis parameters we hope to reduce some of the variability that may be associated with this inference between human and other animal models. Until direct measurements of CND in humans are possible, the intended goal is to provide diagnostic biomarkers that have face validity – i.e., that explain variance related to speech intelligibility deficits in this population.

      We’ve added more to the discussion to state that our work demonstrates the need for next generation diagnostic measures of auditory processing that incorporate cognitive factors associated with listening effort to better capture speech in noise perceptual abilities.

      That brings me to my final comment: there is an inappropriate emphasis on statistical significance. The sample size was chosen arbitrarily. What if the sample had been half the size? Then few, if any, of the observed effects would have been significant. What if the sample had been twice the size? Then many more of the observed effects would have been significant (particularly for the pupillometry). I hope that future studies will follow a more principled approach in which relevant effect sizes are pre-specified (ideally as the strength of association that would be practically useful) and sample sizes are determined accordingly.

      We agree that pre-determining sample sizes is the optimal approach towards designing a study. The sample sizes here were chosen a priori based on previously published data in young adults with normal hearing thresholds (McHaney et al. 2024; Parthasarathy et al. 2020). With the lack of published literature especially for the EFRs at 1024Hz AM in middle aged adults, there are practical challenges in pre-determining the sample size (given a prefixed power and an effect size) with limited precursors to supply good estimates of the parameters (e.g., mean, s.d. for each age group for a two-sample test). We hope that this data set now shared will enable us and other researchers to conduct power analyses for successive studies that use similar metrics on this population.

      Several authors, including Heinsburg and Weeks (2022), argue that post-hoc power could be “misleading and simply not informative” and encourage using other indicators of poorly powered studies, such as the width of the confidence interval. Since the elastic net estimate is a non-linear and non-differentiable function of the response values, even for fixed tuning parameters, it is difficult to obtain an accurate estimate of its standard error (Tibshirani and Taylor 2012). While acknowledging the limitations of post-hoc power analyses, we performed a retrospective power calculation for our linear model with the predictors that we selected (EFR @ 1024Hz, Pupil slope for QuickSIN at selected SNRs and analysis windows, and PTA). The calculated Cohen’s effect size was 0.56, which is considered large (Cohen 2013). With this effect size, a power analysis with our sample size revealed a very high retrospective power of 0.99 with a significance level of 0.05. The minimum number of subjects needed to get 80% power with this effect size was N = 21. Hence for the final model, we are confident that our results hold true with adequate statistical power.

      So, in summary, I think this study is a valuable but limited advance. The results increase my confidence that non-invasive measures can be used to infer underlying CND, but I am unsure how much closer we are to anything that is practically useful.

      Thank you for your comments. We hope that this study establishes a framework for the eventual development of the next generation of objective diagnostic tests in the hearing clinic that provide insights into the underlying neurophysiology of the auditory pathway and take into account top-down contributors such as listening effort.

      Reviewer #2 (Public review):

      Summary:

      This paper addresses the bottom-up and top-down causes of hearing difficulties in middle-aged adults with clinically-normal audiograms using a cross-species approach (humans vs. gerbils, each with two age groups) mixing behavioral tests and electrophysiology. The study is not only a follow-up of Parthasarathy et al (eLife 2020), since there are several important differences.

      Parthasarathy et al. (2020) only considered a group of young normal-hearing individuals with normal audiograms yet with high complaints of hearing in noisy situations. Here, this issue is considered specifically regarding aging, using a between-subject design comparing young NH and older NH individuals recruited from the general population, without additional criterion (i.e. no specifically high problems of hearing in noise). In addition, this is a cross-species approach, with the same physiological EFR measurements with the same stimuli deployed on gerbils.

      This article is of very high quality. It is extremely clear, and the results show clearly a decrease of neural phase-locking to high modulation frequencies in both middle-aged humans and gerbils, compared to younger groups/cohorts. In addition, pupillometry measurements conducted during the QuickSIN task suggest increased listening efforts in middle-aged participants, and a statistical model including both EFRs and pupillometry features suggests that both factors contribute to reduced speech-in-noise intelligibility evidenced in middle-aged individuals, beyond their slight differences in audiometric thresholds (although they were clinically normal in both groups).

      These provide strong support to the view that normal aging in humans leads to auditory nerve synaptic loss (cochlear neural degeneration, CND, or, put differently, cochlear synaptopathy) as well as increased listening effort, before any clearly visible audiometric deficits as defined in current clinical standards. This result is very important for the community since we are still missing direct evidence that cochlear synaptopathy might likely underlie a significant part of hearing difficulties in complex environments for listeners with normal thresholds, such as middle-aged and senior listeners. This paper shows that these difficulties can be reasonably well accounted for by this sensory disorder (CND), but also that listening effort, i.e. a top-down factor, further contributes to this problem. The methods are sound and well described and I would like to emphasize that they are presented concisely yet in a very precise manner so that they can be understood very easily - even for a reader who is not familiar with the employed techniques. I believe this study will be of interest to a broad readership.

      I have some comments and questions which I think would make the paper even stronger once addressed.

      Main comments:

      (1) Presentation of EFR analyses / Interpretation of EFR differences found in both gerbils and humans:

      a) Could the authors comment further on why they think they found a significant difference only at the highest mod. frequency of 1024 Hz in their study? Indeed, previous studies employing SAM or RAM tones very similar to the ones employed here were able to show age effects already at lower modulation freqs. of ~100 Hz; e.g. there are clear age effects reported in human studies of Vasilkov et al. (2021) or Mepani et al. (2021), and also in animals (see Garrett et al., bioRxiv: https://www.biorxiv.org/content/biorxiv/early/2024/04/30/2020.06.09.142950.full.pdf).

      Previously published studies in animal models by us and others suggest that EFRs elicited to AM rates > 700Hz are most sensitive to confirmed CND (Parthasarathy and Kujawa 2018; Shaheen, Valero, and Liberman 2015). This is likely because these AM rates fall well outside of phase-locking limits in the auditory midbrain and cortex (Joris, Schreiner, and Rees 2004), and hence represent a ‘cleaner’ signal from the auditory periphery that may not be modulated by complex excitatory/inhibitory feedback circuits present more centrally (Caspary et al. 2008). We have also demonstrated that we are able to acquire high quality EFRs at 1024Hz AM rates both in a previously published study in young normal hearing adults (McHaney et al. 2024), and in middle aged adults in the present study as seen in Fig. 1 H-J. We posit that the lack of age-related differences at the lower AM rates may be indicative of compensatory plasticity (central ‘gain’) that occurs with age in more central regions of the auditory pathway (Auerbach, Radziwon, and Salvi 2019; Parthasarathy and Kujawa 2018). We now expand on this in the discussion. A secondary reason for the lack of change in slower modulation rates may be the difference in stimulus between sinusoidally amplitude modulated tones used here, and the rectangular amplitude modulated tones in other studies, as discussed in response to the comment below.

      Furthermore, some previous EEG experiments in humans that used SAM tones with modulation freqs. of ~100 Hz showed that EFRs do not exhibit a single peak, i.e. there are peaks not only at fm but also at the first harmonics (e.g. 2fm or 3fm); see e.g. Garrett et al., bioRxiv: https://www.biorxiv.org/content/biorxiv/early/2024/04/30/2020.06.09.142950.full.pdf. Did the authors try to extract EFR strength by looking at the summed amplitude of multiple peaks (Vasilkov et al., Hear. Res., 2021), in particular for the lower modulation frequencies? (Indeed, there will be no harmonics for the higher mod. freqs.)

      We examined peak amplitudes for the AM rate and harmonics for the 110 Hz AM condition as shown in Author response image 2. The quantified amplitudes of the first four harmonics did not differ with age (ps > .08).

      Additionally, the harmonic structures obtained were also not as robust as would be expected with rectangular amplitude modulated stimuli. The choice of sinusoidal modulation may explain why. We have previously published studies systematically modulating the rise time of the envelope per cycle in amplitude modulated tones, where the individual period of the envelope is described by Env(t) = t^x (1 - t), where t goes from 0 to 1 in one period, x = 0.05 represents a highly damped envelope akin to the rising envelope of a rectangular modulation, and x = 1 represents a symmetric, near-sinusoidal envelope (Parthasarathy and Bartlett 2011). The harmonic structure was much more developed in the damped envelopes compared to the symmetric envelopes, and response amplitudes were also higher for the damped envelopes overall, a result also observed in Mepani et al. (2021). Hence, we believe the rapid rise time may contribute to the harmonic structures evidenced in studies using RAM stimuli, and the absence of this rapid onset may result in reduced harmonic structures in our EFRs. Some language regarding this issue is now added to the discussion.
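
      The effect of the exponent x on envelope shape can be checked numerically with a short sketch. It assumes only the per-period formula quoted above; the function names and the grid resolution are illustrative choices. Setting the derivative of t^x (1 - t) to zero gives a peak at t = x / (1 + x), so a small x places the peak very early in the period (a rapid rise), whereas x = 1 places it at mid-period.

```python
def envelope(t, x):
    """One period of the envelope Env(t) = t**x * (1 - t), for t in [0, 1]."""
    return (t ** x) * (1.0 - t)

def peak_time(x, n=100_000):
    """Grid-search the time within the period at which the envelope peaks."""
    ts = [i / n for i in range(n + 1)]
    return max(ts, key=lambda t: envelope(t, x))

damped_peak = peak_time(0.05)    # rapid rise: peak near the start of the period
symmetric_peak = peak_time(1.0)  # symmetric envelope: peak at mid-period

print(f"x = 0.05: peak at t = {damped_peak:.4f} (analytic {0.05 / 1.05:.4f})")
print(f"x = 1.00: peak at t = {symmetric_peak:.4f} (analytic 0.5000)")
```

      With x = 0.05 the envelope reaches its maximum within roughly the first 5% of the period, which is the rapid onset the response associates with richer harmonic structure.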

      Author response image 2.

      Harmonics analysis for the first four harmonics of envelope following responses elicited to the 110Hz AM stimulus.
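
      As an illustration of how amplitudes at the modulation rate and its harmonics can be quantified, the sketch below evaluates a single-bin discrete Fourier transform at 110 Hz and its next three multiples. The sampling rate, duration, and harmonic amplitudes of the synthetic signal are made-up assumptions, and this is not the authors' EFR analysis pipeline.

```python
import cmath
import math

def amplitude_at(signal, fs, freq):
    """Amplitude of the sinusoidal component at `freq` (Hz) via a single-bin DFT."""
    n = len(signal)
    acc = sum(s * cmath.exp(-2j * math.pi * freq * i / fs)
              for i, s in enumerate(signal))
    return 2.0 * abs(acc) / n

fs = 8000        # hypothetical sampling rate (Hz)
duration = 1.0   # seconds; holds an integer number of cycles of every harmonic
f_mod = 110      # modulation rate from the text (Hz)
true_amps = [1.0, 0.5, 0.25, 0.125]  # made-up amplitudes at f_mod, 2*f_mod, ...

n = int(fs * duration)
signal = [sum(a * math.sin(2 * math.pi * f_mod * (h + 1) * i / fs)
              for h, a in enumerate(true_amps))
          for i in range(n)]

# Amplitudes at f_mod and its next three multiples, plus their sum
harmonic_amps = [amplitude_at(signal, fs, f_mod * h) for h in range(1, 5)]
summed_amp = sum(harmonic_amps)
print([round(a, 3) for a in harmonic_amps], round(summed_amp, 3))
```

      Because the analysis window contains a whole number of cycles at each frequency, the components do not leak into one another; real recordings would additionally require a noise-floor estimate before comparing peaks.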

      b) How do the present EFR results relate to FFR results, where effects of age are already at low carrier freqs? (e.g. Märcher-Rørsted et al., Hear. Res., 2022 for pure tones with freq < 500 Hz). Do the authors think it could be explained by the fact that this is not the same cochlear region, and that synapses die earlier in higher compared to lower CFs? This should be discussed. Beyond the main group effect of age, there were no negative correlations of EFRs with age in the data?

      We believe the current results are in close agreement with these studies showing deficits in pure tone phase locking with age. These tones are typically at ~300-500Hz or above, and phase locking to these tones likely involves the same or similar peripheral neural generators in the auditory nerve and brainstem. Emerging evidence also seems to suggest that TFS coding measured using pure tone phase locking is closely related to sound with amplitude modulation in the same range (Ponsot et al. 2024). Unpublished observations from our lab support this view as well. In this data set, we begin to see EFR responses at 512 Hz diverge with age, but this difference does not reach statistical significance. This may be due to specific AM frequencies selected or a lack of statistical power. Using more continuous AM frequency sweeps such as with our recently published dynamic amplitude modulated tones (Parida et al. 2024) may help resolve these AM frequency specific challenges and help us investigate changes over a broader range of AM frequencies. Ongoing studies are currently exploring this hypothesis. Some explanatory language is now presented in the discussion.

      (2) Size of the effects / comparing age effects between two species:

      Although the size of the age effect on EFRs cannot be directly compared between humans and gerbils - the comparison remains qualitative - could the authors at least provide references regarding the rate of synaptic loss with aging in both humans and gerbils, so that we understand that the yNH/MA difference can be compared between the two age groups used for gerbils; it would have been critical in case of a non-significant age effect in one species.

      Current evidence seems to suggest that humans have more synaptic loss than gerbils, though exact comparison of lifespan between the two species is challenging due to differences in slopes of growth trajectories between species. Post-mortem temporal bone studies demonstrate a ~40-50% loss of synapses in humans by the fifth decade of life. On the other hand, our gerbils in the current study showed approximately 15-20% loss. Based on our findings and previous studies, it is reasonable to assume that our gerbil data underestimate the temporal processing deficits that would be seen in humans due to CND.

      We have added this information and citations to the discussion section.

      Equalization/control of stimuli differences across the two species: For measuring EFRs, SAM stimuli were presented at 85 dB SPL for humans vs. 30 dB above the detection threshold (inferred from ABRs) for gerbils - I do not think the results strongly depend on this choice, but it would be good to comment on why you did not choose also to present stimuli 30 dB above thresholds in humans.

      We chose to record EFRs to stimuli presented at 85 dB SPL in humans, as opposed to 30 dB SL, because 30 dB SL in humans would have corresponded to an intensity that makes EEG recordings unfeasible. The average PTA across younger and middle-aged adults was 7.51 dB HL (~19.51 dB SPL), which would have resulted in an average stimulus intensity of ~50 dB SPL at 30 dB SL. This intensity level would have been far too low to reliably record EFRs without presenting many thousands of trials. In a pilot study, we recorded EFRs at 75 dB SL, which equated to an average of 83.9 dB SPL. Thus, we chose the suprathreshold level of 85 dB SPL for the current study to obtain reliable responses with just 1000 trials.
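
      The level arithmetic behind the ~50 dB SPL figure can be written out explicitly. This sketch uses only the two threshold values quoted above; the HL-to-SPL offset is inferred from those two numbers and is not a calibration constant, and the function name is hypothetical.

```python
mean_pta_db_hl = 7.51    # average PTA reported in the response (dB HL)
mean_pta_db_spl = 19.51  # approximate SPL equivalent reported in the response

# Offset implied by the two reported values (an inference, not a calibration constant)
hl_to_spl_offset = mean_pta_db_spl - mean_pta_db_hl

def sl_to_spl(level_db_sl, threshold_db_spl):
    """Sensation level is defined relative to threshold: SPL = threshold + SL."""
    return threshold_db_spl + level_db_sl

level_30_sl = sl_to_spl(30.0, mean_pta_db_spl)
print(f"implied HL-to-SPL offset: {hl_to_spl_offset:.2f} dB")
print(f"30 dB SL corresponds to about {level_30_sl:.1f} dB SPL")
```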

      Simulations of EFRs using functional models could have been used to understand (at least in humans) how the differences in EFRs obtained between the two groups are quantitatively compatible with the differences in % of remaining synaptic connections known from histopathological studies for their age range (see the approach in Märcher-Rørsted et al., Hear. Res., 2022)

      We agree with the reviewer that phenomenological models would be a useful approach to examining differences between age groups and species. We have previously used the Zilany/Carney model to examine differences in EFRs with age in rats (Parthasarathy, Lai, and Bartlett 2016). It is unclear if such models will directly translate to responses from gerbils. However, this is a subject of ongoing study in our lab.

      (3) Synergetic effects of CND and listening effort:

      Could you test whether there is an interaction between CND and listening effort? (e.g. one could hypothesize that MA subjects with the largest CND have also higher listening effort).

      We have previously reported that EFRs and listening effort are not linearly related (McHaney et al. 2024). We found the same to be largely true in the current study as well. We ran correlations between EFR amplitudes at 1024 Hz and listening effort at each SNR level in the listening and integration windows. We did not observe any significant relationships between EFRs at 1024 Hz and listening effort in the listening window (all ps > .05). In the integration window, we did see a correlation between listening effort at SNR 5 and EFRs at 1024 Hz, which remained significant after correcting for multiple comparisons (r = -.42, p-adj = .021). However, we chose to not report these multiple one-to-one correlations in the current study and instead opted for the elastic net regression analysis to better understand the multifactorial contributions to speech-in-noise abilities. These results also do not preclude non-linear relationships between listening effort and EFRs, which may be present based on emerging results (Bramhall, Buran, and McMillan 2025) and will be explored in future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      A few more minor comments/questions:

      (1) How old were the YA gerbils on average? 18 weeks, or 19 weeks, or 22 weeks?

      Young gerbils were on average 22 weeks. We have updated the manuscript accordingly.

      (2) "Gerbils share the same hearing frequency range as humans" is misleading; the gerbil hearing range extends to much higher frequencies.

      We have revised the statement to say: “The hearing range of gerbils largely overlaps with that of humans, making them an ideal animal model for direct comparison in cross-species studies.”

      (3) The writing contains more than a few typos and grammatical errors.

      We have completed a thorough revision to correct for grammatical and typographical errors.

      (4) Suggesting that correlation and linear modelling are "independent" methods is misleading since they are both measuring linear associations. A better word would be "different".

      Thank you for this suggestion. We have rephrased the sentence as “two separate approaches”.

      (5) The phrase "Our results reveal perceptual deficits ... driven by CND" in the abstract is too strong. Correlation is not causation.

      We have revised this phrase to say they “are associated with CND.”

      Reviewer #2 (Recommendations for the authors):

      More general comments:

      (1) Recruitment criterion related to hearing-in-noise difficulties:

      If I understood correctly, the middle-aged participants recruited for this study do not have specific hearing-in-noise difficulties; some could, as with 10% of the general population, but they were not recruited using this criterion. If this is correct, this should be stated explicitly, as it constitutes an important methodological choice and a difference from your eLife 2020 study. If you were to use this specific recruitment criterion for both groups here, what differences would you expect?

      Our participants were not required to have specific complaints of speech perception in noise challenges to be eligible for this study. We included middle-aged adults here, as opposed to only younger adults as in Parthasarathy et al. (2020), with the assumption that middle-aged adults were likely to have some cochlear synapse loss and individual variability in the degree of synapse loss based on post-mortem data from human temporal bones. We have recently published studies identifying the specific clinical populations of patients with self-perceived hearing loss, including those adults who have received assessments for auditory processing disorders (Cancel et al. 2023). Ongoing studies in the lab are aimed at recruiting from this population.

      It is striking that the QuickSIN test does not exhibit the same variability at low SNRs here as the digits-in-noise test used in your eLife 2020 study. Why would QuickSIN be more appropriate than the digits-in-noise test? Would you expect the same results with the digits-in-noise test?

      Our 2020 eLife study investigated the effects of TFS coding on multi-talker speech intelligibility. TFS coding is specifically hypothesized to be related to multi-talker speech, compared to broadband maskers. The digits test was appropriate in that context, as the ‘masker’ there was two competing speakers also speaking digits. In this study, we wanted to test the effects of CND on speech-in-noise perception using clinically relevant speech-in-noise tests. The digits test is devoid of linguistic context and is essentially closed set (participants know that only a digit will be presented). In contrast, QuickSIN consists of open-set sentences of moderate context, making it closer to real-world listening situations. Additionally, we recently published pupillometry recorded in response to QuickSIN in young adults (McHaney et al. 2024) and identified QuickSIN as a promising screening tool for self-perceived hearing difficulties (Cancel et al. 2023). These factors informed our choice of QuickSIN in the current study.

      (2) Why is the increase in listening effort interpreted as an increase in gain? Please clarify (p10, 1st paragraph; [these data suggest a decrease in peripheral neural coding, with a concomitant increase in central auditory activity or 'gain'])

      In the above referenced paragraph, we were discussing the increase in 40 Hz AM rate EFRs in middle-aged adults as an increase in central gain. We have revised parts of this paragraph to better communicate that we were discussing the EFRs and not listening effort: “We observed decreases in EFRs at modulation rates that were selective to the auditory periphery (i.e., 1024 Hz) in middle-aged adults, while EFRs primarily generated from the central auditory structures were not different from those in younger adults (Fig. 1K). These data suggest that middle-aged adults exhibited an increase in central auditory activity, or ‘gain’, in the presence of decreased peripheral neural coding. The perceptual consequences of this gain are unclear, but our findings align with emerging evidence suggesting that gain is associated with selective deficits in speech-in-noise abilities”

      (3) Further discussion of the relationship/differences between the EFR marker of CND (this study) and the MEMR marker of CND (Bharadwaj et al., 2022) is needed.

      We now make mention of other candidate markers of CND (ABR wave I and MEMRs) in the discussion and expand on why we chose the EFR.

      (4) Further analyses and discussion related to extended high-frequency thresholds are needed:

      Did you test for a potential correlation of your EFR marker of CND with extended high-frequency thresholds? (These could parallel the amount of CND in these individuals.) Why not also consider measuring extended high-frequency thresholds in gerbils?

      We acknowledge that there is increasing evidence to suggest extended high frequency thresholds may be an early marker for hidden hearing loss/CND. We have examined an additional correlation for extended high frequency pure tone averages (8k-16k Hz) with EFR amplitudes at 1024 Hz AM rate, which revealed a significant relationship (r = -.43, p < .001). However, we opted to exclude this analysis from our current study as we wanted to reduce reporting on several one-to-one correlations. Therefore, we chose the elastic net regression model to examine individual contributions to speech in noise abilities. EHF thresholds were included in the elastic net regression models, but were not found to be significant upon accounting for individual differences in PTA.

      Additionally, our electrophysiological experimental paradigm was not designed with the consideration of extended high frequencies—we used ER3C transducers which are not optimal for frequencies above ~6kHz. Future studies could use transducers such as the ER2 or free field speakers to examine the influence of extended high frequencies on the EFRs and measure high frequency thresholds in gerbils.

      Minor Comments:

      (1) Abstract: repetition of 'later in life' in the first two sentences - please reformulate.

      We have revised the first two sentences to state: “Middle-age is a critical period of rapid changes in brain function that presents an opportunity for early diagnostics and intervention for neurodegenerative conditions later in life. Hearing loss is one such early indicator linked to many comorbidities in older age.”

      (2) Sentence on page 3 [However, these behavioral readouts may minimize subliminal changes in perception that are reflected in listening effort but not in accuracies (26-28)] is not clear.

      We’ve added a sentence just after that states: “Specifically, two individuals may show similar accuracies on a listening task, but one individual may need to exert substantially more listening effort to achieve the same accuracy as the other.”

      (3) The second paragraph of page 11 should go to a methods (model) section, not to the discussion.

      We have now moved a portion of this paragraph to the Elastic Net Regression subsection of the Statistical Analysis in the Methods.

      (4) Please check the references: references 13 and 25 are identical.

      Fixed

      References

      Auerbach, Benjamin D., Kelly Radziwon, and Richard Salvi. 2019. “Testing the Central Gain Model: Loudness Growth Correlates with Central Auditory Gain Enhancement in a Rodent Model of Hyperacusis.” Neuroscience 407:93–107. https://doi.org/10.1016/j.neuroscience.2018.09.036.

      Bramhall, Naomi F., Brad N. Buran, and Garnett P. McMillan. 2025. “Associations Between Physiological Indicators of Cochlear Deafferentation and Listening Effort in Military Veterans with Normal Audiograms.” Hearing Research, April, 109263. https://doi.org/10.1016/j.heares.2025.109263.

      Cancel, Victoria E., Jacie R. McHaney, Virginia Milne, Catherine Palmer, and Aravindakshan Parthasarathy. 2023. “A Data-Driven Approach to Identify a Rapid Screener for Auditory Processing Disorder Testing Referrals in Adults.” Scientific Reports 13 (1): 13636. https://doi.org/10.1038/s41598-023-40645-0.

      Caspary, D. M., L. Ling, J. G. Turner, and L. F. Hughes. 2008. “Inhibitory Neurotransmission, Plasticity and Aging in the Mammalian Central Auditory System.” Journal of Experimental Biology 211 (11): 1781–91. https://doi.org/10.1242/jeb.013581.

      Cohen, Jacob. 2013. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. New York: Routledge. https://doi.org/10.4324/9780203771587.

      Encina-Llamas, Gerard, Aravindakshan Parthasarathy, James Michael Harte, Torsten Dau, Sharon G. Kujawa, Barbara Shinn-Cunningham, and Bastian Epp. 2017. “Hidden Hearing Loss with Envelope Following Responses (EFRs): The off-Frequency Problem: 40th MidWinter Meeting of the Association for Research in Otolaryngology.” In .

      James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. New York, NY: Springer US. https://doi.org/10.1007/978-1-0716-1418-1.

      Joris, P. X., C. E. Schreiner, and A. Rees. 2004. “Neural Processing of Amplitude-Modulated Sounds.” Physiological Reviews 84 (2): 541–77. https://doi.org/10.1152/physrev.00029.2003.

      McHaney, Jacie R., Kenneth E. Hancock, Daniel B. Polley, and Aravindakshan Parthasarathy. 2024. “Sensory Representations and Pupil-Indexed Listening Effort Provide Complementary Contributions to Multi-Talker Speech Intelligibility.” Scientific Reports 14 (1): 30882. https://doi.org/10.1038/s41598-024-81673-8.

      Parida, Satyabrata, Kimberly Yurasits, Victoria E. Cancel, Maggie E. Zink, Claire Mitchell, Meredith C. Ziliak, Audrey V. Harrison, Edward L. Bartlett, and Aravindakshan Parthasarathy. 2024. “Rapid and Objective Assessment of Auditory Temporal Processing Using Dynamic Amplitude-Modulated Stimuli.” Communications Biology 7 (1): 1–10. https://doi.org/10.1038/s42003-024-07187-1.

      Parthasarathy, A., and E. L. Bartlett. 2011. “Age-Related Auditory Deficits in Temporal Processing in F-344 Rats.” Neuroscience 192:619–30. https://doi.org/10.1016/j.neuroscience.2011.06.042.

      Parthasarathy, A., J. Lai, and E. L. Bartlett. 2016. “Age-Related Changes in Processing Simultaneous Amplitude Modulated Sounds Assessed Using Envelope Following Responses.” Jaro-Journal of the Association for Research in Otolaryngology 17 (2): 119–32. https://doi.org/10.1007/s10162-016-0554-z.

      Parthasarathy, A., Kenneth E Hancock, Kara Bennett, Victor DeGruttola, and Daniel B Polley. 2020. “Bottom-up and Top-down Neural Signatures of Disordered Multi-Talker Speech Perception in Adults with Normal Hearing.” Edited by Barbara G Shinn-Cunningham, Huan Luo, Fan-Gang Zeng, and Christian Lorenzi. eLife 9 (January):e51419. https://doi.org/10.7554/eLife.51419.

      Parthasarathy, Aravindakshan, and Sharon G. Kujawa. 2018. “Synaptopathy in the Aging Cochlea: Characterizing Early-Neural Deficits in Auditory Temporal Envelope Processing.” The Journal of Neuroscience. https://doi.org/10.1523/jneurosci.3240-17.2018.

      Ponsot, Emmanuel, Pauline Devolder, Ingeborg Dhooge, and Sarah Verhulst. 2024. “Age-Related Decline in Neural Phase-Locking to Envelope and Temporal Fine Structure Revealed by Frequency Following Responses: A Potential Signature of Cochlear Synaptopathy Impairing Speech Intelligibility.” bioRxiv. https://doi.org/10.1101/2024.12.11.628010.

      Sergeyenko, Yevgeniya, Kumud Lall, M. Charles Liberman, and Sharon G. Kujawa. 2013. “Age-Related Cochlear Synaptopathy: An Early-Onset Contributor to Auditory Functional Decline.” Journal of Neuroscience 33 (34): 13686–94. https://doi.org/10.1523/jneurosci.1783-13.2013.

      Shaheen, L. A., M. D. Valero, and M. C. Liberman. 2015. “Towards a Diagnosis of Cochlear Neuropathy with Envelope Following Responses.” J Assoc Res Otolaryngol. https://doi.org/10.1007/s10162-015-0539-3.

      Tibshirani, Ryan J., and Jonathan Taylor. 2012. “Degrees of Freedom in Lasso Problems.” The Annals of Statistics 40 (2): 1198–1232. https://doi.org/10.1214/12-AOS1003.

      Wu, P. Z., L. D. Liberman, K. Bennett, V. de Gruttola, J. T. O’Malley, and M. C. Liberman. 2018. “Primary Neural Degeneration in the Human Cochlea: Evidence for Hidden Hearing Loss in the Aging Ear.” Neuroscience. https://doi.org/10.1016/j.neuroscience.2018.07.053.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Wang et al. investigated how sexual failure influences sweet taste perception in male Drosophila. The study revealed that courtship failure leads to decreased sweet sensitivity and feeding behavior via dopaminergic signaling. Specifically, the authors identified a group of dopaminergic neurons projecting to the suboesophageal zone that interacts with sweet-sensing Gr5a+ neurons. These dopaminergic neurons positively regulate the sweet sensitivity of Gr5a+ neurons via DopR1 and Dop2R receptors. Sexual failure diminishes the activity of these dopaminergic neurons, leading to reduced sweet-taste sensitivity and sugar-feeding behavior in male flies. These findings highlight the role of dopaminergic neurons in integrating reproductive experiences to modulate appetitive sensory responses.

      Previous studies have explored the dopaminergic-to-Gr5a+ neuronal pathways in regulating sugar feeding under hunger conditions. Starvation has been shown to increase dopamine release from a subset of TH-GAL4 labeled neurons, known as TH-VUM, in the suboesophageal zone. This enhanced dopamine release activates dopamine receptors in Gr5a+ neurons, heightening their sensitivity to sugar and promoting sucrose acceptance in flies. Since the function of the dopaminergic-to-Gr5a+ circuit motif has been well established, the primary contribution of Wang et al. is to show that mating failure in male flies can also engage this circuit to modulate sugar-feeding behavior. This contribution is valuable because it highlights the role of dopaminergic neurons in integrating diverse internal state signals to inform behavioral decisions.

      An intriguing discrepancy between Wang et al. and earlier studies lies in the involvement of dopamine receptors in Gr5a+ neurons. Prior research has shown that Dop2R and DopEcR, but not DopR1, mediate starvation-induced enhancement of sugar sensitivity in Gr5a+ neurons. In contrast, Wang et al. found that DopR1 and Dop2R, but not DopEcR, are involved in the sexual failure-induced decrease in sugar sensitivity in these neurons. I wish the authors had further explored or discussed this discrepancy, as it is unclear how dopamine release selectively engages different receptors to modulate neuronal sensitivity in a context-dependent manner.

      Our immunostaining experiments showed that three dopamine receptors, Dop1R1, Dop2R, and DopEcR, were expressed in Gr5a+ neurons in the proboscis, consistent with previous RT-PCR findings (Inagaki et al 2012). As the reviewer pointed out, we found that Dop1R1 and Dop2R were required for courtship failure-induced suppression of sugar sensitivity, whereas Marella et al 2012 and Inagaki et al 2012 found that Dop2R and DopEcR were required for starvation-induced enhancement of sugar sensitivity. These results may suggest that different internal states (courtship failure vs. starvation) modulate the peripheral sensory system via different signaling pathways (e.g., different subsets of dopaminergic neurons, different dopamine release mechanisms, and different dopamine receptors). We have discussed these possibilities in the revised manuscript.

      The data presented by Wang et al. are solid and effectively support their conclusions. However, certain aspects of their experimental design, data analysis, and interpretation warrant further review, as outlined below.

      (1) The authors did not explicitly indicate the feeding status of the flies, but it appears they were not starved. However, the naive and satisfied flies in this study displayed high feeding and PER baselines, similar to those observed in starved flies in other studies. This raises the concern that sexually failed flies may have consumed additional food during the 4.5-hour conditioning period, potentially lowering their baseline hunger levels and subsequently reducing PER responses. This alternative explanation is worth considering, as an earlier study demonstrated that sexually deprived males consumed more alcohol, and both alcohol and food are known rewards for flies. To address this concern, the authors could remove food during the conditioning phase to rule out its influence on the results.

      This is an important consideration. To rule out a potential confound from food intake during courtship conditioning, we have now also conducted courtship conditioning in vials without food. In the absence of any feeding opportunity over the 4.5-hour courtship conditioning period, sexually rejected males still exhibited a robust decrease in sweet taste sensitivity compared with Naïve and Satisfied controls (Figure 1-supplement 1C). These data confirm that the suppression of PER is driven by courtship failure per se, rather than by differences in feeding during the conditioning phase.

      (2) Figure 1B reveals that approximately half of the males in the Failed group did not consume sucrose yet Figure 1-S1A suggests that the total volume consumed remained unchanged. Were the flies that did not consume sucrose omitted from the dataset presented in Figure 1-S1A? If so, does this imply that only half of the male flies experience sexual failure, or that sexual failure affects only half of males while the others remain unaffected? The authors should clarify this point.

      Our initial description of the experimental setup may have been confusing. We briefly clarify our experimental design here and have added further details to the revised manuscript, which should resolve the reviewer’s concerns:

      After the behavioral conditioning, male flies were divided for two assays. On the one hand, we quantified PER responses of individual flies. As shown in Figure 1C, Failed males exhibited decreased sweet sensitivity (as demonstrated by the right shift of the dose-response curve). On the other hand, we sought to quantify food consumption of individual flies by using the MAFE assay (Qi et al 2005).

      In the initial submission, we used 400 mM sucrose for the MAFE assay. When presented with 400 mM sucrose, approximately 100% of the flies in the Naïve and Satisfied groups, and 50% of the flies in the Failed group, extended their proboscis and started feeding, as a natural consequence of decreased sugar sensitivity (Figure 1B). We quantified the actual volume of food consumed by the flies that showed PER responses towards 400 mM sucrose and observed no change (Figure 1-supplement 1A, left). To avoid potential confusion, we have now repeated the MAFE assay with 800 mM sucrose, which elicited feeding in ~100% of flies among all three groups, as shown in Figure 1C. Again, we observed no change in food intake (Figure 1-supplement 1A, right).

      These experiments in combination suggest that sexual failure suppresses sweet sensitivity of the Failed males. Meanwhile, as long as they still responded to a certain food stimulus and initiated feeding, the volume of food consumption remained unchanged. These results led us to focus on the modulatory effect of sexual failure on the sensory system, the main topic of this present study.

      (3) The evidence linking TH-GAL4 labeled dopaminergic neurons to reduced sugar sensitivity in Gr5a+ neurons in sexually failed males could be further strengthened. Ideally, the authors would have activated TH-GAL4 neurons and observed whether this restored GCaMP responses in Gr5a+ neurons in sexually failed males. Instead, the authors performed a less direct experiment, shown in Figures 3-S1C and D. The manuscript does not describe the condition of the flies used in this experiment, but it appears that they were not sexually conditioned. I have two concerns with this experiment. First, no statistical analysis was provided to support the enhancement of sucrose responses following activation of TH-GAL4 neurons. Second, without performing this experiment in sexually failed males, the authors lack direct evidence to confirm that the dampened response of Gr5a+ neurons to sucrose results from decreased activity in TH-GAL4 neurons.

      We have now quantified the effect of TH+ neuron activation on Gr5a+ neuron calcium responses. In Naïve males, dTRPA1-mediated activation of TH+ cells significantly enhanced sucrose-induced calcium responses (Figure 3-supplement 1C). In Failed males, although the baseline activity of Gr5a+ neurons was lower (Figure 3C), the same activation also produced a significant (even slightly larger) effect on the calcium responses of Gr5a+ neurons (Figure 3-supplement 1D).

      Taken together, we would argue that these experiments using both Naïve and Failed males were adequate to show a functional link between TH+ neurons and Gr5a+ neurons. Combined with the results that these neurons form active synapses (Figure 3-supplement 1B) and that the activity of TH+ neurons was dampened in sexually failed males (Figure 3G-I), our data support the notion that sexual failure suppresses sweet sensitivity via TH-Gr5a circuitry.

      (4) The statistical methods used in this study are poorly described, making it unclear which method was used for each experiment. I suggest that the authors include a clear description of the statistical methods used for each experiment in the figure legends. Furthermore, as I have pointed out, there is a lack of statistical comparisons in Figures 3-S1C and D, a similar problem exists for Figures 6E and F.

      We have added detailed information on the statistical analysis to each figure legend.

      (5) The experiments in Figure 5 lack specificity. The target neurons in this study are Gr5a+ neurons, which are directly involved in sugar sensing. However, the authors used the less specific Dop1R1- and Dop2R-GAL4 lines for their manipulations. Using Gr5a-GAL4 to specifically target Gr5a+ neurons would provide greater precision and ensure that the observed effects are directly attributable to the modulation of Gr5a+ neurons, rather than being influenced by potential off-target effects from other neuronal populations expressing these dopamine receptors.

      We agree with the reviewer that manipulating the Dop1R1 and Dop2R genes (Figure 4) and the neurons expressing them (Figure 5) might have broader impacts. For specificity, we have also tested the role of Dop1R1 and Dop2R in Gr5a+ neurons by RNAi experiments (Figure 6). As shown by both behavioral and calcium imaging experiments, knocking down either Dop1R1 or Dop2R in Gr5a+ neurons eliminated the effect of sexual failure to dampen sweet sensitivity, further confirming the role of these two receptors in Gr5a+ neurons.

      (6) I found the results presented in Fig. 6F puzzling. The knockdown of Dop2R in Gr5a+ neurons would be expected to decrease sucrose responses in naive and satisfied flies, given the role of Dop2R in enhancing sweet sensitivity. However, the figure shows an apparent increase in responses across all three groups, which contradicts this expectation. The authors may want to provide an explanation for this unexpected result.

      We agree that there might be some potential discrepancies. We have now addressed the issue by repeating these calcium imaging experiments with a head-to-head comparison with the controls (Gr5a-GCaMP, +/- Dop1R1 and Dop2R RNAi).

      In these new experiments, Dop1R1 or Dop2R knockdown completely prevented the suppression of Gr5a+ neuron responsiveness by courtship failure (Figure 6E), whereas the activities of Gr5a+ neurons in Naïve/Satisfied groups were not altered. These results demonstrate that Dop1R1 and Dop2R are specifically required to mediate the decrease in sweet sensitivity following courtship failure.

      (7) In several instances in the manuscript, the authors described the effects of silencing dopamine signaling pathways or knocking down dopamine receptors in Gr5a neurons with phrases such as 'no longer exhibited reduced sweet sensitivity' (e.g., L269 and L288), 'prevent the reduction of sweet sensitivity' (e.g., L292), or 'this suppression was reversed' (e.g. L299). I found these descriptions misleading, as they suggest that sweet sensitivity in naive and satisfied groups remains normal while the reduction in failed flies is specifically prevented or reversed. However, this is not the case. The data indicate that these manipulations result in an overall decrease in sweet sensitivity across all groups, such that a further reduction in failed flies is not observed. I recommend revising these descriptions to accurately reflect the observed phenotypes and avoid any confusion regarding the effects of these manipulations.

      We have changed the wording in the revised manuscript. In brief, we think that these manipulations have two consequences: suppressing the overall sweet sensitivity, and eliminating the effect of sexual failure on sweet sensitivity.

      Reviewer #2 (Public review):

      Summary:

      The authors exposed naïve male flies to different groups of females, either mated or virgin. Male flies can successfully copulate with virgin females; however, they are rejected by mated females. This rejection reduces sugar preference and sensitivity in males. Investigating the underlying neural circuits, the authors show that dopamine signaling onto GR5a sensory neurons is required for reduced sugar preference. GR5a sensory neurons respond less to sugar exposure when they lack dopamine receptors.

      Strengths:

      The findings add another strong phenotype to the existing dataset about brain-wide neuromodulatory effects of mating. The authors use several state-of-the-art methods, such as activity-dependent GRASP to decipher the underlying neural circuitry. They further perform rigorous behavioral tests and provide convincing evidence for the local labellar circuit.

      Weaknesses:

      The authors focus on the circuit connection between dopamine and gustatory sensory neurons in the male SEZ. Therefore, it is still unknown how mating modulates dopamine signaling and what possible implications on other behaviors might result from a reduced sugar preference.

      We agree with the reviewer that in the current study, we did not examine the exact mechanism of how mating experience suppressed the activity of dopaminergic neurons in the SEZ. The current study mainly focused on the behavioral characterization (sexual failure suppresses sweet sensitivity) and the downstream mechanism (TH-Gr5a pathway). We think that examining the upstream modulatory mechanism may be more suitable for a separate future study.

      We believe that a sustained reduction in sweet sensitivity (not limited to sucrose but extending to other sweet compounds; Figure 1-supplement 1D-E) upon courtship failure suggests a generalized and sustained consequence for reward-related behaviors. Sexual failure may thus resemble a state of “primitive emotion” in fruit flies. We have further discussed this possibility in the revised manuscript.

      Reviewer #3 (Public review):

      Summary

      In this work, the authors asked how mating experience impacts reward perception and processing. For this, they employ fruit flies as a model, with a combination of behavioral, immunostaining, and live calcium imaging approaches.

      Their study allowed them to demonstrate that courtship failure decreases the fraction of flies motivated to eat sweet compounds, revealing a link between reproductive stress and reward-related behaviors. This effect is mediated by a small group of dopaminergic neurons projecting to the SEZ. After courtship failure, these dopaminergic neurons exhibit reduced activity, leading to decreased Gr5a+ neuron activity via Dop1R1 and Dop2R signaling, and leading to reduced sweet sensitivity. The authors therefore showed how mating failure influences broader behavioral outputs through suppression of the dopamine-mediated reward system and underscores the interactions between reproductive and reward pathways.

      Concern

      My main concern regarding this study lies in the way the authors chose to present their results. If I understood correctly, they provided evidence that mating failure induces a decrease in the fraction of flies exhibiting PER. However, they also showed that food consumption was not affected (Fig. 1, supplement), suggesting that individuals who did eat consumed more. This raises questions about the analysis and interpretation of the results. Should we consider the group as a whole, with a reduced sensitivity to sweetness, or should we focus on individuals, with each one eating more? I am also concerned about how this could influence the results obtained using live imaging approaches, as the flies being imaged might or might not have been motivated to eat during the feeding assays. I would like the authors to clarify their choice of analysis and discuss this critical point, as the interpretation of the results could potentially be the opposite of what is presented in the manuscript.

      Please refer to our responses to the Public Review (Reviewer 1, Point 2) for details.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The label for the y-axis in Figure 1B should be "fraction", not "percentage".

      We have revised the figure as suggested.

      (2) I suggest that the authors indicate the ROIs they used to quantify the signal intensity in Figure 3E and G.

      We have revised the figures as suggested.

      (3) There is a typo in Figure 4A: it should be "Wild type", not "Wide type".

      We have revised the figure as suggested.

      (4) The elav-GAL4/+ data in Figure 4-S1B, C, and D appears to be reused across these panels. However, the number of asterisks indicating significance in the MAT plots differs between them (three in panels B and C, and four in panel D). Is this a typo?

      It is indeed a typo, and we have revised the figure accordingly.

      Reviewer #2 (Recommendations for the authors):

      Additional comments:

      The authors should add this missing literature about dopamine and neuromodulation in courtship:

      Boehm et al., 2022 (eLife) - this study shows that mating affects olfactory behavior in females.

      Cazalé-Debat et al., 2024 (Nature) - Mating proximity blinds threat perception.

      Gautham et al., 2024 (Nature) - A dopamine-gated learning circuit underpins reproductive state-dependent odor preference in Drosophila females.

      We have added these references in the introduction section.

      Has the mating behavior been quantified? How often did males copulate with mated and virgin females?

      We tried to examine the copulation behavior based on our video recordings. In the “Failed” group (males paired with mated females), we observed virtually no successful copulation events at all, confirming that nearly 100% of those males experienced sexual failure. In contrast, males in the “Satisfied” group (paired with virgin females) mated on average 2-3 times during the 4.5-hour conditioning period. We have added some explanations in the manuscript.

      Do the rejected males live shorter? Is the effect also visible when they are fed with normal fly food, or is it only working with sugar?

      We did not directly measure the lifespan of these males. But we conducted a relevant assay (starvation resistance), in which “Failed” males died significantly faster than both Naïve and Satisfied controls, indicating a clear reduction in their ability to endure food deprivation (Figure 1-supplement 1B). Since sweet taste is a primary cue for food detection in Drosophila, and sugar makes up a large portion of their standard diet, the drop in sugar sensitivity we observed in Failed males could likewise impair their perception and consumption of regular fly food, hence their resistance to starvation.

      Also, the authors mention that the reward pathway is affected, this is probably the case as sugar sensation is impaired. One interesting experiment would be (and maybe has been done?) to test rejected males in normal odor-fructose conditioning. The data would suggest that they would do worse.

      We have already measured how courtship failure affected fructose sensitivity (Figure 1 supplement 1D), and we found that the reduction in fructose perception was even more profound than for sucrose. We have not yet tested whether Failed males show deficits in odor-fructose associative conditioning. This is indeed a very interesting direction to explore. However, olfactory reward learning relies on molecular and circuit mechanisms distinct from those governing taste. We therefore argue that such experiments would be more suitable for a separate, follow-up study.

      The authors could have added another group where males are exposed to other males. It would be interesting if this is also a "stressful" context and if it would also reduce sugar preference - probably beyond the scope of this paper.

      In our experiments, all flies, including those in the Naïve, Failed, and Satisfied groups, were housed in groups of 25 males per vial before the conditioning period (and the Naïve group remained in the same group housing until PER testing). This means every cohort experienced the same level of “social stress” from male-male interactions. While it would indeed be interesting to compare that to solitary housing or other male-only exposures, isolation itself imposes a different kind of stress, and disentangling these effects on sugar preference would require a separate, dedicated study beyond the scope of the present work.

      Would the behavior effect also show up with experienced males? Maybe this has been tested before. Does mating rejection in formerly successful males have the same impact?

      As suggested by the reviewer, we performed an additional experiment in which males that had previously mated successfully were subsequently subjected to courtship rejection. As shown in Figure 1 supplement 1F, prior successful mating did not prevent the decline in sweet sensitivity induced by subsequent mating failure, indicating that even experienced males exhibit the reduction in sugar sensitivity after rejection.

      Is the same circuit present and functioning in females? Does manipulating dopamine receptors in GR5a neurons in females lead to the same phenotype? This would suggest that different internal states in males and females could lead to the same phenotype and circuit modulations.

      This is indeed a very interesting suggestion. In male flies, Gr5a-specific knockdown of dopamine receptors did not alter baseline sweet sensitivity, but it selectively prevented the reduction in sugar perception that followed mating failure (Figure 6C-D), indicating that this dopaminergic pathway is engaged only in the context of courtship rejection. By extension, knocking down the same receptors in female GR5a neurons would likewise be expected to leave their basal sugar sensitivity unchanged. Moreover, because there is currently no established paradigm for inducing mating failure in female flies, we cannot yet test whether sexual rejection similarly modulates sweet taste in females, or whether it operates via the same circuit.

      Reviewer #3 (Recommendations for the authors):

      Suggestions to the authors:

      Introduction, line 61. I suggest the authors add references in fruit flies concerning the rewarding nature of mating. For example, the paper from Zhang et al, 2016 "Dopaminergic Circuitry Underlying Mating Drive" demonstrates the role of the dopamine rewarding system in mating drive. There is a large body of literature showing the link between dopamine and mating.

      We have added this literature in the introduction section.

      Figure 1B and Figure Supplement 1: If I understood correctly, Figure Supplement 1A shows that the total food consumption across all tested flies remains unchanged. However, fewer flies that failed to mate consumed sucrose. I would be curious to see the results for sucrose consumption per individual fly that did eat. According to their results, individual flies that failed to mate should consume more sucrose. This would change the conclusion. The authors currently show that a group of flies that failed to mate consumed less sucrose overall, but since fewer males actually ate, those that failed to mate and did eat consumed more sucrose. The authors should distinguish between failed and satisfied flies in two groups: those that ate and those that did not.

      Please see our responses to the Public Review for details (Reviewer 1, Point 2).

      Figure 1C, right: For a better understanding of all the "MAT" figures, I suggest the authors start the Y axis with the unit 25 and increase it to 400. This would match better the text (line 114) saying that it was significantly elevated in the failed group. As it is, we have the impression of a decrease in the graph.

      We have revised the figures accordingly.

      Line 103: When suggesting a reduced likelihood of meal initiation of these males, do these males take longer to eat when they did it? In other words, is the latency to eat increased in failed males? That would be a good measure of motivational state.

      We tried to analyze feeding latency in the MAFE assay by measuring the time from sucrose presentation to the first proboscis extension, but it was too short to be measured accurately. Nevertheless, when conducting the experiments, we did not observe any noticeable difference in feeding latency between Failed males and Naïve or Satisfied controls.

      Line 117. I don't understand which results the authors refer to when writing "an overall elevation in the threshold to initiate feeding upon appetitive cues". Please specify.

      This phrase refers to the fact that for every sweet tastant we tested, including sucrose (Figure 1C), fructose and glucose (Figure 1 supplement 1D-E), the concentration-response curve in Failed males shifted to the right, and the Mean Acceptance Threshold (MAT) was significantly higher. In other words, for these different appetitive cues, mating failure raised the concentration of sugar required to trigger a proboscis extension, indicating a general elevation in the threshold to initiate feeding upon an appetitive cue.

      Figure 1D. Please specify the time for the satisfied group.

      For clarity, the Naïve and Satisfied groups in Figure 1D each represent pooled data from 0 to 72 hours post-treatment, as their sweet sensitivity remained stable throughout this period. Only the Failed group was shown with time-resolved data, since it was the only group exhibiting a dynamic change in sugar sensitivity over time. We have now specified this in the figure legend.

      Figure 1F. The phenotype was not totally reversed in failed-re-copulated males. Could it be due to the timing between failure and re-copulation? I suggest the authors mention in the figure or in the text, the time interval between failure and re-copulation.

      We’d like to clarify that the interval between the initial treatment (“Failed”) and the opportunity for re-copulation was within 30 minutes. The incomplete reversal in the Failed-re-copulated group indeed raises interesting questions. One possible explanation is that mating failure reduces synaptic transmission between the SEZ dopaminergic neurons and Gr5a<sup>+</sup> sweet sensory neurons (Figure 3), and that restoring this transmission takes longer. We have added this information to the figure legend and the Method section.

      Line 227-228 and Figure 3E. The authors showed that the synaptic connections between dopaminergic neurons and Gr5a+ GRNs were significantly weakened. I am wondering about the delay between mating failure and the GFP observation. It would be informative to know this timing to interpret this decrease in synaptic connections. If the timing is relatively long, it is possible that we can observe a neuronal plasticity. However, if this timing is very short, I would not expect such synaptic plasticity.

      The interval between the behavioral treatment and the GRASP-GFP experiment was approximately 20 hours. We chose this time window because it was sufficient for both GFP expression and accumulation. Therefore, the observed reduction in synaptic connections between dopaminergic neurons and Gr5a<sup>+</sup> GRNs likely reflects a genuine, experience-induced structural and functional change rather than an immediate, transient effect. We have added this information to the revised manuscript for clarity in the Method section.

      Line 240-243: The authors demonstrated that there is a reduction of CaLexA-mediated GFP signals in dopaminergic neurons in the SEZ after mating failure, but not a reduction in Gr5a+ GRNs. I suggest replacing "indicate" with "suggest" in line 240.

      We have made the change accordingly. Meanwhile, we would like to clarify that while we observed a reduction of NFAT signal in SEZ dopaminergic neurons (Figure 3G), we did not directly test NFAT signal in Gr5a<sup>+</sup> neurons. Notably, the results that the synaptic transmissions from SEZ dopaminergic neurons to Gr5a<sup>+</sup> neurons were weakened (Figure 3E-F), and the reduction of NFAT signal in SEZ dopaminergic neurons (Figure 3G-I), were in line with a reduction in sweet sensitivity of Gr5a<sup>+</sup> neurons upon courtship failure (Figure 3B-D).

      Line 243: replace "consecutive" with "constitutive".

      We have revised it accordingly.

      Figure 5: I have trouble understanding the results obtained in Figure 5. Both constitutive activation and inhibition of Dop1R1 and Dop2R neurons lead to the same results, namely that males who failed mating no longer exhibit decreased sweet sensitivity. I would have expected contrary results for both experimental conditions. I suggest the authors discuss their results.

      Both activation and inhibition of Dop1R1 and Dop2R neurons eliminated the effect of courtship failure on sweet sensitivity (Figure 5). These results are in line with our hypothesis that courtship failure leads to changes in dopamine signaling and hence sweet sensitivity. If dopamine signaling via Dop1R1 and Dop2R was locked, either to a silenced or a constitutively activated state, the effect of courtship failure on sweet sensitivity was eliminated.

      Nevertheless, as the reviewer pointed out, constitutive activation/inhibition should in principle lead to opposite effects in Naïve flies. In fact, when Dop1R1<sup>+</sup>/Dop2R<sup>+</sup> neurons were silenced in Naïve flies, PER to sucrose was significantly reduced (Figure 5C-D), confirming that these neurons normally facilitate sweet sensation. Meanwhile, although neuronal activation by NaChBac did show a trend towards enhanced PER compared to the GAL4/+ controls, it did not differ from the +>UAS-NaChBac controls, which showed a high PER level, likely due to a ceiling effect. We have added this discussion to the manuscript.

      Figure 7: I suggest the authors modify their figure a bit. It is not clear why in failed mating, the red arrow in "behavioral modulation" goes to the fly. The authors should find another way to show that mating failure decreased the percentage of flies that are motivated to eat sugar.

      We have modified the figure as suggested.

      Overall, I would suggest the authors be cautious with their conclusions. For example, line 337: "sexual failure suppressed feeding behavior". This is not what is shown by this study. Here, the study shows that mating failure decreases the fraction of flies that eat sucrose. Unless the authors demonstrate that this decrease is generalizable to other metabolites, I suggest the authors modify their conclusion.

      While we primarily used sucrose as the stimulant in our experiments, we also tested responses to two other sugars: fructose and glucose (Figure 1 supplement 1D-E). In all three cases, mating failure led to a significant reduction in sweet perception, suggesting that the effect of courtship failure is not limited to a single metabolite but rather reflects a general decrease in sweet sensitivity. Meanwhile, reduced sweet sensitivity indeed led to a reduction of feeding initiation (Figure 1).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Fallah and colleagues characterize the connectivity between two basal ganglia output nuclei, the SNr and GPe, and the pedunculopontine nucleus, a brainstem nucleus that is part of the mesencephalic locomotor region. Through a series of systematic electrophysiological studies, they find that these regions target and inhibit different populations of neurons, with anatomical organization. Overall, SNr projects to PPN and inhibits all major cell types, while the GPe inhibits glutamatergic and GABAergic PPN neurons, and preferentially in the caudal part of the nucleus. Optogenetic manipulation of these inputs had opposing effects on behavior - SNr terminals in the PPN drove place aversion, while GPe terminals drove place preference.

      Strengths:

      This work is a thorough and systematic characterization of a set of relatively understudied circuits. They build on the classic notions of basal ganglia connectivity and suggest a number of interesting future directions to dissect motor control and valence processing in brainstem systems.

      We thank the reviewer for these positive comments.

      Weaknesses:

      Characterization of the behavioral effects of manipulations of these PPN input circuits could be further parsed, for a better understanding of the functional consequences of the connections demonstrated in the ephys analyses.

      We have further analyzed our behavioral data to reveal more nuanced functional effects and included these analyses in Figure S2.

      All the cell type recording studies showing subtle differences in the degree of inhibition and anatomical organization of that inhibition suggest a complex effect of general optogenetic manipulation of SNr or GPe terminals in the PPN. It will be important to determine if SNr or GPe inputs onto a particular cell type in PPN are more or less critical for how the locomotion and valence effects are demonstrated here.

      This is a really interesting future direction and we have expanded on these points in the discussion in lines 771-772 and 782-785.

      Reviewer #1 (Recommendations for the authors):

      (1) Overall these are really valuable studies and help set up a number of future directions.

      We thank the reviewer for their positive comments.

      (2) I don't have many specific suggestions, but more examples of viral targeting and cell type targeting, including potentially some validation of the genetic identity of the cells targeted, could be useful for considering the details of the ephys experiments.

      We agree that understanding which exact SNr and GPe neurons go to which exact PPN populations is an important next step and are planning to conduct future experiments investigating these important questions. Others have found that there is minimal overlap between the three cell types within the PPN discussed in this manuscript (Wang and Morales 2009; Yoo et al. 2017; Steinkellner, Yoo, and Hnasko 2019). One important line of future investigations is to look at the specific inputs onto recently identified subsets of the glutamatergic PPN neurons such as Chx10- and Rbp4-expressing neurons (Goñi-Erro et al. 2023; Ferreira-Pinto et al. 2021). We hope to explore the electrophysiological properties and connectivity of these subtypes in future projects.

      (3) More discussion of which PPN cell types might be mediating the optogenetic behavioral effects of bulk SNr or GPe terminal stimulation would be useful for connecting the ephys results with the behavior.

      We are also interested in the question of which PPN cell type is most critical for mediating the effects observed in bulk terminal stimulation. While the best experiment would be to stimulate the axons projecting to each specific cell type of the PPN, this is not currently possible due to methodological limitations and lack of studies dissecting which SNr and GPe subpopulations project to each cell type of the PPN. However, in future studies, we plan to leverage the ability of AAV1 to jump a synapse along with Cre/Flp viruses and mouse lines to selectively inhibit cholinergic, GABAergic, or glutamatergic PPN neurons that receive GPe or SNr input to elucidate the contribution of each cell type in mediating behavioral changes in movement and valence processing.

      To address these important future directions, we have added additional text in the discussion in lines 771-772 and 782-785.

      Reviewer #2 (Public review):

      Summary:

      Fallah et al carefully dissect projections from SNr and GPe - two key basal ganglia nuclei - to the PPN, an important brainstem nucleus for motor control. They consider inputs from these two areas onto 3 types of downstream PPN neurons: GABAergic, glutamatergic, and cholinergic neurons. They also carefully map connectivity along the rostrocaudal axis of the PPN.

      Strengths:

      The slice electrophysiology work is technically well done and provides useful information for further studies of PPN. The optogenetics and behavioral studies are thought-provoking, showing that SNr and GPe projections to PPN play distinct roles in behavior.

      We appreciate the reviewer’s positive evaluation.

      Weaknesses:

      Although the optogenetics and behavioral studies are intriguing, they are somewhat difficult to fit together into a specific model of circuit function. Perhaps the authors can work to solidify the connection between these two arms of the work.

      We have expanded on these topics in the discussion.

      Otherwise, there are a few questions whose answers could add context to the interpretation of these results:

      (1) Male and female mice are used, but the authors do not discuss any analysis of sex differences. If there are no sex differences, it is still useful to report data disaggregated by sex in addition to pooled data.

      We have added a supplementary figure (Figure S2) showing distance traveled during optical stimulation for male and female mice.

      (2) There is some lack of clarity in the current manuscript on the ages used - 2-5 months vs "at least 7 weeks." Is 7 weeks the time of virus injection surgery, then recordings 3 weeks later (at least 10 weeks)? Please clarify if these ages apply equally to electrophysiological and behavioral studies. If the age range used for the test is large, it may be useful to analyze and report if there are age-related effects.

      Thank you for pointing this out; we have clarified this in the methods. 7 weeks is the youngest age at which mice used for electrophysiology were injected, and all were used for electrophysiology between 2 and 5 months of age. For behavior, the youngest mice used were 11 weeks old at time of behavior (8 weeks old at injection). Mice in the GPe-stimulated condition were 110 ± 7.4 SEM days old and mice in the SNr-stimulated condition were 132 ± 23.4 SEM days old. We have added these details to the revised manuscript in lines 913 and 963-964.

      In addition, we have correlated distance traveled at baseline and during stimulation with age for both SN and GPe stimulated conditions. Baseline distance traveled did not correlate with age, but there was a trend toward more movement during stimulation with older mice in the SN axon stimulation group. We have included these plots in supplemental Figure S2.

      (3) Were any exclusion criteria applied, e.g. to account for missed injections?

      All injection sites and implant sites were within our range of acceptability, so we did not exclude any mice for missed injections or incorrect implant location.

      (4) 28-34 degC is a fairly wide range of temperatures for electrophysiological recording, which could affect kinetics.

      This is an important consideration, and we agree that the wide temperature range is not optimal. We have plotted our main measurement of current amplitude in the condition where we found significant differences between rostral and caudal PPN (SNr to Vglut2 PPN neurons) against temperature and found no correlation (Pearson’s r value = -0.0076). Similarly, we found no correlation between baseline (pre-opto) firing frequency and temperature (r = -0.068). See Author response image 1.

      Author response image 1.
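
      The temperature check described above amounts to a Pearson correlation between bath temperature and a per-cell measurement. A minimal sketch of that calculation follows, assuming one (temperature, value) pair per recorded cell; the function name and the numbers in the usage note are hypothetical and are not the authors' data or code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

      For example, with hypothetical temperatures [28, 30, 32, 34] (°C) and amplitudes [1, 2, 2, 1] (arbitrary units), the function returns 0, i.e. no linear relationship between temperature and amplitude.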

      (5) It would be good to report the number of mice used for each condition in addition to n=cells. Statistically, it would be preferable not to assume that each cell from the same mouse is an independent measurement and to use a nested ANOVA.

      For electrophysiology, the number of mice used in each experiment was 6 (3 male, 3 female). In the manuscript ‘N’ represents number of mice and ‘n’ represents number of cells. Because of the unpredictability of how many healthy cells can be recorded from one mouse, our data were planned to be collected with n=cells, and are underpowered for a nested ANOVA.

      However, in many cases, rostral and caudal data were collected from the same mice. While we do not have sufficient paired data for each electrophysiological parameter, analyzing one of our main and most important findings with a paired comparison (with biological replicates being mice) shows a statistically significant difference in the inhibitory effect of SNr axon stimulation on firing rate between rostral and caudal glutamatergic neurons (p=0.031, Wilcoxon signed rank test). See Author response image 2.

      Author response image 2.
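
      The paired comparison above treats each mouse as one biological replicate, with a rostral and a caudal value per mouse. A minimal sketch of an exact two-sided Wilcoxon signed-rank test for such paired data is given below. It assumes one summary value per mouse and region; the function name and the values in the usage note are hypothetical, not the authors' data, and a statistics package may handle ties and zero differences differently.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences (1 = smallest), averaging over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = abs(w_plus - total / 2)
    # Enumerate all 2**n sign assignments, which are equally likely under
    # the null hypothesis of a distribution symmetric around zero.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - total / 2) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n
```

      With six hypothetical pairs that all differ in the same direction, e.g. rostral = [5.1, 4.8, 6.0, 5.5, 4.9, 6.2] and caudal = [2.0, 3.1, 2.5, 4.0, 1.2, 3.3], the exact two-sided p-value is 2/64 = 0.03125, the smallest value this test can return for six pairs.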

      Reviewer #3 (Public review):

      Summary:

      The study by Fallah et al provides a thorough characterization of the effects of two basal ganglia output pathways on cholinergic, glutamatergic, and GABAergic neurons of the PPN. The authors first found that SNr projections spread over the entire PPN, whereas GPe projections are mostly concentrated in the caudal portion of the nucleus. Then the authors characterized the postsynaptic effects of optogenetically activating these basal ganglia inputs and identified the PPN's cell subtypes using genetically encoded fluorescent reporters. Activation of inputs from the SNr inhibited virtually all PPN neurons. Activation of inputs from the GPe predominantly inhibited glutamatergic neurons in the caudal PPN, and to a lesser extent GABAergic neurons. Finally, the authors tested the effects of activating these inputs on locomotor activity and place preference. SNr activation was found to increase locomotor activity and elicit avoidance of the optogenetic stimulation zone in a real-time place preference task. In contrast, GPe activation reduced locomotion and increased the time in the RTPP stimulation zone.

      Strengths:

      The evidence of functional connectivity of SNr and GPe neurons with cholinergic, glutamatergic, and GABAergic PPN neurons is solid and reveals a prominent influence of the SNr over the entire PPN output. In addition, the evidence of a GPe projection that preferentially innervates the caudal glutamatergic PPN is unexpected and highly relevant for basal ganglia function.

      Opposing effects of two basal ganglia outputs on locomotion and valence through their connectivity with the PPN.

      Overall, these results provide an unprecedented cell-type-specific characterization of the effects of basal ganglia inputs in the PPN and support the well-established notion of a close relationship between the PPN and the basal ganglia.

      We thank the reviewer for their positive comments.

      Weaknesses:

      The behavioral experiments require further analysis as some motor effects could have been averaged out by analyzing long segments.

      We have further analyzed our motor effects and included these analyses in supplemental figure S2 in the revised manuscript.

      Additional controls are needed to rule out a motor effect in the real-time place preference task.

      To address this comment, we analyzed the second day of RTPP, where no stimulation was applied in either chamber. Specifically, we evaluated the time spent in the stimulated chamber during the first minute of the unstimulated RTPP task. We found that the mice that had SNr axon stimulation still avoided the previously stimulated chamber and the mice that had GPe axon stimulation still preferred the previously stimulated chamber. These data have been added to Figure 7 and in the results section lines 564-575.

      Importantly, the location of the stimulation is not reported even though this is critical to interpret the behavioral effects.

      The implant locations were generally over the middle-to-rostral PPN and we will clarify this in the revised manuscript. These locations are shown in figure 7B.

      There are some concerns about the possible recruitment of dopamine neurons in the SNr experiments.

      We have added experiments stimulating the SNc dopaminergic neuron axons in the PPN and found very interesting behavioral effects. These are described in more detail below and in the results lines 595-624. These data are also included in Figure S3.

      Reviewer #3 (Recommendations for the authors):

      (1) Locomotor activity should be analyzed as trial averages instead of session averages. The effect of SNr on locomotion might be showing a rebound of activity in cholinergic neurons, which innervate dopamine neurons and induce locomotion. Furthermore, the variability between animals should be reported, Figure 7C doesn't show a standard deviation.

      This is an important point and could reveal different early and late effects of basal ganglia axon stimulation. We have added a time course graph of the trial averages for the distance traveled in the open field with higher temporal resolution (10s vs 1min). This is included in supplemental Figure S2A&B.

      The variability between animals was shown as shaded area, but was too light and transparent so it was difficult to see in Figure 7C. We have changed this shading to error bars for better visibility.

      (2) SNc projects to the PPN. It has recently been shown that PPN neurons respond robustly to dopaminergic activation, including effects on motor activity (Juarez Tello et al., 2024). The transductions shown in Figure S1 clearly cover the entirety of the SNc. Dopamine blockers should be used in the ex vivo experiments to rule out dopaminergic effects.

      This is an important point and one we were particularly interested in with respect to the behavioral experiments. We thank the reviewer for bringing this up because it led us to a really interesting result. We have now run an additional experiment using DAT-cre mice and a cre-dependent ChR2 using the same injection site as in our constitutive ChR2 experiments. We found that selectively stimulating the SNc dopaminergic axons replicates the increased locomotion at high laser power and the absence of a change in locomotion at low laser power seen in our constitutive ChR2 experiment. However, the selective dopaminergic axon activation in the PPN is rewarding at both high and low power, while the constitutive ChR2 activation is aversive. We have added these data to supplemental figure S3, and have added text in the results (lines 595-624) and discussion (lines 695-734) about this new exciting finding.

      While we can’t exclude the possibility of dopamine influence on the electrophysiology experiments (via changes in input resistance or channel properties), the fast synaptic currents measured are uncharacteristic of inhibitory D2 receptor currents (which would be slow), and are inhibited by the GABAa receptor blocker, GABAzine.

      (3) Activation of glutamatergic neurons in the caudal PPN elicits locomotion while the same stimulation in the rostral PPN terminates locomotion. In line with this, the authors report important differences in glutamatergic neurons in the rostral vs caudal PPN (Fig. 5). For the behavioral experiments, the location of the optic fiber is not reported. This is essential for the interpretation of the behavioral experiments. Based on the recent literature, inhibiting glutamatergic neurons in the rostral and in the caudal PPN will produce opposing effects.

      We absolutely agree that the rostral and caudal PPN differences are functionally important. In Figure 7B, we have mapped the location of the optical fiber tip for each experiment. Our implant location was generally in the rostral-middle part of the PPN and we have added this to the methods section of the revised manuscript in lines 887 and 1048. While we did not have many implant locations that were specifically rostral or specifically caudal, we did evaluate the behavioral response for our most rostrally located implant and our most caudally located implant in the SN axon stimulation experiment. We found that low-power laser activation of nigral axons in the most rostral implant resulted in increased locomotion but in the most caudal implant resulted in decreased locomotion. This increased locomotion is exactly what we would expect when rostral PPN neurons (that normally inhibit movement) are preferentially inhibited, and decreased locomotion is what we would expect when caudal PPN neurons (that normally promote movement) are inhibited. Future experiments using more precise rostral and caudal implant locations will be needed to fully parse out the functional role of rostral vs caudal PPN. See Figure S4 (two green implant sites are circled for one mouse because the implants were bilateral).

      (4) Even though the authors made an effort to dissect out the motor component during the RTPP task, this was not entirely achieved. Low laser power was still able to decrease activity following GPe stimulation, causing the animal to spend more time in the stimulated compartment. It is not clear the reason for using RTPP as opposed to CPP, which will not have the confound of the effects on motor activity. The interpretation of these data is problematic.

      This is an important consideration, and the reviewer is correct that we can’t completely eliminate a motor contribution to our RTPP experiment. We attempted to minimize potential motor confounds by utilizing unilateral stimulation and our supplemental videos show that the mice can escape the stimulated chamber.

      However, to address this comment, we analyzed the second day of RTPP, where no stimulation was applied in either chamber. Specifically, we evaluated the time spent in the stimulated chamber during the first minute of the unstimulated RTPP task. We found that the mice that had SN stimulation still avoided the previously stimulated chamber and the mice that had GPe axon stimulation still preferred the previously stimulated chamber. These data have been added as Figure 7G and in the results section lines 564-575.

      (5) The resting membrane potential for cholinergic, glutamatergic, and GABAergic neurons is not reported.

      Since a majority of PPN neurons are spontaneously active, we have reported the average membrane voltage during the pre-optical stimulation period in supplementary table 1.

      (6) During the RTPP, the animals were stimulated unilaterally with the purpose of reducing the optogenetic effects on locomotion, but no data support this claim. Please report the locomotor measurements during unilateral stimulation.

      To address this comment, we have analyzed the speed of the mouse in each compartment (stimulated vs non-stimulated) during the RTPP task. We found that the mean speed does differ, in the direction expected (i.e., mice are on average slower in the GPe stimulated zone where they spend more time, and mice are on average faster in the SNr stimulated zone where they spend less time). This is expected because when the mouse spends more time in a zone, it is more likely to spend time grooming or staying still, but it could still be evidence of motor response to the stimulation. To evaluate how fast the mouse is able to move with and without unilateral stimulation, we measured maximum speed in the stimulated and unstimulated zone. We found that maximum speed does not differ between stimulated and unstimulated zones in either the SNr or GPe group. See Author response image 3.

      Author response image 3.
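      The comparison of mean and maximum speed per zone described above can be sketched as follows. This is a minimal illustration only, not the authors' analysis code: the function name `zone_speed_stats`, the input layout (tracked x/y positions, timestamps, and a per-sample flag for the stimulated zone), and the choice to assign each step to the zone of its end sample are all assumptions.

```python
import math


def zone_speed_stats(x, y, t, in_stim_zone):
    """Mean and maximum speed per zone from tracked positions.

    x, y          -- position samples (hypothetical units, e.g. cm)
    t             -- timestamps in seconds, same length as x and y
    in_stim_zone  -- booleans, True when the sample lies in the stimulated zone
    Each frame-to-frame step is assigned to the zone of its end sample
    (an assumption made for this sketch).
    """
    speeds = {"stimulated": [], "unstimulated": []}
    for i in range(1, len(t)):
        dt = t[i] - t[i - 1]
        if dt <= 0:
            continue  # skip duplicated or out-of-order timestamps
        step = math.hypot(x[i] - x[i - 1], y[i] - y[i - 1])
        zone = "stimulated" if in_stim_zone[i] else "unstimulated"
        speeds[zone].append(step / dt)

    stats = {}
    for zone, values in speeds.items():
        if values:
            stats[zone] = {"mean": sum(values) / len(values), "max": max(values)}
        else:
            stats[zone] = None  # the animal never moved within this zone
    return stats
```

      Comparing the `max` entries between zones addresses whether the animal can still move quickly under stimulation, while the `mean` entries are also influenced by time spent grooming or staying still.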

      (7) Given the similarity of the parameters evaluated for all three PPN cell types, the results could be presented in a table, it will be easier to summarize.

      This is a good point and we have added supplemental tables 1-4 for key electrophysiological findings.

      (8) The text is repetitive in some parts.

      We have gone through the results to edit out repetitive text. For example, lines 244-260 and 274-287 have been rewritten for clarity and efficiency.

      (9) Lines 609-620: the behavioral effects after SNr stimulation are not mediated by the PPN, please correct.

      We have corrected this.

      (10) The number of patched GABAergic neurons in the caudal PPN is almost double the number of patched neurons in the rostral PPN. This contrasts with the high density of GABAergic neurons in the rostral PPN reported in the literature, and therefore, the probability of recording GABAergic neurons will be much higher in the rostral PPN. Please comment.

      It is true that there are more GABA neurons in the rostral region, but on a sagittal slice, the rostral region occupies a smaller area compared to caudal and there is a notable cluster of GABAergic neurons in the caudal region (Mena-Segovia et al. 2009). The number of visible and healthy cells with obvious fluorescence against background fluorescence in the heavily myelinated tissue of the PPN is unpredictable, and it is possible that the high density of GABA neurons in the rostral region merges the fluorescence of individual cell somata, making it difficult to resolve as many rostral neurons. While we did our best to equally patch rostral and caudal neurons based on our best judgment during the experiment, neurons were ultimately designated as ‘rostral’ or ‘caudal’ after post-hoc staining for the cholinergic neurons, as described below.

      (11) Describe how the rostral and caudal PPN regions were defined and how the authors ensured consistency across recordings.

      We have added more details about the definition of rostral vs caudal PPN to the methods in lines 1042-1053.

      (12) Please report the proportion of GABAergic neurons showing STD vs STP for rostral and caudal PPN. The data in Figure 3 might be averaging out some important differences. Figure 3L suggests some differences in the proportions.

      The variability within the GABAergic population was really interesting and we plan to pursue this in the future. We have defined STD as PPR<0.95 and STP as PPR>1.05 and added the proportions of caudal and rostral GABAergic PPN neurons with each type of short-term synaptic plasticity to lines 253-257.
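      The thresholds quoted above (STD as PPR<0.95, STP as PPR>1.05) translate into a simple classification rule. The sketch below is illustrative and not the authors' code: the function names are hypothetical, and the paired-pulse ratio is assumed to be the second response amplitude divided by the first.

```python
def classify_plasticity(ppr, std_below=0.95, stp_above=1.05):
    """Label one paired-pulse ratio using the thresholds from the response.

    ppr is assumed to be (second response amplitude) / (first response amplitude).
    """
    if ppr < std_below:
        return "STD"      # short-term depression
    if ppr > stp_above:
        return "STP"      # short-term potentiation
    return "neither"      # within the 0.95-1.05 band


def plasticity_proportions(pprs):
    """Fraction of cells in each category for a list of PPR values."""
    counts = {"STD": 0, "STP": 0, "neither": 0}
    for ppr in pprs:
        counts[classify_plasticity(ppr)] += 1
    n = len(pprs)
    if n == 0:
        return counts  # no cells: report zeros rather than dividing by zero
    return {label: count / n for label, count in counts.items()}
```

      Running `plasticity_proportions` separately on the rostral and caudal PPR values would give the per-region proportions the reviewer asked for.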

      (13) Please report whether the mice’s compartment preferences during the habituation were taken into account for the selection of the laser-on compartment.

      Mice were not habituated to the chamber in the unstimulated condition prior to the RTPP experiment. Laser-on side was randomly chosen and counter-balanced between mice. Mice were also randomly assigned to have low laser power RTPP first or high laser power RTPP first. In each case, mice were given an unstimulated 10-minute trial on the day between the first and second RTPP experiment to ‘unlearn’ which side was stimulated and the second RTPP experiment stimulated the opposite chamber compared to the first RTPP experiment. For example, one mouse would have high power stimulation on the striped side on day 1, no stimulation on day 2, and low power stimulation on the spotted side on day 3. This is now explained more thoroughly in lines 564-575 and lines 992-998.

      (14) Some references to figure panels are missing in the text.

      We have carefully reviewed the manuscript to ensure figure panels are referenced in the text.

      (15) The interpretation in lines 724-725 is not supported by the data given that GPe inputs to cholinergic neurons are negligible.

      We have reworded much of the discussion.

      (16) Some parts of the discussion should go into the “ideas and speculation” subsection of the discussion.

      We have rewritten sections of the discussion.

      References:

      Ferreira-Pinto, Manuel J., Harsh Kanodia, Antonio Falasconi, Markus Sigrist, Maria S. Esposito, and Silvia Arber. 2021. “Functional Diversity for Body Actions in the Mesencephalic Locomotor Region.” Cell 184 (17): 4564-4578.e18. https://doi.org/10.1016/j.cell.2021.07.002.

      Goñi-Erro, Haizea, Raghavendra Selvan, Vittorio Caggiano, Roberto Leiras, and Ole Kiehn. 2023. “Pedunculopontine Chx10+ Neurons Control Global Motor Arrest in Mice.” Nature Neuroscience 26 (9): 1516–28. https://doi.org/10.1038/s41593-023-01396-3.

      Mena-Segovia, J., B. R. Micklem, R. G. Nair-Roberts, M. A. Ungless, and J. P. Bolam. 2009. “GABAergic Neuron Distribution in the Pedunculopontine Nucleus Defines Functional Subterritories.” The Journal of Comparative Neurology 515 (4): 397–408. https://doi.org/10.1002/cne.22065.

      Steinkellner, Thomas, Ji Hoon Yoo, and Thomas S. Hnasko. 2019. “Differential Expression of VGLUT2 in Mouse Mesopontine Cholinergic Neurons.” eNeuro, July. https://doi.org/10.1523/ENEURO.0161-19.2019.

      Wang, Hui-Ling, and Marisela Morales. 2009. “Pedunculopontine and Laterodorsal Tegmental Nuclei Contain Distinct Populations of Cholinergic, Glutamatergic and GABAergic Neurons in the Rat.” The European Journal of Neuroscience 29 (2): 340–58. https://doi.org/10.1111/j.1460-9568.2008.06576.x.

      Yoo, Ji Hoon, Vivien Zell, Johnathan Wu, Cindy Punta, Nivedita Ramajayam, Xinyi Shen, Lauren Faget, Varoth Lilascharoen, Byung Kook Lim, and Thomas S. Hnasko. 2017. “Activation of Pedunculopontine Glutamate Neurons Is Reinforcing.” The Journal of Neuroscience: The Official Journal of the Society for Neuroscience 37 (1): 38–46. https://doi.org/10.1523/JNEUROSCI.3082-16.2016.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      In the future, could you please include the exact changes made to the manuscript in the relevant section of the rebuttal, so it's clear which changes addressed the comment? That would make it easier to see what you refer to exactly - currently I have to guess which manuscript changes implement e.g. "We have tried to make these points more evident".

      Yes, we apologize for the inconvenience.

      On possible navigation solutions:

      I'm not sure if I follow this argument. If the network uses a shifted allocentric representation centred on its initial state, it couldn't consistently decode the position from different starting positions within the same environment (I don't think egocentric is the right term here - egocentric generally refers to representations relative to the animal's own direction like "to the left" rather than "to the west" but these would not work in the allocentric decoding scheme here). In other words: If I path integrate my location relative to my starting location s1 in environment 1 and learn how to decode that representation to an environment location, I cannot use the same representation when I start from s2 in environment 1, because everything will have shifted. I still believe using boundaries is the only solution to infer the absolute location for the agent here (because that's the only information that it gets), and that's the reason for finding boundary representations (and not grid cells). Imagine doing this task on a perfect torus where there are no boundaries: it would be impossible to ever find out at what 'absolute' location you are in the environment. I have therefore not updated this part of my review, but do let me know if I misunderstood.

      Thank you for addressing this point, which is a somewhat unusual feature of our network: We believe the point you raise applies if the decoding were fixed. However, in our case, the decoding is dynamic and depends on the firing pattern, as place unit centers are decoded on a per-trajectory basis. Thus, a new place-like basis may be formed for each trajectory (and in each environment). Hence, the model is not constrained to reuse its representation across trajectories or environments, as place centers are inferred based on unit firing. However, we do observe that the network learns to use a fixed place field placement in each geometry, which likely reflects some optimal solution to the decoding problem. This might also help to explain the hexagonal arrangement of learned field centers. Finally, we agree that egocentric may not be entirely accurate, but we found it to be the best word to distinguish from the allocentric-type navigation adopted by the network.

      Regarding noise injection:

      Beyond that noise level, the network might return to high correlations, but that must be due to the boundary interactions - very much like what happens at the very beginning of entering an environment: the network has learned to use the boundary to figure out where it is from an uninformative initial hidden state. But I don't think this is currently reflected well in the main text. That still reads "Thus, even though the network was trained without noise, it appears robust even to large perturbations. This suggests that the learned solutions form an approximate attractor." I think your new (very useful!) velocity ablations show that only small noise is compensated for by attractor dynamics, and larger noise injections are error corrected through boundary interactions. I've added this to the new review.

      Thank you for your kind feedback. We have changed the phrasing in the text to say “robust even to moderate perturbations.” We hold that, while numerically small, the amount of injected noise is rather large when compared to the magnitude of activities in the network (see Fig. A5d): the largest maximal rate is around 0.1, which is similar to the noise level at which output representations fail to re-converge. However, we agree that some moderation is appropriate.

      On contexts being attractive:

      In the new bit of text, I'm not sure why "each environment appears to correspond to distinct attractive states (as evidenced by the global-type remapping behavior)", i.e. why global-type remapping is evidence for attractive states. Again, to me global-type remapping is evidence that contexts occupy different parts of activity space, but not that they are attractive. I like the new analysis in Appendix F, as it demonstrates that the context signal determines which region of activity space is selected (as opposed to the boundary information!). If I'm not mistaken, we know three things: 1. Different contexts exist in different parts of representation space, 2. Representations are attractive for small amounts of noise, 3. The context signal determines which point in representation space is selected (thanks to the new analysis in Appendix F). That seems to be in line with what the paper claims (I think "contexts are attractive" has been removed?) so I've updated the review.

      It seems to us that we are in agreement on this point. Our aim is simply to point out that a particular context signal appears to correspond to a particular (discrete) attractor state (i.e., occupying a distinct part of representation space, as you state). It seems we merely use slightly different language, but to avoid confusion, we changed this to say that “representations are attractive”.

      Thanks again for engaging with us, this discussion has been very helpful in improving the paper.

      Reviewer #2:

      However, I still struggle to understand the entire picture of the boundary-to-place-to-grid model. After all, what is the role of grid cells in the proposed view? Are they just redundant representations of the space? I encourage the authors to clarify these points in the last two paragraphs on pages 17-18 of the discussion.

      Thank you for your feedback. While we have discussed the possible role of a grid code to some extent, we agree that this point requires clarification. We have therefore added to the discussion on the role of grid cells, which now reads “While the lack of grid cells in this model is interesting, it does not disqualify grid cells from serving as a neural substrate for path integration. Rather, it suggests that path integration may also be performed by other, non-grid spatial cells, and/or that grid cells may serve additional computational purposes. If grid cells are involved during path integration, our findings indicate that additional tasks and constraints are necessary for learning such representations. This possibility has been explored in recent normative models, in which several constraints have been proposed for learning grid-like solutions. Examples include constraints concerning population vector magnitude, conformal isometry \cite{xu_conformal_2022, schaeffer_self-supervised_2023, schoyen_hexagons_2024}, capacity, spatial separation and path invariance \cite{schaeffer_self-supervised_2023}. Another possibility is that grid cells are geared more towards other cognitive tasks, such as providing a neural metric for space \cite{ginosar_are_2023, pettersen_self-supervised_2024}, or supporting memory and inference-making \cite{whittington_tolman-eichenbaum_2020}. That our model performs path integration without grid cells, and that a myriad of independent constraints are sufficient for grid-like units to emerge in other models, presents strong computational evidence that grid cells are not solely defined by path integration, and that path integration is not only reserved for grid cells.”

      Thank you again for your time and input.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their comprehensive analysis, Diallo et al. deorphanise the first olfactory receptor of a nonhymenopteran eusocial insect - a termite - and identify the well-established trail pheromone neocembrene as the receptor's best ligand. By using a large set of odorants the authors convincingly show that, as expected for a pheromone receptor, PsimOR14 is very narrowly tuned. While the authors first make use of an ectopic expression system, the empty neuron of Drosophila melanogaster, to characterise the receptor's responses, they next perform single sensillum recordings with different sensilla types on the termite antenna. By that, they are able to identify a sensillum that houses three neurons, of which the B neuron exhibits the narrow responses described for PsimOR14. Hence the authors not only identify the first pheromone receptor in a termite but can even localize its expression on the antenna. The authors in addition perform a structural analysis to explain the binding properties of the receptor and its major and minor ligands (as this is beyond my expertise, I cannot judge this part of the manuscript). Finally, they compare expression patterns of ORs in different castes and find that PsimOR14 is more strongly expressed in workers than in soldier termites, which corresponds well with stronger antennal responses in the worker caste.

      Strengths:

      The manuscript is well-written and a pleasure to read. The figures are beautiful and clear. I actually had a hard time coming up with suggestions.

      We thank the reviewer for the positive comments.

      Weaknesses:

      Whenever it comes to the deorphanization of a receptor and its potential role in behaviour (in the case of the manuscript it would be trail-following of the termite) one thinks immediately of knocking out the receptor to check whether it is necessary for the behaviour. However, I definitely do not want to ask for this (especially as the establishment of CRISPR Cas-9 in eusocial insects usually turns out to be a nightmare). I also do not know either, whether knockdowns via RNAi have been established in termites, but maybe the authors could consider some speculation on this in the discussion.

      We agree that a functional proof of the PsimOR14 function using reverse genetics would be a valuable addition to the study to firmly establish its role in trail pheromone sensing. Nevertheless, such a functional proof is difficult to obtain. Due to the very slow ontogenetic development inherent to termites (several months from an egg to the worker stage), CRISPR Cas-9 is not a useful technique for this taxon. By contrast, termites are quite responsive to RNAi-mediated silencing, and RNAi has previously been used for the silencing of the ORCo co-receptor in termites, resulting in impairment of the trail-following behavior (DOI: 10.1093/jee/toaa248). Likewise, our previous experiments showed a decreased ORCo transcript abundance, lower sensitivity to neocembrene and reduced neocembrene trail following upon dsPsimORCo administration to P. simplex workers, while we did not succeed in reducing the transcript abundance of PsimOR14 upon dsPsimOR14 injection. We do not report these negative results in the present manuscript so as not to dilute the main message. In parallel, we are currently developing an alternative way of dsRNA delivery using nanoparticle coating, which may improve the RNAi experiments with ORs in termites.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors performed the functional analysis of odorant receptors (ORs) of the termite Prorhinotermes simplex to identify the receptor of trail-following pheromone. The authors performed single-sensillum recording (SSR) using the transgenic Drosophila flies expressing a candidate of the pheromone receptor and revealed that PsimOR14 strongly responds to neocembrene, the major component of the pheromone. Also, the authors found that one sensillum type (S I) detects neocembrene and also performed SSR for S I in wild termite workers. Furthermore, the authors revealed the gene, transcript, and protein structures of PsimOR14, predicted the 3D model and ligand docking of PsimOR14, and demonstrated that PsimOR14 is more highly expressed in workers than soldiers using RNA-seq for heads of workers and soldiers of P. simplex and that EAG response to neocembrene is higher in workers than soldiers. I consider that this study will contribute to further understanding of the molecular and evolutionary mechanisms of the chemoreception system in termites.

      Strength:

      The manuscript is well written. As far as I know, this study is the first study that identified a pheromone receptor in termites. The authors not only present a methodology for analyzing the function of termite pheromone receptors but also provide important insights in terms of the evolution of ligand selectivity of termite pheromone receptors.

      We thank the reviewer for the overall positive evaluation of the manuscript.

      Weakness:

      As you can see in the "Recommendations to the Authors" section below, there are several things in this paper that are not fully explained about experimental methods. Except for this point, this paper appears to me to have no major weaknesses.

      We address point by point the specific comments listed in the Recommendation to the authors chapter below.

      Reviewer #3 (Public review):

      Summary:

      Chemical communication is essential for the organization of eusocial insect societies. It is used in various important contexts, such as foraging and recruiting colony members to food sources. While such pheromones have been chemically identified and their function demonstrated in bioassays, little is known about their perception. Excellent candidates are the odorant receptors that have been shown to be involved in pheromone perception in other insects including ants and bees but not termites. The authors investigated the function of the odorant receptor PsimOR14, which was one of four target odorant receptors based on gene sequences and phylogenetic analyses. They used the Drosophila empty neuron system to demonstrate that the receptor was narrowly tuned to the trail pheromone neocembrene. Similar responses to the odor panel and neocembrene in antennal recordings suggested that one specific antennal sensillum expresses PsimOR14. Additional protein modeling approaches characterized the properties of the ligand binding pocket in the receptor. Finally, PsimOR14 transcripts were found to be significantly higher in worker antennae compared to soldier antennae, which corresponds to the worker's higher sensitivity to neocembrene.

      Strengths:

      The study presents an excellent characterization of a trail pheromone receptor in a termite species. The integration of receptor phylogeny, receptor functional characterization, antennal sensilla responses, receptor structure modeling, and transcriptomic analysis is especially powerful. All parts build on each other and are well supported with a good sample size.

      We thank the reviewer for these positive comments.

      Weaknesses:

      The manuscript would benefit from a more detailed explanation of the research advances this work provides. Stating that this is the first deorphanization of an odorant receptor in a clade is insufficient. The introduction primarily reviews termite chemical communication and deorphanization of olfactory receptors previously performed. Although this is essential background, it lacks a good integration into explaining what problem the current study solves.

      We understand the comment about the lack of an intelligible cue to highlight the motivation and importance of the present study. In the current version of the manuscript the introduction has been reworked. As suggested by Reviewer 3 in the Recommendations section below, the introduction now integrates some parts of the original discussion, especially the part discussing the OR evolution and emergence of eusociality in hymenopteran social insects and in termites, while underscoring the need for data from termites to compare the commonalities and idiosyncrasies in neurophysiological (pre)adaptations potentially linked with the independent evolution of eusociality in the two main social insect clades.

      Selecting target ORs for deorphanization is an essential step in the approach. Unfortunately, the process of choosing these ORs has not been described. Were the authors just lucky that they found the correct OR out of the 50, or was there a specific selection process that increased the probability of success?

      Indeed, we were extremely lucky. Our strategy was to first select a modest set of ORs to confirm the feasibility of the Empty Neuron Drosophila system and newly established SSR setup, while taking advantage of having a set of termite pheromones, including those previously identified in the P. simplex model, some of them de novo synthesized for this project. The selection criteria for the first set of four receptors were (i) to have a full-length ORF and at least 6 unambiguously predicted transmembrane regions, and (ii) to be represented on different branches (subbranches) of the phylogenetic tree. Then it was a matter of good luck to hit PsimOR14, which selectively responds to the main component of the genuine P. simplex trail-following pheromone. In the revised version, we state these selection criteria in the results section (Phylogenetic reconstruction and candidate OR selection).

      The deorphanization attempts of additional P. simplex ORs are currently running.

      The authors assigned antennal sensilla into five categories. Unfortunately, they did not support their categories well. It is not clear how they were able to differentiate SI and SII in their antennal recordings.

      We agree that the classification of multiporous sensilla into five categories lacks robust discrimination cues. The identification of the neocembrene-responding sensillum was initially carried out by SSR measurements on individual olfactory sensilla of P. simplex workers one-by-one and the topology of each tested sensillum was recorded on optical microscope photographs taken during the SSR experiment. Subsequently, the SEM and HR-SEM were performed in which we localized the neocembrene sensillum and tried to find distinguishing characters. We admit that these are not robust. Therefore, in the revised version of the manuscript we decided to abandon the attempt of sensilla classification and only report the observations about the specific sensillum in which we consistently recorded the response to neocembrene (and geranylgeraniol). The modifications affect Fig. 4, its legend and the corresponding part of the results section (Identification of P. simplex olfactory sensillum responding to neocembrene).

      The authors used a large odorant panel to determine receptor tuning. The panel included volatile polar compounds and non-volatile non-polar hydrocarbons. Usually, some heat is applied to such non-volatile odorants to increase volatility for receptor testing. It is unclear how it is possible that these non-volatile compounds can reach the tested sensilla without heat application.

      The reviewer points at an important methodological error we made while designing the experiments. Indeed, the inclusion of long-chain hydrocarbons into Panel 1 without additional heat applied to the odor cartridges was inappropriate, even though the experiments were performed at 25–26 °C. We carefully considered the best solution to correct the mistake and finally decided to remove all tested ligands beyond C22 from Panel 1, i.e. altogether five compounds. These changes did not affect the remaining Panels 2-4 (containing compounds with sufficient volatility), nor did they affect the message of the manuscript on the highly selective response of PsimOR14 to neocembrene (and geranylgeraniol). In consequence, Figures 2, 3 and 5 were updated, along with the supplementary tables containing the raw data on SSR measurements. In addition, the tuning curve for PsimOR14 was re-built and the receptor lifetime sparseness value re-calculated (without any important change). We also exchanged squalene for limonene in the docking and molecular dynamics analysis and made new calculations.
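      For readers unfamiliar with the lifetime sparseness value mentioned above, the sketch below uses one widely used definition, S = (1 - (Σr/N)² / (Σr²/N)) / (1 - 1/N), where r are the responses to N odorants. Whether the authors used exactly this formula, and the clipping of negative (inhibitory) responses to zero, are assumptions; the function name is hypothetical.

```python
def lifetime_sparseness(responses):
    """Lifetime sparseness of a receptor across an odorant panel.

    Returns a value between 0 (equal response to every odorant) and
    1 (response to a single odorant only). Negative responses are
    clipped to zero, which is an assumption of this sketch.
    """
    clipped = [max(r, 0.0) for r in responses]
    n = len(clipped)
    if n < 2:
        raise ValueError("need responses to at least two odorants")
    mean_sq = sum(r * r for r in clipped) / n
    if mean_sq == 0:
        raise ValueError("sparseness is undefined when all responses are zero")
    mean = sum(clipped) / n
    return (1.0 - (mean * mean) / mean_sq) / (1.0 - 1.0 / n)
```

      Because the value depends on the panel size N and on which ligands are included, removing compounds from a panel requires re-computing it, as the authors describe.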

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) L 208: "than" instead of "that"

      Corrected.

      (2) L 527+527 strange squares (•) before dimensions

      Apparently an error upon file conversion, corrected.

      (3) L553 "reconstructing" instead of "reconstruct"

      Corrected.

      (4) Two references (Chahda et al. and Chang et al.) appear too late in the alphabet.

      Corrected. Thank you for spotting this mistake. Due to our mistake the reference list was ordered according to the Czech alphabet, which ranks CH after H.

      Reviewer #2 (Recommendations for the authors):

      (1) L148: Why did the authors select only four ORs (PsimOR9, 14, 30, and 31) though there are 50 ORs in P. simplex? I would like you to explain why you chose them.

      Our strategy was to first select a modest set of ORs to confirm the feasibility of the Empty Neuron Drosophila system and newly established SSR setup, while taking advantage of having a set of termite pheromones, including those previously identified in the P. simplex model, some of them de novo synthesized for this project. Then, it was a matter of good luck to hit PsimOR14, which selectively responds to the main component of the genuine P. simplex trail-following pheromone, while the deorphanization attempts for a set of additional P. simplex ORs are currently running. In the revised version of the manuscript, we state the selection criteria for the four ORs studied in the Results section (Phylogenetic reconstruction and candidate OR selection).

      (2) L149: Where is Figure 1A? Does this mean Figure 1?

      Thank you for spotting this mistake. Fig. 1 is now properly labelled as Fig. 1A and 1B in the figure itself and in the legend. The text now also refers to either 1A or 1B.

      (3) Figure 1: The authors also showed the transcription abundance of all 50 ORs of P. simplex in the right bottom of Figure 1, but there is no explanation about it in the main text.

      The heatmap reporting the transcript abundances is now labelled as Fig. 1B and is referred to in the discussion section (in the original manuscript it was referred to in the same place as Fig. 1).

      (4) L260-265: The authors confirmed higher expression of PsimOR14 in workers than soldiers by using RNA-seq data and stronger EAG responses of PsimOR14 to neocembrene in workers than soldiers, but I think that confirming the expression levels of PsimOR14 in workers and soldiers by RT-qPCR would strengthen the authors' argument (it is optional).

      qPCR validation is a suitable complement to read count comparison of RNA Seq data, especially when the data come from one-sample transcriptomes and/or low coverage sequencing. Yet, our RNA Seq analysis is based on sequencing of three independent biological replicates per phenotype (worker heads vs. soldier heads) with ~20 million reads per sample. Thus, the resulting differential gene expression analysis is a sufficient and powerful technique in terms of detection limit and dynamic range.

      We admit that the replicate numbers and origin of the RNA Seq data should be better specified, since the Methods section only referred to the GenBank accession numbers in the original manuscript. Therefore, we added more information in the Methods section (Bioinformatics) and made clear in the Methods that these data come from our previous research and related bioproject.

      (5) L491: I think that "The synthetic processes of these fatty alcohols are ..." is better.

      We replaced the sentence with “The de novo organic synthesis of these fatty alcohols is described …”

      (6) L525 and 527: There are white squares between the number and the unit. Perhaps some characters have been garbled.

      Apparently an error upon file conversion, corrected.

      (7) L795: ORCo?

      Corrected.

      (8) L829-830 & Figure 4: Where is Figure 4D?

      Thank you for spotting this mistake from the older version of Figure 4. The SSR traces referred to in the legend are in fact a part of Figure 5. Moreover, Figure 4 is now reworked based on the comments by Reviewer 3.

      (9) L860-864: Why did the authors select the result of edgeR for the volcano plot in Figure 7 although the authors use both DESeq2 and edgeR? An explanation would be needed.

      Both algorithms, DESeq2 and edgeR, are routinely used for differential gene expression analysis. Since they differ in read count normalization method and statistical testing, we decided to use both of them independently in order to reduce false positives. Because the resulting fold changes were practically identical in both algorithms (results for both analyses are listed in Supplementary table S15), we only reported in Fig. 7 the outputs for edgeR to avoid redundancies. We added to the Results section the information that both techniques listed PsimOR14 among the most upregulated genes in workers.
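      The idea of running two independent analyses to reduce false positives can be illustrated by keeping only genes that both methods call upregulated. The sketch below is not the authors' pipeline and does not call DESeq2 or edgeR; it assumes each tool's results were already exported as a mapping from gene name to (log2 fold change, adjusted p-value). The function name, the thresholds, and the toy gene names are hypothetical.

```python
def consensus_upregulated(results_a, results_b, alpha=0.05, min_log2fc=1.0):
    """Genes called upregulated by both result tables.

    results_a, results_b -- dicts: gene -> (log2 fold change, adjusted p-value)
    alpha, min_log2fc    -- illustrative thresholds, not taken from the study
    """
    def is_up(entry):
        log2fc, padj = entry
        return padj < alpha and log2fc >= min_log2fc

    shared = set(results_a) & set(results_b)  # genes tested by both methods
    return sorted(
        gene for gene in shared
        if is_up(results_a[gene]) and is_up(results_b[gene])
    )


# Toy input with invented values, for illustration only.
method_1 = {"geneA": (2.0, 0.001), "geneB": (1.5, 0.2), "geneC": (-2.0, 0.01)}
method_2 = {"geneA": (2.1, 0.002), "geneB": (1.6, 0.01), "geneC": (-1.9, 0.02)}
print(consensus_upregulated(method_1, method_2))
```

      A gene that passes in only one table (here `geneB`) is dropped, which is the conservative behaviour one would want when the aim is to limit false positives.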

      Reviewer #3 (Recommendations for the authors):

      The discussion contains many descriptions that would fit better into the introduction, where they could be used to hint at the study's importance (e.g., 292-311, 381-412). The remaining parts often lack a detailed discussion of the results that integrates details from other insect studies. Although references were provided, no details were usually outlined. It would be helpful to see a stronger emphasis on what we learn from this study.

      Along with rewriting the introduction, we also modified the discussion. As suggested, the lines 292-311 were rewritten and placed in the introduction. By contrast, we preferred to keep the two paragraphs 381-412 in the discussion, since both of them outline the potential future interesting targets of research on termite ORs.

      As suggested, the discussion has been enriched and now includes comparative examples and relevant references about the broad/narrow selectivity of insect ORs, about the expected breadth of tuning of pheromone receptors vs. ORs detecting environmental cues, about the potential role of additional neurons housed in the neocembrene-detecting sensillum of P. simplex workers, etc. From both introduction and discussion the redundant details on the chemistry of termite communication have been removed.

      This includes explanations of the advantages of the specific methodologies the authors used and how they helped solve the manuscript's problem. What does the phylogeny solve? Was it used to select the ORs tested? It would be helpful to discuss what the phylogeny shows in comparison to other well-studied OR phylogenies, like those from the social Hymenoptera.

      We understand the comment. In fact, our motivation to include the phylogenetic tree of termite ORs was essentially (i) to demonstrate the orthologous nature of OR diversity with few expansions on low taxonomic levels, and (ii) to show graphically the relationship among the four selected sequences. We do not attempt a comprehensive phylogenetic analysis here, because it would be redundant given that we recently published a large OR phylogeny which includes all sequences used in the present manuscript and analyses them in the proper context of related (cockroaches) and unrelated insect taxa (Johny et al., 2023). That paper also compares the termite phylogenetic pattern with those observed in other Insecta. It is repeatedly cited at appropriate places in the present manuscript and its main observations are provided in the Introduction section. Therefore, we feel that a thorough discussion of termite phylogeny would be redundant in the present paper.

      The authors categorized the sensilla types. Potential problems in the categorization aside, it would be helpful to know if it is expected that you have sensilla specialized in perceiving one specific pheromone. What is known about sensilla in other insects?

      We understand. In the discussion of the revised version, we elaborate on the features typical/expected for a pheromone receptor and the sensillum housing this receptor together with two other olfactory sensory neurons, including examples from other insects.

      As the manuscript currently stands, specialist readers with their respective background knowledge would find this study very interesting. In contrast, the general reader would probably fail to appreciate the importance of the results.

      We hope that the re-organized and simplified introduction may now be more intelligible even for non-specialist readers.

      (1) L35: Should "workers" be replaced with "worker antennae"?

      Corrected.

      (2) L62: Should "conservativeness" be replaced by "conservation"?

      Replaced with “parsimony”.

      (3) L129: How and why did the authors choose four candidate ORs? I could not find any information about this in the manuscript. I wondered why they did not pick the more highly expressed PsimOr20 and 26 (Figure 7).

      As already replied above in the Weaknesses section, we selected for the first deorphanization attempts only a modest set of four ORs, while an additional set is currently being tested. We also explained above the inclusion criteria, i.e. (i) full-length ORF and at least 6 unambiguously predicted transmembrane regions, and (ii) presence on different branches (subbranches) of the OR phylogeny. For these reasons, we did not primarily consider the expression patterns of different ORs. As for Fig. 7, it shows differential expression between soldiers and workers, which was not the primary guideline either and the data was obtained only after having the ORs tested by SSR. Yet, even though we had data on P. simplex ORs expression (Fig. 1B), we did not presume that pheromone receptors should be among the most expressed ORs, given the richness of chemical cues detected by worker termites and unlike, e.g., male moths, where ORs for sex pheromones are intuitively highly expressed.

      The strategy of OR selection is specified in the results section of the revised manuscript under “Phylogenetic reconstruction and candidate OR selection”.

      (4) 198 to 200: SI, II, and III look very similar. Additional measurements rather than qualitative descriptions are required to consider them distinct sensilla. The bending of SIII could be an artifact of preparation. I do not see how the authors could distinguish between SI and SII under the optical microscope for recordings. A detailed explanation is required.

      As we responded above in the “Weaknesses” section, we admit that the sensilla classification is not intelligible. Therefore, we decided in the revised version to abandon the classification of sensilla types and only focus on the observations made on the neocembrene-responding sensillum. To recognize the specific sensillum, we used its topology on the last antennal segment. Because termite antennae are not densely populated with sensilla, it is relatively easy to distinguish individual sensilla based on their topology on the antenna, both under the optical microscope and in SEM photographs. The modifications affect Fig. 4, its legend and the corresponding part of the results section (Identification of P. simplex olfactory sensillum responding to neocembrene).

      (5) 208: "Than" instead of "that"

      Corrected.

      (6) 280: I suggest replacing "demand" with "capabilities"

      Corrected.

      (7) 312: Why "nevertheless? It sounds as if the authors suggest that there is evidence that ORs are not important for communication. This should be reworded.

      We removed “Nevertheless” from the beginning of the sentence.

      (8) 321 to 323: This sentence sounds as if something is missing. I suggest rewriting it.

      This sentence simply says that the empty neuron Drosophila is a good tool for termite OR deorphanization and that termite ORs work well with Drosophila ORCo. We reworded the sentence.

      (9) 323: I suggest starting a new paragraph.

      Corrected.

      (10) 421: How many colonies were used for each of the analyses?

      The data for this manuscript were collected from three different colonies collected in Cuba. We now describe in the Materials and Methods section which analyses were conducted with each of the colonies.

      (11) 430: Did the termites originate from one or multiple colonies and did the authors sample from the Florida and Cuba population?

      The data for this manuscript were collected from three different colonies collected in Cuba. We now describe in the Materials and Methods section which analyses were conducted with each of the colonies.

      (12) 501: How was the termite antenna fixated? The authors refer to the Drosophila methods, but given the large antennal differences between these species, more specific information would be helpful.

      Understood. We added the following information into the Methods section under “Electrophysiology”: “The grounding electrode was carefully inserted into the clypeus and the antenna was fixed on a microscope slide using a glass electrode. To avoid the antennal movement, the microscope slide was covered with double-sided tape and the three distal antennal segments were attached to the slide.”

      (13) 509: I want to confirm that the authors indicate that the outlet of the glass tube with the airstream and odorant is 4 cm away from the Drosophila or termite antenna. The distance seems to be very large.

      Thank you for spotting this obvious mistake. The 4 cm distance applies to the distance between the opening for Pasteur pipette insertion into the delivery tube; the outlet itself is situated approx. 1 cm from the antenna. This information is now corrected.

      (14) 510/527: It looks like all odor panels were equally applied onto the filter paper despite the difference in solvent (hexane and paraffin oil). How was the solvent difference addressed?

      In our study we combine two types of odorant panels. First, we test on all four studied receptors a panel containing several compounds relevant for termite chemical communication including the C12 unsaturated alcohols, the diterpene neocembrene, the sesquiterpene (3R,6E)-nerolidol and other compounds. These compounds are stored in the laboratory as hexane solutions to prevent oxidation/polymerization and it is not advisable to transfer them to another solvent. In the second step we used three additional panels of frequently occurring insect semiochemicals, which are stored as paraffin oil solutions, so as to address the breadth of PsimOR14 tuning. We are aware that the evaporation dynamics differ between the two solvents but we had no suitable way to solve this problem. We believe that the use of the two solvents does not compromise the general message on the receptor specificity. For each panel, the corresponding solvent is used as a control. Similarly, the use of two different solvents for SSR can be encountered in other studies, e.g. 10.1016/j.celrep.2015.07.031.

      (15) 518: delta spikes/sec works for all tables except for the wild type in Table S5. I could not figure out how the authors get to delta spikes/sec in that table.

      Thank you for your sharp eye. Due to our mistake, the values of Δ spikes per second reported in Table S5 for W1118 were erroneously calculated using the formula for 0.5 sec stimulation instead of 1 sec. We corrected this mistake which does not impact the results interpretation in Table S5 and Fig. 2.
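
      To illustrate why the stimulation window matters for this kind of value, the sketch below computes a firing-rate change under one common convention: spikes counted during the stimulus minus spikes counted in an equally long window just before it, divided by the window length. This convention, the function name, and the spike times are assumptions for illustration only; the exact formula used for the tables is not given in this response.

```python
def delta_spikes_per_sec(spike_times, stim_onset, stim_duration):
    """Rate during the stimulus minus rate in an equally long pre-stimulus window.

    spike_times, stim_onset and stim_duration are all in seconds.
    """
    if stim_duration <= 0:
        raise ValueError("stim_duration must be positive")
    # Count spikes in the half-open stimulus window [onset, onset + duration).
    during = sum(stim_onset <= t < stim_onset + stim_duration for t in spike_times)
    # Count spikes in the matching window immediately before stimulus onset.
    before = sum(stim_onset - stim_duration <= t < stim_onset for t in spike_times)
    return (during - before) / stim_duration


# Invented spike train with stimulus onset at 1.0 s.
spikes = [0.2, 0.7, 1.1, 1.3, 1.5, 1.8, 1.9]
rate_1s = delta_spikes_per_sec(spikes, stim_onset=1.0, stim_duration=1.0)
rate_half = delta_spikes_per_sec(spikes, stim_onset=1.0, stim_duration=0.5)
print(rate_1s, rate_half)
```

      Applying the 0.5 s window to a recording that used 1 s stimulation gives a different value from the same spike train, which is the kind of discrepancy the reviewer noticed.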

      522: Did the workers and soldiers originate from different colonies or different populations?

      We now clearly describe in the Material and Methods section the origin of termites for different experiments. EAG measurements were made using individuals (workers, soldiers) from one Cuban colony.

      (16) Figure 6C/D: I suggest matching colors between the two figures. For example, instead of using an orange circle in C and a green coloration of the intracellular flap in D, I recommend using blue, which is not used for something else. In addition, the binding pocket could be separated better from anything else in a different color.

      We agree that the color match for the intracellular flap was missing. This figure is now reworked and the colors should have a better match and the binding region is better delineated.

      (17) Figure 7/Table S15: It is unclear where the transcriptome data originate and what they are based on. Are these antennal transcriptomes or head transcriptomes? Do these data come from previous data sets or data generated in this study? Figure 7 refers to heads, Table S15 to workers and soldiers, and the methods only refer to antennal extractions. This should be clarified in the text, the figure, and the table.

      We admit that the replicate numbers and origin of the RNA-seq data should be better specified and that the information that the RNA-seq data originated from samples of heads+antennae of workers and soldiers should be provided at appropriate places. Therefore, we added more information on replicates and origin of the data in the Methods section (Bioinformatics), made clear that these data come from our previous research, and referred to the corresponding bioproject. Likewise, the Figure 7 legend and Table S15 heading have been updated.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors examine how probabilistic reversal learning is affected by dopamine by studying the effects of methamphetamine (MA) administration. Based on prior evidence that the effects of pharmacological manipulation depend on baseline neurotransmitter levels, they hypothesized that MA would improve learning in people with low baseline performance. They found this effect, and specifically found that MA administration improved learning in noisy blocks, by reducing learning from misleading performance, in participants with lower baseline performance. The authors then fit participants' behavior to a computational learning model and found that an eta parameter, responsible for scaling learning rate based on previously surprising outcomes, differed in participants with low baseline performance on and off MA.

      Questions:

      (1) It would be helpful to confirm that the observed effect of MA on the eta parameter is responsible for better performance in low baseline performers. If performance on the task is simulated for parameters estimated for high and low baseline performers on and off MA, does the simulated behavior capture the main behavioral differences shown in Figure 3?

      We thank the reviewer for this suggestion. We agree that the additional simulation provides valuable confirmation of the effect of methamphetamine (MA) on the eta parameter and subsequent choice behavior. Using individual maximum likelihood parameter estimates, we simulated task performance and confirmed that the simulated behavior reflects the observed mean behavioral differences. Specifically, the simulation demonstrates that MA increases performance later in learning for stimuli with less predictable reward probabilities, particularly in subjects with low baseline performance (mean ± SD: simPL low performance: 0.69 ± 0.01 vs. simMA low performance: 0.72 ± 0.01; t(46) = -2.00, p = 0.03, d = 0.23).

      We have incorporated this analysis into the manuscript. Specifically, we added a new figure to illustrate these findings and updated the text accordingly. Below, we detail the changes made to the manuscript.

      From the manuscript page 12, line 25:

      “Sufficiency of the model was evaluated through posterior predictive checks that matched behavioral choice data (see Figure 4D-F and Figure 5) and model validation analyses (see Supplementary Figure 2). Specifically, using individual maximum likelihood parameter estimates, we simulated task performance and confirmed that MA increases performance later in learning for stimuli with less predictable reward probabilities, particularly in subjects with low baseline performance (Figure 5A; mean ± SD: simPL low performance: 0.69 ± 0.01 vs. simMA low performance: 0.72 ± 0.01; t(46) = -2.00, p = 0.03, d = 0.23).”

      (2) In Figure 4C, it appears that the main parameter difference between low and high baseline performance is inverse temperature, not eta. If MA is effective in people with lower baseline DA, why is the effect of MA on eta and not IT?

      Thank you for raising this important point. It is correct that the primary difference between the low and high baseline performance groups in the placebo session lies in the inverse temperature (mean (SD); low baseline performance: 2.07 (0.11) vs. high baseline performance: 2.95 (0.07); t(46) = -5.79, p = 5.8442e-07, d = 1.37). However, there is also a significant difference in the eta parameter between these groups during the placebo session (low baseline performance: 0.33 (0.02) vs. high baseline performance: 0.25 (0.02); t(46) = 2.59, p = 0.01, d = 0.53).

      Interestingly, the difference in eta is resolved by MA (mean (SD); low baseline performance: 0.24 (0.02) vs. high baseline performance: 0.23 (0.02); t(46) = 0.39, p = 0.70, d = 0.08), while the difference in inverse temperature remains unaffected (mean (SD); low baseline performance: 2.16 (0.11) vs. high baseline performance: 2.99 (0.08); t(46) = -5.38, p < .001, d = 1.29). Moreover, we checked the distribution of the inverse temperature estimates on/off drug to ensure the absent drug effect is not driven by outliers. Here, we do not observe any descriptive drug effect (see Author response image 1). Additionally, non-parametric tests indicate no drug effect (Wilcoxon signed-rank test; across groups: zval = -0.59; p = 0.55; low baseline performance: zval = -0.54; p = 0.58; high baseline performance: zval = -0.21; p = 0.83).

      Author response image 1.

      The inverse temperature distribution on/off drug suggests that this parameter is not affected by the drug. Inverse temperature for low (blue points) and high (yellow points) baseline performers tended not to be affected by the drug (Wilcoxon signed-rank test; across groups: zval = -0.59; p = 0.55; low baseline performance: zval = -0.54; p = 0.58; high baseline performance: zval = -0.21; p = 0.83).

      This pattern of results might suggest that MA specifically affects eta but not other parameters like the inverse temperature, pointing to a selective influence on a single computational mechanism. To verify this conclusion, we extended the winning model by allowing each parameter in turn to be differentially estimated for MA and placebo, while keeping other parameters fixed to the group (low and high baseline performance) mean estimates of the winning model fit to choice behaviour of the placebo session.

      These control analyses confirmed that MA does not affect inverse temperature in either the low baseline performance group or the high baseline performance group. Similarly, MA did not affect the play bias or learning rate intercept parameter. Yet, it did affect eta in the low performer group (see supplementary table 1 reproduced below).

      Taken together, our data suggest that only the parameter controlling dynamic adjustments of the learning rate based on recent prediction errors, eta, was affected by our pharmacological manipulation and that the parameters of our models did not trade off. A similar effect has been observed in a previous study investigating the effects of catecholaminergic drug administration in a probabilistic reversal learning task (Rostami Kandroodi et al., 2021). In that study, the authors demonstrated that methylphenidate influenced the inverse learning rate parameter as a function of working memory span, assessed through a baseline cognitive task. Similar to our findings, they did not observe drug effects on other parameters in their model including the inverse temperature.

      We have updated the section of the manuscript where we discuss the difference in inverse temperature between low and high performers in the task. From the manuscript (page 19, line 13):

      “While eta seemed to account for the differences in the effects of MA on performance in our low and high performance groups, it did not fully explain all performance differences across the two groups (see Figure 1C and Figure 7A/B). When comparing other model parameters between low and high baseline performers across drug sessions, we found that high baseline performers displayed higher overall inverse temperatures (2.97(0.05) vs. 2.11 (0.08); t(93) = 7.94, p < .001, d = 1.33). This suggests that high baseline performers displayed higher transfer of stimulus values to actions leading to better performance (as also indicated by the positive contribution of this parameter to overall performance in the GLM). Moreover, they tended to show a reduced play bias (-0.01 (0.01) vs. 0.04 (0.03); t(93) = -1.77, p = 0.08, d = 0.26) and increased intercepts in their learning rate term (-2.38 (0.364) vs. -6.48 (0.70); t(93) = 5.03, p < .001, d = 0.76). Both of these parameters have been associated with overall performance (see Figure 6A). Thus, overall performance difference between high and low baseline performers can be attributed to differences in model parameters other than eta. However, as described in the previous paragraph, differential effects of MA on performance on the two groups were driven by eta.

      This pattern of results suggests that MA specifically affects the eta parameter while leaving other parameters, such as the inverse temperature, unaffected. This points to a selective influence on a single computational mechanism. To verify this conclusion, we extended the winning model by allowing each parameter, in turn, to be differentially estimated for MA and PL, while keeping the other parameters fixed at the group (low and high baseline performance) mean estimates of the winning model for the placebo session. These control analyses confirmed that MA affects only the eta parameter in the low-performer group and that there is no parameter-trade off in our model (see Supplementary Table 1). A similar effect was observed in a previous study investigating the effects of catecholaminergic drug administration on a probabilistic reversal learning task (Rostami Kandroodi et al., 2021). In that study, methylphenidate was shown to influence the inverse learning rate parameter (i.e., decay factor for previous payoffs) as a function of working memory span, assessed through a baseline cognitive task. Consistent with our findings, no drug effects were observed on other parameters in their model, including the inverse temperature.”

      Additionally, we summarized the results in a supplementary table:

      Also, this parameter is noted as temperature but appears to be inverse temperature as higher values are related to better performance. The exact model for the choice function is not described in the methods.

      We thank the reviewer for bringing this to our attention. The reviewer is correct that we intended to refer to the inverse temperature. We have corrected this mistake throughout the manuscript and added information about the choice function to the methods section.

      From the manuscript (page 37, line 3):

      On each trial, this value term was transferred into a “biased” value term ($V_B(X_t) = B_{play} + Q_t(X_t)$, where $B_{play}$ is the play bias term) and converted into action probabilities ($P(\text{play} \mid V_{B,\text{play}}(t)(X_t))$; $P(\text{pass} \mid V_{B,\text{pass}}(t)(X_t))$) using a softmax function with an inverse temperature ($\beta$):
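
      A two-action softmax with an inverse temperature can be sketched as follows. This is a generic illustration of the technique rather than the authors' implementation: the function name is hypothetical, and treating the value of passing as a fixed reference of zero is an assumption made here for the example.

```python
import math


def choice_probabilities(q_value, play_bias, beta, pass_value=0.0):
    """Softmax over two actions (play, pass) with inverse temperature beta.

    q_value    : learned value Q_t(X_t) of the current stimulus
    play_bias  : additive bias towards playing (B_play)
    beta       : inverse temperature; larger values give more deterministic choices
    pass_value : value assigned to passing (assumed 0 here, not taken from the source)
    """
    v_play = play_bias + q_value  # biased value term V_B(X_t)
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(beta * v_play, beta * pass_value)
    e_play = math.exp(beta * v_play - shift)
    e_pass = math.exp(beta * pass_value - shift)
    total = e_play + e_pass
    return e_play / total, e_pass / total


p_play_low_beta, _ = choice_probabilities(q_value=0.5, play_bias=0.1, beta=1.0)
p_play_high_beta, _ = choice_probabilities(q_value=0.5, play_bias=0.1, beta=3.0)
print(p_play_low_beta, p_play_high_beta)
```

      With the same values, a higher inverse temperature pushes the play probability further from 0.5, which matches the interpretation that higher values reflect a stronger transfer of stimulus values to actions.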

      Reviewer #1 (Recommendations for the authors):

      (1) Given that the task was quite long (700+ trials), were there any fatigue effects or changes in behavior over the course of the task?

      To address the reviewer’s comment, we regressed each participant’s single-trial log-scaled RT and accuracy (binary variable reflecting whether a participant displayed stimulus-appropriate behavior on each trial) onto the trial number as a proxy of time on task. Individual participants’ t-values for the time on task regressor were then tested at the group level via two-sided t-tests against zero and compared across sessions and baseline performance groups. The results of these two regression models are shown in the supplementary table 2 and raw data splits in supplementary figure S7. Results demonstrate that the choice behavior was not systematically affected over the course of the task. This effect was not different between low and high baseline performers and not affected by the drug. In contrast, participants’ reaction time decreased over the course of the task and this speeding was enhanced by MA, particularly in the low performance group.

      We added the following section to the supplementary materials and refer to this information in the task description section of the manuscript (page 35, line 26):

      “Time-on-Task Effects

      Given the length of our task, we investigated whether fatigue effects or changes in behavior occurred over time. Specifically, we regressed each participant's single-trial log-scaled reaction times (RT) and accuracy (a binary variable reflecting whether participants displayed stimulus-appropriate behavior on each trial) onto trial number, which served as a proxy for time on task. The resulting t-values for the time-on-task regressor were analyzed at the group level using two-sided t-tests against zero and compared across sessions and baseline performance groups. The results of these regression models are presented in Supplementary Table S2, with raw data splits shown in Supplementary Figure S3.

      Our findings indicate that choice behavior was not systematically affected over the course of the task. This effect did not differ between low and high baseline performers and was not influenced by the drug. In contrast, reaction times decreased over the course of the task, with this speeding effect being enhanced by MA, particularly in the low-performance group.”
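
      The two-stage logic of this time-on-task analysis (a per-participant regression on trial number, then a group-level test of the resulting t-values against zero) can be sketched with the standard library alone. The data below are simulated, the function names are hypothetical, and only the reaction-time regression is shown; this is an illustration of the approach, not the analysis code used for the manuscript.

```python
import math
import random
from statistics import mean, stdev


def slope_t_value(x, y):
    """t-statistic of the slope in a simple linear regression of y on x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares gives the standard error of the slope.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se


def one_sample_t(values):
    """One-sample t-statistic against zero (df = len(values) - 1)."""
    return mean(values) / (stdev(values) / math.sqrt(len(values)))


# Simulated participants whose log RT decreases slightly with trial number.
random.seed(0)
trials = list(range(1, 101))
t_values = []
for _ in range(5):
    log_rt = [6.5 - 0.005 * t + random.gauss(0, 0.1) for t in trials]
    t_values.append(slope_t_value(trials, log_rt))

group_t = one_sample_t(t_values)
print(t_values, group_t)
```

      A negative group-level statistic here corresponds to speeding over the course of the task; comparing the per-participant t-values between sessions or groups would follow the same pattern with a two-sample or paired test.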

      (2) Figure 5J is hard to understand given the lack of axis labels on some of the plots. Also, the scatter plot is on the left, not the right, as stated in the legend.

      We agree that this part of the figure was difficult to understand. To address this issue, we have separated it from Figure 5, added axis labels for clarity, and reworked the figure caption.

      (3) The data and code were not available for review.

      Thank you for pointing this out. The data and code are now made publicly available on GitHub: https://github.com/HansKirschner/REFIT_Chicago_public.git

      We updated the respective section in the manuscript:

      Data Availability Statement All raw data and analysis scripts can be accessed at: https://github.com/HansKirschner/REFIT_Chicago_public.git

      Reviewer #2 (Public review):

      Summary:

      Kirschner and colleagues test whether methamphetamine (MA) alters learning rate dynamics in a validated reversal learning task. They find evidence that MA can enhance performance for low-performers and that the enhancement reflects a reduction in the degree to which these low-performers dynamically up-regulate their learning rates when they encounter unexpected outcomes. The net effect is that poor performers show more volatile learning rates (e.g. jumping up when they receive misleading feedback), when the environment is actually stable, undermining their performance over trials.

      Strengths:

      The study has multiple strengths including large sample size, placebo control, double-blind randomized design, and rigorous computational modeling of a validated task.

      Weaknesses:

      The limitations, which are acknowledged, include that the drug they use, methamphetamine, can influence multiple neuromodulatory systems including catecholamines and acetylcholine, all of which have been implicated in learning rate dynamics. They also do not have any independent measures of any of these systems, so it is impossible to know which is having an effect.

      Another limitation that the authors should acknowledge is that the fact that participants were aware of having different experiences in the drug sessions means that their blinding was effectively single-blind (to the experimenters) and not double-blind. Relatedly, it is difficult to know whether subjective effects of drugs (e.g. arousal, mood, etc.) might have driven differences in attention, causing performance enhancements in the low-performing group. Do the authors have measures of these subjective effects that they could include as covariates of no interest in their analyses?

      We thank the reviewer for highlighting this complex issue. ‘Double blind’ may refer to masking the identity of the drug before administration, or to the subjects’ stated identifications after any effects have been experienced. In our study, the participants were told that they might receive a stimulant, sedative or placebo on any session, so before the sessions their expectations were blinded. After receiving the drug, most participants reported feeling stimulant-like effects on the drug session, but not all of them correctly identified the substance as a stimulant. We note that many subjects identified placebo as ‘sedative’. Author response image 2 indicates how the participants identified the substance they received.

      Author response image 2.

      Substance identification.

      We share the reviewer’s interest in the extent to which mood effects of drugs are correlated with the drugs’ other effects, including cognitive function. To address this in the present study, we compared the subjective responses to the drug in participants who were low- or high-performers at baseline on the task. The low- and high-baseline performers did not differ in their subjective drug effects, including ‘feel drug’ or stimulant-like effects (see Figure 1 from the manuscript reproduced below; peak change from baseline scores for feel drug ratings on drug: low baseline performer: 48.36 (4.29) vs. high baseline performer: 47.21 (4.44); t(91) = 0.18, p = 0.85, d = 0.03; ARCI-A score: low baseline performer: 4.87 (0.43) vs. high baseline performer: 4.00 (0.418); t(91) = 1.43, p = 0.15, d = 0.30). Moreover, task performance in the drug session was not correlated with the subjective effects (peak “feel drug” effect: r(94) = 0.09, p = 0.41; peak “stimulant like” effect: r(94) = -0.18, p = 0.07).

      We have added details of these additional analyses to the manuscript. Since there were no significant differences in subjective drug effects between low- and high-baseline performers, and these effects were not systematically associated with task performance, we did not include these measurements as covariates in our analyses. Furthermore, as both subjective measurements indicate a similar pattern, we have chosen not to report the ARCI-A effects in the manuscript.

      From the manuscript (page 6, line 5ff):

      “Subjective drug effects

      MA administration significantly increased ‘feel drug effect’ ratings compared to PL, at 30, 50, 135, 180, and 210 min post-capsule administration (see Figure 1; Drug x Time interaction F(5,555) = 38.46, p < 0.001). In the MA session, no differences in the ‘feel drug effect’ were observed between low and high baseline performers, including peak change-from-baseline ratings (rating at 50 min post-capsule: low baseline performer: 48.36 (4.29) vs. high baseline performer: 47.21 (4.44); t(91) = 0.18, p = 0.85, d = 0.03; rating at 135 min post-capsule: low baseline performer: 37.27 (4.15) vs. high baseline performer: 45.38 (3.84); t(91) = 1.42, p = 0.15, d = 0.29).”

      Reviewer #2 (Recommendations for the authors):

      I was also concerned about the distinctions between the low- and high-performing groups. It is unclear why, except for simplicity of presentation, they chose to binarize the sample into high and low performers. I would like to know if the effects held up if they analyzed interactions with individual differences in performance and not just a binarized high/low group membership. If the individual difference interactions do not hold up, I would like to know the authors' thoughts on why they do not.

      Thank you for raising this important issue. We chose a binary discretization of baseline performance to simplify the analysis and presentation. However, we acknowledge that this simplification may limit the interpretability of the results.

      To address the reviewer’s concern, we conducted additional linear mixed-effects model (LMM) analyses, focusing on the key findings reported in the manuscript. See supplementary materials section “Linear mixed effects model analyses for key findings”

      From the manuscript (page 30, line 4ff):

      “Methamphetamine performance enhancement depends on initial task performance

      Another key finding of the current study is that the benefits of MA on performance depend on the baseline task performance. Specifically, we found that MA selectively improved performance in participants that performed poorly in the baseline session. However, it should be noted that all the drug x baseline performance interactions, including for the key computational eta parameter, did not reach the statistical threshold, and only tended towards significance. We used a binary discretization of baseline performance to simplify the analysis and presentation. To parse out the relationship between methamphetamine effects and baseline performance in finer detail, we conducted additional linear mixed-effects model (LMM) analyses using a sliding window regression approach (see supplementary results and supplementary figures S4 and S5). A key thing to notice in the sliding regression results is that, while each regression reveals that drug effects depend on baseline performance, they do so non-linearly, with most variables of interest showing a saturating effect at low baseline performance levels and the strongest slope (dependence on baseline) at or near the median level of baseline performance, explaining why our median splits were able to successfully pick up on these baseline-dependent effects. Together, these results suggest that methamphetamine primarily affects moderately low baseline performers. It is worth highlighting again that we had a separate baseline measurement from the placebo session, allowing us to investigate baseline-dependent changes while avoiding typical concerns in such analyses like regression to the mean (Barnett et al., 2004). This design enhances the robustness of our baseline-dependent effects.”

      See supplementary materials section “Linear mixed effects model analyses for key findings”
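
      The sliding-window idea described in the quoted passage can be sketched as follows. This is a simplified illustration and not the analysis reported in the supplement: it replaces the per-window LMM with a plain mean of the within-subject drug effect (MA minus PL), and the function name, window size, and toy data are all hypothetical.

```python
from statistics import mean

def sliding_window_effect(baseline, drug_effect, window):
    """Mean drug effect within overlapping windows of participants
    sorted by baseline performance.

    baseline    : per-participant baseline scores
    drug_effect : per-participant MA-minus-PL difference (same order)
    window      : number of participants per window (hypothetical choice)
    Returns a list of (mean baseline, mean drug effect), one per window.
    """
    if len(baseline) != len(drug_effect):
        raise ValueError("inputs must have equal length")
    if not 1 <= window <= len(baseline):
        raise ValueError("window must be between 1 and the sample size")
    # Sort participants from lowest to highest baseline performance.
    pairs = sorted(zip(baseline, drug_effect))
    out = []
    for start in range(len(pairs) - window + 1):
        chunk = pairs[start:start + window]
        out.append((mean(b for b, _ in chunk), mean(e for _, e in chunk)))
    return out

# Toy data (made up): the drug effect shrinks as baseline improves.
baseline = [0.55, 0.60, 0.65, 0.70, 0.75, 0.80]
effect = [0.06, 0.05, 0.04, 0.01, 0.00, 0.00]
for b, e in sliding_window_effect(baseline, effect, window=3):
    print(f"baseline={b:.2f}  effect={e:.3f}")
```

      Plotting the windowed effect against the windowed baseline is one way to see where the dependence on baseline is steepest, which is the property the median split relies on.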

      Perhaps relatedly, in multiple analyses, the authors point out that there are drug effects for the low-performance group, but not the high-performance group. This could reflect the well-documented baseline-dependency effect of catecholaminergic drugs. However, it might also reflect the fact that the high-performance group is closer to their ceiling. So, a performance-enhancement drug might not have any room to make them better. Note that their results are not consistent with inverted-U-like effects, previously described, where high performers actually get worse on catecholaminergic drugs.

      Given that the authors have the capacity to simulate performance as a function of parameter values, they could specifically simulate how much better performance could get if their high-performance group all moved proportionally closer to optimal levels of the parameter eta. On the basis of that analysis do they have any evidence that they had the power to detect an effect in the high performance group? If not, they should just acknowledge that ceiling effects might have played a role for high performers.

      We agree with the reviewer's interpretation of the results. First, when plotting overall task performance and the probability of correct choices in the high outcome noise condition—the condition where we observe the strongest drug-induced performance enhancement—we find minimal performance variation among high baseline performers. In both testing sessions, high baseline performers cluster around optimal performance, with little evidence of drug-induced changes (see Supplementary Figure 6).

      Furthermore, performance simulations using (a) optimal eta values and (b) observed eta values from the high baseline performance group reveal only a small, non-significant performance difference (points optimal eta: 701.91 (21.66) vs. points high performer: 694.47 (21.71); t(46) = 2.84, p = 0.07, d = 0.059).

      These results suggest that high baseline performers are already near optimal performance, limiting the potential for drug-related performance improvements. We have incorporated this information into the manuscript (page 30, line 24ff).

      “It is important to note, that MA did not bring performance of low baseline performers to the level of performance of high baseline performers. We speculate that high performers gained a good representation of the task structure during the orientation practice session, taking specific features of the task into account (change point probabilities, noise in the reward probabilities). This is reflected in a large signal to noise ratio between real reversals and misleading feedback. Because the high performers already perform the task at a near-optimal level, MA may not further enhance performance (see Supplementary Figure S6 for additional evidence for this claim). Intriguingly, the data do not support an inverted-u-shaped effect of catecholaminergic action (Durstewitz & Seamans, 2008; Goschke & Bolte, 2018) given that performance of high performers did not decrease with MA. One could speculate that catecholamines are not the only factor determining eta and performance. Perhaps high performers have a generally more robust/resilient decision-making system which cannot be perturbed easily. Probably one would need even higher doses of MA (with higher side effects) to impair their performance.”

      Finally, I am confused about why participants are choosing correctly at higher than 50% on the first trial after a reversal (see Figure 3)? How could that be right? If it is not, does this mean that there is a pervasive error in the analysis pipeline?

      Thank you for pointing this out. The observed pattern is an artifact of the smoothing (±2 trials) applied to the learning curves in Figure 3. Below, we reproduce the figure without smoothing.
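
      To illustrate why a ±2-trial smoothing window can lift the apparent accuracy at the reversal trial, here is a minimal sketch. The learning curve is made up and the function name is hypothetical; it only demonstrates the mechanism and does not reproduce the Figure 3 pipeline.

```python
def centered_moving_average(values, half_width=2):
    """Smooth a sequence with a centered window of +/- half_width trials.
    Windows are truncated at the edges of the sequence."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Made-up learning curve: p(correct) per trial, reversal at index 3.
raw = [0.80, 0.80, 0.80, 0.50, 0.55, 0.60, 0.65]
smooth = centered_moving_average(raw, half_width=2)
print(f"raw at reversal:      {raw[3]:.2f}")
print(f"smoothed at reversal: {smooth[3]:.2f}")
```

      In this toy example the smoothed value at the reversal trial (0.65) exceeds the raw value (0.50) because the window averages in two high-accuracy pre-reversal trials.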

      Additionally, we confirm that the probability of choosing the correct response is not above chance level (t-test against chance):

      • All reversals: t(93) = 1.64, p = 0.10, d = 0.17, 99% CI [0.49, 0.55]

      • Reversal to low outcome noise: t(93) = 1.67, p = 0.10, d = 0.17, 99% CI [0.49, 0.56]

      • Reversal to high outcome noise: t(93) = 0.87, p = 0.38, d = 0.09, 99% CI [0.47, 0.56]
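
      For readers who want to see how such a test against chance is computed, a minimal sketch follows. It uses only the Python standard library, so it returns the t statistic, degrees of freedom, and Cohen's d but no p-value (that would need a t-distribution CDF). The function name and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, mu=0.5):
    """One-sample t statistic, degrees of freedom, and Cohen's d
    for testing the sample mean against a fixed value mu."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    m = mean(values)
    sd = stdev(values)          # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("zero variance: t is undefined")
    d = (m - mu) / sd           # Cohen's d for a one-sample design
    t = d * sqrt(n)             # equivalent to (m - mu) / (sd / sqrt(n))
    return t, n - 1, d

# Toy data (made up): per-participant p(correct) on the reversal trial.
t, df, d = one_sample_t([0.4, 0.5, 0.6, 0.7], mu=0.5)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

      The relation d = t / √n also gives a quick consistency check on the values reported above: with 94 participants (df = 93), t = 1.64 corresponds to d ≈ 0.17 and t = 0.87 to d ≈ 0.09.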

      We have amended the caption of Figure 3 accordingly. Moreover, we included an additional figure in this revision letter (Author response image 4) showing a clear performance drop to approximately 50% correct choices across all sessions, indicating random-choice behavior at the point of reversal. Notably, this performance is slightly better than expected (i.e., the inverse of pre-reversal performance). One possible explanation is that participants developed an expectation of the reversal, leading to increased reversal behaviour around reversals.

      Author response image 3.

      Learning curves after reversals suggest that methamphetamine improves learning performance in phases of less predictable reward contingencies in low baseline performers. Top panel of the Figure shows learning curves after all reversals (A), reversals to stimuli with less predictable reward contingencies (B), and reversals to stimuli with high reward probability certainty (C). Bottom panel displays the learning curves stratified by baseline performance for all reversals (D), reversals to stimuli with less predictable reward probabilities (E), and reversals to stimuli with high reward probability certainty (F). Vertical black lines divide learning into early and late stages as suggested by the Bai-Perron multiple break point test. Results suggest no clear differences in the initial learning between MA and PL. However, learning curves diverged later in learning, particularly for stimuli with less predictable rewards (B) and in subjects with low baseline performance (E). Note. PL = Placebo; MA = methamphetamine; Mean/SEM = line/shading.

      Author response image 4.

      Adaptive behavior following reversals. Each graph shows participants' performance (i.e., stimulus-appropriate behavior: playing good stimuli with 70/80% reward probability and passing on bad stimuli with 20/30% reward probability) around reversals for the (A) orientation session, (B) placebo session, and (C) methamphetamine session. Trial 0 corresponds to the trial when reversals occurred, unbeknownst to participants. Participants' performance exhibited a fast initial adaptation to reversals, followed by a slower, late-stage adjustment to the new stimulus-reward contingencies, eventually reaching a performance plateau. Notably, we observe a clear performance drop to approximately 50% correct choices across all sessions, indicating random-choice behavior at the point of reversal. This performance is slightly better than expected (i.e., the inverse of pre-reversal performance). One possible explanation is that participants developed an expectation of the reversal, leading to increased reversal behaviour around reversals.

      Minor comments:

      (1) I'm unclear on what the analysis in 6E tells us. What does it mean that the marginal effect of eta on performance predicts changes in performance? Also, if multiple parameters besides eta (e.g. learning rate) are strongly related to actual performance, why should it be that only marginal adjustments to eta in the model anticipate actual performance improvements when marginal adjustments to other model parameters do not?

      We agree that these simulations are somewhat difficult to interpret and have therefore decided to omit these analyses from the manuscript. Our key point was that individuals who benefited the most from methamphetamine were those who exhibited the most advantageous eta adjustments in response to it. We believe this is effectively illustrated by the example individual shown in Figure 8D.

      (2) Does the vertical black line in Figure 1 show when the tasks were completed, as it says in the caption, or when the task starts, as it indicates in the figure itself?

      Apologies for the confusion. There was a mistake in the figure caption—the vertical line indicates the time when the task started (60 minutes post-capsule intake). We have corrected this in the figure caption.

      (3) The marginally significant drug x baseline performance group interaction does not support strong inferences about differences in drug effects on eta between groups...

      We agree and have added information on this limitation to the Discussion. Additionally, we have addressed the complex relationship between drug effects and baseline performance in the supplementary analyses, as detailed in our previous response regarding the binary discretization of baseline performance.

      (4) Should lines 10-11 on page 12 say "We did not find drug-related differences in any other model parameters..."?

      Thank you for bringing this grammatical error to our attention. We have corrected it.

      (5) It would be good to confirm that the effect of MA on p(Correct after single MFB) does not have an opposite sign from the effect of MA on p(Correct after double MFB). I'm guessing the effect after single is just weak, but it would be good to confirm they are in the same direction so that we can be confident the result is not picking up on spurious relationships after two misleading instances of feedback.

      We confirm that the direction of the effect between eta and p(Correct after single MFB) is similar to p(Correct after double MFB). First, we see a similar negative association between p(Correct after single MFB) and eta (r(94) = -.26, p = 0.01). Similarly, there was a descriptive increase in p(Correct after single MFB) for low baseline performers on- vs. off-drug (p(Correct after single MFB): low baseline performance PL: 0.71 (0.02) vs. low baseline performance MA: 0.73 (0.02); t(46) = 1.27, p = 0.20, d = 0.17).

      (6) "implemented equipped" seems like a typo on page 16, line 26

      Thank you for bringing this typo to our attention. We have corrected it.

      Reviewing Editor (Public Review):

      Summary:

      In this well-written paper, a pharmacological experiment is described in which a large group of volunteers is tested on a novel probabilistic reversal learning task with different levels of noise, once after intake of methamphetamine and once after intake of placebo. The design includes a separate baseline session, during which performance is measured. The key result is that drug effects on learning rate variability depend on performance in this separate baseline session.

      The approach and research question are important, the results will have an impact, and the study is executed according to current standards in the field. Strengths include the interventional pharmacological design, the large sample size, the computational modeling, and the use of a reversal-learning task with different levels of noise.

      (i) One novel and valuable feature of the task is the variation of noise (having 70-30 and 80-20 conditions). This nice feature is currently not fully exploited in the modeling of the task and the data. For example, recently reported new modeling approaches for disentangling two types of uncertainty (stochasticity vs volatility) could be usefully leveraged here (by Piray and Daw, 2021, Nat Comm). The current 'signal to noise ratio' analysis that is targeting this issue relies on separately assessing learning rates on true reversals and learning rates after misleading feedback, in a way that is experimenter-driven. As a result, this analysis cannot capture a latent characteristic of the subject's computational capacity.

      We thank the reviewing editor for the positive evaluation of our work and the suggestion to leverage new modeling approaches. In light of the Piray and Daw paper, it is noteworthy that the choice behavior of the low performance group in our sample mimics the behavior of their lesioned model, in which stochasticity is assumed to be small and constant. Specifically, low performers displayed higher learning rates, particularly in high outcome noise phases of our task. One possible interpretation of this choice pattern is that they have difficulty distinguishing volatility from noise. Consistent with this, surprising outcomes may be misattributed to volatility instead of stochasticity, resulting in increased learning rates and overadjustments to misleading outcomes. This issue surfaces particularly in phases of high stochasticity in our task. Interestingly, methamphetamine seems to reduce this misattribution.

      In an exploratory analysis, we fit two models to our task structure using modified code provided with the Piray and Daw paper. The control model made inferences about both volatility and stochasticity. A key assumption of the model is that the optimal learning rate increases with volatility and decreases with stochasticity. This is because greater volatility raises the likelihood that the underlying reward probability has changed since the last observation, increasing the necessity of relying on new information. In contrast, higher stochasticity reduces the relative informativeness of the new observation compared to prior beliefs about the underlying reward probability. The lesioned model assumed stochasticity to be small and constant. We show the results of these analyses in Figure 9 and Supplementary Figures S5 and S6. Interestingly, we found that the inability to make inferences about stochasticity leads to misestimation of volatility, particularly for high outcome noise phases (Figure 9A-B). Consistently, this led to reduced sensitivity of the learning rate to volatility (i.e., the first ten trials after reversals). The model shows similar behaviour to our low performer group, with reduced accuracy in later learning stages for stimuli with high outcome noise (Figure 9D). Finally, when we fit our model to data simulated from the two models, we see increased eta parameter estimates for the lesioned model. Together, these results may hint towards an overinterpretation of stochasticity in low performers of our task and that methamphetamine has beneficial effects for those individuals as it reduced the oversensitivity to volatility. It should be noted, however, that we did not fit these models to our choice behaviour directly, as this implementation is beyond the scope of our current study. Yet, our exploratory analyses make testable predictions for future research into the effect of catecholamines on the inference of volatility and stochasticity.
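
      The stated relationship between learning rate, volatility, and stochasticity can be illustrated with the steady-state gain of a textbook Kalman filter tracking a Gaussian random walk. This is a generic sketch under that simplifying assumption; it is not the Piray and Daw code, nor the implementation used for Figure 9, and the function name is hypothetical.

```python
from math import sqrt

def steady_state_learning_rate(volatility, stochasticity):
    """Steady-state Kalman gain for a Gaussian random walk.

    volatility    : variance of the latent state's random-walk steps
    stochasticity : variance of the observation (outcome) noise
    """
    if volatility <= 0 or stochasticity <= 0:
        raise ValueError("both variances must be positive")
    v, s = volatility, stochasticity
    # Steady-state prior variance p solves p**2 - v*p - v*s = 0.
    p = (v + sqrt(v * v + 4.0 * v * s)) / 2.0
    # The gain acts as the learning rate applied to each prediction error.
    return p / (p + s)

for v, s in [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]:
    rate = steady_state_learning_rate(v, s)
    print(f"volatility={v}, stochasticity={s}: learning rate={rate:.3f}")
```

      In this sketch, raising the volatility term increases the gain and raising the stochasticity term lowers it, which is the qualitative pattern the response describes. A learner that fixes stochasticity at a small constant value would therefore have to explain noisy outcomes through higher volatility, and so a higher learning rate.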

      We incorporated information on these exploratory analyses into the manuscript and supplementary material.

      From the Results section (page 23, line 12ff):

      “Methamphetamine may reduce misinterpretation of high outcome noise in low performers

      In our task, outcomes are influenced by two distinct sources of noise: process noise (volatility) and outcome noise (stochasticity). Optimal learning rate should increase with volatility and decrease with stochasticity. Volatility was fairly constant in our task (change points around every 30-35 trials). However, misleading feedback (i.e., outcome noise) could be misinterpreted as indicating another change point because participants don’t know the volatility beforehand. Strongly overinterpreting outcome noise as change points will hinder building a correct estimate of volatility and understanding the true structure of the task. Simultaneously estimating volatility and stochasticity poses a challenge, as both contribute to greater outcome variance, making outcomes more surprising. A critical distinction, however, lies in their impact on generated outcomes: volatility increases the autocorrelation between consecutive outcomes, whereas stochasticity reduces it. Recent computational approaches have successfully utilised this fundamental difference to formulate a model of learning based on the joint estimation of stochasticity and volatility (Piray & Daw, 2021; Piray & Daw, 2024). They report evidence that humans successfully dissociate between volatility and stochasticity with contrasting and adaptive effects on learning rates, albeit to varying degrees. Interestingly they show that hypersensitivity to outcome noise, often observed in anxiety disorders, might arise from a misattribution of the outcome noise to volatility instead of stochasticity resulting in increased learning rates and overadjustments to misleading outcomes. It is noteworthy, that we observed a similar hypersensitivity to high outcome noise in low performers in our task that is partly reduced by MA. In an exploratory analysis, we fit two models to our task structure using modified code provided by Piray and Daw (2021) (see Methods for formal Description of the model). 
The control model inferred both volatility and stochasticity. The lesioned model assumed stochasticity to be small and constant. We show the results of these analyses in Figure 9 and Supplementary Figures S7 and S8. We found that the inability to make inferences about stochasticity leads to misestimation of volatility, particularly for high outcome noise phases (Figure 9A-B). Consistently, this led to reduced sensitivity of the learning rate to volatility (i.e., the first ten trials after reversals). The model shows similar behaviour to our low performer group, with reduced accuracy in later learning stages for stimuli with high outcome noise (Figure 9D). Finally, when we fit our model to data simulated from the two models, we see increased eta parameter estimates for the lesioned model. Together, these results may hint towards an overinterpretation of stochasticity in low performers of our task and that MA has beneficial effects for those individuals as it reduced the oversensitivity to volatility. It should be noted, however, that we did not fit these models to our choice behaviour directly, as this implementation is beyond the scope of our current study. Yet, our exploratory analyses make testable predictions for future research into the effect of catecholamines on the inference of volatility and stochasticity.”
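
      The contrast drawn in the quoted passage, that volatility raises and stochasticity lowers the autocorrelation of consecutive outcomes, can be checked analytically in a toy setting. The sketch below assumes a stationary AR(1) latent state observed with independent noise; the decay coefficient is a hypothetical addition that keeps the latent variance finite, and none of this is taken from the Piray and Daw implementation.

```python
def lag1_autocorrelation(volatility, stochasticity, decay=0.9):
    """Lag-1 autocorrelation of outcomes o_t = x_t + noise, where the
    latent state follows x_t = decay * x_{t-1} + step.

    volatility    : variance of the latent step
    stochasticity : variance of the outcome noise
    decay         : AR(1) coefficient in (0, 1); hypothetical, chosen so
                    that the latent variance is finite
    """
    if not 0 < decay < 1:
        raise ValueError("decay must lie strictly between 0 and 1")
    if volatility <= 0 or stochasticity < 0:
        raise ValueError("volatility must be positive, stochasticity non-negative")
    latent_var = volatility / (1.0 - decay ** 2)   # stationary Var(x_t)
    # Cov(o_t, o_{t+1}) = decay * Var(x_t); Var(o_t) = Var(x_t) + noise.
    return decay * latent_var / (latent_var + stochasticity)

for v, s in [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]:
    rho = lag1_autocorrelation(v, s)
    print(f"volatility={v}, stochasticity={s}: lag-1 autocorrelation={rho:.3f}")
```

      Under these assumptions the two noise sources push the autocorrelation in opposite directions, which is the signature a joint-estimation model can exploit to tell them apart.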

      From the discussion (page 28, line 15ff):

      “Exploratory simulation studies using a model that jointly estimates stochasticity and volatility (Piray & Daw, 2021; Piray & Daw, 2024), revealed that MA might reduce the oversensitivity to volatility.”

      See Methods section “Description of the joint estimation of stochasticity and volatility model”.

      (ii) An important caveat is that all the drug x baseline performance interactions, including for the key computational eta parameter did not reach the statistical threshold, and only tended towards significance.

      We agree and have added additional analyses on the issue. See also our response to reviewer 2. There is a consistent effect for low-medium baseline performance. We toned down the reference to low baseline performance but still see strong evidence for a baseline dependency of the drug effect.

      From the manuscript (page 30, line 4ff):

      “Methamphetamine performance enhancement depends on initial task performance

      Another key finding of the current study is that the benefits of MA on performance depend on baseline task performance. Specifically, we found that MA selectively improved performance in participants who performed poorly in the baseline session. However, it should be noted that all the drug x baseline performance interactions, including for the key computational eta parameter, did not reach the statistical threshold and only tended towards significance. We used a binary discretization of baseline performance to simplify the analysis and presentation. To parse the relationship between methamphetamine effects and baseline performance at a finer level of detail, we conducted additional linear mixed-effects model (LMM) analyses using a sliding window regression approach (see supplementary results and supplementary figures S4 and S5). A key thing to notice in the sliding regression results is that, while each regression reveals that drug effects depend on baseline performance, they do so non-linearly, with most variables of interest showing a saturating effect at low baseline performance levels and the strongest slope (dependence on baseline) at or near the median level of baseline performance, explaining why our median splits were able to pick up on these baseline-dependent effects. Together, these results suggest that methamphetamine primarily affects moderately low baseline performers. It is noteworthy that we had a separate baseline measurement from the placebo session, allowing us to investigate baseline-dependent changes while avoiding typical concerns in such analyses, such as regression to the mean (Barnett et al., 2004). This design enhances the robustness of our baseline-dependent effects.”

      (iii) Both the overlap and the differences between the current study and previous relevant work (that is, how this goes beyond prior studies in particular Rostami Kandroodi et al, which also assessed effects of catecholaminergic drug administration as a function of baseline task performance using a probabilistic reversal learning task) are not made explicit, particularly in the introduction.

      Thank you for raising this point. We have added information on the overlap and differences between our paper and the Rostami Kandroodi et al. paper to the Introduction and Discussion.

      In the Introduction, we added a sentence to highlight the Rostami Kandroodi et al. findings (page 3, line 24ff).

      “For example, Rostami Kandroodi et al. (2021) reported that the re-uptake blocker methylphenidate did not alter reversal learning overall, but preferentially improved performance in participants with higher working memory capacity.”

      In our Discussion, we go back to this paper, and say how our findings are and are not consistent with their findings (page 32, line 16ff).

      “Our findings can be contrasted to those of Rostami Kandroodi et al. (2021), who examined effects of methylphenidate on a reversal learning task, in relation to baseline differences on a cognitive task. Whereas Rostami Kandroodi et al. (2021) found that methylphenidate improved performance mainly in participants with higher baseline working memory performance, we found that methamphetamine improved the ability to dynamically adjust learning from prediction errors to a greater extent in participants who performed poorly-to-medium at baseline. There are several possible reasons for these apparently different findings. First, MA and methylphenidate differ in their primary mechanisms of action: MPH acts mainly as a reuptake blocker whereas MA increases synaptic levels of catecholamines by inhibiting the vesicular monoamine transporter 2 (VMAT2) and inhibiting the enzyme monoamine oxidase (MAO). These differences in action could account for differential effects on cognitive tasks. Second, the tasks used by Rostami Kandroodi et al. (2021) and the present study differ in several ways. The Rostami Kandroodi et al. (2021) task assessed responses to a single reversal event during the session whereas the present study used repeated reversals with probabilistic outcomes. Third, the measures of baseline function differed in the two studies: Rostami Kandroodi et al. (2021) used a working memory task that was not used in the drug sessions, whereas we used the probabilistic learning task as both the baseline measure and the measure of drug effects. Further research is needed to determine which of these factors influenced the outcomes.”

      performance effects, but this is not true in the general sense, given that an accumulating number of studies have shown that the effects of drugs like MA depend on baseline performance on working memory tasks, which often but certainly not always correlates positively with performance on the task under study.

      We recognize that there is a large body of research reporting that the effects of stimulant drugs are related to baseline performance, and we have adjusted our wording in the Discussion accordingly. At the same time, numerous published studies report acute effects of drugs without considering individual differences in responses, including baseline differences in task performance.

      Reviewing Editor (Recommendations for the Authors):

      (i) Recently reported modeling approaches for disentangling two types of uncertainty (stochasticity vs volatility) might be usefully leveraged (Piray and Daw, 2021, Nat Comm) to help overcome the shortcomings of the 'signal-to-noise ratio' analysis performed here (learning rates on true reversals minus learning rates after misleading feedback), which is experimenter-driven and thus cannot capture a latent characteristic of the subject's computational capacity.

      Please see our previous response.

      (ii) To highlight more explicitly the fact that various of the key drug x baseline performance interactions did not reach the statistical threshold.

      Please see our previous responses to this issue.

      (iii) To make more explicit, in the introduction, both the overlap and the differences between the current study and previous relevant work (that is, how this goes beyond prior study in particular Rostami Kandroodi et al, which also assessed effects of catecholaminergic drug administration as a function of baseline task performance using a probabilistic reversal learning task).

      Please see our previous response.

      (iv) To revise and tone down, in the discussion section, the statement about novelty, that the existing literature has, to date, overlooked baseline performance effects.

      Please see our previous response.

      (v) It is unclear why the data from the 4th session (under some other sedative drug, which is not mentioned) are not reported. I recommend justifying the details of this manipulation and the decision to omit the report of those results. By analogy 4 other tasks were administered in the current study, but not described. Is there a protocol paper, describing the full procedure?

      Thank you for pointing this out. We added additional information to the Methods section. We are analysing the other cognitive measures in relation to the brain imaging data obtained in sessions 3 and 4, and we therefore consider them beyond the scope of the present paper. We did not administer any sedative drug. However, participants were informed during orientation that they might receive a stimulant, sedative, or placebo on any testing session to maintain blinding of their expectations before each session.

      “Design. The results presented here were obtained from the first two sessions of a larger four-session study (clinicaltrials.gov ID number NCT04642820). During the latter two sessions of the larger study, not reported here, participants participated in two fMRI scans. During the two 4-h laboratory sessions presented here, healthy adults received methamphetamine (20 mg oral; MA) or placebo (PL), in mixed order under double-blind conditions. One hour after ingesting the capsule they completed the 30-min reinforcement reversal learning task. The primary comparisons were on acquisition and reversal learning parameters of reinforcement learning after MA vs PL. Secondary measures included subjective and cardiovascular responses to the drug.”

      “Orientation session. Participants attended an initial orientation session to provide informed consent, and to complete personality questionnaires. They were told that the purpose of the study was to investigate the effects of psychoactive drugs on mood, brain, and behavior. To reduce expectancies, they were told that they might receive a placebo, stimulant, or sedative/tranquilizer. However, participants only received methamphetamine and placebo. They agreed not to use any drugs except for their normal amounts of caffeine for 24 hours before and 6 hours following each session. Women who were not on oral contraceptives were tested only during the follicular phase (1-12 days from menstruation) because responses to stimulant drugs are dampened during the luteal phase of the cycle (White et al., 2002). Most participants (N=97 out of 113) completed the reinforcement learning task during the orientation session as a baseline measurement. This measure was added after the study began. Participants who did not complete the baseline measurement were omitted from the analyses presented in the main text. We ran the key analyses on the full sample (n=109). This sample included participants who completed the task only on the drug sessions. When controlling for session order and number (two vs. three sessions) effects, we see no drug effect on overall performance and learning. Yet, we found that eta was also reduced under MA in the full sample, which also resulted in reduced variability in the learning rate (see supplementary results for more details).”

      “Drug sessions. The two drug sessions were conducted in a comfortable laboratory environment, from 9 am to 1 pm, at least 72 hours apart. Upon arrival, participants provided breath and urine samples to test for recent alcohol or drug use and pregnancy (CLIAwaived Inc, Carlsbad, CA; Alcosensor III, Intoximeters; AimStickPBD, hCG professional, Craig Medical Distribution). Positive tests led to rescheduling or dismissal from the study. After drug testing, subjects completed baseline mood measures, and heart rate and blood pressure were measured. At 9:30 am they ingested capsules (PL or MA 20 mg, in color-coded capsules) under double-blind conditions. Oral MA (Desoxyn, 5 mg per tablet) was placed in opaque size 00 capsules with dextrose filler. PL capsules contained only dextrose. Subjects completed the reinforcement learning task 60 minutes after capsule ingestion. Drug effects questionnaires were obtained at multiple intervals during the session. They completed other cognitive tasks not reported here. Participants were tested individually and were permitted to relax, read or watch neutral movies when they were not completing study measures.”

      (vi) Some features of the model including the play bias parameter require justification, at least by referring to prior work exploring these features.

      We have added information to justify the features of the model.

      From the Methods section:

      “The base model (M1) was a standard Q-learning model with three parameters: (1) an inverse temperature parameter of the softmax function used to convert trial expected values to action probabilities, (2) a play bias term that indicates a tendency to attribute higher value to gambling behavior (Jang et al., 2019), ….

      The two additional learning rate terms—feedback confirmation and modality—were added to the model set, as these factors have been shown to influence learning in similar tasks (Kirschner et al., 2023; Schüller et al., 2020).”
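
      As a rough illustration of the two named ingredients of the base model, the sketch below combines a delta-rule value update with a softmax choice rule that includes a play bias. It is a minimal sketch under stated assumptions, not the authors' M1: the excerpt elides the third parameter, so the learning rate used here is an assumption, as are the function names, the convention that passing has value 0, and all parameter values.

```python
from math import exp

def p_play(q_value, inverse_temperature, play_bias):
    """Softmax probability of playing vs. passing (passing has value 0).
    The play bias is added to the value of playing before the softmax."""
    return 1.0 / (1.0 + exp(-inverse_temperature * (q_value + play_bias)))

def q_update(q_value, reward, learning_rate):
    """Delta-rule update of the expected value after an observed outcome."""
    return q_value + learning_rate * (reward - q_value)

# Toy run with made-up parameter values and outcomes (+1 win, -1 loss).
q = 0.0
for reward in [1, 1, -1, 1]:
    print(f"Q={q:+.3f}  p(play)={p_play(q, 3.0, 0.1):.3f}")
    q = q_update(q, reward, learning_rate=0.3)
```

      With a positive play bias, the agent plays slightly more often than chance even when the expected value is zero, which is the tendency the bias term is meant to capture.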

      Literature

      Doucet, A., & Johansen, A. M. (2011). A tutorial on particle filtering and smoothing: fifteen years later. Oxford University Press.

      Durstewitz, D., & Seamans, J. K. (2008). The dual-state theory of prefrontal cortex dopamine function with relevance to catechol-o-methyltransferase genotypes and schizophrenia. Biol Psychiatry, 64(9), 739-749. https://doi.org/10.1016/j.biopsych.2008.05.015

      Gamerman, D., dos Santos, T. R., & Franco, G. C. (2013). A non-Gaussian family of state-space models with exact marginal likelihood. Journal of Time Series Analysis, 34(6), 625-645. https://doi.org/10.1111/jtsa.12039

      Goschke, T., & Bolte, A. (2018). A dynamic perspective on intention, conflict, and volition: Adaptive regulation and emotional modulation of cognitive control dilemmas. In Why people do the things they do: Building on Julius Kuhl’s contributions to the psychology of motivation and volition. (pp. 111-129). Hogrefe. https://doi.org/10.1027/00540-000

      Jang, A. I., Nassar, M. R., Dillon, D. G., & Frank, M. J. (2019). Positive reward prediction errors during decision-making strengthen memory encoding. Nature Human Behaviour, 3(7), 719-732. https://doi.org/10.1038/s41562-019-0597-3

      Jenkins, D. G., & Quintana-Ascencio, P. F. (2020). A solution to minimum sample size for regressions. PLoS One, 15(2), e0229345. https://doi.org/10.1371/journal.pone.0229345

      Kirschner, H., Nassar, M. R., Fischer, A. G., Frodl, T., Meyer-Lotz, G., Froböse, S., Seidenbecher, S., Klein, T. A., & Ullsperger, M. (2023). Transdiagnostic inflexible learning dynamics explain deficits in depression and schizophrenia. Brain, 147(1), 201-214. https://doi.org/10.1093/brain/awad362

      Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177-190. https://doi.org/10.1016/j.jneumeth.2007.03.024

      Morean, M. E., de Wit, H., King, A. C., Sofuoglu, M., Rueger, S. Y., & O'Malley, S. S. (2013). The drug effects questionnaire: psychometric support across three drug types. Psychopharmacology (Berl), 227(1), 177-192. https://doi.org/10.1007/s00213-012-2954-z

      Murphy, K., & Russell, S. (2001). Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Sequential Monte Carlo methods in practice (pp. 499-515). Springer.

      Piray, P., & Daw, N. D. (2020). A simple model for learning in volatile environments. PLoS Comput Biol, 16(7), e1007963. https://doi.org/10.1371/journal.pcbi.1007963

      Piray, P., & Daw, N. D. (2021). A model for learning based on the joint estimation of stochasticity and volatility. Nature Communications, 12(1), 6587. https://doi.org/10.1038/s41467-021-26731-9

      Piray, P., & Daw, N. D. (2024). Computational processes of simultaneous learning of stochasticity and volatility in humans. Nat Commun, 15(1), 9073. https://doi.org/10.1038/s41467-024-53459-z

      Rostami Kandroodi, M., Cook, J. L., Swart, J. C., Froböse, M. I., Geurts, D. E. M., Vahabie, A. H., Nili Ahmadabadi, M., Cools, R., & den Ouden, H. E. M. (2021). Effects of methylphenidate on reinforcement learning depend on working memory capacity. Psychopharmacology (Berl), 238(12), 3569-3584. https://doi.org/10.1007/s00213-021-05974-w

      Schüller, T., Fischer, A. G., Gruendler, T. O. J., Baldermann, J. C., Huys, D., Ullsperger, M., & Kuhn, J. (2020). Decreased transfer of value to action in Tourette syndrome. Cortex, 126, 39-48. https://doi.org/10.1016/j.cortex.2019.12.027

      West, M. (1987). On scale mixtures of normal distributions. Biometrika, 74(3), 646-648. https://doi.org/10.1093/biomet/74.3.646

      White, T. L., Justice, A. J., & de Wit, H. (2002). Differential subjective effects of D-amphetamine by gender, hormone levels and menstrual cycle phase. Pharmacol Biochem Behav, 73(4), 729-741.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Strengths:

      The study was designed as a 6-month follow-up, with repeated behavioral and EEG measurements through disease development, providing valuable and interesting findings on AD progression and the effect of early-life choline supplementation. Moreover, the behavioral data that suggest an adverse effect of low choline in WT mice are interesting and important beyond the context of AD.

      Thank you for identifying several strengths.

      Weaknesses:

      (1) The multiple headings and subheadings, focusing on the experimental method rather than the narrative, reduce the readability.

      We have reduced the number of headings.

      (2) Quantification of NeuN and FosB in WT littermates is needed to demonstrate rescue of neuronal death and hyperexcitability by high choline supplementation and also to gain further insights into the adverse effect of low choline on the performance of WT mice in the behavioral test.

      We agree and have added WT data for the NeuN and ΔFosB analyses. These data are included in the text and figures. For NeuN, the Figure is Figure 6. For ΔFosB it is Figure 7. In brief, the high choline diet restored NeuN and ΔFosB to the levels of WT mice.

      Below is Figure 6 and its legend to show the revised presentation of data for NeuN. Afterwards is the revised figure showing data for ΔFosB. After that are the sections of the Results that have been revised.

      Author response image 1.

      Choline supplementation improved NeuN immunoreactivity (ir) in hilar cells in Tg2576 animals. A. Representative images of NeuN-ir staining in the anterior DG of Tg2576 animals. (1) A section from a Tg2576 mouse fed the low choline diet. The area surrounded by a box is expanded below. Red arrows point to NeuN-ir hilar cells. Mol=molecular layer, GCL=granule cell layer, HIL=hilus. Calibration for the top row, 100 µm; for the bottom row, 50 µm. (2) A section from a Tg2576 mouse fed the intermediate diet. Same calibrations as for 1. (3) A section from a Tg2576 mouse fed the high choline diet. Same calibrations as for 1. B. Quantification methods. Representative images demonstrate the thresholding criteria used to quantify NeuN-ir. (1) A NeuN-stained section. The area surrounded by the white box is expanded in the inset (arrow) to show 3 hilar cells. The 2 NeuN-ir cells above threshold are marked by blue arrows. The 1 NeuN-ir cell below threshold is marked by a green arrow. (2) After converting the image to grayscale, the cells above threshold were designated as red. The inset shows that the two cells that were marked by blue arrows are red while the cell below threshold is not. (3) An example of the threshold menu from ImageJ showing the way the threshold was set. Sliders (red circles) were used to move the threshold to the left or right of the histogram of intensity values. The final position of the slider (red arrow) was positioned at the onset of the steep rise of the histogram. C. NeuN-ir in Tg2576 and WT mice. Tg2576 mice had either the low, intermediate, or high choline diet in early life. WT mice were fed the standard diet (intermediate choline). (1) Tg2576 mice treated with the high choline diet had significantly more hilar NeuN-ir cells in the anterior DG compared to Tg2576 mice that had been fed the low choline or intermediate diet. 
      The values for Tg2576 mice that received the high choline diet were not significantly different from WT mice, suggesting that the high choline diet restored NeuN-ir. (2) There was no effect of diet or genotype in the posterior DG, probably because the low choline and intermediate diet did not appear to lower hilar NeuN-ir.

      Author response image 2.

      Choline supplementation reduced ∆FosB expression in dorsal GCs of Tg2576 mice. A. Representative images of ∆FosB staining in GCL of Tg2576 animals from each treatment group. (1) A section from a low choline-treated mouse shows robust ∆FosB-ir in the GCL. Calibration, 100 µm. Sections from intermediate (2) and high choline (3)-treated mice. Same calibration as 1. B. Quantification methods. Representative images demonstrating the thresholding criteria established to quantify ∆FosB. (1) A ∆FosB-stained section shows strongly-stained cells (white arrows). (2) A strict thresholding criterion was used to make only the darkest stained cells red. C. Use of the strict threshold to quantify ∆FosB-ir. (1) Anterior DG. Tg2576 mice treated with the choline supplemented diet had significantly less ∆FosB-ir compared to the Tg2576 mice fed the low or intermediate diets. Tg2576 mice fed the high choline diet were not significantly different from WT mice, suggesting a rescue of ∆FosB-ir. (2) There were no significant differences in ∆FosB-ir in posterior sections. D. Methods are shown using a threshold that was less strict. (1) Some of the stained cells that were included are not as dark as those used for the strict threshold (white arrows). (2) All cells above the less conservative threshold are shown in red. E. Use of the less strict threshold to quantify ∆FosB-ir. (1) Anterior DG. Tg2576 mice that were fed the high choline diet had fewer ΔFosB-ir pixels than the mice that were fed the other diets. There were no differences from WT mice, suggesting restoration of ∆FosB-ir by choline enrichment in early life. (2) Posterior DG. There were no significant differences between Tg2576 mice fed the 3 diets or WT mice.

      Results, Section C1, starting on Line 691:

      “To ask if the improvement in NeuN after MCS in Tg2576 restored NeuN to WT levels, we used WT mice. For this analysis we used a one-way ANOVA with 4 groups: Low choline Tg2576, Intermediate Tg2576, High choline Tg2576, and Intermediate WT (Figure 5C). Tukey-Kramer multiple comparisons tests were used as the post hoc tests. The WT mice were fed the intermediate diet because it is the standard mouse chow, and this group was intended to reflect normal mice. The results showed a significant group difference for anterior DG (F(3,25)=9.20; p=0.0003; Figure 5C1) but not posterior DG (F(3,28)=0.867; p=0.450; Figure 5C2). Regarding the anterior DG, there were more NeuN-ir cells in high choline-treated mice than both low choline (p=0.046) and intermediate choline-treated Tg2576 mice (p=0.003). WT mice had more NeuN-ir cells than Tg2576 mice fed the low (p=0.011) or intermediate diet (p=0.003). Tg2576 mice that were fed the high choline diet were not significantly different from WT (p=0.827).”
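
      For readers unfamiliar with how the F statistic and its degrees of freedom in a four-group one-way ANOVA arise, the following is a minimal sketch using only the Python standard library. The function name `one_way_anova_f` and the group values are placeholders invented for illustration; they are not data from the study, and the sketch does not compute p values or the Tukey-Kramer post hoc tests.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variability of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Placeholder cell counts for four hypothetical groups (not study data).
    placeholder_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7], [2, 3, 4]]
    print(one_way_anova_f(placeholder_groups))
```

      With four groups the between-group degrees of freedom are 3, matching the first value in the F(3,25) and F(3,28) statistics quoted above; the second value is the total number of animals minus the number of groups.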

      Results, Section C2, starting on Line 722:

      “There was strong expression of ∆FosB in Tg2576 GCs in mice fed the low choline diet (Figure 7A1). The high choline diet and intermediate diet appeared to show less GCL ΔFosB-ir (Figure 7A2-3). A two-way ANOVA was conducted with the experimental group (Tg2576 low choline diet, Tg2576 intermediate choline diet, Tg2576 high choline diet, WT intermediate choline diet) and location (anterior or posterior) as main factors. There was a significant effect of group (F(3,32)=13.80, p<0.0001) and location (F(1,32)=8.69, p=0.006). Tukey-Kramer post-hoc tests showed that Tg2576 mice fed the low choline diet had significantly greater ΔFosB-ir than Tg2576 mice fed the high choline diet (p=0.0005) and WT mice (p=0.0007). Tg2576 mice fed the low and intermediate diets were not significantly different (p=0.275). Tg2576 mice fed the high choline diet were not significantly different from WT (p>0.999). There were no differences between groups for the posterior DG (all p>0.05).”

      “∆FosB quantification was repeated with a lower threshold to define ∆FosB-ir GCs (see Methods) and results were the same (Figure 7D). Two-way ANOVA showed a significant effect of group (F(3,32)=14.28, p<0.0001) and location (F(1,32)=7.07, p=0.0122) for anterior DG but not posterior DG (Figure 7D). For anterior sections, Tukey-Kramer post hoc tests showed that low choline mice had greater ΔFosB-ir than high choline mice (p=0.0024) and WT mice (p=0.005) but not Tg2576 mice fed the intermediate diet (p=0.275; Figure 7D1). Mice fed the high choline diet were not significantly different from WT (p=0.993; Figure 7D1). These data suggest that high choline in the diet early in life can reduce neuronal activity of GCs in offspring later in life. In addition, low choline has an opposite effect, suggesting low choline in early life has adverse effects.”

      (3) Quantification of the discrimination ratio of the novel object and novel location tests can facilitate the comparison between the different genotypes and diets.

      We have added the discrimination index for novel object location to the paper. The data are in a new figure: Figure 3. In brief, the results for discrimination index are the same as the results done originally, based on the analysis of percent of time exploring the novel object.

      Below is the new Figure and legend, followed by the new text in the Results.

      Author response image 3.

      Novel object location results based on the discrimination index. A. Results are shown for the 3 months-old WT and Tg2576 mice based on the discrimination index. (1) Mice fed the low choline diet showed object location memory only in WT. (2) Mice fed the intermediate diet showed object location memory only in WT. (3) Mice fed the high choline diet showed memory both for WT and Tg2576 mice. Therefore, the high choline diet improved memory in Tg2576 mice. B. The results for the 6 months-old mice are shown. (1-2) There was no significant memory demonstrated by mice that were fed either the low or intermediate choline diet. (3) Mice fed a diet enriched in choline showed memory whether they were WT or Tg2576 mice. Therefore, choline enrichment improved memory in all mice.

      Results, Section B1, starting on line 536:

      “The discrimination indices are shown in Figure 3 and results led to the same conclusions as the analyses in Figure 2. For the 3 months-old mice (Figure 3A), the low choline group did not show the ability to perform the task for WT or Tg2576 mice. Thus, a two-way ANOVA showed no effect of genotype (F(1,74)=0.027, p=0.870) or task phase (F(1,74)=1.41, p=0.239). For the intermediate diet-treated mice, there was no effect of genotype (F(1,50)=3.52, p=0.067) but there was an effect of task phase (F(1,50)=8.33, p=0.006). WT mice showed a greater discrimination index during testing relative to training (p=0.019) but Tg2576 mice did not (p=0.664). Therefore, Tg2576 mice fed the intermediate diet were impaired. In contrast, high choline-treated mice performed well. There was a main effect of task phase (F(1,68)=39.61, p<0.001) with WT (p<0.0001) and Tg2576 mice (p=0.0002) showing preference for the moved object in the test phase. Interestingly, there was a main effect of genotype (F(1,68)=4.50, p=0.038) because the discrimination index for WT training was significantly different from Tg2576 testing (p<0.0001) and Tg2576 training was significantly different from WT testing (p=0.0003).”

      “The discrimination indices of 6 months-old mice led to the same conclusions as the results in Figure 2. There was no evidence of discrimination in low choline-treated mice by two-way ANOVA (no effect of genotype, F(1,42)=3.25, p=0.079; no effect of task phase, F(1,42)=0.278, p=0.601). The same was true of mice fed the intermediate diet (genotype, F(1,12)=1.44, p=0.253; task phase, F(1,12)=2.64, p=0.130). However, both WT and Tg2576 mice performed well after being fed the high choline diet (effect of task phase, F(1,52)=58.75, p=0.0001, but not genotype, F(1,52)=1.197, p=0.279). Tukey-Kramer post-hoc tests showed that both WT (p<0.0001) and Tg2576 mice that had received the high choline diet (p=0.0005) had elevated discrimination indices for the test session.”
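
      The response does not state the formula used for the discrimination index. A commonly used definition is the difference in exploration time between the moved (or novel) and unmoved (or familiar) object, divided by the total exploration time. The sketch below assumes that definition; the function names and the example times are hypothetical.

```python
def discrimination_index(time_moved, time_unmoved):
    """(moved - unmoved) / (moved + unmoved); ranges from -1 to 1.

    Assumed definition, not confirmed by the source.
    """
    total = time_moved + time_unmoved
    if total <= 0:
        raise ValueError("total exploration time must be positive")
    return (time_moved - time_unmoved) / total


def percent_time_moved(time_moved, time_unmoved):
    """Percent of total exploration time spent on the moved object."""
    total = time_moved + time_unmoved
    if total <= 0:
        raise ValueError("total exploration time must be positive")
    return 100.0 * time_moved / total


if __name__ == "__main__":
    # Hypothetical exploration times in seconds.
    moved, unmoved = 30.0, 10.0
    print(discrimination_index(moved, unmoved))
    print(percent_time_moved(moved, unmoved))
```

      Under this assumed definition the index is a linear rescaling of the percent-time measure (index = 2 × percent / 100 − 1), which would be consistent with the two analyses leading to the same conclusions.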

      (4) The longitudinal analyses enable the performance of multi-level correlations between the discrimination ratio in NOR and NOL, NeuN and Fos levels, multiple EEG parameters, and premature death. Such analysis can potentially identify biomarkers associated with AD progression. These can be interesting in different choline supplementation, but also in the standard choline diet.

      We agree and added correlations to the paper in a new figure (Figure 9). Below is Figure 9 and its legend. Afterwards is the new Results section.

      Author response image 4.

      Correlations between IIS, Behavior, and hilar NeuN-ir. A. IIS frequency over 24 hrs is plotted against the preference for the novel object in the test phase of NOL. A greater preference is reflected by a greater percentage of time exploring the novel object. (1) The mice fed the high choline diet (red) showed greater preference for the novel object when IIS were low. These data suggest IIS impaired object location memory in the high choline-treated mice. The low choline-treated mice had very weak preference and very few IIS, potentially explaining the lack of correlation in these mice. (2) There were no significant correlations for IIS and NOR. However, there were only 4 mice for the high choline group, which is a limitation. B. IIS frequency over 24 hrs is plotted against the number of dorsal hilar cells expressing NeuN. The dorsal hilus was used because there was no effect of diet on the posterior hilus. (1) Hilar NeuN-ir is plotted against the preference for the novel object in the test phase of NOL. There were no significant correlations. (2) Hilar NeuN-ir was greater for mice that had better performance in NOR, both for the low choline (blue) and high choline (red) groups. These data support the idea that hilar cells contribute to object recognition (Kesner et al. 2015; Botterill et al. 2021; GoodSmith et al. 2022).

      Results, Section F, starting on Line 801:

      “F. Correlations between IIS and other measurements

      As shown in Figure 9A, IIS were correlated to behavioral performance in some conditions. For these correlations, only mice that were fed the low and high choline diets were included because mice that were fed the intermediate diet did not have sufficient EEG recordings in the same mouse where behavior was studied. IIS frequency over 24 hrs was plotted against the preference for the novel object in the test phase (Figure 9A). For NOL, IIS were significantly less frequent when behavior was the best, but only for the high choline-treated mice (Pearson’s r, p=0.022). In the low choline group, behavioral performance was poor regardless of IIS frequency (Pearson’s r, p=0.933; Figure 9A1). For NOR, there were no significant correlations (low choline, p=0.202; high choline, p=0.680) but few mice were tested in the high choline-treated mice (Figure 9B2).

      We also tested whether there were correlations between dorsal hilar NeuN-ir cell numbers and IIS frequency. In Figure 9B, IIS frequency over 24 hrs was plotted against the number of dorsal hilar cells expressing NeuN. The dorsal hilus was used because there was no effect of diet on the posterior hilus. For NOL, there was no significant correlation (low choline, p=0.273; high choline, p=0.159; Figure 9B1). However, for NOR, there were more NeuN-ir hilar cells when the behavioral performance was strongest (low choline, p=0.024; high choline, p=0.016; Figure 9B2). These data support prior studies showing that hilar cells, especially mossy cells (the majority of hilar neurons), contribute to object recognition (Botterill et al. 2021; GoodSmith et al. 2022).”
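
      The per-mouse correlations described above rely on Pearson's r. A minimal standard-library sketch of the coefficient and its associated t statistic (with n − 2 degrees of freedom) is given below. The function names are hypothetical and the paired values are placeholders, not measurements from the study; converting t to a p value requires a t distribution, which is left out here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length, n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| equals 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # Placeholder pairs: spikes per 24 hrs versus percent time on novel object.
    spikes = [2.0, 5.0, 9.0, 14.0, 20.0]
    preference = [71.0, 66.0, 62.0, 55.0, 51.0]
    r = pearson_r(spikes, preference)
    print(r, t_statistic(r, len(spikes)))
```

      With small groups, such as the 4 mice noted for the high choline NOR comparison, the t statistic has only n − 2 = 2 degrees of freedom, so even a large r may not reach significance.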

      We also noted that not all mice could be included, either because they died or for other reasons, such as loss of the headset (Results, Section A, Lines 463-464): Some mice were not possible to include in all assays either because they died before reaching 6 months or for other reasons.

      Reviewer #2 (Public Review):

      Strengths:

      The strength of the group was the ability to monitor the incidence of interictal spikes (IIS) over the course of 1.2-6 months in the Tg2576 Alzheimer's disease model, combined with meaningful behavioral and histological measures. The authors were able to demonstrate MCS had protective effects in Tg2576 mice, which was particularly convincing in the hippocampal novel object location task.

      We thank the Reviewer for identifying several strengths.

      Weaknesses:

      Although choline deficiency was associated with impaired learning and elevated FosB expression, consistent with increased hyperexcitability, IIS was reduced with both low and high choline diets. Although not necessarily a weakness, it complicates the interpretation and requires further evaluation.

      We agree and we revised the paper to address the evaluations that were suggested.

      Reviewer #1 (Recommendations For The Authors):

      (1) A reference directing to genotyping of Tg2576 mice is missing.

      We apologize for the oversight and added that the mice were genotyped by the New York University Mouse Genotyping core facility.

      Methods, Section A, Lines 210-211: “Genotypes were determined by the New York University Mouse Genotyping Core facility using a protocol to detect APP695.”

      (2) Which software was used to track the mice in the behavioral tests?

      We manually reviewed videos. This has been clarified in the revised manuscript. Methods, Section B4, Lines 268-270: Videos of the training and testing sessions were analyzed manually. A subset of data was analyzed by two independent blinded investigators and they were in agreement.

      (3) Unexpectedly, a low choline diet in AD mice was associated with reduced frequency of interictal spikes yet increased mortality and spontaneous seizures. The authors attribute this to postictal suppression.

      We did not intend to suggest that postictal depression was the only cause. It was one of several potential explanations for why seizures would influence IIS frequency: postictal depression could transiently reduce IIS. We have clarified the text accordingly (Discussion, starting on Line 960):

      If mice were unhealthy, IIS might have been reduced due to impaired excitatory synaptic function. Another reason for reduced IIS is that the mice that had the low choline diet had seizures which interrupted REM sleep. Thus, seizures in Tg2576 mice typically started in sleep. Less REM sleep would reduce IIS because IIS occur primarily in REM. Also, seizures in the Tg2576 mice were followed by a depression of the EEG (postictal depression; Supplemental Figure 3) that would transiently reduce IIS. A different, radical explanation is that the intermediate diet promoted IIS rather than low choline reducing IIS. Instead of choline, a constituent of the intermediate diet may have promoted IIS.

      However, reduced spike frequency is already evident at 5 weeks of age, a time point with a low occurrence of premature death. A more comprehensive analysis of EEG background activity may provide additional information if the epileptic activity is indeed reduced at this age.

      We did not intend to suggest that premature death caused reduced spike frequency. We have clarified the paper accordingly. We agree that a more in-depth EEG analysis would be useful but is beyond the scope of the study.

      (4) Supplementary Fig. 3 depicts far more spikes / 24 h compared to Fig. 7B (at least 100 spikes/24h in Supplementary Fig. 3 and less than 10 spikes/24h in Fig. 7B).

      We would like to clarify that before and after a seizure the spike frequency is unusually high. Therefore, there are far more spikes than in prior figures.

      We clarified this issue by adding more data to the Supplemental Figure. The additional data are from mice without a seizure, showing that their spikes are low in frequency.

      All recordings lasted several days. We included the data from mice with a seizure on one of the days and mice without any seizures. For mice with a seizure, we graphed IIS frequency for the day before, the day of the seizure, and the day after. For mice without a seizure, IIS frequency is plotted for 3 consecutive days. When there was a seizure, the day before and after showed high numbers of spikes. When there was no seizure on any of the 3 days, spikes were infrequent on all days.

      The revised figure and legend are shown below. It is Supplemental Figure 4 in the revised submission.

      Author response image 5.

      IIS frequency before and after seizures. A. Representative EEG traces recorded from electrodes implanted in the skull over the left frontal cortex, right occipital cortex, left hippocampus (Hippo) and right hippocampus during a spontaneous seizure in a 5 months-old Tg2576 mouse. Arrows point to the start (green arrow) and end of the seizure (red arrow), and postictal depression (blue arrow). B. IIS frequency was quantified from continuous video-EEG for mice that had a spontaneous seizure during the recording period and mice that did not. IIS frequency is plotted for 3 consecutive days, starting with the day before the seizure (designated as day 1), and ending with the day after the seizure (day 3). A two-way RMANOVA was conducted with the day and group (mice with or without a seizure) as main factors. There was a significant effect of day (F(2,4)=46.95, p=0.002) and group (seizure vs no seizure; F(1,2)=46.01, p=0.021) and an interaction of factors (F(2,4)=46.68, p=0.002). Tukey-Kramer post-hoc tests showed that mice with a seizure had significantly greater IIS frequencies than mice without a seizure for every day (day 1, p=0.0005; day 2, p=0.0001; day 3, p=0.0014). For mice with a seizure, IIS frequency was higher on the day of the seizure than the day before (p=0.037) or after (p=0.010). For mice without a seizure, there were no significant differences in IIS frequency for day 1, 2, or 3. These data are similar to prior work showing that from one day to the next mice without seizures have similar IIS frequencies (Kam et al., 2016).

      In the text, the revised section is in the Results, Section C, starting on Line 772:

      “At 5-6 months, IIS frequencies were not significantly different in the mice fed the different diets (all p>0.05), probably because IIS frequency becomes increasingly variable with age (Kam et al. 2016). One source of variability is seizures, because there was a sharp increase in IIS during the day before and after a seizure (Supplemental Figure 4). Another reason that the diets failed to show differences was that the IIS frequency generally declined at 5-6 months. This can be appreciated in Figure 8B and Supplemental Figure 6B. These data are consistent with prior studies of Tg2576 mice where IIS increased from 1 to 3 months but then waxed and waned afterwards (Kam et al., 2016).”

      (5) The data indicating the protective effect of high choline supplementation are valuable, yet some of the claims are not completely supported by the data, mainly as the analysis of littermate WT mice is not complete.

      We added WT data to show that the high choline diet restored cell loss and ΔFosB expression to WT levels. These data strengthen the argument that the high choline diet was valuable. See the response to Reviewer #1, Public Review Point #2.

      • Line 591: "The results suggest that choline enrichment protected hilar neurons from NeuN loss in Tg2576 mice." A comparison to NeuN expression in WT mice is needed to make this statement.

      These data have been added. See the response to Reviewer #1, Public Review Point #2.

      • Line 623: "These data suggest that high choline in the diet early in life can reduce hyperexcitability of GCs in offspring later in life. In addition, low choline has an opposite effect, again suggesting this maternal diet has adverse effects." Also here, FosB quantification in WT mice is needed.

      These data have been added. See the response to Reviewer #1, Public Review Point #2.

      (7) Was the effect of choline associated with reduced tauopathy or Aβ levels?

      The mice have no detectable hyperphosphorylated tau. The mice do have intracellular Aβ before 6 months. This is especially the case in hilar neurons, but GCs have little (Criscuolo et al., eNeuro, 2023). However, in neurons that have reduced NeuN, we found previously that antibodies generally do not work well. We think it is because the neurons become pyknotic (Duffy et al., 2015), a condition associated with oxidative stress which causes antigens like NeuN to change conformation due to phosphorylation. Therefore, we did not conduct a comparison of hilar neurons across the different diets.

      (8) Since the mice were tested at 3 months and 6 months, it would be interesting to see the behavioral difference per mouse and the correlation with EEG recording and immunohistological analyses.

      We agree that would be valuable and this has been added to the paper. Please see response to Reviewer #1, Public Review Point #4.

      Reviewer #2 (Recommendations For The Authors):

      There were several areas that could be further improved, particularly in the areas of data analysis (particularly with images and supplemental figures), figure presentation, and mechanistic speculation.

      Major points:

      (1) It is understandable that, for the sake of labor and expense, WT mice were not implanted with EEG electrodes, particularly since previous work showed that WT mice have no IIS (Kam et al. 2016). However, from a standpoint of full factorial experimental design, there are several flaws - purists would argue are fatal flaws. First, the lack of WT groups creates underpowered and imbalanced groups, constraining statistical comparisons and likely reducing the significance of the results. Also, it is an assumption that diet does not influence IIS in WT mice. Secondly, with a within-subject experimental design (as described in Fig. 1A), 6-month-old mice are not naïve if they have previously been tested at 3 months. Such an experimental design may reduce effect size compared to non-naïve mice. These caveats should be included in the Discussion. It is likely that these caveats reduce effect size and that the actual statistical significance, were the experimental design perfect, would be higher overall.

      We agree and have added these points to the Limitations section of the Discussion. Starting on Line 1050: In addition, groups were not exactly matched. Although WT mice do not have IIS, a WT group for each of the Tg2576 groups would have been useful. Instead, we included WT mice for the behavioral tasks and some of the anatomical assays. Related to this point is that several mice died during the long-term EEG monitoring of IIS.

      (2) Since behavior, EEG, NeuN and FosB experiments seem to be done on every Tg2576 animal, it seems that there are missed opportunities to correlate behavior/EEG and histology on a per-mouse basis. For example, rather than speculate in the discussion, why not (for example) directly examine relationships between IIS/24 hours and FosB expression?

      We addressed this point above in responding to Reviewer #1, Public Review Point #4.

      (3) Methods of image quantification should be improved. Background subtraction should be considered in the analysis workflow (see Fig. 5C and Fig. 6C background). It would be helpful to have a Methods figure illustrating intermediate processing steps for both NeuN and FosB expression.

      We added more information to improve the methods of quantification. We did use a background subtraction approach where ImageJ provides a histogram of intensity values, and it determines when there is a sharp rise in staining relative to background. That point is where we set threshold. We think it is a procedure that has the least subjectivity.

      We added these methods to the Methods section and expanded the first figure about image quantification, Figure 6B. That figure and legend are shown above in response to Reviewer #1, Point #2.

      This is the revised section of the Methods, Section C3, starting on Line 345:

      “Photomicrographs were acquired using ImagePro Plus V7.0 (Media Cybernetics) and a digital camera (Model RET 2000R-F-CLR-12, Q-Imaging). NeuN and ∆FosB staining were quantified from micrographs using ImageJ (V1.44, National Institutes of Health). All images were first converted to grayscale and in each section, the hilus was traced, defined by zone 4 of Amaral (1978). A threshold was then calculated to identify the NeuN-stained cell bodies but not background. Then NeuN-stained cell bodies in the hilus were quantified manually. Note that the threshold was defined in ImageJ using the distribution of intensities in the micrograph. A threshold was then set using a slider in the histogram provided by Image J. The slider was pushed from the low level of staining (similar to background) to the location where staining intensity made a sharp rise, reflecting stained cells. Cells with labeling that was above threshold were counted.”
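
      To make the thresholding step concrete, the sketch below applies a fixed threshold to a small grid of staining intensities and counts the values above it. It assumes that intensities have already been expressed so that higher values mean stronger staining, and that the threshold itself was chosen by the investigator from the intensity histogram as described above; it does not reproduce ImageJ's interface or any automatic threshold selection. The function names and the example grid are hypothetical.

```python
def threshold_mask(stain_intensity, threshold):
    """Return a grid of booleans marking values strictly above threshold."""
    return [[value > threshold for value in row] for row in stain_intensity]


def count_above_threshold(stain_intensity, threshold):
    """Count the grid positions whose staining exceeds the threshold."""
    mask = threshold_mask(stain_intensity, threshold)
    return sum(cell for row in mask for cell in row)


if __name__ == "__main__":
    # Hypothetical 2 x 2 grid of staining intensities (0-255 scale).
    grid = [[10, 200], [150, 90]]
    print(threshold_mask(grid, threshold=100))
    print(count_above_threshold(grid, threshold=100))
```

      Raising the threshold keeps only the most strongly stained positions, which corresponds to the stricter of the two criteria described for ∆FosB; lowering it admits more weakly stained positions.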

      (4) This reviewer is surprised that the authors do not speculate more about ACh-related mechanisms. For example, choline deficiency would likely reduce Ach release, which could have the same effect on IIS as muscarinic antagonism (Kam et al. 2016), and could potentially explain the paradoxical effects of a low choline diet on reducing IIS. Some additional mechanistic speculation would be helpful in the Discussion.

      We thank the Reviewer for noting this so we could add it to the Discussion. We had not because we were concerned about space limitations.

      The Discussion has a new section starting on Line 1009:

      “Choline and cholinergic neurons

      There are many suggestions for the mechanisms that allow MCS to improve health of the offspring. One hypothesis that we are interested in is that MCS improves outcomes by reducing IIS. Reducing IIS would potentially reduce hyperactivity, which is significant because hyperactivity can increase release of Aβ. IIS would also be likely to disrupt sleep since it represents aberrant synchronous activity over widespread brain regions. The disruption to sleep could impair memory consolidation, since it is a notable function of sleep (Graves et al. 2001; Poe et al. 2010). Sleep disruption also has other negative consequences such as impairing normal clearance of Aβ (Nedergaard and Goldman 2020). In patients, IIS and similar events, IEDs, are correlated with memory impairment (Vossel et al. 2016).

      How would choline supplementation in early life reduce IIS of the offspring? It may do so by making BFCNs more resilient. That is significant because BFCN abnormalities appear to cause IIS. Thus, the cholinergic antagonist atropine reduced IIS in vivo in Tg2576 mice. Selective silencing of BFCNs reduced IIS also. Atropine also reduced elevated synaptic activity of GCs in young Tg2576 mice in vitro. These studies are consistent with the idea that early in AD there is elevated cholinergic activity (DeKosky et al. 2002; Ikonomovic et al. 2003; Kelley et al. 2014; Mufson et al. 2015; Kelley et al. 2016), while later in life there is degeneration. Indeed, the chronic overactivity could cause the degeneration.

      Why would MCS make BFCNs resilient? There are several possibilities that have been explored, based on genes upregulated by MCS. One attractive hypothesis is that neurotrophic support for BFCNs is retained after MCS but in aging and AD it declines (Gautier et al. 2023). The neurotrophins, notably nerve growth factor (NGF) and brain-derived neurotrophic factor (BDNF) support the health of BFCNs (Mufson et al. 2003; Niewiadomska et al. 2011).”

      Minor points:

      (1) The vendor is Dyets Inc., not Dyets.

      Thank you. This correction has been made.

      (2) Anesthesia chamber not specified (make, model, company).

      We have added this information to the Methods, Section D1, starting on Line 375: The animals were anesthetized by isoflurane inhalation (3% isoflurane, 2% oxygen for induction) in a rectangular transparent plexiglas chamber (18 cm long x 10 cm wide x 8 cm high) made in-house.

      (3) It is not clear whether software was used for the detection of behavior. Was position tracking software used or did blind observers individually score metrics?

      We have added the information to the paper. Please see the response to Reviewer #1, Recommendations for Authors, Point #2.

      (4) It is not clear why rat cages and not a true Open Field Maze were used for NOL and NOR.

      We used mouse cages because in our experience that is what is ideal to detect impairments in Tg2576 mice at young ages. We think it is why we have been so successful in identifying NOL impairments in young mice. Before our work, most investigators thought behavior only became impaired later. We would like to add that, in our experience, an Open Field Maze is not the most common cage that is used.

      (5) Figure 1A is not mentioned.

      It had been mentioned in the Introduction. Figure 1B-D was the first figure mentioned in the Results, which is why it might have been missed. We have now added it to the first section of the Results, Line 457, so it is easier to find.

      (6) Although Fig 7 results are somewhat complicated compared to Fig. 5 and 6 results, EEG comes chronologically earlier than NeuN and FosB expression experiments.

      We have kept the order as is because as the Reviewer said, the EEG is complex. For readability, we have kept the EEG results last.

      (7) Though the statistical analysis involved parametric and nonparametric tests, it is not clear which normality tests were used.

      We have added the names of the normality tests in the Methods, Section E, Line 443: Tests for normality (Shapiro-Wilk) and homogeneity of variance (Bartlett’s test) were used to determine if parametric statistics could be used. We also added a clarification after this sentence: When data were not normal, non-parametric statistics were used. When there was significant heteroscedasticity, data were log transformed. If log transformation did not resolve the heteroscedasticity, non-parametric statistics were used. Because we added correlations and analysis of survival curves, we also added the following (starting on Line 451): For correlations, Pearson’s r was calculated. To compare survival curves, a Log rank (Mantel-Cox) test was performed.
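
      The decision rule described in this paragraph can be written out as a minimal sketch. It assumes the p-values from the normality and homogeneity-of-variance tests have already been computed elsewhere, and it assumes a significance cutoff of 0.05, which is not stated in the passage above. The function name and the returned labels are illustrative only and do not come from the manuscript.

```python
def choose_analysis(normality_p, variance_p, variance_p_log=None, alpha=0.05):
    """Pick the family of statistics following the workflow described in the text.

    normality_p    -- p-value of the normality test on the raw data
    variance_p     -- p-value of the homogeneity-of-variance test on the raw data
    variance_p_log -- p-value of the same test after log transformation
                      (needed only when the raw variances are heterogeneous)
    alpha          -- significance cutoff (assumed value, not from the text)
    """
    # Non-normal data go straight to non-parametric statistics.
    if normality_p < alpha:
        return "non-parametric"
    # Normal data with homogeneous variance can use parametric statistics.
    if variance_p >= alpha:
        return "parametric"
    # Heterogeneous variance: try a log transformation first.
    if variance_p_log is None:
        raise ValueError("variance test on log-transformed data is required")
    if variance_p_log >= alpha:
        return "parametric on log-transformed data"
    # The log transformation did not resolve the heteroscedasticity.
    return "non-parametric"


# Hypothetical p-values, for illustration only.
print(choose_analysis(0.40, 0.01, variance_p_log=0.30))
```

      The order of the checks follows the order of the sentences above: normality is checked first, and log transformation is attempted only when the variances differ.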

      Figures:

      (1) In Fig. 1A, Anatomy should be placed above the line.

      We changed the figure so that the word “Anatomy” is now aligned, and the arrow that was angled is no longer needed.

      In Fig. 1C and 1D, the objects seem to be moved into the cage, not the mice. This schematic does not accurately reflect the Fig. 1C and 1D figure legend text.

      Thank you for the excellent point. The figure has been revised. We also updated it to show the objects more accurately.

      Please correct the punctuation in the Fig. 1D legend.

      Thank you for mentioning the errors. We corrected the legend.

      For ease of understanding, Fig. 1C and 1D should have training and testing labeled in the figure.

      Thank you for the suggestion. We have revised the figure as suggested.

      Author response image 6.

      (2) In Figure 2, error bars for population stats (bar graphs) are not obvious or missing. Same for Figure 3.

      We added two supplemental figures to show error bars, because adding the error bars to the existing figures made the symbols, colors, connecting lines and error bars hard to distinguish. For novel object location (Fig. 2) the error bars are shown in Supp. Fig. 2. For novel object recognition, the error bars are shown in Supplemental Fig. 3.

      (3) The authors should consider a Methods figure for quantification of NeuN and deltaFOSB (expansions of Fig. 5C and Fig. 6C).

      Please see Reviewer #1, Public Review Point #2.

      (4) In Figure 5, A should be omitted and mentioned in the Methods/figure legend. B should be enlarged. C should be inset, zoomed-in images of the hilus, with an accompanying analysis image showing a clear reduction in NeuN intensity in low choline conditions compared to intermediate and high choline conditions. In D, X axes could delineate conditions (figure legend and color unnecessary). Figure 5C should be moved to a Methods figure.

      We thank the Reviewer for the excellent suggestions. We removed A as suggested. We expanded B and included insets. We used different images to show a more obvious reduction of cells for the low choline group. We expanded the Methods schematics. The revised figure is Figure 6 and shown above in response to Reviewer 1, Public Review Point #2.

      (5) In Figure 6, A should be eliminated and mentioned in the Methods/figure legend. B should be greatly expanded with higher and lower thresholds shown on subsequent panels (3x3 design).

      We removed A as suggested. We expanded B as suggested. The higher and lower thresholds are shown in C. The revised figure is Figure 7 and shown above in response to Reviewer 1, Public Review Point #2.

      (6) In Figure 7, A2 should be expanded vertically. A3 should be expanded both vertically and horizontally. B 1 and 2 should be increased, particularly B1 where it is difficult to see symbols. Perhaps colored symbols offset/staggered per group so that the spread per group is clearer.

      We added a panel (A4) to show an expansion of A2 and A3. However, we did not see that a vertical expansion would add information so we opted not to add that. We expanded B1 as suggested but opted not to expand B2 because we did not think it would enhance clarity. The revised figure is below.

      Author response image 7.

      (7) Supplemental Figure 1 could possibly be combined with Figure 1 (use rounded corner rat cage schematic for continuity).

      We opted not to combine figures because it would make one extremely large figure. As a result, the parts of the figure would be small and difficult to see.

      (8) Supplemental Figure 2 - there does not seem to be any statistical analysis associated with A mentioned in the Results text.

      We added the statistical information. It is now Supplemental Figure 5:

      Author response image 8.

      Mortality was high in mice treated with the low choline diet. A. Survival curves are shown for mice fed the low choline diet and mice fed the high choline diet. The mice fed the high choline diet had a significantly less severe survival curve. B. Left: A photo of a mouse after sudden unexplained death. The mouse was found in a posture consistent with death during a convulsive seizure. The area surrounded by the red box is expanded below to show the outstretched hindlimb (red arrow). Right: A photo of a mouse that did not die suddenly. The area surrounded by the box is expanded below to show that the hindlimb is not outstretched.

      The revised text is in the Results, Section E, starting on Line 793:

      “The reason that low choline-treated mice appeared to die in a seizure was that they were found in a specific posture in their cage which occurs when a severe seizure leads to death (Supplemental Figure 5). They were found in a prone posture with extended, rigid limbs (Supplemental Figure 5). Regardless of how the mice died, there was greater mortality in the low choline group compared to mice that had been fed the high choline diet (Log-rank (Mantel-Cox) test, Chi square 5.36, df 1, p=0.021; Supplemental Figure 5A).”
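
      As a quick consistency check on the statistic quoted above: with 1 degree of freedom, the p-value of a log-rank test is the upper tail of a chi-square distribution, which has a closed form through the complementary error function. The sketch below uses only the reported chi-square value (5.36). The function name is ours and is not part of any software used in the study.

```python
import math


def chi2_upper_tail_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # With df = 1 the variable is the square of a standard normal Z, so
    # P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Chi-square value reported for the log-rank (Mantel-Cox) test.
p_value = chi2_upper_tail_df1(5.36)
print(f"p = {p_value:.3f}")
```

      The result rounds to 0.021, in agreement with the p-value reported in the revised text.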

      Also, why isn't intermediate choline also shown?

      We do not have the data from the animals. Records of death were not kept, regrettably.

      Perhaps labeling of male/female could also be done as part of this graph.

      We agree this would be very interesting but do not have all sex information.

      B is not very convincing, though it is understandable once one reads about posture.

      We have clarified the text and figure, as well as the legend. They are above.

      Are there additional animals that were seen to be in a specific posture?

      There are many examples, and we added them to hopefully make it more convincing.

      We also added posture in WT mice when there is a death to show how different it is.

      Is there any relationship between seizures detected via EEG, as shown in Supplemental Figure 3, and death?

      Several mice died during a convulsive seizure, which is the type of seizure that is shown in the Supplemental Figure.

      (9) Supplemental Figure 3 seems to display an isolated case in which EEG-detected seizures correlate with increased IIEs. It is not clear whether there are additional documented cases of seizures that could be assembled into a meaningful population graph. If this data does not exist or is too much work to include in this manuscript, perhaps it can be saved for a future paper.

      We have added other cases and revised the graph. This is now Supplemental Figure 4 and is shown above in response to Reviewer #1, Recommendation for Authors Point #4.

      Frontal is misspelled.

      We checked and our copy is not showing a misspelling. However, we are very grateful to the Reviewer for catching many errors and reading the manuscript carefully.

      (10) Supplemental Figure 4 seems incomplete in that it does not include EEG data from months 4, 5, and 6 (see Fig. 7B).

      We have added data for these ages to the Supplemental Figure (currently Supplemental Figure 6) as part B. In part A, which had been the original figure, only 1.2, 2, and 3 months-old mice were shown because there were insufficient numbers of each sex at other ages. However, by pooling 1.2 and 2 months (Supplemental Figure 6B1), 3 and 4 months (B2) and 5 and 6 months (B3) we could do the analysis of sex. The results are the same – we detected no sex differences.

      Author response image 9.

      IIS frequency was similar for each sex. A. IIS frequency was compared for females and males at 1.2 months (1), 2 months (2), and 3 months (3). Two-way ANOVA was used to analyze the effects of sex and diet. Female and male Tg2576 mice were not significantly different. B. Mice were pooled at 1.2 and 2 months (1), 3 and 4 months (2) and 5 and 6 months (3). Two-way ANOVA analyzed the effects of sex and diet. There were significant effects of diet for (1) and (2) but not (3). There were no effects of sex at any age. (1) There were significant effects of diet (F(2,47)=46.21, p<0.0001) but not sex (F(1,47)=0.106, p=0.746). Female and male mice fed the low choline diet or high choline diet were significantly different from female and male mice fed the intermediate diet (all p<0.05, asterisk). (2) There were significant effects of diet (F(2,32)=10.82, p=0.0003) but not sex (F(1,32)=1.05, p=0.313). Both female and male mice of the low choline group were significantly different from male mice fed the intermediate diet (both p<0.05, asterisk) but no other pairwise comparisons were significant. (3) There were no significant differences (diet, F(2,23)=1.21, p=0.317; sex, F(1,23)=0.844, p=0.368).

      The data are discussed in the Results, Section G, starting on Line 843:

      In Supplemental Figure 6B we grouped mice at 1-2 months, 3-4 months and 5-6 months so that there were sufficient females and males to compare each diet. A two-way ANOVA with diet and sex as factors showed a significant effect of diet (F(2,47)=46.21; p<0.0001) at 1-2 months of age, but not sex (F(1,47)=0.11, p=0.758). Post-hoc comparisons showed that the low choline group had fewer IIS than the intermediate group, and the same was true for the high choline-treated mice. Thus, female mice fed the low choline diet differed from the females (p<0.0001) and males (p<0.0001) fed the intermediate diet. Male mice that had received the low choline diet differed from females (p<0.0001) and males (p<0.0001) fed the intermediate diet. Female mice fed the high choline diet differed from females (p=0.002) and males (p<0.0001) fed the intermediate diet, and males fed the high choline diet differed from females (p<0.0001) and males (p<0.0001) fed the intermediate diet.

      For the 3-4 months-old mice there was also a significant effect of diet (F(2,32)=10.82, p=0.0003) but not sex (F(1,32)=1.05, p=0.313). Post-hoc tests showed that low choline females were different from males fed the intermediate diet (p=0.007), and low choline males were also significantly different from males that had received the intermediate diet (p=0.006). There were no significant effects of diet (F(2,23)=1.21, p=0.317) or sex (F(1,23)=0.84, p=0.368) at 5-6 months of age.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Bonnifet et al. profile the presence of L1 ORF1p in the mouse and human brain. They claim that ORF1p is expressed in the human and mouse brain at a steady state and that there is an age-dependent increase in expression. This is a timely report as two recent papers have extensively documented the presence of full-length L1 transcripts in the mouse and human brain (PMID: 38773348 & PMID: 37910626). Thus, the finding that L1 ORF1p is consistently expressed in the brain is not surprising, but important to document.

      Thank you for recognizing the importance of this study. The two cited papers have indeed reported the presence of full-length transcripts in the mouse and human brain. However, the first (PMID: 38773348) report has shown evidence of full-length LINE-1 RNA and ORF1 protein expression in the mouse hippocampus (but not elsewhere) and the second (PMID: 37910626) shows full-length LINE-1 RNA expression and H3K4me3-ChIP data in the frontal and temporal lobe of the human brain, but not protein expression.

      Strengths:

      Several parts of this manuscript appear to be well done and include the necessary controls. In particular, the evidence for steady-state expression of ORF1p in the mouse brain appears robust.

      Weaknesses:

      Several parts of the manuscript appear to be more preliminary and need further experiments to validate their claims. In particular, the data suggesting expression of L1 ORF1p in the human brain and the data suggesting increased expression in the aged brain need further validation. Detailed comments:

      (1) The expression of ORF1p in the human brain shown in Figure 1j is not convincing. Why are there two strong bands in the WB? How can the authors be sure that this signal represents ORF1p expression and not nonspecific labelling? Additional validations and controls are needed to verify the specificity of this signal.

      We have validated the antibody against human ORF1p (Abcam 245249-> https://www.abcam.com/enus/products/primary-antibodies/line-1-orf1p-antibody-epr22227-6-ab245249), which we use for Western blotting experiments (please see Fig1J and new Suppl Fig.2A,B and C), by several means.

      (1) We have done immunoprecipitations and co-immunoprecipitations followed by quantitative mass spectrometry (LC-MS/MS; data not shown as they are part of a different study). We efficiently detect ORF1p in IPs (Western blot now added in Suppl Fig2B) and by quantitative mass spectrometry (5 independent samples per IP-ORF1p and IP-IgG: ORF1p/IgG ratio: 40.86; adj p-value 8.7e-07; human neurons in culture; data not shown as they are part of a different study). We also did co-IPs followed by Western blot using two different antibodies, either the Millipore clone 4H1 (https://www.merckmillipore.com/CH/en/product/Anti-LINE-1-ORF1p-Antibody-clone-4H1,MM_NF-MABC1152?ReferrerURL=https%3A%2F%2Fwww.google.com%2F) or the Abcam antibody to immunoprecipitate and the Abcam antibody for Western blotting on human brain samples. Indeed, the Millipore antibody does not work well on Western Blots in our hands. We consistently revealed a double band indicating that both bands are ORF1p-derived. We have added an ORF1p IP-Western blot as Suppl Fig. 2B which clearly shows the immunoprecipitation of both bands by the Abcam antibody. Abcam also reports a double band, and they suspect that the lower band is a truncated form (see the link to their website above). ORF1p Western blots done by other labs with different antibodies have detected a second band in human samples

      • Sato, S. et al. LINE-1 ORF1p as a candidate biomarker in high grade serous ovarian carcinoma. Sci Rep 13, 1537 (2023) in Figure 1D

      • McKerrow, W. et al. LINE-1 expression in cancer correlates with p53 mutation, copy number alteration, and S phase checkpoint. Proc. Natl. Acad. Sci. U.S.A. 119, e2115999119 (2022), showing in Figure 7 a Western blot of an inducible LINE-1 (ORFeus) detected by the MABC1152 ORF1p antibody from Millipore Sigma

      • Walter et al. eLife 2016;5:e11418 (DOI: 10.7554/eLife.11418), in mouse ES cells with an antibody made in-house (gift from another lab), in Figure 2B

      The lower band might thus be a truncated form of ORF1p or a degradation product which appears to be shared by mouse and human ORF1p. We have now mentioned this in the revised version of the paper (lines 183-189).

      (2) We have used the very well characterized antibody from Millipore (https://www.merckmillipore.com/CH/en/product/Anti-LINE-1-ORF1p-Antibody-clone-4H1,MM_NFMABC1152?ReferrerURL=https%3A%2F%2Fwww.google.com%2F) for immunostainings and detect ORF1p staining in human neurons in the very same brain regions (Fig 2H, new Suppl Fig. 2E) including the cerebellum in the human brain. We added a 2nd antibody-only control (Suppl Fig. 2E).

      (3) We also did antibody validation by siRNA knock-down. However, it is important to note that these experiments were done in LUHMES cells, a neuronal cell line which we differentiated into human dopaminergic neurons. In these cells, we only occasionally detect a double band on Western blots, but mostly only reveal the upper band at ≈ 40kD. The results of the knockdown are now added as Suppl Fig. 2C.

      Altogether, based on our experimental validations and evidence from the literature, we are very confident that it is indeed ORF1p that we detect on the blots and by immunostainings in the human brain.

      (2) The data shown in Figure 2g are not convincing. How can the authors be sure that this signal represents ORF1p expression and not non-specific labelling? Extensive additional validations and controls are needed to verify the specificity of this signal.

      In lines 117-123 of the manuscript, we had specified “Importantly, the specificity of the ORF1p antibody, a widely used, commercially available antibody [18,34–38], was confirmed by blocking the ORF1p antibody with purified mouse ORF1p protein resulting in the complete absence of immunofluorescence staining (Suppl Fig. 1A), by using an inhouse antibody against mouse ORF1p[17] which colocalized with the anti-ORF1p antibody used (Suppl Fig. 1B, quantified in Suppl Fig. 1C), and by immunoprecipitation and mass spectrometry used in this study (see Author response image 1)”.

      Figure 2G shows a Western blot using an extensively used and well characterized ORF1p antibody from Abcam (mouse ORF1p, Rabbit Recombinant Monoclonal LINE-1 ORF1p antibody; https://www.abcam.com/enus/products/primary-antibodies/line-1-orf1p-antibody-epr21844-108-ab216324; cited in at least 11 publications) after FACS-sorting of neurons (NeuN+) of the mouse brain. We have validated this ORF1p antibody ourselves in IPs (please see Fig 6A) and co-IP followed by mass spectrometry (LC-MS/MS; see Fig 6, where we detect ORF1p exclusively in the 5 independent ORF1p-IP samples and not at all in 5 independent IgG-IP control samples; please also see Suppl Table 2). In this analysis, we detect ORF1p with a ratio and log2fold of ∞, indicating that this protein is found only in IP-ORF1p samples (5/5) and not in the IP-control samples (which does not allow the calculation of a ratio with p-value; please see Suppl Table 2).

      Author response image 1.

      In addition, we have added new data showing the entire membrane of the Western blot in Fig1H (now Suppl Fig.1E) and a knock-down experiment using siRNA against ORF1p or control siRNA in mouse dopaminergic neurons in culture (MN9D; new Suppl Fig.1D). Together, this makes us very confident that we are looking at a specific ORF1p signal. The band in Figure 2G is at the same height as the input and there are no other bands visible (except the heavy chain of the NeuN antibody, which at the same time is a control for the sorting). We added some explanatory text to the revised version of the manuscript (lines 120-124 and lines 253-256).

      Please note that in the IP of ORF1p shown in Fig6A, there is a double band as well, strongly suggesting that the lower band might be a truncated or processed form of ORF1p. As stated above, this double band has been detected in other studies (Walter et al. eLife 2016;5:e11418. DOI: 10.7554/eLife.11418) in mouse ES cells using an in-house generated antibody against mouse ORF1p. Thus, with either commercial or in-house generated antibodies in some mouse and human samples, there is a double band corresponding to full-length ORF1p and a truncated or processed version of it.

      We noticed that we had not added the references of the primary antibodies used in Western blot experiments to the manuscript; this has now been corrected in the revised version.

      (3) The data showing a reduction in ORF1p expression in the aged mouse brain is confusing and maybe even misleading. Although there is an increase in the intensity of the ORF1p signal in ORF1p+ cells, the data clearly shows that fewer cells express ORF1p in the aged brain. If these changes indicate an overall loss or gain of ORF1p, expression in the aged brain is not resolved. Thus, conclusions should be more carefully phrased in this section. It is important to show the quantification of NeuN+ and NeuN- cells in young vs aged (not only the proportions as shown in Figure 3b) to determine if the difference in the number of ORF1p+ cells is due to loss of neurons or perhaps a sampling issue. More so, it would be essential to perform WB and/or proteomics experiments to complement the IHC data for the aged mouse samples.

      We thank the reviewer for this comment and agree that the representation was confusing, which is why we added data to Suppl Fig.5 (F-K) using a different representation. As suggested by the reviewer, in new Suppl Fig. 5F-K, we now show the number of ORF1p+, NeuN+ or NeuN- cells per mm2. These graphs indicate that the number of ORF1p+ cells per mm2 does not decrease significantly overall (with the dorsal striatum as an exception, possibly due to technical limitations which we now discuss in the results section, lines 332-335). Globally, there is thus no loss of ORF1p-expressing cells. There is also no global or region-specific decrease in the number of neuronal cells (NeuN+ per mm2), although proportions change (Suppl Fig 2E, confocal acquisitions), most likely due to a gain of non-neuronal cells in this region. Concerning Western blots on mouse brain tissues from young and aged individuals, we unfortunately ran into limits regarding tissue availability of aged mice.

      (4) The transcriptomic data presented in Figure 4 and Figure 5 are not convincing. Quantification of transposon expression on short read sequencing has important limitations. Longer reads and complementary approaches are needed to study the expression of evolutionarily young L1s (see PMID: 38773348 & PMID: 37910626 for examples of the current state of the art). Given the read length and the unstranded sequencing approach, I would at least ask the authors to add genome browser tracks of the upregulated loci so that we can properly assess the clarity of the results. I would also suggest adding the mappability profile of the elements in question. In addition, since this manuscript focuses on ORF1p, it would be essential to document changes in protein levels (and not just transcripts) in the ageing human brain.

      We agree that there are limitations to the analysis of TEs with short read sequencing and we have added more text on this aspect in the revised version (results section) and highlighted the problem of limited and disequilibrated sample size in the discussion (line 638-644). The approaches shown in PMID: 38773348 & PMID: 37910626 or even a combination of them, would be ideal of course. However, here we re-analyzed a unique preexisting dataset (Dong et al, Nature Neuroscience, 2018; http://dx.doi.org/10.1038/s41593-018-0223-0), which contains RNA-seq data of human post-mortem dopaminergic neurons in a relatively high number of brain-healthy individuals of a wide age range including some “young” individuals which is rare in post-mortem studies. Such data is unfortunately not available with long read sequencing or any other more appropriate approach yet. Limitations are evident, but all limitations will apply equally to both groups of individuals that we compare. The general mappability profile of the full-length LINE-1 “UIDs” was shown in old Suppl Fig 6A. We have colorhighlighted now in new Suppl Fig 8C the specific elements in this graph. Most importantly, we have now used, as a condensate of suggestions by all reviewers, a combination of mappability score, post-hoc power calculation, visualization and correlation with adjacent gene expression in order to retain a specific locus with confidence or not. Using these criteria, we retained UID-68 (Fig 5D) which has a relatively high mappability score (Suppl Fig.8C) plus an overlap of umap 50 mappability peaks and read mapping when visualizing the locus in IGV (new Fig. 5E), very high post-hoc power (96.6%; continuous endpoint, two independent samples, alpha 0.05) and no correlation with adjacent gene expression per individual (Fig. 5F, G). 
Based on these criteria, we had to exclude UID-129, UID-37, UID-127 and UID-137, reinforcing the notion that a combination of quality control criteria might be crucial to retain a specific locus with confidence. This is now mentioned in the discussion of the manuscript (lines 427-430).

      We will not be able to document changes in protein levels in aged human dopaminergic neurons as we do not have access to this material. We have tried to obtain human substantia nigra tissues but were not able to get sufficient amounts to do laser-capture microdissection or FACS analyses, especially of young individuals. There are still important limitations to tissue availability, especially of young individuals, and even more so of specific regions of interest like the substantia nigra pars compacta affected in Parkinson disease.

      (5) More information is needed on RNAseq of microdissections of dopaminergic neurons from 'healthy' postmortem samples of different ages. No further information on these samples is provided. I would suggest adding a table with the clinical information of these samples (especially age, sex, and cause of death). The authors should also discuss whether this experiment has sufficient power. The human ageing cohort seems very small to me.

      This is a re-analysis of a published dataset (Dong et al, Nat Neurosci, 2018; doi:10.1038/s41593-018-0223-0), available through dbgap (phs001556.v1.p1). In this original article, the criteria for inclusion as a brain-healthy control were as follows:

      “…Subjects… were without clinicopathological diagnosis of a neurodegenerative disease meeting the following stringent inclusion and exclusion criteria. Inclusion criteria: (i) absence of clinical or neuropathological diagnosis of a neurodegenerative disease, for example, PD according to the UKPDBB criteria[47], Alzheimer’s disease according to NIA-Reagan criteria[48], or dementia with Lewy bodies by revised consensus criteria[49]; for the purpose of this analysis incidental Lewy body cases (not meeting clinicopathological diagnostic criteria for PD or other neurodegenerative disease) were accepted for inclusion; (ii) PMI ≤ 48 h; (iii) RIN[50] ≥ 6.0 by Agilent Bioanalyzer (good RNA integrity); and (iv) visible ribosomal peaks on the electropherogram. Exclusion criteria were: (i) a primary intracerebral event as the cause of death; (2) brain tumor (except incidental meningiomas); (3) systemic disorders likely to cause chronic brain damage.”

      We do not have access to the cause of death, but we have added available metadata as Suppl_Table 5 to the manuscript.

      We have performed a post-hoc power analysis (using the “Post-hoc Power Calculator”, https://clincalc.com/stats/Power.aspx, which evaluates the statistical power of an existing study) and added the results to the revision. Based on this analysis, we have indeed removed Suppl Fig 7 as a whole, which had shown data of three full-length LINE-1 loci (UID-37, UID-127 and UID-137) with low power (between 17% and 66%). The locus shown in Fig. 5D (UID-68) had a post-hoc power score of 96.6%, which increases our confidence that this full-length LINE-1 element is upregulated in aged dopaminergic neurons. UID-129 had a post-hoc power score of 97%. However, visualization and mappability analysis of the UID-129 locus led us to exclude this UID.

      The post-hoc power analysis for L1HS and L1PA2 revealed low power (28.4% and 32.8%, respectively). We have added these results to the manuscript (lines 359-362), but decided to keep the data in, as we hope this will motivate future confirmation studies, knowing that similar data from brain-healthy human dopaminergic neurons, especially of young individuals, will rarely be available.
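To illustrate what a post-hoc power value of this kind expresses, the following is a minimal sketch based on the standard normal approximation for a two-sided comparison of two independent means. It is an assumption that this approximation is appropriate here, and it may not match the formula used by the cited calculator. The function name `posthoc_power` and all input values are hypothetical and are not taken from the manuscript.

```python
from math import sqrt
from statistics import NormalDist


def posthoc_power(mean1, sd1, n1, mean2, sd2, n2, alpha=0.05):
    """Approximate post-hoc power of a two-sided, two-sample comparison of means.

    Normal approximation: power = Phi(|mean1 - mean2| / SE - z_(1 - alpha/2)).
    """
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # standard error of the difference
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return NormalDist().cdf(abs(mean1 - mean2) / se - z_crit)


# Hypothetical values, chosen only to illustrate an unbalanced design
# (a small "young" group compared with a larger "aged" group).
power = posthoc_power(mean1=10.0, sd1=2.0, n1=6, mean2=12.0, sd2=2.0, n2=36)
print(f"post-hoc power: {power:.1%}")
```

In this approximation, power drops when the effect is small relative to its standard error, and the standard error is dominated by the smaller group. This is why a strong imbalance between young and aged samples can leave an analysis underpowered.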

      (6) The findings in this manuscript apply to both human and mouse brains. However, the landscape of the evolutionarily young L1 subfamilies between these two species is very different and should be part of the discussion. For example, the regulatory sequences that drive L1 expression are quite different in human and mouse L1s. This should be discussed.

      Indeed, they are different. We have added a paragraph to the discussion (lines 539-548).

      (7) On page 3 the authors write: "generally accepted that TE activation can be both, a cause and consequence of aging". This statement does not reflect the current state of the field. On the contrary, this is still an area of extensive investigation and many of the findings supporting this hypothesis need to be confirmed in independent studies. This statement should be revised to reflect this reality.

      We agree that this is overstated and have changed the sentence to:

      “It is now, 31 years after the initial proposition of the “transposon theory of aging” by Driver and McKechnie [14], still a matter of debate whether TE activation can be both, a cause and a consequence of aging [15,16].”

      Reviewer #2 (Public Review):

      Summary:

      Bonnifet et al. sought to characterize the expression pattern of L1 ORF1p expression across the entire mouse brain, in young and aged animals, and to corroborate their characterization with Western blotting for L1 ORF1p and L1 RNA expression data from human samples. They also queried L1 ORF1p interacting partners in the mouse brain by IP-MS.

      Strengths:

      A major strength of the study is the use of two approaches: a deep-learning detection method to distinguish neuronal vs. non-neuronal cells and ORF1p+ cells vs. ORF1p- cells across large-scale images encompassing multiple brain regions mapped by comparison to the Allen Brain Atlas, and confocal imaging to give higher resolution on specific brain regions. These results are also corroborated by Western blotting on six mouse brain regions. Extension of their analysis to post-mortem human samples, to the extent possible, is another strength of the paper. The identification of novel ORF1p interactors in the brain is also a strength in that it provides a novel dataset for future studies.

      Thank you for highlighting the strength of our study.

      Weaknesses:

      The main weakness of the study is that cell type specificity of ORF1p expression was not examined beyond neuron (NeuN+) vs non-neuron (NeuN-). Indeed, a recent study (Bodea et al. 2024, Nature Neuroscience) found that ORF1p expression is characteristic of parvalbumin-positive interneurons, and it would be very interesting to query whether other neuronal subtypes in different brain regions are distinguished by ORF1p expression.

      We agree that this point is important to address. We have mentioned in the manuscript our previous work, which showed that in the mouse ventral midbrain, dopaminergic neurons (TH+/NeuN+) express ORF1p and that these neurons express higher levels of ORF1p than adjacent non-dopaminergic neurons (TH-/NeuN+; Blaudin de Thé et al, EMBO J, 2018). Others have shown evidence of full-length L1 RNA expression in both excitatory and inhibitory neurons but much less expression in non-neuronal cells (Garza et al, SciAdv, 2023). Further, ORF1p expression was documented in excitatory (CamKIIa-positive) and CamKIIa-negative neurons in the mouse frontal cortex (Zhang et al, Cell Res, 2022, doi.org/10.1038/s41422-022-00719-6). We do detect ORF1p staining in mouse (Fig. 1B, panel 10) and human Purkinje cells (based on morphology and in accordance with data from Takahashi et al, Neuron, 2022; DOI: 10.1016/j.neuron.2022.08.011) and most probably basket cells (based on anatomical location in the molecular layer near Purkinje cells) of the cerebellum (Suppl Fig.4). Some Purkinje cells express PV in mice (https://doi.org/10.1016/j.mcn.2021.103650 and 10.1523/JNEUROSCI.22-16-07055.2002), as do stellate and basket cells of the molecular layer (10.1523/JNEUROSCI.22-16-07055.2002). While ORF1p is expressed in PV cells of the hippocampus (Bodea et al, Nat Neurosci, 2024) and in PV-expressing neurons of the human and mouse cerebellum, ORF1p expression does not appear to be restricted to PV cells overall. To address this question experimentally, we have now performed ORF1p stainings in different brain regions (hippocampus, cortex, hindbrain, thalamus, ventral midbrain and cerebellum) together with parvalbumin (PV) stainings, in some cases including the lectin WFA (Wisteria floribunda agglutinin, which specifically stains glycoproteins surrounding PV+ neurons). We have added these data to the manuscript as Suppl Fig.4. While PV-positive neurons often co-stain with ORF1p, not all ORF1p-positive cells are PV-positive. We have also deepened the discussion of this aspect in the revised manuscript (lines 579-599).

      The data suggesting that ORF1p expression is increased in aged mouse brains is intriguing, although it seems to be based upon modestly (up to 27%, dependent on brain region) higher intensity of ORF1p staining rather than a higher proportion of ORF1+ neurons. Indeed, the proportion of NeuN+/Orf1p+ cells actually decreased in aged animals. It is difficult to interpret the significance and validity of the increase in intensity, as Hoechst staining of DNA, rather than immunostaining for a protein known to be stably expressed in young and aged neurons, was used as a control for staining intensity.

      We have now separated the analysis of NeuN+, ORF1p+ and NeuN- cells (please see new Suppl Fig5F-K), which highlights the fact that there is indeed no change in the number of ORF1p+ cells in the young compared to the aged mouse brain. However, while neuronal cell numbers throughout the brain do not change significantly (new Suppl Fig.5F), cell proportions in the ventral midbrain (confocal microscopy-based quantifications) do change, possibly due to a combination of a slight loss in neurons and a gain in non-neuronal cell numbers (Suppl Fig3E). Please also keep in mind that the ventral midbrain region on images taken on a confocal microscope is much smaller than the midbrain motor region as specified by ABBA on images taken by the slide scanner. Using a marker other than DNA as a control requires a protein that is stably expressed throughout the brain and throughout age. We are not aware of a protein for which this has been established. To nevertheless try to address this issue, we used whole-brain imaging intensity data for the protein Rbfox3 (NeuN), which we originally used as a marker for cell identity. We have now added the quantifications of the protein Rbfox3 (NeuN) to Fig3 (new Fig3B). As shown in this figure, NeuN intensity is not stable from one individual to another, neither in control mice nor in the aged control group. Most importantly, NeuN staining intensity does not increase in aged mice. As we did not use NeuN intensity but presence or absence of NeuN as a marker for cell identity, the instability of NeuN intensity from one individual mouse to another does not influence the data presented in this manuscript. It does indicate, however, that the overall increase of ORF1p in aged mice is not a mere reflection of a general decrease in protein turnover. As stated above, the DNA staining with Hoechst controls for technical artefacts. Using Hoechst and NeuN as controls, we have thus provided evidence that the increase in ORF1p intensity per cell is indeed specific to ORF1p. This is now added to the results section (lines 299-301).

      The main weakness of the IP-MS portion of the study is that none of the interactors were individually validated or subjected to follow-up analyses. The list of interactors was compared to previously published datasets, but not to ORF1p interactors in any other mouse tissue.

      As stated in the manuscript, the previously published datasets do include a mouse dataset of ORF1p-interacting proteins in mouse spermatocytes (De Luca, C., Gupta, A. & Bortvin, A. Retrotransposon LINE-1 bodies in the cytoplasm of piRNA-deficient mouse spermatocytes: Ribonucleoproteins overcoming the integrated stress response. PLoS Genet 19, e1010797 (2023); please see lines 479-480: “ORF1p interactors found in mouse spermatocytes were also present in our analysis including CNOT10, CNOT11, PRKRA and FXR2 among others (Suppl_Table4).”). We indeed did not validate any interactors, for economic reasons and because of time constraints (post-doc leaving). However, we feel that the significant overlap with previously published interactors highlights the validity of our data and we anticipate that this list of ORF1p protein interactors in the mouse brain will be of further use for the community.

      The authors achieved the goals of broadly characterizing ORF1p expression across different regions of the mouse brain, and identifying putative ORF1p interactors in the mouse brain. However, findings from both parts of the study are somewhat superficial in depth.

      This provides a useful dataset to the field, which likely will be used to justify and support numerous future studies into L1 activity in the aging mammalian brain and in neurodegenerative disease. Similarly, the list of ORF1p interacting proteins in the brain will likely be taken up and studied in greater depth.

      Reviewer #3 (Public Review):

      The question about whether L1 exhibits normal/homeostatic expression in the brain (and in general) is interesting and important. L1 is thought to be repressed in most somatic cells (with the exception of some stem/progenitor compartments). However, to our knowledge, this has not been authoritatively / systematically examined and the literature is still developing with respect to this topic. The full gamut of biological and pathobiological roles of L1 remains to be shown and elucidated and this area has garnered rapidly increasing interest, year-by-year. With respect to the brain, L1 (and repeat sequences in general) have been linked with neurodegeneration, and this is thought to be an aging-related consequence or contributor (or both) of inflammation. This study provides an impressive and apparently comprehensive imaging analysis of differential L1 ORF1p expression in mouse brain (with some supporting analysis of the human brain), compatible with a narrative of non-pathological expression of retrotransposition-competent L1 sequences. We believe this will encourage and support further research into the functional roles of L1 in normal brain function and how this may give way to pathological consequences in concert with aging. However, we have concerns with conclusions drawn, in some cases regardless of the lack of statistical support from the data. We note a lack of clarity about how the 3rd party pre-trained machine learning models perform on the authors' imaging data (validation/monitoring tests are not reported), as well as issues (among others) with the particular implementation of co-immunoprecipitation (ORF1p is not among the highly enriched proteins and apparently does not reach statistical significance for the comparison) - neither of which may be sufficiently rigorous.

      Thank you for your comments on our manuscript.

      We have addressed the concerns about the machine learning paradigm (see Author response image 1). Concerning the co-IP-MS, we can confirm that ORF1p is among the highly enriched proteins, as it was not found in the IgG control (in 5 independent samples) but only in the ORF1p-IP (in 5 out of 5 independent samples). This is what the infinity sign in Suppl Table 2 indicates, and this is why no p-value is assigned: a ratio with zero in the denominator does not allow a p-value to be calculated. We have made this clearer in the revised version of the manuscript and added a legend to Suppl Table 2.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I would recommend the authors remove the human data and expand the analysis of the aged mice. This would most likely result in a much stronger manuscript.

      We do think that the imaging data and the Western blots are convincing (please also see our detailed response above to the criticism concerning the antibody we used and the newly added data) and very much reflect what we find in the mouse brain, i.e. concerning the percentage of neurons expressing ORF1p and the percentage of ORF1p+ cells being neuronal. When it comes to the transcriptomic data on aged dopaminergic neurons, we have further discussed the limitations of this study in the revised manuscript and hope that the findings inspire others in the field to redo these types of analyses using the now state-of-the-art NGS technologies to address the question and validate what we have found.

      Reviewer #2 (Recommendations For The Authors):

      The characterization of ORF1p expression across the mouse brain would be vastly more informative if cell identity was established beyond NeuN+/NeuN-: the neuronal predominance of L1 activity in the brain has long been observed. Indeed, even corroboration of the PV+ interneuron signature previously reported would both lend credence to the present study and provide valuable confirmation to the field.

      We agree. Please see our response above as well as the new experimental data we added (Suppl Fig5.F-K).

      The increased intensity (but not prevalence in terms of % of Orf1p positive cells) of Orf1p expression in aged mouse brains would be more convincing with further context and perhaps better controls. Is overall protein turnover in aged neurons simply slower than in neurons from younger brains? Immunostaining with another protein marker, rather than Hoechst staining of DNA, to demonstrate that increased staining intensity is unique to Orf1p, would make this result more compelling.

      To address this question, we have now added the quantifications of the protein Rbfox3 (NeuN) to Fig3 (Fig. 3B). As shown in this figure, NeuN intensity is not stable from one individual to another, neither in control mice nor in the aged control group. As we did not use NeuN intensity but presence or absence of NeuN as a marker for cell identity, this does not have any influence on the data presented in this manuscript. It does indicate however, that the overall increase of ORF1p in aged mice is not a mere reflection of a general decrease in protein turnover. As stated above, the DNA staining with Hoechst controls for technical artefacts. Using Hoechst and NeuN as control, we have thus provided evidence for the fact that the increase in ORF1p intensity per cell is indeed specific for ORF1p.

      Western blotting on cell lysates from aged vs young NeuN+ sorted cells would also strengthen this conclusion, although I appreciate the technical challenge of physically isolating whole mature neuronal cells.

      Indeed, this would be feasible but only after FACS sorting, which is technically challenging on whole brain cells (less so on nuclei). We unfortunately do not have the possibility to embark on this right now.

      Concerning data presentation, Figure 3A would be much more informative if the graph was broken down to show the proportion of ORF1p+ and ORF1p- cells, regardless of NeuN status, and the proportion of NeuN+ and NeuN- cells shown independently of Orf1p status. It is difficult to ascertain the relationship of either of these variables to age, as the graph is presented now.

      We followed the suggestions of the reviewer, agreeing that breaking down this figure into either ORF1p+ or NeuN+ or NeuN- cells without double attribution is easier to interpret. However, we also chose to use cell densities (cell numbers per mm2) to represent the data (new Suppl Fig.5F-K), which is even more precise, while proportions are now shown in Suppl Fig.3A-E. Indeed, while it is important to realize that the variables ORF1p+/- or NeuN+/- are not completely independent of each other (as shown in proportions of old Fig4A and B, new Suppl Fig3A and B), as they form four categories (NeuN+/ORF1p+, NeuN+/ORF1p-, NeuN-/ORF1p+, NeuN-/ORF1p-), we can see from the data that there is no overall change in neuron number in the mouse brain between 3 months and 16 months of age. There is no overall change in the density of ORF1p+ cells or NeuN- cells in the mouse brain, with the exception of a decrease in cell density of ORF1p-positive cells in the dorsal striatum accompanied by an increase in non-neuronal cell density (but as discussed above and in the manuscript (lines 332-337), this might be due to technical limitations). Thus, while ORF1p intensities per cell increase significantly in older mice, there is no significant change in ORF1p+ cell number.

      Reviewer #3 (Recommendations For The Authors):

      (1) According to the description in Materials and Methods on the analysis of the confocal images (lines 731-743) the authors used Cell-Pose for both the nuclei and cell segmentation tasks, using model=cyto and diameter=30 for the first (nuclei) and model=cyto2 and diameter=40 for the second (cell). Description of analysis of sagittal brain regions (lines 746-764) indicates the pre-trained model DSB2018 from StarDist 2D was used for nuclei detection, and Cell-Pose using model cyto2 and diameter=30 for cell segmentation. Detected nuclei were then matched to segmented cell areas based on overlap criteria and each nucleus was labeled as 'positive' or 'negative' for either ORF1p or NeuN.

      As described in its three publications (1, 2, 3), Cell-Pose as a segmentation tool is trained in different datasets, with cyto2 being trained on a more varied dataset than cyto. In their library they also offer a model specific for nuclei2. Some description and explanation would be helpful regarding why two different models were used for nuclei detection, and why the specific pre-trained model offered by Cell-Pose was not chosen in either case.

      According to the cellpose library documentation "Changing the diameter will change the results that the algorithm outputs. When the diameter is set smaller than the true size then cellpose may over-split cells. Similarly, if the diameter is set too big then cellpose may over-merge cells.". It would be useful to offer the justification of the pixels chosen for the analysis (possibly average pixel counts in a subsample of Hoechst images).
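One way to derive such a justification from a subsample of images is to convert the pixel area of each hand-outlined nucleus into the diameter of a circle of equal area and take the median. The sketch below assumes this equivalent-circle definition; the helper name `equivalent_diameters` and the pixel counts are hypothetical and do not come from the authors' data.

```python
from math import pi, sqrt
from statistics import median


def equivalent_diameters(areas_px):
    """Diameter (in pixels) of the circle with the same area as each object."""
    return [2 * sqrt(area / pi) for area in areas_px]


# Hypothetical pixel counts of hand-outlined nuclei in a few Hoechst images.
nucleus_areas = [314, 707, 1257, 690, 730]
median_diameter = median(equivalent_diameters(nucleus_areas))
print(f"median equivalent diameter: {median_diameter:.1f} px")
```

A median is used rather than a mean so that a few merged or fragmented outlines do not shift the estimate.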

      Answers to questions 1-5:

      Regarding ABBA, slices were first positioned and oriented manually along the Z-axis, without using DeepSlice. Automated affine registration was then applied in the XY plane, followed by manual refinement. 1 slice per mouse brain, 4 mouse brains per condition.

      Regarding the gradient heatmap, as stated in the figure legend of Fig3F, what is represented is the fold-change in percent (aged vs young) of the “mean of the mean” ORF1p expression per ORF1p+ cell, mapped onto the nine different regions analyzed. More precisely, the heatmap shows the percentage increase in the mean of all mean cell intensities in the aged condition, normalized to the mean of all mean cell intensities in the young condition.

      The pre-trained models and hyperparameters were selected based on their optimal performance across our image datasets. For slide scanner images, the StarDist DSB 2018 model was chosen over a Cellpose model because it more effectively avoided detecting out-of-focus nuclei, which were common in slide scanner images due to the lack of optical sectioning. This issue was not present in confocal images, where the Cellpose cyto model was used instead.

      To assess the performance of each model and diameter setting, we computed the average precision (AP) metric, which is defined as AP = TP/(TP+FP+FN), where TP = true positives, FP = false positives, and FN = false negatives. The AP was calculated at the commonly used Intersection over Union (IoU) threshold of 0.5.

      For confocal images, Cellpose models and hyperparameters were evaluated on eight images per channel, capturing intensity variability across different mouse ages and brain regions. A total of approximately 2,000 nuclei and 1,000 NeuN and ORF1p cells were manually annotated. The AP values at an IoU threshold of 0.5 were: 0.995 for nuclei, 0.960 for NeuN, and 0.974 for ORF1p cells. These high AP values confirm that the selected models and diameter settings were well-suited for analyzing the entire dataset.

      For slide scanner images, nuclei and cell detection were evaluated on 14 images per channel, with approximately 800 nuclei and 400 NeuN and ORF1p cells manually annotated. The AP values were lower than for confocal images, mainly due to a lower signal-to-noise ratio, which led to an increased number of false positives and false negatives: 0.806 for nuclei, 0.675 for NeuN, and 0.695 for ORF1p cells. This decline in performance was expected given the challenges posed by slide scanner images, including background noise and out-of-focus objects. Notably, the observed false positives primarily correspond to small-sized nuclei/cells or those with low intensity, which evade the stringent filters that were applied. While fine-tuning the models could further enhance detection robustness, we considered that the selected models and diameter settings were suitable for processing the entire dataset.
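The AP metric defined above can be sketched as follows. This is a minimal illustration and not the evaluation code used for the manuscript: objects are represented as sets of pixel coordinates, predictions are matched greedily to annotations, and the helpers `square`, `iou` and `average_precision` as well as the toy masks are hypothetical.

```python
def square(row0, col0, size):
    """Toy object: the set of pixel coordinates of a filled square."""
    return {(r, c) for r in range(row0, row0 + size)
            for c in range(col0, col0 + size)}


def iou(a, b):
    """Intersection over Union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(truth, pred, threshold=0.5):
    """AP = TP / (TP + FP + FN) at a single IoU threshold."""
    matched = set()  # indices of annotated objects already matched
    tp = 0
    for p in pred:
        best_j, best_iou = None, 0.0
        for j, t in enumerate(truth):
            if j in matched:
                continue
            value = iou(p, t)
            if value > best_iou:
                best_j, best_iou = j, value
        if best_j is not None and best_iou >= threshold:
            matched.add(best_j)
            tp += 1
    fp = len(pred) - tp   # predictions without a matching annotation
    fn = len(truth) - tp  # annotations that were missed
    total = tp + fp + fn
    # AP is undefined when there are no objects at all; 0.0 is returned
    # here as an arbitrary convention (assumption).
    return tp / total if total else 0.0


truth = [square(0, 0, 4), square(10, 10, 4)]  # two annotated objects
pred = [square(0, 1, 4), square(20, 20, 4)]   # one shifted hit, one spurious
print(average_precision(truth, pred))  # 1 TP, 1 FP, 1 FN
```

Because false positives and false negatives both enter the denominator, this metric penalizes over-segmentation and missed objects alike, which is stricter than reporting precision or recall alone.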

      We added a paragraph to the materials & methods section with this new information; for confocal images (line 847-855), slide scanner images (line 878-885).

      Author response table 1.

      (2) Next to no information is offered regarding the brain segment registration and how the results were analyzed: The ABBA plug-in has two modules manual and automatic, via a DL pre-trained model called DeepSlice. The authors should report which mode of ABBA they used, how many slices per mouse brain, and how many brains. Moreover, there is no explanation of how the gradient heatmap of the brain regions (Figure 3G) was calculated.

      Please see above
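The heatmap calculation described in the answer above (percentage increase of the aged “mean of the mean” relative to the young one, per region) can be sketched as follows. The region names and intensity values are hypothetical placeholders, and the function `percent_increase` is an illustration rather than the authors' code.

```python
from statistics import mean


def percent_increase(young_cell_means, aged_cell_means):
    """Percent change of the aged 'mean of means' relative to the young one."""
    young = mean(young_cell_means)
    aged = mean(aged_cell_means)
    return (aged / young - 1.0) * 100.0


# Hypothetical per-cell mean ORF1p intensities (arbitrary units) per region.
regions = {
    "region_A": {"young": [100.0, 110.0, 90.0], "aged": [120.0, 130.0, 125.0]},
    "region_B": {"young": [80.0, 80.0], "aged": [84.0, 92.0]},
}
heatmap_values = {
    name: percent_increase(data["young"], data["aged"])
    for name, data in regions.items()
}
print(heatmap_values)
```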

      (3) Even the best algorithms produce some False predictions. In this application of the (3rd party) cellpose, StarDist, and ABBA pre-trained models, such cases of wrong predictions would have amplified downstream effects on the analysis e.g., wrongly characterizing certain cells as 'negative' (falsely not detected cell, falsely detected nucleus), or worse, biasing against certain cell subgroups (falsely not detected 'type' of nuclei). This is even more troubling with the variety of models used for the nuclei segmentation task, and the parameters in each. It is possible the authors performed optimizations and reported exactly such optimized values for their dataset, they should however still explicitly offer these detailed validation and optimization processes. The low statistical significance throughout the quantified results from these IF experiments (Figures 1-3) is also a cause for needing an explicit description of how these algorithms perform on the authors' data.

      It is good practice that a pre-trained model when applied to a new dataset like the one that the authors produced for this work, would require basic monitoring for how it performs in the new, previously unseen dataset, even when the model's generalizability has been reported previously as great. It would be best if the authors had hand-annotated a few images as the validation set and produced some model performance metrics as a supplemental table for all pre-trained models they used, in the datasets they used them at. Alternatively, the authors are offered the ability by the cellpose team to fine-tune the model for their data, and this could be used to perform the experiments for this work instead if the performance metrics of the used cellpose (cyto and cyto2) models prove to be poor.

      Please see above

      (4) The legend for Figure 1A indicates that Cell-Pose was used for cell detection and StarDist for nuclei detection in the confocal images (line 960). This needs clarification and correction.

      Please see above

      (5) Some explanation of why the models used were changed when using confocal or the slide scanner microscope would be nice.

      Please see above

      (6) The legend title of Figure 3 (line 1040) "Fig. 3: ORF1p expression is increased throughout the whole mouse brain in the context of aging" is misleading as half the panels in the figure demonstrate a decrease in ORF1p-expressing cells. The two can be both true, but in a more nuanced relationship. A more modest representation of the data in the title is also warranted by the unimpressive statistical significance achieved (notably with no correction for multiple testing, which would further inflate them).

      We have toned down the title of Fig. 3 to “ORF1p expression is increased in some regions of the aged mouse brain” while keeping its overall meaning. There is indeed no significant loss of ORF1p-expressing cells (Suppl Fig. 5F), except in the dorsal striatum (Suppl Fig. 5I; please see also the discussion above), but there is a significant increase in ORF1p intensity per cell overall (Fig. 3A,C,F) and in several regions of the mouse brain (Fig. 3E, G and H).

      (7) Figure 4 suffers for significance. For example in panel A, the few genes with the highest -log10P value, ie above 1.3 (p-value of ~0.05) have a log2-fold change of 0.2-0.3 (fold change 1.14-1.23). There are no hits with even the modest log2-fold change of 0.5 (fold-change 1.4). The big imbalance between young/old samples for these RNA seq experiments (6 vs 36 mice) could be an issue here too.

      The reviewer refers to mouse samples (“6 to 36 mice”), but these are data from human post-mortem dopaminergic neurons of brain-healthy individuals, which were laser-captured and sequenced as reported by Dong et al, Nat Neurosci, 2018. There is indeed a big imbalance between young and old samples, which is linked to the limited availability of brain-healthy post-mortem tissue from young individuals, which is obviously much rarer than tissue from older people. We agree that the fold-enrichments are modest and the p-values rather high, but we argue for keeping these data in, as they are based on rare post-mortem human brain tissues which were difficult to obtain and will be very difficult to obtain in sufficient number in future studies. We hope, however, that these results will encourage such studies in the future and motivate researchers to further look into the expression of TEs in aging brain tissues with higher sample sizes and more suitable sequencing techniques. In the revised version, we have now toned down some sentences (i.e. line 359: modest, but significant increase in several young…) and have also added a post-hoc power analysis (results section lines 359-362: “There was a modest but significant increase in several younger LINE-1 elements including L1HS and L1PA2 at the “name” level (Fig. 4A, B), an analysis which was however underpowered (post-hoc power calculation; L1HS: 28.4%; L1PA2: 32.8%) and thus awaits further confirmation in independent studies.”)
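For readers who want to relate the volcano-plot axes mentioned in comment (7) to linear values, the conversions are simple exponentials. The short sketch below only restates that arithmetic; the function names are hypothetical.

```python
def fold_change(log2_fc):
    """Linear fold change corresponding to a log2 fold change."""
    return 2.0 ** log2_fc


def p_value(neg_log10_p):
    """p-value corresponding to a -log10(p) axis value."""
    return 10.0 ** (-neg_log10_p)


for log2_fc in (0.2, 0.3, 0.5):
    print(f"log2FC {log2_fc} corresponds to fold change {fold_change(log2_fc):.2f}")
print(f"-log10(p) 1.3 corresponds to p = {p_value(1.3):.3f}")
```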

      (8) Figure legend 4C (line 1088) should offer more explanation on what is compared for these correlations: the young vs old results, all intensities of all experiments, and intensities separately for each sample.

      We have added the missing information to Figure legend 4C (line 1209-1215): “Correlation of the RNA expression levels of LINE-1 elements with known transposable element regulators in human dopaminergic neurons (all ages included). What was compared are the expression levels of LINE-1 elements with known regulators of TEs for each individual sample, all ages included.”

      (9) Figure 5, panel D. The regressions are all driven by 1-2 outliers. Should be removed as they don't add anything.

      We agree and have therefore performed an outlier test (ROUT, Q=1%); the identified outliers (1 in each graph) have been removed from the analysis. We argue that the absence of a correlation between UID-68 and adjacent gene expression is important information, as it rules out that expression of the full-length LINE-1 depends on neighboring gene expression (see new Fig5E-G).

      (10) Figure 6 panel B. It is unexpected that the GO terms with the highest enrichment also show weak significance and vice-versa. Fold enrichment in the PANTHER tool is defined as the % of GO-term genes in the sample divided by the %GO-term genes in the background (organism).

      This is not unexpected, as GO terms contain different numbers of proteins. Indeed, the significance can differ depending on whether the GO term contains, for example, 3 or 300 proteins. A GO term containing only a few proteins can show a high fold enrichment between the conditions (here: ORF1p-IP vs whole mouse genome) and still reach only a rather low significance. If you look at the last 6 categories in Fig 6B, you can appreciate that they have very similar values for enrichment but very different significance levels (FDR).
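The dependence of significance on GO-term size can be illustrated with a toy over-representation test. The sketch below uses a hypergeometric upper tail as a stand-in; this is an assumption, and the test and multiple-testing correction applied by PANTHER may differ. All gene counts are hypothetical.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """(fraction of sample genes in the term) / (fraction of background genes in the term)."""
    return (k / n) / (K / N)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K belong to the term."""
    favorable = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(K, n) + 1))
    return favorable / comb(N, n)


N, n = 20000, 200  # hypothetical background size and sample (IP list) size

# Small term: 3 genes in the background, 2 of them in the sample.
small_fold = fold_enrichment(2, n, 3, N)
small_p = hypergeom_upper_tail(2, n, 3, N)

# Large term: 300 genes in the background, 30 of them in the sample.
large_fold = fold_enrichment(30, n, 300, N)
large_p = hypergeom_upper_tail(30, n, 300, N)

print(f"small term: fold = {small_fold:.1f}, p = {small_p:.2e}")
print(f"large term: fold = {large_fold:.1f}, p = {large_p:.2e}")
```

In this toy setting the small term has the higher fold enrichment but the weaker p-value, because two hits in a three-gene term carry far less evidence than thirty hits in a 300-gene term.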

      (11) Many citations in the References sections are referred to by doi and "Published online" date. These should be corrected to include the citation in standard format (journal name, volume, issue, pages, etc).

      We apologize for this and have corrected this in the revised version.

      (12) (line 970) Legend of Figure 1 is missing label referencing panel C (ie (C) Bar plot showing the total....).

      Thank you for pointing this out, this has been corrected.

      (13) The bottom violin plot in Figure 1C lacks sufficient explanation (what are the M1-4 categories?). The same problem with panel G (same Figure 1).

      This has now been better explained. The M1-M4 categories denote individual mice numbered from 1 to 4 (results are shown per individual).

      -> specified in line 1098-1099 (Fig.1C) and new text (1117-1118: Fig.1G): Four three-month-old Swiss/ OF1 mice (labeled as M1 to M4) are represented each by a different color, the scattered line represents the median. ****p<0.0001, nested one-way ANOVA. Total cells analyzed = 4645

      (14) Figure 1B; confocal image 2 (Hippocampus) does not seem to tell the same story as the main slide scanner image. Overall, more explicit phrasing regarding how the Images in Figure 1B are not blow-outs of the bigger one but different, confocal images of the same regions.

      We have changed the sentence to: “Representative images acquired on a confocal microscope of immunostainings showing ORF1p expression (orange) in 10 different regions of the mouse brain.”, which hopefully helps to indicate that these images are indeed not blow-outs of the slide scanner image.

      (15) Young are defined as 3 months and 'old' as 16 months mice. 16-month group name would be better as "adults". Example of age range considered 'old': "Young (3-6-month-old) and aged (18-27-month-old) male mice were age- and source-matched for each experiment." https://www.cell.com/cell-metabolism/fulltext/S1550-4131(23)00462-X?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS155041312300462X%3Fshowall%3Dtrue

      This is true, but the 16-month age group does not have a designation in the mouse life history stages for C57BL/6 mice from the Jackson Laboratory (see https://www.jax.org/news-and-insights/jax-blog/2017/november/when-are-mice-considered-old#): these mice are neither middle-aged nor old. We therefore believe that the designation “aged” still holds.

      (16) Lines 63-65 > To our understanding, both ORF1 and ORF2 proteins are thought to exhibit cis preference.

      Yes, that is true, but the sentence as it is does not make a claim about ORF2p not having cis-preference.

      (17) Figure 1I is only referred to as "Figure I". Twice. Page 8, line 173 & 176.

      Thank you, has been corrected.

      (18) Lines 178-182 >To investigate intra-individual expression patterns of ORF1p in the post-mortem human brain, we analyzed three brain regions of a neurologically healthy individual (Figure 1J) by Western blotting. ORF1p was expressed at different levels in the cingulate gyrus, the frontal cortex, and the cerebellum underscoring a widespread expression of human ORF1p across the human brain." > It is difficult for us to gauge how believable the blots are without knowing the amount of protein loaded.

      We loaded 10 µg of tissue lysate per lane (tissue pulverized with a Covaris Cryoprep; the amount is now mentioned in the materials & methods section). We have added some more information on the antibody in the revised manuscript (lines 183-194).

      We say this from our experience conducting similar blots of anti-ORF1p IPs from human brain tissues using the same antibody (4H1) without successful detection of enriched protein by western blot (of course there can be many reasons for that, but knowing the amount of protein loaded is important for reproducibility). In addition, we find the "double" ORF1p bands they see in almost every blot atypical.

      In our hands, the 4H1 antibody does not work well on Western blots, but it immunoprecipitates well and works very well on immunostainings. However, the Abcam ab245249 antibody works well for Western blotting (and IPs), which is why we used this antibody for these applications. As described above, there is evidence that the double band is not atypical, but rather frequent, which we now also mention in the revised manuscript (lines 183-191): “To investigate intra-individual expression patterns of ORF1p in the post-mortem human brain, we analyzed three brain regions of a neurologically-healthy individual (Fig. 1J, entire Western blot membrane in Suppl Fig. 2A) by Western blotting using a commercial and well characterized antibody which we further validated by several means. The double band pattern in Western blots has been observed in other studies for human ORF1p outside of the brain (Sato et al, SciRep, 2023, McKerrow et al, PNAS, 2022) as well as for mouse ORF1p (Walter et al, eLife, 2016). We also validated the antibody by immunoprecipitation and siRNA knock-down in human dopaminergic neurons in culture (differentiated LUHMES cells, Suppl Fig. 2B and 2C) where we detect however in most cases the upper band only. The nature of the lower band is unknown, but might be due to truncation, specific proteolysis or degradation. ORF1p was expressed at different levels in the human post-mortem cingulate gyrus, the frontal cortex and the cerebellum underscoring a widespread expression of human ORF1p across the human brain. This was in accordance with ORF1p immunostainings of the human post mortem cingulate gyrus (Fig. 2H and Suppl Fig. 2E) and frontal cortex (Suppl Fig. 2E), with an absence of ORF1p staining when using the secondary antibody only (Suppl Fig. 2E).”

      In some images a band is labeled as IgG heavy chain (e.g. presumably from the FACS, Figure 2G, and IP, Figure 6A - which could contain residual antibody) - however, this is avoidable by using a different antibody for capture than detection - which also helps reduce false positive results.

      Unfortunately, we have only an antibody raised in rabbit available to perform IPs and Western blots on mouse tissues and therefore cannot avoid the detection of the IgG heavy chain.

      Aside from these, there seem to be persistent 'double bands' in the region of ORF1p. Generally, we are unaccustomed to seeing such 'double bands' in human anti-ORF1p western blots and IP-western blots, and since, in this study, this is seen in both mouse and human blots, it raises some doubts. Having the molecular mass ladder on each blot to at least allow for the assessment of migration consistency and would therefore be very helpful.

      We have added the molecular weights on the Western blots (Fig.1H, Fig. 2G and Suppl Fig.1D and E). As also discussed above, there is accumulating evidence that, in some tissues, persistent double bands are detected using ORF1p antibodies in both mouse and human tissues.

      Human ORF1p detection:

      We have validated the antibody against human ORF1p (Abcam 245249-> https://www.abcam.com/enus/products/primary-antibodies/line-1-orf1p-antibody-epr22227-6-ab245249), which we use for Western blotting experiments (please see Fig1J and new Suppl Fig.2A,B and C), by several means.

      (1) We have done immunoprecipitations and co-immunoprecipitations followed by quantitative mass spectrometry (LC-MS/MS; data not shown as they are part of a different study). We efficiently detect ORF1p in IPs (Western blot now added in Suppl Fig2B) and by quantitative mass spectrometry (5 independent samples per IP-ORF1p and IP-IgG: ORF1p/IgG ratio: 40.86; adj p-value 8.7e-07; human neurons in culture). We also did co-IPs followed by Western blot using two different antibodies, either the Millipore clone 4H1 (https://www.merckmillipore.com/CH/en/product/Anti-LINE-1-ORF1p-Antibody-clone-4H1,MM_NF-MABC1152?ReferrerURL=https%3A%2F%2Fwww.google.com%2F) or the Abcam antibody to immunoprecipitate and the Abcam antibody for Western blotting on human brain samples. Indeed, the Millipore antibody does not work well on Western blots in our hands. We consistently revealed a double band, indicating that both bands are ORF1p-derived. We have added an ORF1p IP-Western blot as Suppl Fig. 2B which clearly shows the immunoprecipitation of both bands by the Abcam antibody. Abcam also reports a double band, and they suspect that the lower band is a truncated form (see the link to their website above). ORF1p Western blots done by other labs with different antibodies have also detected a second band in human samples:

      • Sato, S. et al. LINE-1 ORF1p as a candidate biomarker in high grade serous ovarian carcinoma. Sci Rep 13, 1537 (2023) in Figure 1D

      • McKerrow, W. et al. LINE-1 expression in cancer correlates with p53 mutation, copy number alteration, and S phase checkpoint. Proc. Natl. Acad. Sci. U.S.A. 119, e2115999119 (2022), showing a Western blot of an inducible LINE-1 (ORFeus) detected by the MABC1152 ORF1p antibody from Millipore Sigma in Figure 7

      • Walter et al. eLife 2016;5:e11418 (DOI: 10.7554/eLife.11418), in mouse ES cells with an antibody made in-house (gift from another lab; in Figure 2B)

      The lower band might thus be a truncated form of ORF1p or a degradation product which appears to be shared by mouse and human ORF1p. We have now mentioned this in the revised version of the paper (lines 183-189).

      (2) We have used the very well characterized antibody from Millipore (https://www.merckmillipore.com/CH/en/product/Anti-LINE-1-ORF1p-Antibody-clone-4H1,MM_NF-MABC1152?ReferrerURL=https%3A%2F%2Fwww.google.com%2F) for immunostainings and detect ORF1p staining in human neurons in the very same brain regions (Fig 2H, new Suppl Fig. 2E), including the cerebellum in the human brain. We added a secondary antibody-only control (Suppl Fig. 2E).

      (3) We also did antibody validation by siRNA knock-down. However, it is important to note that these experiments were done in LUHMES cells, a neuronal cell line which we differentiated into human dopaminergic neurons. In these cells, we only occasionally detect a double band on Western blots, and mostly reveal only the upper band at ≈ 40 kD. The results of the knock-down are now added as Suppl Fig. 2C.

      Altogether, based on our experimental validations and evidence from the literature, we are very confident that it is indeed ORF1p that we detect on the blots and by immunostainings in the human brain.

      Mouse ORF1p detection: In line 117-123 of the manuscript, we had specified “Importantly, the specificity of the ORF1p antibody, a widely used, commercially available antibody [18,34–38], was confirmed by blocking the ORF1p antibody with purified mouse ORF1p protein resulting in the complete absence of immunofluorescence staining (Suppl Fig. 1A), by using an inhouse antibody against mouse ORF1p[17] which colocalized with the anti-ORF1p antibody used (Suppl Fig. 1B, quantified in Suppl Fig. 1C), and by immunoprecipitation and mass spectrometry used in this study (see Author response image 1)”.

      Figure 2G shows a Western blot using an extensively used and well characterized ORF1p antibody from Abcam (mouse ORF1p, Rabbit Recombinant Monoclonal LINE-1 ORF1p antibody-> https://www.abcam.com/enus/products/primary-antibodies/line-1-orf1p-antibody-epr21844-108-ab216324; cited in at least 11 publications) after FACS-sorting of neurons (NeuN+) of the mouse brain. We have validated this ORF1p antibody ourselves in IPs (please see Fig 6A) and co-IP followed by mass spectrometry (LC-MS/MS; see Fig 6, where we detect ORF1p exclusively in the 5 independent ORF1p-IP samples and not at all in 5 independent IgG-IP control samples, please also see Suppl Table 2). In this analysis, we detect ORF1p with a ratio and log2 fold change of ∞, indicating that this protein is found only in the IP-ORF1p samples (5/5) and not in the IP-control samples (which does not allow a ratio or p-value to be calculated; please see Suppl Table 2).

      In addition, we have added new data showing the entire membrane of the Western blot in Fig1H (now Suppl Fig.1E) and a knock-down experiment using siRNA against ORF1p or control siRNA in mouse dopaminergic neurons in culture (MN9D; new Suppl Fig.1D). Together, this makes us very confident that we are looking at a specific ORF1p signal. The band in Figure 2G is at the same height as the input and there are no other bands visible (except the heavy chain of the NeuN antibody, which at the same time is a control for the sorting). We added some explanatory text to the revised version of the manuscript in lines 120-124 and lines 253-256.

      Please note that in the IP of ORF1p shown in Fig6A, there is a double band as well, strongly suggesting that the lower band might be a truncated or processed form of ORF1p. As stated above, this double band has been detected in other studies (Walter et al. eLife 2016;5:e11418. DOI: 10.7554/eLife.11418) in mouse ES cells using an in-house generated antibody against mouse ORF1p. Thus, with either commercial or in-house generated antibodies in some mouse and human samples, there is a double band corresponding to full-length ORF1p and a truncated or processed version of it.

      We noticed that we had not added the references of the primary antibodies used in Western blot experiments in the manuscript, which has now been corrected in the revised version.

      (19) Figure 1H, 1J, 6A: Show/indicate molecular weight marker.

      The molecular weight markers were added (please see Fig.1H, Fig. 2G and Suppl Fig.1D and E).

      (20) Page 10, line 223. " ...expressing ORF1p and ORF1p"?

      Thank you, this was corrected.

      (21) Lines 279-280 "An increase of ORF1p expression was also observed in three other regions albeit not significant." > This means it is not distinguishable as a change under the assumptions and framework of the analysis; please remove this statement.

      We agree, we removed this sentence.

      (22) Page 13, line 301. Labeling the group with a mean age of 57.5 as "young" might be a bit misleading.

      This is why we put the “young” in quotation marks.

      (23) Lines 309-311 "however there was a significant increase in several younger LINE-1 elements including L1HS and L1PA2 at the "name" level (Figure 4A, B)". > Effect size is tiny; is this really viable as biologically significant? Maybe just remove the volcano plot? Does panel A add anything not covered by B?

      We would like to keep the volcano plot, even though effect sizes are small (which we acknowledge in the manuscript, lines 359-362: “There was a modest but significant increase in several younger LINE-1 elements including L1HS and L1PA2 at the “name” level (Fig. 4A, B), an analysis which was however underpowered (posthoc power calculation; L1HS: 28.4%; L1PA2: 32.8%) and thus awaits further confirmation in independent studies.”). The reason for this decision is to illustrate a general increase in expression (even with a small effect size) of several LINE-1 elements at the name level, with the youngest LINE-1 elements being amongst those with the highest effect.

      (24) Lines 327-328 "The transcripts of these genes showed, although not statistically significant, a trend for decreased expression in the elderly (Supplementary Figure 5D-G). > I do not recommend doing this.

      We agree and have removed it.

      (25) Lines 339-342 "While several tools using expectation maximization algorithms in assigning multi-mapping reads have been developed and successfully tested in simulations 48,54, we used a different approach in mapping unique reads to the L1Base annotation of full-length LINE-1" > Generally, this section is not clear - what is the rationale for the approach (compared to the stated norms)? Ideally, justify this analytical choice and provide a basic comparison to other more standard approaches (even if briefly in a supplement).

      We thank the reviewer for this comment. Indeed, randomly assigning multi-mapping reads is usually a good strategy to quantify the expression of repeats at the family level (Teissandier et al. 2019), which we did in the first part of the analysis (class, family and name level). However, our main goal was to focus on specific single full-length LINE elements which can encode ORF1p. We therefore decided to use only uniquely mapped reads, which is by definition the only way to be sure that a sequencing read really comes from a specific genomic location, and which will not over-estimate their expression level. In this sense, we have added some explanatory text to this specific section. We also added a section to the discussion (lines 638-644): “This analysis has technical limitations inherent to transcriptomic analysis of repeat elements especially as it is based on short-read sequences and on a limited and disequilibrated number of individuals in both groups. Nevertheless, we tried to rule out several biases by demonstrating that mappability did not correlate with expression overall and used a combination of visualization, post-hoc power analysis and analysis of the mappability profile of each differentially expressed full-length LINE-1 locus.”

      (26) Page 16, line 389. The age span covered is 59 years although the difference in mean age between the two groups is only 25.5 years - please indicate both metrics.

      We have added this additional metric in line 432.

      (27) Lines 394-397 "Further, correlation analyses suggest that L1HS expression might possibly be controlled by the homeoprotein EN1, a protein specifically expressed in dopaminergic neurons in the ventral midbrain 50, the heterochromatin binding protein HP1, two known regulators of LINE-1, and the DNA repair proteins XRCC5/6." > This reads like a drastic reach unless framed explicitly as a 'tempting speculation' (or similar). I don't think this claim should be made as it is without further validation.

      We believe we have used careful language (“correlation analyses suggest”, “might possibly be controlled”) in the results section as well as in the discussion (lines 660-671): “Matrix correlation analysis of several known LINE-1 regulators, both positive and negative, revealed possible regulators of young LINE-1 sequences in human dopaminergic neurons. Despite known and most probable cell-type unspecific regulatory factors like the heterochromatin binding protein CBX5/HP1 [51] or the DNA repair proteins XRCC5 and XRCC6 [49], we identified the homeoprotein EN1 as negatively correlated with young LINE-1 elements including L1HS and L1PA2. EN1 is an essential protein for mouse dopaminergic neuronal survival [50] and binds, in its properties as a transcription factor, to the promoter of LINE-1 in mouse dopaminergic neurons [17]. As EN1 is specifically expressed in dopaminergic neurons in the ventral midbrain, our findings suggests that EN1 controls LINE-1 expression in human dopaminergic neurons as well and serves as an example for a neuronal sub-type specific regulation of LINE-1.” To this we added: “Although these proteins are known regulators of LINE-1, this correlative relationship awaits experimental validation.”

      (28) Mouse protein/gene names are all capital letters on page 17/18. Changes on page 18/19. This should be consistent.

      Thank you, this has been corrected (all capital).

      (29) Page 23, line 559. The estimated ORF1p/ORF2p ratio referenced is based on an overexpression of L1 from a plasmid (ref87). > It should be made clear to the reader that it is still unknown whether such a ratio is representative of native conditions.

      OK, this is indeed true. Thank you for pointing this out. (line 621-622)

      (30) Lines 613-616 "Further, GO term analysis contained expected categories like "P-body", mRNA metabolism related categories, and "ribonucleoprotein granule". We also identified NXF1 as a protein partner of ORF1p, a protein found to interact with LINE-1 RNA related to its nuclear export 89." > There is no reason to speculate that the proteins in the pulldown are specific to L1 RNAs.

      We did not speculate that the proteins in the pulldown are specific to LINE-1 RNA. We just mentioned that NXF1 was an ORF1p protein partner and that it had been found previously as a LINE-1 RNA interactor.

      ORF1p is present in large heterogeneous assemblies - not every protein should be assigned an L1-related function and many proteins will be participating in general RNA-granule functions (given L1 ORFs are known to accumulate in such structures). Moreover, the granules are not the same in every cell type. IP is done in low salt and overnight incubation (poorly controlled for non-specific accumulation).

      We state that these key interactors are “probably” essential for completing or repressing the LINE-1 life cycle. It is true that we cannot affirm this. We therefore added a sentence to the discussion (line 679): “This supports the validity of the list of ORF1p partners identified, although we cannot rule out the possibility that unspecific protein partners might be pulled down due to colocalization in the same subcellular compartment.”

      (31) Lines 629-631" These results complete the picture of the post-transcriptional and translational control of ORF1p and suggest that these mechanisms, despite a steady-state expression, are operational in neurons." > Stating that these results complete the picture, which is still very much open for completion (granted, these results add to the picture), is an unneeded over-reach.

      We agree. We changed “complete” to “add to” the picture.

      (32) Lines 641-644 "Finally, we found components of RNA polymerase II and the SWI/SNF complex as partners of ORF1p. This further indicates that ORF1p has access to the nucleus in mouse brain neurons as described for other cells 95,96, implying that ORF1p potentially has access to chromatin." > There is no way to know if this is a post-lysis effect - we have no real specificity information. The mock IP control is insufficient for this conclusion without further validation.

      We added: “however a bias due to a post-lysis effect cannot be excluded.” Line 711

      (33) ab216324 for IF and ab245122 for IP - why? What is the difference? Both are rated equally for IF and IP - please provide a rationale for reagent selection and use.

      These two antibodies are identical except for their storage buffer. ab245122 is azide- and BSA-free, while ab216324 contains the preservative sodium azide (0.01%) and the following constituents: PBS, 40% Glycerol (glycerin, glycerine), 0.05% BSA. As azide and BSA can affect the coupling of antibodies to beads, antibodies which do not contain these components in their buffer are preferred for IPs (but have a shorter shelf life).

      (34) Page 35, line 862. "1.3 x 105" should be "1.3 x 105".

      We added a regular x, but we are not sure if this is what the reviewer was referring to.

      (35) MS comparison in Figure 6. Why is the comparison not being made between young vs. old brain/neurons? This would be more informative instead of just showing what they IP over a mock IgG control and the comparison would track better with other experiments in the rest of the paper.

      Yes, that is true. However, we did not do this at the time as we did not have old mouse brain tissue available. Services from official animal providers in France have unfortunately only recently expanded their offer with regard to the availability of aged animals.

      (36) Supplementary Table 2 (MS data) is lacking information. How many peptides (unique/total) were discovered for each protein? Why are all ratios and p-values not listed for every protein in the table? LFQ protein intensity values should also be listed. Each supplementary table should have a legend as a separate tab in the document.

      As stated in SupplTable2, and now made clearer in a separate tab of SupplTable2 which contains a legend to the table, some proteins do not have associated p-values and ratios because they are found only in the ORF1p IP and not in the IgG control. This is why these proteins have an infinity sign instead of a fold enrichment and no p-value assigned: a ratio of the form X/0 cannot be calculated, which in turn makes it impossible to obtain a p-value. Concerning the absence of LFQ protein intensity values, as stated in the materials & methods section, we did not use these values (linear model) but instead the intensity values of the peptides: “The label free quantification was performed by peptide Extracted Ion Chromatograms (XICs), reextracted by conditions and computed with MassChroQ version 2.2.21 109. For protein quantification, XICs from proteotypic peptides shared between compared conditions (TopN matching) with missed cleavages were used. Median and scale normalization at peptide level was applied on the total signal to correct the XICs for each biological replicate (n=5). To estimate the significance of the change in protein abundance, a linear model (adjusted on peptides and biological replicates) was performed, and p-values were adjusted using the Benjamini–Hochberg FDR procedure.”

      The number of peptides unique/total for each protein has been added to Suppl_Table2 along other available information.
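      The two points made above (an undefined ratio when a protein is absent from the control, and Benjamini–Hochberg adjustment of p-values) can be illustrated with a minimal Python sketch. It is not the MassChroQ or linear-model pipeline used in the study; the function names `fold_ratio` and `benjamini_hochberg` and all input values are hypothetical and chosen for illustration only.

```python
import math


def fold_ratio(ip_intensity, control_intensity):
    """Return IP/control; infinite when the protein is absent from the control."""
    if control_intensity == 0:
        # X/0 cannot be expressed as a finite ratio, so it is reported as infinity
        # (or NaN when the protein is absent from both conditions).
        return math.inf if ip_intensity > 0 else math.nan
    return ip_intensity / control_intensity


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg step-up adjustment; output order matches input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example values, not data from the study.
print(fold_ratio(5.0, 0.0))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

      In this sketch, proteins with an infinite ratio would simply be excluded from the p-value list before adjustment, which mirrors the reason given above for the missing p-values.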

      (37) Poor overlap in 6C could in part be explained by the use of different sample/tissue types, but more likely the big difference could come from the very different conditions at which the IPs were performed (buffers and incubation times etc.).

      The overlap seems poor, but it is nevertheless greater than expected by chance (representation factor 2.6, p<5.4e-08). We agree that this can in part be explained by different experimental conditions, which we have now added to the discussion (line 478: “However, differences in experimental conditions could also influence this overlap.”)
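      For readers unfamiliar with the representation factor, the following minimal Python sketch shows how such an overlap statistic is commonly computed: the observed overlap divided by the overlap expected by chance, with a one-sided hypergeometric p-value. The list sizes and background size below are hypothetical placeholders, not the values behind the figures reported above, and the exact procedure used in the study may differ.

```python
from math import comb


def representation_factor(overlap, size_a, size_b, universe):
    """Observed overlap divided by the overlap expected for two random lists."""
    expected = size_a * size_b / universe
    return overlap / expected


def hypergeom_upper_tail(overlap, size_a, size_b, universe):
    """P(X >= overlap) when size_b items are drawn from a universe holding size_a hits."""
    total = comb(universe, size_b)
    upper = min(size_a, size_b)
    favourable = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total


# Hypothetical toy numbers: two lists of 5 proteins sharing 3, background of 20.
print(representation_factor(3, 5, 5, 20))
print(hypergeom_upper_tail(3, 5, 5, 20))
```

      A representation factor above 1 indicates more overlap than expected by chance; the result depends strongly on the background (universe) size that is assumed.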

      (38) Figure 6D is a very uninspiring representation of the data. What is the point of showing several binary interactions? Was the IgG control proteome also analyzed? Have proteins displayed in Figure 6 been corrected for that?

      The point of showing these interactions is that ORF1p interacts with clustered proteins. ORF1p interacts with proteins that belong to specific GO terms (Fig6b), but these proteins also interact with each other more than expected (Fig6C). This is the benefit of showing a STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) representation; STRING is a database of known and predicted protein–protein interactions. Indeed, proteins in Fig6 have been corrected for the IgG proteome. We only show proteins that were enriched or uniquely present in the ORF1p IP condition compared to the IgG control (please see Suppl_Table2).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1: Indirect Estimates of White Matter Connections: While dMRI is a valuable tool, it inherently provides indirect and inferred information about neural pathways. The accuracy and specificity of tractography can be influenced by various factors, including fiber crossing, partial volume effects, and algorithmic assumptions. A potential limitation in the accuracy of indirect estimates might affect the precision of spatial extent measurements, introducing uncertainty in the interpretation of cortico-thalamic connectivity patterns. Addressing the methodological limitations associated with indirect estimates and considering complementary approaches could strengthen the overall robustness of the findings.

      We appreciate the reviewer’s comment and agree that tractography is an indirect estimate and subject to limitations. Regarding this manuscript, the key question is not whether the anatomical tracts are without false positives or negatives, and in fact we argue that this question is outside the scope of this manuscript and has been addressed in several previous studies (e.g. Thomas et al. 2015, Schilling et al., 2020, Grisot et al. 2021, and many others). Instead, the key question for this manuscript is whether the focality of termination patterns within the thalamus is systematically biased in a way that makes the observation of a hierarchy effect artifactual. The many supplementary analyses in this manuscript help address this question and increase our confidence that the indirect nature of tractography does not systematically bias the EDpc1 measure such that association areas only appear to have more diffuse connectivity patterns relative to sensorimotor areas.

      Comment 2: An over-arching theme of my review is that, each time I found myself wondering about a detail, a null, or a reference, I had only to read the next sentence or paragraph to find my concern handled in a clear and concise fashion. This is, in my opinion, the mark of work of the highest order. I congratulate the authors on their excellent work, which I believe will be impactful and well-received.

      I have no notes that I feel can help improve what is already an impeccable piece of work.

      We thank the reviewer for the kind comment.

      Reviewer #2:

      Comment 1: Structural thalamocortical connectivity was estimated from diffusion imaging data obtained from the HCP dataset. Consequently, the robustness and accuracy of the results depend on the suitability of this data for such a purpose. Conducting tractography on the cortical-thalamic system is recognized as a challenging endeavor for several reasons. First, diffusion directions lose their clearly defined principal orientations once they reach the deep thalamic nuclei, rendering the tracking of structures on the medial side, such as the medial dorsal (MD) and pulvinar nuclei difficult. Somewhat concerning is those are regions that authors found to show diffuse connectivity patterns. Second, the thalamic radiata diverge into several directions, and routes to the lateral surface often lack the clarity necessary for successful tracking. It is unclear if all cortical regions have similar levels of accuracy, and some of the lateral associative regions might have less accurate tracking, making them appear to be more diffuse, biasing the results.

      As mentioned in the weakness section, it is crucial to address the need for better validation or the inclusion of control analyses to ensure that the results are not systematically biased due to known issues, such as the difficulty in tracking the medial thalamus and the potential for higher false positives when tracking the lateral frontal cortex.

      We thank the reviewer for bringing up an important point. To determine whether some areas of the thalamus were more difficult to track and, in turn, biased the EDpc1 measure, we added an additional supplemental figure (S31). In this figure, shown below, we calculate the total SC of all ipsilateral cortical areas to each thalamic voxel. We show that, indeed, medial thalamic voxels have a lower total streamline count to ipsilateral cortex, and we see reduced total streamline counts to lateral thalamic areas and the very posterior end of the thalamus. We determined whether some cortical areas preferentially projected to parts of the thalamus with lower ipsilateral total SC (i.e. by calculating the overlap between SC and total cortical SC for each thalamic voxel) and found only a weak relationship with our measure. Furthermore, we regressed each voxel’s mean ipsilateral cortical SC from the streamline count matrix. We found that the EDpc1 measure didn’t significantly change after the regression.

      Additionally, we note that this analysis assumes that all thalamic voxels should have equal strength of connectivity (i.e., total SC) to the ipsilateral cortex and that such a measure is a proxy for “accuracy.” While both of these assumptions may not be entirely valid, this figure does demonstrate that potential reductions in tracking from the medial thalamus do not significantly affect the EDpc1 measure.
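      As an illustration of the regression control described above, here is a minimal Python sketch of one possible reading: each cortical area's streamline counts across thalamic voxels are regressed on the per-voxel mean count, and the residuals are kept. The matrix layout (rows are cortical areas, columns are thalamic voxels), the function names and the toy values are assumptions made for this example; the actual implementation in the study may differ.

```python
def residualize(y, x):
    """Residuals of an ordinary least-squares fit of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx  # raises ZeroDivisionError if x is constant
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def regress_out_voxel_mean(sc_matrix):
    """Remove the per-voxel mean streamline count from every cortical area's row."""
    n_areas = len(sc_matrix)
    n_voxels = len(sc_matrix[0])
    # Mean streamline count of each thalamic voxel across all cortical areas.
    voxel_mean = [
        sum(row[v] for row in sc_matrix) / n_areas for v in range(n_voxels)
    ]
    return [residualize(row, voxel_mean) for row in sc_matrix]


# Hypothetical toy matrix: 2 cortical areas x 3 thalamic voxels.
toy = [[1.0, 2.0, 3.0], [3.0, 2.0, 4.0]]
print(regress_out_voxel_mean(toy))
```

      The residual matrix could then be passed to the same downstream measure to check whether the result changes once voxel-wise differences in overall trackability are removed.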

      Comment 2: While the methodology employed by the authors appears to be state-of-the-art, there exists uncertainty regarding its appropriateness for validation, given the well-documented issues of false positives and false negatives in probabilistic diffusion tractography, as discussed by Thomas et al. 2014 PNAS. Although replicating the results in both humans and non-human primates strengthens the study, a more compelling validation approach would involve demonstrating the method's ability to accurately trace known tracts from established tracing studies or, even better, employing phantom track data. Many of the control analyses the authors presented, such as track density, do not speak to accuracy.

      In addition to our response to Reviewer 1 Comment 1, we would like to add the following:

      We agree with the reviewer that tractography methods have known limitations. We would also like to point out that several studies have already performed the analyses suggested by the reviewer. Many studies have compared tracts reconstructed from diffusion data using tractography methods to tracer-derived connections (e.g. Thomas et al., 2014, as mentioned by the reviewer; Donahue et al., 2016, J Neurosci; Dauguet et al., 2007 NeuroImage; Gao et al., 2013 PloS One; van den Heuvel et al., 2015, Hum Brain Map; Azadbakht et al., 2015 Cereb Cortex; Ambrosen et al., 2020 NeuroImage). Notably, studies comparing tractography and tracer-derived white matter tracts in the same animal (e.g. Grisot et al., 2021; Gao et al., 2013 PloS One) have demonstrated that tractography errors may be inflated in studies comparing tractography and tracer-derived connections in different animals.

      Additionally, others have employed phantoms to assess the validity of tractography methods (e.g. Drobnjak et al., 2021). For the purposes of this manuscript, phantom data would not be an adequate control because phantom data would likely not capture the biological complexities of tracking subcortical white matter tracts and identifying projections within subcortical grey matter.

      While a comparison of our tractography-derived ED measure to ED calculated on terminations from tracer studies within the thalamus from several somatomotor and associative regions in macaques would provide additional confidence for our results, such a control is certainly outside the scope of this study. Additionally, such a study would not provide a ground truth comparison for the human data. Even if this hypothetical experiment was performed, a negative finding would not refute our results, as any differences could be attributed to evolutionary differences. Unfortunately, there exists no ground truth to compare human white matter connectivity patterns to, which is why we stress-tested our results in as many ways as possible. These stress tests revealed that our main findings are very robust.

      Specifically, the key validity question of our study was whether a confound systematically biased the ED measure so as to make the hierarchy effect artifactual. The control analyses we performed on track density, cortical geometry, bundle integrity, etc. do in fact speak to the robustness of the results. Regarding the track density analyses, we argue that these control analyses also speak to accuracy. The reviewer mentioned above that some cortical areas may be biased because their anatomical tracts may be more difficult to reconstruct using tractography. The mean streamline count is meant to reflect the density of a fiber bundle, but corticothalamic tracts that are more difficult to track will, by nature, have fewer streamlines. So, the mean streamline count reflects not only the density of a fiber bundle but also how easy that tract is to reconstruct. Therefore, if it were the case that cortical areas with more difficult-to-reconstruct white matter tracts to the thalamus are also more diffuse, then we should observe a strong positive correlation between the ED measure and the mean streamline count, which we tested directly and found only a weak correlation (Fig. S11). This is true for tracking to the entire thalamus, and the additional supplemental Figure S31 shows that reduced tracking to specific parts of the thalamus (e.g. the medial portion) also does not strongly relate to the ED measure. So, tracts that are more difficult to reconstruct may also be more diffuse, but this seems to add only a little noise and does not account for the strong relationship between the ED measure and the T1w/T2w and RSFCpc1 measures that reflect the cortical hierarchy.
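
      The confound check described above amounts to a correlation, across cortical areas, between the ED measure and the mean streamline count. The following is a minimal sketch of such a check, assuming a Spearman rank correlation (the response does not name the statistic used) and using made-up values for five hypothetical cortical areas; none of the numbers come from the study.

```python
import math


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values for five cortical areas (illustration only).
ed_measure = [0.8, 0.3, 0.5, 0.9, 0.1]
mean_streamline_count = [300.0, 80.0, 210.0, 95.0, 120.0]

rho = spearman(ed_measure, mean_streamline_count)
print(f"Spearman rho between ED and mean streamline count: {rho:.2f}")
```

      If difficulty of reconstruction were driving the ED measure, a check of this kind would return a correlation of large magnitude; a value near zero would indicate that streamline count explains little of the variation in ED.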

      Comment 3: If tracking the medial thalamus is indeed less accurate, characterized by higher false positives and false negatives, it could potentially lead to increased variability among individual subjects. In cases where results are averaged across subjects, as the authors have apparently done, this could inadvertently contribute to the emergence of the "diffuse" motif, as described in the context of the associative cortex. This presents a critical issue that requires a more thorough control analysis and validation process to ensure that the main results are not artifacts resulting from limitations in tractography.

      Additionally, conducting a control analysis to demonstrate that individual variability in tracking endpoints within the thalamus, when averaged across subjects, does not artificially generate a more diffuse connectivity pattern, is essential.

      We thank the reviewer for bringing up this point. The reviewer is correct that a simple group average of streamline counts across the thalamus could make some thalamic patterns appear more diffuse if those patterns vary slightly in location across people. The simplest way to address this concern is to show that diffuse patterns are present in individual subjects. Fig. 2 panels B, C, H, and I are all subject-level figures, which show that we can replicate the group-level findings in Fig. 2 panels F and G. Specifically, Fig. 2 panels H and I show that the effect of association areas exhibiting more diffuse connectivity patterns within the thalamus relative to sensorimotor areas generalizes across subjects.

      To the reviewer’s point, the other way that averaged streamline counts could make focal connections seem diffuse is by averaging within cortical areas (e.g. association areas may have highly variable focal patterns that appear more diffuse when averaged within the cortical area). To test this, we show that we can replicate the hierarchy effect at the vertex level, by calculating the extent of connectivity patterns for every cortical vertex and correlating vertex-level EDpc1 values with vertex-level T1w/T2w and RSFCpc1 values (Fig. S20).

      Hopefully the data shown in Fig. 2 (replication at the individual level) and Fig. S20 (replication at the vertex level) ameliorate the reviewer’s concern that averaging highly variable focal connectivity patterns within the thalamus (either across people or across vertices) artifactually produces diffuse thalamic connectivity patterns for associative cortical areas.
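
      The averaging concern, and why a subject-level analysis addresses it, can be illustrated with a toy simulation. The sketch below is not the analysis used in the manuscript: it assumes a one-dimensional "thalamus" of 21 positions, three hypothetical subjects whose focal patterns sit at different locations, and an extent measure defined as the mean pairwise distance among the 7 positions with the highest counts. All names and values are illustrative.

```python
import itertools
import math


def mean_pairwise_distance(positions):
    """Mean absolute distance over all pairs of 1-D positions."""
    pairs = list(itertools.combinations(positions, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def extent(counts, n_keep):
    """Spatial extent among the n_keep positions with the highest counts."""
    order = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)
    return mean_pairwise_distance(order[:n_keep])


def focal_map(center, n_positions=21, width=1.0):
    """A focal (Gaussian-shaped) streamline-count profile around `center`."""
    return [
        math.exp(-((x - center) ** 2) / (2 * width ** 2))
        for x in range(n_positions)
    ]


# Three hypothetical subjects: equally focal patterns, shifted locations.
subject_maps = [focal_map(center) for center in (4, 10, 16)]

# Route 1: average the maps across subjects, then measure extent once.
group_map = [sum(values) / len(values) for values in zip(*subject_maps)]
group_extent = extent(group_map, n_keep=7)

# Route 2: measure extent in each subject, then average the extents.
subject_extents = [extent(m, n_keep=7) for m in subject_maps]
mean_subject_extent = sum(subject_extents) / len(subject_extents)

print(f"extent of group-averaged map: {group_extent:.2f}")
print(f"mean of subject-level extents: {mean_subject_extent:.2f}")
```

      In this toy case the group-averaged map looks more diffuse than any individual map, which is the artifact the reviewer describes. Computing the extent within each subject before summarizing across subjects avoids it, which is the logic behind the subject-level replication.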

      Comment 4: Because the authors included data from all thresholds, it seems likely that false positive tracks were included in the results. The methodology described seems to unavoidably include anatomically implausible pathways in the spatial extent analyses.

      The thresholding approach taken in the manuscript aimed to control for inter-areal differences in anatomical connection strength that could confound the ED estimates. Here I am not quite clear why inter-areal differences in anatomical connection strength have to be controlled. A global threshold applied on all thalamic voxels might kill some connections that are weak but do exist. Those weak pathways are less likely to survive at high thresholds. In the meantime, the mean ED is weighted, with more conservative thresholds having higher weights. That being said, isn't it possible that more robust pathways might contribute more to the mean ED than weaker pathways?

      This is a good point from the reviewer, and we appreciate them bringing up these points about our thresholding rationale. We would like to clarify two points: why it was appropriate for our question to threshold thalamic voxels for each cortical area separately and why we iteratively thresholded thalamic voxels.

      Regarding thalamic connectivity differences between cortical areas: a global threshold would indeed exclude weak, but potentially true, connections. This was part of our rationale for thresholding thalamic voxels for each cortical area separately. Too conservative of a global threshold would exclude all thalamic voxels for some cortical areas and too liberal of a threshold would include many potentially false positive connections for other cortical areas. Our method of thresholding each cortical area’s thalamic voxels separately ensured that we were sampling thalamic voxels in an equitable manner across cortical areas. We updated the text to clarify this:

      Methods section, pg. 11, section Framework to quantify the extent of thalamic connectivity patterns via Euclidean distance (ED)

      “We used Euclidean distance (ED) to quantify the extent of each cortical area's thalamic connectivity patterns. Probabilistic tractography data require thresholding before the ED calculation. To avoid the selection of an arbitrary threshold (Sotiropoulos et al., 2019; Zhang et al., 2022), we calculated ED for a range of thresholds (Figure 1a). Our thresholding framework uses a tractography-derived connectivity matrix as input. We iteratively excluded voxels with lower streamline counts for each cortical parcel such that the same number of voxels was included at each threshold. At each threshold, ED was calculated between the top x% of thalamic voxels with the highest streamline counts. This produced a matrix of ED values (360 cortical parcels by 100 thresholds). This matrix was used as input into a PCA to derive a single loading for each cortical parcel. While alternative thresholding approaches have been proposed, this framework optimizes the examination of spatial patterns by proportionally thresholding the data, enabling equitable sampling of each cortical parcel's streamline counts within the thalamus.

      This approach controlled for inter-areal differences in anatomical connection strength that could confound the ED estimates. In contrast, a global threshold, which is applied to all cortical areas, may exclude all thalamic streamline counts for some cortical areas that are more difficult to reconstruct, thus making it impossible to calculate ED for that cortical area, as there are no surviving thalamic voxels from which to calculate ED. This would be especially problematic for white matter tracts that are more difficult to reconstruct (e.g. the auditory radiation), and cortical areas connected to the thalamus by those white matter tracts would have a disproportionate number of thalamic voxels excluded when using a global threshold.”
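
      The proportional thresholding step quoted above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: ED is taken to be the mean pairwise Euclidean distance among surviving voxels, the "thalamus" is a hypothetical 6 x 6 x 6 voxel grid, only four thresholds are used instead of 100, and the two streamline-count profiles (one focal, one spread over two distant sites) are synthetic. The subsequent PCA across thresholds is not shown.

```python
import itertools
import math


def mean_pairwise_distance(coords):
    """Mean Euclidean distance over all pairs of voxel coordinates."""
    pairs = list(itertools.combinations(coords, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def ed_across_thresholds(coords, counts, percents):
    """ED among the top x% of voxels (by streamline count) for each x.

    Thresholding is proportional: every parcel keeps the same number of
    voxels at a given threshold, whatever its absolute streamline counts.
    """
    order = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)
    eds = []
    for pct in percents:
        n_keep = max(2, math.ceil(len(order) * pct / 100))
        kept = [coords[i] for i in order[:n_keep]]
        eds.append(mean_pairwise_distance(kept))
    return eds


# Hypothetical thalamic voxel grid.
coords = list(itertools.product(range(6), repeat=3))


def bump(center):
    """Synthetic streamline counts that decay with distance from `center`."""
    return [math.exp(-math.dist(voxel, center)) for voxel in coords]


focal_counts = bump((1, 1, 1))
diffuse_counts = [a + b for a, b in zip(bump((1, 1, 1)), bump((4, 4, 4)))]

percents = [10, 25, 50, 100]  # the manuscript uses 100 thresholds
ed_matrix = [
    ed_across_thresholds(coords, counts, percents)
    for counts in (focal_counts, diffuse_counts)
]  # rows: parcels, columns: thresholds

for name, row in zip(("focal", "diffuse"), ed_matrix):
    print(name, [round(value, 2) for value in row])
```

      At conservative thresholds the focal profile yields a smaller ED than the diffuse one, while at the most liberal threshold every voxel is included and the two rows converge. That behavior across thresholds is what the PCA step summarizes into a single value per parcel.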

      Regarding thalamic connectivity differences across the thalamus for a given cortical area, the thresholding method we use does include anatomically implausible connections in the ED calculation because we sample voxels iteratively, and as more and more thalamic voxels are included in the ED analysis the likelihood that they reflect spurious connections increases. This approach made the most sense to us, because there is no way to identify a threshold that only includes true positive connections. And since this method does not exist, we sampled all thresholds and leveraged the behavior of the ED metric across thresholds to quantify the spread of a connectivity pattern. As the reviewer points out, since the measure is effectively “weighted,” more “robust” or anatomically plausible pathways should contribute more to the EDpc1 rather than weaker pathways. This is exactly the balanced approach we aimed for: a measure that is driven by connections that have the highest likelihood of being a true positive but does not rely on an arbitrary threshold.

      We also replicated our main findings after thresholding and binarizing the data at separate thresholds, which showed that our main effect was strongest when only the thalamic voxels with the highest streamline counts (which are assumed to have a lower chance of being false positives) were included in the ED calculation (Fig. S5). This more traditional method of thresholding also supported our results and increases our overall confidence that associative cortical areas have more diffuse connectivity patterns within the thalamus relative to somatomotor areas.

      Comment 5: In the introduction, there is a bit of ambiguity that needs clarification. The overall goal of the study appears to be the examination of anatomical connectivity from the cortex to the thalamus, specifically whether a cortical region projects to a single thalamic subregion or multiple thalamic subregions. However, certain parts of the introduction also suggest an exploration of the concept of thalamic integration, which typically means a single thalamic region integrating input from multiple cortical regions (converging input). These two patterns, many cortical regions to one thalamic region versus one cortical region to many different thalamic regions, represent distinct and fundamentally different concepts that should be clarified in the manuscript.

      We thank the reviewer for pointing out this ambiguity and have edited the introduction to clarify this point:

      Our argument for a potential mechanism for integration is the following: because corticothalamic connectivity is topographically organized, if a cortical area has a more diffuse anatomical projection across the thalamus that means its connections overlap with more cortical areas. To the reviewer’s point, our argument is simply that one cortical area targeting multiple thalamic nuclei inherently suggests that such a cortical area has overlapping connectivity patterns with many other cortical areas in the same thalamic subregion. We have updated the introduction to clarify this further.

      Intro, pg 1.

      “Studies of cortical-thalamic connectivity date back to the early 19th century, yet we still lack a comprehensive understanding of how these connections are organized (see 13 and 14 for review). The traditional view of the thalamus is based on its histologically-defined nuclear structure (6). This view was originally supported by evidence that cortical areas project to individual thalamic nuclei, suggesting that the thalamus primarily relays information (15). However, several studies have demonstrated that cortical connectivity within the thalamus is topographically organized and follows a smooth gradient across the thalamus (16–21). Additionally, some cortical areas exhibit extensive connections within the thalamus, which target multiple thalamic nuclei (22? ). These extensive connections may enable information integration within the thalamus through overlapping termination patterns from different cortical areas, a key mechanism for higher-order associative thalamic computations (23–25). However, our knowledge of how thalamic connectivity patterns vary across cortical areas, especially in humans, remains incomplete. Characterizing cortical variation in thalamic connectivity patterns may offer insights into the functional roles of distinct cortico-thalamic loops (6, 7).”

      Discussion, pg 9. Section: The spatial properties of thalamic connectivity patterns provide insight into the role of the thalamus in shaping brain-wide information flow.

      “In this study, we demonstrate that association cortical areas exhibit diffuse anatomical connections within the thalamus. This may enable these cortical areas to integrate information from distributed areas across the cortex, a critical mechanism supporting higher-order neural computations. Specifically, because thalamocortical connectivity is organized topographically, a cortical area that projects to a larger set of thalamic subregions has the potential to communicate with many other cortical areas. We observed that anterior cingulate cortical areas had some of the most diffuse thalamic connections. This observation aligns with findings from Phillips et al. that area 24 exhibited the most diffuse anatomical terminations across the mediodorsal nucleus of the thalamus relative to other prefrontal cortical area…”

      Reviewer 3:

      Comment 1: Potential weaknesses of the study are that it seems to largely integrate aspects of the thalamus that have been already described before. The differentiation between sensory and association systems across thalamic subregions is something that has been described before (see: Oldham and Ball, 2023; Zheng et al., 2023; Yang et al., 2020; Mueller, 2020; Behrens, 2003).

      It is true that previous studies have shown that corticothalamic systems vary between sensory and associative cortical areas. Furthermore, there is much evidence that indicates that the sensory-association hierarchy is a major principle of brain organization in general. However, how and why these circuits are different is still not fully known, both across the whole brain and in corticothalamic circuits specifically.

      Our study is the first to compare patterns of anatomical connectivity within the thalamus and determine if cortical areas vary in the extent of those patterns. So our main finding isn't that sensory and association cortical areas show differences in thalamic connectivity, it is that they specifically show differences in their pattern of connectivity within the thalamus. This provides a unique insight into how sensory and associative systems differ in their thalamic connectivity in primates.

      Additionally, we show evidence that provides some insight into why these differences may exist. Although we cannot provide causal evidence, our data suggest that differences in patterns of anatomical connectivity within the thalamus were related to how different cortical areas process information via the thalamus, which aligns with speculations from Phillips et al 2021.

      So our main finding isn't that sensory and association cortical areas show differences in thalamic connectivity; it is that they specifically show differences in their pattern of connectivity within the thalamus. These differences may help us understand how these cortical areas process information and, in turn, how they may support different types of computations, both of which are major goals in neuroscience. To better clarify this in the manuscript, we made the following changes:

      Discussion, Paragraph 1, pg 8:

      “This study contributes to the rich body of literature investigating the organization of cortico-thalamic systems in human and non-human primates. Prior research has shown that features of thalamocortical connectivity differ between sensory and association systems, and our work advances this understanding by demonstrating that these systems also differ in the pattern and spatial extent of their anatomical connections within the thalamus. Using dMRI-derived tractography across species, we show that these connectivity patterns vary systematically along the cortical hierarchy in both humans and macaques. These findings are critical for establishing the anatomical architecture of how information flows within distinct cortico-thalamic systems. Specifically, we identify reproducible tractography motifs that correspond to sensorimotor and association circuits, which were consistent across individuals and generalize across species. Collectively, this study offers convergent evidence that the spatial pattern of anatomical connections within the thalamus differs between sensory and association cortical areas, which may support distinct computations across cortico-thalamic systems.”

      Comment 2: (1) Why not formally test the association between humans and macaques by bringing the brains to the same space?

      We thank the reviewer for this query. We were primarily interested in using the macaque data as a validation of the human data, because it was acquired at a much higher resolution, there are no motion confounds, and it provides a bridge with the tract tracing literature in macaques. We are currently studying interspecies differences in patterns of thalamic connectivity, as well as extensions of our approach into structure-function coupling, and we believe these topics warrant their own paper.

      Comment 3: (2) Possibly flesh out the differences between this study and other studies with related approaches a bit further.

      We updated the discussion section to better clarify the differences in this study from previous research. See response to Reviewer 3 Comment 1 for text changes.

      Comment 4: (3) The current title entails 'cortical hierarchy' but would 'differentiation between sensory and association regions' not be more correct? Or at least a reflection on how cortical hierarchy can be perceived?

      We treat these phrases as synonymous. Our definition of cortical hierarchy is a smooth transition in features from sensory and motor areas to higher-order associative areas. The use of cortical hierarchy is meant to reflect that our measure varies continuously across the cortex. We updated the manuscript to make this clearer:

      Abstract, pg 1.

      “Additionally, we leveraged resting-state functional MRI, cortical myelin, and human neural gene expression data to test if the extent of anatomical connections within the thalamus varied along the cortical hierarchy, from sensory and motor to multimodal associative cortical areas.”

      Comment 5: (4) For the core-matrix map, there are marked left-right differences, and there are only two donors in the right hemisphere. Possibly note this as a limitation?

      We thank the reviewer for this observation. We updated Fig. S28 Panel D to show that the correspondence between EDpc1 and the Core-Matrix (CPc) cortical maps holds when the correlation was done for left and right cortex, separately.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) Two genes from the Crp/cAMP complex (crp and cyaA) are hypothesized to be key for persistence but key metabolomics and proteomics data are obtained from only one deletion mutant in the crp gene.

      We thank the reviewer for their thoughtful assessment of our manuscript and for providing valuable comments.

      In our study, we have demonstrated that deletion of either the cyaA or the crp gene results in the same persistence phenotype. In a previous study, we screened knockout strains of global transcriptional regulators using the aminoglycoside (AG) potentiation assay and found that, across a panel of carbon sources, AG potentiation occurred in tolerant cells derived from most knockout strains, except for Δcrp and ΔcyaA (Mok et al., 2015). This indicated that both genes are critical components of the Crp/cAMP regulatory network in persistence. Because cAMP exerts its effects when bound to its receptor protein Crp, disrupting crp alone should effectively abolish Crp/cAMP complex function (Keseler et al., 2011). Thus, we reasoned that comparing Δcrp to wild-type would be sufficient to capture the key metabolic and proteomic alterations arising from Crp/cAMP perturbation. Given the substantial cost and labor intensity of untargeted metabolomics and proteomics analyses, this experimental design allowed us to extract meaningful insights while maintaining feasibility. Nonetheless, to ensure the robustness of our findings, we have conducted all subsequent validation experiments using both Δcrp and ΔcyaA strains, confirming that the observed metabolic and proteomic changes are consistent across both mutants. We have now provided a concise justification statement in the manuscript (see lines 197-200 in the current manuscript).

      (2) The deletions of cyaA and crp have opposite effects on the concentration of cAMP; a comparison of metabolomics and proteomics data obtained using both mutants might aid in understanding this difference.

      Although this is an interesting outcome, we have already discussed in the manuscript that it is likely due to the feedback regulation of the Crp/cAMP complex on cyaA expression (see Fig. 1 in Keseler et al., 2011) (Aiba, 1985; Keseler et al., 2011; Majerfeld et al., 1981). Specifically, perturbation of the Crp/cAMP complex by deleting crp should enhance cyaA promoter (PcyaA) activity, leading to increased CyaA protein expression and, consequently, elevated intracellular cAMP levels. To experimentally verify this predicted feedback regulation, we utilized E. coli K-12 MG1655 WT, Δcrp, and ΔcyaA strains harboring the pMSs201 plasmid, which encodes green fluorescent protein (gfp) under the control of the PcyaA promoter. This design allowed us to directly assess the effect of Crp/cAMP perturbation on PcyaA activity by quantifying gfp expression as a reporter. By comparing the mutant strains to WT, we could determine whether loss of Crp/cAMP function indeed derepresses cyaA expression. As expected, genetic perturbation of Crp/cAMP enhanced PcyaA promoter activity, resulting in increased gfp expression (Figure 1-figure supplement 2). This result supports the role of Crp/cAMP in regulating cyaA expression via feedback control. We have now explicitly discussed this rationale in the manuscript and included the corresponding data (see lines 410-418 and Figure 1-figure supplement 2 in the current manuscript).

      (3) Metabolomics, proteomics, and metabolic activity data are obtained at the whole population level rather than at the level of the persister sub-population.

      Performing metabolomic, proteomic, and other assays at the level of the persister subpopulation is inherently challenging in this study and across the persister research field, as it requires isolating a pure persister population. While metabolic inhibitors like rifampin and tetracycline can induce dormancy and antibiotic tolerance in the entire population (Kwan et al., 2013), these treatments generate artificially altered cell states that may not accurately reflect naturally occurring persisters. Fluorescent reporters combined with fluorescence-activated cell sorting (FACS) have been utilized to study persister cells, including in our previous studies (Amato et al., 2013; Orman & Brynildsen, 2013, 2015). However, this approach only enriches for persisters rather than isolating a pure population, as persisters still constitute a small fraction of the sorted cells (Amato et al., 2013; Orman & Brynildsen, 2013, 2015). Despite these limitations, our untargeted metabolomics and proteomics analyses at the whole-population level provide valuable insights into the regulatory mechanisms of the Crp/cAMP complex and its potential role in persister formation. We have rigorously examined the impact of these mechanisms on non-growing cell formation (see Figure 4 in the current manuscript) and persister levels (see Figure 5 in the current manuscript) through flow cytometry and single-gene deletion experiments. We appreciate the reviewer’s comment and have acknowledged and discussed these methodological challenges in our manuscript (see lines 397-406 in the current manuscript).

      Reviewer #2:

      (1) The approaches used here are aimed at the major bacterial population, yet the authors used data reflecting the behavior of the major population to interpret the physiology of persister cells, which comprise less than 1% of that population. How can they pick a needle out of the hay without being fooled by spill-over artifacts from the major population? Although it is probably very difficult to isolate and directly assay persister cells, conclusions of the type proposed by the authors cannot be firmly established without such assays. Perhaps introducing the cyaA/crp mutations into the best example of persistence, the hipA-7 high-persistence phenotype, may clarify this issue to a certain extent.

      We thank the reviewer for their thoughtful assessment of our manuscript and for providing valuable comments.

      Performing metabolomics and proteomics at the level of the persister subpopulation remains a major challenge in this study and across the persister research field, as it requires isolating a pure persister population. While metabolic inhibitors like rifampin and tetracycline can induce dormancy and antibiotic tolerance in the entire population (Kwan et al., 2013), these treatments generate artificially altered cell states that may not accurately reflect naturally occurring persisters. Similarly, fluorescent reporters combined with fluorescence-activated cell sorting (FACS) have been employed to study persister cells, including in our previous studies (Amato et al., 2013; Orman & Brynildsen, 2013, 2015). However, this approach only results in persister-enriched populations rather than a pure isolate, meaning that persisters still constitute a small fraction of the sorted cells (Amato et al., 2013; Orman & Brynildsen, 2013, 2015). Despite these inherent limitations, our untargeted metabolomics and proteomics analyses at the whole-population level provide valuable insights into the regulatory mechanisms of the Crp/cAMP complex and its potential role in persister formation. Specifically, our data reveal clear indications that Crp/cAMP activity promotes the formation of a non-growing cell subpopulation, while its deletion reduces this effect. We have validated this observation through single-cell analyses (see Figure 4 in the current manuscript). Additionally, our data strongly suggest that energy metabolism plays a critical role in persister cell physiology, and we have rigorously tested this hypothesis using persister assays for single-gene deletions (see Figure 5 in the current manuscript).

      Furthermore, in response to the reviewer’s suggestion, we introduced cyaA and crp deletions into the HipA-7 high-persistence mutant strain. The impact of these deletions in HipA-7 mirrored their effects in the wild-type strain (Figure 1-figure supplement 8), further supporting our conclusions. These data have been provided and discussed in the manuscript (see lines 185-189 and Figure 1-figure supplement 8 in the current manuscript).

      We acknowledge the challenges in directly assaying persister cells, and we have now discussed this in the manuscript (see lines 397-406 in the current manuscript).

      (2) The authors overlooked/omitted a recently published work regarding cyaA and crp (PMID: 35648826). In that work, a deficiency in cyaA or crp confers tolerance to diverse types of lethal stressors, including all lethal antimicrobials tested. How a mutation conferring pan-tolerance to the major bacterial population would lead to a less protective effect with a minor subpopulation? The authors are kind of obligated to discuss such a paradox in the context of their work because that is the most relevant literature for the present work. It is also very interesting if the cyaA/crp deficiency really has an opposing effect on tolerance and persistence. As a note, most of the conclusions from the omics studies of the present work have been reached in that overlooked literature, which addresses mechanisms of tolerance, a major rather than a minor population behavior. That supports comment #1 above. The inability of the authors to observe tolerance phenotype with the cyaA or crp mutant possibly derived from extremely high antimicrobial concentrations used in the study prevents tolerance phenotype from being observed because tolerance is sensitive to antimicrobial concentration while persistence is not.

      (3) The authors overly stressed the effect of cyaA/crp on persister formation but failed to test an alternative explanation of their effect on persister waking up after antimicrobial treatment. If the cyaA/crp-derived persisters are put into deeper sleep during antimicrobial treatment than wildtype-derived persisters, a 16-h recovery growth might have underestimated viable bacteria. This is often the case especially when extremely high concentrations of antimicrobials are used in performing persister assay. Thus, at least a longer incubation time (e.g. 48 and 72h) of agar plates for persister viable count needs to be performed to test such a scenario.

      (4) The rationale for using extremely high drug concentrations to perform the persister assay is unclear. There are two issues with using extremely high drug concentrations. First, when overly high concentrations are used, drug removal becomes difficult. For example, washing twice will not bring the drug concentration from > 100 x MIC to below MIC. This is especially problematic with aminoglycosides because drug removal by washing does not work well with this class of compound. Second, overly high drug concentrations may make killing so rapid and severe that differences between the mutants and the control wild-type strain are masked. In such cases, you would need to measure killing over a wide range of drug concentrations to find the right window to show a difference. The gentamicin data in the present work are likely such a case and need to be carefully examined. The mutants and the wild-type strain have very different MICs for gentamicin, but a single absolute drug concentration rather than concentrations normalized to MIC was used. This is like having a 12-year-old race a 21-year-old in a 100-meter dash, which is highly inappropriate.

      The reviewer notes that key literature (PMID: 35648826) was overlooked, showing cyaA/crp deficiency confers broad stress tolerance—contradicting the reported reduction in persister protection. They suggest high drug concentrations may mask tolerance, and also, longer incubation (48–72 h) and normalized drug levels based on MIC are recommended. Given that these three independent comments are interconnected, we will address them together.

      We follow a rigorous washing protocol to minimize antibiotic carryover. After treatment, 1 ml of culture is centrifuged at 13,300 RPM (17,000 x g) for 3 minutes, and >950 µl of supernatant is removed without disturbing the pellet. The pellet is resuspended in 950 µl PBS, diluting antibiotics >20-fold. This step is repeated, resulting in a >400-fold cumulative dilution. After the final wash, cells are resuspended in 100 µl PBS, then serially diluted and plated on antibiotic-free agar to ensure consistency and eliminate residual antibiotics. Preliminary experiments are routinely done in our laboratory to confirm the effectiveness of washing procedures. To address concerns that high antibiotic concentrations may mask phenotypic differences—particularly in the gentamicin assay—we conducted additional experiments using MIC-normalized doses (5×, 10×, and the original study concentration) with six wash steps. As shown in Figure 1-figure supplement 6, all concentrations consistently reduced persister levels, supporting our original findings. While 5× MIC ampicillin allowed detection of persisters in mutant strains, their levels remained multiple orders of magnitude lower than in wild-type, maintaining statistical significance. These results, along with updated washing protocols, are now included in the revised manuscript (see lines 176-185 and Figure 1-figure supplement 6 in the current manuscript).
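
      The dilution arithmetic behind the washing protocol can be checked with a short calculation. The sketch below assumes the worst case implied by the stated volumes (removing >950 µl from 1 ml leaves at most 50 µl of drug-containing liquid before 950 µl of PBS is added) and treats washing as simple dilution, ignoring pellet volume and any drug bound to or taken up by cells, which is the reviewer's concern for aminoglycosides. The 100 × MIC starting point is the reviewer's illustrative figure, and the function names are hypothetical.

```python
def fold_dilution_per_wash(residual_ul, added_ul):
    """Fold dilution when drug-free buffer is mixed into the leftover liquid."""
    return (residual_ul + added_ul) / residual_ul


def cumulative_dilution(n_washes, residual_ul=50.0, added_ul=950.0):
    """Cumulative fold dilution after n identical wash steps."""
    return fold_dilution_per_wash(residual_ul, added_ul) ** n_washes


def residual_fold_mic(start_fold_mic, n_washes):
    """Remaining free-drug concentration, expressed in multiples of the MIC."""
    return start_fold_mic / cumulative_dilution(n_washes)


two_washes = cumulative_dilution(2)   # the standard protocol
six_washes = cumulative_dilution(6)   # the additional MIC-normalized experiments
residual_after_two = residual_fold_mic(100.0, 2)

print(f"2 washes: at least {two_washes:.0f}-fold dilution")
print(f"6 washes: at least {six_washes:.0f}-fold dilution")
print(f"100 x MIC after 2 washes: at most {residual_after_two:.2f} x MIC")
```

      Because more than 950 µl is removed at each step, these values are lower bounds on the dilution achieved. They describe free drug in the supernatant only.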

      Although we standardize the incubation time of the agar plates for all conditions and strains, most strains form sufficiently large colonies within 16 hours, and longer incubation often leads to large, overlapping colonies that hinder accurate counting. We assure the reviewer that we always leave the plates in the incubator beyond the initial counting period to monitor the emergence of any new colonies. Here, we provide plate images of key strains after antibiotic treatments, demonstrating that extended incubation did not alter CFU levels, as shown in Figure 1-figure supplement 7. We have updated the relevant section in the Materials and Methods to clarify this point and included the plate images in the current manuscript (see lines 181-182 and Figure 1-figure supplement 7 in the current manuscript).

      We acknowledge the significance of the study highlighted by the reviewer (Zeng et al., 2022); however, direct comparisons with our results are challenging due to substantial differences in experimental conditions, antibiotic concentrations, treatment durations, and most importantly, the E. coli strains used. The study of Zeng et al., 2022, utilized strains from the Keio collection, a commercially available E. coli BW25113 mutant library, which may contain unknown background mutations that could influence tolerance phenotypes. While we used the Keio collection for initial screening, we always validate single clean deletions in our lab strain, E. coli MG1655, to ensure robust conclusions. The observed variations in tolerance and persistence between studies can largely be attributed to these methodological differences rather than an inherent paradox. The concentrations of ampicillin (200 µg/mL) and ofloxacin (5 µg/mL) used in our assays are in line with concentrations employed in foundational persister studies (Amato & Brynildsen, 2015; Cui et al., 2016; Hansen et al., 2008; Leszczynska et al., 2013; Lin et al., 2022; Orman & Brynildsen, 2015; Shah et al., 2006). These levels represent >10 × the MIC and are necessary to ensure the elimination of actively growing cells, thus enriching for persister cells that, by definition, survive high bactericidal drug exposure. Our aim is not to model pharmacokinetics per se, but to apply a standardized challenge to distinguish phenotypic persistence. Furthermore, pharmacokinetic and pharmacodynamic clinical data show that antibiotics such as ofloxacin and ampicillin can reach levels far exceeding 10× MIC for extended periods in patients (OFLOXACIN, 2019; Soto et al., 2014).

      To assess how cyaA and crp deletions affect antibiotic responses under conditions similar to those used by Zeng et al. (2022), specifically exponential-phase E. coli BW25113 strains (Keio collection), lower antibiotic concentrations, and short treatments (e.g., 1 hour), we first tested E. coli MG1655 WT, ΔcyaA, and Δcrp strains in late stationary phase using reduced antibiotic concentrations and shorter exposures. Both knockouts showed decreased survival following ampicillin and ofloxacin treatment compared to WT (see Figure 1-figure supplement 6), consistent with our findings in Figure 1 in the manuscript. In exponential phase, the knockout strains exhibited reduced survival after ampicillin treatment but increased survival after ofloxacin treatment relative to WT (see Author response image 2A below), again mirroring the trends in Figure 1. Gentamicin treatment, however, produced variable results in MG1655 knockouts, likely because the brief 1-hour exposure was insufficient for robust conclusions (Author response image 2A). Notably, when we tested the corresponding Keio knockout strains in the BW25113 background, we observed increased tolerance in exponential-phase cells, reproducing Zeng et al.'s findings under their specific conditions (see Author response image 2B below), although BW25113 and MG1655 exhibited distinct persister phenotypes in exponential phase (Author response image 2A, B). Altogether, these results highlight the sensitivity of antibiotic tolerance and persistence phenotypes to factors such as strain background, antibiotic concentration, and treatment duration. This is now discussed in detail in the revised manuscript, with supporting data provided (see lines 460-476, and Supplement File 6, 7 in the current manuscript).

      Author response image 1.

      Persister levels of E. coli K-12 MG1655 WT, ΔcyaA, and Δcrp strains in late stationary phase. Cells were treated with ampicillin (5× MIC for 4 h), ofloxacin (5× MIC for 2.5 h), and gentamicin (3× MIC for 1 h). Concentrations and treatment durations were selected based on (Zeng et al., 2022).

      Author response image 2.

      Persister levels of E. coli K-12 MG1655 (Panel A) and BW25113 (Panel B) WT, ΔcyaA, and Δcrp strains in the exponential growth phase. Cells were treated at mid-exponential phase (OD<sub>600</sub> ~0.25) with ampicillin (5× MIC for 4 h), ofloxacin (5× MIC for 2.5 h), and gentamicin (3× MIC for 1 h). Treatment concentrations and durations were based on conditions described in (Zeng et al., 2022).

      Reviewer #3:

      The authors try to draw too many conclusions and it's difficult to identify what their actual findings are. For instance, they do not have any interesting findings with aminoglycosides but include the data and spend a lot of time discussing it, but it is really a distraction. The correlation between the induction of anabolic pathways in the crp mutant in the late stationary phase and the reduction in persisters is potentially very interesting but is buried in the paper with the vast quantities of data, and observations and conclusions that are often not well substantiated.

      We thank the reviewer for their assessment that helped us clarify and strengthen the focus of our manuscript.

      While our study is not focused on aminoglycosides, we believe the related data provide important insights into persister cell physiology. Persisters are traditionally described as metabolically dormant, non-growing cells. However, we consistently observe that aminoglycosides—despite requiring energy-dependent uptake and active protein translation for their activity—can still eliminate persister cells in wild-type E. coli. This finding supports our central hypothesis that persisters may retain a basal level of metabolic activity sufficient to permit aminoglycoside uptake and action during prolonged treatment. We have revised the manuscript to present this point more clearly, ensuring it complements rather than distracts from the main narrative.

      We respectfully emphasize that our conclusions are supported by multiple layers of evidence. Our metabolomics data are corroborated by proteomics and further validated by functional assays, including redox state measurements, growing versus non-growing cell detection, and targeted persister assays. In addition, we performed labor-intensive validations using individually selected Keio mutants treated with antibiotics to quantify persister levels, with key observations further confirmed in single-gene deletions in E. coli MG1655 strains.

      We believe the revisions made in response to all reviewers’ comments have significantly improved the clarity, focus, and overall impact of the manuscript.

      The discussion section is particularly difficult to read and I recommend a large overhaul to increase clarity. For instance, what are the authors trying to conclude in section (iii) of the discussion? That persisters in the stationary phase have higher energy than other cells? Is there data to support that? All sections are similarly lacking in clarity.

      We repeatedly emphasize in the manuscript that while persister survival depends on energy metabolism, this does not imply that persisters have higher metabolic activity than cells in the exponential growth phase. We have clarified this point in the revised manuscript (see lines 67-79, and 442-444 in the current manuscript).

      The large number of mutants characterized is a strength, but the quality of the data provided for those experiments is poor. Did some of these mutants lose fitness in the deep stationary phase in the absence of antibiotics? Did some reach a far lower cfu/ml in the stationary phase? These details are important and without them, it is difficult to interpret the data.

      Although metabolic mutations can affect cell growth, we do not observe substantial differences in cell numbers during the late stationary phase, when persister assays are performed. These knockout strains reach stationary phase fully by that time. We emphasize that we routinely measure cell numbers at this stage using flow cytometry before diluting cultures into fresh media and applying antibiotic treatments. Cell counts for the metabolic mutants are shown in Figure 5-figure supplement 4 in the current manuscript, and no significant growth deficiencies are observed in the late stationary phase. This is consistent with our previous publication (Shiraliyev & Orman, 2023) and findings from Lewis’s group (Manuse et al., 2021), where similar knockout strains showed no drastic impact on growth.

      There is an analysis of persister formation in mutants in the pts/CRP pathway that is not discussed (Zeng et al PNAS 2022, Parsons et al PNAS, 2024).

      These studies are now cited and discussed in the revised manuscript (see lines 459-476).

      The authors do not discuss ROS production and antibiotic killing in these experiments. Presumably, the WT would have a greater propensity to produce ROS in response to antibiotics than the crp mutant, but it survives better. Is ROS not involved in antibiotic killing in these conditions?

      The experimental conditions used here are identical to those in our previously published study on persister cells in the late stationary phase (Orman & Brynildsen, 2015), where we specifically investigated the role of ROS in antibiotic tolerance. In that work, we overexpressed key antioxidant enzymes—catalases (katE, katG) and superoxide dismutases (sodA, sodB and sodC)—at stationary phase. These enzymes were confirmed to be catalytically active through functional assays, yet their overexpression had no measurable effect on persister levels. To further decouple ROS from respiratory activity in that study, we performed anaerobic experiments using nitrate as an alternative terminal electron acceptor. We found that anaerobic respiration actually enhanced persister formation, and inhibition of nitrate reductases using KCN reduced it—again, independent of ROS. These findings provide compelling evidence that it is the respiratory activity itself, rather than ROS production, that influences persister formation in our system.

      We have now included this discussion in the revised manuscript to clarify that ROS are unlikely to be a major factor in antibiotic killing under these conditions (see lines 503-513).

      References

      Aiba, H. (1985). Transcription of the Escherichia coli adenylate cyclase gene is negatively regulated by cAMP-cAMP receptor protein. The Journal of Biological Chemistry, 260(5), 3063–3070.

      Amato, S. M., & Brynildsen, M. P. (2015). Persister Heterogeneity Arising from a Single Metabolic Stress. Current Biology, 25(16), 2090–2098. https://doi.org/10.1016/j.cub.2015.06.034

      Amato, S. M., Orman, M. A., & Brynildsen, M. P. (2013). Metabolic Control of Persister Formation in Escherichia coli. Molecular Cell, 50(4), 475–487. https://doi.org/10.1016/J.MOLCEL.2013.04.002

      Cui, P., Niu, H., Shi, W., Zhang, S., Zhang, H., Margolick, J., Zhang, W., & Zhang, Y. (2016). Disruption of Membrane by Colistin Kills Uropathogenic Escherichia coli Persisters and Enhances Killing of Other Antibiotics. Antimicrobial Agents and Chemotherapy, 60(11), 6867–6871. https://doi.org/10.1128/AAC.01481-16

      Hansen, S., Lewis, K., & Vulić, M. (2008). Role of Global Regulators and Nucleotide Metabolism in Antibiotic Tolerance in Escherichia coli. Antimicrobial Agents and Chemotherapy, 52(8), 2718–2726. https://doi.org/10.1128/AAC.00144-08

      Keseler, I. M., Collado-Vides, J., Santos-Zavaleta, A., Peralta-Gil, M., Gama-Castro, S., Muniz-Rascado, L., Bonavides-Martinez, C., Paley, S., Krummenacker, M., Altman, T., Kaipa, P., Spaulding, A., Pacheco, J., Latendresse, M., Fulcher, C., Sarker, M., Shearer, A. G., Mackie, A., Paulsen, I., … Karp, P. D. (2011). EcoCyc: a comprehensive database of Escherichia coli biology. Nucleic Acids Research, 39(Database), D583–D590. https://doi.org/10.1093/nar/gkq1143

      Kwan, B. W., Valenta, J. A., Benedik, M. J., & Wood, T. K. (2013). Arrested protein synthesis increases persister-like cell formation. Antimicrobial Agents and Chemotherapy, 57(3), 1468–1473. https://doi.org/10.1128/AAC.02135-12

      Leszczynska, D., Matuszewska, E., Kuczynska-Wisnik, D., Furmanek-Blaszk, B., & Laskowska, E. (2013). The Formation of Persister Cells in Stationary-Phase Cultures of Escherichia Coli Is Associated with the Aggregation of Endogenous Proteins. PLoS ONE, 8(1), e54737. https://doi.org/10.1371/journal.pone.0054737

      Lin, J. S., Bekale, L. A., Molchanova, N., Nielsen, J. E., Wright, M., Bacacao, B., Diamond, G., Jenssen, H., Santa Maria, P. L., & Barron, A. E. (2022). Anti-persister and Anti-biofilm Activity of Self-Assembled Antimicrobial Peptoid Ellipsoidal Micelles. ACS Infectious Diseases, 8(9), 1823–1830. https://doi.org/10.1021/acsinfecdis.2c00288

      Majerfeld, I. H., Miller, D., Spitz, E., & Rickenberg, H. V. (1981). Regulation of the synthesis of adenylate cyclase in Escherichia coli by the cAMP — cAMP receptor protein complex. Molecular and General Genetics MGG, 181(4), 470–475. https://doi.org/10.1007/BF00428738

      Manuse, S., Shan, Y., Canas-Duarte, S. J., Bakshi, S., Sun, W.-S., Mori, H., Paulsson, J., & Lewis, K. (2021). Bacterial persisters are a stochastically formed subpopulation of low-energy cells. PLoS Biology, 19(4), e3001194.

      Mok, W. W. K., Orman, M. A., & Brynildsen, M. P. (2015). Impacts of global transcriptional regulators on persister metabolism. Antimicrobial Agents and Chemotherapy, 59(5), 2713–2719.

      OFLOXACIN. (2019). https://dailymed.nlm.nih.gov/dailymed/fda/fdaDrugXsl.cfm?setid=1779c568-d7bb-4bd5-bc29-13bd52ba8a0a&type=display

      Orman, M. A., & Brynildsen, M. P. (2013). Dormancy is not necessary or sufficient for bacterial persistence. Antimicrobial Agents and Chemotherapy, 57(7), 3230–3239.

      Orman, M. A., & Brynildsen, M. P. (2015). Inhibition of stationary phase respiration impairs persister formation in E. coli. Nature Communications, 6(1), 7983.

      Shah, D., Zhang, Z., Khodursky, A. B., Kaldalu, N., Kurg, K., & Lewis, K. (2006). Persisters: a distinct physiological state of E. coli. BMC Microbiology, 6(1), 53. https://doi.org/10.1186/1471-2180-6-53

      Shiraliyev, R. C., & Orman, M. (2023). Metabolic disruption impairs ribosomal protein levels, resulting in enhanced aminoglycoside tolerance. BioRxiv, 2012–2023.

      Soto, E., Shoji, S., Muto, C., Tomono, Y., & Marshall, S. (2014). Population pharmacokinetics of ampicillin and sulbactam in patients with community-acquired pneumonia: evaluation of the impact of renal impairment. British Journal of Clinical Pharmacology, 77(3), 509–521. https://doi.org/10.1111/bcp.12232

      Zeng, J., Hong, Y., Zhao, N., Liu, Q., Zhu, W., Xiao, L., Wang, W., Chen, M., Hong, S., Wu, L., Xue, Y., Wang, D., Niu, J., Drlica, K., & Zhao, X. (2022). A broadly applicable, stress-mediated bacterial death pathway regulated by the phosphotransferase system (PTS) and the cAMP-Crp cascade. Proceedings of the National Academy of Sciences, 119(23). https://doi.org/10.1073/pnas.2118566119

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors set out to illuminate how legumes promote symbiosis with beneficial nitrogen-fixing bacteria while maintaining a general defensive posture towards the plethora of potentially pathogenic bacteria in their environment. Intriguingly, a protein involved in plant defence signalling, RIN4, is implicated as a type of 'gatekeeper' for symbiosis, connecting symbiosis signalling with defence signalling. Although questions remain about how exactly RIN4 enables symbiosis, the work opens an important door to new discoveries in this area.

      Strengths:

      The study uses a multidisciplinary, state-of-the-art approach to implicate RIN4 in soybean nodulation and symbiosis development. The results support the authors' conclusions.

      Weaknesses:

      No serious weaknesses, although the manuscript could be improved slightly from technical and communication standpoints.

      Reviewer #2 (Public Review):

      Summary:

      The study by Toth et al. investigates the role of RIN4, a key immune regulator, in the symbiotic nitrogen fixation process between soybean and rhizobium. The authors found that SymRK can interact with and phosphorylate GmRIN4. This phosphorylation occurs within a 15 amino acid motif that is highly conserved in N-fixation clades. Genetic studies indicate that GmRIN4a/b play a role in root nodule symbiosis. Based on their data, the authors suggest that RIN4 may function as a key regulator connecting symbiotic and immune signaling pathways.

      Overall, the conclusions of this paper are well supported by the data, although there are a few areas that need clarification.

      Strengths:

      This study provides important insights by demonstrating that RIN4, a key immune regulator, is also required for symbiotic nitrogen fixation.

      The findings suggest that GmRIN4a/b could mediate appropriate responses during infection, whether it is by friendly or hostile organisms.

      Weaknesses:

      The study did not explore the immune response in the rin4 mutant. Therefore, it remains unknown how GmRIN4a/b distinguishes between friend and foe.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript by Toth et al reveals a conserved phosphorylation site within the RIN4 (RPM1-interacting protein 4) R protein that is exclusive to two of the four nodulating clades, Fabales and Rosales. The authors present persuasive genetic and biochemical evidence that phosphorylation at the serine residue 143 of GmRIN4b, located within a 15-aa conserved motif with a core five amino acids 'GRDSP' region, by SymRK, is essential for optimal nodulation in soybean. While the experimental design and results are robust, the manuscript's discussion fails to clearly articulate the significance of these findings. Results described here are important to understand how the symbiosis signaling pathway prioritizes associations with beneficial rhizobia, while repressing immunity-related signals.

      Strengths:

      The manuscript asks an important question in plant-microbe interaction studies with interesting findings.

      Overall, the experiments are detailed, thorough, and very well-designed. The findings appear to be robust.

      The authors provide results that are not overinterpreted and are instead measured and logical.

      Weaknesses:

      No major weaknesses. However, a well-thought-out discussion integrating all the findings and interpreting them is lacking; in its current form, the discussion lacks 'boldness'. The primary question of the study - how plants differentiate between pathogens and symbionts - is not discussed in light of the findings. The concluding remark, "Taken together, our results indicate that successful development of the root nodule symbiosis requires cross-talk between NF-triggered symbiotic signaling and plant immune signaling mediated by RIN4," though accurate, fails to capture the novelty or significance of the findings, and left me wondering how this adds to what is already known. A clear conclusion, for eg, the phosphorylation of RIN4 isoforms by SYMRK at S143 modulates immune responses during symbiotic interactions with rhizobia, or similar, is needed.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have no major criticism of the work, although it could be improved by addressing the following minor points:

      (1) Page 8, Figure 2 legend. Consider changing "proper symbiosis formation" to "normal nodulation" or something that better reflects control of nodule development/number.

      Thank you for the suggestion; the legend was changed to “...required for normal nodule formation” (see Page 10, revised manuscript).

      (2) Page 9. Cut "newly" from the first sentence of paragraph 2, as S143 phosphorylation was identified previously.

      Thank you for the suggestion, we removed “newly” from the sentence.

      (3) Page 10, Figure 3. Panels B showing green-fluorescent nodules are unnecessary given the quantitative data presented in the accompanying panel A. This goes for similar supplemental figures later.

      We appreciate the comment. In Figure 3 (complementation of the rin4b mutant, also updated according to the other reviewer’s comment) and Suppl. Figure 6 (OE phenotype of phospho-mimic/negative mutants), we removed the panels showing the micrographs. We did not modify Figure 2 (where the micrographs show transgenic roots carrying the silencing constructs), for the sake of figure completeness. (See Page 10, revised manuscript)

      (4) Consider swapping Figure 3 for Supplemental Figure S7, which I think shows more clearly the importance of RIN4 phosphorylation in nodulation.

      We appreciate the comment and have swapped the figures according to the reviewer’s suggestion. Legend, figure description, and manuscript text have been updated accordingly. (See page 12 and 38, revised manuscript)

      (5) Page 10. Replace "it will be referred to S143..." with "we refer to S143 instead of ....".

      We replaced it according to the comment.

      (6) Page 11, delete "While" from "While no interactions could be observed...".

      We deleted it according to the suggestion.

      (7) Page 33, Fig S5. How many biological replicates were performed to produce the data presented in panel C and what do the error bar and asterisk indicate? Check that this information is provided in all figures that show errors and statistical significance.

      Thank you for the remark. The experiment was repeated three times, and this note was added to the figure description. All other figure legends with error bar(s) were checked to confirm that replicates are indicated accordingly.

      (8) Page 37, Fig S11, panel B. Are averages of data from the 2 biological and 3 technical replicates shown? Add error bars and tests of significant difference.

      Averages of a total of 6 replicates (from 2 biological replicates, each run in triplicate) are shown. We thank the reviewer for pointing out the missing error bars and statistical test; we have updated the figure accordingly.

      (9) Fig S12. Why are panels A, C, E, and G presented? The other panels seem to show the same data more clearly- showing the linear relationship between peak area ratio and protein concentration.

      We have taken the reviewer’s comment into consideration and revised the figure, removing the calibration curves and showing only four panels. The figure legend has been corrected accordingly (please see page 43, revised manuscript). The original figure (unlike other revised figures) had to be deleted from the revised manuscript, as it caused technical issues when converting the document into PDF.

      Reviewer #2 (Recommendations For The Authors):

      Some small suggestions:

      (1) It's good to include a protein schematic for RIN4 in Figure 1.

      We appreciate the reviewer’s suggestion and we have drawn a protein schematic and added it to Figure 1. The figure legend was updated accordingly.

      (2) There appears to be incorrect labeling in Figure 2c; please double-check and make the necessary corrections.

      With respect, we do not understand the comment about incorrect labeling. Would the reviewer please help us out and give more explanation? In Figure 2C, RIN4a and RIN4b expression was checked in transgenic roots expressing either EV (empty vector) or different silencing constructs targeting RIN4a/b.

      Reviewer #3 (Recommendations For The Authors):

      I enjoyed the level of detail and precision in experimental design.

      A discussion point could be - What does it mean that nodule number but not fixation is affected? Is RIN4 only involved in the entry stage of infection but not in nodules during N-fixation?

      Our current data suggest that RIN4 is indeed involved in infection. This hypothesis is supported by the finding that RIN4a/b was phosphorylated in root hairs but not in the root (or it was not detected in the root). The interaction with the early signaling RLKs also suggests that RIN4 is likely involved in the early stage of symbiosis formation.

      How would the authors explain their observation "However, the motif is retained in non-nodulating Fabales (such as C. canadensis, N. schottii; SI Appendix, Figure S2) and Rosales species as well." What does this imply about the role in symbiosis that the authors propose?

      We appreciate the reviewer’s question. The motif appears to be retained; however, it may be not only the motif but also the protein structure that differs in nodulating plants. We have not investigated the structure of RIN4, or how it might change with certain features, upon interaction with another protein, and/or upon post-translational modification(s). Griesmann et al. (2018) showed that certain genes are absent in non-nodulating species within Fabales; we can speculate that, because the products of these genes are absent, they cannot interact with RIN4 in those species, so downstream signaling could be lacking (in spite of the retained motif in non-nodulating species). At this point, there is not enough data or knowledge to speculate further.

      qPCR analysis of symbiotic pathway genes showed that both NIN-dependent and NIN-independent branches of the symbiosis signaling pathway were negatively affected in the rin4b mutant. Please derive a conclusion from this.

      We appreciate the comment; it also prompted us to correct the following sentence. Original: “Since NIN is responsible for induction of NF-YA and ERN1 transcription factors, their reduced expression in rin4b plants was not unexpected (Fig. 5).” This was corrected because ERN1 expression is independent of NIN (Kawaharada et al., 2017). The following sentence was also deleted, as it repeated a statement made above it: “Soybean NF-YA1 homolog responded significantly to rhizobial treatment in rin4b plants, whereas NF-YA3 induction did not show significant induction (Fig. 5).”

      We added the following conclusion/hypothesis: “Based on the results of the expression data presented above, it seems that both NIN-dependent and NIN-independent branches of the symbiotic signaling pathways are affected in the rin4b mutant background. This indicates that the role of RIN4 protein in the symbiotic pathway can be placed upstream of CYCLOPS, as the CYCLOPS transcription activating complex is responsible (directly or indirectly) for the activation of all TFs tested in our expression analysis (Singh et al, 2014/47, 48).” (Please see Page 16, revised manuscript)

      The authors are highly encouraged to write a thoughtful discussion that would accompany the detailed experimental work performed in this manuscript.

      We appreciate the comment and have revised the Discussion section of the manuscript. (Please see Pages 17-19, revised manuscript)

      Some minor suggestions for overall readability are below.

      What about immune signaling genes? Given that authors hypothesize that "Absence of AtRIN4 leads to increased PTI responses and, therefore, it might be that GmRIN4b absence also causes enhanced PTI which might have contributed to significantly fewer nodules." Could check marker immune signaling gene expression FLS2 and others.

      We appreciate the reviewer’s comment, and while we believe those are very interesting questions/suggestions, answering them is outside the scope of the current manuscript. This is partly because several defense-responsive genes described in leaf immune responses could not be confirmed to respond in a similar manner in roots (Chuberre et al., 2018). It was also shown that plant immune responses are compartmentalized and specialized in roots (Chuberre et al., 2018). If we looked at immune-responsive genes, the signal might be diluted because of this specialized and compartmentalized nature. Another reason is that finding a suitable immune-responsive gene would require rigorous experiments, not only in roots but also in root hairs over a time course, which would be the groundwork for a separate study (root hair isolation is not a trivial experiment; it requires at least 250-300 seedlings per treatment per time point).

      Regarding FLS2, it is known in Arabidopsis that its expression is tissue-specific within the root, and FLS2 expression appears to be restricted to the root vasculature (Wyrsch et al., 2015). In our manuscript, we showed that RIN4a/b is highly expressed in root hairs and that RIN4 phosphorylation was detectable in root hairs but not in the root; therefore, we do not see a reason to investigate FLS2 expression.

      "in our hands only ERN1a could be amplified. One possible explanation for this observation is that primers were designed based on Williams 82 reference genome, while our rin4b mutant was generated in the Bert cultivar background." Is the sequence between the two cultivars and the primers that bind to ERN1b in both cultivars so different? If not, this explanation is not very convincing.

      At the time of performing the experiment the genomic sequence of the Bert cultivar (used for generating rin4b edited lines) was not publicly available. In accordance with the reviewer’s comment, we removed the explanation, as it does not seem to be relevant. (See page 16, revised manuscript)

      The figures are clear and there is a logical flow. The images of fluorescing nodules in Figure 2,3 panels with nodules are not informative or unbiased .

      We appreciate the comment. In Figure 3 (complementation of the rin4b mutant, also updated according to the other reviewer’s comment) and Suppl. Figure 6 (OE phenotype of phospho-mimic/negative mutants), we removed the panels showing the micrographs. We did not modify Figure 2 (where the micrographs show transgenic roots carrying the silencing constructs), for the sake of figure completeness. (See pages 10, 12 and 38, revised manuscript)

      What does the exercise in isolation of rin4 mutants in lotus tell us? Is it worth including?

      The Ljrin4 mutant isolation experiment suggests that RIN4 is so important that its mutation is lethal for the plant (as in Arabidopsis, where most of the evidence regarding the role of RIN4 has been described), and it provides an additional piece of evidence that RIN4 is similarly crucial across most land plant species.

      Sentence ambiguous. "Co-expression of RIN4a and b with SymRKßΔMLD and NFR1α resulted in YFP fluorescence detected by Confocal Laser Scanning Microscopy (SI Appendix, Figure S8) suggesting that RIN4a and b proteins closely associate with both RLKs." Were all 4 expressed together?

      Thank you for the remark. Not all 4 proteins were co-expressed together. We adjusted the sentence as follows: “Co-expression of RIN4a/b with SymRKßΔMLD as well as NFR1α resulted in YFP fluorescence…” We hope it is now phrased more clearly. (See page 13, revised manuscript)

      Minor spelling errors throughout.. Costume-made (custom made?)

      Thank you for noticing. According to the Cambridge online dictionary, it is written with a hyphen, therefore, we added a hyphen and corrected the manuscript accordingly.

      CRISPR-cas9 or CRISPR/Cas9? Keep it consistent throughout. CRISPR-cas9 is the latest consensus.

      We corrected it to “CRISPR-Cas9” throughout the manuscript.

      References are missing for several 'obvious statements' but please include them to reach a broader audience. For example the first 5 sentences of the introduction. Also, statements such as 'Root hairs are the primary entry point for rhizobial infection in most legumes.'.

      Thank you for the comment. To make it clearer, we added reference #1 after the third sentence of the introduction, and we added an additional review as a reference. This additional review is also cited as the source for the sentence “Root hairs are the primary…” (Please see page 2, revised manuscript)

      Can you provide a percent value? Silencing of RIN4a and RIN4b resulted in significantly reduced nodule numbers on soybean transgenic roots in comparison to transgenic roots carrying the empty vector control. Also, this wording suggests it was a double K.D. but from the images, it appears they were individually silenced.

      We appreciate the reviewer's comment. We observed a 50-70% reduction in the number of nodules. We adjusted the text according to the reviewer's remark. (See page 9, revised manuscript)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary

      This manuscript reports preliminary evidence of successful optogenetic activation of single retinal ganglion cells (RGCs) through the eye of a living monkey using adaptive optics (AO).

      Strengths

      The eventual goals of this line of research have enormous potential impact in that they will probe the perceptual impact of activating single RGCs. While I think more data should be included, the four examples shown look quite convincing.

      Weaknesses

      While this is undoubtedly a technical achievement and an important step along this group's stated goal to measure the perceptual consequences of single-RGC activations, the presentation lacks the rigor that I would expect from what is really a methods paper. In my view, it is perfectly reasonable to publish the details of a method before it has yielded any new biological insights, but in those publications, there is a higher burden to report the methodological details, full data sets, calibrations, and limitations of the method. There is considerable room for improvement in reporting those aspects. Specifically, more raw data should be shown for activations of neighboring RGCs to pinpoint the actual resolution of the technique, and more than two cells (one from each field of view) should be tested.

      We have expanded sections discussing both the methodology and limitations of this technique via a rewrite of the results and discussion section. The data used in the paper is available online via the link provided in the manuscript. We agree that a more detailed investigation of the strengths and limitations of the approach would have been a laudable goal. However, before returning to more detailed studies, we have shifted our effort to developing the monkey psychophysical performance we need to combine with the single cell stimulation approach described here. In addition, the optogenetic ChrimsonR used in this study is not the best choice for this experiment because of its poor sensitivity. We are currently exploring the use of ChRmine (as described in lines 93-97), which is roughly 2 orders of magnitude more sensitive. We have also been working on methods to improve probe stabilization to reduce tracking errors during eye movements. Once these improvements have been implemented, we will undertake the more detailed studies suggested here. Nonetheless, as a pragmatic matter, we submit that it is valuable to document proof-of-concept with this manuscript.

      Some information about the density of labeled RGCs in these animals would also be helpful to provide context for how many well-isolated target cells exist per animal.

      We agree. Getting reliable information about labeled cell density would be difficult without detailed histology of the retina, which we are reluctant to do because it would require sacrificing these precious and expensive monkeys from which we continue to get valuable information. We are actively exploring methods to reduce the cell density to make isolation easier, including the use of the CAMKII promoter as well as the use of intracranial injections via AAV.retro that would allow calcium indicator expression in the peripheral retina where RGCs form a monolayer. It may be that the rarity of isolated RGCs will not be a fundamental limitation of the approach in the future.

      Reviewer #2 (Public Review):

      This proof-of-principle study lays important groundwork for future studies. Murphy et al. expressed ChrimsonR and GCaMP6s in retinal ganglion cells of a living macaque. They recorded calcium responses and stimulated individual cells, optically. Neurons targeted for stimulation were activated strongly whereas neighboring neurons were not.

      The ability to record from neuronal populations while simultaneously stimulating a subset in a controlled way is a high priority for systems neuroscience, and this has been particularly challenging in primates. This study marks an important milestone in the journey towards this goal.

      The ability to detect stimulation of single RGCs was presumably due to the smallness of the light spot and the sparsity of transduction. Can the authors comment on the importance of the latter factor for their results? Is it possible that the stimulation protocol activated neurons nearby the targeted neuron that did not express GCaMP? Is it possible that off-target neurons near the targeted neuron expressed GCaMP, and were activated, but too weakly to produce a detectable GCaMP signal? In general, simply knowing that off-target signals were undetectable is not enough; knowing something about the threshold for the detection of off-target signals under the conditions of this experiment is critical.

      We agree with these points. We cannot rule out the possibility that some nearby cells were activated but we could not detect this because they did not express GCaMP. We also do not know whether cells responded but our recording methods were not sufficiently sensitive to detect them. A related limitation is that we do not know, of course, what the relationship is between the threshold for detection with calcium imaging and what the psychophysical detection threshold would have been in an awake behaving monkey. Nonetheless, the data show that we can produce a much larger response in the target cell than in nearby cells whose response we can measure, and we suggest that this is a valuable contribution even if we can’t argue that the isolation is absolute. We’ve acknowledged these important limitations in the revised manuscript in lines 66-77.

      Minor comments:

      Did the lights used to stimulate and record from the retina excite RGCs via the normal light-sensing pathway? Were any such responses recorded? What was their magnitude?

      The recording light does activate the normal light-sensing pathway to some extent, although it does not fall upon the RGC receptive fields directly. There was a 30 second adaptation period at the beginning of each trial to minimize the impact of this on the recording of optogenetically mediated responses, as described in lines 222-224. The optogenetic probe does not appear to significantly excite the cone pathway, and we do not see the expected off-target excitations that would result from this.

      The data presented attest to a lack of crosstalk between targeted and neighboring cells. It is therefore surprising that lines 69-72 are dedicated to methods for "reducing the crosstalk problem". More information should be provided regarding the magnitude of this problem under the current protocol/instrumentation and the techniques that were used to circumvent it to obtain the data presented.

      The “crosstalk problem” in this quote refers to crosstalk caused by targeting cells at higher eccentricities, which are more densely packed and are not represented in the data. The data presented are limited to the more isolated central RGCs.

      Optical crosstalk could be spatial or spectral. Laying out this distinction plainly could help the reader understand the issues quickly. The Methods indicate that cells were chosen on the basis that they were > 20 µm from their nearest (well-labeled) neighbor to mitigate optical crosstalk, but the following sentence is about spectral overlap.

      We have added a clearer explanation of what precisely we mean by crosstalk in lines 213-221.

      Figure 2 legend: "...even the nearby cell somas do not show significantly elevated response (p >> 0.05, unpaired t-test) than other cells at more distant locations." This sentence does not indicate how some cells were classified as "nearby" whereas others were classified as being "at more distant locations". Perhaps a linear regression would be more appropriate than an unpaired t-test here.

      The distinction here between “nearby” and “more distant” is 50 µm. We have clarified this in the figure caption. Performing a linear regression on cell response over distance shows a slight downward trend in two of the four cells shown here, but this trend does not reach the threshold of significance.
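      For readers who want to see how the two analyses discussed above differ, the following is a minimal sketch, not the analysis code used for the paper. It assumes that "nearby" means strictly less than 50 µm from the target, uses a Welch-type unpaired t statistic, and runs on hypothetical distance and response values that are placeholders rather than measurements from this study.

```python
import math


def mean(values):
    return sum(values) / len(values)


def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x, plus Pearson r."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def split_by_distance(distances, responses, cutoff_um=50.0):
    """Split responses into 'nearby' (< cutoff) and 'more distant' (>= cutoff).

    The strict inequality is an assumption made for this sketch.
    """
    near = [r for d, r in zip(distances, responses) if d < cutoff_um]
    far = [r for d, r in zip(distances, responses) if d >= cutoff_um]
    return near, far


def welch_t(a, b):
    """Unpaired t statistic without assuming equal variances (Welch)."""
    var_a = sum((v - mean(a)) ** 2 for v in a) / (len(a) - 1)
    var_b = sum((v - mean(b)) ** 2 for v in b) / (len(b) - 1)
    return (mean(a) - mean(b)) / math.sqrt(var_a / len(a) + var_b / len(b))


# Hypothetical values: distance of each non-target soma from the target (µm)
# and its response amplitude (arbitrary units).
distances_um = [22.0, 35.0, 48.0, 60.0, 75.0, 90.0, 110.0, 130.0]
responses = [0.06, 0.02, 0.05, 0.01, 0.04, 0.00, 0.03, 0.01]

slope, intercept, r = linear_fit(distances_um, responses)
near, far = split_by_distance(distances_um, responses)
t_stat = welch_t(near, far)

print(f"regression slope per µm: {slope:.5f}, Pearson r: {r:.3f}")
print(f"Welch t (nearby vs. more distant): {t_stat:.3f}")
```

      The regression uses every cell's actual distance, whereas the two-group comparison depends on where the cutoff is drawn; converting either statistic into a p-value would additionally require the t distribution, which is omitted here to keep the sketch dependency-free.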

      Line 56: "These recordings were... acquired earlier in the session where no stimulus was present." More information should be provided regarding the conditions under which this baseline was obtained. I assume that the ChrimsonR-activating light was off and the 488 nm GCaMP excitation light was on, but this was not stated explicitly. Were any other lights on (e.g. room lights or cone-imaging lights)? If there was no spatial component to the baseline measurement, "where" should be "when".

      Your assumptions are correct. There was no spatial component to the baseline measurement, and these measurements are explained in more detail in lines 240-243.

      Please add a scalebar to Figure 1a to facilitate comparison with Figure 2.

      This has been done.

      Lines 165-173: Was the 488 nm light static or 10 Hz-modulated? The text indicates that GCaMP was excited with a 488 nm light and data were acquired using a scanning light ophthalmoscope, but line 198 says that "the 488 nm imaging light provides a static stimulus".

      The 488 nm light is effectively modulated at 25 Hz by the scanning action of the system. We believe the 10 Hz modulation you refer to is the closed-loop correction rate of the adaptive optics. The text has been updated in lines 217-219 to clarify this.

      A potential application of this technology is for the study of visually guided behavior in awake macaques. This is an exciting prospect. With that in mind, a useful contribution of this report would be a frank discussion of the hurdles that remain for such application (in addition to eye movements, which are already discussed).

      Lines 109-130 now offer an expanded discussion of this topic.

      Reviewer #3 (Public Review):

      This paper reports a considerable technical achievement: the optogenetic activation of single retinal ganglion cells in vivo in monkeys. As clearly specified in the paper, this is an important step towards causal tests of the role of specific ganglion cell types in visual perception. Yet this methodological advance is not described currently in sufficient detail to replicate or evaluate. The paper could be improved substantially by including additional methodological details. Some specific suggestions follow.

      The start of the results needs a paragraph or more to outline how you got to Figure 1. Figure 1 itself lacks scale bars, and it is unclear, for example, that the ganglion cells targeted are in the foveal slope.

      The results have been rewritten with additional explanation of methodology and the location of the RGCs has been clarified.

      The text mentions the potential difficulties targeting ganglion cells at larger eccentricities where the soma density increases. If this is something that you have tried it would be nice to include some of that data (whether or not selective activation was possible). Related to this point, it would be helpful to include a summary of the ganglion cell density in monkey retina.

      This is not something we tried, as we knew that the axial resolution allowed by the monkey’s eye would result in an axial PSF too large to only hit a single cell. The overall ganglion cell density is less relevant than the density of cells expressing ChrimsonR/GCaMP, which we only have limited info about without detailed histology.

      Related to the point in the previous paragraph - do you have any experiments in which you systematically moved the stimulation spot away from the target ganglion cell to directly test the dependence of stimulation on distance? This would be a valuable addition to the paper.

      We agree that this would have been a valuable addition to the paper, but we are reluctant to perform these experiments now. We are implementing an improved method to track the eye and a better optogenetic agent in an entirely new instrument, and we think that future experiments along these lines would be best done when those changes are completed.

      The activity in Figure 1 recovers from activation very slowly - much more slowly than the light response of these cells, and much more slowly than the activity elicited in most optogenetic studies. Can you quantify this time course and comment on why it might be so slow?

      We attribute the slow recovery to the calcium dynamics of the cell, and this slow recovery time is consistent with calcium responses seen in our lab elicited via the cone pathway. Similar time courses can be seen in Yin (2013) for RGCs excited via their cone inputs.

      Traces from non-targeted cells should be shown in Figure 1 along with those of targeted cells.

      We have added this as part of Figure 2.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1:

      The authors addressed my previous concerns successfully. However, some critiques are addressed only in the response letter but not in the text (major comment 3, minor point 2). It will be great if they mention these in some parts of their manuscript.

      Major 3: We now mention the effect of acs-2i on life span in the discussion, lines 475-480:

      “Interestingly, acs-2 knockdown abolished glp-1 longevity (data not shown), consistent with previous work showing that NHR-49, a transcription factor that drives acs-2 expression, is required for glp-1 longevity (Ratnappan et al., 2014). Thus, inhibiting fatty acid β-oxidation promotes MML-1 nuclear localization under hxk-1i but abolishes lifespan extension, potentially due to epistatic effects on other transcription factors or processes.”

      Minor 2: We now speculate on the differences concerning hxk-3 knockdown on MML-1 nuclear localization resulting from the low expression of hxk-3 in adults, lines 99-102:

      “Among the three C. elegans hexokinase genes, hxk-1 and hxk-2 more strongly affected MML-1 nuclear localization in two independent MML-1::GFP reporter strains (Figure 1B, Supplementary Figure 1A), while hxk-3 had just a small effect on MML-1 nuclear localization, probably due to its low expression in adult worms (Hutter & Suh, 2016).”

      Reviewer #2:

      The authors have adequately addressed my previous concerns in their revised manuscript. However, I have one remaining minor concern regarding the link between lipid metabolism and MML-1 regulation. As proposed by the authors, HXKs modulate MML-1 localization between LD/mito and the nucleus. They have provided evidence supporting the roles of hxk-2 and the PPP in this regulatory process. Nonetheless, the involvement of hxk-1 and fatty acid oxidation (FAO) within this proposed framework remains unclear. Although FAO is generally believed to affect LD size, the potential effects of hxk-1 and FAO on LD should be investigated within the current study to further substantiate their model.

      We thank the reviewer for this comment. We now examine how hxk-1 and acs-2 affect lipid droplet size. Interestingly, we found that knockdown of acs-2 and hxk-1 acs-2 double knockdown resulted in a mild but significant increase in LD size (Supplementary Figure 4I), supporting the notion that the two hexokinases regulate MML-1 via distinct mechanisms, reflected in the updated model (Figure 5E).

    1. Author response:

      This study builds on, extends, and experimentally validates results/models from our previous study. Our and others’ data implicated SMC5/6, PML nuclear bodies (PML NBs), and SUMOylation in the transcriptional repression of extrachromosomal circular DNA (ecDNA). Moreover, multiple viruses were found to express early genes that combat SMC5/6-based repression through targeted proteasomal degradation (e.g. Hepatitis B virus HBx and HIV-1 Vpr). Thus, our analysis of the roles of the foregoing in plasmid repression yields a coherent set of results for the field to build on.

      In our previous study we presented a model, but no supportive ecDNA silencing data, suggesting that distinct SMC5/6 subcomplexes, SIMC1-SLF2 and SLF1/2, separately control its transcriptional repression and DNA repair activities. In this study we experimentally validate that prediction using an ecDNA silencing assay and SMC5/6 localization analysis following DNA damage.

      Our study further reveals the unexpected dispensability of PML NBs in the silencing of simple plasmid DNA, a departure from current dogma. This raises important questions for the field to address in terms of the silencing mechanisms for different ecDNAs across different cell types. Despite the dispensability of SUMO-rich PML NBs, SUMOylation is required for ecDNA repression. Lastly, the SV40 LT antigen early gene product counteracts ecDNA silencing. These results used genetic epistasis arguments to implicate SUMO and LT in SMC5/6-based transcriptional silencing. We provide provisional responses for some of the reviewer’s general comments below.

      Public Reviews:

      Reviewer #1 (Public review):

      SMC5/6 is a highly conserved complex able to dynamically alter chromatin structure, playing in this way critical roles in genome stability and integrity that include homologous recombination and telomere maintenance. In the last years, a number of studies have revealed the importance of SMC5/6 in restricting viral expression, which is in part related to its ability to repress transcription from circular DNA. In this context, Oravcova and colleagues recently reported how SMC5/6 is recruited by two mutually exclusive complexes (orthologs of yeast Nse5/6) to SV40 LT-induced PML nuclear bodies (SIMC/SLF2) and DNA lesions (SLF1/2). In this current work, the authors extend this study, providing some new results. However, as a whole, the story lacks unity and does not delve into the molecular mechanisms responsible for the silencing process. One has the feeling that the story is somewhat incomplete, putting together not directly connected results.

      Please see the introductory overview above.

      (1) In the first part of the work, the authors confirm previous conclusions about the relevance of a conserved domain defined by the interaction of SIMC and SLF2 for their binding to SMC6, and extend the structural analysis to the modelling of the SIMC/SLF2/SMC complex by AlphaFold. Their data support a model where this conserved surface of SIMC/SLF2 interacts with SMC at the backside of SMC6's head domain, confirming the relevance of this interaction site with specific mutations. These results are interesting but confirmatory of a previous and more complete structural analysis in yeast (Li et al. NSMB 2024). In any case, they reveal the conservation of the interaction. My major concern is the lack of connection with the rest of the article. This structure does not help to understand the process of transcriptional silencing reported later beyond its relevance to recruit SMC5/6 to its targets, which was already demonstrated in the previous study.

      Demonstrating the existence of a conserved interface between the Nse5/6-like complexes and SMC6 in both yeast and human is foundationally important and was not revealed in our previous study. It remains unclear how this interface regulates SMC5/6 function, but yeast studies suggest a potential role in inhibiting the SMC5/6 ATPase cycle. Nevertheless, the precise function of Nse5/6 and its human orthologs in SMC5/6 regulation remains undefined, largely due to technical limitations in available in vivo analyses. The SIMC1/SLF2/SMC6 complex structure likely extends to the SLF1/2/SMC6 complex, suggesting a unifying function of the Nse5/6-like complexes in SMC5/6 regulation, albeit in the distinct processes of ecDNA silencing and DNA repair. There have been no studies to date (including this one) showing that SIMC1-SLF2 is required for SMC5/6 recruitment to ecDNA. Our previous study showed that SIMC1 was needed for SMC5/6 to colocalize with SV40 LT antigen at PML NBs. Here we show that SIMC1 is required for ecDNA repression, in the absence of PML NBs, which was not anticipated.

      (2) In the second part of the work, the authors focus on the functionality of the different complexes. The authors demonstrate that SMC5/6's role in transcription silencing is specific to its interaction with SIMC/SLF2, whereas SMC5/6's role in DNA repair depends on SLF1/2. These results are quite expected according to previous results. The authors already demonstrated that SLF1/2, but not SIMC/SLF2, are recruited to DNA lesions. Accordingly, they observe here that SMC5/6 recruitment to DNA lesions requires SLF1/2 but not SIMC/SLF2.

      Our previous study only examined the localization of SLF1 and SIMC1 at DNA lesions. The localization of these subcomplexes alone should not be used to define their roles in SMC5/6 localization. Indeed, the field is split in terms of whether Nse5/6-like complexes are required for ecDNA binding/loading, or regulation of SMC5/6 once bound.

      Likewise, the authors already demonstrated that SIMC/SLF2, but not SLF1/2, targets SMC5/6 to PML NBs. Taking into account the evidence that connects SMC5/6's viral resistance at PML NBs with transcription repression, the observed requirement of SIMC/SLF2 but not SLF1/2 in plasmid silencing is somehow expected. This does not mean the expectation has not to be experimentally confirmed. However, the study falls short in advancing the mechanistic process, despite some interesting results as the dispensability of the PML NBs or the antagonistic role of the SV40 large T antigen. It had been interesting to explore how LT overcomes SMC5/6-mediated repression: Does LT prevent SIMC/SLF2 from interacting with SMC5/6? Or does it prevent SMC5/6 from binding the plasmid? Is the transcription-dependent plasmid topology altered in cells lacking SIMC/SLF2? And in cells expressing LT? In its current form, the study is confirmatory and preliminary. In agreement with this, the cartoons modelling results here and in the previous work look basically the same.

      We agree that determining the potential mechanism of action of LT in overcoming SMC5/6-based repression is an important next step. It will require the identification of any direct interactions with SMC5/6 subunits, and better methods for assessing SMC5/6 loading and activity on ecDNAs. Unlike HBx, Vpr, and BNRF1, LT does not appear to induce degradation of SMC5/6, making it a more complex and interesting challenge. Also, the dispensability of PML NBs in plasmid silencing versus viral silencing raises multiple important questions about SMC5/6’s repression mechanism.

      (3) There are some points about the presented data that need to be clarified.

      Reviewer #2 (Public review):

      Oravcová et al. present data supporting a role for SIMC1/SLF2 in silencing plasmid DNA via the SMC5/6 complex. Their findings are of interest, and they provide further mechanistic detail of how the SMC5/6 complex is recruited to disparate DNA elements. In essence, the present report builds on the authors' previous paper in eLife in 2022 (PMID: 36373674, "The Nse5/6-like SIMC1-SLF2 complex localizes SMC5/6 to viral replication centers") by showing the role of SIMC1/SLF2 in localisation of the SMC5/6 complex to plasmid DNA, and the distinct requirements as compared to recruitment to DNA damage foci. Although the findings of the manuscript are of interest, we are not yet convinced that the new data presented here represents a compelling new body of work and would better fit the format of a "research advance" article. In their previous paper, Oravcová et al. show that the recruitment of SMC5/6 to SV40 replication centres is dependent on SIMC1, and specifically, that it is dependent on SIMC1 residues adjacent to neighbouring SLF2.

      We agree. This manuscript fits the Research Advance model, which is the format in which it was submitted.

      Reviewer #3 (Public review):

      Summary:

      This study by the Boddy and Otomo laboratories further characterizes the roles of SMC5/6 loader proteins and related factors in SMC5/6-mediated repression of extrachromosomal circular DNA. The work shows that mutations engineered at an AlphaFold-predicted protein-protein interface formed between the loader SLF2/SIMC1 and SMC6 (similar to the interface in the yeast counterparts observed by cryo-EM) prevent co-IP of the respective proteins. The mutations in SLF2 also hinder plasmid DNA silencing when expressed in SLF2-/- cell lines, suggesting that this interface is needed for silencing. SIMC1 is dispensable for recruitment of SMC5/6 to sites of DNA damage, while SLF1 is required, thus separating the functions of the two loader complexes. Preventing SUMOylation (with a chemical inhibitor) increases transcription from plasmids but does not in SLF2-deleted cell lines, indicating that SMC5/6 silences plasmids in a SUMOylation-dependent manner. Expression of LT is sufficient for increased expression, and again, not additive or synergistic with SIMC1 or SLF2 deletion, indicating that LT prevents silencing by directly inhibiting 5/6. In contrast, PML bodies appear dispensable for plasmid silencing.

      Strengths:

      The manuscript defines the requirements for plasmid silencing by SMC5/6 (an interaction of Smc6 with the loader complex SLF2/SIMC1, SUMOylation activity) and shows that SLF1 and PML bodies are dispensable for silencing. Furthermore, the authors show that LT can overcome silencing, likely by directly binding to (but not degrading) SMC5/6.

      Weaknesses:

      (1) Many of the findings were expected based on recent publications.

      Please see introductory paragraphs above.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Although we have no further revisions on the manuscript, we would like to respond to the remaining comments from the reviewers as follows.

      Reviewer 1:

      The authors have addressed some concerns raised in the initial review but some remain. In particular it is still unclear what conclusions can be drawn about task-related activity from scans that are performed 30 minutes after the behavioral task. I continue to think that a reorganization/analysis of the data according to event type would be useful and easier to interpret across the two brain areas, but the authors did not choose to do this. Finally, switching the cue-response association, I am convinced, would help to strengthen this study.

      As for the task-related activity, the strategy for the PET scan was explained in our response to comment 2 from Reviewer 2. Briefly, rats receive intravenous administration of 18F-FDG solution before the start of the behavioral session. 18F-FDG uptake into the cells starts immediately, reaches its maximum level within 30 min, and is maintained for at least 1 h. A 30-min PET scan is executed 25 min after the session. Therefore, the measured brain activity reflects the metabolic state during task performance in rats.

      Regarding data presentation of the electrophysiological experiments, we described the subpopulations of event-related neurons showing notable neuronal activity patterns in the order of aDLS and pVLS, following the order of explanation used for the behavioral study.

      For switching the cue-response association, we mentioned the difference in firing activity between HR and LL trials, suggesting that different combinations between the stimulus and response may affect the level of firing activity. As suggested by the reviewer, an examination of switching the cue-response association is useful to confirm our interpretation. We will address this issue in our future studies.

      Reviewer 2:

      The authors have made important revisions to the manuscript and it has improved in clarity. They also added several figures in the rebuttal letter to answer questions by the reviewers. I would ask that these figures are also made public as part of the authors' response or if not, included in the manuscript.

      We will make the figures publicly available as part of our response.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, van Paassen et al. have studied how CD8 T cell functionality and levels predict HIV DNA decline. The article touches on interesting facets of HIV DNA decay, but ultimately comes across as somewhat hastily done and not convincing due to the major issues.

      (1) The use of only 2 time points to make many claims about longitudinal dynamics is not convincing. For instance, the fact that raw data do not show decay in intact, but do for defective/total, suggests that the present data is underpowered. The authors speculate that rising intact levels could be due to patients who have reservoirs with many proviruses with survival advantages, but this is not the parsimonious explanation vs the data simply being noisy without sufficient longitudinal follow-up. n=12 is fine, or even reasonably good for HIV reservoir studies, but to mitigate these issues would likely require more time points measured per person.

      (1b) Relatedly, the timing of the first time point (6 months) could be causing a number of issues because this is in the ballpark for when the HIV DNA decay decelerates, as shown by many papers. This unfortunate study design means some of these participants may already have stabilized HIV DNA levels, so earlier measurements would help to observe early kinetics, but also later measurements would be critical to be confident about stability.

      We agree that in order to thoroughly investigate reservoir decay in acutely treated individuals, more participants and/or more time points measured per participant would increase the power of the study and potentially, in line with the literature, show a significant decay in intact HIV DNA as well. By its design (1), the NOVA study allows for a detailed longitudinal follow-up of reservoir and immunity from the start of ART onwards. In the present analysis in the NOVA cohort, we decided to focus on the 24- and 156-week time points. We plan to include more individuals in our analysis in the future, so that we can better model the longitudinal dynamics of the HIV reservoir.

      The main goal of the present study, however, was not to investigate the decay or longitudinal dynamics of the viral reservoir, but to understand the relationship of the HIV-specific CD8 T-cell responses early on ART with the reservoir changes across the subsequent 2.5-year period on suppressive therapy. We will revise the manuscript in order to clarify this. Moreover, we agree with the reviewer that the early time point (24 weeks) is a time at which many virological and immunological processes are ongoing and the reservoir may not have stabilized yet for every participant. We will highlight this in the revised manuscript.

      (2) Statistical analysis is frequently not sufficient for the claims being made, such that overinterpretation of the data is problematic in many places.

      (2a) First, though it is plausible that CD8s influence reservoir decay, much more rigorous statistical analysis would be needed to assert this directionality; this is an association, which could just as well be inverted (reservoir disappearance drives CD8 T cell disappearance).

      The second point that was raised by reviewer 1 is the statistical analysis, which is referred to as “not sufficient for the claims being made”. Moreover, a more “rigorous statistical analysis would be needed”. At this stage, it is unclear from the reviewer's comments what specific type of additional statistical analysis is being requested. Correlation analyses, such as the one used in this study, are a well-established approach to investigate the relationship between the immune response and reservoir size. However, as we aim to perform the most rigorous analysis possible, for the revised submission we will adjust our analysis for putative confounders (e.g. age and antiretroviral regimen).

      We would also like to note that the association between the CD8 T-cell response at 24 weeks and the subsequent decline (the difference between 24 and 156 weeks) in the reservoir cannot be bi-directional (that can only be the case when both variables are measured at the same time point).

      (2b) Words like "strong" for correlations must be justified by correlation coefficients, and these heat maps indicate many comparisons were made, such that p-values must be corrected appropriately.

      For the revised submission, we will provide correlation coefficients to justify the wording, and will adjust the p-values for multiple comparisons.
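
      As a rough illustration of the kind of adjustment described here, the sketch below applies the Benjamini-Hochberg false discovery rate procedure to a list of p-values. The choice of Benjamini-Hochberg is an assumption (the response does not name a correction method), and the p-values are invented placeholders, not study data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four correlation tests (placeholders only).
example_p = [0.01, 0.04, 0.03, 0.20]
example_adjusted = benjamini_hochberg(example_p)
```

      Each adjusted value can then be reported alongside the corresponding correlation coefficient, so that a word such as "strong" is tied to an effect size rather than to an uncorrected p-value.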

      (3) There is not enough introduction and references to put this work in the context of a large/mature field. The impacts of CD8s in HIV acute infection and HIV reservoirs are both deep fields with a lot of complexity.

      Lastly, reviewer 1 referred to the introduction and asked for more references and a more focused viewpoint because the field is large and complex. We aim to revise the introduction/discussion based on the suggestions from the reviewer.

      Reviewer #2 (Public review):

      Summary:

      This study investigated the impact of early HIV-specific CD8 T cell responses on the viral reservoir size after 24 weeks and 3 years of follow-up in individuals who started ART during acute infection. Viral reservoir quantification showed that total and defective HIV DNA, but not intact, declined significantly between 24 weeks and 3 years post-ART. The authors also showed that functional HIV-specific CD8⁺ T-cell responses persisted over three years and that early CD8⁺ T-cell proliferative capacity was linked to reservoir decline, supporting early immune intervention in the design of curative strategies.

      Strengths:

      The paper is well written, easy to read, and the findings are clearly presented. The study is novel as it demonstrates the effect of HIV specific CD8 T cell responses on different states of the HIV reservoir, that is HIV-DNA (intact and defective), the transcriptionally active and inducible reservoir. Although small, the study cohort was relevant and well-characterized as it included individuals who initiated ART during acute infection, 12 of whom were followed longitudinally for 3 years, providing unique insights into the beneficial effects of early treatment on both immune responses and the viral reservoir. The study uses advanced methodology. I enjoyed reading the paper.

      Weaknesses:

      All participants were male (acknowledged by the authors), potentially reducing the generalizability of the findings to broader populations. A control group receiving ART during chronic infection would have been an interesting comparison.

      We thank the reviewer for their appreciation of our study. The reviewer raises the point that it would be useful to compare our data to a control group. Unfortunately, these samples are not yet available, but our study protocol allows for a control group (ART initiated during chronic infection), so we can include one in the future.

      (1) Dijkstra M, Prins H, Prins JM, Reiss P, Boucher C, Verbon A, et al. Cohort profile: the Netherlands Cohort Study on Acute HIV infection (NOVA), a prospective cohort study of people with acute or early HIV infection who immediately initiate HIV treatment. BMJ Open. 2021;11(11):e048582.

    1. Author response:

      We thank you and the reviewers very much for the insightful comments on our manuscript. We plan to revise the manuscript as follows:

      (A) As suggested by Reviewer 1, we will carefully read through the entire manuscript and try to improve its clarity. Regarding the comments and recommendations from Reviewer 2, we plan to address the first recommendation and the specific comments about the analysis of DNA methylation. We cannot currently address the second recommendation because the person responsible for gathering the data now works at a different university. However, we will keep this in mind for future projects.

      (B) Regarding the two main comments of Reviewer 2, we plan the following:

      (1) The authors group their methylation analysis by sequence context (CG, CHG, CHH). I feel this is insufficient, because CG methylation can appear in two distinct forms: gene body methylation (gbM), which is CG-only methylation within genes, and transposable element (TE) and TE-like methylation (teM), which typically involves all sequence contexts and generally affects TEs, but can also be found within genes. GbM and teM have distinct epigenetic dynamics, and it is hard to know how methylation patterns are changing during the experiment if gbM and teM are mixed. This can also have downstream consequences (see point below).

      We thank Reviewer 2 for this suggestion. We usually separate the three contexts because they are established by different enzymes, not because they reflect distinct processes or functions. It would indeed be informative to group DMCs into gbM and teM, but because there are many regions where genes and transposons overlap, this also adds some complexity. Given that there were very few DMCs, we wanted to keep it short and simple. Therefore, we wrote that 87.3% of the DMCs were close to or within genes and that 98.1% were close to or within genes or transposons. Together with the clear overrepresentation of the CG context, this indicates that most of the DMCs were related to gbM. We will update the paragraph and specifically refer to gbM to make this clear.

      (2) For GO analysis, the authors use all annotated genes as a control. However, most of the methylation differences they observe are likely gbM, and gbM genes are not representative of all genes. The authors' results might therefore be explained purely as a consequence of analyzing gbM genes, and not an enrichment of methylation changes in any particular GO group.

      This is indeed a point worth considering. We will update the GO analysis and define the background as genes with cytosines that we tested for differences in methylation and which also exhibited overall at least 10% methylation (i.e., one cytosine per gene was sufficient). This will reduce the background gene set from 34,615 to 18,315 genes. A first analysis shows that results will change with respect to the post-translational protein modifications but will remain similar for epigenetic regulation and terms related to transport and growth processes. We will update the paragraph accordingly.
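
      To make the effect of the background choice concrete, the following sketch computes a one-sided hypergeometric enrichment p-value for a single GO term against two backgrounds. Only the two background sizes come from the response; the foreground size, the term sizes, and the overlap are hypothetical values chosen for illustration, and the hypergeometric test is assumed here as a common choice for GO enrichment, not confirmed as the method used.

```python
from math import comb


def enrichment_p_value(background_size, term_in_background,
                       foreground_size, term_in_foreground):
    """Upper-tail hypergeometric probability P(X >= term_in_foreground)."""
    upper = min(term_in_background, foreground_size)
    favorable = sum(
        comb(term_in_background, i)
        * comb(background_size - term_in_background, foreground_size - i)
        for i in range(term_in_foreground, upper + 1)
    )
    return favorable / comb(background_size, foreground_size)


# Hypothetical foreground: 200 genes with DMCs, 10 of them in the GO term.
# Hypothetical term sizes: 400 genes among all annotated genes, 350 of
# which are also in the restricted (tested and methylated) background.
p_all_genes = enrichment_p_value(34615, 400, 200, 10)
p_restricted = enrichment_p_value(18315, 350, 200, 10)
```

      With these made-up numbers the term makes up a larger share of the restricted background, so the same overlap is less surprising and the p-value is larger, which is the direction of the concern raised by the reviewer.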

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Syed et al. investigate the circuit underpinnings for leg grooming in the fruit fly. They identify two populations of local interneurons in the right front leg neuromere of the ventral nerve cord, i.e., 62 13A neurons and 64 13B neurons. Hierarchical clustering analysis identifies 10 morphological classes for both populations. Connectome analysis reveals their circuit interactions: these GABAergic interneurons provide synaptic inhibition either between the two subpopulations, i.e., 13B onto 13A, or among each other, i.e., 13As onto other 13As, and/or onto leg motoneurons, i.e., 13As and 13Bs onto leg motoneurons. Interestingly, 13A interneurons fall into two categories, with one providing inhibition onto a broad group of motoneurons, being called "generalists", while others project to a few motoneurons only, being called "specialists". Optogenetic activation and silencing of both subsets strongly affect leg grooming. Likewise, activating or silencing subpopulations, i.e., 3 to 6 elements of the 13A and 13B groups, has marked effects on leg grooming, including on frequency and joint positions, and can even interrupt leg grooming. The authors present a computational model with the four circuit motifs found, i.e., feed-forward inhibition, disinhibition, reciprocal inhibition, and redundant inhibition. This model can reproduce relevant aspects of the grooming behavior.

      Strengths:

      The authors succeeded in providing evidence for neural circuits interacting by means of synaptic inhibition to play an important role in the generation of a fast rhythmic insect motor behavior, i.e., grooming. Two populations of local interneurons in the fruit fly VNC comprise four inhibitory circuit motifs of neural action and interaction: feed-forward inhibition, disinhibition, reciprocal inhibition, and redundant inhibition. Connectome analysis identifies the similarities and differences between individual members of the two interneuron populations. Modulating the activity of small subsets of these interneuron populations markedly affects the generation of the motor behavior, thereby exemplifying their important role in generating grooming.

      We thank the reviewer for their thoughtful and constructive evaluation of our work. We are encouraged by their recognition of the major contributions of our study, including the identification of multiple inhibitory circuit motifs and their contribution to organizing rhythmic leg grooming behavior. We also appreciate the reviewer’s comments highlighting our use of connectomics, targeted manipulations, and modeling to reveal how distinct subsets of inhibitory interneurons contribute to motor behavior.

      Weaknesses:

      Experiments modulating activity in the interneuron populations by means of optogenetics were conducted in the so-called closed-loop condition. This does not allow for differentiation between direct and secondary effects of the experimental modification in neural activity, as feedforward and feedback effects cannot be disentangled. To do so, open-loop experiments, e.g., in deafferented conditions, would be important. Given that many members of the two populations of interneurons participate in not one but two or more circuit motifs, it remains to be disentangled which role each individual circuit motif plays in the generation of the motor behavior in intact animals.

      We appreciate the reviewer’s point regarding the role of sensory feedback in our experimental design. We agree that reafferent (sensory) input from ongoing movements could contribute to the behavioral outcomes of our optogenetic manipulations. However, our aim was not to isolate central versus peripheral contributions, but rather to assess the role of 13A/B neurons within the intact, operational sensorimotor system during natural grooming behavior.

      These inhibitory neurons form recurrent loops, synapse onto motor neurons, and receive proprioceptive input—placing them in a position to both shape central motor output and process sensory feedback. As such, manipulating their activity engages both central control and sensory consequences.

      The finding that silencing 13A neurons in dusted flies disrupts rhythmic leg coordination highlights their role in organizing grooming movements. Prior studies (e.g., Ravbar et al., 2021) show that grooming rhythms persist when sensory input is reduced, indicating a central origin, while sensory feedback refines timing, coordination, and long-timescale stability. We concluded that rhythmicity arises centrally but is shaped and stabilized by mechanosensory or proprioceptive feedback. Our current results are consistent with this view and support a model in which inhibitory premotor neurons participate in a closed-loop control architecture that generates and tunes rhythmic output.

      While we agree that fully removing sensory feedback and parsing distinct roles for neurons that participate in multiple circuit motifs would be desirable, we do not see a plausible experimental path to accomplish this - we would welcome suggestions!

      We considered the method used by Mendes and Mann (eLife 2023) to assess sensory feedback to walking, 5-40-GAL4, DacRE-flp, UAS->stop>TNT + 13A/B-spGAL4 X UAS-csChrimson. This would require converting one targeting system to LexA and presents significant technical challenges. More importantly, we believe the core interpretation issue would remain: broadly silencing proprioceptors would produce pleiotropic effects and impair baseline coordination, making it difficult to distinguish whether observed changes reflect disrupted rhythm generation or secondary consequences of impaired sensory input.

      We will clarify in the revised manuscript that our behavioral experiments were performed in freely moving flies under closed-loop conditions. We thank the reviewer for highlighting these important considerations and will revise the manuscript to better communicate the scope and interpretation of our findings.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Syed et al. presents a detailed investigation of inhibitory interneurons, specifically from the 13A and 13B hemilineages, which contribute to the generation of rhythmic leg movements underlying grooming behavior in Drosophila. After performing a detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits, the authors build on this anatomical framework by performing optogenetic perturbation experiments to functionally test predictions derived from the connectome. Finally, they integrate these findings into a computational model that links anatomical connectivity with behavior, offering a systems-level view of how inhibitory circuits may contribute to grooming pattern generation.

      Strengths:

      (1) Performing an extensive and detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits.

      (2) Making sense of the largely uncharacterized 13A/13B nerve cord circuitry by combining connectomics and optogenetics is very impressive and will lay the foundation for future experiments in this field.

      (3) Testing the predictions from experiments using a simplified and elegant model.

      We thank the reviewer for their thoughtful and encouraging evaluation of our work. We are especially grateful for their recognition of our detailed connectome analysis and its contribution to understanding the organization of premotor inhibitory circuits. We appreciate the reviewer’s comments highlighting the integration of connectomics with optogenetic perturbations to functionally interrogate the 13A and 13B circuits, as well as their recognition of our modeling approach as a valuable framework for linking circuit architecture to behavior.

      Weaknesses:

      (1) In Figure 4, while the authors report statistically significant shifts in both proximal inter-leg distance and movement frequency across conditions, the distributions largely overlap, and only in Panel K (13B silencing) is there a noticeable deviation from the expected 7-8 Hz grooming frequency. Could the authors clarify whether these changes truly reflect disruption of the grooming rhythm?

      We are re-analyzing the whole dataset in light of the reviews (specifically, we are now applying linear mixed-effects models, LMMs, to these statistics). For the panels in question (H-J), there is indeed a large overlap between the frequency distributions; the box plots, which show medians and quartiles, overlap partially. (In the current analysis, differences in means were small yet significant.) However, there is a noticeable (not yet quantified) difference in variability between the frequencies, with the experimental group being the more variable one. If the activations/deactivations of 13A/B circuits disrupt the rhythm, we would indeed expect the frequencies to become more variable. So, in the revised version we will quantify the differences in both the means and the variabilities, and establish whether either shows significance after applying the LMM.

      More importantly, all this data would make the most sense if it were performed in undusted flies (with controls) as is done in the next figure.

      In our assay conditions, undusted flies groom infrequently. We used undusted flies for some optogenetic activation experiments, where the neuron activation triggers behavior initiation, but we chose to analyze the effect of silencing inhibitory neurons in dusted flies because dust reliably activates mechanosensory neurons and elicits robust grooming behavior, enabling us to assess how manipulation of 13A/B neurons alters grooming rhythmicity and leg coordination.

      (2) In Figure 4-Figure Supplement 1, the inclusion of walking assays in dusted flies is problematic, as these flies are already strongly biased toward grooming behavior and rarely walk. To assess how 13A neuron activation influences walking, such experiments should be conducted in undusted flies under baseline locomotor conditions.

      We agree that there are better ways to assay potential contributions of 13A/13B neurons to walking. We intended to focus on how normal activity in these inhibitory neurons affects coordination during grooming, and we included walking because we observed it in our optogenetic experiments and because it also involves rhythmic leg movements. The walking data are reported in a supplementary figure because we think this merits further study with assays designed to quantify walking specifically. We will make these goals clearer in the revised manuscript, and we are happy to share our reagents with other research groups better equipped to analyze walking differences.

      (3) For broader lines targeting six or more 13A neurons, the authors provide specific predictions about expected behavioral effects, e.g., that activation should bias the limb toward flexion and silencing should bias toward extension based on connectivity to motor neurons. Yet, when using the more restricted line labeling only two 13A neurons (Figure 4 - Figure Supplement 2), no such prediction is made. The authors report disrupted grooming but do not specify whether the disruption is expected to bias the movement toward flexion or extension, nor do they discuss the muscle target. This is a missed opportunity to apply the same level of mechanistic reasoning that was used for broader manipulations.

      While we know which two neurons are labeled based on confocal expression, assigning their exact identity in the EM datasets has been challenging. One of these neurons appears absent from our 13A reconstructions of the right T1 neuropil in FANC, although we did locate it in MANC. However, its annotation in MANC has undergone multiple revisions, making confident assignment difficult at this time. Since we can’t be sure which motor neurons and muscles are most directly connected, we did not want to predict this line’s effect on leg movements.

      (4) Regarding Figure 5: The 70 ms on/off stimulation with a slow opsin seems problematic. CsChrimson off-kinetics are slow and unlikely to cause actual activity changes in the desired neurons with the temporal precision the authors are suggesting they get. Regardless, it is amazing that the authors get the behavior! It would still be important for the authors to mention the optogenetics caveat, and potentially supplement the data with stimulation at different frequencies, or using faster opsins like ChrimsonR.

      We were also surprised - and intrigued - by the behavioral consequences of activating these inhibitory neurons with CsChrimson. We tried several different activation paradigms: pulsed from 8 Hz to 500 Hz and with various on/off intervals. Because several of these different stimulation protocols resulted in grooming, and with different rhythmic frequencies, we think the phenotypes are a specific property of the neural circuits we have activated, rather than of the kinetics of CsChrimson itself.

      We will include the data from other frequencies in a new Supplementary Figure, discuss the caveats that CsChrimson’s slow off-kinetics pose for precise temporal control of neural activity, and try ChrimsonR in future experiments.

      Overall, I think the strengths outweigh the weaknesses, and I consider this a timely and comprehensive addition to the field.

      Thank you!

      Reviewer #3 (Public review):

      Summary:

      The authors set out to determine how GABAergic inhibitory premotor circuits contribute to the rhythmic alternation of leg flexion and extension during Drosophila grooming. To do this, they first mapped the ~120 13A and 13B hemilineage inhibitory neurons in the prothoracic segment of the VNC and clustered them by morphology and synaptic partners. They then tested the contribution of these cells to flexion and extension using optogenetic activation and inhibition and kinematic analyses of limb joints. Finally, they produced a computational model representing an abstract version of the circuit to determine how the connectivity identified in EM might relate to functional output. The study, in its current form, makes an important but overclaimed contribution to the literature due to a mismatch between the claims in the paper and the data presented.

      Strengths:

      The authors have identified an interesting question and use a strong set of complementary tools to address it:

      (1) They analysed serial‐section TEM data to obtain reconstructions of every 13A and 13B neuron in the prothoracic segment. They manually proofread over 60 13A neurons and 64 13B neurons, then used automated synapse detection to build detailed connectivity maps and cluster neurons into functional motifs.

      (2) They used optogenetic tools with a range of genetic driver lines in freely behaving flies to test the contribution of subsets of 13A and 13B neurons.

      (3) They used a connectome-constrained computational model to determine how the mapped connectivity relates to the rhythmic output of the behavior.

      We appreciate the reviewer’s thorough and constructive feedback on our work. We are encouraged by their recognition of the complementary approaches used in our study.

      Weaknesses:

      The manuscript aims to reveal an instructive, rhythm-generating role for premotor inhibition in coordinating the multi-joint leg synergies underlying grooming. It makes a valuable contribution, but currently, the main claims in the paper are not well-supported by the presented evidence.

      Major points

      (1) Starting with the title of this manuscript, "Inhibitory circuits generate rhythms for leg movements during Drosophila grooming", the authors raise the expectation that they will show that the 13A and 13B hemilineages produce rhythmic output that underlies grooming. This manuscript does not show that. For instance, to test how they drive the rhythmic leg movements that underlie grooming requires the authors to test whether these neurons produce the rhythmic output underlying behavior in the absence of rhythmic input. Because the optogenetic pulses used for stimulation were rhythmic, the authors cannot make this point, and the modelling uses a "black box" excitatory network, the output of which might be rhythmic (this is not shown). Therefore, the evidence (behavioral entrainment; perturbation effects; computational model) is all indirect, meaning that the paper's claim that "inhibitory circuits generate rhythms" rests on inferred sufficiency. A direct recording (e.g., calcium imaging or patch-clamp) from 13A/13B during grooming - outside the scope of the study - would be needed to show intrinsic rhythmogenesis. The conclusions drawn from the data should therefore be tempered. Moreover, the "black box" needs to be opened. What output does it produce? How exactly is it connected to the 13A-13B circuit?

      We will modify the title to better reflect our strongest conclusions: “Inhibitory circuits coordinate rhythmic leg movements during Drosophila grooming”

      Our optogenetic activation was delivered in a patterned (70 ms on/off) fashion that entrains rhythmic movements but does not rule out the possibility that the rhythm is imposed externally. In the manuscript, we state that we used pulsed light to mimic a flexion-extension cycle and note that this approach tests whether inhibition is sufficient to drive rhythmic leg movements when temporally patterned. While this does not prove that 13A/13B neurons are intrinsic rhythm generators, it does demonstrate that activating subsets of inhibitory neurons is sufficient to elicit alternating leg movements resembling natural grooming and walking.

      Our goal with the model was to demonstrate that it is possible to produce rhythmic outputs with this 13A/B circuit, based on the connectome. The “black box” is a small recurrent neural network (RNN) consisting of 40 neurons in its hidden layer. The inputs are the “dust” levels from the environment (the green pixels in Figure 6I), the “proprioceptive” inputs (“efference copy” from motor neurons), and the amount of dust accumulated on both legs. The outputs (all positive) connect to the 13A neurons, the 13B neurons, and to the motor neurons. We refer to it as the “black box” because we make no claims about the actual excitatory inputs to these circuits. Its function is to provide input, needed to run the network, that reflects the distribution of “dust” in the environment as well as the information about the position of the legs.

      The output of the “black box” component of the model might be rhythmic. In fact, in most instances of the model implementation this is indeed the case. However, as mentioned in the current version of the manuscript: “But the 13A circuitry can still produce rhythmic behavior even without those external sensory inputs (or when set to a constant value), although the legs become less coordinated.” Indeed, when we refine the model (with the evolutionary training) without the “black box” (using a constant input of 0.1) the behavior is still rhythmic and sustained. Therefore, the rhythmic activity and behavior can emerge from the premotor circuitry itself without a rhythmic input.
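
      The general principle invoked here, that reciprocal inhibition can turn a constant drive into alternating output, can be sketched with a textbook two-unit half-center oscillator with adaptation (a Matsuoka-type model). This is explicitly not the connectome-constrained model from the manuscript: the two units, the adaptation term, and every parameter value other than the constant input of 0.1 are assumptions chosen for illustration.

```python
def simulate_half_center(steps=30000, dt=0.01, drive=0.1,
                         mutual_inhibition=2.5, adaptation_gain=2.5,
                         tau_rise=1.0, tau_adapt=12.0):
    """Euler simulation of two mutually inhibiting units with adaptation.

    Returns a list of (output_0, output_1) pairs, one per time step.
    """
    x = [0.01, 0.0]  # unit states; the small asymmetry breaks the tie
    v = [0.0, 0.0]   # slow adaptation (self-inhibition) states
    outputs = []
    for _ in range(steps):
        # Rectified outputs: units cannot fire at negative rates.
        y = [max(0.0, x[0]), max(0.0, x[1])]
        dx = [(-x[i] - mutual_inhibition * y[1 - i]
               - adaptation_gain * v[i] + drive) / tau_rise
              for i in range(2)]
        dv = [(-v[i] + y[i]) / tau_adapt for i in range(2)]
        x = [x[i] + dt * dx[i] for i in range(2)]
        v = [v[i] + dt * dv[i] for i in range(2)]
        outputs.append((y[0], y[1]))
    return outputs


def count_dominance_switches(outputs):
    """Count how often the more active unit changes identity."""
    switches = 0
    previous = None
    for y0, y1 in outputs:
        if y0 == y1:
            continue
        leader = 0 if y0 > y1 else 1
        if previous is not None and leader != previous:
            switches += 1
        previous = leader
    return switches


half_center_outputs = simulate_half_center()
half_center_switches = count_dominance_switches(half_center_outputs)
```

      In this toy system the drive never changes, yet the two units take turns being active because the active unit slowly adapts and releases its partner from inhibition. The parameters were picked so that the state with both units equally active is unstable and no state with a single permanently active unit exists.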

      The context in which the 13A and 13B hemilineages sit also needs to be explained. What do we know about the other inputs to the motor neurons studied? What excitatory circuits are there?

      We agree that there are many more excitatory and inhibitory, direct and indirect, connections to motor neurons that will also affect leg movements for grooming and walking. Our goal was to demonstrate what is possible from a constrained circuit of inhibitory neurons that we mapped in detail, and we hope to add additional components to better replicate the biological circuit as behavioral and biomechanical data is obtained by us and others. We will add this clarification of the limits of the scope to the Discussion.

      Furthermore, the introduction ignores many decades of work in other species on the role of inhibitory cell types in motor systems. There is some mention of this in the discussion, but even previous work in Drosophila larvae is not mentioned, nor crustacean STG, nor any other cell types previously studied. This manuscript makes a valuable contribution, but it is not the first to study inhibition in motor systems, and this should be made clear to the reader.

      We thank the reviewer for this important reminder and we will expand our discussion of the relevant history and context in our revision. Previous work on the contribution of inhibitory neurons to invertebrate motor control certainly influenced our research and we should acknowledge this better.

      (2) The experimental evidence is not always presented convincingly, at times lacking data, quantification, explanation, appropriate rationales, or sufficient interpretation.

      We are committed to improving the clarity, rationale, and completeness of our experimental descriptions. We will revisit the statistical tests applied throughout the manuscript and expand the Methods.

      (3) The statistics used are unlike any I remember having seen, essentially one big t-test followed by correction for multiple comparisons. I wonder whether this approach is optimal for these nested, high‐dimensional behavioral data. For instance, the authors do not report any formal test of normality. This might be an issue given the often skewed distributions of kinematic variables that are reported. Moreover, each fly contributes many video segments, and each segment results in multiple measurements. By treating every segment as an independent observation, the non‐independence of measurements within the same animal is ignored. I think a linear mixed‐effects model (LMM) or generalized linear mixed model (GLMM) might be more appropriate.

      We thank the reviewer for raising this important point regarding the statistical treatment of our segmented behavioral data. Our initial analysis used independent t-tests with Bonferroni correction across behavioral classes and features, which allowed us to identify broad effects. However, we acknowledge that this approach does not account for the nested structure of the data. To address this, we will re-analyze key comparisons using linear mixed-effects models (LMMs) as suggested by the reviewer. This approach will allow us to more appropriately model within-fly variability and test the robustness of our conclusions. We will update the manuscript based on the outcomes of these analyses.
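
      The nesting problem can be illustrated with a simpler stand-in for a mixed model: collapsing all segments from one fly into a single summary value before comparing groups, so that the unit of analysis is the animal. This is not the LMM analysis described above; the fly identifiers, group labels, and frequency values are invented, and the Welch t statistic is used only as an example comparison.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def per_fly_means(segments):
    """Collapse (fly_id, group, value) records to one mean per fly."""
    values = defaultdict(list)
    group_of = {}
    for fly_id, group, value in segments:
        values[fly_id].append(value)
        group_of[fly_id] = group
    by_group = defaultdict(list)
    for fly_id, fly_values in values.items():
        by_group[group_of[fly_id]].append(mean(fly_values))
    return dict(by_group)


def welch_t(a, b):
    """Welch t statistic for two samples with unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Invented grooming frequencies (Hz); several segments per fly.
segments = [
    ("c1", "control", 7.0), ("c1", "control", 7.4), ("c1", "control", 7.2),
    ("c2", "control", 7.6), ("c2", "control", 7.8),
    ("c3", "control", 7.4),
    ("e1", "silenced", 6.0), ("e1", "silenced", 6.4),
    ("e2", "silenced", 8.0), ("e2", "silenced", 8.4), ("e2", "silenced", 8.2),
    ("e3", "silenced", 7.0), ("e3", "silenced", 7.2),
]
fly_means = per_fly_means(segments)
t_statistic = welch_t(fly_means["control"], fly_means["silenced"])
```

      Thirteen segments reduce to three values per group, which makes explicit how much the effective sample size shrinks once segments from the same animal are no longer treated as independent. A mixed model keeps the segment-level data but accounts for the same dependence through a per-fly random effect.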

      (4) The manuscript mentions that legs are used for walking as well as grooming. While this is welcome, the authors then do not discuss the implications of this in sufficient detail. For instance, how should we interpret that pulsed stimulation of a subset of 13A neurons produces grooming and walking behaviours? How does neural control of grooming interact with that of walking?

      We do not know how the inhibitory neurons we investigated will affect walking or how circuits for control of grooming and walking might compete. We speculate that overlapping pre-motor circuits may participate in walking and grooming because both behaviors have extension-flexion cycles at similar frequencies, but we do not have hard experimental data to support this. This would be an interesting area for future research. Here, we focused on the consequences of activating specific 13A/B neurons during grooming because they were identified through a behavioral screen for grooming disruptions, and we had developed high-resolution assays and familiarity with the normal movements in this behavior. We will clarify this rationale in the revised discussion.

      (5) The manuscript needs to be proofread and edited as there are inconsistencies in labelling in figures, phrasing errors, missing citations of figures in the text, or citations that are not in the correct order, and referencing errors (examples: 81 and 83 are identical; 94 is missing in text).

      We will carefully proofread the manuscript to fix all figure labeling, citation order, and referencing errors.

    1. Author Response:

      The following is the authors’ response to the previous reviews.

      We carefully read through the second-round reviews and the additional reviews. To us, the review process is somewhat unusual and very much dominated by referee 2, who aggressively insists that we mixed up the trigeminal nucleus and inferior olive and that as a consequence our results are meaningless. We think the stance of referee 2 and the focus on one single issue (the alleged mix-up of trigeminal nucleus and inferior olive) is somewhat unfortunate and leaves out many of our findings, and we debated at length how to deal with further revisions. In the end, we decided to again give priority to addressing the criticism of referee 2, because it is hard to go on with a heavily attacked paper without resolving the matter at stake. The following is a summary of what we did:

      Additional experimental work:

      (1) We checked if the peripherin-antibody indeed reliably identifies climbing fibers.

      To this end, we sectioned the elephant cerebellum and stained sections with the peripherin-antibody. We find: (i) the cerebellar white matter is strongly reactive for peripherin-antibodies; (ii) cerebellar peripherin-antibody staining has an axonal appearance; (iii) cerebellar Purkinje cell somata appear to be ensheathed by peripherin-antibody staining; (iv) the peripherin-antibody reactivity gradually decreases from Purkinje cell somata to the pia in the cerebellar molecular layer. This work is shown in our revised Figure 2. All four of these features align with the distribution of climbing fibers (which arrive through the white matter, are axons, ensheathe Purkinje cell somata, and innervate Purkinje cells proximally, not reaching the pia). In line with previous work, which showed similar cerebellar staining patterns in several species (Errante et al. 1998), we conclude that elephant climbing fibers are strongly reactive for peripherin-antibodies.

      (2) We delineated the elephant olivo-cerebellar tract.

      The strong peripherin-antibody reactivity of elephant climbing fibers enabled us to delineate the elephant olivo-cerebellar tract. We find the elephant olivo-cerebellar tract is a strongly peripherin-antibody reactive, well-delineated fiber tract several millimeters wide and about a centimeter in height. The unstained olivo-cerebellar tract has a greyish appearance. In the anterior regions of the olivo-cerebellar tract, we find that peripherin-antibody reactive fibers run in the dorsolateral brainstem and approach the cerebellar peduncle, where the tract gradually diminishes in size, presumably because climbing fibers discharge into the peduncle. Indeed, peripherin-antibody reactive fibers can be seen entering the cerebellar peduncle. Towards the posterior end of the peduncle, the olivo-cerebellar tract disappears (in the dorsal brainstem directly below the peduncle). We note that the olivo-cerebellar tract was referred to as the spinal trigeminal tract by Maseko et al. 2013. We think the tract in question cannot be the spinal trigeminal tract for two reasons: (i) This tract is the sole brainstem source of peripherin-positive climbing fibers entering the peduncle/the cerebellum; this is the defining characteristic of the olivo-cerebellar tract. (ii) The tract in question is much smaller than the trigeminal nerve, disappears posterior to where the trigeminal nerve enters the brainstem (see below), and has no continuity with the trigeminal nerve; continuity with the trigeminal nerve, however, is the defining characteristic of the spinal trigeminal tract.

      The anterior regions of the elephant olivo-cerebellar tract are similar to the anterior regions of olivo-cerebellar tract of other mammals in its dorsolateral position and the relation to the cerebellar peduncle. In its more posterior parts, the elephant olivo-cerebellar tract continues for a long distance (~1.5 cm) in roughly the same dorsolateral position and enters the serrated nucleus that we previously identified as the elephant inferior olive. The more posterior parts of the elephant olivo-cerebellar tract therefore differ from the more posterior parts of the olivo-cerebellar tract of other mammals, which follows a ventromedial trajectory towards a ventromedially situated inferior olive. The implication of our delineation of the elephant olivo-cerebellar tract is that we correctly identified the elephant inferior olive.

      (3) An in-depth analysis of peripherin-antibody reactivity also indicates that the trigeminal nucleus receives no climbing fiber input.

      We also studied the peripherin-antibody reactivity in and around the trigeminal nucleus. We had already noted in the previous submission that the trigeminal nucleus is weakly positive for peripherin, but that the staining pattern is uniform and not the type of axon bundle pattern that is seen in the inferior olive of other mammals. To us, this observation already argued against the presence of climbing fibers in the trigeminal nucleus. We also noted that the myelin stripes of the trigeminal nucleus were peripherin-antibody-negative. In the context of our olivo-cerebellar tract tracing we now also scrutinized the surroundings of the trigeminal nucleus for peripherin-antibody reactivity. We find that the ventral brainstem surrounding the trigeminal nucleus is devoid of peripherin-antibody reactivity. Accordingly, no climbing fibers (which we have shown to be strongly peripherin-antibody-positive, see our point 1) arrive at the trigeminal nucleus. The absence of climbing fiber input indicates that previous work that identified the (trigeminal) nucleus as the inferior olive (Maseko et al. 2013) is unlikely to be correct.

      (4) We characterized the entry of the trigeminal nerve into the elephant brain.

      To better understand how trigeminal information enters the elephant’s brain, we characterized the entry of the trigeminal nerve. This analysis indicated to us that the trigeminal nerve is not continuous with the olivo-cerebellar tract (the spinal trigeminal tract of Maseko et al. 2013) as previously claimed by Maseko et al. 2013. We show some of this evidence in Referee-Figure 1 below. The reason we think the trigeminal nerve is discontinuous with the olivo-cerebellar tract is the size discrepancy between the two structures. We first show this for the tracing data of Maseko et al. 2013. In the Maseko et al. 2013 data the trigeminal nerve (Referee-Figure 1A, their plate Y) has 3-4 times the diameter of the olivocerebellar tract (the alleged spinal trigeminal tract, Referee-Figure 1B, their plate Z). Note that most if not all trigeminal fibers are thought to continue from the nerve into the trigeminal tract (see our rat data below). We plotted the diameter of the trigeminal nerve and the diameter of the olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013) from the Maseko et al. 2013 data (Referee-Figure 1C) and we found that the olivocerebellar tract has a fairly consistent diameter (46 ± 9 mm², mean ± SD). Statistical considerations and anatomical evidence suggest that the tracing of the trigeminal nerve into the olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013) is almost certainly wrong. The most anterior point of the alleged spinal trigeminal tract has a diameter of 51 mm², which is more than 15 standard deviations different from the most posterior diameter (194 mm²) of the trigeminal tract. For this assignment to be correct, three-quarters of trigeminal nerve fibers would have to spontaneously disappear, something that does not happen in the brain. We also made similar observations in the African elephant Bibi, where the trigeminal nerve (Referee-Figure 1D) is much larger in diameter than the olivocerebellar tract (Referee-Figure 1E). We could also show that the olivocerebellar tract disappears into the peduncle posterior to where the trigeminal nerve enters (Referee-Figure 1F). Our data are very similar to those of Maseko et al., indicating that their outlining of structures was done correctly. What appears to have been oversimplified is the assignment of structures as continuous. We also quantified the diameter of the trigeminal nerve and the spinal trigeminal tract in rats (from the Paxinos & Watson atlas; Referee-Figure 1G); as expected, we found the trigeminal nerve and spinal trigeminal tract diameters are essentially continuous.
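
      The size-mismatch arithmetic in this paragraph can be checked with a short sketch. The four numbers are the values quoted in the response; the variable names and the helper `sd_units` are hypothetical, and treating the size ratio as a proxy for the fraction of fibers that would have to vanish is a simplifying assumption made only for illustration.

```python
# Values quoted in the response (reported there as diameters, in mm2).
tract_mean = 46.0        # mean size of the olivo-cerebellar tract
tract_sd = 9.0           # standard deviation of that tract's size
tract_anterior = 51.0    # most anterior point of the alleged spinal trigeminal tract
nerve_posterior = 194.0  # most posterior measurement of the trigeminal nerve


def sd_units(value, reference, sd):
    """Distance between two measurements expressed in standard deviations."""
    return abs(value - reference) / sd


# How far apart are the two adjoining measurements, in units of the tract's SD?
gap_in_sd = sd_units(nerve_posterior, tract_anterior, tract_sd)

# Assumption: fiber loss scales with the drop in reported size.
fraction_lost = 1.0 - tract_anterior / nerve_posterior

print(f"gap: {gap_in_sd:.1f} SD")                   # gap: 15.9 SD
print(f"implied fiber loss: {fraction_lost:.0%}")   # implied fiber loss: 74%
```

      Both figures match the wording of the response: a gap of more than 15 standard deviations, and roughly three-quarters of the nerve unaccounted for if the two structures were continuous.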

      In our hands, the trigeminal nerve does not continue into a well-defined tract that could be traced after its entry. In this regard, it differs both from the olivo-cerebellar tract of the elephant or the spinal trigeminal tract of the rodent, both of which are well delineated. We think the absence of a well-delineated spinal trigeminal tract in elephants might have contributed to the putative tracing error highlighted in our Referee-Figure 1A-C.

      We conclude that a size mismatch indicates trigeminal fibers do not run in the olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013).

      Author response image 1.

      The trigeminal nerve is discontinuous with the olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013). A, Trigeminal nerve (orange) in the brain of African elephant LAX as delineated by Maseko et al. 2013 (coronal section; their plate Y). B, Most anterior appearance of the spinal trigeminal tract of Maseko et al. 2013 (blue; coronal section; their plate Z). Note the much smaller diameter of the spinal trigeminal tract compared to the trigeminal nerve shown in A, which argues against the continuity of the two structures. Indeed, our peripherin-antibody staining showed that the spinal trigeminal tract of Maseko corresponds to the olivo-cerebellar tract and is discontinuous with the trigeminal nerve. C, Plot of the trigeminal nerve and olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013) diameters along the anterior-posterior axis. The trigeminal nerve is much larger in diameter than the olivocerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013). A, B measurements, for which sections are shown in panels A and B respectively. The olivocerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013) has a consistent diameter; data replotted from Maseko et al. 2013. At mm 25 the inferior olive appears. D, Trigeminal nerve entry in the brain of African elephant Bibi; our data, coronal section, the trigeminal nerve is outlined in orange, note the large diameter. E, Most anterior appearance of the olivo-cerebellar tract in the brain of African elephant Bibi; our data, coronal section, approximately 3 mm posterior to the section shown in D, the olivocerebellar tract is outlined in blue. Note the smaller diameter of the olivo-cerebellar tract compared to the trigeminal nerve, which argues against the continuity of the two structures. F, Plot of the trigeminal nerve and olivo-cerebellar tract diameter along the anterior-posterior axis. The nerve and olivo-cerebellar tract are discontinuous and the trigeminal nerve is much larger in diameter than the olivocerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013); our data. D, E measurements, for which sections are shown in panels D and E respectively. At mm 27 the inferior olive appears. G, In the rat the trigeminal nerve is continuous in size with the spinal trigeminal tract. Data replotted from Paxinos and Watson.

      Reviewer 2 (Public Review):

      As indicated in my previous review of this manuscript (see above), it is my opinion that the authors have misidentified, and indeed switched, the inferior olivary nuclear complex (IO) and the trigeminal nuclear complex (Vsens). It is this specific point only that I will address in this second review, as this is the crucial aspect of this paper - if the identification of these nuclear complexes in the elephant brainstem by the authors is incorrect, the remainder of the paper does not have any scientific validity.

      Comment: We agree with the referee that it is most important to sort out the inferior olivary nuclear complex (IO) and the trigeminal nuclear complex, respectively.

      Change: We did additional experimental work to resolve this matter as detailed at the beginning of our response. Specifically, we ascertained that elephant climbing fibers are strongly peripherin-positive. Based on elephant climbing fiber peripherin-reactivity we delineated the elephant olivo-cerebellar tract. We find that the olivo-cerebellar tract connects the structure we refer to as the inferior olive to the cerebellum (the referee refers to this structure as the trigeminal nuclear complex). We also found that the trigeminal nucleus (the structure the referee refers to as inferior olive) appears to receive no climbing fibers. We provide indications that the tracing of the trigeminal nerve into the olivo-cerebellar tract by Maseko et al. 2013 was erroneous (Author response image 1). These novel findings support our ideas but are very difficult to reconcile with the referee’s partitioning scheme.

      The authors, in their response to my initial review, claim that I "bend" the comparative evidence against them. They further claim that as all other mammalian species exhibit a "serrated" appearance of the inferior olive, and as the elephant does not exhibit this appearance, that what was previously identified as the inferior olive is actually the trigeminal nucleus and vice versa. 

      For convenience, I will refer to IOM and VsensM as the identification of these structures according to Maseko et al (2013) and other authors and will use IOR and VsensR to refer to the identification forwarded in the study under review.

      The IOM/VsensR certainly does not have a serrated appearance in elephants. Indeed, from the plates supplied by the authors in response (Referee Fig. 2), the cytochrome oxidase image supplied and the image from Maseko et al (2013) show a very similar appearance. There is no doubt that the authors are identifying structures that closely correspond to those provided by Maseko et al (2013). It is solely a contrast in what these nuclear complexes are called and the functional sequelae of the identification of these complexes (are they related to the trunk sensation or movement controlled by the cerebellum?) that is under debate.

      Elephants are part of the Afrotheria, thus the most relevant comparative data to resolve this issue will be the identification of these nuclei in other Afrotherian species. Below I provide images of these nuclear complexes, labelled in the standard nomenclature, across several Afrotherian species. 

      (A) Lesser hedgehog tenrec (Echinops telfairi) 

      Tenrec brains are the most intensively studied of the Afrotherian brains, these extensive neuroanatomical studies having been undertaken primarily by Heinz Künzle. Below I append images (coronal sections stained with cresyl violet) of the IO and Vsens (labelled in the standard mammalian manner) in the lesser hedgehog tenrec. It should be clear that the inferior olive is located in the ventral midline of the rostral medulla oblongata (just like the rat) and that this nucleus is not distinctly serrated. The Vsens is located in the lateral aspect of the medulla skirted laterally by the spinal trigeminal tract (Sp5). These images and the labels indicating structures correlate precisely with those provided by Künzle (1997, 10.1016, see his Figure 1K,L). Thus, in the first case of a related species, there is no serrated appearance of the inferior olive, the location of the inferior olive is confirmed through connectivity with the superior colliculus (a standard connection in mammals) by Künzle (1997), and the location of Vsens is what is considered to be typical for mammals. This is in agreement with the authors, as they propose that ONLY the elephants show the variations they report.

      (B) Giant otter shrew (Potamogale velox)

      The otter shrews are close relatives of the Tenrecs. Below I append images of cresyl violet (left column) and myelin (right column) stained coronal sections through the brainstem with the IO, Vsens and Sp5 labelled as per standard mammalian anatomy. Here we see hints of the serration of the IO as defined by the authors, but we also see many myelin stripes across the IO. Vsens is located laterally and skirted by the Sp5. This is in agreement with the authors, as they propose that ONLY the elephants show the variations they report.

      (C) Four-toed sengi (Petrodromus tetradactylus) 

      The sengis are close relatives of the Tenrecs and otter shrews, these three groups being part of the Afroinsectiphilia, a distinct branch of the Afrotheria. Below I append images of cresyl violet (left column) and myelin (right column) stained coronal sections through the brainstem with the IO, Vsens and Sp5 labelled as per standard mammalian anatomy. Here we see vague hints of the serration of the IO (as defined by the authors), and we also see many myelin stripes across the IO. Vsens is located laterally and skirted by the Sp5. This is in agreement with the authors, as they propose that ONLY the elephants show the variations they report. 

      (D) Rock hyrax (Procavia capensis) 

      The hyraxes, along with the sirens and elephants, form the Paenungulata branch of the Afrotheria. Below I append images of cresyl violet (left column) and myelin (right column) stained coronal sections through the brainstem with the IO, Vsens and Sp5 labelled as per the standard mammalian anatomy. Here we see hints of the serration of the IO (as defined by the authors), but we also see evidence of a more "bulbous" appearance of subnuclei of the IO (particularly the principal nucleus), and we also see many myelin stripes across the IO. Vsens is located laterally and skirted by the Sp5. This is in agreement with the authors, as they propose that ONLY the elephants show the variations they report.

      (E) West Indian manatee (Trichechus manatus) 

      The sirens are the closest extant relatives of the elephants in the Afrotheria. Below I append images of cresyl violet (top) and myelin (bottom) stained coronal sections (taken from the University of Wisconsin-Madison Brain Collection, https://brainmuseum.org, and while quite low in magnification they do reveal the structures under debate) through the brainstem with the IO, Vsens and Sp5 labelled as per standard mammalian anatomy. Here we see the serration of the IO (as defined by the authors). Vsens is located laterally and skirted by the Sp5. This is in agreement with the authors, as they propose that ONLY the elephants show the variations they report.

      These comparisons and the structural identification, with which the authors agree as they only distinguish the elephants from the other Afrotheria, demonstrate that the appearance of the IO can be quite variable across mammalian species, including those with a close phylogenetic affinity to the elephants. Not all mammal species possess a "serrated" appearance of the IO. Thus, it is more than just theoretically possible that the IO of the elephant appears as described prior to this study. 

      So what about elephants? Below I append a series of images from coronal sections through the African elephant brainstem stained for Nissl, myelin, and immunostained for calretinin. These sections are labelled according to standard mammalian nomenclature. In these complete sections of the elephant brainstem, we do not see a serrated appearance of the IOM (as described previously and in the current study by the authors). Rather the principal nucleus of the IOM appears to be bulbous in nature. In the current study, no image of myelin staining in the IOM/VsensR is provided by the authors. However, in the images I provide, we do see the reported myelin stripes in all stains - agreement between the authors and reviewer on this point. The higher magnification image to the bottom left of the plate shows one of the IOM/VsensR myelin stripes immunostained for calretinin, and within the myelin stripes axons immunopositive for calretinin are seen (labelled with an arrow). The climbing fibres of the elephant cerebellar cortex are similarly calretinin immunopositive (10.1159/000345565). In contrast, although not shown at high magnification, the fibres forming the Sp5 in the elephant (in the Maseko description, unnamed in the description of the authors) show no immunoreactivity to calretinin. 

      Comment: We appreciate the referee’s additional comments. We concede the possibility that some relatives of elephants have a less serrated inferior olive than most other mammals. We maintain, however, that the elephant inferior olive (our Figure 1J) has the serrated appearance seen in the vast majority of mammals.

      Change: None.

      Peripherin Immunostaining 

      In their revised manuscript the authors present immunostaining of peripherin in the elephant brainstem. This is an important addition (although it does replace the only staining of myelin provided by the authors, which is unusual as the word myelin is in the title of the paper) as peripherin is known to specifically label peripheral nerves. In addition, as pointed out by the authors, peripherin also immunostains climbing fibres (Errante et al., 1998). The understanding of this staining is important in determining the identification of the IO and Vsens in the elephant, although it is not ideal for this task as there is some ambiguity. Errante and colleagues (1998; Fig. 1) show that climbing fibres are peripherin-immunopositive in the rat. But what the authors do not evaluate is the extensive peripherin staining in the rat Sp5 in the same paper (Errante et al, 1998, Fig. 2). The image provided by the authors of their peripherin immunostaining (their new Figure 2) shows what I would call the Sp5 of the elephant to be strongly peripherin immunoreactive, just like the rat shown in Errante et al (1998), and moreover in the precise position of the rat Sp5! This makes sense as this is where the axons subserving the "extraordinary" tactile sensitivity of the elephant trunk would be found (in the standard model of mammalian brainstem anatomy). Interestingly, the peripherin immunostaining in the elephant is clearly lamellated...this coincides precisely with the description of the trigeminal sensory nuclei in the elephant by Maseko et al (2013) as pointed out by the authors in their rebuttal. Errante et al (1998) also point out peripherin immunostaining in the inferior olive, but according to the authors this is only "weakly present" in the elephant IOM/VsensR. This latter point is crucial. Surely if the elephant has an extraordinary sensory innervation from the trunk, with 400 000 axons entering the brain, the VsensR/IOM should be highly peripherin-immunopositive, including the myelinated axon bundles?! In this sense, the authors argue against their own interpretation - either the elephant trunk is not a highly sensitive tactile organ, or the VsensR is not the trigeminal nuclei it is supposed to be.

      Comment: We made sure that elephant climbing fibers are strongly peripherin-positive (our revised Figure 2). As we already noted in our previous ms, we see weak diffuse peripherin-reactivity in the trigeminal nucleus (the inferior olive according to the referee), but no peripherin-reactive axon bundles (i.e. climbing fibers) of the kind seen in the inferior olive of other species. We also see no peripherin-reactive axon bundles (i.e. the olivo-cerebellar tract) arriving in the trigeminal nucleus, as the tissue surrounding the trigeminal nucleus is devoid of peripherin-reactivity. Again, this finding is incompatible with the referee’s ideas. As far as we can tell, the trigeminal fibers are not reactive for peripherin in the elephant, i.e. we did not observe peripherin-reactivity very close to the nerve entry, but unfortunately, we did not stain for peripherin-reactivity into the nerve. As the referee alludes to, the absence of peripherin-reactivity in the trigeminal tract is a difference between rodents and elephants.

      Change: Our novel Figure 2.

      Summary: 

      (1) Comparative data of species closely related to elephants (Afrotherians) demonstrates that not all mammals exhibit the "serrated" appearance of the principal nucleus of the inferior olive. 

      (2) The location of the IO and Vsens as reported in the current study (IOR and VsensR) would require a significant, and unprecedented, rearrangement of the brainstem in the elephants independently. I argue that the underlying molecular and genetic changes required to achieve this would be so extreme that it would lead to lethal phenotypes. Arguing that the "switcheroo" of the IO and Vsens does occur in the elephant (and no other mammals) and thus doesn't lead to lethal phenotypes is a circular argument that cannot be substantiated. 

      (3) Myelin stripes in the subnuclei of the inferior olivary nuclear complex are seen across all related mammals as shown above. Thus, the observation made in the elephant by the authors in what they call the VsensR, is similar to that seen in the IO of related mammals, especially when the IO takes on a more bulbous appearance. These myelin stripes are the origin of the olivocerebellar pathway, and are indeed calretinin immunopositive in the elephant as I show. 

      (4) What the authors see aligns perfectly with what has been described previously, the only difference being the names that nuclear complexes are being called. But identifying these nuclei is important, as any functional sequelae, as extensively discussed by the authors, is entirely dependent upon accurately identifying these nuclei. 

      (5) The peripherin immunostaining scores an own goal - if peripherin is marking peripheral nerves (as the authors and I believe it is), then why is the VsensR/IOM only "weakly positive" for this stain? This either means that the "extraordinary" tactile sensitivity of the elephant trunk is non-existent, or that the authors have misinterpreted this staining. That there is extensive staining in the fibre pathway dorsal and lateral to the IOR (which I call the spinal trigeminal tract), supports the idea that the authors have misinterpreted their peripherin immunostaining.

      (6) Evolutionary expediency. The authors argue that what they report is an expedient way in which to modify the organisation of the brainstem in the elephant to accommodate the "extraordinary" tactile sensitivity. I disagree. As pointed out in my first review, the elephant cerebellum is very large and comprised of huge numbers of morphologically complex neurons. The inferior olivary nuclei in all mammals studied in detail to date, give rise to the climbing fibres that terminate on the Purkinje cells of the cerebellar cortex. It is more parsimonious to argue that, in alignment with the expansion of the elephant cerebellum (for motor control of the trunk), the inferior olivary nuclei (specifically the principal nucleus) have had additional neurons added to accommodate this cerebellar expansion. Such an addition of neurons to the principal nucleus of the inferior olive could readily lead to the loss of the serrated appearance of the principal nucleus of the inferior olive, and would require far fewer modifications in the developmental genetic program that forms these nuclei. This type of quantitative change appears to be the primary way in which structures are altered in the mammalian brainstem.

      Comment: We still disagree with the referee. We note that our conclusions rest on the analysis of 8 elephant brainstems, which we sectioned in three planes and stained with a variety of metabolic and antibody stains and in which we assigned two structures (the inferior olive and the trigeminal nucleus). Most of the evidence cited by the referee stems from a single paper, in which 147 structures were identified based on the analysis of a single brainstem sectioned in one plane and stained with a limited set of antibodies. Our synopsis of the evidence is the following.

      (1) We agree with the referee that concerning brainstem position our scheme of a ventromedial trigeminal nucleus and a dorsolateral inferior olive deviates from the usual mammalian position of these nuclei (i.e. a dorsolateral trigeminal nucleus and a ventromedial inferior olive).

      (2) Cytoarchitectonics support our partitioning scheme. The compact cellular appearance of our ventromedial trigeminal nucleus is characteristic of trigeminal nuclei. The serrated appearance of our dorsolateral inferior olive is characteristic of the mammalian inferior olive; we acknowledge that the referee claims exceptions here. To our knowledge, nobody has described a mammalian trigeminal nucleus with a serrated appearance (which would apply to the elephant in case the trigeminal nucleus is situated dorsolaterally).

      (3) Metabolic staining (cytochrome oxidase reactivity) supports our partitioning scheme. Specifically, our ventromedial trigeminal nucleus shows intense cytochrome oxidase reactivity as it is seen in the trigeminal nuclei of trigeminal tactile experts.

      (4) Isomorphism. The myelin stripes on our ventromedial trigeminal nucleus are isomorphic to trunk wrinkles. Isomorphism is a characteristic of somatosensory brain structures (barrels, barrelettes, nose-stripes, etc.) and we know of no case where such isomorphism was misleading.

      (5) The large-scale organization of our ventromedial trigeminal nuclei in anterior-posterior repeats is characteristic of the mammalian trigeminal nuclei. To our knowledge, no such organization has ever been reported for the inferior olive.

      (6) Connectivity analysis supports our partitioning scheme. According to our delineation of the elephant olivo-cerebellar tract, our dorsolateral inferior olive is connected via peripherin-positive climbing fibers to the cerebellum. In contrast, our ventromedial trigeminal nucleus (the referee’s inferior olive) is not connected via climbing fibers to the cerebellum.

      Change: As discussed, we advanced further evidence in this revision. Our partitioning scheme (a ventromedial trigeminal nucleus and a dorsolateral inferior olive) is better supported by data and makes more sense than the referee’s suggestion (a dorsolateral trigeminal nucleus and a ventromedial inferior olive). It should be published.

      Reviewer #3 (Public Review):

      Summary: 

      The study claims to investigate trunk representations in elephant trigeminal nuclei located in the brainstem. The researchers identify large protrusions visible from the ventral surface of the brainstem, which they examined using a range of histological methods. However, this ventral location is usually where the inferior olivary complex is found, which challenges the authors' assertions about the nucleus under analysis. They find that this brainstem nucleus of elephants contains repeating modules, with a focus on the anterior and largest unit which they define as the putative nucleus principalis trunk module of the trigeminal. The nucleus exhibits low neuron density, with glia outnumbering neurons significantly. The study also utilizes synchrotron X-ray phase contrast tomography to suggest that myelin-stripe-axons traverse this module. The analysis maps myelin-rich stripes in several specimens and concludes that, based on their number and patterning, they likely correspond with trunk folds; however, this conclusion is not well supported if the nucleus has been misidentified.

      Comment: The referee provides a summary of our work. The referee also notes that the correct identification of the trigeminal nucleus is critical to the message of our paper.

      Change: In line with these assessments, we focused our revision efforts on the issue of trigeminal nucleus identification; please see our introductory comments and our response to Referee 2.

      Strengths: 

      The strength of this research lies in its comprehensive use of various anatomical methods, including Nissl staining, myelin staining, Golgi staining, cytochrome oxidase labeling, and synchrotron X-ray phase contrast tomography. The inclusion of quantitative data on cell numbers and sizes, dendritic orientation and morphology, and blood vessel density across the nucleus adds a quantitative dimension. Furthermore, the research is commendable for its high-quality and abundant images and figures, effectively illustrating the anatomy under investigation.

      Comment: We appreciate this positive assessment.

      Change: None

      Weaknesses: 

      While the research provides potentially valuable insights if revised to focus on the structure that appears to be inferior olivary nucleus, there are certain additional weaknesses that warrant further consideration. First, the suggestion that myelin stripes solely serve to separate sensory or motor modules rather than functioning as an "axonal supply system" lacks substantial support due to the absence of information about the neuronal origins and the termination targets of the axons. Postmortem fixed brain tissue limits the ability to trace full axon projections. While the study acknowledges these limitations, it is important to exercise caution in drawing conclusions about the precise role of myelin stripes without a more comprehensive understanding of their neural connections. 

      Comment: We understand these criticisms and the need for cautious interpretation. As we noted previously, we think that the eLife publishing scheme, where critical referee commentary is published along with our ms, will make this contribution particularly valuable.

      Change: Our additional efforts to secure the correct identification of the trigeminal nucleus.

      Second, the quantification presented in the study lacks comparison to other species or other relevant variables within the elephant specimens (i.e., whole brain or brainstem volume). The absence of comparative data to different species limits the ability to fully evaluate the significance of the findings. Comparative analyses could provide a broader context for understanding whether the observed features are unique to elephants or more common across species. This limitation in comparative data hinders a more comprehensive assessment of the implications of the research within the broader field of neuroanatomy. Furthermore, the quantitative comparisons between African and Asian elephant specimens should include some measure of overall brain size as a covariate in the analyses. Addressing these weaknesses would enable a richer interpretation of the study's findings. 

      Comment: We understand why the referee asks for additional comparative data, which would make our study more meaningful. We note that we already published a quantitative comparison of African and Asian elephant facial nuclei (Kaufmann et al. 2022). The quantitative differences between African and Asian elephant facial nuclei are similar in magnitude to what we observed here for the trigeminal nucleus, i.e. African elephants have about 10-15% more facial nucleus neurons than Asian elephants. The referee also notes that data on overall elephant brain size might be important for interpreting our data. We agree with this sentiment and we are preparing a ms on African and Asian elephant brain size. We find – unexpectedly given the larger body size of African elephants – that African elephants have smaller brains than Asian elephants. The finding might imply that African elephants, which have more facial nucleus neurons and more trigeminal nucleus trunk module neurons, are neurally more specialized in trunk control than Asian elephants.

      Change: We are preparing a further ms on African and Asian elephant brain size; a first version of this work has been submitted.

      Reviewer #4 (Public Review): 

      Summary: 

      The authors report a novel isomorphism in which the folds of the elephant trunk are recognizably mapped onto the principal sensory trigeminal nucleus in the brainstem. Further, they identify the enlarged nucleus as being situated in this species in an unusual ventral midline position.

      Comment: The referee summarizes our work.

      Change: None.

      Strengths: 

      The identity of the purported trigeminal nucleus and the isomorphic mapping with the trunk folds is supported by multiple lines of evidence: enhanced staining for cytochrome oxidase, an enzyme associated with high metabolic activity; dense vascularization, consistent with high metabolic activity; prominent myelinated bundles that partition the nucleus in a 1:1 mapping of the cutaneous folds in the trunk periphery; near absence of labeling for the anti-peripherin antibody, specific for climbing fibers, which can be seen as expected in the inferior olive; and a high density of glia.

      Comment: The referee again reviews some of our key findings.

      Change: None. 

      Weaknesses: 

      Despite the supporting evidence listed above, the identification of the gross anatomical bumps, conspicuous in the ventral midline, is problematic. This would be the standard location of the inferior olive, with the principal trigeminal nucleus occupying a more dorsal position. This presents an apparent contradiction which at a minimum needs further discussion. Major species-specific specializations and positional shifts are well-documented for cortical areas, but nuclear layouts in the brainstem have been considered as less malleable. 

      Comment: The referee notes that our discrepancy with referee 2 needs to be addressed with further evidence and discussion, given the unusual position of both inferior olive and trigeminal nucleus in the partitioning scheme and that the mammalian brainstem tends to be positionally conservative. We agree with the referee. We note that – based on the immense size of the elephant trigeminal ganglion (50 g), half the size of a monkey brain – it was expected that the elephant trigeminal nucleus ought to be exceptionally large.

      Change: We did additional experimental work to resolve this matter: (i) We ascertained that elephant climbing fibers are strongly peripherin-positive. (ii) Based on elephant climbing fiber peripherin-reactivity we delineated the elephant olivo-cerebellar tract. We find that the olivo-cerebellar tract connects the structure we refer to as inferior olive to the cerebellum. (iii) We also found that the trigeminal nucleus (the structure the referee refers to as inferior olive) appears to receive no climbing fibers. (iv) We provide indications that the tracing of the trigeminal nerve into the olivo-cerebellar tract by Maseko et al. 2023 was erroneous (Referee-Figure 1). These novel findings support our ideas.

      Reviewer #5 (Public Review): 

      After reading the manuscript and the concerns raised by reviewer 2 I see both sides of the argument - the relative location of trigeminal nucleus versus the inferior olive is quite different in elephants (and different from previous studies in elephants), but when there is a large disproportionate magnification of a behaviorally relevant body part at most levels of the nervous system (certainly in the cortex and thalamus), you can get major shifting in location of different structures. In the case of the elephant, it looks like there may be a lot of shifting. Something that is compelling is that the number of modules separated by the myelin bands corresponds to the number of trunk folds, which is different in the different elephants. This sort of modular division based on body parts is a general principle of mammalian brain organization (demonstrated beautifully for the cuneate and gracile nucleus in primates, VP in most species, S1 in a variety of mammals such as the star-nosed mole and duck-billed platypus). I don't think these relative changes in the brainstem would require major genetic programming - although some surely exists. Rodents and elephants have been independently evolving for over 60 million years so there is a substantial amount of time for changes in each lineage to occur.

      I agree that the authors have identified the trigeminal nucleus correctly, although comparisons with more outgroups would be needed to confirm this (although I'm not suggesting that the authors do this). I also think the new figure (which shows previous divisions of the brainstem versus their own) allows the reader to consider these issues for themselves. When reviewing this paper, I actually took the time to go through atlases of other species and even look at some of my own data from highly derived species. Establishing homology across groups based only on relative location is tough, especially when there appear to be large shifts in relative location of structures. My thoughts are that the authors did an extraordinary amount of work on obtaining, processing and analyzing this extremely valuable tissue. They document their work with images of the tissue and their arguments for their divisions are solid. I feel that they have earned the right to speculate - with qualifications - which they provide.

      Comment: The referee summarizes our work and appears to be convinced by the line of our arguments. We are most grateful for this assessment. We add, again, that the skeptical assessment of referee 2 will be published as well and will give the interested reader the possibility to view another perspective on our work.

      Change: None. 

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors):

      With this manuscript being virtually identical to the previous version, it is possible that some of the definitive conclusions about having identified the elephant trigeminal nucleus and trunk representation should be moderated in a more nuanced manner, especially given the careful and experienced perspective from reviewers with first-hand knowledge of elephant neuroanatomy.

      Comment: We agree that both our first and second revisions were very much centered on the debate of the correct identification of the trigeminal nucleus and that our ms did not evolve as much in other regards. This being said, we agree with Referee 2 that we needed to have this debate. We also think we advanced important novel data in this context (the delineation of the elephant olivo-cerebellar tract through the peripherin-antibody).

      Changes: Our revised Figure 2. 

      The peripherin staining adds another level of argument to the authors having identified the trigeminal brainstem instead of the inferior olive, if differential expression of peripherin is strong enough to distinguish one structure from the other.

      Comment: We think we showed too little peripherin-antibody staining in our previous revision. We have now addressed this problem.

      Changes: Our revised Figure 2 (i.e. the delineation of the elephant olivo-cerebellar tract through the peripherin-antibody).

      There are some minor corrections to be made with the addition of Fig. 2, including renumbering the figures in the manuscript (e.g., 406, 521).

      I continue to appreciate this novel investigation of the elephant brainstem and find it an interesting and thorough study, with the use of classical and modern neuroanatomical methods.

      Comment: We are thankful for this positive assessment.

      Reviewer #2 (Recommendations For The Authors):

      I do realise the authors are very unhappy with me and the reviews I have submitted. I do apologise if feelings have been hurt, and I do understand the authors put in a lot of hard work and thought to develop what they have; however, it is unfortunate that the work and thoughts are not correct. Science is about the search for the truth and sometimes we get it wrong. This is part of the scientific process and why most journals adhere to strict review processes of scientific manuscripts. As I said previously, the authors can use their data to write a paper describing and quantifying Golgi staining of neurons in the principal olivary nucleus of the elephant that should be published in a specialised journal and contextualised in terms of the motor control of the trunk and the large cerebellum of the elephant. 

      Comment: We appreciate the referee’s kind words. Also, no hard feelings from our side, this is just a scientific debate. In our experience, neuroanatomical debates are resolved by evidence and we note that we provide evidence strengthening our identification of the trigeminal nucleus and inferior olive. As far as we can tell from this effort and the substantial evidence accumulated, the referee is wrong.

      Reviewer #4 (Recommendations For The Authors):

      As a new reviewer, I have benefited from reading the previous reviews and Author response, even while having several new comments to add. 

      (1) The identification of the inferior olive and trigeminal nuclei is obviously center stage. An enlargement of the trigeminal nuclei is not necessarily problematic, given the published reports on the dramatic enlargement of the trigeminal nerve (Purkart et al., 2022). At issue is the conspicuous relocation of the trigeminal nuclei that is being promoted by Reveyaz et al. Conspicuous rearrangements are not uncommon; for example, primary sensory cortical fields in different species (fig. 1 in H.H.A. Oelschlager for dolphins; S. De Vreese et al. (2023) for cetaceans, L. Krubitzer on various species, in the context of evolution). The difficult point here concerns what looks like a rather conspicuous gross anatomical rearrangement, in BRAINSTEM - the assumption being that the brainstem bauplan is going to be specifically conservative and refractory to gross anatomical rearrangement. 

      Comment: We agree with the referee that the brainstem rearrangements are unexpected. We also think that the correct identification of nuclei needs to be at the center of our revision efforts.

      Change: Our revision provided further evidence (delineation of the olivo-cerebellar tract, characterization of the trigeminal nerve entry) about the identity of the nuclei we studied.

      Why would a major nucleus shift to such a different location? and how? Can ex vivo DTI provide further support of the correct identification? Is there other "disruption" in the brainstem? What occupies the traditional position of the trigeminal nuclei? An atlas-equivalent coronal view of the entire brainstem would be informative. The Authors have assembled multiple criteria to support their argument that the ventral "bumps" are in fact a translocated trigeminal principal nucleus: enhanced CO staining, enhanced vascularization, enhanced myelination (via Golgi stains and tomography), very scant labeling for a climbing fiber specific antibody (anti-peripherin), vs. dense staining of this in the alternative structure that they identify as IO; and a high density of glia. Admittedly, this should be sufficient, but the proposed translocation (in the BRAINSTEM) is sufficiently startling that this is arguably NOT sufficient.

      The terminology of "putative" is helpful, but a more cogent presentation of the results and more careful discussion might succeed in winning over at least some of a skeptical readership.

      Comment: We do not know what led to the elephant brainstem rearrangements we propose. If the trigeminal nuclei had expanded isometrically in elephants from the ancestral pattern, one would have expected a brain with big lateral bumps, not the elephant brain with its big ventromedial bumps. We note, however, that very likely the expansion of the elephant trigeminal nuclei did not occur isometrically. Instead, the neural representation of the elephant nose expanded dramatically and in rodents the nose is represented ventromedially in the brainstem face representation. Thus, we propose a ‘ventromedial outgrowth model’ according to which the elephant ventromedial trigeminal bumps result from a ventromedially directed outgrowth of the ancestral ventromedial nose representation.

      We advanced substantially more evidence to support our partitioning scheme, including the delineation of the olivo-cerebellar tract based on peripherin-reactivity. We also identified problems in previous partitioning schemes, such as the claim that the trigeminal nerve continues into the ~4x smaller olivocerebellar tract (Referee-Figure 1C, D); we think such a flow of fibers (which is also at odds with peripherin-antibody-reactivity and the appearance of nerve and olivocerebellar tract) is highly unlikely if not physically impossible. With all that, we do not think that we overstate our case in our cautiously presented ms.

      Change: We added evidence on the identification of elephant trigeminal nuclei and inferior olive.

      (2) Role of myelin. While the photos of myelin are convincing, it would be nice to have further documentation. Gallyas? Would antibodies to MBP work? What is the myelin distribution in the "standard" trigeminal nuclei (human? macaque or chimpanzee?). What are alternative sources of the bundles? Regardless, I think it would be beneficial to de-emphasize this point about the role of myelin in demarcating compartments.

      I would in fact suggest an alternative (more neutral) title that might highlight instead the isomorphic feature; for example, "An isomorphic representation of Trunk folds in the Elephant Trigeminal Nucleus." The present title stresses myelin, but figure 1 already focuses on CO. Additionally, the folds are actually mentioned almost in passing until later in the manuscript. I recommend a short section on these at the beginning of the Results to serve as a useful framework.

      Here I'm inclined to agree with the Reviewer, that the Authors' contention that the myelin stripes serve PRIMARILY to separate trunk-fold domains is not particularly compelling and arguably a distraction. The point can be made, but perhaps with less emphasis. After all, the fact that myelin has multiple roles is well-established, even if frequently overlooked. In addition, the Authors might make better use of an extensive relevant literature related to myelin as a compartmental marker; for example, results and discussion in D. Haenelt....N. Weiskopf (eLife, 2023), among others. Another example is the heavily myelinated stria of Gennari in primate visual cortex, consisting of intrinsic pyramidal cell axons, but where the role of the myelination has still not been elucidated.

      Comment: (1) Documentation of myelin. We note that we show further identification of myelinated fibers by the fluorescent dye fluomyelin in Figure 4B. We also performed additional myelin stains, such as the gold-myelin stain after the protocol of Schmued (Referee-Figure 2). In the end, nothing worked quite as well to visualize myelin-stripes as the bright-field images shown in Figure 4A, and it is only these images that allowed us to match myelin-stripes to trunk folds. Hence, we focus our presentation on these images.

      (2) Title: We get why the referee envisions an alternative title. This being said, we would like to stick with our current title, because we feel it highlights the major novelty we discovered.

      (3) We agree with many of the other comments of the referee on myelin phenomenology. We missed the Haenelt reference pointed out by the referee and think it is highly relevant to our paper.

      Change: 1. Review image 2. Inclusion of the Haenelt-reference.

      Author response image 2.

      Myelin stripes of the elephant trunk module visualized by Gold-chloride staining according to Schmued. A, Low magnification micrograph of the trunk module of African elephant Indra stained with AuCl according to Schmued. The putative finger is to the left, proximal is to the right. Myelin stripes can easily be recognized. The white box indicates the area shown in B. B, high magnification micrograph of two myelin stripes. Individual gold-stained (black) axons organized in myelin stripes can be recognized.

      Schmued, L. C. (1990). A rapid, sensitive histochemical stain for myelin in frozen brain sections. Journal of Histochemistry & Cytochemistry, 38(5), 717-720.

      Are the "bumps" in any way "analogous" to the "brain warts" seen in entorhinal areas of some human brains (G. W. van Hoesen and A. Solodkin, 1993)?

      Comment: We think this is a similar phenomenon.

      Change: We included the van Hoesen and Solodkin (1993) reference in our discussion.

      At least slightly more background (ie, a separate section or, if necessary, supplement) would be helpful, going into more detail on the several subdivisions of the ION and if these undergo major alterations in the elephant.

      Comment: The strength of the paper is the detailed delineation of the trunk module, based on myelin stripes and isomorphism. We don’t think we have strong evidence on ION subdivisions, because it appears the trigeminal tract cannot be easily traced in elephants. Accordingly, we find it difficult to add information here.

      Change: None.

      Is there evidence from the literature of other conspicuous gross anatomical translocations, in any species, especially in subcortical regions? 

      Comment: The best example that comes to mind is the star-nosed mole brainstem. There is a beautiful paper comparing the star-nosed mole brainstem to the normal mole brainstem (Catania et al 2011). The principal trigeminal nucleus in the star-nosed mole is far more rostral and also more medial than in the mole; still, such rearrangements are minor compared to what we propose in elephants.

      Catania, Kenneth C., Duncan B. Leitch, and Danielle Gauthier. "A star in the brainstem reveals the first step of cortical magnification." PloS one 6.7 (2011): e22406.

      Change: None.

      (3) A major point concerns the isomorphism between the putative trigeminal nuclei and the trunk specialization. I think this can be much better presented, at least with more discussion and other examples. The Authors mention about the rodent "barrels," but it seemed strange to me that they do not refer to their own results in pig (C. Ritter et al., 2023) nor the work from Ken Catania, 2002 (star-nosed mole; "fingerprints in the brain") or other that might be appropriate. I concur with the Reviewer that there should be more comparative data. 

      Comment: We agree.

      Change: We added a discussion of other isomorphisms including the star-nosed mole to our paper.

      (4) Textual organization could be improved. 

      The Abstract all-important Introduction is a longish, semi "run-on" paragraph. At a minimum this should be broken up. The last paragraph of the Introduction puts forth five issues, but these are only loosely followed in the Results section. I think clarity and good organization are of the utmost importance in this manuscript. I recommend that the Authors begin the Results with a section on the trunk folds (currently figure 5, and discussion), continue with the several points related to the identification of the trigeminal nuclei, and continue with a parallel description of ION with more parallel data on the putative trigeminal and IO structures (currently referee Table 1, but incorporate into the text and add higher magnification of nucleus-specific cell types in the IO and trigeminal nuclei). Relevant comparative data should be included in the Discussion.

      Comment: 1. We agree with the referee that our abstract needed to be revised. 2. We also think that our ms was heavily altered by the insertion of the new Figure 2, which complemented Figure 1 from our first submission and is concerned with the identification of the inferior olive. From a standpoint of textual flow such changes were not ideal, but the revisions massively added to the certainty with which we identify the trigeminal nuclei. Thus, although we are not as content as we were with the flow, we think the ms advanced in the revision process and we would like to keep the Figure sequence as is. 3. We already noted above that we included additional comparative evidence.

      Change: 1. We revised our abstract. 2. We added comparative evidence.

      Reviewer #5 (Recommendations For The Authors): 

      The data is invaluable and provides insights into some of the largest mammals on the planet. 

      Comment: We are incredibly thankful for this positive assessment.

    1. Author Response:

      Reviewer #1 (Public Review):

      Force sensing and gating mechanisms of the mechanically activated ion channels is an area of broad interest in the field of mechanotransduction. These channels perform important biological functions by converting mechanical force into electrical signals. To understand their underlying physiological processes, it is important to determine gating mechanisms, especially those mediated by lipids. The authors in this manuscript describe a mechanism for mechanically induced activation of TREK-1 (TWIK-related K+ channel). They propose that force induced disruption of ganglioside (GM1) and cholesterol causes relocation of TREK-1 associated with phospholipase D2 (PLD2) to phosphatidylinositol 4,5-bisphosphate (PIP2) clusters, where PLD2 catalytic activity produces phosphatidic acid that can activate the channel. To test their hypothesis, they use dSTORM to measure TREK-1 and PLD2 colocalization with either GM1 or PIP2. They find that shear stress decreases TREK-1/PLD2 colocalization with GM1 and relocates them to cluster with PIP2. These movements are affected by TREK-1 C-terminal or PLD2 mutations, suggesting that the interaction is important for channel re-location. The authors then draw a correlation to cholesterol, suggesting that TREK-1 movement is cholesterol dependent. It is important to note that this is not the only method of channel activation and that one not involving PLD2 also exists. Overall, the authors conclude that force is sensed by ordered lipids and PLD2 associates with TREK-1 to selectively gate the channel. Although the proposed mechanism is solid, some concerns remain.

      1) Most conclusions in the paper heavily depend on the dSTORM data. But the images provided lack resolution. This makes it difficult for the readers to assess the representative images.

      The images provided are at 300 dpi. Perhaps the reviewer is referring to contrast in Figure 2? We are happy to increase the contrast or resolution.

      As a side note, we feel the main conclusion of the paper, mechanical activation of TREK-1 through PLD2, depended primarily on the electrophysiology in Figure 1b-c, not the dSTORM. But both complement each other.

      2) The experiments in Figure 6 are a bit puzzling. The entire premise of the paper is to establish gating mechanism of TREK-1 mediated by PLD2; however, the motivation behind using flies, which do not express TREK-1 is puzzling.

      The fly experiment shows that PLD mechanosensitivity is more evolutionarily conserved than TREK-1 mechanosensitivity. We should have made this clearer.

      -Figure 6B, the image is too blown out and looks over saturated. Unclear whether the resolution in subcellular localization is obvious or not.

      Figure 6B is a confocal image; it is not dSTORM. There is no dSTORM in Figure 6. This should have been made clear in the figure legend. For reference, only a few cells would fit in the field of view with dSTORM.

      -Figure 6C-D, the differences in activity threshold is 1 or less than 1g. Is this physiologically relevant? How does this compare to other conditions in flies that can affect mechanosensitivity, for example?

      Yes, 1g is physiologically relevant. It is almost the force needed to wake a fly from sleep (1.2-3.2g). See ref 33. Murphy Nature Pro. 2017.

      3) 70mOsm is a high degree of osmotic stress. How confident are the authors that a. cell health is maintained under this condition and b. this does indeed induce membrane stretch? For example, does this stimulation activate TREK-1?

      Yes, osmotic swell activates TREK-1. This was shown in ref 19 (Patel et al 1998). We agree that 70 mOsm is a high degree of stress. This needs to be stated better in the paper.

      Reviewer #2 (Public Review):

      This manuscript by Petersen and colleagues investigates the mechanistic underpinnings of activation of the ion channel TREK-1 by mechanical inputs (fluid shear or membrane stretch) applied to cells. Using a combination of super-resolution microscopy, pair correlation analysis and electrophysiology, the authors show that the application of shear to a cell can lead to changes in the distribution of TREK-1 and the enzyme phospholipase D2 (PLD2), relative to lipid domains defined by either GM1 or PIP2. The activation of TREK-1 by mechanical stimuli was shown to be sensitized by the presence of PLD2, but not a catalytically dead xPLD2 mutant. In addition, the activity of PLD2 is increased when the molecule is more associated with PIP2, rather than GM1 defined lipid domains. The presented data do not exclude direct mechanical activation of TREK-1, but rather suggest a modulation of TREK-1 activity, increasing sensitivity to mechanical inputs, through an inherent mechanosensitivity of PLD2 activity. The authors additionally claim that PLD2 can regulate transduction thresholds in vivo using Drosophila melanogaster behavioural assays. However, this section of the manuscript overstates the experimental findings, given that it is unclear how the disruption of PLD2 is leading to behavioural changes, given the lack of a TREK-1 homologue in this organism and the lack of supporting data on molecular function in the relevant cells.

      We agree, the downstream effectors of PLD2 mechanosensitivity are not known in the fly. Other anionic lipids have been shown to mediate pain (see refs 46 and 47). We do not wish to make any claim beyond PLD2 being an in vivo contributor to a fly’s response to mechanical force.

      That said, we do believe we have established a molecular function at the cellular level. We showed PLD is robustly mechanically activated in a cultured fly cell line (BG2-c2; Figure 6a of the manuscript). And our previous publication established mechanosensation of PLD (Petersen et al., Nature Com 2016) through mechanical disruption of the lipids. At a minimum, the experiments show that PLD's mechanosensitivity is evolutionarily better conserved across species than that of TREK-1.

      This work will be of interest to the growing community of scientists investigating the myriad mechanisms that can tune mechanical sensitivity of cells, providing valuable insight into the role of functional PLD2 in sensitizing TREK-1 activation in response to mechanical inputs, in some cellular systems.

      The authors convincingly demonstrate, post application of shear, an alteration in the distribution of TREK-1 and mPLD2 (in HEK293T cells) from being correlated with GM1 defined domains (no shear) to increased correlation with PIP2 defined membrane domains (post shear). These data were generated using super-resolution microscopy to visualise, at sub diffraction resolution, the localisation of labelled protein, compared to labelled lipids. The use of super-resolution imaging enabled the authors to visualise changes in cluster association that would not have been achievable with diffraction limited microscopy. However, the conclusion that this change in association reflects TREK-1 leaving one cluster and moving to another overinterprets these data, as the data were generated from static measurements of fixed cells, rather than dynamic measurements capturing molecular movements.

      When assessing molecular distribution of endogenous TREK-1 and PLD2, these molecules are described as "well correlated" in C2C12 cells; however, it is challenging to assess what "well correlated" means precisely in this context. This limitation is compounded by the conclusion that TREK-1 displayed little pair correlation with GM1 and the authors describe a "small amount of TREK-1 trafficked to PIP2". As such, these data may suggest that the findings outlined for HEK293T cells may be influenced by artefacts arising from overexpression.

      The changes in TREK-1 sensitivity to mechanical activation could also reflect changes in the amount of TREK-1 in the plasma membrane. The authors suggest that the presence of a leak current accounts for the presence of TREK-1 in the plasma membrane; however, they do not account for whether there are significant changes in the membrane localisation of the channel in the presence of mPLD2 versus xPLD2. The supplementary data provide some images of fluorescently labelled TREK-1 in cells, and the authors state that truncating the C-terminus has no effect on expression at the plasma membrane; however, these data provide inadequate support for this conclusion. In addition, the data reporting the P50 should be noted with caution, given the lack of saturation of the current in response to the stimulus range.

      We thank the reviewer for his/her concern about expression levels. We did test TREK-1 expression. mPLD decreases TREK-1 expression ~two-fold (see Author response image 1). We did not include the mPLD data since TREK-1 was mechanically activated with mPLD. For expression to account for the loss of TREK-1 stretch current (Figure 1b), xPLD would need to block surface expression of TREK-1. The opposite was true: xPLD2 increased TREK-1 expression (see Figure S2c). Furthermore, we tested the leak current of TREK-1 at 0 mV and 0 mmHg of stretch. Basal leak current was no different with xPLD2 compared to endogenous PLD (Figure 1d; red vs grey bars, respectively), suggesting TREK-1 is in the membrane and active when xPLD2 is present. If anything, the magnitude of the effect with xPLD would be larger if the expression levels were equal.

      Author response image 1.

      TREK expression at the plasma membrane. TREK-1 fluorescence was measured by GFP at points along the plasma membrane. Overexpression of mouse PLD2 (mPLD) decreases the amount of full-length TREK-1 (FL TREK) on the surface more than 2-fold compared to endogenously expressed PLD (enPLD) or truncated TREK (TREKtrunc), which is missing the PLD binding site in the C-terminus. Overexpression of mPLD had no effect on TREKtrunc.

      Finally, by manipulating PLD2 in D. melanogaster, the authors show changes in behaviour when larvae are exposed to either mechanical or electrical inputs. The depletion of PLD2 is concluded to lead to a reduction in activation thresholds and to suggest an in vivo role for PA lipid signaling in setting thresholds for both mechanosensitivity and pain. However, while the data provided demonstrate convincing changes in behaviour and these changes could be explained by changes in transduction thresholds, these data only provide weak support for this specific conclusion. As the authors note, there is no TREK-1 in D. melanogaster, as such the reported findings could be accounted for by other explanations, not least including potential alterations in the activation threshold of Nav channels required for action potential generation. To conclude that the outcomes were in fact mediated by changes in mechanotransduction, the authors would need to demonstrate changes in receptor potential generation, rather than deriving conclusions from changes in behaviour that could arise from alterations in resting membrane potential, receptor potential generation or the activity of the voltage gated channels required for action potential generation.

      We are willing to restrict the conclusion about the fly behavior as the reviewers see fit. We have shown PLD is mechanosensitive in a fly cell line, and when we knock out PLD from a fly, the animal exhibits a mechanosensation phenotype.

      This work provides further evidence of the astounding flexibility of mechanical sensing in cells. By outlining how mechanical activation of TREK-1 can be sensitised by mechanical regulation of PLD2 activity, the authors highlight a mechanism by which TREK-1 sensitivity could be regulated under distinct physiological conditions.

      Reviewer #3 (Public Review):

      The manuscript "Mechanical activation of TWIK-related potassium channel by nanoscopic movement and second messenger signaling" presents a new mechanism for the activation of TREK-1 channel. The mechanism suggests that TREK1 is activated by phosphatidic acids that are produced via a mechanosensitive motion of PLD2 to PIP2-enriched domains. Overall, I found the topic interesting, but several typos and unclarities reduced the readability of the manuscript. Additionally, I have several major concerns on the interpretation of the results. Therefore, the proposed mechanism is not fully supported by the presented data. Lastly, the mechanism is based on several previous studies from the Hansen lab, however, the novelty of the current manuscript is not clearly stated. For example, in the 2nd result section, the authors stated, "fluid shear causes PLD2 to move from cholesterol dependent GM1 clusters to PIP2 clusters and this activated the enzyme". However, this is also presented as a new finding in section 3 "Mechanism of PLD2 activation by shear."

      For PLD2 dependent TREK-1 activation. Overall, I found the results compelling. However, two key results are missing. 1. Do HEK cells have endogenous PLD2? If so, it's hard to claim that the authors can measure PLD2-independent TREK1 activation.

      Yes, there is endogenous PLD (enPLD). We calculated the relative expression of xPLD2 vs enPLD. xPLD2 is >10x more abundant (Fig. S3d of Pavel et al., PNAS 2020, ref 14 of the current manuscript). Hence, as with anesthetic sensitivity, we expect the xPLD to outcompete the endogenous PLD, which is what we see. This should have been described more carefully in this paper, and the studies that establish this conclusion should have been pointed out.

      2. Does the plasma membrane trafficking of TREK1 remain the same under different conditions (PLD2 overexpression, truncation)? From Figure S2, the truncated TREK1 seems to have very poor trafficking. The change of trafficking could significantly contribute to the interpretation of the data in Figure 1.

      If the PLD2 binding site is removed (TREK-1trunc), yes, the trafficking to the plasma membrane is unaffected by the expression of xPLD and mPLD (Figure R1 above). For full length TREK1 (FL-TREK-1), co-expression of mPLD decreases TREK expression (Figure R1) and co-expression with xPLD increases TREK expression (Figure S2). This is exactly opposite of what one would expect if surface expression accounted for the change in pressure currents. Hence, we conclude surface expression does not account for loss of TREK-1 mechanosensitivity with xPLD2.

      For shear-induced movement of TREK1 between nanodomains. The section is convincing; however, I'm not an expert on super-resolution imaging. Also, it would be helpful to clarify whether the shear stress was maintained during fixation. If not, what is the time gap between reduced shear and the fixed state? Lastly, it's unclear why shear flow changes the level of TREK1 and PIP2.

      Shear was maintained during fixation. We do not know why shear changes PIP2 and TREK-1 levels. Presumably endocytosis and/or release of other lipid-modifying enzymes affect the system. The change in TREK-1 levels appears to be directly through an interaction with PLD, as TREKtrunc is not affected by overexpression of xPLD or mPLD.

      For the mechanism of PLD2 activation by shear. I found this section not convincing. Therefore, the question of how PLD2 senses mechanical force on the membrane is not fully addressed. Particularly, it's hard to imagine an acute 25% decrease in cholesterol level by shear - where did the cholesterol go? Details on the measurements of free cholesterol level are unclear and additional/alternative experiments are needed to prove the reduction in cholesterol by shear.

      We addressed the question “how does PLD2 sense mechanical force on the membrane” in a paper published in Nature Comm. in 2016. The title of that paper is “Kinetic disruption of lipid rafts is a mechanosensor for phospholipase D” (see ref 13, Petersen et al.). PLD is a soluble protein associated with the membrane through palmitoylation. There is no transmembrane domain, which narrows the possible mechanism of its mechanosensation to disruption.

      The Nature Comm. reviewer identified as “an expert in PLD signaling” wrote the following of our data and the proposed mechanism:

      "This is a provocative report that identifies several unique properties of phospholipase D2 (PLD2). It explains in a novel way some long established observations including that the enzyme is largely regulated by substrate presentation which fits nicely with the authors model of segregation of the two lipid raft domains (cholesterol ordered vs PIP2 containing). Although PLD has previously been reported to be involved in mechanosensory transduction processes (as cited by the authors) this is the first such report associating the enzyme with this type of signaling... It presents a novel model that is internally consistent with previous literature as well as the data shown in this manuscript. It suggests a new role for PLD2 as a force transduction tied to the physical structure of lipid rafts and uses parallel methods of disruption to test the predictions of their model."

      Regarding cholesterol. We use a fluorescent cholesterol oxidase assay, which we described in the methods. This is an appropriate assay for determining cholesterol levels in a cell, and we use it routinely. We have published in multiple journals using this method (see references 28, 30, 31). Working out the metabolic fate of cholesterol after shear is indeed interesting but well beyond the scope of this paper. Furthermore, we indirectly confirmed our finding using dSTORM cluster analysis (Figure 3d-e). The cluster analysis shows a decrease in GM1 cluster size consistent with our previous experiments where we chemically depleted cholesterol and saw a similar decrease in cluster size (see ref 13). All the data are internally consistent, and the cholesterol assay is properly done. We see no reason to reject the data.

      Importantly, there is no direct evidence for "shear thinning" of the membrane and the authors should avoid claiming shear thinning in the abstract and summary of the manuscript.

      We previously established a kinetic model for PLD2 activation (see ref 13; Petersen et al., Nature Comm. 2016). In that publication we discussed both entropy and heat as mechanisms of disruption. Here we controlled for heat, which narrowed that model to entropy (i.e., shear thinning) (see Figure 3c). We provide an overall justification below. But this is a small refinement of our previous paper, and we prefer not to complicate the current paper. We believe the proper rheological term is shear thinning. The following justification, which is largely adapted from ref 13, could be added to the supplement if the reviewer wishes.

      Justification: To establish shear thinning in a biological membrane, we initially used a soluble enzyme that has no transmembrane domain, phospholipase D2 (PLD2). PLD2 is a soluble enzyme and is associated with the membrane by palmitate, a saturated 16-carbon lipid attached to the enzyme. In the absence of a transmembrane domain, mechanisms of mechanosensation involving hydrophobic mismatch, tension, midplane bending, and curvature can largely be excluded. Rather, the mechanism appears to be a change in fluidity (i.e., kinetic in nature). GM1 domains are ordered, and the palmitate forms van der Waals bonds with the GM1 lipids. The bonds must be broken for PLD to no longer associate with GM1 lipids. We established this in our 2016 paper, ref 13. In that paper we called it a kinetic effect; however, we did not experimentally distinguish enthalpy (heat) vs. entropy (order). Heat is Newtonian and entropy (i.e., shear thinning) is non-Newtonian. In the current study we paid closer attention to the heat and ruled it out (see Figure 3c and methods). We could propose a mechanism based on kinetic disruption, but we know the disruption is not due to melting of the lipids (enthalpy), which leaves shear thinning (entropy) as the plausible mechanism.

      The authors should also be aware that hypotonic shock is a very dirty assay for stretching the cell membrane. Often, there is only a transient increase in membrane tension, accompanied by many biochemical changes in the cells (including acidification, changes of concentration etc). Therefore, I would not consider this as definitive proof that PLD2 can be activated by stretching membrane.

      Comment noted. We trust the reviewer is correct. In 1998 osmotic shock was used to activate the channel. We only intended to show that the system is consistent with previous electrophysiologic experiments.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank you for sending our manuscript for the second round of review.  We are encouraged by the comments from reviewer #2 that our supplementary work on naïve T cells and antibody blockade work satisfied their previous concerns and is important for our work.

      The Editors raised concerns that we have shared preliminary data on Nrn1 and AMPAR double knockout mice.  We apologize for our enthusiasm for these studies.  Because of the publication model by eLife, we shared that data not because we needed to persuade the reviewer for publication purposes but rather to agree with the reviewer that the molecular target of Nrn1 is important, and we are progressing in understanding this subject.


      The following is the authors’ response to the original reviews.

      To Reviewer #1:

      Thank you for your thorough review and comments on our work, which you described as “the role of neuritin in T cell biology studied here is new and interesting.” We have summarized your comments into two categories: (i) biology and investigation approach, and (ii) experimental rigor and data presentation.

      Biology and Investigation approach comments:

      (1) Questions regarding the T cell anergy model:

      Major point “(4) Figure 1E-H. The authors assume that this immunization protocol induces anergic cells, but they provide no experimental evidence for this. It would be useful to show that T cells are indeed anergic in this model, especially those that are OVA-specific. The lack of IL-2 production by Cltr cells could be explained by the presence of fewer OVA-specific cells, rather than by an anergic status.”

      T cell anergy is a well-established concept first described by Schwartz’s group. It refers to the hyporesponsive T cell functional state in antigen-experienced CD4 T cells (Chappert and Schwartz, 2010; Fathman and Lineberry, 2007; Jenkins and Schwartz, 1987; Quill and Schwartz, 1987). Anergic T cells are characterized by their inability to expand and to produce IL2 upon subsequent antigen re-challenge. In this paper, we have borrowed the existing in vivo T cell anergy induction model used by Mueller’s group (Vanasek et al., 2006). Specifically, Thy1.1+ Ctrl or Nrn1-/- TCR transgenic OTII cells were co-transferred with the congenically marked Thy1.2+ WT polyclonal Treg cells into TCR-/- mice. After anergy induction, the congenically marked TCR transgenic T cells were recovered by sorting based on the Thy1.1+ congenic marker and subsequently restimulated ex vivo with OVA323-339 peptide. We evaluated the T cell anergic state based on OTII cell expansion in vivo and IL2 production upon OVA323-339 restimulation ex vivo.

      “The authors assume that this immunization protocol induces anergic cells, but they provide no experimental evidence for this.”

      Because the anergy model by Mueller's group is well established (Vanasek et al., 2006), we did not feel that additional effort was required to validate this model as the reviewer suggested. Moreover, the limited IL2 production among the control cells upon restimulation confirms the validity of this model.

      “The lack of IL-2 production by Cltr cells could be explained by the presence of fewer OVA-specific cells, rather than by an anergic status”.

      Cells from Ctrl and Nrn1-/- mice on a homogeneous TCR transgenic (OTII) background were used in these experiments. The possibility that substantial variability of TCR expression or different expression levels of the transgenic TCR could have impacted IL2 production rather than anergy induction is unlikely.

      Overall, we used this in vivo anergy model to evaluate the Nrn1-/- T cell functional state in comparison to Ctrl cells under the anergy induction condition following the evaluation of Nrn1 expression, particularly in anergic T cells.  Through studies using this anergy model, we observed a significant change in Treg induction among OTII cells. We decided to pursue the role of Nrn1 in Treg cell development and function rather than the biology of T cell anergy as evidenced by subsequent experiments.

      Minor points “(6) On which markers are anergic cells sorted for RNAseq analysis?”

      Cells were sorted out based on their congenic marker marking Ctrl or Nrn1-/- OTII cells transferred into the host mice.  We did not specifically isolate anergic cells for sequencing.

      (2) Question regarding the validity of iTreg differentiation model.

      Major point: “(5) Figure 2A-C and Figure 3. The use of iTregs to try to understand what is happening in vivo is problematic. iTregs are cells that have probably no equivalent in vivo, and so may have no physiological relevance. In any case, they are different from pTreg cells generated in vivo. Working with pTreg may be challenging, that is why I would suggest generating data with purified nTreg. Moreover, it was shown in the article of Gonzalez-Figueroa 2021 that Nrn1-/- nTreg retained a normal suppressive function, which would not be what is concluded by the authors of this manuscript. Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctlr and Nrn1 KO cells.”.

      We thank Reviewer #1 for their feedback. While it is true that iTregs made in vitro and in vivo generated pTregs display several distinctions (e.g., differences in Foxp3 expression stability), we strongly disagree with this statement by Reviewer #1: “The use of iTregs to try to understand what is happening in vivo is problematic. iTregs are cells that have probably no equivalent in vivo, and so may have no physiological relevance.” The induced Treg cell (iTreg) model was established over 20 years ago (Chen et al., 2003; Zheng et al., 2002), and the model is widely adopted with over 2000 citations. Further, it has been instrumental in understanding different aspects of regulatory T cell biology (Hurrell et al., 2022; John et al., 2022; Schmitt and Williams, 2013; Sugiura et al., 2022).

      Because we have observed reduced pTreg generation in vivo, we chose to use the in vitro iTreg model system to understand the mechanistic changes involved in Treg cell differentiation and function, specifically, neuritin’s role in this process. We have made no claim that iTreg cell biology is identical to that of pTreg generated in vivo or nTreg cells. However, the iTreg culture system has proved to be a good in vitro system for deciphering molecular events involved in complex processes. As such, it remains a commonly used approach by many research groups in the Treg cell field (Hurrell et al., 2022; John et al., 2022; Sugiura et al., 2022). Moreover, applying the iTreg in vitro culture system has been instrumental in helping us identify the cell electrical state change in Nrn1-/- CD4 cells and revealed the biological link between Nrn1 and the ionotropic AMPA receptor (AMPAR), which we discuss further below. It is technically challenging to use nTreg cells for T cell electrical state studies due to their heterogeneous nature from development in an in vivo environment and the effect of manipulation during the nTreg cell isolation process, which can both affect the T cell electrical state.

      “Moreover, it was shown in the article of Gonzalez-Figueroa 2021 that Nrn1-/- nTreg retained a normal suppressive function, which would not be what is concluded by the authors of this manuscript.” 

      We have also carried out nTreg studies in vitro in addition to iTreg cells. Similar to Gonzalez-Figueroa et al.'s findings, we did not observe differences in suppression function between Nrn1-/- and WT nTreg using the in vitro suppression assay. However, Nrn1-/- nTreg cells revealed reduced suppression function in vivo (Fig. 2D-L). In fact, Gonzalez-Figueroa et al. observed reduced plasma cell formation after OVA immunization in Treg-specific Nrn1-/- mice, implicating reduced suppression from Nrn1-/- follicular regulatory T (Tfr) cells. Thus, our observation of the reduced suppression function of Nrn1-/- nTreg toward effector T cell expansion, as presented in Fig. 2D-L, does not contradict the results from Gonzalez-Figueroa et al. Rather, the conclusions of these two studies agree that Nrn1 can play important roles in immune suppression observable in vivo that are not captured readily by the in vitro suppression assay.

      “Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctlr and Nrn1 KO cells.”

      We have stated in the manuscript on page 7 line 208 that “Similar proportions of Foxp3+ cells were observed in Nrn1-/- and Ctrl cells under the iTreg culture condition, suggesting that Nrn1 deficiency does not significantly impact Foxp3+ cell differentiation”. In the revised manuscript, we will include the data on the proportion of Foxp3+ cells before iTreg restimulation.

      (3) Confirmation of transcriptomic data regarding amino acids or electrolytes transport change

      Minor point “(3) Would not it be possible to perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane? This would be a more interesting demonstration than transcriptomic data.”

      We appreciate Reviewer #1’s suggestion regarding “perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane”. We have indeed already performed such experiments corroborating the transcriptomics data on differential amino acid and nutrient transporter expression. Specifically, we loaded either iTreg or Th0 cells with membrane potential (MP) dye and measured MP level change after adding the complete set of amino acids (complete AA). Upon entry, the charge carried by AAs may transiently affect cell membrane potential. Different AA transporter expression patterns may show different MP change patterns upon AA entry, as we showed in Author response image 1. We observed reduced MP change in Nrn1-/- iTreg compared to the Ctrl, whereas in the context of Th0 cells, Nrn1-/- showed enhanced MP change compared to the Ctrl. We can certainly include these data in the revised manuscript.

      Author response image 1.

      Membrane potential change induced by amino acid entry. a. Nrn1-/- or WT iTreg cells were loaded with MP dye, and MP change was measured upon the addition of a complete set of AAs. b. Nrn1-/- or WT Th0 cells were loaded with MP dye, and MP change was measured upon the addition of a complete set of AAs.

      (4) EAE experiment data assessment

      Minor point “(5) Figure 5F. How are cells re-stimulated? If polyclonal stimulation is used, the experiment is not interesting because the analysis is done with lymph node cells. This analysis should either be performed with cells from the CNS or with MOG restimulation with lymph node cells.”

      In the EAE study, the Nrn1-/- mice exhibit similar disease onset but a protracted non-resolving disease phenotype compared to the WT control mice. Several reasons may contribute to this phenotype: 1. Enhanced T effector cell infiltration/persistence in the central nervous system (CNS); 2. Reduced Treg cell-mediated suppression of the T effector cells in the CNS; 3. Protracted non-resolving inflammation at the immunization site has the potential to continue sending T effector cells into the CNS, contributing to persistent inflammation. Based on this reasoning, we examined the infiltrating T effector cell number and Treg cell proportion in the CNS. We also restimulated cells from draining lymph nodes close to the inflammation site, looking for evidence of persistent inflammation. When mice were harvested around day 16 after immunization, the inflammation at the local draining lymph node should be at the contraction stage. We stimulated cells with PMA and ionomycin with the intent of observing all potential T effector cells involved in the draining lymph node rather than only MOG antigen-specific cells. We disagree with Reviewer #1’s assumption that “This analysis should either be performed with cells from the CNS or with MOG restimulation with lymph node cells.” We think the experimental approach we have taken has been appropriately tailored to the biological questions we intended to answer.

      Experimental rigor and data presentation.

      (1) data labeling and additional supporting data

      Major points

      (2) The authors use Nrn1+/+ and Nrn1+/- cells indiscriminately as control cells on the basis of similar biology between Nrn1+/+ and Nrn1+/- cells at homeostasis. However, it is quite possible that the Nrn1+/- cells have a phenotype in situations of in vitro activation or in vivo inflammation (cancer, EAE). It would be important to discriminate Nrn1+/- and Nrn1+/+ cells in the data or to show that both cell types have the same phenotype in these conditions too.

      (3) Figure 1A-D. Since the authors are using the Nrn1 KO mice, it would be important to confirm the specificity of the anti-Nrn1 mAb by FACS. Once verified, it would be important to add FACS results with this mAb in Figures 1A-C to have single-cell and quantitative data as well.

      Minor points  

      (1) Line 119, 120 of the text. It is said that one of the most up-regulated genes in anergic cells is Nrn1 but the data is not shown.

      (2) For all figures showing %, the titles of the Y axes are written in an odd way. For example, it is written "Foxp3% CD4". It would be more conventional and clearer to write "% Foxp3+ / CD4+" or "% Foxp3+ among CD4+".

      (4) For certain staining (Figure 3E, H) it would be important to show the raw data, in addition to MFI or % values.

      We can adapt the labeling and provide additional data, including Nrn1 staining on Treg cells and flow graphs for pmTOR and pS6 staining (Fig. 3H), as requested by Reviewer #1.

      (2) Experimental rigor:

      General comments:

      “However, it is disappointing that reading this manuscript leaves an impression of incomplete work done too quickly.”

      We were discouraged to receive the comment, “this manuscript leaves an impression of incomplete work done too quickly.” Our study of this novel molecule began without any existing biological tools such as antibodies, knockout mice, etc. Over the past several years, we have established our own antibodies for Nrn1 detection, obtained and characterized Nrn1 knockout mice, and utilized multiple approaches to identify the molecular mechanism of Nrn1 function. Through the use of the in vitro iTreg system described in this manuscript, we identified the association of Nrn1 deficiency with cell electrical state change, potentially connected to AMPAR function. We have further corroborated our findings by generating Nrn1 and AMPAR T cell specific double knockout mice and confirmed that T cell specific AMPAR deletion could abrogate the phenotype caused by the Nrn1 deficiency (see Support Figure 2). We did not include the double knockout data in the current manuscript because AMPAR function has not yet been studied thoroughly in T cell biology, and we feel this topic warrants examination in its own right. However, the unpublished data support the finding that Nrn1 modulates the T cell electrical state and, consequently, metabolism, ultimately influencing tolerance and immunity. In its current form, the manuscript represents the first characterization of the novel molecule Nrn1 in anergic cells, Tregs, and effector T cells. While this work has led to several exciting additional questions, we disagree that the novel characterization we have presented is incomplete. We feel that our present data set, which squarely highlights Nrn1’s role as an important immune regulator while shedding unprecedented light on the molecular events involved, will be of considerable interest to a broad field of researchers.

      “Multiple models have been used, but none has been studied thoroughly enough to provide really conclusive and unambiguous data. For example, 5 different models were used to study T cells in vivo. It would have been preferable to use fewer, but to go further in the study of mechanisms.”

      We have indeed used multiple in vivo models to reveal Nrn1's function in Treg differentiation, Treg suppression function, T effector cell differentiation and function, and the overall impact on autoimmune disease. Because the impact of ion channel function is often context-dependent, we examined the biological outcome of Nrn1 deficiency in several in vivo contexts.  We would appreciate it if Reviewer#1 would provide a specific example, given the Nrn1 phenotype, of how to proceed deeper to investigate the electrical change in the in vivo models.

      “Major points

      (1) A real weakness of this work is the fact that in most of the results shown, there are few biological replicates with differences that are often small between Ctrl and Nrn1 -/-. The systematic use of student's t-test may lead to thinking that the differences are significant, which is often misleading given the small number of samples, which makes it impossible to know whether the distributions are Gaussian and whether a parametric test can be used. RNAseq bulk data are based on biological duplicates, which is open to criticism.”

      We respectfully disagree with Reviewer #1 on the question of statistical power and significance in our work. We have used 5-8 mice/group for each in vivo model and 3-4 technical replicates for the in vitro studies, with a minimum of 2-3 replicate experiments. These group sizes and replication numbers are in line with those seen in high-impact publications. While some differences between Ctrl and Nrn1-/- appear small, they have significant biological consequences, as evidenced by the various Nrn1-/- in vivo phenotypes. Furthermore, we believe we have subjected our data to the appropriate statistical tests to ensure rigorous analysis and representation of our findings.

      To Reviewer #2.

      We thank Reviewer #2 for the careful review of the manuscript. We especially appreciate the comments that “The characterizations of T cell Nrn1 expression both in vitro and in vivo are comprehensive and convincing. The in vivo functional studies of anergy development, Treg suppression, and EAE development are also well done to strengthen the notion that Nrn1 is an important regulator of CD4 responsiveness.”

      “The major weakness of this study stems from a lack of a clear molecular mechanism involving Nrn1.”

      We fully understand this comment from Reviewer #2. The main mechanism we identified contributing to the functional defect of Nrn1-/- T cells involves novel effects on the electric and metabolic state of the cells. Although we referenced neuronal studies that indicate Nrn1 is the auxiliary protein for the ionotropic AMPA-type glutamate receptor (AMPAR) and may affect AMPAR function, we did not provide any evidence in this manuscript as the topic requires further in-depth study.   

      For the benefit of this discussion, we include our preliminary Nrn1 and AMPAR double knockout data (Author response image 2), which indicate that abrogating AMPAR expression can compensate for the defect caused by Nrn1 deficiency in vitro and in vivo. These preliminary data support the notion that Nrn1 modulates AMPAR function, which causes changes in T cell electric and metabolic state, influencing T cell differentiation and function.

      Author response image 2.

      Deletion of AMPAR expression in T cells compensates for the defect caused by Nrn1 deficiency. Nrn1-/- mice were crossed with T cell-specific AMPAR knockout (AMPARfl/flCD4Cre+) mice. The following mice were generated and used in the experiment: T cell-specific AMPAR knockout and Nrn1 knockout mice (AKONKO), Nrn1 knockout mice (AWTNKO), and Ctrl mice (AWTNWT). a. Deletion of AMPAR compensates for the iTreg cell defect observed in Nrn1-/- CD4 cells. iTreg live cell proportion, cell number, and Ki67 expression among Foxp3+ cells 3 days after aCD3 restimulation. b. Deletion of AMPAR in T cells abrogates the enhanced autoimmune response in Nrn1-/- mice in the EAE disease model. Mouse relative weight change and disease score progression after EAE disease induction.

      Ion channels can influence cell metabolism through multiple means (Vaeth and Feske, 2018; Wang et al., 2020). First, ion channels are involved in maintaining cell resting membrane potential. This electrical potential difference across the cell membrane is essential for various cellular processes, including metabolism (Abdul Kadir et al., 2018; Blackiston et al., 2009; Nagy et al., 2018; Yu et al., 2022). Second, ion channels facilitate the movement of ions across cell membranes. These ions are essential for various metabolic processes. For example, ions like calcium (Ca2+), potassium (K+), and sodium (Na+) play crucial roles in signaling pathways that regulate metabolism (Kahlfuss et al., 2020). Third, ion channel activity can influence cellular energy balance due to ATP consumption associated with ion transport to maintain ion balances (Erecińska and Dagani, 1990; Gerkau et al., 2019). This, in turn, can impact processes like ATP production, which is central to cellular metabolism. Thus, ion channel expression and function determine the cell’s bioelectric state and contribute to cell metabolism (Levin, 2021).

      Because the AMPAR function has not been thoroughly studied using a genetic approach in T cells, we do not intend to include the double knockout data in this manuscript before fully characterizing the T cell-specific AMPAR knockout mice.  

      “Although the biochemical and informatics studies are well-performed, it is my opinion that these results are inconclusive in part due to the absence of key "naive" control groups. This limits my ability to understand the significance of these data.

      Specifically, studies of the electrical and metabolic state of Nrn1-/- inducible Treg cells (iTregs) would benefit from similar data collected from wild-type and Nrn1-/- naive CD4 T cells.”

      We appreciate the reviewer’s comments. This comment reflects two concerns in data interpretation:

      (1) Are Nrn1-/- naïve T cells fundamentally different from WT cells? Does this fundamental difference contribute to the observed electrical and metabolic phenotype in iTreg or Th0 cells? This is a very good question, and we will perform the experiments as the reviewer suggested. While Nrn1 is expressed at a basal (low) level in naïve T cells, deletion of Nrn1 may cause changes in naïve T cell phenotype.

      (2) Is the Nrn1-/- phenotype caused by Nrn1 functional deficiency or due to the secondary effect of Nrn1 deletion, such as non-physiological cell membrane structure changes?

      We have done the following experiment to address this concern.  We have cultured WT T cells in the presence of Nrn1 antibody and compared the outcome with Nrn1-/- iTreg cells (Figure 3-figure supplement 2D,E,F). WT iTreg cells under antibody blockade exhibited similar changes as Nrn1-/- iTreg cells, confirming the physiological relevance of the Nrn1-/- phenotype.

      Manuscript Revision based on the Reviewer’s suggestions:

      Reviewer #1:

      Major points (3) Figure 1A-D. Since the authors are using the Nrp1 KO mice, it would be important to confirm the specificity of the anti-Nrn1 mAb by FACS. 

      Following the suggestion by Reviewer #1, we have included the Nrn1 Ab staining on activated Nrn1-/- CD4 cells in Figure 1D. We have also added the staining of cell surface Nrn1 on Treg cells in Figure 1-figure supplement 1D.

      Major point: (5) “Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctlr and Nrn1 KO cells.”

      In the revised manuscript, we have included the proportion of Foxp3+ cells among Nrn1-/- and ctrl iTreg cells developed under the iTreg culture condition in Figure 2A.

      Minor points  

      (2) For all figures showing %, the titles of the Y axes are written in an odd way. For example, it is written "Foxp3% CD4". It would be more conventional and clearer to write "% Foxp3+ / CD4+" or "% Foxp3+ among CD4+".

      Following reviewer#1’s suggestion, we have changed the Y-axis label in all the relevant figures.

      (3) Would not it be possible to perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane? This would be a more interesting demonstration than transcriptomic data.

      We appreciate Reviewer #1’s suggestion regarding “perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane”. We have used AA-induced cellular MP changes to confirm differential AA transporter expression patterns and their impact on cellular MP levels. The data are included in the revised manuscript in Figure 3H and Figure 4K.

      (4) For certain staining (Figure 3E, H) it would be important to show the raw data, in addition to MFI or % values.

      We appreciate Reviewer #1’s suggestion and have included the histogram staining data for Figure 3E. We have moved the original Figure 3H to the supplemental figure and included the histogram staining data in Figure 3-figure supplement 1C. Similarly, we have included the histogram staining data in Figure 4-figure supplement 1C.

      Reviewer#2:

      “Although the biochemical and informatics studies are well-performed, it is my opinion that these results are inconclusive in part due to the absence of key "naive" control groups. This limits my ability to understand the significance of these data.

      Specifically, studies of the electrical and metabolic state of Nrn1-/- inducible Treg cells (iTregs) would benefit from similar data collected from wild-type and Nrn1-/- naive CD4 T cells.”

      We greatly appreciate Reviewer #2’s suggestion and have carried out experiments on naïve CD4 cells derived from Nrn1-/- and WT mice. We have compared membrane potential and AA-induced MP change between Nrn1-/- and WT naïve T cells, as well as the metabolic state of Nrn1-/- and WT naïve T cells, by carrying out glucose stress tests and mitochondria stress tests using a Seahorse assay. Moreover, to investigate whether the phenotype revealed in Nrn1-/- CD4 cells was caused by a secondary effect of cell membrane structure change due to Nrn1 deletion, we carried out Nrn1 antibody blockade in WT CD4 cells and investigated the phenotypic change. These new results are included in Figure 3-figure supplement 2.

      References:

      Abdul Kadir, L., M. Stacey, and R. Barrett-Jolley. 2018. Emerging Roles of the Membrane Potential: Action Beyond the Action Potential. Front Physiol 9:1661.

      Blackiston, D.J., K.A. McLaughlin, and M. Levin. 2009. Bioelectric controls of cell proliferation: ion channels, membrane voltage and the cell cycle. Cell Cycle 8:3527-3536.

      Chappert, P., and R.H. Schwartz. 2010. Induction of T cell anergy: integration of environmental cues and infectious tolerance. Current opinion in immunology 22:552-559.

      Chen, W., W. Jin, N. Hardegen, K.J. Lei, L. Li, N. Marinos, G. McGrady, and S.M. Wahl. 2003. Conversion of peripheral CD4+CD25- naive T cells to CD4+CD25+ regulatory T cells by TGF-beta induction of transcription factor Foxp3. The Journal of experimental medicine 198:1875-1886.

      Erecińska, M., and F. Dagani. 1990. Relationships between the neuronal sodium/potassium pump and energy metabolism. Effects of K+, Na+, and adenosine triphosphate in isolated brain synaptosomes. J Gen Physiol 95:591-616.

      Fathman, C.G., and N.B. Lineberry. 2007. Molecular mechanisms of CD4+ T-cell anergy. Nat Rev Immunol 7:599-609.

      Gerkau, N.J., R. Lerchundi, J.S.E. Nelson, M. Lantermann, J. Meyer, J. Hirrlinger, and C.R. Rose. 2019. Relation between activity-induced intracellular sodium transients and ATP dynamics in mouse hippocampal neurons. The Journal of physiology 597:5687-5705.

      Hurrell, B.P., D.G. Helou, E. Howard, J.D. Painter, P. Shafiei-Jahani, A.H. Sharpe, and O. Akbari. 2022. PD-L2 controls peripherally induced regulatory T cells by maintaining metabolic activity and Foxp3 stability. Nature communications 13:5118.

      Jenkins, M.K., and R.H. Schwartz. 1987. Antigen presentation by chemically modified splenocytes induces antigen-specific T cell unresponsiveness in vitro and in vivo. The Journal of experimental medicine 165:302-319.

      John, P., M.C. Pulanco, P.M. Galbo, Jr., Y. Wei, K.C. Ohaegbulam, D. Zheng, and X. Zang. 2022. The immune checkpoint B7x expands tumor-infiltrating Tregs and promotes resistance to anti-CTLA-4 therapy. Nature communications 13:2506.

      Kahlfuss, S., U. Kaufmann, A.R. Concepcion, L. Noyer, D. Raphael, M. Vaeth, J. Yang, P. Pancholi, M. Maus, J. Muller, L. Kozhaya, A. Khodadadi-Jamayran, Z. Sun, P. Shaw, D. Unutmaz, P.B. Stathopulos, C. Feist, S.B. Cameron, S.E. Turvey, and S. Feske. 2020. STIM1-mediated calcium influx controls antifungal immunity and the metabolic function of nonpathogenic Th17 cells. EMBO molecular medicine 12:e11592.

      Levin, M. 2021. Bioelectric signaling: Reprogrammable circuits underlying embryogenesis, regeneration, and cancer. Cell 184:1971-1989.

      Nagy, E., G. Mocsar, V. Sebestyen, J. Volko, F. Papp, K. Toth, S. Damjanovich, G. Panyi, T.A. Waldmann, A. Bodnar, and G. Vamosi. 2018. Membrane Potential Distinctly Modulates Mobility and Signaling of IL-2 and IL-15 Receptors in T Cells. Biophys J 114:2473-2482.

      Quill, H., and R.H. Schwartz. 1987. Stimulation of normal inducer T cell clones with antigen presented by purified Ia molecules in planar lipid membranes: specific induction of a long-lived state of proliferative nonresponsiveness. Journal of immunology (Baltimore, Md. : 1950) 138:3704-3712.

      Schmitt, E.G., and C.B. Williams. 2013. Generation and function of induced regulatory T cells. Frontiers in immunology 4:152.

      Sugiura, A., G. Andrejeva, K. Voss, D.R. Heintzman, X. Xu, M.Z. Madden, X. Ye, K.L. Beier, N.U. Chowdhury, M.M. Wolf, A.C. Young, D.L. Greenwood, A.E. Sewell, S.K. Shahi, S.N. Freedman, A.M. Cameron, P. Foerch, T. Bourne, J.C. Garcia-Canaveras, J. Karijolich, D.C. Newcomb, A.K. Mangalam, J.D. Rabinowitz, and J.C. Rathmell. 2022. MTHFD2 is a metabolic checkpoint controlling effector and regulatory T cell fate and function. Immunity 55:65-81.e69.

      Vaeth, M., and S. Feske. 2018. Ion channelopathies of the immune system. Current opinion in immunology 52:39-50.

      Vanasek, T.L., S.L. Nandiwada, M.K. Jenkins, and D.L. Mueller. 2006. CD25+Foxp3+ regulatory T cells facilitate CD4+ T cell clonal anergy induction during the recovery from lymphopenia. Journal of immunology (Baltimore, Md. : 1950) 176:5880-5889.

      Wang, Y., A. Tao, M. Vaeth, and S. Feske. 2020. Calcium regulation of T cell metabolism. Current opinion in physiology 17:207-223.

      Yu, W., Z. Wang, X. Yu, Y. Zhao, Z. Xie, K. Zhang, Z. Chi, S. Chen, T. Xu, D. Jiang, X. Guo, M. Li, J. Zhang, H. Fang, D. Yang, Y. Guo, X. Yang, X. Zhang, Y. Wu, W. Yang, and D. Wang. 2022. Kir2.1-mediated membrane potential promotes nutrient acquisition and inflammation through regulation of nutrient transporters. Nature communications 13:3544.

      Zheng, S.G., J.D. Gray, K. Ohtsuka, S. Yamagiwa, and D.A. Horwitz. 2002. Generation ex vivo of TGF-beta-producing regulatory T cells from CD4+CD25- precursors. Journal of immunology (Baltimore, Md. : 1950) 169:4183-4189.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):  

      Summary:

      In this study, Setogawa et al. employ an auditory discrimination task in freely moving rats, coupled with small animal imaging, electrophysiological recordings, and pharmacological inhibition/lesioning experiments to better understand the role of two striatal subregions: the anterior Dorsal Lateral Striatum (aDLS) and the posterior Ventrolateral Striatum (pVLS), during auditory discrimination learning. Attempting to better understand the contribution of different striatal subregions to sensory discrimination learning strikes me as a highly relevant and timely question, and the data presented in this study are certainly of major interest to the field. The authors have set up a robust behavioral task and systematically tackled the question about a striatal role in learning with multiple observational and manipulative techniques. Additionally, the structured approach the authors take by using neuroimaging to inform their pharmacological manipulation experiments and electrophysiological recordings is a strength.

      However, the results as they are currently presented are not easy to follow and could use some restructuring, especially the electrophysiology. Also, the main conclusion that the authors draw from the data, that aDLS and pVLS contribute to different phases of discrimination learning and influence the animal's response strategy in different ways, is not strongly supported by the data and deserves some additional caveats and limitations of the study in the discussion. 

      We appreciate the reviewer’s valuable feedback, which has been beneficial for improvement of our manuscript. In response to the reviewer’s comments, we have revised multiple parts of the manuscript, including explanations of electrophysiological data. We have also provided additional data to support our main conclusion and addressed caveats and limitations related to the data in the Discussion section. For more details, please refer to the responses to each comment.

      Comment 1: The authors have rigorously used PET neuroimaging, which is an interesting noninvasive method to track brain activity during behavioral states. However, in the case of a freely moving behavior where the scans are performed ~30 minutes after the behavioral task, it is unclear what conclusions can be drawn about task-specific brain activity. The study hinges on the neuroimaging findings that both areas of the lateral striatum (aDLS and pVLS) show increased activity during acquisition, but the DMS shows a reduction in activity during the late stages of behavior, and some of these findings are later validated with complementary experiments. However, the limitations of this technique can be further elaborated on in the discussion and the conclusions.

      As described in our response to the following two comments (a, b) from the reviewer, in the PET imaging study we first analyzed task-related activity by comparing <sup>18</sup>F-FDG uptake on different days of the auditory discrimination task with that on Day 4 of the single lever press task as a control. Next, we analyzed learning-dependent activity by comparing the uptake on different days of the discrimination task with that on Day 2 of the same task. Based on the results of both analyses, we concluded that the activity in the striatal subregions changes during the progress of discrimination learning. The behavioral significance of striatal subregions was tested by excitotoxic lesion and pharmacological blockade experiments. The explanation of imaging data analysis may have been insufficient to fully communicate dynamic changes in the activity of striatal subregions. Therefore, we have clarified our voxel-based statistical parametric analysis method to better explain the dynamic activity changes in the striatal subregions. Please refer to the following responses to comments 1 (a, b).

      Comment 1 (a): In commenting on the unilateral shifts in brain striatal activity during behavior, the authors use the single lever task as a control, where many variables affecting neuronal activity might be different than in the discriminatory task. The study might be better served using Day 2 measurements as a control against which to compare activity of all other sessions since the task structures are similar.

      We initially analyzed task-related activity by comparing <sup>18</sup>F-FDG uptake on one of Days 2, 6, 10, or 24 of the auditory discrimination task with that on Day 4 of the single lever press task. This task was used as a control that does not require a decision process based on the auditory stimulus. We observed significant increases in the activity of the unilateral aDLS on Day 6 and in that of the bilateral pVLS on Day 10 of the discrimination task. We also observed a significant decrease in the unilateral DMS on Day 24 (see Figures 2F and 2G). Next, as suggested, we compared the uptake on one of Days 6, 10, or 24 with that on Day 2 as a control to evaluate learning-dependent activity. The activity showed significant increases in the bilateral aDLS on Day 6 and in the unilateral pVLS on Day 10, and a significant decrease in the bilateral DMS on Day 24 (see Figure 2H).

      The reviewer has suggested a discrepancy in the activity of the unilateral or bilateral striatal subregions under certain conditions between the image data (shown in Figures 2F–H) and plot data (Figures 2J–L). This discrepancy is also suggested in the following Comment 1 (b). For example, in the image data the brain activity was increased in the unilateral (left) aDLS on Day 6 of the discrimination task as compared to Day 4 of the single lever task (Figure 2F). In the plot data, <sup>18</sup>F-FDG uptake reached a peak on Day 6 in both the left and right sides of the aDLS (Figure 2J), and the uptake in the left aDLS on Day 6 significantly increased relative to the value of the single lever press, whereas the value in the right aDLS on Day 6 tended to increase relative to that of the single lever press with no significant difference. The plot data showing the unilaterality in the aDLS activation relative to the single lever press are consistent with the image data. On the other hand, the <sup>18</sup>F-FDG uptake in the aDLS on Day 6 compared to the value on Day 2 was significantly increased in both sides. Similar observations were made in the activity in the pVLS on Day 10 compared to that on Day 2, as well as in the DMS activity on Day 24 relative to that of the single lever press. 

      Our analysis of both task-related and learning-dependent activities revealed dynamic changes in striatal subregions during discrimination learning. We investigated the brain regions in which <sup>18</sup>F-FDG uptake significantly increased or decreased during the learning processes, applying a statistical significance threshold (p < 0.001, uncorrected) and an extent threshold, by using a voxel-based statistical parametric analysis. In the image data, the voxels showing significant differences between two conditions are visualized on the brain template. The plot data show the amount of <sup>18</sup>F-FDG uptake in the voxels, which was detected by the voxel-based analysis. The insufficient explanation of the data analysis of PET imaging in the initial manuscript may have led to a misunderstanding regarding the activity in the unilateral or bilateral striatal subregions. Therefore, we have revised the explanation for voxel-based statistical parametric analysis, adding a more detailed description of the thresholds in the text (page 7, lines 143–145) and Methods (page 27, lines 672–675).
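
      For readers unfamiliar with combining a voxel-level significance threshold with an extent threshold, the idea can be sketched as follows. This is an illustration only, not the pipeline used in the study: the function name `significant_clusters`, the 2D toy p-value map, the face-connectivity rule, and the extent of 3 voxels are all assumptions made for the example (the response does not state the extent value, and real analyses operate on 3D volumes with dedicated software).

```python
from collections import deque


def significant_clusters(p_map, p_thresh=0.001, min_extent=3):
    """Return clusters of voxels with p < p_thresh that contain at least
    min_extent face-connected voxels. Each cluster is a list of (row, col).

    p_map is a 2D list of voxel-wise p-values (a hypothetical toy slice).
    """
    rows, cols = len(p_map), len(p_map[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or p_map[r][c] >= p_thresh:
                continue
            # Breadth-first flood fill over supra-threshold neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            cluster = []
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and p_map[ny][nx] < p_thresh):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Extent threshold: discard clusters that are too small.
            if len(cluster) >= min_extent:
                clusters.append(cluster)
    return clusters


# Toy p-value map: three connected significant voxels and one isolated one.
toy = [
    [0.5, 0.0002, 0.0004, 0.3],
    [0.2, 0.0001, 0.4, 0.0005],
    [0.9, 0.6, 0.7, 0.8],
]
clusters = significant_clusters(toy, p_thresh=0.001, min_extent=3)
print(len(clusters), "cluster(s) survive both thresholds")
```

      In this toy map the isolated voxel passes the significance threshold but fails the extent threshold. The same effect can make an activation appear on only one side of the brain in a thresholded image even when the uptake values on both sides move in the same direction.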

      Comment 1 (b): From the plots in J, K, and L, it seems that shifts in activity in the different substructures are not unilateral but consistently bilateral, in contrast to what is mentioned in the text. Possibly the text reflects comparisons to the single lever task, and here again, I would emphasize comparing within the same task.

      Please see our response to the first comment (a) regarding our explanation of the consistency in the activity of the unilateral or bilateral striatal subregions between the image and plot data. We have also revised the explanation in the corresponding sections of the manuscript, as described above.

      Comment 2: In Figure 2, the authors present compelling data that chronic excitotoxic lesions with ibotenic acid in the aDLS, pVLS, and DMS produce differential effects on discrimination learning. However, the significant reduction in success rate of performance happens as early as Day 6 in both IBO groups in both aDLS and pVLS mice. This would seem to agree with conclusions drawn about the role of aDLS in the middle stages of learning in Figure 2, but not the pVLS, which only shows an increased activity during the late stages of the behavior.  

      Figure 3 shows the behavioral effects of ibotenic acid injections into striatal subregions in rats. For the aDLS injection, we performed two-way repeated-measures ANOVA, which revealed significant main effects of group and day and a significant group × day interaction, and added the simple main effects between the treatments to the figure (Figure 3G). We observed significant differences in the success rate mainly at the middle stage of learning. In contrast, for the pVLS injection there was no significant group × day interaction, although the main effects of group and day were significant by two-way repeated-measures ANOVA (Figure 3H). Consequently, it was unclear exactly when the significant reduction occurred. These results indicate that the aDLS and pVLS are necessary for the acquisition of auditory discrimination, and that the aDLS is mainly required for the middle stage. Similar results were observed for the win-shift-win strategy in the aDLS and pVLS (Figures 3J and 3K).

      Next, we performed temporal inhibition of neuronal activity in striatal subregions by muscimol treatment in order to examine whether the activity in the subregions is linked with learning processes at different stages. In this experiment, muscimol was injected into the aDLS or pVLS at the middle or late stage, and the resultant effects on the success rate were investigated. The success rate in the groups that received muscimol in the aDLS significantly decreased at the middle stage, but not at the early and late stages (Figure 4C). In contrast, the rate in the groups that received muscimol in the pVLS significantly decreased at the late stage, but not at the early or middle stages (Figure 4D). The results indicate that the aDLS and pVLS are mainly involved in the processes at the middle and late stages, respectively, and support the PET imaging data showing the activation of the two striatal subregions at different stages.

      We have now provided the results of simple main effects analysis for the aDLS lesion (Figures 3G and 3J) and revised the description of the Results section (page 8, lines 174–178, page 8, lines 186–188, and page 9, lines 205–206) and Figure legend (page 44, lines 1000‒1003, and page 44, lines 1010–1013). We have also added the results of simple main effects analysis in Figure 3J.

      Comment 3: In Figure 4, the authors show interesting data with transient inactivation of subregions of the striatum with muscimol, validating their findings that the aDLS mediates the middle and the pVLS the late stages of learning, and the function of each area serves different strategies. However, the inference that aDLS inactivation suppresses the WSW strategy "moderately" is not reflected in the formal statistical value p=0.06. While there still may be a subtle effect, the authors would need to revise their conclusions appropriately to reflect the data. In addition, the authors could try a direct comparison between the success rate during muscimol inhibition in the mid-learning session between the aDLS and pVLS-treated groups in Figure 4C (middle) and 4D (middle). If this comparison is not significant, the authors should be careful to claim that inhibition of these two areas differentially affects behavior.

      In Figure 4E, aDLS inhibition showed a tendency to slightly reduce the win-shift-win strategy at the middle stage (t[14] = 2.038, p = 0.061, unpaired Student’s t-test). In accordance with the reviewer’s comment, we changed the word “moderate” to “subtle” (page 12, line 272).
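
      As a quick check of how the reported statistic relates to the reported p-value, the two-tailed p-value for t = 2.038 with 14 degrees of freedom can be recomputed from the Student’s t density. The sketch below is illustrative only and uses the Python standard library; the helper names `t_pdf` and `two_tailed_p` and the choice of Simpson’s rule for the integration are assumptions of this example, not part of the study’s analysis.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p-value: 1 minus twice the area under the density on
    [0, |t|], with the area computed by Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3
    return 1 - 2 * central_area


# Values reported in the response: t[14] = 2.038, unpaired Student's t-test.
p = two_tailed_p(2.038, 14)
print(f"two-tailed p = {p:.3f}")
```

      The recomputed value falls just above the conventional 0.05 cut-off, which is why the effect is described as a tendency rather than a significant reduction.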

      In the temporal inhibition of the striatal subregions, the aDLS and pVLS experiments (panels C and D, respectively) were conducted separately. Since it is difficult to directly compare the data obtained from different experiments, we did not carry out a direct comparison of the success rate between the aDLS and pVLS injections. 

      Comment 4: The authors have used in vivo electrophysiological techniques to systematically investigate the roles of the aDLS and the pVLS in discriminatory learning, and have done a thorough analysis of responses with each phase of behavior over the course of learning. This is a commendable and extremely informative dataset and is a strength of the study. However, the result could be better organized following the sequence of events of the behavioral task to give the reader an easier structure to follow. Ideally, this would involve an individual figure to compare the responses in both areas to Cue, Lever Press, Reward Sound, and First Lick (in this order).

      We first showed changes in the proportion of event-related neurons during the acquisition phase (Figure S5). Next, we conducted a detailed analysis of the characteristics of aDLS and pVLS neuronal activity. Specifically, we found several types of event-related neurons, including: (1) reward sound-related neurons representing behavioral outcomes in the aDLS; (2) first licking-related neurons showing sustained activity after the reward in the aDLS and pVLS; and (3) cue-onset and cue-response neurons associated with the beginning and ending of a behavior in the pVLS.

      Describing the characteristics of event-related neurons according to the sequence of events in a trial, as the reviewer has suggested, is another way to give readers an easy structure for understanding the electrophysiological data. However, we focused on the characteristics of aDLS neurons at the middle stage and pVLS neurons at the late stage of discrimination learning. Therefore, we explained the electrophysiological data based on the order of learning stages rather than the sequence of events in the trial, as described above.

      Comment 5: An important conceptual point presented in the study is that the aDLS neurons, with learning, show a reduction in firing rates and responsiveness to the first lick as well as the behavioral outcome, and don't play a role in other task-related events such as cue onset. However, the neuroimaging data in Figure 2 seems to suggest a transient enhancement of aDLS activity in the mid-stage of discriminatory learning, that is not reflected in the electrophysiology data. Is there an explanation for this difference?

      In the <sup>18</sup>F-FDG PET imaging study, the brain activity in the aDLS reached a peak at the middle stage of the acquisition phase of auditory discrimination (Figure 2J). In the multi-unit electrophysiological recording experiment, the firing activity of the aDLS neuron subpopulations related to the behavioral outcome showed no significant differences among the three stages (Figure 5E), while the proportion of these subpopulations was gradually reduced as learning progressed (Figure 5F). The extent of the firing activity and length of the firing period of other subpopulations showing sustained activation after the reward appeared to show a learning-dependent decrease (Figures 6B and 6C), although the proportion of these subpopulations showed no correlation with the progress of learning (Figure 6D). Patterns of the temporal changes in brain activity in striatal subregions across the learning stages did not completely match the time variation in the property or proportion of specific event-related neurons. In our electrophysiological analysis, we identified well-isolated neurons from the striatal subregions during the auditory discrimination task, focusing on putative medium spiny neurons (Figures S4E–S4G). Based on the combinatorial pattern of the tone instruction cue (high tone/H or low tone/L) and lever press (right/R or left/L), we categorized the electrophysiological data into four trial types: HR, LL, LR, and HL. We identified HR or LL type neurons showing significant changes in the firing rate related to specific events, such as cue onset, choice response, reward sound, and first licking, compared to the baseline firing rate. These neurons were further divided into two groups with increased or decreased activity relative to the baseline firing (Figures S5A and S5B). In the present study, we focused on event-related neurons with increased activity.
      Because our analysis was limited to neuronal subpopulations that showed increased activity related to specific events, it is difficult to fully explain the learning-dependent shifts in the brain activity of striatal subregions by the time variation in the firing activity of individual event-related neurons. The activity of other subpopulations in the striatum may be involved in the shift in brain activity during the learning processes. In addition, recent studies have reported that the activity of glial cells influences the uptake of <sup>18</sup>F-FDG (Zimmer et al., Nat Neurosci., 2017) and that these cells regulate spike timing-dependent plasticity (Valtcheva and Venance, Nat Commun, 2016). Changes in glial cellular activity, through the control of synaptic plasticity, may partly contribute to the pattern formation of learning-dependent shifts in brain activity.

      To explain the difference in the time course between the brain activity and the firing activity of specific event-related neurons, we have added the aforementioned information to the Limitations section (pages 21 to 22, lines 512–539). 

      Comment 6: A significant finding of the study is that CO-HR and CO-LL responses are strikingly obvious in the pVLS, but not in the aDLS, in line with the literature that the posterior (sensory) striatum processes sound. This study also shows that responses to the high-frequency tone indicating a correct right-lever choice increase with learning in contrast to the low-frequency tone responses. To further address whether this difference arises from the task contingency, and not from the frequency representation of the pVLS, an important control would be to switch the cue-response association in a separate group of mice, such that high-frequency tones require a left lever press and vice versa. This would also help tease apart task-evoked responses in the aDLS, as I am given to understand all the recording sites were in the left striatum.

      We did not conduct an experiment switching cue-response association in the auditory discrimination task. However, the transient activity of cue onset-related neurons in the pVLS, as the reviewer has suggested, did not appear at the early stage of learning, but was observed in a learning-dependent manner (Figures 7A and S8E). In addition, the cue onset-HR activity showed a slight but notable difference between the HR and LL trials at the middle and late stages (Figure 7B), but there was no difference in activity in the HL and LR incorrect trials at the corresponding stages (Wilcoxon signed rank test; early, p = 0.375, middle, p = 0.931, and late, p = 0.668). These results suggest that the activity of cue onset-related neurons in the pVLS is associated with the stimulus and response association (task contingency) rather than the tone frequency.

      Reviewer #1 (Recommendations For The Authors):

      Minor comment 1: The readability and appeal of this study would be improved by explaining the various neuronal response types, and task-related events in slightly more detail in the results section, and minimizing the use of non-standard abbreviations wherever possible.

      As suggested, we have replaced the abbreviations related to electrophysiological events (CO, CR, RS, and FL) with the original terms, and improved the explanation for neuronal response types and event-related neurons. 

      Minor comment 2: It would be helpful to label DLS and VLS recordings more clearly on the figures instead of only in the figure caption.

      Thank you for pointing this out. The terms “aDLS” and “pVLS” have now been added to the panels showing firing patterns of neurons: “aDLS” in Figures 5D, 6A, S6A, S7A, S8A, S8B, S8C, and S8D; and “pVLS” in Figures 6F, 7A, 7D, S6D, S6E, S7F, S8E, and S8F.

      Minor comment 3: The authors suggest that aDLS HR- and LL- neurons are more sensitive to the behavioral outcome than those in pVLS (Fig 5 and S5). However, their conclusions are based on sample sizes as low as n=3 for each response type.

      We identified event-related neurons from single neurons detected in both the aDLS and pVLS using the same criteria. In the pVLS, we found a small number of neurons that increased their activity during the period when the reward sound is presented (Figures S6D and S6E) (6, 4, and 17 HR type neurons at the early, middle, and late stages, respectively; 3, 5, and 15 LL type neurons at the early, middle, and late stages, respectively). The number of LL type neurons at the early stage was particularly low, as the reviewer has pointed out. However, when we plotted the firing rates of these neurons around the event, their activity did not reflect behavioral outcome. In the aDLS, we detected a large number of reward sound-related neurons representing behavioral outcome (Figures 5 and S6A) (43, 37, and 44 HR type neurons at the early, middle, and late stages, respectively; 49, 62, and 59 LL type neurons at the early, middle, and late stages, respectively). These observations suggest that aDLS neurons are more sensitive to behavioral outcomes than pVLS neurons.

      Minor comment 4: Typo in Figure 4C and D, right plots, y-axis label: "subtracted".

      The typographic errors in Figures 4C–4H have now been corrected to “subtracted”.

      Reviewer #2 (Public Reviews):

      The study by Setogawa et al. aims to understand the role that different striatal subregions belonging to parallel brain circuits have in associative learning and discrimination learning (S-O-R and S-R tasks). Strengths of the study are the use of multiple methodologies to measure and manipulate brain activity in rats, from microPET imaging to excitotoxic lesions and multielectrode recordings across anterior dorsolateral (aDLS), posterior ventral lateral (pVLS) and dorsomedial (DMS) striatum. The main conclusions are that the aDLS promotes stimulus-response association and suppresses response-outcome associations. The pVLS is engaged in the formation and maintenance of the stimulus-response association. There is a lot of work done and some interesting findings; however, the manuscript can be improved by clarifying the presentation and reasoning. The inclusion of important controls will enhance the rigor of the data interpretation and conclusions.

      We appreciate the reviewer’s valuable feedback, which has been beneficial in our endeavor to improve our manuscript. In response to the comments, we have revised the description of the experimental methods and underlying rationale, as well as the Results section. We have also provided additional data for some of the experiments that support the conclusions. For more details, please refer to the responses to each comment, included below.

      Reviewer #2 (Recommendations For The Authors):

      Comment 1: Generally, the manuscript is hard to read because of the cumbersome sentence structure, overuse of poorly defined acronyms, and lack of clarity on the methods used.

      According to the following comments (a)–(d), we have revised the corresponding text in the manuscript to clarify the sentence structure, definitions of terms, and methodology. 

      Comment 1 (a): For example, the single lever task used as a control for the auditory discrimination task could be introduced better, explaining the reasoning and the strategy for subtracting it from the images obtained during the discrimination phase at the start of the section.

      We analyzed task-related activity by comparing <sup>18</sup>F-FDG uptake on Days 2, 6, 10, or 24 of the auditory discrimination task with that on Day 4 of the single lever press task. The single lever press task was used as a control that does not require a decision process based on the auditory stimulus. For clarification, we have provided a more detailed explanation of the flow of the single lever press task used in the PET experiment, including the rationale for employing this task as a control (page 6, lines 129–135). We have also revised the explanation of voxel-based statistical parametric analysis, adding a more detailed description of the thresholds (page 7, lines 143–145).

      Comment 1 (b): Another example is that important methodological information is buried deep in the text and complicates the interpretation of the results.

      We have revised the following sentences in the manuscript in order to provide clearer methodological information.

      (1) As described above, explanations for the single lever task (page 6, lines 129–135) and voxel-based statistical parametric analysis were added (page 7, lines 143–145). 

      (2) Definitions of the early, middle, and late stages were described in the initial behavioral experiment (page 6, lines 113–119). 

      (3) Abbreviations related to behavioral strategies (WSW and LSL) and electrophysiological events (CO, CR, RS, and FL) were replaced with the original terms. 

      Comment 1 (c): The species being studied is not stated in the abstract, nor the introduction, and only in the middle of the result section. Please include the species in the abstract and the first part of the result also for clarity.

      We included the name of the species (rats) in the Abstract (page 3, line 47), at the end of the Introduction (page 5, lines 87–88) and at the beginning of the Results (page 5, line 109).

      Comment 1 (d): The last part of the intro is copied/pasted from the abstract. Please revise.

      The last part of the Introduction was revised accordingly (page 5, lines 97–104).

      Comment 2: The glucose microPET imaging is carried out 30 mins after the rats performed the task and it is expected to capture activation during the task. Is this correct? This assumption has to be validated with an experiment, which is a control showing a validation of the microPET approach used, and this way can report activation of brain areas during the task completed 20-30 minutes before. For example, V1 or A1 would be a control that we would expect to be activated during the task.

      Our PET experiment was conducted in accordance with previously established methods (Cui et al., Neuroimage, 2015), where rats received intravenous administration of <sup>18</sup>F-FDG solution just before the start of the behavioral session, which lasted for 30 min. The <sup>18</sup>F-FDG uptake in the brain starts immediately and reaches the maximum level by 30 min after the administration, and the level is maintained for at least 1 h (Mizuma et al., J Nucl Med, 2010). The rats were returned to their home cages, and a 30-min PET scan started 25 min after the session. The start time of the scan was chosen to allow for sufficient reduction of <sup>18</sup>F radioactivity in arterial blood to increase the S/N ratio of the radioactivity (Mizuma et al., J Nucl Med, 2010). As shown in Table S1, we confirmed that the brain activity in the medial geniculate body (auditory thalamus) was increased on Days 6 and 10 in the acquisition phase, although the activity in the auditory cortex was not changed. The latter is consistent with a previous study reporting that the auditory cortex is not required for the pure-tone discrimination task (Gimenez et al., J Neurophysiol., 2015).

      Comment 3: Why are Days 2, 6, 10, and 13 chosen and compared for the behavior? Why aren't these the same days chosen in the other part of the study? It is unclear why authors focused on these days and why the focus changed later.

      We conducted daily training of the discrimination task. The success rate reached a plateau on Day 13 and was maintained until Day 24 (Figure 1B). Based on these results, we categorized the learning processes into the acquisition and learned phases, and then divided the acquisition phase into the early (< 60%), middle (60–80%), and late (> 80%) stages. In the PET experiment, we selected Days 2, 6, and 10 as representatives of each stage during the acquisition phase. In addition, we selected Day 24 for the learned phase. However, no scan was performed on Day 13 because it fell at the transition between the two phases.
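
The stage definition above can be written as a small Python sketch. It is illustrative only and not taken from the study's analysis code; the function name `learning_stage` is hypothetical, and the assignment of the exact boundary values (60% and 80% both to the middle stage) is an assumption based on the ranges quoted in this response.

```python
def learning_stage(success_rate: float) -> str:
    """Map a session success rate (in percent) to an acquisition stage.

    Thresholds follow the response text: early (< 60%), middle (60-80%),
    late (> 80%). Treating exactly 60% and exactly 80% as "middle" is an
    assumption of this sketch.
    """
    if not 0.0 <= success_rate <= 100.0:
        raise ValueError("success_rate must be a percentage between 0 and 100")
    if success_rate < 60.0:
        return "early"
    if success_rate <= 80.0:
        return "middle"
    return "late"


if __name__ == "__main__":
    # Hypothetical success rates, not data from the study.
    for rate in (45.0, 60.0, 72.5, 80.0, 91.0):
        print(rate, learning_stage(rate))
```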

      Comment 4: (A) Is the learning and acquisition of the single lever press and discrimination task completed by day 4? Or are rats still learning? The authors claimed no changes in DMS activity between single lever press & discrimination, and therefore DMS isn't involved in learning. But to make this claim we should have measures that the learning has already happened, which I am not sure have been provided. (B) On this same point, the DMS activity is elevated on Day 4 of a single lever press compared to the aDLS and pVLS. So is it possible that the activity in DMS was already elevated on Day 4 of single lever press training? Especially given that DMS is supposedly involved in goal-directed behavior?

      (A) In the single lever press task, the number of lever presses plateaued on Day 2 (Figure 1C). In addition, we analyzed response time and its variability, which plateaued from Day 3 and Day 2, respectively (see Author response image 1). These results indicate that the learning in the task was completed by Day 4. In the auditory discrimination task, Day 4 corresponded to the transition period from the early-to-middle stages of the acquisition phase, suggesting that learning was still progressing. 

      In the imaging analysis, we examined task-related activity by comparing <sup>18</sup>F-FDG uptake on each day of the discrimination task with that on Day 4 of the single lever press task, and did not find any changes in the brain activity in the DMS. In addition, we investigated learning-related activity, and the DMS activity did not change during the acquisition phase. These results suggest that the DMS is not involved in the acquisition phase of learning. Furthermore, comparisons between Day 10 and Day 24 showed a decrease in DMS activity, suggesting that DMS activity was downregulated during the learned phase. In addition, chronic lesion in the DMS indicated that the success rate in the discrimination task was comparable between the control and lesioned groups (Figure 3I), whereas the response time lengthened throughout the learning in the lesioned group compared to the controls (Figure S1C). These results support our notion that the DMS contributes to the execution, but not learning, of discriminative behavior (Figures 3I and S1C).

      Author response image 1.

      Performance of single lever press task conducted before auditory discrimination task. (A) Number of lever presses. (B) Response time (Kruskal-Wallis test, χ<sup>2</sup> = 38.063, p = 2.7 × 10<sup>-8</sup>, post hoc Tukey–Kramer test, p = 0.047 for Day 1 vs. Day 2; p = 2.3 × 10<sup>-7</sup> for Day 1 vs. Day 3; and p = 4.0 × 10<sup>-6</sup> for Day 1 vs. Day 4; p = 0.019 for Day 2 vs. Day 3; p = 0.082 for Day 2 vs. Day 4; p = 0.951 for Day 3 vs. Day 4). (C) Response time variability (Kruskal-Wallis test, χ<sup>2</sup> = 28.929, p = 2.3 × 10<sup>-6</sup>, post hoc Tukey–Kramer test, p = 0.077 for Day 1 vs. Day 2; p = 5.7 × 10<sup>-6</sup> for Day 1 vs. Day 3; and p = 1.3 × 10<sup>-4</sup> for Day 1 vs. Day 4; p = 0.060 for Day 2 vs. Day 3; p = 0.253 for Day 2 vs. Day 4; p = 0.912 for Day 3 vs. Day 4). Data obtained from the task shown in Figure 2C are plotted as the median and quartiles with the maximal and minimal values. *p < 0.05, **p < 0.01, and ***p < 0.001.

      (B) We compared <sup>18</sup>F-FDG uptakes among striatal subregions on Day 4 of the single lever press task (334.8 ± 2.86, 299.0 ± 1.71, and 336.8 ± 2.18 for the aDLS, pVLS, and DMS, respectively; one-way ANOVA, F[2,41] = 104.767, p = 2.1 × 10<sup>-16</sup>). The uptake was comparable between the aDLS and DMS (post hoc Tukey-Kramer test, p = 0.058), but it was significantly lower in the pVLS compared to either of the other two subregions (post hoc Tukey-Kramer test, aDLS vs. pVLS, p = 5.1 × 10<sup>-9</sup>; pVLS vs. DMS, p = 5.1 × 10<sup>-9</sup>). However, since we did not measure the brain activity in the single lever press task on days other than Day 4, it is unclear whether there was an increase in DMS activity during the acquisition of the task. Similarly, since we did not confirm the behavioral modes, which include goal-directed and habitual actions, it is difficult to conclude that the lever presses in the task were controlled by the goal-directed mode. However, our chronic lesion experiment suggests that the DMS is involved in the execution of discrimination behavior (Figure S1C). A clearer understanding of the DMS function in discrimination learning is an important challenge for the future.

      Comment 5: It seems like the procedure of microPET imaging affects performance on the task. The anesthesia used maybe? Figures 2C and D show evidence that the behavior was negatively affected on the days on which microPET imaging was performed after the training. Can the author clarify/comment?

      Isoflurane anesthesia may slightly reduce behavioral performance. We carried out anesthesia (median [interquartile range]: 6 [5–8] min) during the insertion of the catheter for FDG injection, and set a recovery period of at least 2 h until the beginning of the behavioral session, to minimize the impact of anesthesia. The performances in Figure 2E were similar to those in the intact rats (compared to Figures 1C–1F), suggesting that the procedure for PET scans does not affect the acquisition of discrimination. 

      We have added detailed information on the isoflurane anesthesia to the Methods section (page 26, lines 649–653).

      Comment 6: More on clarity. Section 3 of the results (muscimol inactivation) refers a lot to "the behavioral strategies" without really clarifying what these are - are they referring to WSW / LSL (which also could use a better introduction) or goal-directed/habitual or stimulus-response/stimulus-outcome?

      The dorsal striatum is involved in behavioral strategies based on both the stimulus-response association and the response-outcome association during instrumental learning. To assess the impact of striatal lesions on the behavioral strategies, we analyzed the proportion of responses attributed to each of two strategies among all responses of each session. One is the “win-shift-win” strategy, which is considered to reflect the behavioral strategy based on the stimulus-response association. In this strategy, after a correct response in the previous trial, the rats press the opposite lever in the current trial in response to a shift of the instruction cue, resulting in the correct response. The other is the “lose-shift-lose” strategy, which is considered to appear as a consequence of the behavioral strategy based on the response-outcome association. In this strategy, after an error response in the previous trial, the rats press the opposite lever in the current trial despite a shift of the instruction cue, leading to another error response.

      We have revised the explanations of the behavioral strategies in the Results section (page 9, lines 192–201). 
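
As a hedged illustration of how the two strategy proportions defined above could be counted from a trial sequence, consider the following Python sketch. It is not the authors' analysis code; the names `Trial`, `CORRECT`, and `strategy_proportions` are hypothetical. It assumes each trial is recorded as a (cue, lever) pair, that HR and LL are the correct combinations, and that the denominator is the total number of trials in the session.

```python
from typing import Dict, List, Tuple

# (cue, lever): cue is "H" or "L" (high/low tone), lever is "R" or "L".
Trial = Tuple[str, str]

# Assumption from the response text: HR and LL are correct, HL and LR are errors.
CORRECT = {("H", "R"), ("L", "L")}


def strategy_proportions(trials: List[Trial]) -> Dict[str, float]:
    """Proportion of win-shift-win and lose-shift-lose responses in a session."""
    wsw = lsl = 0
    # Compare each trial with the one immediately before it.
    for (prev_cue, prev_lever), (cue, lever) in zip(trials, trials[1:]):
        cue_shift = cue != prev_cue
        lever_shift = lever != prev_lever
        if not (cue_shift and lever_shift):
            continue  # both strategies require a cue shift and a lever shift
        prev_ok = (prev_cue, prev_lever) in CORRECT
        cur_ok = (cue, lever) in CORRECT
        if prev_ok and cur_ok:
            wsw += 1  # correct -> shift -> correct
        elif not prev_ok and not cur_ok:
            lsl += 1  # error -> shift -> error
    n = len(trials)
    if n == 0:
        return {"win_shift_win": 0.0, "lose_shift_lose": 0.0}
    return {"win_shift_win": wsw / n, "lose_shift_lose": lsl / n}


if __name__ == "__main__":
    # Hypothetical four-trial session, not data from the study.
    session = [("H", "R"), ("L", "L"), ("H", "L"), ("L", "R")]
    print(strategy_proportions(session))
```

With only two cues and two levers, a cue shift between two correct trials (or two error trials) always implies a lever shift, so the explicit lever check is redundant but keeps the code close to the verbal definition.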

      Comment 7: Related to WSW / LSL needing a better introduction, on lines 192/193 authors describe a result where they saw the WSW and LSL strategies increase and decrease, respectively, in saline-injected mice. Is the change in performance expected or an undesired effect of the saline injection? This is not clear now and it should be clarified.

      The explanations of the win-shift-win and lose-shift-lose strategies have been revised in the Results section on excitotoxic lesion experiment (page 9, lines 192–201) as described in our response to Comment 6. Win-shift-win is an indicator of correct responses, while lose-shift-lose indicates errors. Therefore, win-shift-win is predicted to increase, and lose-shift-lose decrease, as discrimination learning progresses. Indeed, in the results of the behavioral experiments, shown in Figure 1, both indicators change in a similar pattern to those in the results of the lesion experiments (Figure 3).

      We have added the explanation of the proportions of both strategies in intact rats (page 9, lines 203–204) with a supplementary figure (Figure S2) and accompanying legend (page 56, lines 1173–1177).

      Comment 8: Muscimol experiments - two questions/comments. How often do rats receive muscimol?

      In this section, muscimol is given on day 2 and on days after the animals hit a 60% or 80% success rate. Can the authors provide a mean and SEM for when are those injections?

      The first injection was conducted on Day 2 to target the early stage. The second and third injections were conducted on the days after the success rate had reached 60% and 80% for the first time through the training, respectively, to target the middle and late stages, respectively. These conditions are described in the Results (page 10, lines 234–237) and Methods (page 26, lines 633–636). The mean and s.e.m. of the injection day at the middle and late stages were not significantly different between the saline and muscimol-injected groups into the aDLS (see Author response image 2A) and pVLS (see Author response image 2B).
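
The rule for scheduling the second and third injections can be sketched in Python as follows. This is an illustration under stated assumptions, not the protocol code: the function name `injection_day` is hypothetical, "reached" is interpreted as a success rate greater than or equal to the threshold, and days are counted from Day 1.

```python
from typing import Optional, Sequence


def injection_day(daily_success: Sequence[float], threshold: float) -> Optional[int]:
    """Return the day after the success rate first reaches the threshold.

    daily_success[0] is the success rate (percent) on Day 1. Returns None
    if the threshold is never reached. Using ">=" for "reached" is an
    assumption of this sketch.
    """
    for day, rate in enumerate(daily_success, start=1):
        if rate >= threshold:
            return day + 1
    return None


if __name__ == "__main__":
    # Hypothetical learning curve, not data from the study.
    rates = [48.0, 52.0, 58.0, 63.0, 71.0, 79.0, 84.0, 88.0]
    print("middle-stage injection day:", injection_day(rates, 60.0))
    print("late-stage injection day:", injection_day(rates, 80.0))
```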

      Author response image 2.

      Injection days during auditory discrimination learning. Injections with saline (SAL) and muscimol (MUS) into the aDLS (A) or pVLS (B) were performed after the success rate had reached 60% (middle stage) and 80% (late stage) for the first time through the training, respectively (A, Wilcoxon signed rank test, middle, Z = 65, p = 0.772, late, Z = 56.5, p = 0.242 for the aDLS; B, Wilcoxon signed rank test, middle, Z = 39, p = 1.000, late, Z = 43, p = 0.587). Data are indicated as the median and quartiles with the maximal and minimal values. 

      Comment 9: Muscimol experiments. Can the authors comment on the effects on performance vs learning? What happens on the days after Muscimol? Does performance bounce back or is it still impaired?

      We conducted a transient inhibition experiment with muscimol to examine whether the neuronal activity in the striatal subregions is linked with the processes at different stages. In this experiment, to lower the possibility that compensation of learning may occur during a session after the muscimol injection (Day N), we limited the session time to 15 min (45 trials) and evaluated the impact of the injection on the success rate at specific stages. The success rate in the muscimol-injected groups into the aDLS significantly decreased at the middle stage compared to the corresponding saline-injected groups, but not at the early and late stages (Figure 4C), and the rate in the muscimol groups into the pVLS significantly decreased at the late stage compared with the respective saline groups, but not at the early and middle stages (Figure 4D). Our results demonstrated that the aDLS and pVLS mainly function at the middle and late stages of the auditory discrimination task, respectively. 

      In addition, in reply to Comment 10, we compared the success rates before (Day N-1) and after (Day N+1) the injections (see Author response image 3). We focused on two injections, one into the aDLS at the middle stage and one into the pVLS at the late stage, in which the rate was reduced soon after the muscimol injection on Day N. The success rate for the two injections showed no significant main effect regarding group (saline/muscimol) or day (Days N-1/N+1) and no significant interactions for group × day. Moreover, the success rate was not significantly increased on Day N+1 as compared to Day N-1, even in the saline-injected control group, probably because of the limited session time soon after the injection. Therefore, we consider that it was difficult to define the effects of drug injection on the learning of auditory discrimination in our behavioral protocol for the transient inhibition experiment, and that the reduced rates observed in the muscimol-injected group on Day N reflect, at least partly, the impact of muscimol on the performance of discriminative behavior. 

      Author response image 3.

      Comparison of success rate between days before (Day N-1) and after (Day N+1) the injections into striatal subregions. Success rate in the saline (SAL)- and muscimol (MUS)-injected groups into the aDLS (A) or pVLS (B) at the early, middle, and late stages of auditory discrimination learning (two-way repeated ANOVA; early, day, F[1,14] = 5.266, p = 0.038, group, F[1,14] = 0.276, p = 0.608, day × group, F[1,14] = 0.118, p = 0.736; middle, day, F[1,14] = 4.110, p = 0.062, group, F[1,14] = 0.056, p = 0.816, day × group, F[1,14] = 1.150, p = 0.302; late, day, F[1,14] = 6.408, p = 0.024, group, F[1,14] = 0.229, p = 0.640, day × group, F[1,14] = 1.277, p = 0.278 for the aDLS; and early, day, F[1,10] = 0.115, p = 0.746, group, F[1,10] = 2.414, p = 0.151, day × group, F[1,10] = 0.157, p = 0.700; middle, day, F[1,10] = 0.278, p = 0.610, group, F[1,10] = 0.511, p = 0.491, day × group, F[1,10] = 4.144, p = 0.069; late, day, F[1,10] = 0.151, p = 0.705, group, F[1,10] = 0.719, p = 0.416, day × group, F[1,10] = 0.717, p = 0.417 for the pVLS). Data are indicated as the mean ± s.e.m.

      Comment 10: Muscimol data has a pair before and after, can the authors show this comparison at early, middle, and late training? Not just the subtraction.

      The comparison of success rates before and after drug injection is shown in Author response image 3.

      Comment 11: Ephys recordings. These are complex figures and include a large number of acronyms. It would help to define them again and help the reader through these figures so the reader can focus on understanding the finding more than the figure presentation.

      We replaced the abbreviations related to electrophysiological events (CO, CR, RS, and FL) with the original terms, and improved the explanation in the text and figures. 

      Comment 12: Figure 7B/E - on correct trials, they see a difference in the cue response to high tone / low tone but no difference in the choice. This is the one that seemed like a topography issue.

      The transient activity of cue onset-related neurons in the pVLS did not appear at the early stage of learning, but was observed in a learning-dependent manner (Figures 7A and S8E). In addition, the cue onset-HR activity showed a slight but notable difference between the HR and LL trials at the middle and late stages (Figure 7B), whereas there was no difference between activities in the HL and LR incorrect trials at the corresponding stages (Wilcoxon signed rank test; early, p = 0.375, middle, p = 0.931, and late, p = 0.668). These results suggest that the cue onset-related neurons in the pVLS represent the stimulus and response association (task contingency) rather than the topography of tone frequency.

      Comment 13: Animals were normally trained for 60 minutes but on muscimol days only trained for 15 mins. On PET days only trained for 30 minutes. Ephys sessions were 60 mins. Is this correct? Why?

      We determined the session time for each experiment by considering both technical and behavioral aspects. In the initial behavioral experiment, the session time was set to 60 min per day. Under this condition, the rats acquired the discrimination learning within 13 days. In the imaging experiment, the session without a PET scan was conducted for 60 min, while the session with a PET scan was carried out for 30 min as described previously (Cui et al., Neuroimage, 2015). This time schedule produced a learning curve similar to that of the initial behavioral experiment. In the transient inhibition experiment, the sessions without drug injections lasted for 60 min. As described in our response to Comment 9, the time of the session soon after the injection was limited to 15 min to lower the possibility of compensation of learning during the session. In the chronic lesion and electrophysiological experiments, all sessions were conducted for 60 min, corresponding to the initial experiment. 

      References

      Mizuma, H., Shukuri, M., Hayashi, T., Watanabe, Y. & Onoe, H. Establishment of in vivo brain imaging method in conscious mice. Journal of Nuclear Medicine 51, 1068-1075 (2010).

      Cui, Y., et al. A voxel-based analysis of brain activity in high-order trigeminal pathway in the rat induced by cortical spreading depression. Neuroimage 108, 17-22 (2015).

      Zimmer, E.R., et al. [<sup>18</sup>F]FDG PET signal is driven by astroglial glutamate transport. Nat Neurosci 20, 393-395 (2017).

      Valtcheva, S. & Venance, L. Astrocytes gate Hebbian synaptic plasticity in the striatum. Nature communications 7, 13845 (2016).

      Gimenez T.L., Lorenc M., Jaramillo S. Adaptive categorization of sound frequency does not require the auditory cortex in rats. J Neurophysiol 114:1137-1145 (2015).

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript explores the impact of serotonin on olfactory coding in the antennal lobe of locusts and odor-evoked behavior. The authors use serotonin injections paired with an odor-evoked palp-opening response assay and bath application of serotonin with intracellular recordings of odor-evoked responses from projection neurons (PNs).

      Strengths:

      The authors make several interesting observations, including that serotonin enhances behavioral responses to appetitive odors in starved and fed animals, induces spontaneous bursting in PNs, directly impacts PN excitability, and uniformly enhances PN responses to odors.

      Weaknesses:

      The one remaining issue to be resolved is the theoretical discrepancy between the physiology and the behavior. The authors provide a computational model that could explain this discrepancy and provide the caveat that while the physiological data was collected from the antennal lobe, there could be other olfactory processing stages involved. Indeed other processing stages could be the sites for the computational functions proposed by the model. There is an additional caveat which is that the physiological data were collected 5-10 minutes after serotonin application whereas the behavioral data were collected 3 hours after serotonin application. It is difficult to link physiological processes induced 5 minutes into serotonin application to behavioral consequences 3 hours subsequent to serotonin application. The discrepancy between physiology and behavior could easily reflect the timing of action of serotonin (i.e. differences between immediate and longer-term impact).

      For our behavioral experiments, we waited 3 hours after serotonin injection to allow serotonin to penetrate through the layers of air sacs and the sheath, and for the locusts to calm down and recover their baseline POR activity levels. For the physiology experiments, we noticed that the quality of the patch decreased over time after serotonin introduction. Hence, it was difficult to hold cells for that long. However, the point raised by the reviewer is well-taken. We have performed additional experiments to show that the changes in POR levels to different odorants are rapid and can be observed within 15 minutes of injecting serotonin (Author response image 2) and that the physiological changes in PNs (bursting spontaneous activity, maintenance of temporal firing patterns, and increased odor-evoked responses) persist when the cells are held for longer durations (i.e., 3 hours, akin to our behavioral experiments). It is worth noting that 3-hour in-vivo intracellular recordings are not easily achievable and come with many experimental constraints. So far, we have managed to record from two PNs that were held for this long, and we have added them to this rebuttal to support our conclusions (Author response image 1).

      Author response image 1.

      Spontaneous and odor-evoked responses in individual PNs remain consistent for three hours after serotonin introduction into the recording chamber/bath. (A) Representative intracellular recording showing membrane potential fluctuations in a projection neuron (PN) in the antennal lobe. Spontaneous and odor-evoked responses to four odorants (pink color bars, 4 s duration) are shown before (control) and after serotonin application (5HT). Voltage traces 30 minutes (30min), 1 hour (1h), 2 hours (2h), and 3 hours (3h) after 5HT application are shown to illustrate the persisting effect of serotonin during spontaneous and odor-evoked activity periods. (B) Rasterized spiking activities in two recorded PNs are shown. Spontaneous and odor-evoked responses are shown in all 5 consecutive trials. Note that the odor-evoked response patterns are maintained, but the spontaneous activity patterns are altered after serotonin introduction.

      Author response image 2.

      Palp-opening response (POR) patterns to different odorants remain consistent following serotonin introduction. The probability of PORs is shown as a bar plot for four different odorants: hexanol (green), benzaldehyde (blue), linalool (red), and ammonium (purple). PORs before serotonin injection (solid bars) are compared against response levels after serotonin injection (striped bars). As can be noted, PORs to the four odorants remain consistent when tested 15 minutes and 3 hours after serotonin (5HT) injection.

      Overall, the study demonstrates the impact of serotonin on odor-evoked responses of PNs and odor-guided behavior in locusts. Serotonin appears to have non-linear effects including changing the firing patterns of PNs from monotonic to bursting and altering behavioral responses in an odor-specific manner, rather than uniformly across all stimuli presented.

      We thank the reviewer for again providing very useful feedback for improving our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigate the influence of serotonin on feeding behavior and electrophysiological responses in the antennal lobe of locusts. They find that serotonin injection changes behavior in an odor-specific way. In physiology experiments, they can show that projection neurons in the antennal lobe generally increase their baseline firing and odor responses upon serotonin injection. Using a modeling approach the authors propose a framework on how a general increase in antennal lobe output can lead to odor-specific changes in behavior.

      Strengths:

      This study shows that serotonin affects feeding behavior and odor processing in the antennal lobe of locusts, as serotonin injection increases activity levels of projection neurons. This study provides another piece of evidence that serotonin is a general neuromodulator within the early olfactory processing system across insects and even phyla.

      Weaknesses:

      I still have several concerns regarding the generalizability of the model and interpretation of results. The authors cannot provide evidence that serotonin modulation of projection neurons impacts behavior.

      This is true and likely to be true for any study linking neural responses to behavior. There are multiple circuits and pathways that would get impacted by a neuromodulator like serotonin. What we showed with our physiology is how spontaneous and odor-evoked responses in the very first neural network that receives olfactory sensory neuron input are altered by serotonin. Given the specificity of the changes in behavioral outcomes (i.e. odor-specific increase and decrease in an appetitive behavior) and non-specificity in the changes at the level of individual PNs (general increase in odor-evoked spiking activity), we presented a relatively simple computational model to address the apparent mismatch between neural and behavioral responses. (Author response image 4).

      The authors show that odor identity is maintained after 5-HT injection, however, the authors do not show if PN responses to different odors were differently affected after serotonin exposure.

      The PN responses to different odorants changed in a qualitatively similar fashion. (Author response image 3)

      Author response image 3.

      PN activity before and after 5HT application is compared for different cell-odor combinations. As can be noted, the changes are qualitatively similar in all cases. After 5HT application, the baseline activity became more bursty, but the odor-evoked response patterns were robustly maintained for all odorants.

      Regarding the model, the authors show that the model works for odors with non-overlapping PN activation. However, only one appetitive, one neutral, and one aversive odor has been tested and modeled here. Can the fixed-weight model also hold for other appetitive and aversive odors that might share more overlap between active PNs? How could the model generate BZA attraction in 5-HT exposed animals (as seen in behavior data in Figure 1) if the same PNs just get activated more?

      Author response image 4.

      Testing the generality of the proposed computational model. To test the generality of the proposed model, we used a published dataset [Chandak and Raman, 2023]: Neural dataset – 89 PN responses to a panel of twenty-two odorants; Behavioral dataset – probability of POR responses to the same twenty-two odorants. We built the model using just the three odorants overlapping between the two datasets: hexanol, benzaldehyde and linalool. The true POR probabilities and the POR probabilities predicted by the model are shown for all twenty-two odorants as a scatter plot. As can be noted, there is a high correlation (0.79) between the true and the predicted values.

      The authors should still not exclude the possibility that serotonin injections could affect behavior via modulation of other cell types than projection neurons. This should still be discussed, serotonin might rather shut down baseline activation of local inhibitory neurons - and thus lead to the interesting bursting phenotypes, which can also be seen in the baseline response, due to local PN-to-LN feedback.

      As we agreed, there could be other cells that are impacted by serotonin release. Our goal in this study was to characterize how spontaneous and odor-evoked responses in the very first neural network that receives olfactory sensory neuron input are altered by serotonin. Within this circuit, there are local inhibitory neurons (LNs), as correctly indicated by this reviewer. Surprisingly, our preliminary data indicate that LNs are not shut down but also have an enhanced odor-evoked neural response (Author response image 5). Further data would be needed to verify this observation and determine the mechanisms that mediate the changes in PN excitability. Irrespective, since PN activity should incorporate the effects of changes in the local neuron responses and is the sole output from the antennal lobe that drives all downstream odor-evoked activity, we focused on them in this study.

      Author response image 5.

      Representative traces showing intracellular recording from a local neuron in the antennal lobe. Five consecutive trials are shown. Note that LNs in the locust antennal lobe are non-spiking. The LN activity before, during, and after the presentation of benzaldehyde and hexanol (colored bar; 4s) are shown. The Left and Right panels show LN activity before and after the application of 5HT. As can be noted, 5HT did not shut down odor-evoked activity in this local neuron.

      The authors did not fully tone down their claims regarding causality between serotonin and starved state behavioral responses. There is no proof that serotonin injection mimics starved behavioral responses.

      Specific minor issues:

      It is still unclear how naturalistic the chosen odor concentrations are. This is especially important as behavioral responses to different concentrations of odors are differently modulated after serotonin injection (Figure 2: Linalool and Ammonium). The new method part does not indicate the concentrations of odors used for electrophysiology.

      All odorants were diluted to 0.01-10% concentration by volume in either mineral oil or distilled water. This information is included in the Methods section. For most odorants used in the study, the lower concentrations only evoked a very weak neural response, and the higher concentrations evoked more robust responses. The POR responses for these odorants at various concentrations chosen are included in Figure 2. Note, that the responses to linalool and ammonium remained weak throughout the concentration changes, compared to hexanol and benzaldehyde.

      Did all tested PNs respond to all odorants?

      No, only a subset of them responds to each odorant. These responses have been well characterized in earlier publications [included refs].

      The authors do not show if PN responses to different odors were differently affected after serotonin exposure. They describe that ON responses were robust, but OFF responses were less consistent after 5-HT injection. Was this true across all odors tested? Example traces are shown, but the odor is not indicated in Figure 4A. Figure 4D shows that many odor-PN combinations did not change their peak spiking activity - was this true across odorants? In Figure 5 - are PNs ordered by odor-type exposure?

      Also, Figure 6A only shows example trajectories for odorants - how does the average look? Regarding the data used for the model - can the new dataset from the 82 odor-PN pairs reproduce the activation pattern of the previously collected dataset of 89 pairs?

      What is shown in Figure 6A is the trial-averaged response trajectory combining the activities of all 82 odor-PN pairs. The 82 odor-PN pairs were recorded intracellularly, examining the responses to four odorants before and after 5HT application. The second dataset, involving 89 PN responses to 22 odorants, was collected extracellularly. The two datasets are qualitatively similar in that each odorant activates a unique subset of the recorded neurons.

      The authors toned down their claims that serotonin injection can mimic the starved state behavioral response. However, some sentences still indicate this finding and should also be toned down:

      last sentence of introduction - "In sum, our results provide a more systems-level view of how a specific neuromodulator (serotonin) alters neural circuits to produce flexible behavioral outcomes."

      We believe we showed this with our computational model, how uniform changes in the neural responses could lead to variable and odor-specific changes in behavioral PORs.

      discussion: "Finally, fed locusts injected with serotonin generated similar appetitive responses to food-related odorants as starved locusts indicating the role of serotonin in hunger state-dependent modulation of odor-evoked responses." This claim is not supported.

      Figure 7 shows that the fed locusts had lower POR to hex and bza. The POR responses significantly increased after the 5HT application. However, we have rephrased this sentence to limit our claims to this result. "Finally, fed locusts injected with serotonin generated similar appetitive palp-opening responses to food-related odorants as observed in starved locusts”

      last results: "However, consistent with results from the hungry locusts, the introduction of serotonin increased the appetitive POR responses to HEX and BZA. Intriguingly, the appetitive responses of fed locusts treated with 5HT were comparable or slightly higher than the responses of hungry locusts to the same set of odorants."

      Again this sentence simply describes the result shown in Figure 7.

      In Figure 7 - BZA response seems unchanged in hungry and fed animals and only 5-HT injection enhances the response. There is only one example where 5-HT application and starvation induce the same change in behavior - N=1 is not enough to conclude that serotonin influences food-driven behaviors.

      The reviewer is ignoring the lack of changes to PORs to linalool and ammonium. Taken together, serotonin increased PORs to only two of the four odorants in starved locusts. The responses after 5HT modulation to these four odorants were similar in fed locusts treated with 5HT and starved locusts.

      Also, this seems to be wrongly interpreted in Figure 7: "It is worth noting that responses to LOOL and AMN, non-food related odorants with weaker PORs, remained unchanged in fed locusts treated with 5HT." The authors indicate a significant reduction in POR after 5-HT injection on LOOL response in Figure 7.

      Revised.

      "It is worth noting that responses to LOOL and AMN, non-food related odorants with weaker PORs, and reduced in fed locusts treated with 5HT."

      Also, the newly added sentence at the end of the discussion does not make sense: "However, since 5HT increased behavioral responses in both fed and hungry locusts, the precise role of 5HT modulation and whether it underlies hunger-state dependent modulation of appetitive behavior still remains to be determined."

      The authors did not test 5-HT injection in starved animals.

      The results shown in Figure 1 compare the POR responses of starved locusts before and after 5HT introduction.

      We again thank the reviewer for useful feedback to further improve our manuscript.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript explores the impact of serotonin on olfactory coding in the antennal lobe of locusts and odor-evoked behavior. The authors use serotonin injections paired with an odor-evoked palp-opening response assay and bath application of serotonin with intracellular recordings of odor-evoked responses from projection neurons (PNs).

      Strengths:

      The authors make several interesting observations, including that serotonin enhances behavioral responses to appetitive odors in starved and fed animals, induces spontaneous bursting in PNs, and uniformly enhances PN responses to odors. Overall, I had no technical concerns.

      Weaknesses:

      While there are several interesting observations, the conclusions that serotonin enhanced sensitivity specifically and that serotonin had feeding-state-specific effects, were not supported by the evidence provided. Furthermore, there were other instances in which much more clarification was needed for me to follow the assumptions being made and inadequate statistical testing was reported.

      Major concerns.

      • To enhance olfactory sensitivity, the expected results would be that serotonin causes locusts to perceive each odor as being at a relatively higher concentration. The authors recapitulate a classic olfactory behavioral phenomenon where higher odor concentrations evoke weaker responses which is indicative of the odors becoming aversive. If serotonin enhanced the sensitivity to odors, then the dose-response curve should have shifted to the left, resulting in a more pronounced aversion to high odor concentrations. However, the authors show an increase in response magnitude across all odor concentrations. I don't think the authors can claim that serotonin enhances the behavioral sensitivity to odors because the locusts no longer show concentration-dependent aversion. Instead, I think the authors can claim that serotonin induces increased olfactory arousal.

      The reviewer makes a valid point. Bath application of serotonin increased POR behavioral responses across all odor concentrations, and concentration-dependent aversion was also not observed. Furthermore, the monotonic relationship between projection neuron responses and the intensity of current injection is altered when serotonin is exogenously introduced (see Author response image 1; see below for more explanation). Hence, our data suggest that serotonin alters the dose-response relationship between neural/behavioral responses and odor intensity. As recommended, we have revised our claim to state that serotonin induces an increase in olfactory arousal. The new physiology data have been added as Supplementary Figure 3 to the revised manuscript.

      • The authors report that 5-HT causes PNs to change from tonic to bursting and conclude that this stems from a change in excitability. However, excitability tests (such as I/V plots) were not included, so it's difficult to disambiguate excitability changes from changes in synaptic input from other network components.

      To confirm that the PN excitability did indeed change after serotonin application, we performed a new set of current-clamp recordings. In these experiments, we monitored the spiking activity in individual PNs as we injected different levels of current (200-1000 pA). Note that locust LNs that provide recurrent inhibition arborize and integrate inputs from a large number of sensory neurons and projection neurons. Therefore, activating a single PN should not activate the local neurons and therefore the antennal lobe network.

      We found that the total spiking activity monotonically increased with the magnitude of the current injection in all four PNs recorded (Author response image 1). However, after serotonin injection, we found that the spiking activity remained relatively stable and did not systematically vary with the magnitude of the current injection. While the changes in odor-evoked responses may incorporate both excitability changes in individual PNs and recurrent feedback inhibition through GABAergic LNs, these results from our current injection experiments unambiguously indicate that there are changes in excitability at the level of individual PNs. We have added this result to the revised manuscript.

      Author response image 1.

      Current-injection induced spiking activity in individual PNs is altered after serotonin application. (A) Representative intracellular recordings showing membrane potential fluctuations as a function of time for one projection neuron (PN) in the locust antennal lobe. A two-second window when a positive 200-1000 pA current was applied is shown. Firing patterns before (left) and after (right) serotonin application are shown for comparison. Note that the spiking activity changes after the 5HT application. The black bar represents the 20 mV scale. (B) Dose-response curves showing the average number of action potentials (across 5 trials) during the 2-second current pulse before (green) and after (purple) serotonin for each recorded PN. Note that the current intensity was systematically increased from 200 pA to 1000 pA. (C) The mean number of spikes across the four recorded cells during current injection is shown. The color progression represents the intensity of applied current, ranging from 200 pA (leftmost bar) to 1000 pA (rightmost bar). The dose-response trends before (green) and after (purple) 5HT application are shown for comparison. The error bars represent SEM across the four cells.
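
      The summary statistics named in this caption (trial-averaged spike counts per cell, then mean and SEM across cells) can be sketched as follows. This is only an illustration under stated assumptions: the function name `dose_response`, the data layout, and every spike count below are hypothetical and are not taken from the recordings.

```python
from math import sqrt
from statistics import mean, stdev

def dose_response(spike_counts):
    """Summarize spikes per current step as (mean, SEM) across cells.

    spike_counts[cell][current_pA] is a list of per-trial spike counts
    (hypothetical layout chosen for this sketch).
    """
    currents = sorted(next(iter(spike_counts.values())))
    summary = {}
    for current in currents:
        # First average across trials within each cell.
        per_cell = [mean(cell[current]) for cell in spike_counts.values()]
        # Then take the mean and standard error across cells.
        sem = stdev(per_cell) / sqrt(len(per_cell))
        summary[current] = (mean(per_cell), sem)
    return summary

# Made-up spike counts for four cells, two current steps, five trials each.
example = {
    "PN1": {200: [2, 2, 2, 2, 2], 1000: [10, 10, 10, 10, 10]},
    "PN2": {200: [4, 4, 4, 4, 4], 1000: [12, 12, 12, 12, 12]},
    "PN3": {200: [5, 6, 7, 6, 6], 1000: [14, 14, 14, 14, 14]},
    "PN4": {200: [8, 8, 8, 8, 8], 1000: [16, 16, 16, 16, 16]},
}
summary = dose_response(example)
```

      Running the same function separately on the before-5HT and after-5HT counts would give the two trends compared in panel (C).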

      • There is another explanation for the theoretical discrepancy between physiology and behavior, which is that odor coding is further processed in higher brain regions (i.e., other than the antennal lobe) not studied in the physiological component of this study. This should at least be discussed.

      This is a valid argument. For our model of neural mapping onto behavior to work, we only need the odorant that evokes or suppresses PORs to activate a distinct set of neurons. Having said that, our extracellular recording results (Fig. 6E) indicate that hexanol (high POR) and linalool (low POR) do activate highly non-overlapping sets of PNs in the antennal lobe. Hence, our results suggest that the segregation of neural activity based on behavioral relevance already begins in the antennal lobe. We have added this clarification to the discussion section.

      • The authors cannot claim that serotonin underlies a hunger state-dependent modulation, only that serotonin impacts responses to appetitive odors. Serotonin enhanced PORs for starved and fed locusts, so the conclusion would be that serotonin enhances responses regardless of the hunger state. If the authors had antagonized 5-HT receptors and shown that feeding no longer impacts POR, then they could make the claim that serotonin underlies this effect. As it stands, these appear to be two independent phenomena.

      This is also a valid point. We have clarified this in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigate the influence of serotonin on feeding behavior and electrophysiological responses in the antennal lobe of locusts. They find that serotonin injection changes behavior in an odor-specific way. In physiology experiments, they can show that antennal lobe neurons generally increase their baseline firing and odor responses upon serotonin injection. Using a modeling approach the authors propose a framework on how a general increase in antennal lobe output can lead to odor-specific changes in behavior. The authors finally suggest that serotonin injection can mimic a change in a hunger state.

      Strengths:

      This study shows that serotonin affects feeding behavior and odor processing in the antennal lobe of locusts, as serotonin injection increases activity levels of antennal lobe neurons. This study provides another piece of evidence that serotonin is a general neuromodulator within the early olfactory processing system across insects and even phyla.

      Weaknesses:

      I have several concerns regarding missing control experiments, unclear data analysis, and interpretation of results.

      A detailed description of the behavioral experiments is lacking. Did the authors also provide a mineral oil control and did they analyze the baseline POR response? Is there an increase in baseline response after serotonin exposure already at the behavioral output level? It is generally unclear how naturalistic the chosen odor concentrations are. This is especially important as behavioral responses to different concentrations of odors are differently modulated after serotonin injection (Figure 2: Linalool and Ammonium).

      POR protocol: Sixth instar locusts (Schistocerca americana) of either sex were starved for 24-48 hours before the experiment or taken straight from the colony and fed blades of grass for the satiated condition. Locusts were immobilized by placing them in a plastic tube and securing their body with black electrical tape (see Author response image 2). Locusts were given 20-30 minutes to acclimatize after placement in the immobilization tube. As can be noted, the head of the locust, along with the antennae and maxillary palps, protruded out of this immobilization tube so that they could be moved freely by the locust. Note that the maxillary palps are sensory organs close to the mouth parts that are used to grab food and help with the feeding process.

      It is worth noting that our earlier studies had shown that the presentation of ‘appetitive odorants’ triggers the locust to open its maxillary palps even when no food is presented (Saha et al., 2017; Nizampatnam et al., 2018; Nizampatnam et al., 2022; Chandak and Raman, 2023). Furthermore, our earlier results indicate that the probability of palp opening varies across different odorants (Chandak and Raman, 2023). We chose four odorants that had a diverse range of palp-opening: supra-median (hexanol), median (benzaldehyde), and sub-median (linalool). Therefore, each locust in our experiments was presented with one concentration of four odorants (hexanol, benzaldehyde, linalool, and ammonium) in a pseudorandomized order. The odorants were chosen based on our physiology results such that they evoked different levels of spiking activity.

      The odor pulse was 4 s in duration and the inter-pulse interval was set to 60 s. The experiments were recorded using a web camera (Microsoft) placed right in front of the locusts. The camera was fully automated with the custom MATLAB script to start recording 2 seconds before the odor pulse and end recording at odor termination. An LED was used to track the stimulus onset/offset. The POR responses were manually scored offline. Responses to each odorant were scored a 0 or 1 depending on if the palps remained closed or opened. A positive POR was defined as a movement of the maxillary palps during the odor presentation time window as shown on the locust schematic (Main Paper Figure 1).
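
      As a rough illustration of how the binary scores described above translate into a POR probability per odorant, consider the following minimal sketch. It is not the scoring code used in the study; the function name `por_probability` and the example scores are hypothetical.

```python
# Odorant order assumed for the columns of the score table (hypothetical).
ODORANTS = ["hexanol", "benzaldehyde", "linalool", "ammonium"]

def por_probability(scores):
    """Return the fraction of locusts with a positive POR for each odorant.

    scores is a list of rows, one per locust; each entry is 1 if the palps
    opened during the odor window and 0 if they stayed closed.
    """
    n_locusts = len(scores)
    return {
        odor: sum(row[i] for row in scores) / n_locusts
        for i, odor in enumerate(ODORANTS)
    }

# Made-up scores for four locusts; these are not data from the experiments.
example_scores = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
]
por = por_probability(example_scores)
```

      Computing the same quantity on scores collected before and after 5HT injection would give the paired values that are compared in the bar plots.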

      Author response image 2.

      Pictures showing the behavior experiment setup and representative palp-opening responses in a locust.

      As the reviewer inquired, we performed a new series of POR experiments, where we explored POR responses to mineral oil and hexanol, before and after serotonin injection. For this study, we used 10 locusts that were starved 24-48 hours before the experiment. Note that hexanol was diluted at 1% (v/v) concentration in mineral oil. Our results reveal that locusts' PORs to hexanol (~50% PORs) were significantly higher than those triggered by mineral oil (~10% PORs). Injection of serotonin increased the POR response rate to hexanol but did not alter the PORs evoked by mineral oil (Author response image 3).

      Author response image 3.

      Serotonin does not alter the palp-opening responses evoked by paraffin oil. The PORs before and after (5HT) serotonin injection are summarized and shown as a bar plot for hexanol and paraffin oil. Striped bars signify the data collected after 5HT injection. Significant differences are identified in the plot (one-tailed paired-sample t-test; *p<0.05).

      Regarding recordings of potential PNs - the authors do not provide evidence that they did record from projection neurons and not other types of antennal lobe neurons. Thus, these claims should be phrased more carefully.

      In the locust antennal lobe, only the cholinergic projection neurons fire full-blown sodium spikes. The GABAergic local neurons only fire calcium ‘spikelets’ (Laurent, TINS, 1996; Stopfer et al., 2003; see Author response image 4 for an example). Hence, we are pretty confident that we are only recording from PNs. Furthermore, due to the physiological properties of the LNs, their signals being too small, they are also not detected in the extracellular recordings from the locust antennal lobe. Hence, we are confident with our claims and conclusion.

      Author response image 4.

      PN vs LN physiological differences: Left: A representative raw voltage trace recorded from a local neuron before, during, and after a 4-second odor pulse is shown. Note that the local neurons in the locust antennal lobe do not fire full-blown sodium spikes but only fire small calcium spikelets. Right: A representative raw voltage trace recorded from a projection neuron is shown for comparison. Sodium spikes are clearly visible during spontaneous and odor-evoked periods. The gray bar represents the 4-second odor pulse. The vertical black bar represents 40 mV.

      The presented model suggests labeled lines in the antennal lobe output of locusts. Could the presented model also explain a shift in behavior from aversion to attraction - such as seen in locusts when they switch from a solitarious to a gregarious state? The authors might want to discuss other possible scenarios, such as that odor evaluation and decision-making take place in higher brain regions, or that other neuromodulators might affect behavioral output. Serotonin injections could affect behavior via modulation of other cell types than antennal lobe neurons. This should also be discussed - the same is true for potential PNs - serotonin might not directly affect this cell type, but might rather shut down local inhibitory neurons.

      There are multiple questions here. First, regarding solitary vs. gregarious states, we are currently repeating these experiments on solitary locusts. Our preliminary results (not included in the manuscript) indicate that the solitary animals have increased olfactory arousal and respond with a higher POR but are less selective and respond similarly to multiple odorants. We are examining the physiology to determine whether the model for mapping neural responses onto behavior could also explain observations in solitary animals.

      Second, this reviewer makes the point raised by Reviewer 1. We agree that odor evaluation and decision-making might take place in higher brain regions. All we could conclude based on our data is that a segregation of neural activity based on behavioral relevance might provide the simplest approach to map non-specific increase in stimulus-evoked neural responses onto odor-specific changes in behavioral outcome. Furthermore, our results indicate that hexanol and linalool, two odorants that had an increase and decrease in PORs after serotonin injection, had only minimal neural response overlap in the antennal lobe. These results suggest that the formatting of neural activity to support varying behavioral outcomes might already begin in the antennal lobe. We have added this to our discussion.

      Third, regarding serotonin impacting PNs, we performed a new set of current-clamp experiments to examine this issue (Author response image 1). Our results clearly show that projection neuron activity in response to current injections (that should not incorporate feedback inhibition through local neurons) was altered after serotonin injection. Therefore, the observed changes in the odor-evoked neural ensemble activity should incorporate modulation at both individual PN level and at the network level. We have added this to our discussion as well.

      Finally, the authors claim that serotonin injection can mimic the starved state behavioral response. However, this is only shown for one of the four odors that are tested for behavior (HEX), thus the data does not support this claim.

      We note that Hex is the only appetitive odorant in the panel. But, as reviewer 1 has also brought up a similar point, we have toned down our claims and will investigate this carefully in a future study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Was the POR of the locusts towards linalool and ammonium higher than towards a blank odor cartridge? I ask because the locusts appear to be less likely to respond to these odors and so I am concerned that this assay is not relevant to the ecological context of these odors. In other words, perhaps serotonin did not enhance the responses to these odors in this assay, because this is not a context in which locusts would normally respond to these odors.

      The POR response to linalool and ammonium is lower and comparable to that of paraffin oil. Serotonin does not increase POR responses to paraffin oil but does increase response to hexanol (an appetitive odorant). We have clarified this using new data (Author response image 5).

      • It seems to me that Figure 5C is the crux for understanding the potential impact of 5-HT on odor coding, but it is somewhat confusing and underutilized. Is the implication that 5-HT decorrelates spontaneous activity such that when an odor stimulus arrives, the odor-evoked activity deviates to a greater degree? The authors make claims about this figure that require the reader to guess as to the aspect of the figure to which they are referring.

      The reviewer makes an astute observation. Yes, the spontaneous activity in the antennal lobe network before serotonin introduction is not correlated with the ensemble spontaneous activity after serotonin bath application. Remarkably, the odor-evoked responses were highly similar, both in the reduced PCA space and when assayed using high-dimensional ensemble neural activity vectors. Whether the changes in network spontaneous activity have a function in odor detection and recognition is not fully understood and cannot be convincingly answered using our data. But this is something that we had pondered.

      • The modeling component summarized in Figure 6 needs clarification and more detail. Perhaps example traces associated with positive weighting within neural ensemble 1 relative to neural ensemble 2? I struggled to understand conceptually how the model resolved the theoretical discrepancy between physiology and behavior.

      As recommended, here is a plot showing the responses of four PNs that had positive weights to hexanol and linalool. As can be expected, each PN in this group had higher responses to hexanol and no response to linalool. Further, the four PNs that received negative weights had response only to linalool.

      Author response image 5.

      Odor-evoked responses of four PNs that received positive weights in the model (top panel), and four PNs that were assigned negative weights in the model (bottom).
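
      To make the weighting idea above concrete, here is a toy sketch of a fixed-weight linear readout in which a uniform gain on PN responses (standing in for the general increase after 5HT) raises the predicted POR for one odorant and lowers it for another. It is a simplified illustration, not the model fitted in the manuscript: the firing rates, weights, baseline, gain, and the function `predicted_por` are all hypothetical.

```python
# Hypothetical firing rates (spikes/s) for four PNs; not data from the study.
# PN1-PN2 respond only to hexanol, PN3-PN4 respond only to linalool.
responses = {
    "hexanol": [20.0, 15.0, 0.0, 0.0],
    "linalool": [0.0, 0.0, 18.0, 12.0],
}

# Fixed readout weights: positive for the hexanol-responsive ensemble,
# negative for the linalool-responsive ensemble (assumed values).
weights = [0.01, 0.01, -0.005, -0.005]
BASELINE = 0.3  # assumed POR drive in the absence of odor-evoked input

def predicted_por(rates, gain=1.0):
    """Weighted sum of PN responses, clipped to the range [0, 1]."""
    drive = BASELINE + sum(w * gain * r for w, r in zip(weights, rates))
    return min(1.0, max(0.0, drive))

# The same gain is applied to every PN; the weights never change.
before = {odor: predicted_por(r) for odor, r in responses.items()}
after = {odor: predicted_por(r, gain=1.5) for odor, r in responses.items()}
```

      With these made-up numbers, the predicted POR rises for hexanol and falls for linalool even though every PN is scaled by the same factor, which is the qualitative behavior the fixed-weight scheme is meant to capture.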

      • Was there a significant difference between the PORs of hungry vs. fed locusts? The authors state that they differ and provide statistics for the comparisons to locusts injected with 5-HT, but then don't provide any statistical analyses of hungry vs. fed animals.

      The POR responses to HEX (an appetitive odorant) were significantly different between the hungry and fed locusts.

      Author response image 6.

      A bar plot summarizing PORs to all four odors for satiated locusts (highlighted with stripes) before (dark shade) and after (lighter shade) 5HT injection. To allow comparison, PORs of starved locusts before 5HT injection are plotted as well (without stripes). Significance was determined using a one-tailed paired-sample t-test (*p<0.05).

      • Were any of the effects of 5-HT on odor-evoked PN responses significant? No statistics are provided.

      We examined the distribution of odor-evoked responses in PNs before and after 5HT introduction. We found that the overall distribution was not significantly different between the two (one-tailed paired-sample t-test; p = 0.93).

      Author response image 7.

      Comparison of the distribution of odor-evoked PN responses before (green) and after (purple) 5HT introduction. One-tailed paired sample t-test was used to compare the two distributions.
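
      For readers unfamiliar with the paired comparison used here, the test statistic can be sketched as below. This is a generic illustration of a paired-sample t statistic, not the analysis script used for the figure; the helper `paired_t_statistic` and the spike counts are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired measurements.

    Each position in `before` and `after` must refer to the same
    odor-PN pair, so the test operates on per-pair differences.
    """
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Made-up spike counts for five odor-PN pairs; not data from the study.
before_5ht = [10, 12, 9, 11, 8]
after_5ht = [11, 12, 11, 10, 9]
t_stat, dof = paired_t_statistic(before_5ht, after_5ht)
# A one-tailed p-value follows from the t distribution with `dof` degrees
# of freedom, for example via scipy.stats.ttest_rel(after_5ht, before_5ht,
# alternative="greater") if SciPy is available.
```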

      • The authors interchangeably use "serotonin", "5HT" and "5-HT" throughout the manuscript, but this should be consistent.

      This has been fixed in the revised manuscript.

      • On page 2 the authors provide an ecological relevance for linalool as being an additive in pesticides, however, linalool is a common floral volatile chemical. Is the implication that locusts have learned to associate linalool with pesticides?

      Linalool is a terpenoid alcohol that has a floral odor but has also been used as a pesticide and insect repellent [Beier et al., 2014]. As shown in Author response image 2, it evoked the least POR responses amongst a diverse panel of 22 odorants that were tested. We have clarified how we chose odorants based on the prior dataset in the Methods section.

      • In Figure 1, there should be a legend in the figure itself indicating that the black box indicates the absence of POR and the white box indicates presence, rather than just having it in the legend text.

      Done.

      • In Figure 2, the raw data from each animal can be moved to the supplements. The way it is presented is overwhelming and the order of comparisons is difficult to follow.

      Done.

      • For the induction of bursting in PNs by the application of 5-HT, were there any other metrics observed such as period, duration of bursts, or peak burst frequency? The authors rely on ISI, but there are other bursting metrics that could also be included to understand the nature of this observation. In particular, whether the bursts are likely due to changes in intrinsic biophysical properties of the PNs or polysynaptic effects.

      We could use other metrics, as the reviewer suggests. Our main point is that the spontaneous activity of individual PNs changed. We have added new current-injection experiments showing that the PN output in response to square pulses of current changes after serotonin application (Author response image 1).
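
      For readers who want to compute the additional bursting metrics raised by the reviewer (burst duration, period, and intra-burst rate) alongside the ISI, a minimal sketch is given below. It is not the analysis used in the manuscript. The function names, the ISI threshold (`max_isi`), and the minimum spike count (`min_spikes`) are illustrative assumptions, and spike times are taken to be in milliseconds.

```python
from statistics import mean


def interspike_intervals(spike_times):
    """Return successive inter-spike intervals (same units as spike_times)."""
    ordered = sorted(spike_times)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def detect_bursts(spike_times, max_isi=20.0, min_spikes=3):
    """Group spikes into bursts: runs of at least min_spikes spikes whose
    successive ISIs are all <= max_isi. Both thresholds are assumptions."""
    if min_spikes < 2:
        raise ValueError("min_spikes must be at least 2")
    ordered = sorted(spike_times)
    bursts, current = [], ordered[:1]
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt - prev <= max_isi:
            current.append(nxt)
        else:
            if len(current) >= min_spikes:
                bursts.append(current)
            current = [nxt]
    if len(current) >= min_spikes:
        bursts.append(current)
    return bursts


def burst_metrics(bursts):
    """Summarize bursts: count, mean duration, mean intra-burst rate, mean period."""
    if not bursts:
        return {"n_bursts": 0}
    durations = [b[-1] - b[0] for b in bursts]
    # Intra-burst rate: intervals per unit time within each burst.
    rates = [(len(b) - 1) / d for b, d in zip(bursts, durations) if d > 0]
    onsets = [b[0] for b in bursts]
    periods = [t2 - t1 for t1, t2 in zip(onsets, onsets[1:])]
    return {
        "n_bursts": len(bursts),
        "mean_duration": mean(durations),
        "mean_intraburst_rate": mean(rates) if rates else None,
        "mean_period": mean(periods) if periods else None,
    }


# Made-up spike train (ms) with two bursts separated by an isolated spike.
spikes_ms = [0, 10, 20, 500, 1000, 1010, 1020, 1030]
summary = burst_metrics(detect_bursts(spikes_ms))
```

      Comparing such summaries before and after 5HT application would complement the ISI distributions, although the appropriate thresholds would need to be chosen from the recorded data.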

      • Were 4-vinyl anisole, 1-nonanol, and octanoic acid selected as additional odors because they had particular ecological relevance, or was it for the diversity of chemical structure?

      These odorants were selected based on both chemical structure and ecological relevance. The logic was to have a very diverse odor panel consisting of a food odorant (hexanol), an aggregation pheromone (4-vinyl anisole), a sex pheromone (benzaldehyde), an acid (octanoic acid), a base (ammonium), and an alcohol (1-nonanol). Additionally, we selected these odors based on previous neural and behavioral data on these odorants (Chandak and Raman, 2023; Traner and Raman, 2023; Nizampatnam et al., 2022 & 2018; Saha et al., 2017 & 2013).

      Reviewer #2 (Recommendations For The Authors):

      The electrophysiology dataset combines all performed experiments across all tested different PN-odor pairs. How many odors have been tested in a single PN and how many PNs have been tested for a single odor? This information is not present in the current manuscript. Can the authors exclude that there are odor-specific modulations?

      In total, our dataset includes recordings from 19 PNs. Seven PNs were tested on a panel of seven odorants (4-vinyl anisole, 1-nonanol, octanoic acid, Hex, Bza, Lool, and Amn), and the remaining twelve were tested with the four main odorants used in the study (Hex, Bza, Lool, and Amn). This information has been added to the Methods section.

      How did the authors choose the concentrations of serotonin injections and bath applications - is this a naturalistic amount?

      The serotonin concentration for ephys experiments was chosen by trial and error: 0.01 mM was the highest concentration that did not cause cell death. For the behavioral experiments, we increased the concentration (0.1 M) because anatomical structures in the locust's head, such as air sacs and sheath, as well as hemolymph, cause some degree of dilution that we cannot control.

      Behavior experiments were performed 3 hours after injection - ephys experiments 5-10 minutes following bath application. Can the authors exclude that serotonin affects neural processing differently on these different timescales?

      We cannot exclude this possibility. We performed ephys experiments 5-10 minutes after bath application, as it would be extremely hard to hold cells for 3 hours.

      A longer delay was required for our behavioral experiments because, after 5HT introduction, the locusts tended to be more agitated, with larger spontaneous movements of the palps, and exhibited unprompted vomiting. A 3-hour period allowed the locusts to regain their baseline level of movement. [This information has been added to the Methods section of the revised manuscript.]

      Concerning the analysis of electrophysiological data. The authors should correct for changes in the baseline before performing PCA analysis. And how much of the variance is explained by PC1 and PC2?

      We did not correct for baseline changes or subtract baseline as we wanted to show that the odor-evoked neural responses still robustly encoded information about the identity of the odorant.

      The authors should perform dye injections after recordings to visualize the cell type they recorded from. Serotonin might affect also other cell types in the antennal lobe.

      As mentioned above, in the locust antennal lobe only PNs fire full-blown sodium spikes, and LNs only fire calcium spikelets (Author response image 4). Since these signals are small, they will be buried under the noise floor when using extracellular recording electrodes to monitor responses in the antennal lobe (AL). Hence, we are pretty certain which type of cells we are recording from.

      There were several typos in the manuscript, please check again.

      We have fixed many of the grammatical errors and typos in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The Notch signaling pathway plays an important role in many developmental and disease processes. Although well-studied there remain many puzzling aspects. One is the fact that as well as activating the receptor through trans-activation, the transmembrane ligands can interact with receptors present in the same cell. These cis-interactions are usually inhibitory, but in some cases, as in the assays used here, they may also be activating. With a total of 6 ligands and 4 receptors, there is potentially a wide array of possible outcomes when different combinations are co-expressed in vivo. Here the authors set out to make a systematic analysis of the qualitative and quantitative differences in the signaling output from different receptor-ligand combinations, generating sets of "signaling" (ligand expressing) and "receiving" (receptor +/- ligand expressing cells).

      The readout of pathway activity is transcriptional, relying on the fusion of GAL4 in the intracellular part of the receptor. Positive ligand interactions result in the proteolytic release of Gal4 that turns on the expression of H2B-citrine. As an indicator of ligand and receptor expression levels, they are linked via TA to H2B mCherry and H2B mTurq expression respectively. The authors also manipulate the expression of the glycosyltransferase Lunatic-Fringe (LFng) that modifies the EGF repeats in the extracellular domains impacting their interactions. The testing of multiple ligand-receptor combinations at varying expression levels is a tour de force, with over 50 stable cell lines generated, and yields valuable insights although as a whole, the results are quite complex.

      Strengths:

      Taking a reductionist approach to testing systematically differences in the signaling strength, binding strength, and cis-interactions from the different ligands in the context of the Notch1 and Notch 2 receptors (they justify well the choice of players to test via this approach) produces a baseline understanding of the different properties and leads to some unexpected and interesting findings. Notably:

      - Jag1 ligand expressing cells failed to activate Notch1 receptor although were capable of activating Notch2. Conversely, Jag2 cells elicited the strongest activation of both receptors. The results with Jag1 are surprising also because it exhibits some of the strongest binding to plate-bound ligands. The failure to activate Notch1 has major functional significance and it will be important in the future to understand the mechanistic basis.

      - Jagged ligands have the strongest cis-inhibitory effects and the receptors differ in their sensitivity to cis-inhibition by Dll ligands. These observations are in keeping with earlier in vivo and cell culture studies. More referencing of those would better place the work in context but it nicely supports and extends previous studies that were conducted in different ways.

      - Responses to most trans-activating ligands showed a degree of ultrasensitivity but this was not the case for cis-interactions where effects were more linear. This has implications for the way the two mechanisms operate and for how the signaling levels will be impacted by ligand expression levels.

      - Qualitatively similar results are obtained in a second cell line, suggesting they reflect fundamental properties of the ligands/receptors.

      We appreciate the positive and constructive feedback.

      Weaknesses:

      One weakness is that the methods used to quantify the expression of ligands and receptors rely on the co-translation of tagged nuclear H2B proteins. These may not accurately capture surface levels/correctly modified transmembrane proteins. In general, the multiple conditions tested partly compensate for the concerns - for example, as Jag1 cells do activate Notch2 even if they do not activate Notch1 some Jag1 must be getting to the surface. But even with Notch2, Jag1 activities are on the lower side, making it important to clarify, especially given the different outcomes with the plated ligands. Similarly, is the fact that all ligands "signalled strongest to Notch2" an inherent property or due to differences in surface levels of Notch 2 compared to Notch1? The results would be considerably strengthened by calibration of the ligand/receptor levels (and ideally their sub-cellular localizations). Assessing the membrane protein levels would be relatively straightforward to perform on some of the basic conditions because their ligand constructs contain Flag tags, making it plausible to relate surface protein to H2B, and there are antibodies available for Notch1 and Notch2.

      We agree that mCherry fluorescence does not provide a direct readout of active surface ligand levels. As the reviewer points out, the ability of Jag1 to activate Notch2 demonstrates that expressed Jag1 is competent for signaling. Further, in some cases, Jag1-Notch2 activation can be comparable to Dll1-Notch2 activation (Figure 2A). Following the reviewer’s suggestion, we performed a Western blot for multiple expression levels for each of three surface ligands (Dll1, Dll4, Jag1) (Figure 2—figure supplement 2). This blot revealed a signal for surface expression of Jag1. Interpretation is complicated by the expected dependence of the efficiency of surface protein purification on the number of primary amines in the protein, which varies among these ligands, and qualitatively correlates with the staining intensity. While this makes quantitative interpretation difficult, this result further supports the notion that Jag1 is present on the cell surface. Finally, we note that high signaling activity need not, in general, directly correlate with surface expression levels. In fact, one study showed an example in which increased ligand activity occurred with decreased basal ligand surface levels (Antfolk et al., 2017). While one would ideally like to know all parameters of the system, including surface protein levels, rates of recycling, etc. the perspective taken here is that the net effect of these many post-translational processing steps can be subsumed into the overall relationship between the expression of the protein (which, in our case, is read out by the co-translational reporter) and its activity, which is relevant for the behavior of developmental circuits, among other systems. To address this comment, we now explicitly mention the limitation of mCherry as a proxy for surface protein, and add a reference to previous work highlighting the relationship between surface levels and ligand activity.

      In terms of the dependence of signaling on Notch levels, the metric of signaling activity used here is explicitly normalized by the mTurquoise co-translational reporter of Notch expression to account for differences in receptor expression across receiver clones. We have added a new figure to show the variation in expression (Figure 1—figure supplement 1A) and to demonstrate this normalization (Figure 1—figure supplement 5). Having said that, as the reviewer correctly points out, we cannot directly address the dependence on surface receptor levels with mTurquoise alone. To address this comment, we have added a figure that shows cotranslational and surface receptor expression for a subset of our receiver clones (Figure 1—figure supplement 1B). Although antibody binding strengths may vary, it appears unlikely that higher surface levels could explain most ligands’ preferential activation of Notch2 over Notch1, since Notch2 levels were lower than Notch1 levels in both surface expression and cotranslational expression.

      Cis-activation as a mode of signaling has only emerged from these synthetic cell culture assays raising questions about its physiological relevance. Cis-activation is only seen at the higher ligand (Dll1, Dll4) levels, how physiological are the expression levels of the ligands/receptors in these assays? Is it likely that this would make a major contribution in vivo? Is it possible that the cells convert themselves into "signaling" and "receiving" sub-populations within the culture by post-translational mechanism? Again some analysis of the ligand/receptors in the cultures would be a valuable addition to show whether or not there are major heterogeneities.

      The cis-activation results in this paper are, as the reviewer points out, conducted in synthetic cell culture assays. Cis-activation is observed across a large dynamic range of ligand expression, possibly including non-physiologically high levels. However, our previous work (Nandagopal et al, eLife 2019) showed that cis-activation does not require over-expression, as it occurred in unmodified Caco-2 and NMuMG cells with their endogenous ligand and receptor expression levels. As shown here in Figure 4B, cis-activation for Notch2 increases monotonically and is substantial even at intermediate ligand concentrations. In other cases, cis-activation is maximal at intermediate concentrations. We agree that the in vivo role remains unclear, and is difficult to determine due to the typical close contacts among cells in tissues. Therefore, these assays do not speak to in vivo relevance. Note that we can, however, rule out the possibility of trans signaling between well-mixed cell populations at these densities (Figure 4A).

      It is hard to appreciate how much cell-to-cell variability in the "output" there is. For example, low "outputs" could arise from fewer cells becoming activated or from all cells being activated less. As presented, only the latter is considered. That may be already evident in their data, but not easy for the reader to distinguish from the way they are presented. For example, in many of the graphs, data have been processed through multiple steps of normalization. Some discussion/consideration of this point is needed.

      We agree that in different experiments changes in a mean response can reflect changes in fraction of activated cells, or level of activation or some combination of both. In this work, most assays were conducted by flow cytometry, which provides a full distribution of cellular responses. We provided distributions for some experiments in the supplementary figures (i.e., Figure 4—figure supplement 1, and Figure 5—figure supplement 4). The sheer number of experiments and samples prevents us from displaying all underlying histograms. Therefore, we have provided all flow data sets in an extensive archive that is publicly available on data.caltech.edu (https://doi.org/10.22002/gjjkn-wrj28).
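
      The distinction drawn here, between fewer cells responding and all cells responding less, can be illustrated with a short sketch that splits a per-cell reporter distribution into the fraction of activated cells and the mean level among those cells. This is a hedged illustration rather than the pipeline used for the archived flow data: the function name, the activation threshold, and the example values are assumptions (in practice a threshold might be set from a no-ligand control).

```python
def decompose_mean_response(values, threshold):
    """Split per-cell reporter values into fraction activated and mean level
    among activated cells. The threshold is a hypothetical gate."""
    if not values:
        raise ValueError("values must not be empty")
    on = [v for v in values if v > threshold]
    off = [v for v in values if v <= threshold]
    fraction_on = len(on) / len(values)
    mean_on = sum(on) / len(on) if on else 0.0
    mean_off = sum(off) / len(off) if off else 0.0
    return {
        "overall_mean": sum(values) / len(values),
        "fraction_on": fraction_on,
        "mean_on": mean_on,
        "mean_off": mean_off,
    }


# Two made-up populations with the same mean but different structure.
bimodal = [0.0, 0.0, 10.0, 10.0]   # half of the cells strongly activated
graded = [5.0, 5.0, 5.0, 5.0]      # every cell moderately activated
summary_bimodal = decompose_mean_response(bimodal, threshold=1.0)
summary_graded = decompose_mean_response(graded, threshold=1.0)
```

      Both populations have a mean of 5, yet the first has half of its cells activated at a high level and the second has all cells activated at a moderate level, which is why the full distributions are needed to tell the two cases apart.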

      Impact:

      Overall, cataloging the outcomes from the different ligand-receptor combinations, both in cis and trans, yields a valuable baseline for those investigating their functional roles in different contexts. There is still a long way to go before it will be possible to make a predictive model for outcomes based on expression levels, but this work gives an idea about the landscape and the complexities. This is especially important now that signaling relationships are frequently hypothesized based on single-cell transcriptomic data. The results presented here demonstrate that the relationships are not straightforward when multiple players are involved.

      We appreciate this concise impact summary, and agree with its conclusions.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors extend their previous studies on trans-activation, cis-inhibition (PMID: 25255098), and cis-activation (PMID: 30628888) of the Notch pathway. Here they create a large number of cell lines using CHO-K1 and C2C12 cells expressing either Notch1-Gal4 or Notch2-Gal4 receptors which express a fluorescent protein upon receptor activation (receiver cells). For cis-inhibition and cis-activation assays, these cells were engineered to express one of the four canonical Notch ligands (Dll1, Dll4, Jag1, Jag2) under tetracycline control. Some of the receiver cells were also transfected with a Lunatic fringe (Lfng) plasmid to produce cells with a range of Lfng expression levels. Sender cells expressing all of the canonical ligands were also produced. Cells were mixed in a variety of co-culture assays to highlight trans-activation, cis-activation, and cis-inhibition. All four ligands were able to trans-activate Notch1 and Notch2, except that Jag1 did not trans-activate Notch1. Lfng enhanced trans-activation of both Notch receptors by Dll1 and Dll4, and inhibited Notch1 activation by Jag2 and Notch2 activation by both Jag1 and Jag2. Cis-expression of all four ligands was predominantly inhibitory, but Dll1 and Dll4 showed strong cis-activation of Notch2. Interestingly, cis-ligands preferentially inhibited trans-activation by the same ligand, with varying effects on other trans-ligands.

      Strengths:

      This represents the most comprehensive and rigorous analysis of the effects of canonical ligands on cis- and trans-activation, and cis-inhibition, of Notch1 and Notch2 in the presence or absence of Lfng so far. Studying cis-inhibition and cis-activation is difficult in vivo due to the presence of multiple Notch ligands and receptors (and Fringes) that often occur in single cells. The methods described here are a step towards generating cells expressing more complex arrays of ligands, receptors, and Fringes to better mimic in vivo effects on Notch function.

      In addition, the fact that their transactivation results with most ligands on Notch1 and 2 in the presence or absence of Lfng were largely consistent with previous publications provides confidence that the authors' assays are working properly.

      We appreciate the thoughtful comments and feedback.

      Weaknesses:

      It was unusual that the engineered CHO cells expressing Notch1-Gal4 were not activated at all by co-culture with Jag1-expressing CHO cells. Many previous reports have shown that Jag1 can activate Notch1 in co-culture assays, including when Notch1 was expressed in CHO cells. Interestingly, when the authors used Jag1-Fc in a plate coating assay, it did activate Notch1 and could be inhibited by the expression of Lfng.

      In our assays, we do in fact also see some signaling of Jag1 to Notch1, especially when dLfng is coexpressed (Figure 2—figure supplement 4, formerly Figure 2—figure supplement 3). While these levels are lower than those observed for other ligand-receptor combinations, they are significantly elevated compared to baseline. In specific natural contexts, it will be important to determine whether the weak but non-zero Jag1-Notch1 signaling acts negatively to suppress signaling from other ligands, or provides weak but potentially functionally important levels of signaling. Evidence for both modes exists in the literature. To address this, we have expanded the discussion of Jag1-Notch1 signaling and added references to other work on Jag1-Notch1 signaling to the Discussion section.

      The cell surface level of the ligands was determined by flow cytometry of a co-translated fluorescent protein. Some calibration of the actual cell surface levels with the fluorescent protein would strengthen the results.

      This issue was also raised by Reviewers #1 and #3. Please see responses to Reviewer #1, above.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript reports a comprehensive analysis of Notch-Delta/Jagged signaling inclusive of the human Notch1 and Notch2 receptors and DLL1, DLL4, JAG1, and JAG2 ligands. Measurements encompassed signaling activity for ligand trans-activation, cis-activation, cis-inhibition, and activity modulation by Lfng. The most striking observations of the study are that JAG1 has no detectable activity as a Notch1 ligand when presented on a cell (though it does have activity when immobilized on a surface), even though it is an effective cis-inhibitor of Notch1 signaling by other ligands, and that DLL1 and DLL4 exhibit cis-activating activity for Notch1 and especially for Notch2. Notwithstanding the artificiality of the system and some of its shortcomings, the results should nevertheless be a valuable resource for the Notch signaling community.

      Strengths:

      (1)  The work is systematic and comprehensive, addressing questions that are of importance to the community of researchers investigating mammalian Notch proteins, their activation by ligands, and the modulation of ligand activity by LFng.

      (2)  A quantitative and thorough analysis of the data is presented.

      Weaknesses:

      (1) The manuscript is primarily descriptive and does not delve into the underlying, mechanistic origin or source of the different ligand activities.

      We agree that the goals of this paper were largely to discover the range of signaling modes that occur. A mechanistic analysis would be beyond the scope of this work, but we agree it is an important next step.

      (2) The amount of ligand or receptor expressed is inferred from the flow cytometry signal of a co-translated fluorescent protein-histone fusion, and is not directly measured. The work would be more compelling if the amount of ligand present on the cell surface were directly measured with anti-ligand antibodies, rather than inferred from measurements of the fluorescent protein-histone fusion.

      This issue was also raised by Reviewers #1 and #2. Please see responses to Reviewer #1, above.

      (3) It would be helpful to see plots of the raw activity data before transformation and normalization, because the plots present data after several processing steps, and it is not clear how the processed data relate to the original values determined in each measurement.

      We included examples showing how raw data are processed in Figure 4—figure supplement 1 and Figure 5—figure supplement 4. The sheer number of experiments precludes including similar figures for all data sets. However, all raw and processed data and the data analysis code are publicly available (https://doi.org/10.22002/gjjkn-wrj28).

      (4) The authors use sparse plating of engineered cells with parental cells (expressing no ligand or receptor) to measure cis-activation. However, the cells divide within the culture period of 22-24 h and can potentially trans-activate each other.

      If measured cis-activation signal arises solely from trans-activation, then the measured cis-activation signal per cell should increase with cell density, since trans-activation per cell does depend on cell density (Figure 4A). However, for the strongest cis-activators (Dll1- and Dll4-Notch2), signaling magnitude is similar when these cells are cultured sparsely or at confluence, which would otherwise allow efficient trans signaling (Figure 5A). Thus, for Dll1- and Dll4-Notch2 receivers, total signaling strength per cell depends little or not at all on the opportunity to signal intercellularly. Moreover, cis-activation signal for the Dll1- and Dll4-Notch2 combinations exceeded the maximum trans-signaling levels we could achieve for the same receivers when cis-ligand was suppressed (Figure 4B). These results argue that cis interactions dominate signaling in this context. However, we have not ruled out the possibility that trans-signaling between sister cells after division contributes to the comparatively weak cis-activation observed for Notch1 receivers.

      Reviewer #1 (Recommendations For The Authors):

      As outlined in the public review, there is a question of whether the nuclear H2B accurately reflects the surface levels of the transmembrane proteins (ligand and receptor). Clearly, it would not be feasible to check levels in all of the experimental conditions, but some baseline conditions should be analyzed.

      We addressed this above.

      Reviewer #2 (Recommendations For The Authors):

      (1)  As mentioned above, it was unusual that Jag1 did not activate Notch1 in co-culture assays, but did activate Notch1 in plate-coating assays. The authors should add some text to the Discussion to explain why they think this is happening in their engineered cells. One possibility is that the CHO cells express Manic fringe (Mfng) which is known to reduce Jag1-Notch1 activation. Data for Mfng levels in CHO cells were not included in Supplemental Table 2. Knocking down all three Fringes in CHO cells might increase Jag1-Notch1 activation.

      This is already addressed in a sentence in the results: “Strikingly, while Jag1 sender cells failed to activate Notch1 receivers above background (Figure 2D), plate-bound Jag1-ext-Fc activated Notch1 only ~3-fold less efficiently than it activated Notch2 (Figure 3B-D). This suggests that the natural endocytic activation mechanism, or potential differences in tertiary structure between the expressed and recombinant Jag1 extracellular domains, could play roles in preventing Jag1-Notch1 signaling in coculture.” Regarding the point about Mfng, we added a note to Supplementary Table about other CHO-K1 expression data.

      (2) Figure 1-supplemental figure 1: Both the Notch1-Jag1 and Notch1-Jag2 cells show high expression of Jag1 in low 4epi, but any higher concentration reduces to control levels. How much of a problem is this for interpreting your data?

      This was not the ideal behavior, but by binning cells by co-translational reporters for ligand expression, we were able to obtain enough cells in intermediate bins. (Note: Figure 1—figure supplement 1 is now Figure 1—figure supplement 2.)

      (3)  Figure 1C legend: Are these stably-expressing cells or Tet-off cells? Please state in legend.

      The figure legend has been updated.

      (4)  Figure 1E: How long is the knockdown of Rfng and Lfng effective? Does it affect the expression of Lfng later?

      siRNA effects generally last for at least 72-96 hours, so we do not anticipate this being an issue.

      (5) Page 9: "Lfng significantly decreased trans-activation of both receptors by Jag1 (>2.5-fold)". If there is no Jag1-Notch1 activation, how can Lfng decrease trans-activation?

      We added a note in the main text to clarify that while Jag1-Notch1 signaling is relatively low, it can still be detectably decreased.

      (6) Figure 4A legend: Please define what "2.5k ea senders and Rec" means. In the text, it says "To focus on cis-interactions alone, we then cultured receiver cells at low density, amid an excess of wildtype CHO-K1 cells" (page 14).

      This was clarified in the text.

      (7)  Page 14: "By contrast, Notch2 was cis-activated by both Dll1 and Dll4, to levels exceeding those produced by trans-activation by high-Dll1 senders (Figure 4B, lower left)." Where is the trans-activation data? 4B, lower right?

      We updated this reference in the main text.

      (8)  Page 16: "For Notch2-Dll1 and Notch2-Dll4, single cell reporter activities correlated with cis-ligand expression, regardless of whether cells were pre-induced at a high or low culture density (Figure 4D)." It appears that Notch2-Dll1 has lower Notch activation at sparse culture than confluent.

      We agree that the level of signaling is, on average, lower in sparse than in confluent culture. This is explained by the sensitivity of the Tet-OFF promoter to culture density (Figure 4—figure supplement 2). However, the key point of this experiment is the positive correlation, which is consistent with cis-activation, and inconsistent with the pre-generation of NEXT hypothesis diagrammed in Figure 4C, which would not be expected to produce such a correlation.

      (9a) For the creation of the C2C12-Nkd cells: Has genomic sequencing been done to confirm editing of Notch2 and Jag1 loci?

      We confirmed the knockdown but did not do genomic sequencing.

      (9b) The gel in Figure 7-Supplement 1C is not adequate for showing loss of Jag1. It should be repeated.

      In this case, we have only the single gel. We added a note in the figure legend that no duplicate was performed.

      (10) Figure 7A: Which Fringes are expressed in C2C12 cells? You should provide a rationale for knocking down just Rfng.

      Figure 7—figure supplement 1A shows the levels of expression in C2C12. Note that Mfng is not highlighted because its levels were undetectable.

      (11) Figure 7-Supplement 1D: This is confusing. Notch2 levels are not reduced in the left panel, and Notch1 and Notch2 levels are not reduced in the right panel?

      C2C12-Nkd cells exhibit reduced levels of Notch1 and Notch3. This can be seen in Figure 7—figure supplement 1A. Panel D presents the results of additional siRNA knockdown, performed to prevent subsequent up-regulation of Notch1 and Notch3 during the assay. These knockdown results were variable, as shown. The Notch2 siRNA knockdown was not essential for these experiments, but was performed despite the very low levels of Notch2 to begin with. In the revision, we have added this note to the Methods.

      Reviewer #3 (Recommendations For The Authors):

      (1) The results section of the manuscript is very dense and difficult to follow, as are the figure legends.

      We appreciate the criticism, and regret that it is not easier to read in its current form.

      (2) The authors could emphasize areas of concordance with published results (where available) to place their artificial, engineered system into a better biological context. Are there any examples of studies in whole organisms where cis-activation plays a role?

      We are not aware of examples of cis-activation in whole organisms at this point.

      (3) How do the authors rationalize the different responses of Notch1 to cell-presented Jag1 as opposed to immobilized Jag1, where its signal strength is second in rank order on a molar basis?

      This comment was addressed above in response to the first recommendation from Reviewer #2.

      It is also difficult to understand Figure 2—figure supplement 3B, in which it appears that Jag1 induces a Notch1 reporter response when LFng is knocked down (dLfng), and how those data relate to the inactive response to Jag1 shown in the main figures.

      The issue here is a difference of normalization. Figure 2A in the main text is normalized to the sender expression level, i.e. relative signaling strength. By contrast, Figure 2—figure supplement 4B (previously Figure 2—figure supplement 3B) shows absolute signaling activity, which can appear higher because it does not normalize for ligand expression. For Jag1-Notch1 signaling in particular, substantial signaling required very high levels of Jag1. We have added a new figure to demonstrate these two types of normalization (Figure 2—figure supplement 1A).

      See Author response image 1 below for a direct comparison of these two normalization modes using data from both Figure 2A and Figure 2—figure supplement 4B. Note how the Jag1-Notch1 signaling activities that are nonzero in the top plot go to zero in the bottom plot as a result of normalizing the values to ligand expression.

      Author response image 1. Comparison of normalization modes in Figure 2A and Figure 2—figure supplement 4B (formerly 3B). Normalized trans-activation signaling activities for different ligand-receptor combinations (with dLfng only), either with further normalization to ligand expression (bottom row) or without further normalization (top row). Normalized signaling activity is defined as reporter activity (mCitrine, A.U.) divided by cotranslational receptor expression (mTurq2, A.U.), normalized to the strongest biological replicate-averaged signaling activity across all ligand-receptor-Lfng combinations in this experiment. Saturated data points, defined here as those with normalized signaling activity over 0.75 in both dLfng and Lfng conditions, were excluded. Colors indicate the identity of the trans-ligand expressed by cocultured sender cells. Error bars denote bootstrapped 95% confidence intervals (Methods), in this case sampled from the number of biological replicates given in the legend—n1 (for Notch1) or n2 (for Notch2). See Methods and Figure 2A caption for more details. Note that the only difference between this figure and the new Figure 2—figure supplement 1A is that this figure additionally includes the Jag1-high data from Figure 2—figure supplement 4B.
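      The two normalization modes and the bootstrapped interval described in this caption can be sketched as follows. This is only an illustrative sketch, not the authors' analysis code: the function names, the toy fluorescence values, the relative ligand expression value and the percentile form of the bootstrap are all assumptions, and the actual procedure is the one given in the Methods.

```python
import random
from statistics import mean

def signaling_activity(reporter, receptor):
    """Reporter signal (mCitrine) divided by cotranslational receptor signal (mTurq2), per replicate."""
    return [r / c for r, c in zip(reporter, receptor)]

def normalize_to_strongest(activity_by_condition):
    """Scale every replicate by the strongest replicate-averaged activity in the experiment."""
    strongest = max(mean(values) for values in activity_by_condition.values())
    return {name: [v / strongest for v in values]
            for name, values in activity_by_condition.items()}

def per_ligand(values, ligand_expression):
    """Further normalization to sender ligand expression (relative signaling strength)."""
    return [v / ligand_expression for v in values]

def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean, resampling biological replicates."""
    rng = random.Random(seed)
    means = sorted(mean(rng.choices(values, k=len(values))) for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical replicate values (A.U.), chosen only to illustrate the arithmetic.
raw = {
    "Dll1": signaling_activity([100.0, 120.0], [10.0, 10.0]),
    "Jag1_high": signaling_activity([40.0, 60.0], [10.0, 10.0]),
}
absolute = normalize_to_strongest(raw)
# Assumed relative ligand expression of 10 for the Jag1-high senders.
relative = per_ligand(absolute["Jag1_high"], ligand_expression=10.0)
print(mean(absolute["Jag1_high"]), mean(relative), bootstrap_ci(absolute["Jag1_high"]))
```

      With these toy numbers, an activity that is clearly nonzero in absolute terms becomes small once it is divided by a high ligand expression level, which is the effect the two rows of the image are meant to show.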

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This fundamental study evaluates the evolutionary significance of variations in the accuracy of the intron-splicing process across vertebrates and insects. Using a powerful combination of comparative and population genomics approaches, the authors present convincing evidence that species with lower effective population size tend to exhibit higher rates of alternative splicing, a key prediction of the drift-barrier hypothesis. The analysis is carefully conducted and all observations fit with this hypothesis, but focusing on a greater diversity of metazoan lineages would make these results even more broadly relevant. This study will strongly appeal to anyone interested in the evolution of genome architecture and the optimisation of genetic systems.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Functionally important alternative isoforms are gold nuggets found in a swamp of errors produced by the splicing machinery.

      The architecture of eukaryotic genomes, when compared with prokaryotes, is characterised by a preponderance of introns. These elements, which are still present within transcripts, are rapidly removed during the splicing of messenger RNA (mRNA), thus not contributing to the final protein. The extreme rarity of introns in prokaryotes, and the elimination of these introns from mRNAs before translation into protein, raises questions about the function of introns in genomes. One explanation comes from functional biology: introns are thought to be involved in post-transcriptional regulation and in the production of translational variants. The latter function is possible when the positions of the edges of the spliced intron vary. While some light has been shed on specific examples of the functional role of alternative splicing, to what extent are they representative of all introns in metazoans?

      In this study, the hypothesis of a functional role for alternative splicing, and therefore to a certain extent for introns, is evaluated against another explanation coming from evolutionary biology: isoforms are above all errors of imprecision by the molecular machinery at work during splicing. This hypothesis is based on a principle established by Motoo Kimura, which has become central to population genetics, explaining that the evolutionary trajectory of a mutation with a given effect is intimately linked to the effective population size (Ne) where this mutation emerges. Thus, the probability of fixation of a weakly deleterious mutation increases when Ne decreases, and the probability of fixation of a weakly advantageous mutation increases when Ne increases. The genomes of populations with low Ne are therefore expected to accumulate more weakly deleterious mutations and fewer weakly advantageous mutations than populations with high Ne. In this framework, if splicing errors have only small effects on the fitness of individuals, then natural selection cannot increase the precision of the splicing machinery, allowing tolerance for the production of alternative isoforms.
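      The dependence of fixation probability on Ne described above can be illustrated numerically. The sketch below is not taken from the manuscript: it uses Kimura's standard diffusion approximation for a new mutation with additive selection coefficient s, and it assumes that the census size equals Ne, so that the initial frequency is 1/(2Ne). Probabilities are reported relative to the neutral expectation, because that ratio is what shifts with Ne.

```python
import math

def fixation_probability(s, n_e):
    """Kimura's approximation: (1 - exp(-2s)) / (1 - exp(-4*Ne*s)), assuming N = Ne."""
    if s == 0:
        return 1.0 / (2 * n_e)  # neutral limit
    try:
        # expm1(x) = exp(x) - 1, so the ratio of the two expm1 terms equals the formula above.
        return math.expm1(-2 * s) / math.expm1(-4 * n_e * s)
    except OverflowError:
        # Strongly selected-against mutations in huge populations: effectively never fixed.
        return 0.0

def relative_to_neutral(s, n_e):
    """Fixation probability divided by the neutral value 1/(2*Ne)."""
    return fixation_probability(s, n_e) * 2 * n_e

# Illustrative selection coefficients; the values are arbitrary.
for n_e in (1e3, 1e4, 1e5, 1e6):
    print(n_e, relative_to_neutral(-1e-4, n_e), relative_to_neutral(1e-4, n_e))
```

      Under these assumptions, a mutation with s = -1e-4 behaves almost neutrally when Ne is around 1,000 but is essentially never fixed when Ne is around 1,000,000, whereas the advantage of a mutation with s = +1e-4 over a neutral one grows with Ne.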

      In the past, the debate opposed one-off observations of effectively functional isoforms on the one hand, to global genomic quantities describing patterns without the possibility of interpreting them in detail. The authors here propose an elegant quantitative approach in line with the expected continuous variation in the effectiveness of selection, both between species and within genomes. The result describing the inter-specific pattern on a large scale confirms what was already known (there is a negative relationship between effective size and average alternative splicing rate). The essential novelty of this study lies in 1) the quantification, for each intron studied, of the relative abundance of each isoform, and 2) the analysis of a relationship between this abundance and the evolutionary constraints acting on these isoforms.

      What is striking is the light shed on the general very low abundance of alternative isoforms. Depending on the species, 60% to 96% of cases of alternatively spliced introns lead to an isoform whose abundance is less than 5% of the total variants for a given intron.

      In addition to the fact that 60%-96% of the total isoforms are more than 20 times less abundant than their majority form, this large proportion of alternative isoforms exhibits coding-phase shifts at rates similar to what would be expected by chance, i.e. for a third of them, which reinforces the idea that there is no particular constraint on these isoforms.

      The remaining 4%-40% of isoforms see their coding-phase shift rate decrease as their relative abundance increases. This result represents a major step forward in our understanding of alternative splicing and makes it possible to establish a quantitative model directly linking the relative abundance of an isoform with a putative functional role concerning only those isoforms produced in abundance. Only the (rare) isoforms which are abundantly produced are thought to be involved in a biological function.

      Within the same genome, the authors show that only highly expressed genes, i.e. those that tend to be more constrained on average, are also the genes with the lowest alternative splicing rates on average.

      The comparison between species in this study reveals that the smaller the effective size of a species, the more its genome produces isoforms that are low in abundance and low in constraint. Conversely, species with a large effective size relatively reduce rare isoforms, and increase stress on abundant isoforms. To sum up:

      • the higher the effective size of a species, the fewer introns are spliced.

      • highly expressed genes are spliced less.

      • when splicing occurs, it is mainly to produce low-abundance isoforms.

      • low-abundance isoforms are also less constrained.

      Taken together, these results reinforce a quantitative view of the evolution of alternative splicing as being mainly the product of imprecision in the splicing machinery, generating a great deal of molecular noise. Then, out of all this noise, a few functional gold nuggets can sometimes emerge. From the point of view of the reviewer, the evolutionary dynamics of genomes are depressing. The small effective population sizes are responsible for the accumulation of multiple slightly deleterious introns. Admittedly, metazoan genomes try to get rid of these introns during RNA maturation, but this mechanism is itself rendered imprecise by population sizes.

      Strengths:

      • The authors simultaneously study the effects of effective population size, isoform abundance, and gene expression levels on the evolutionary constraints acting on isoforms. Within this framework, they clearly show that an isoform becomes functionally important only under certain rare conditions.

      • The authors rule out an effect putatively linked to variations in expression between different organs which could have biased comparisons between different species.

      Weaknesses:

      • While the longevity of organisms as a measure of effective size seems to work overall, it may not be relevant for discriminating within a clade. For example, within Hymenoptera, we might expect them to have the same overall longevity, but that effective size would be influenced more by the degree of sociality: solitary bees/ants/wasps versus eusocial. I am therefore certain that the relationship shown in Figure 4D is currently not significant because the measure of effective size is not relevant for Hymenoptera. The article would have been even more convincing by contrasting the rates of alternative splicing between solitary versus social hymenopterans.

      As suggested by the reviewer, we investigated the degree of sociality for the 18 hymenopterans included in our study. We observed that the average dN/dS of the 12 eusocial species (4 bees, 6 ants, 2 wasps) is significantly higher than that of the 6 solitary species (p = 2.1 × 10⁻³; Author response image 1A), consistent with a lower effective population size in eusocial species compared to solitary ones.

      However, the AS rate does not differ significantly between these two groups, either for the full set of major-isoform introns (Author response image 1B) or for the subsets of low-AS or high-AS major-isoform introns (Author response image 1C,D). Given the limited sample size (12 eusocial species, 6 solitary species), it is possible that some uncontrolled variables affecting the AS rate hide the impact of Ne.

      Author response image 1.

      Comparison of solitary (N=6) and eusocial hymenopterans (N=12). A: dN/dS ratio. B: AS rate (all major-isoform introns). C: AS rate (low-AS major-isoform introns). D: AS rate (high-AS major-isoform introns). The means of the two groups were compared with a Wilcoxon test.
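      For readers unfamiliar with the test named in this caption, a two-group Wilcoxon (rank-sum) comparison can be sketched as below. This is an illustration only: the dN/dS values are invented, the function name is hypothetical, and the exact-enumeration form assumes no tied values. With group sizes of 12 and 6, as in the comparison above, there are only 18,564 possible rank assignments, so an exact p-value is cheap to compute.

```python
from itertools import combinations

def rank_sum_exact(group_a, group_b):
    """Exact two-sided Wilcoxon rank-sum test by enumeration (assumes no ties)."""
    pooled = sorted(group_a + group_b)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n_a, n_total = len(group_a), len(pooled)
    observed = sum(rank[value] for value in group_a)
    expected = n_a * (n_total + 1) / 2  # mean rank sum under the null hypothesis
    observed_dev = abs(observed - expected)
    hits = total = 0
    # Under the null, every choice of n_a ranks out of n_total is equally likely.
    for subset in combinations(range(1, n_total + 1), n_a):
        total += 1
        if abs(sum(subset) - expected) >= observed_dev - 1e-12:
            hits += 1
    return observed, hits / total

# Invented dN/dS values, for illustration only (not the data behind the figure).
solitary = [0.080, 0.085, 0.090, 0.092, 0.095, 0.101]
eusocial = [0.098, 0.104, 0.107, 0.110, 0.112, 0.115,
            0.118, 0.121, 0.124, 0.127, 0.130, 0.133]
print(rank_sum_exact(solitary, eusocial))
```

      The test compares rank distributions rather than raw values, which makes it a reasonable choice for small groups of species where normality cannot be assumed.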

      • When functionalist biologists emphasise the role of the complexity of living things, I'm not sure they're thinking of the comparison between "drosophila" and "homo sapiens", but rather of a broader evolutionary scale. Which gives the impression of an exaggeration of the debate in the introduction.

      We disagree with the referee: in fact, the entire debate regarding the paradox of the absence of a relationship between the number of genes and organismal complexity arose from the comparative analysis of gene repertoires across metazoans. This debate started in the early 2000s, when the sequencing of the human genome revealed that it contains only ~20,000 protein-coding genes (far fewer than the ~100,000 genes that were expected at that time). This came as a big surprise because it showed that the gene repertoire of mammals is not larger than that of invertebrates such as Caenorhabditis elegans (19,000 genes) or Drosophila melanogaster (14,000 genes). We cite below several articles that illustrate how this paradox has been perceived by the scientific community:

      Graveley BR 2001 Alternative splicing: increasing diversity in the proteomic world. Trends in Genetics 17: 100–107. https://doi.org/10.1016/S0168-9525(00)02176-4

      “How can the genome of Drosophila melanogaster contain fewer genes than the undoubtedly simpler organism Caenorhabditis elegans?”

      Ewing B and Green P 2000 Analysis of expressed sequence tags indicates 35,000 human genes. Nature Genetics 25: 232–234. https://doi.org/10.1038/76115

      “the invertebrates Caenorhabditis elegans and Drosophila melanogaster having 19,000 and 13,600 genes, respectively. Here we estimate the number of human genes […] approximately 35,000 genes, substantially lower than most previous estimates. Evolution of the increased physiological complexity of vertebrates may therefore have depended more on the combinatorial diversification of regulatory networks or alternative splicing than on a substantial increase in gene number.”

      Kim E, Magen A and Ast G 2007 Different levels of alternative splicing among eukaryotes. Nucleic Acids Research 35: 125–131. https://doi.org/10.1093/nar/gkl924

      “we reveal that the percentage of genes and exons undergoing alternative splicing is higher in vertebrates compared with invertebrates. […] The difference in the level of alternative splicing suggests that alternative splicing may contribute greatly to the mammal higher level of phenotypic complexity,”

      Nilsen TW and Graveley BR 2010 Expansion of the eukaryotic proteome by alternative splicing. Nature 463: 457–463. https://doi.org/10.1038/nature08909

      “It is noteworthy that Caenorhabditis elegans, D. melanogaster and mammals have about 20,000 (ref. 68), 14,000 (ref. 69) and 20,000 (ref. 70) genes, respectively, but mammals are clearly much more complex than nematodes or flies.”

      Reviewer #2 (Public Review):

      Summary:

      Two hypotheses could explain the observation that genes of more complex organisms tend to undergo more alternative splicing. On one hand, alternative splicing could be adaptive since it provides the functional diversity required for complexity. On the other hand, increased rates of alternative splicing could result through nonadaptive processes since more complex organisms tend to have smaller effective population sizes and are thus more prone to deleterious mutations resulting in more spurious splicing events (drift-barrier hypothesis). To evaluate the latter, Bénitière et al. analyzed transcriptome sequencing data across 53 metazoan species. They show that proxies for effective population size and alternative splicing rates are negatively correlated. Furthermore, the authors find that rare, nonfunctional (and likely erroneous) isoforms occur more frequently in more complex species. Additionally, they show evidence that the strength of selection on splice sites increases with increasing effective population size and that the abundance of rare splice variants decreases with increased gene expression. All of these findings are consistent with the drift-barrier hypothesis.

      This study conducts a comprehensive set of separate analyses that all converge on the same overall result and the manuscript is well organized. Furthermore, this study is useful in that it provides a modified null hypothesis that can be used for future tests of adaptive explanations for variation in alternative splicing.

      Strengths:

      The major strength of this study lies in its complementary approach combining comparative and population genomics. Comparing evolutionary trends across phylogenetic diversity is a powerful way to test hypotheses about the origins of genome complexity. This approach alone reveals several convincing lines of evidence in support of the drift-barrier hypothesis. However, the authors also provide evidence from a population genetics perspective (using resequencing data for humans and fruit flies), making results even more convincing.

      The authors are forward about the study's limitations and explain them in detail. They elaborate on possible confounding factors as well as the issues with data quality (e.g. proxies for Ne, inadequacies of short reads, heterogeneity in RNA-sequencing data).

      Weaknesses:

      The authors primarily consider insects and mammals in their study. This only represents a small fraction of metazoan diversity. Sampling from a greater diversity of metazoan lineages would make these results and their relevance to broader metazoans substantially more convincing. Although the authors are careful about their tone, it is challenging to reconcile these results with trends across greater metazoans when the underlying dataset exhibits ascertainment bias and represents samples from only a few phylogenetic groups. Relatedly, some trends (such as Figure 1B-C) seem to be driven primarily by non-insect species, raising the question of whether some results may be primarily explained by specific phylogenetic groups (although the authors do correct for phylogeny in their statistics). How might results look if insects and mammals (or vertebrates) are considered independently?

      Following the referee’s suggestion, we investigated the relationship between AS rate and proxies of Ne, separately for insects and vertebrates (Supplementary Fig. 11). We observed that the relationship was consistent in vertebrates and insects: linear regressions show a positive correlation, significant (p<0.05) in all cases, except for body length in vertebrates. We added a sentence (line 166) to mention this point.

      Note that these analyses rely on smaller sample sizes, so we have less power to detect a signal. We therefore prefer to present the combined analyses, using PGLS to account for phylogenetic inertia.

      Throughout the manuscript, the authors refer to infrequently spliced (mode <5%) introns as "minor introns" and frequently spliced (mode >95%) as "major introns". This is extremely confusing since "minor introns" typically represent introns spliced by the U12 spliceosome, whereas "major introns" are those spliced by the U2 spliceosome.

      To avoid any confusion, we modified the terminology: we now refer to infrequently spliced introns as "minor-isoform introns" and frequently spliced as "major-isoform introns" (see lines 135-137). The entire manuscript (including the figures) has been modified accordingly.

      Furthermore, it remains unclear whether the study only considers major introns or both major and minor introns. Minor introns typically have AT-AC splice sites whereas major introns usually have GT/GC-AG splice sites, although in rare cases the U2 can recognize AT-AC (see Wu and Krainer 1997 for example).

      We modified the text (line 148-150) to clearly state that we studied all introns, both U2-type and U12-type.

      The authors also note that some introns show noncanonical AT-AC splice sites while these are actually canonical splice sites for minor introns.

      This is corrected (line 148).

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Figures 1, 3, and 4: I suggest that authors add regression lines.

      We added the regression lines with the “pgls” function from the R package “caper” (in Fig. 1, 3 and 4, and also in all other figures where we present correlations).

      Figure 2: As previously mentioned, the terms "minor introns" and "major introns" are extremely confusing. I strongly suggest the authors use different naming conventions.

      We changed the terminology:

      minor introns -> minor-isoform introns

      major introns -> major-isoform introns

      Figure 5: Intron-exon boundaries and splice site annotations are shown at the bottom of B, C, and D but not A. I suggest removing the annotation beneath B for consistency and since A+C and B+D are aligned on the x-axis.

      Corrected, it was a mistake.

      Figure 7: The yellow dotted line is very challenging to see in A.

      Corrected, the line has been widened.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public Review): 

      Summary: 

      In their manuscript entitled 'The domesticated transposon protein L1TD1 associates with its ancestor L1 ORF1p to promote LINE-1 retrotransposition', Kavaklıoğlu and colleagues delve into the role of L1TD1, an RNA binding protein (RBP) derived from a LINE1 transposon. L1TD1 proves crucial for maintaining pluripotency in embryonic stem cells and is linked to cancer progression in germ cell tumors, yet its precise molecular function remains elusive. Here, the authors uncover an intriguing interaction between L1TD1 and its ancestral LINE-1 retrotransposon. 

      The authors delete the DNA methyltransferase DNMT1 in a haploid human cell line (HAP1), inducing widespread DNA hypo-methylation. This hypomethylation prompts abnormal expression of L1TD1. To scrutinize L1TD1's function in a DNMT1 knock-out setting, the authors create DNMT1/L1TD1 double knock-out cell lines (DKO). Curiously, while the loss of global DNA methylation doesn't impede proliferation, additional depletion of L1TD1 leads to DNA damage and apoptosis.  

      To unravel the molecular mechanism underpinning L1TD1's protective role in the absence of DNA methylation, the authors dissect L1TD1 complexes in terms of protein and RNA composition. They unveil an association with the LINE-1 transposon protein L1-ORF1 and LINE-1 transcripts, among others.  

      Surprisingly, the authors note fewer LINE-1 retro-transposition events in DKO cells than in DNMT1 KO alone.  

      Strengths: 

      The authors present compelling data suggesting the interplay of a transposon-derived human RNA binding protein with its ancestral transposable element. Their findings spur interesting questions for cancer types, where LINE1 and L1TD1 are aberrantly expressed.  

      Weaknesses: 

      Suggestions for refinement:  

      The initial experiment, inducing global hypo-methylation by eliminating DNMT1 in HAP1 cells, is intriguing and warrants a more detailed description. How many genes experience misregulation or aberrant expression? What phenotypic changes occur in these cells? 

      This is an excellent suggestion. We have gene expression data on WT versus DNMT1 KO HAP1 cells and have now included them as Suppl. Figure S1. The transcriptome analysis of DNMT1 KO cells showed hundreds of deregulated genes upon DNMT1 ablation. As expected, the majority were up-regulated, and gene ontology analysis revealed that among the strongest up-regulated genes were gene clusters with functions in “regulation of transcription from RNA polymerase II promoter” and “cell differentiation” and genes encoding proteins with KRAB domains. In addition, the de novo methyltransferases DNMT3A and DNMT3B were up-regulated in DNMT1 KO cells, suggesting the set-up of compensatory mechanisms in these cells.

      Why did the authors focus on L1TD1? Providing some of this data would be helpful to understand the rationale behind the thorough analysis of L1TD1. 

      We have previously discovered that conditional deletion of the maintenance DNA methyltransferase DNMT1 in the murine epidermis results not only in the up-regulation of mobile elements, such as IAPs, but also in the induced expression of L1TD1 ([1], Suppl. Table 1 and Author response image 1). Similarly, L1TD1 expression was induced by treatment of primary human keratinocytes or squamous cell carcinoma cells with the DNMT inhibitor aza-deoxycytidine (Author response images 2 and 3). These findings are in accordance with the observation that inhibition of DNA methyltransferase activity by aza-deoxycytidine in human non-small cell lung cancer cells (NSCLCs) results in up-regulation of L1TD1 [2]. Our interest in L1TD1 was further fueled by reports on a potential function of L1TD1 as a prognostic tumor marker. We have included this information in the last paragraph of the Introduction in the revised manuscript.

      Author response image 1. RT-qPCR of L1TD1 expression in cultured murine control and Dnmt1 Δ/Δker keratinocytes. mRNA levels of L1td1 were analyzed in keratinocytes isolated at P5 from conditional Dnmt1 knockout mice [1]. Hprt expression was used for normalization of mRNA levels and wildtype control was set to 1. Data represent means ±s.d. with n=4. **P < 0.01 (paired t-test). 

      Author response image 2. RT-qPCR analysis of L1TD1 expression in primary human keratinocytes. Cells were treated with 5-aza-2'-deoxycytidine for 24 hours or 48 hours, with PBS for 48 hours or were left untreated. 18S rRNA expression was used for normalization of mRNA levels and PBS control was set to 1. Data represent means ±s.d. with n=3. **P < 0.01 (paired t-test).

      Author response image 3. Induced L1TD1 expression upon DNMT inhibition in squamous cell carcinoma cell lines SCC9 and SCCO12. Cells were treated with 5-aza-2'-deoxycytidine for 24 hours, 48 hours or 6 days. (A) Western blot analysis of L1TD1 protein levels using beta-actin as loading control. (B) Indirect immunofluorescence microscopy analysis of L1TD1 expression in SCC9 cells. Nuclear DNA was stained with DAPI. Scale bar: 10 µm. (C) RT-qPCR analysis of L1TD1 expression in primary human keratinocytes. Cells were treated with 5-aza-2'-deoxycytidine for 24 hours or 48 hours, with PBS for 48 hours or were left untreated. 18S rRNA expression was used for normalization of mRNA levels and PBS control was set to 1. Data represent means ±s.d. with n=3. *P < 0.05, **P < 0.01 (paired t-test).

      The finding that L1TD1/DNMT1 DKO cells exhibit increased apoptosis and DNA damage but decreased L1 retro-transposition is unexpected. Considering the DNA damage associated with retro-transposition and the DNA damage and apoptosis observed in L1TD1/DNMT1 DKO cells, one would anticipate the opposite outcome. Could it be that the observation of fewer transposition-positive colonies stems from the demise of the most transposition-positive colonies? Further exploration of this phenomenon would be intriguing. 

      This is an important point and we were aware of this potential problem. Therefore, we calibrated the retrotransposition assay by transfection with a blasticidin resistance gene vector to take into account potential differences in cell viability and blasticidin sensitivity. Thus, the observed reduction in L1 retrotransposition efficiency is not an indirect effect of reduced cell viability. We have added a corresponding clarification in the Results section on page 8, last paragraph. 

      Based on previous studies with hESCs and germ cell tumors [3], it is likely that, in addition to its role in retrotransposition, L1TD1 has further functions in the regulation of cell proliferation and differentiation. L1TD1 might therefore attenuate the effect of DNMT1 loss in KO cells, generating an intermediate phenotype (as pointed out by Reviewer 2), and simultaneous loss of both L1TD1 and DNMT1 results in more pronounced effects on cell viability. This is in agreement with the observation that a subset of L1TD1-associated transcripts encode proteins involved in the control of cell division and cell cycle. It is possible that subtle changes in the expression of these proteins that were not detected in our mass spectrometry approach contribute to the antiproliferative effect of L1TD1 depletion, as discussed in the Discussion section of the revised manuscript.

      Reviewer #2 (Public Review):           

      In this study, Kavaklıoğlu et al. investigated and presented evidence for the role of domesticated transposon protein L1TD1 in enabling its ancestral relative, L1 ORF1p, to retrotranspose in HAP1 human tumor cells. The authors provided insight into the molecular function of L1TD1 and shed some clarifying light on previous studies that showed somewhat contradictory outcomes surrounding L1TD1 expression. Here, L1TD1 expression was correlated with L1 activation in a hypomethylation-dependent manner, due to DNMT1 deletion in the HAP1 cell line. The authors then identified L1TD1-associated RNAs using RIP-Seq, which displays a disconnect between transcript and protein abundance (via Tandem Mass Tag multiplex mass spectrometry analysis). The one exception was for L1TD1 itself, which is consistent with a model in which the RNA transcripts associated with L1TD1 are not directly regulated at the translation level. Instead, the authors found the L1TD1 protein associated with L1-RNPs, and this interaction is associated with increased L1 retrotransposition, at least in the context of HAP1 cells. Overall, these results support a model in which L1TD1 is restrained by DNA methylation, but in the absence of this repressive mark, L1TD1 is expressed and collaborates with L1 ORF1p (either directly or through interaction with L1 RNA, which remains unclear based on current results), leading to enhanced L1 retrotransposition. These results establish the feasibility of this relationship existing in vivo in either development, disease, or both.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):        

      Major 

      (1) The study only used one knockout (KO) cell line generated by CRISPR/Cas9. Considering the possibility of an off-target effect, I suggest the authors attempt one or both of these suggestions. 

      A) Generate or acquire a similar DNMT1 deletion that uses distinct sgRNAs, so that the likelihood of off-targets is negligible. A few simple experiments such as qRT-PCR would be sufficient to suggest the same phenotype.

      B) Confirm the DNMT1 depletion also by siRNA/ASO KD to phenocopy the KO effect.

      (2) In addition to the strategies to demonstrate reproducibility, a rescue experiment restoring DNMT1 to the KO or KD cells would be more convincing. (Partial rescue would suffice in this case, as exact endogenous expression levels may be hard to replicate).

      We have undertaken several approaches to study the effect of DNMT1 loss or inactivation. As described above, we have generated a conditional KO mouse with ablation of DNMT1 in the epidermis. DNMT1-deficient keratinocytes isolated from these mice show a significant increase in L1TD1 expression. In addition, treatment of primary human keratinocytes and two squamous cell carcinoma cell lines with the DNMT inhibitor aza-deoxycytidine led to upregulation of L1TD1 expression. Thus, the derepression of L1TD1 upon loss of DNMT1 expression or activity is not a clonal effect. Also, the spectrum of RNAs identified in RIP experiments as L1TD1-associated transcripts in HAP1 DNMT1 KO cells showed a strong overlap with the RNAs isolated by a related yet different method in human embryonic stem cells. When it comes to the effect of L1TD1 on L1 retrotransposition, a recent study has reported a similar effect of L1TD1 upon overexpression in HeLa cells [4].

      Taken together, these points convince us that our findings with HAP1 DNMT1 KO cells are in agreement with results obtained in various other cell systems and are therefore not due to off-target effects. With that in mind, we would pursue the suggestion of Reviewer 1 to analyze the effects of DNA hypomethylation upon DNMT1 ablation.

      (3) As stated in the introduction, L1TD1 and ORF1p share "sequence resemblance" (Martin 2006). Is the L1TD1 antibody specific or do we see L1 ORF1p if Fig 1C were uncropped?

      (6) Is it possible the L1TD1 antibody binds L1 ORF1p? This could make Figure 2D somewhat difficult to interpret. Some validation of the specificity of the L1TD1 antibody would remove this concern (see minor concern below).

      This is a relevant question. We are convinced that the L1TD1 antibody does not cross-react with L1 ORF1p for the following reasons: Firstly, the antibody does not recognize L1 ORF1p (40 kDa) in the uncropped Western blot for Figure 1C (Author response image 4A). Secondly, the L1TD1 antibody gives only background signals in DKO cells in the indirect immunofluorescence experiment shown in Figure 1E of the manuscript.

      Thirdly, the immunogen sequence of L1TD1 that determines the specificity of the antibody was checked in the antibody data sheet from Sigma Aldrich. The corresponding epitope is not present in the L1 ORF1p sequence. Finally, we have shown that the ORF1p antibody does not cross-react with L1TD1 (Author response image 4B).

      Author response image 4. (A) Uncropped L1TD1 Western blot shown in Figure 1C. An unspecific band is indicated by an asterisk. (B) Western blot analysis of WT, KO and DKO cells with L1 ORF1p antibody.

      (4) In abstract (P2), the authors mentioned that L1TD1 works as an RNA chaperone, but in the result section (P13), they showed that L1TD1 associates with L1 ORF1p in an RNA-independent manner. Those conclusions appear contradictory. Clarification or revision is required.

      Our findings that both proteins bind L1 RNA, and that L1TD1 interacts with ORF1p are compatible with a scenario where L1TD1/ORF1p heteromultimers bind to L1 RNA. The additional presence of L1TD1 might thereby enhance the RNA chaperone function of ORF1p. This model is visualized now in Suppl. Figure S7C. 

      (5) Figure 2C fold enrichment for L1TD1 and ARMC1 is a bit difficult to fully appreciate. A 100 to 200-fold enrichment does not seem physiological. This appears to be a "divide by zero" type of result, as the CT for these genes was likely near 40 or undetectable. Another qRT-PCR-based approach (absolute quantification) would be a more revealing experiment.

      This is the validation of the RIP experiments, and the presentation mode was specifically developed for quantification of RIP assays (Sigma Aldrich RIP-qRT-PCR: Data Analysis Calculation Shell). The unspecific binding of the transcript in the absence of L1TD1 in DNMT1/L1TD1 DKO cells is set to 1, and the value in KO cells represents the specific binding relative to the unspecific binding. The calculation also corrects for potential differences in the abundance of the respective transcript in the two cell lines. This is not a physiological value but the quantification of specific binding of transcripts to L1TD1. GAPDH as a negative control shows no enrichment, whereas specifically associated transcripts show strong enrichment. We have explained the details of the RIP-qRT-PCR evaluation in Materials and Methods (page 14) and the legend of Figure 2C in the revised manuscript.
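
      The calculation described in this response can be sketched as follows. This is a minimal illustration that assumes the common ΔΔCt scheme for RIP-qRT-PCR (the IP Ct is normalized to the input Ct of the same cell line, then expressed relative to the DKO negative control). The function names, the 10% input fraction, and all Ct values are hypothetical; they are not taken from the manuscript or from the Sigma Aldrich calculation shell.

```python
import math


def delta_ct(ct_ip, ct_input, input_fraction=0.1):
    """Normalize the IP Ct to the input of the same cell line.

    The input Ct is first adjusted for the fraction of lysate kept as
    input (assumed to be 10% here). Normalizing to input corrects for
    differences in transcript abundance between cell lines.
    """
    adjusted_input = ct_input - math.log2(1.0 / input_fraction)
    return ct_ip - adjusted_input


def fold_enrichment(ko_ip, ko_input, dko_ip, dko_input, input_fraction=0.1):
    """Specific binding in KO cells relative to unspecific binding in DKO (= 1)."""
    ddct = (delta_ct(ko_ip, ko_input, input_fraction)
            - delta_ct(dko_ip, dko_input, input_fraction))
    return 2.0 ** (-ddct)


# Hypothetical Ct values, for illustration only.
# A transcript that is pulled down far more efficiently in KO than in DKO:
bound = fold_enrichment(ko_ip=24.0, ko_input=22.0, dko_ip=31.0, dko_input=22.0)
# A control transcript with identical pull-down in both cell lines:
control = fold_enrichment(ko_ip=30.0, ko_input=18.0, dko_ip=30.0, dko_input=18.0)
```

      With these made-up numbers, a difference of seven cycles between the KO and DKO immunoprecipitates at equal input gives an enrichment of about 128-fold, while the control transcript stays at 1. Under this scheme, fold values above 100 correspond to a Ct difference of roughly seven cycles between the specific and the background pull-down.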

      (6) Is it possible the L1TD1 antibody binds L1 ORF1p? This could make Figure 2D somewhat difficult to interpret. Some validation of the specificity of the L1TD1 antibody would remove this concern (see minor concern below).            

      See response to (3).  

      (7) Figure S4A and S4B: There appear to be a few unusual aspects of these figures that should be pointed out and addressed. First, there doesn't seem to be any ORF1p in the Input (if there is, the exposure is too low). Second, there might be some L1TD1 in the DKO (lane 2) and lane 3. This could be non-specific, but the size is concerning. Overexposure would help see this.

      The ORF1p IP gives rise to strong ORF1p signals in the immunoprecipitated complexes even after short exposure. Under these conditions, ORF1p is hardly detectable in the input. Regarding the faint band in DKO HAP1 cells, this might be due to a technical problem during Western blot loading. Therefore, the input samples were loaded again on a Western blot and analyzed for the presence of ORF1p, L1TD1 and beta-actin (as loading control); this blot is shown as a separate panel in Suppl. Figure S4A.

      (8) Figure S4C: This is related to our previous concerns involving antibody cross-reactivity. Figure 3E partially addresses this, where it looks like the L1TD1 "speckles" outnumber the ORF1p puncta, but overlap with all of them. This might be consistent with the antibody cross-reacting. The western blot (Figure 3C) suggests an upregulation of ORF1p by at least 2-3x in the DKO, but the IF image in 3E is hard to tell if this is the case (slightly more signal, but fewer foci). Can you return to the images and confirm the contrast is comparable? Can you massively overexpose the red channel in 3E to see if there is residual overlap?

      In Figure 3E the L1TD1 antibody gives no signal in DNMT1/L1TD1 DKO cells confirming that it does not recognize ORF1p. In agreement with the Western blot in Figure 3C the L1 ORF1p signal in Figure 3E is stronger in DKO cells. In DNMT1 KO cells the L1 ORF1p antibody does not recognize all L1TD1 speckles. This result is in agreement with the Western blot shown above in Figure R4B and indicates that the L1 ORF1p antibody does not recognize the L1TD1 protein. The contrast is comparable and after overexposure there are still L1TD1 specific speckles. This might be due to differences in abundance of the two proteins.

      (9) The choice of ARMC1 and YY2 is unclear. What are the criteria for the selection?

      ARMC1 was one of the top hits in a pilot RIP-seq experiment (IP versus input and IP versus  IgG IP). In the actual RIP-seq experiment with DKO HAP1 cells instead of IgG IP as a negative control, we found ARMC1 as an enriched hit, although it was not among the top 5 hits. The results from the 2nd RIP-seq further confirmed the validity of ARMC1 as an L1TD1-interacting transcript. YY2 was of potential biological relevance as an L1TD1 target due to the fact that it is a processed pseudogene originating from YY1 mRNA as a result of retrotransposition. This is mentioned on page 6 of the revised manuscript.

      (10) (P16) L1 is the only protein-coding transposon that is active in humans. This is perhaps too generalized of a statement as written. Other examples are readily found in the literature. Please clarify.  

      We will tone down this statement in the revised manuscript. 

      (11) In both the abstract and last sentence in the discussion section (P17), embryogenesis is mentioned, but this is not addressed at all in the manuscript. Please refrain from implying normal biological functions based on the results of this study unless appropriate samples are used to support them.

      Much of the published data on L1TD1 function are related to embryonic stem cells [3-7]. Therefore, it is important to discuss our findings in the context of previous reports.

      (12) Figure 3E: The format of Figures 1A and 3E are internally inconsistent. Please present similar data/images in a cohesive way throughout the manuscript.  

      We show now consistent IF Figures in the revised manuscript.

      Minor: 

      (1) Intro:           

      - Is L1TD1 in mice and humans? How "conserved" is it and does this suggest function?

      Murine and human L1TD1 proteins share 44% identity on the amino acid level and it was suggested that the corresponding genes were under positive selection during evolution with functions in transposon control and maintenance of pluripotency [8].  

      - Why HAP1? (Haploid?) The importance of this cell line is not clear.          

      HAP1 is a nearly haploid human cancer cell line derived from the KBM-7 chronic myelogenous leukemia (CML) cell line [9, 10]. Due to its haploidy, it is perfectly suited and widely used for loss-of-function screens and gene editing. After gene editing, cells can be used in the nearly haploid or in the diploid state. We usually perform all experiments with diploid HAP1 cell lines. Importantly, in contrast to other human tumor cell lines, this cell line tolerates ablation of DNMT1. We have included a corresponding explanation in the revised manuscript on page 5, first paragraph.

      - Global methylation status in DNMT1 KO? (Methylations near L1 insertions, for example?) 

      The HAP1 DNMT1 KO cell line with a 20 bp deletion in exon 4 used in our study was validated in the study by Smits et al. [11]. The authors report a significant reduction in overall DNA methylation. However, we are not aware of a DNA methylome study on this cell line. We show now data on the methylation of L1 elements in HAP1 cells and upon DNMT1 deletion in the revised manuscript in Suppl. Figure S1B.

      (2) Figure 1:  

      - Figure 1C. Why is LMNB used instead of Actin (Fig1D)?  

      We show now beta-actin as loading control in the revised manuscript.  

      - Figure 1G shows increased Caspase 3 in KO, while the matching sentence in the result section skips over this. It might be more accurate to mention this and suggest that the single KO has perhaps an intermediate phenotype (Figure 1F shows a slight but not significant trend). 

      We fully agree with the reviewer and have changed the sentence on page 6, 2nd paragraph accordingly.  

      - Would 96 hrs trend closer to significance? An interpretation is that L1TD1 loss could speed up this negative consequence. 

      We thank the reviewer for the suggestion. We have performed a time course experiment with 6 biological replicates for each time point up to 96 hours and found significant changes in viability upon loss of DNMT1 and a further significant reduction in viability upon additional loss of L1TD1 (shown in Figure 1F). These data suggest that, as expected, loss of DNMT1 leads to a significant reduction in viability and that additional ablation of L1TD1 further enhances this effect.

      - What are the "stringent conditions" used to remove non-specific binders and artifacts (negative control subtraction?) 

      Yes, we considered only hits from both analyses, L1TD1 IP in KO versus input and L1TD1 IP in KO versus L1TD1 IP in DKO. This is now explained in more detail in the revised manuscript on page 6, 3rd paragraph.  

      (3) Figure 2:  

      - Figure 2A is a bit too small to read when printed. 

      We have changed this in the revised manuscript.

      - Since WT and DKO lack detectable L1TD1, would you expect any difference in RIP-Seq results between these two?

      Due to the lack of DNMT1 and the resulting DNA hypomethylation, DKO cells are more similar to KO cells than WT cells with respect to the expressed transcripts.

      - Legend says selected dots are in green (it appears blue to me). 

      We have changed this in the revised manuscript.           

      - Would you recover L1 ORF1p and its binding partners in the KO? (Is the antibody specific in the absence of L1TD1 or can it recognize L1?) I noticed an increase in ORF1p in the KO in Figure 3C.  

      Thank you for the suggestion. Yes, L1 ORF1p shows slightly increased expression in the proteome analysis and we have marked the corresponding dot in the Volcano plot (Figure 3A).

      - Should the figure panel reference near the (Rosspopoff & Trono) reference instead be Sup S1C as well? Otherwise, I don't think S1C is mentioned at all. 

      - What are the red vs. green dots in 2D? Can you highlight ERV and ALU with different colors? 

      We added the reference to Suppl. Figure S1C (now S3C) in the revised manuscript. In Figure 2D L1 elements are highlighted in green, ERV elements in yellow, and other associated transposon transcripts in red.     

      - Which L1 subfamily from Figure 2D is represented in the qRT-PCR in 2E "LINE-1"? Do the primers match a specific L1 subfamily? If so, which? 

      We used primers specific for the human L1.2 subfamily. 

      - Pulling down SINE element transcripts makes some sense, as many insertions "borrow" L1 sequences for non-autonomous retro transposition, but can you speculate as to why ERVs are recovered? There should be essentially no overlap in sequence. 

      In the L1TD1 evolution paper [8], a potential link between L1TD1 and ERV elements was discussed: 

      "Alternatively, L1TD1 in sigmodonts could play a role in genome defense against another element active in these genomes. Indeed, the sigmodontine rodents have a highly active family of ERVs, the mysTR elements [46]. Expansion of this family preceded the death of L1s, but these elements are very active, with 3500 to 7000 species-specific insertions in the L1-extinct species examined [47]. This recent ERV amplification in Sigmodontinae contrasts with the megabats (where L1TD1 has been lost in many species); there are apparently no highly active DNA or RNA elements in megabats [48]. If L1TD1 can suppress retroelements other than L1s, this could explain why the gene is retained in sigmodontine rodents but not in megabats." 

      Furthermore, Jin et al. report the binding of L1TD1 to repetitive sequences in transcripts [12]. It is possible that some of these sequences are also present in ERV RNAs.

      - Is S2B a screenshot? (the red underline). 

      No, it is a PowerPoint figure, and we have removed the red underline.

      (4) Figure 3: 

      - Text refers to Figure 3B as a western blot. Figure 3B shows a volcano plot. This is likely 3C but would still be out of order (3A>3C>3B referencing). I think this error is repeated in the last result section. 

      - Figure and legends fail to mention what gene was used for ddCT method (actin, gapdh, etc.). 

      - In general, the supplemental legends feel underwritten and could benefit from additional explanations. (Main figures are appropriate but please double-check that all statistical tests have been mentioned correctly).

      Thank you for pointing this out. We have corrected these errors in the revised manuscript.

      (5) Discussion: 

      - The AluY connection is interesting. Is there an "Alu retrotransposition reporter assay" to test whether L1TD1 enhances this as well?

      Thank you for the suggestion. There is indeed an Alu retrotransposition reporter assay reported by Dewannieux et al. [13]. The assay is based on a Neo selection marker. We have previously tested a Neo selection-based L1 retrotransposition reporter assay, but this system failed to work properly in HAP1 cells; therefore, we switched to a blasticidin-based L1 retrotransposition reporter assay. A corresponding blasticidin-based Alu retrotransposition reporter assay might be interesting for future studies (mentioned in the Discussion, page 11, paragraph 4 of the revised manuscript).

      (6) Material and Methods:

      - The number of typos in the materials and methods is too numerous to list. Instead, please refer to the next section that broadly describes the issues seen throughout the manuscript. 

      Writing style  

      (1) Keep a consistent style throughout the manuscript: for example, L1 or LINE-1 (also L1 ORF1p or LINE-1 ORF1p); per or "/"; knockout or knock-out; min or minute; 3 times or three times; media or medium. Additionally, as TE naming conventions are not uniform, it is important to maintain internal consistency so as to not accidentally establish an imprecise version. 

      (2) There's a period between "et al" and the comma, and "et al." should be italic. 

      (3) The authors should explain what the key jargon is when it is first used in the manuscript, such as "retrotransposon" and "retrotransposition".    

      (4) The authors should show the full spelling of some acronyms when they use it for the first time, such as RNA Immunoprecipitation (RIP).  

      (5) Use a space between numbers and alphabets, such as 5 µg.  

      (6) 2.0 × 10⁵ cells, that's not an "x".

      (7) Numbers in the reference section are lacking (hard to parse).  

      (8) In general, there are a significant number of typos in this draft which at times becomes distracting. For example, (P3) Introduction: Yet, co-option of TEs thorough (not thorough, it should be through) evolution has created so-called domesticated genes beneficial to the gene network in a wide range of organisms. Please carefully revise the entire manuscript for these minor issues that collectively erode the quality of this submission.  

      Thank you for pointing out these mistakes. We have corrected them in the revised manuscript. A native speaker from our research group has carefully checked the paper. In summary, we have added Supplementary Figure S7C and have changed Figures 1C, 1E, 1F, 2A, 2D, 3A, 4B, S3A-D, S4B and S6A based on these comments. 

      REFERENCES

      (1) Beck, M.A., et al., DNA hypomethylation leads to cGAS-induced autoinflammation in the epidermis. EMBO J, 2021. 40(22): p. e108234.

      (2) Altenberger, C., et al., SPAG6 and L1TD1 are transcriptionally regulated by DNA methylation in non-small cell lung cancers. Mol Cancer, 2017. 16(1): p. 1.

      (3) Narva, E., et al., RNA-binding protein L1TD1 interacts with LIN28 via RNA and is required for human embryonic stem cell self-renewal and cancer cell proliferation. Stem Cells, 2012. 30(3): p. 452-60.

      (4) Jin, S.W., et al., Dissolution of ribonucleoprotein condensates by the embryonic stem cell protein L1TD1. Nucleic Acids Res, 2024. 52(6): p. 3310-3326.

      (5) Emani, M.R., et al., The L1TD1 protein interactome reveals the importance of posttranscriptional regulation in human pluripotency. Stem Cell Reports, 2015. 4(3): p. 519-28.

      (6) Santos, M.C., et al., Embryonic Stem Cell-Related Protein L1TD1 Is Required for Cell Viability, Neurosphere Formation, and Chemoresistance in Medulloblastoma. Stem Cells Dev, 2015. 24(22): p. 2700-8.

      (7) Wong, R.C., et al., L1TD1 is a marker for undifferentiated human embryonic stem cells. PLoS One, 2011. 6(4): p. e19355.

      (8) McLaughlin, R.N., Jr., et al., Positive selection and multiple losses of the LINE-1-derived L1TD1 gene in mammals suggest a dual role in genome defense and pluripotency. PLoS Genet, 2014. 10(9): p. e1004531.

      (9) Andersson, B.S., et al., Ph-positive chronic myeloid leukemia with near-haploid conversion in vivo and establishment of a continuously growing cell line with similar cytogenetic pattern. Cancer Genet Cytogenet, 1987. 24(2): p. 335-43.

      (10) Carette, J.E., et al., Ebola virus entry requires the cholesterol transporter Niemann-Pick C1. Nature, 2011. 477(7364): p. 340-3.

      (11) Smits, A.H., et al., Biological plasticity rescues target activity in CRISPR knock outs. Nat Methods, 2019. 16(11): p. 1087-1093.

      (12) Jin, S.W., et al., Dissolution of ribonucleoprotein condensates by the embryonic stem cell protein L1TD1. Nucleic Acids Res, 2024.

      (13) Dewannieux, M., C. Esnault, and T. Heidmann, LINE-mediated retrotransposition of marked Alu sequences. Nat Genet, 2003. 35(1): p. 41-8.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors provide a new computational platform called Vermouth to automate topology generation, a crucial step that any biomolecular simulation starts with. Given the wide range of chemical structures that need to be simulated, the varying quality of structural models obtained as inputs from various sources, and the diverse force fields and molecular dynamics engines employed for simulations, automation of this fundamental step is challenging, especially for complex systems and when there is a need to conduct high-throughput simulations in computer-aided drug design (CADD). To overcome this challenge, the authors develop a programming library composed of components that carry out various types of fundamental functionalities that are commonly encountered in topology generation. These components are intended to be general for any type of molecule and not to depend on any specific force field or MD engine. To demonstrate the applicability of this library, the authors employ those components to re-assemble a pipeline called Martinize2 used in topology generation for simulations with a widely used coarse-grained (CG) model, MARTINI. This pipeline can fully recapitulate the functionality of its original version, Martinize, but exhibits greatly enhanced generality, as confirmed by the ability of the pipeline to faithfully generate topologies for two high-complexity benchmarking sets of proteins.

      Strengths:

      The main strength of this work is the use of concepts and algorithms associated with induced subgraph in graph theory to automate several key but non-trivial steps of topology generation such as the identification of monomer residue units (MRU), the repair of input structures with missing atoms, the mapping of topologies between different resolutions, and the generation of parameters needed for describing interactions between MRUs. In addition, the documentation website provided by the authors is very informative, allowing users to get quickly started with Vermouth.
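
      The induced-subgraph idea mentioned here can be illustrated with a toy example. This is a simplified sketch and not Vermouth's API: it matches atoms by name instead of solving a subgraph isomorphism problem, and the template, atom names, and function names are hypothetical.

```python
def induced_subgraph(edges, nodes):
    """Return the edges of the subgraph induced by `nodes`.

    An induced subgraph keeps every edge of the parent graph whose two
    endpoints are both in the selected node set.
    """
    keep = set(nodes)
    return {frozenset(edge) for edge in edges if set(edge) <= keep}


# Toy reference block for a glycine-like backbone (heavy atoms only).
# Atom names and bonds are illustrative, not a Vermouth data structure.
TEMPLATE_NODES = {"N", "CA", "C", "O"}
TEMPLATE_EDGES = [("N", "CA"), ("CA", "C"), ("C", "O")]


def missing_atoms(found_nodes, found_edges):
    """Compare an input residue to the template and report atoms to rebuild.

    Returns None when the input is not an induced subgraph of the template,
    for example when it contains an unknown atom or an unexpected bond.
    """
    found = set(found_nodes)
    if not found <= TEMPLATE_NODES:
        return None
    expected = induced_subgraph(TEMPLATE_EDGES, found)
    if {frozenset(edge) for edge in found_edges} != expected:
        return None
    return TEMPLATE_NODES - found


# A residue lacking its carbonyl oxygen is recognized and flagged for repair.
incomplete = missing_atoms({"N", "CA", "C"}, [("N", "CA"), ("CA", "C")])
# A residue with an unexpected N-C bond does not match the template.
wrong_bond = missing_atoms({"N", "CA", "C"}, [("N", "C"), ("CA", "C")])
```

      In this sketch, an incomplete residue still matches because its bonds equal the template bonds restricted to the atoms that are present, which is what makes the missing atoms identifiable.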

      Weaknesses:

      Although the Vermouth library is designed as a general tool for topology generation for molecular simulations, only its applications with MARTINI have been demonstrated in the current study. Thus, the claimed generality of Vermouth remains to be examined. The authors may consider pointing this out in their manuscript.

      In order to demonstrate generality of the here proposed concepts for generating topologies for molecular dynamics simulations, we have now implemented and tested a workflow that will produce topologies for the popular CHARMM36 all-atom force field. To facilitate generation of all-atom topologies with Martinize2 a .rtp reader was introduced, which allows users to provide .rtp files that are the native GROMACS topology files for proteins instead of .ff files. These .rtp files exist for all major atomic protein forcefields. In addition, for CHARMM36 we also included modification files, which describe non-standard pH amino acids, histidine tautomers, and end terminal modifications. Thus, the current implementation unlocks all features available at the CG Martini level also for CHARMM36. We note that users must add the modifications files for other all-atom force fields e.g. AMBER.

      We have added a new item in the main manuscript (p28) briefly describing this proof-of-concept implementation. However, we would like to point out that there are many specialized tools for the various force fields adopted by the respective communities. Thus, an exhaustive discussion on the capabilities of Martinize2 for all-atom force fields seemed out of place.

      Reviewer #2 (Public Review):

      This work introduces a Vermouth library framework to enhance software development within the Martini community. Specifically, it presents a Vermouth-powered program, Martinize2, for generating coarse-grained structures and topologies from atomistic structures. In addition to introducing the Vermouth library and the Martinize2 program, this paper illustrates how Martinize2 identifies atoms, maps them to the Martini model, generates topology files, and identifies protonation states or post-translational modifications. Compared with the prior version, the authors provide a new figure to show that Martinize2 can be applied to various molecules, such as proteins, cofactors, and lipids. To demonstrate the general application, Martinize2 was used for converting 73% of 87,084 protein structures from the template library, with failed cases primarily blamed on missing coordinates.

      I was hoping to see some fundamental changes in the resubmitted version. To my disappointment, the manuscript remains largely unchanged (even the typo I pointed out previously was not fixed). I do not doubt that Martinize2 and Vermouth are useful to the Martini community, and this paper will have some impact. The manuscript is very technical and limited to the Martini community. The scientific insight for the general coarse-grained modeling community is unclear. The goal of the work is ambitious (such as high-throughput simulations and whole-cell modeling), but the results show just a validation of Martinize2. This version does not reverse my previous impression that it is incremental. As I pointed out in my previous review (and no response from the authors), all the issues associated with the Martini model are still there, e.g. the need for ENM. In this shape, I feel this manuscript is suitable for a specialized journal in computational biophysics or stays as part of the GitHub repository.

      We apologize for not fixing the typo; it was fixed but unfortunately got reintroduced in the final resubmitted version. We politely disagree that the goal of the work itself is high-throughput simulations and whole-cell modeling, but the Martinize2 tool is certainly an important element in our ambitions to achieve this. Given the broad interest in these goals by the modeling community in general, we believe this work has a much wider impact beyond the (already large) group of Martini users. Addressing limitations of the Martini model itself, which are certainly there, is clearly not the scope of the current work.

      Reviewer #3 (Public Review):

      The manuscript by Kroon et al. describes two algorithms, which when combined achieve high-throughput automation of "martinizing" protein structures with selected protonation states and post-translational modifications. After the revisions provided by the authors, I recommend minor revision.

      The authors have addressed most of my concerns provided previously. Specifically, showcasing the capability of coarse-graining other types of molecules (Figure 7) is a useful addition, especially for the booming field of therapeutic macrocycles. My only additional concern is that to justify Martinize2 and Vermouth as a "high-throughput" method, the speed of these tools needs to be addressed in some form in the manuscript as a guideline to users.

      We have added some benchmark timings in the manuscript SI and pointed to the data in the discussion part, which addresses the timing. Martinize2 is certainly slower than martinize version 1 as we already pointed out in the previous versions. However, even for larger proteins (> 2000 residues) we are able to generate topologies in about 60s. As Martinize2 runs on a single core, it can be massively parallelized. Keeping this in mind the topology file generation is likely to take up only a fraction in a high-throughput pipeline compared to the more costly simulations themselves.
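
      The scaling argument can be made concrete with a rough estimate. This sketch assumes fully independent single-core jobs and uses the roughly 60 s figure quoted for large proteins as a pessimistic per-structure cost; the structure count is the size of the template library mentioned in the reviews, and the core count is a hypothetical example.

```python
import math


def wall_time_seconds(n_structures, seconds_per_structure, n_cores):
    """Upper-bound wall time when independent single-core jobs run in batches."""
    batches = math.ceil(n_structures / n_cores)
    return batches * seconds_per_structure


# 87,084 structures at a pessimistic 60 s each.
serial = wall_time_seconds(87_084, 60, 1)        # one core
parallel = wall_time_seconds(87_084, 60, 1_000)  # hypothetical 1,000 cores

print(f"serial:   {serial / 86_400:.1f} days")
print(f"parallel: {parallel / 60:.0f} minutes")
```

      Under these assumptions the serial run takes about 60 days, while 1,000 cores bring it down to about 88 minutes, which is small compared with the simulations that would follow.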

    1. Author Response:

      The following is the authors' response to the original reviews.

      Reply to Public Reviews:

      Reply to Reviewer #1:

      This is a carefully performed and well-documented study to indicate that the FUS protein interacts with the GGGGCC repeat sequence in Drosophila fly models, and the mechanism appears to include modulating the repeat structure and mitigating RAN translation. They suggest FUS, as well as a number of other G-quadruplex binding RNA proteins, are RNA chaperones, meaning they can alter the structure of the expanded repeat sequence to modulate its biological activities.

      Response: We would like to thank the reviewer for her/his time in evaluating our manuscript. We are very happy that the reviewer highly appreciated our manuscript.

      1. Overall this is a nicely done study with nice quantitation. It remains somewhat unclear from the data and discussions in exactly what way the authors mean that FUS is an RNA chaperone: is FUS changing the structure of the repeat or does FUS binding prevent it from folding into alternative in vivo structure?

      Response: We appreciate the reviewer’s constructive comments. Indeed, we showed that FUS changes the higher-order structures of GGGGCC [G4C2] repeat RNA in vitro, and that FUS suppresses G4C2 RNA foci formation in vivo. According to the established definition, RNA chaperones are proteins that change the structures of misfolded RNAs without using ATP, thereby maintaining proper RNA folding (Rajkowitsch et al., 2007). Thus, we consider FUS to be an RNA chaperone. To clarify these interpretations, we revised the manuscript as follows.

      (1) On page 10, line 215-219, the sentence “These results were in good agreement with our previous study on SCA31 showing the suppressive effects of FUS and other RBPs on RNA foci formation of UGGAA repeat RNA as RNA chaperones …” was changed to “These results were in good agreement with … RNA foci formation of UGGAA repeat RNA through altering RNA structures and preventing aggregation of misfolded repeat RNA as RNA chaperones …”.

      (2) On page 17, line 363-366, the sentence “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure, as evident by CD and NMR analyses (Figure 5), suggesting its functional role as an RNA chaperone.” was changed to “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure as evident by CD and NMR analyses (Figure 5, Figure 5—figure supplement 2), and suppresses RNA foci formation in vivo (Figures 3A and 3B), suggesting its functional role as an RNA chaperone.”

      Reply to Reviewer #2:

      Fujino et al. provide interesting data describing the RNA-binding protein, FUS, for its ability to bind the RNA produced from the hexanucleotide repeat expansion of GGGGCC (G4C2). This binding correlates with reductions in the production of toxic dipeptides and reductions in toxic phenotypes seen in (G4C2)30+ expressing Drosophila. Both FUS and G4C2 repeats of >25 are associated with ALS/FTD spectrum disorders. Thus, these data are important for increasing our understanding of potential interactions between multiple disease genes. However, further validation of some aspects of the provided data is needed, especially the expression data.

      Response: We would like to thank the reviewer for her/his time for evaluating our manuscript and also for her/his important comments that helped to strengthen our manuscript.

      Some points to consider when reading the work:

      1. The broadly expressed GMR-GAL4 driver leads to variable tissue loss in different genotypes, potentially confounding downstream analyses dependent on viable tissue/mRNA levels.

      Response: We thank the reviewer for this constructive comment. In the RT-qPCR experiments (Figures 1E, 3C, 4G, 6D and Figure 1—figure supplement 1C), the amounts of G4C2 repeat transcripts were normalized to those of gal4 transcripts expressed in the same tissue, to avoid potential confounding derived from the difference in tissue viability between genotypes, as the reviewer pointed out. To clarify this process, we have made the following change to the revised manuscript.

      (1) On page 30, line 548-550, the sentence “The amounts of G4C2 repeat transcripts were normalized to those of gal4 transcripts in the same sample” was changed to “The amounts of G4C2 repeat transcripts were normalized to those of gal4 transcripts expressed in the same tissue to avoid potential confounding derived from the difference in tissue viability between genotypes”.
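
      The reasoning behind this normalization can be sketched numerically. The example assumes a simple ΔCt quantification with ideal PCR efficiency, which the response does not specify; the function name and Ct values are hypothetical.

```python
def relative_level(ct_target, ct_reference):
    """Target level relative to a reference transcript, assuming ~100% PCR efficiency."""
    return 2.0 ** (ct_reference - ct_target)


# Hypothetical Ct values: G4C2 repeat transcript vs. gal4 transcript.
intact = relative_level(ct_target=25.0, ct_reference=23.0)

# If half of the expressing tissue is lost, both templates are halved,
# which shifts both Ct values up by one cycle.
degenerated = relative_level(ct_target=26.0, ct_reference=24.0)
```

      Both calls return the same ratio, because a change in the amount of surviving tissue shifts the target and the gal4 reference equally. A difference in the normalized value therefore reflects a change in the repeat transcript itself rather than in tissue viability.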

      2. The relationship between FUS and foci formation is unclear and should be interpreted carefully.

      Response: We appreciate the reviewer’s important comment. We apologize for the lack of clarity. We showed the relationship between FUS and RNA foci formation in our C9-ALS/FTD flies: FUS suppresses RNA foci formation (Figures 3A and 3B), whereas knockdown of endogenous caz, a Drosophila homologue of FUS, conversely enhances it (Figures 4E and 4F). We consider that FUS suppresses RNA foci formation by altering RNA structures and preventing aggregation of misfolded G4C2 repeat RNA as an RNA chaperone. To clarify these interpretations, we revised the manuscript as follows.

      (1) On page 10, line 215-219, the sentence “These results were in good agreement with our previous study on SCA31 showing the suppressive effects of FUS and other RBPs on RNA foci formation of UGGAA repeat RNA as RNA chaperones …” was changed to “These results were in good agreement with … RNA foci formation of UGGAA repeat RNA through altering RNA structures and preventing aggregation of misfolded repeat RNA as RNA chaperones …”.

      (2) On page 17, line 363-366, the sentence “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure, as evident by CD and NMR analyses (Figure 5), suggesting its functional role as an RNA chaperone.” was changed to “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure as evident by CD and NMR analyses (Figure 5, Figure 5—figure supplement 2), and suppresses RNA foci formation in vivo (Figures 3A and 3B), suggesting its functional role as an RNA chaperone.”

      Reply to Reviewer #3:

      In this manuscript Fujino and colleagues used C9-ALS/FTD fly models to demonstrate that FUS modulates the structure of (G4C2) repeat RNA as an RNA chaperone, and regulates RAN translation, resulting in the suppression of neurodegeneration in C9-ALS/FTD. They also confirmed that FUS preferentially binds to and modulates the G-quadruplex structure of (G4C2) repeat RNA, followed by the suppression of RAN translation. The potential significance of these findings is high since C9ORF72 repeat expansion is the most common genetic cause of ALS/FTD, especially in Caucasian populations and the DPR proteins have been considered the major cause of the neurodegenerations.

      Response: We would like to thank the reviewer for her/his time for evaluating our manuscript. We are grateful to the reviewer for the insightful comments, which were very helpful for us to improve the manuscript.

      1. While the effect of RBP as an RNA chaperone on (G4C2) repeat expansion is supposed to be dose-dependent according to (G4C2)n RNA expression, the first experiment of the screening for RBPs in C9-ALS/FTD flies lacks this concept. It is uncertain if the RBPs of the groups "suppression (weak)" and "no effect" were less or no ability of RNA chaperone or if the expression of the RBP was not sufficient, and if the RBPs of the group "enhancement" exacerbated the toxicity derived from (G4C2)89 RNA or the expression of the RBP was excessive. The optimal dose of any RBPs that bind to (G4C2) repeats may be able to neutralize the toxicity without the reduction of (G4C2)n RNA.

      Response: We appreciate the reviewer’s constructive comments. We employed site-directed transgenesis to establish the RBP fly lines, to ensure equivalent expression levels of the inserted transgenes. We also evaluated the toxic effects of the overexpressed RBPs themselves by crossing them with control EGFP flies, as shown in Figure 1A. To clarify these points, we have made the following changes to the revised manuscript.

      (1) On page 8, line 166-168, the sentence “The variation in the effects of these G4C2 repeat-binding RBPs on G4C2 repeat-induced toxicity may be due to their different binding affinities to G4C2 repeat RNA, and their different roles in RNA metabolism.” was changed to “The variation in the effects of these G4C2 repeat-binding RBPs on G4C2 repeat-induced toxicity may be due to their different binding affinities to G4C2 repeat RNA, and the different toxicity of overexpressed RBPs themselves.”.

      (2) On page 29, line 519-522, the sentence “By employing site-specific transgenesis using the pUASTattB vector, each transgene was inserted into the same locus of the genome, and was expected to be expressed at the equivalent levels.” was added.

      2. In relation to issue 1, the rescue effect of FUS on the fly expressing (G4C2)89 (FUS-4) in Figure 4-figure supplement 1 seems weaker than the other flies expressing both FUS and (G4C2)89 in Figure 1 and Figure 1-figure supplement 2. The expression level of both FUS protein and (G4C2)89 RNA in each line is important from the viewpoint of therapeutic strategy for C9-ALS/FTD.

      Response: We appreciate the reviewer’s important comment. The FUS-4 transgene is expected to be expressed at the equivalent level to the FUS-3 transgene, since they are inserted into the same locus of the genome by the site-directed transgenesis. Thus, we suppose that the weaker suppressive effect of FUS-4 coexpression on G4C2 repeat-induced eye degeneration can be attributed to the C-terminal FLAG tag that is fused to FUS protein expressed in FUS-4 fly line. Since the caz fly expresses caz protein also fused to FLAG tag at the C-terminus, we used this FUS-4 fly line to directly compare the effect of caz on G4C2 repeat-induced toxicity to that of FUS.

      3. While hallmarks of C9ORF72 are the presence of DPRs and the repeat-containing RNA foci, the loss of function of C9ORF72 is also considered to somehow contribute to neurodegeneration. It is unclear if FUS reduces not only the DPRs but also the protein expression of C9ORF72 itself.

      Response: We thank the reviewer for this comment. We agree that not only DPRs, but also toxic repeat RNA and the loss-of-function of C9ORF72 jointly contribute to the pathomechanisms of C9-ALS/FTD. Since Drosophila has no homolog of the human C9orf72 gene, the effect of FUS on C9orf72 expression cannot be assessed in this model. Our fly models are useful for evaluating toxic gain-of-function pathomechanisms such as RNA foci formation and RAN translation, and the association between FUS and loss-of-function of C9ORF72 is beyond the scope of this study.

      4. In Figure 5E-F, it cannot be distinguished whether FUS binds to GGGGCC repeats or the 5' flanking region. The same experiment should be done by using FUS-RRMmut to elucidate whether FUS binding is the major mechanism for this translational control. Authors should show that FUS binding to long GGGGCC repeats is important for RAN translation.

      Response: We would like to thank the reviewer for these insightful comments. Following the reviewer’s suggestion, we performed the in vitro translation assay again using FUS-RRMmut, which has lost the ability to bind G4C2 repeat RNA as shown by the filter binding assay (Figure 5A), instead of BSA. The results are shown in the Western blot analysis below. The addition of FUS to the translation system suppressed the expression levels of GA-Myc efficiently, whereas that of FUS-RRMmut did not. FUS decreased the expression level of GA-Myc at concentrations as low as 10 nM, and nearly eliminated RAN translation activity at 100 nM. At 400 nM, FUS-RRMmut weakly suppressed the GA-Myc expression levels, probably because of residual RNA-binding activity. These results suggest that FUS suppresses RAN translation in vitro through direct interactions with G4C2 repeat RNA.

      Unfortunately, RAN translation from short G4C2 repeat RNA was not investigated in our translation system, although the previous study reported the low efficacy of RAN translation from short G4C2 repeat RNA (Green et al., 2017).

      Author response image 1.

      (A) Western blot analysis of the GA-Myc protein in the samples from in vitro translation. (B) Quantification of the GA-Myc protein levels.

      We have made the following changes to the revised manuscript.

      (1) Figure 5F was replaced with new Figures 5F and 5G.

      (2) On page 14-15, line 326-330, the sentence “Notably, the addition of FUS to this system decreased the expression level of GA-Myc in a dose-dependent manner, whereas the addition of the control bovine serum albumin (BSA) did not (Figure 5F).” was changed to “Notably, upon the addition to this translation system, FUS suppressed RAN translation efficiently, whereas FUS-RRMmut did not. FUS decreased the expression levels of GA-Myc at as low as 10nM, and nearly eliminated RAN translation activity at 100nM. At 400nM, FUS-RRMmut weakly suppressed the GA-Myc expression levels probably because of the residual RNA-binding activity (Figure 5F and 5G).”.

      (3) On page 15, line 330-332, the sentence “Taken together, these results indicate that FUS suppresses RAN translation from G4C2 repeat RNA in vitro as an RNA chaperone.” was changed to “Taken together, these results indicate that FUS suppresses RAN translation in vitro through direct interactions with G4C2 repeat RNA as an RNA chaperone.”.

      (4) On page 37, line 720-723, the sentence “For preparation of the FUS protein, the human FUS (WT) gene flanked at the 5′ end with an NdeI recognition site and at the 3′ end with a XhoI recognition site was amplified by PCR from pUAST-FUS.” was changed to “For preparation of the FUS proteins, the human FUS (WT) and FUS-RRMmut genes flanked at the 5′ end with an NdeI recognition site and at the 3′ end with a XhoI recognition site was amplified by PCR from pUAST-FUS and pUAST-FUS-RRMmut, respectively.”.

      (5) On page 41, line 816-819, the sentence “FUS or BSA at each concentration (10, 100, and 1,000 nM) was added for translation in the lysate.” was changed to “FUS or FUS-RRMmut at each concentration (10, 100, 200, 400, and 1,000 nM) was preincubated with mRNA for 10 min to facilitate the interaction between FUS protein and G4C2 repeat RNA, and added for translation in the lysate.”.

      5. It is not possible to conclude, as the authors have, that G-quadruplex-targeting RBPs are generally important for RAN translation (Figure 6), without showing whether RBPs that do not affect (G4C2)89 RNA levels lead to decreased DPR protein level or RNA foci.

      Response: We appreciate the reviewer’s critical comment. Following the reviewer’s suggestion, we evaluated the effect of these G-quadruplex-targeting RBPs on RAN translation. We additionally performed immunohistochemistry of the eye imaginal discs of fly larvae expressing (G4C2)89 and these G-quadruplex-targeting RBPs. As shown in the immunohistochemistry figures below, we found that coexpression of EWSR1, DDX3X, DDX5, and DDX17 significantly decreased the number of poly(GA) aggregates. The results suggest that these G-quadruplex-targeting RBPs regulate RAN translation, as FUS does.

      Author response image 2.

      (A) Immunohistochemistry of poly(GA) in the eye imaginal discs of fly larvae expressing (G4C2)89 and the indicated G-quadruplex-targeting RBPs. (B) Quantification of the number of poly(GA) aggregates.

      We have made the following changes to the revised manuscript.

      (1) Figures 6E and 6F were added.

      (2) On page 6-7, line 135-137, the sentence “In addition, other G-quadruplex-targeting RBPs also suppressed G4C2 repeat-induced toxicity in our C9-ALS/FTD flies.” was changed to “In addition, other G-quadruplex-targeting RBPs also suppressed RAN translation and G4C2 repeat-induced toxicity in our C9-ALS/FTD flies.”.

      (3) On page 15, line 344-346, the sentence “As expected, these RBPs also decreased the number of poly(GA) aggregates in the eye imaginal discs (Figures 6E and 6F).” was added.

      (4) On page 15, line 346-347, the sentence “Their effects on G4C2 repeat-induced toxicity and repeat RNA expression were consistent with those of FUS.” was changed to “Their effects on G4C2 repeat-induced toxicity, repeat RNA expression, and RAN translation were consistent with those of FUS.”

      (5) On page 16, line 355-357, the sentence “Thus, some G-quadruplex-targeting RBPs regulate G4C2 repeat-induced toxicity by binding to and possibly by modulating the G-quadruplex structure of G4C2 repeat RNA.” was changed to “Thus, some G-quadruplex-targeting RBPs regulate RAN translation and G4C2 repeat-induced toxicity by binding to and possibly by modulating the G-quadruplex structure of G4C2 repeat RNA.”

      (6) On page 19, line 417-421, the sentence “We further found that G-quadruplex-targeting RNA helicases, including DDX3X, DDX5, and DDX17, which are known to bind to G4C2 repeat RNA (Cooper-Knock et al., 2014; Haeusler et al., 2014; Mori et al., 2013a; Xu et al., 2013), also alleviate G4C2 repeat-induced toxicity without altering the expression levels of G4C2 repeat RNA in our Drosophila models.” was changed to “We further found that G-quadruplex-targeting RNA helicases, …, also suppress RAN translation and G4C2 repeat-induced toxicity without altering the expression levels of G4C2 repeat RNA in our Drosophila models.”.

      Reply to Recommendations For The Authors:

      1) It is not clear from the start that the flies they generated with the repeat have an artificial vs human intronic sequence ahead of the repeat. It would be nice if they presented somewhere the entire sequence of the insert. The reason being that it seems they also tested flies with the human intronic sequence, and the effect may not be as strong (line 234). In any case, in the future, with a new understanding of RAN translation, it would be nice to compare different transgenes, and so as much transparency as possible would be helpful regarding sequences. Can they include these data?

      Response: We thank the editors and reviewers for this comment. We apologize for the lack of clarity. We used artificially synthesized G4C2 repeat sequences when generating constructs for (G4C2)n transgenic flies, so these constructs do not contain human intronic sequence ahead of the G4C2 repeat in the C9orf72 gene, as explained in the Materials and Methods section. To clarify the difference between our C9-ALS/FTD fly models and LDS-(G4C2)44GR-GFP fly model (Goodman et al., 2019), we have made the following change to the revised manuscript.

      (1) Schema of the LDS-(G4C2)44GR-GFP construct was presented in Figure 3—figure supplement 1.

      Furthermore, to maintain transparency of the study, we have provided the entire sequence of the insert as the following source file.

      (2) The artificial sequences inserted in the pUAST vector for generation of the (G4C2)n flies were presented in Figure 1—figure supplement 1—source data 1.

      2) It is really nice how they quantitated everything and showed individual data points.

      Response: We thank the editors and reviewers for appreciating our data analysis method. All individual data points and statistical analyses are summarized in source data files.

      3) So when they call FUS an RNA chaperone, are they simply meaning it is changing the structure of the repeat, or could it just be interacting with the repeat to coat the repeat and prevent it from folding into whatever in vivo structures? Can they speculate on why some RNA chaperones lead to presumed decay of the repeat and others do not? Can they discuss these points in the discussion? Detailed mechanistic understanding of RNA chaperones that ultimately promote decay of the repeat might be of highly significant therapeutic benefit.

      Response: We appreciate these critical comments. Indeed, we showed that FUS changes the higher-order structures of G4C2 repeat RNA in vitro, and that FUS suppresses G4C2 RNA foci formation. According to the established definition, RNA chaperones are proteins that change the structures of misfolded RNAs without using ATP, thereby maintaining proper RNA folding (Rajkowitsch et al., 2007). Thus, we consider that FUS can be classified as an RNA chaperone. To clarify these interpretations, we revised the manuscript as follows.

      (1) On page 10, line 215-219, the sentence “These results were in good agreement with our previous study on SCA31 showing the suppressive effects of FUS and other RBPs on RNA foci formation of UGGAA repeat RNA as RNA chaperones …” was changed to “These results were in good agreement with … RNA foci formation of UGGAA repeat RNA through altering RNA structures and preventing aggregation of misfolded repeat RNA as RNA chaperones …”.

      (2) On page 17, line 363-366, the sentence “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure, as evident by CD and NMR analyses (Figure 5), suggesting its functional role as an RNA chaperone.” was changed to “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure as evident by CD and NMR analyses (Figure 5, Figure 5—figure supplement 2), and suppresses RNA foci formation in vivo (Figures 3A and 3B), suggesting its functional role as an RNA chaperone.”

      Besides these RNA chaperones, we observed that the expression of IGF2BP1, hnRNPA2B1, DHX9, and DHX36 decreased G4C2 repeat RNA expression levels. In addition, we recently reported that hnRNPA3 reduces G4C2 repeat RNA expression levels, leading to the suppression of neurodegeneration in C9-ALS/FTD fly models (Taminato et al., 2023). We speculate that these RBPs could be involved in RNA decay pathways as components of the P-body or interactors with the RNA deadenylation machinery (Tran et al., 2004; Katahira et al., 2008; Geissler et al., 2016; Hubstenberger et al., 2017), possibly contributing to the reduced expression levels of G4C2 repeat RNA. To clarify these interpretations, we revised the manuscript as follows.

      (3) On page 18, line 392-398, the sentences “Similarly, we recently reported that hnRNPA3 reduces G4C2 repeat RNA expression levels, leading to the suppression of neurodegeneration in C9-ALS/FTD fly models (Taminato et al., 2023). Interestingly, these RBPs have been reported to be involved in RNA decay pathways as components of the P-body or interactors with the RNA deadenylation machinery (Tran et al., 2004; Katahira et al., 2008; Geissler et al., 2016; Hubstenberger et al., 2017), possibly contributing to the reduced expression levels of G4C2 repeat RNA.” were added.

      4) What is the level of the G4C2 repeat when they knock down caz? Is it possible that knockdown impacts the expression level of the repeat? Can they show this (or did they and I miss it)?

      Response: We thank the editors and reviewers for this comment. The expression levels of G4C2 repeat RNA in (G4C2)89 flies were not altered by the knockdown of caz, as shown in Figure 4G.

      5) A puzzling point is that FUS is supposed to be nuclear, so where is FUS in the brain in their lines? They suggest it modulates RAN translation, and presumably, that is in the cytoplasm. Is FUS when overexpressed now in part in the cytoplasm? Is the repeat dragging it into the cytoplasm? Can they address this in the discussion? If FUS is never found in vivo in the cytoplasm, then it raises the point that the impact they find of FUS on RAN translation might not reflect an in vivo situation with normal levels of FUS.

      Response: We appreciate these important comments. We agree with the editors and reviewers that FUS is mainly localized in the nucleus. However, FUS is known as a nucleocytoplasmic shuttling RBP that can transport RNA into the cytoplasm. Indeed, FUS is reported to facilitate transport of actin-stabilizing protein mRNAs to function in the cytoplasm (Fujii et al., 2005). Thus, we consider that FUS binds to G4C2 repeat RNA in the cytoplasm and suppresses RAN translation in this study.

      6) When they are using 2 copies of the driver and repeat, are they also using 2 copies of FUS? These are quite high levels of transgenes.

      Response: We thank the editors and reviewers for this comment. We used only 1 copy of FUS when using 2 copies of GMR-Gal4 driver. Full genotypes of the fly lines used in all experiments are described in Supplementary file 1.

      7) In Figure5-S1, FUS colocalizing with (G4C2)RNA is not clear. High-magnification images are recommended.

      Response: We appreciate this constructive comment on the figure. Following the suggestion, high-magnification images are added in Figure 5—figure supplement 1.

      8) I also suggest that the last sentence of the Discussion be revised as follows: Thus, our findings contribute not only to the elucidation of C9-ALS/FTD, but also to the elucidation of the repeat-associated pathogenic mechanisms underlying a broader range of neurodegenerative and neuropsychiatric disorders than previously thought, and it will advance the development of potential therapies for these diseases.

      Response: We appreciate this recommendation. We have made the following change based on the suggested sentence.

      (1) On page 20-21, line 455-459, “Thus, our findings contribute not only towards the elucidation of repeat-associated pathogenic mechanisms underlying a wider range of neuropsychiatric diseases than previously thought, but also towards the development of potential therapies for these diseases.” was changed to “Thus, our findings contribute to the elucidation of the repeat-associated pathogenic mechanisms underlying not only C9-ALS/FTD, but also a broader range of neuromuscular and neuropsychiatric diseases than previously thought, and will advance the development of potential therapies for these diseases.”.

      Authors’ comment on previous eLife assessment:

      We thank the editors and reviewers for appreciating our study. We mainly evaluated the function of human FUS protein on RAN translation and G4C2 repeat-induced toxicity using Drosophila expressing human FUS in vivo, and the recombinant human FUS protein in vitro. To validate that FUS functions as an endogenous regulator of RAN translation, we additionally evaluated the function of Drosophila caz protein as well. We are concerned that the first sentence of the eLife assessment, that is, “This important study demonstrates that the Drosophila FUS protein, the human homolog of which is implicated in amyotrophic lateral sclerosis (ALS) and related conditions, …” is somewhat misleading. We would be grateful if this sentence could be modified to read “This important study demonstrates that the human FUS protein, which is implicated in amyotrophic lateral sclerosis (ALS) and related conditions, …”.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public Review):

      The authors investigated state-dependent changes in evoked brain activity, using electrical stimulation combined with multisite neural activity across wakefulness and anesthesia. The approach is novel, and the results are compelling. The study benefits from an in-depth sophisticated analysis of neural signals. The effects of behavioral state on brain responses to stimulation are generally convincing.

      It is possible that the authors' use of "an average reference montage that removed signals common to all EEG electrodes" could also remove useful components of the signal, which are common across EEG electrodes, especially during deep anesthesia. For example, it is possible (in fact from my experience I would be surprised if it is not the case) that under isoflurane anesthesia, electrical stimulation induces a generalized slow wave or a burst of activity across the brain. Subtracting the average signal will simply remove that from all channels. This does not only result in signals under anesthesia being affected more by the referencing procedure than during waking but also will have different effects on different channels, e.g. depending on how strong the response is in a specific channel.

      We thank the reviewer for the positive comments and for raising this point. We do not believe that the average reference montage is obscuring an evoked slow wave in the isoflurane-anesthetized mice. Electrical stimulation did elicit a brief activation in nearby neurons that was followed by roughly 200 ms of quiescence, but no significant changes in firing in the other regions we recorded from (Author response image 1).

      Author response image 1

      ERP and evoked population activity during isoflurane anesthesia do not show evidence of global responses. (Top). ERP (-0.2 to +0.8 s around stimulus onset) with all EEG electrode traces superimposed. Data represented is the same: red traces have been processed with the average reference montage, black traces have not. (Bottom) Population mean firing rates from the areas of interest from the same experiment as above.

      We are familiar with the work from Dasilva et al. (2021), a study similar to ours because they also performed cortical electrical stimulation in mice anesthetized with isoflurane. They show widespread evoked multi-unit activity (derived from LFP) in isoflurane-anesthetized mice in response to electrical stimulation, but critical experimental differences may underlie the conflicting results presented in our study. Both works use similar levels of isoflurane to maintain anesthesia (we use a level roughly equivalent to their “deep” level). However, our experiments use only isoflurane, whereas Dasilva et al. induced anesthesia with ketamine and medetomidine followed by isoflurane. It has been shown that isoflurane and ketamine have different effects on neural dynamics (Sorrenti et al., 2021). Typically, isoflurane causes reduced spontaneous firing rates and decreased evoked response amplitudes compared to wakefulness, whereas ketamine has been shown to increase firing rates and evoked response amplitudes (Aasebø et al., 2017; Michelson & Kozai, 2018). Perhaps a more relevant difference is the set of electrical stimulation parameters used to perturb the brain. Dasilva et al. used 1 ms pulses of 500 μA, which would have a much larger effect than the stimulation used in this work, 0.2 ms pulses of 10-100 μA.

      Additionally, we would like to clarify that the average reference montage is not impacting the main findings of this work. As the reviewer correctly pointed out, the average reference montage does change the appearance of the ERP in the butterfly plots (Top panel in Author response image 1). However, all the quantitative analyses of the EEG-ERPs are performed on the global field power, computed by taking the standard deviation across all EEG channels, which is not affected by the average reference montage.
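      The invariance of global field power to the average reference can be checked numerically. The sketch below is illustrative only: it uses synthetic data rather than the recordings, and the channel count, sample count, and function names are hypothetical. It relies only on the definition given above, namely the standard deviation across channels at each time point, which is unchanged when the same value (the across-channel mean) is subtracted from every channel.

```python
import random
import statistics


def average_reference(eeg):
    """Subtract the across-channel mean at each time point from every channel.

    eeg is a list of channels; each channel is a list of samples.
    """
    n_samples = len(eeg[0])
    means = [statistics.fmean([ch[t] for ch in eeg]) for t in range(n_samples)]
    return [[ch[t] - means[t] for t in range(n_samples)] for ch in eeg]


def global_field_power(eeg):
    """Standard deviation across channels at each time point."""
    n_samples = len(eeg[0])
    return [statistics.pstdev([ch[t] for ch in eeg]) for t in range(n_samples)]


# Synthetic data (hypothetical): 8 channels, 200 samples. Each channel is
# independent noise plus a large component shared by all channels.
random.seed(0)
n_channels, n_samples = 8, 200
shared = [random.gauss(0.0, 5.0) for _ in range(n_samples)]
eeg = [[shared[t] + random.gauss(0.0, 1.0) for t in range(n_samples)]
       for _ in range(n_channels)]

rereferenced = average_reference(eeg)
gfp_raw = global_field_power(eeg)
gfp_reref = global_field_power(rereferenced)

# Largest difference between the two GFP traces; only floating-point noise remains.
max_gfp_diff = max(abs(a - b) for a, b in zip(gfp_raw, gfp_reref))
print(f"max |GFP_raw - GFP_reref| = {max_gfp_diff:.2e}")
```

      The individual channel traces do change after re-referencing, since the shared component is removed, but the across-channel spread does not. The same holds whether the population or the sample standard deviation is used.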

      Reviewer #2 (Public Review):

      […] The conclusions regarding the thalamic contributions to the ERP components are strongly supported by the data.

      The spatiotemporal complexity is almost a side point compared to what seems to be the most important point of the paper: showing the contribution of thalamic activity to some components of the cortical ERP. Scalp ERPs have long been regarded as purely cortical phenomena, just like most EEGs, and this study shows convincing evidence to the contrary.

      The data presented seemingly contradicts the results presented by Histed et al. (2009), who assert that cortical microstimulation only affects passing fibers near the tip of the electrodes, and results in distant, sparse, and somewhat random neural activation. In this study, it is clear that the maximum effect happens near the electrodes, decays with distance, and is not sparse at all, suggesting that not only passing fibers are activated but that also neuronal elements might be activated by antidromic propagation from the axonal hillock. This appears to offer proof that microstimulation might be much more effective than it was thought after the publication of Histed 2009, as the uber-successful use of DBS to treat Parkinson's disease has also shown.

      We thank the reviewer for their positive comments and thoughtful suggestions. We appreciate and agree with the reviewer’s perspective that the thalamic contribution to the cortical ERP is one of the key points of this study. We also thank the reviewer for their comment on the apparently contradictory results reported by Histed et al. (2009). This gives us the opportunity to further highlight the important contribution of our study to the field.

      First, we would like to highlight some key experimental differences between the two studies. In our study we used single pulse stimulation with currents between 10 and 100 μA, whereas Histed et al. used trains of pulses (100 ms in duration at 250 Hz) with lower current intensities (between 2 and 50 μA). We varied the depth of stimulation, targeting superficial and deep cortical layers; Histed et al. exclusively stimulated superficial cortical layers. In addition, the two studies used recording methods that are orthogonal in nature. We used Neuropixels probes that record from neurons that span all cortical layers depth-wise while Histed et al. used two-photon calcium imaging to record from a horizontal plane of neurons (again, in the superficial cortical layers).

      Because of these important methodological differences, it is more appropriate to compare the Histed et al. results to our results from superficial stimulation at comparable current intensities. In this case, we believe the two studies show similar results: stimulation activated a small fraction of neurons even hundreds of microns away from the stimulating electrode (see Figure 4A from our manuscript). However, our study adds an important observation pointing to the critical role of the depth of the stimulating electrode. We observe significant excitation of local cortical neurons (Figure 4D) and trans-synaptic activation of the thalamus only when we delivered deep stimulation (Figure 5A). This effect is likely mediated by activation of large, myelinated cortico-thalamic fibers, which are thought to be more excitable than non-myelinated horizontal fibers (Tehovnik & Slocum, 2013).

      To summarize, Histed et al. (2009) concluded that microstimulation causes a sparse activation of a distributed set of neurons with little evidence of synaptically driven activation. Instead, we showed that microstimulation can robustly activate local neurons and trans-synaptically activate distant neurons when stronger stimuli are directed to deep cortical layers. Based on this, we conclude that electrical stimulation is indeed highly effective, and is a valid tool that can be used to probe and characterize the cortico-thalamo-cortical network of any behavioral state.

      ----------

      Reviewer #1 (Recommendations for the authors):

      1. I am not clear how "putative pyramidal" or RS and "putative inhibitory" fast-spiking neurons were identified. Please provide some further details on that, including average spike wave shapes, and distribution of firing rates, and it would be interesting to know the proportion of "putative" RS and FS neurons in your recorded population. Obviously, caution is warranted here because, without further work, you cannot be sure that those are indeed pyramidal cells or interneurons! Is this subdivision necessary at all?

      We added details regarding the cell-type classification to the Results (lines 136-140) and the Methods section. This classification is common practice in cortical extracellular electrophysiology recordings given that cell-type specific analyses can reveal important differences between the two putative populations (Barthó et al., 2004; Bortone et al., 2014; Bruno & Simons, 2002; Jia et al., 2016; Niell & Stryker, 2008; Sirota et al., 2008). Based on our findings that the two populations respond to electrical stimulation in similar ways (excitation followed by a period of quiescence and rebound excitation), we agree the subdivision is not necessary to support our conclusions. However, we believe that some readers will appreciate seeing the two putative populations presented separately.

      2. I wonder how the authors know whether the animals were awake, specifically when they were not running. Did you observe animals falling asleep when head-fixed? Providing some analyses of spontaneous EEG/LFP signals in each state could add some reassurance that only wakefulness was included, as intended.

      While we cannot conclusively rule out that mice were asleep during the “quiet wakefulness” periods we analyzed, we believe they are likely to be awake for two main reasons: 1) all the experiments are performed during the dark phase of the light/dark cycle, when the mice are less likely to enter a sleep state (Franken et al., 1999); 2) the animals are not undergoing specific training to promote drowsiness or sleep. Indeed, many sleep-focused studies in head-fixed mice are performed during the light phase of the animal’s cycle to maximize the likelihood of capturing sleep states (Kobayashi et al., 2023; Turner et al., 2020; Yüzgeç et al., 2018; Zhang et al., 2022). We have added this note to the Discussion section (lines 402-406).

      Because we do not specifically record during sleep states and our recording does not include electromyography, which is commonly used in conjunction with EEG to classify sleep stages, we cannot accurately perform spectral comparison between “quiet wakefulness” and sleep states in our recordings.

      3. I was unsure about the meaning of some of the terminology, specifically "rebound", "rebound spiking", "rebound excitation" etc. Why do you call it "rebound"?

      “Rebound” is a term often used to describe a period of enhanced spiking following a period of prolonged silence or inhibition (Guido & Weyand, 1995; Roux et al., 2014). Grenier et al. (1998) list “postinhibitory rebound excitation” as an intrinsic property of cortical and thalamic neurons. We added this description to the text (lines 79-80).

      Reviewer #2 (Recommendations For The Authors):

      Regarding analysis, I would make three main points:

      Regarding the CSD analysis, I think the authors have done a good job of circumventing several of the known issues of this technique, especially by using ERPs rather than ongoing activity. However, although I do not immediately have access to the literature to back up this claim, I've heard that many assumptions behind CSD require a laminar structure with electrodes positioned perpendicular to these layers. In Figure 1B it seems like the neuropixels probe is not really perpendicular to the cortical layers, and I wonder if this might be an issue. I am also wondering how to interpret the thalamic CSD, as this structure is not laminar, lacks the mass of neatly stacked neuronal dipoles present in the cortex, and does not have an orderly array of synaptic inputs and outputs. I understand that CSD analysis helps minimize the contributions of volume conduction, but in this case, I also wonder if the thalamic CSD is even necessary to back up the paper's claims.

      One-dimensional CSD is computed assuming that the electrode is inserted perpendicular to cortex. This is mainly important for the interpretation of sinks and sources, since CSD can be also computed on radial voltages (e.g., EEG [Tenke & Kayser, 2012]). In general, our Neuropixels probes do not significantly deviate from perpendicular (mean deviation from perpendicular 15.3 degrees, minimum 5.2 degrees, and maximum 36.6 degrees). The probe represented in Figure 1B deviates from perpendicular by 31.2 degrees, which is an outlier compared to the rest of the insertions. Any deviation from perpendicular would result in the “effective” cortical thickness being larger by a factor of 1/cos(angle deviation from perpendicular) and thus would not affect the relative location of sources and sinks. We have added a statement to clarify this in the text (lines 126 and 454-456).
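
      As an illustrative check of the geometric argument above, the following Python sketch evaluates the stretch factor 1/cos(angle) for the deviation angles quoted in this response. The function name is hypothetical, and the sketch is not taken from the authors' analysis code.

```python
import math

def effective_thickness_factor(deviation_deg):
    """Factor by which the apparent cortical thickness grows for a probe tilted
    `deviation_deg` degrees away from perpendicular, i.e. 1 / cos(angle)."""
    return 1.0 / math.cos(math.radians(deviation_deg))

# Angles quoted in the response: minimum, mean, Figure 1B probe, maximum.
for label, angle in [("min", 5.2), ("mean", 15.3), ("Fig. 1B", 31.2), ("max", 36.6)]:
    factor = effective_thickness_factor(angle)
    print(f"{label:>8}: {angle:5.1f} deg -> x{factor:.3f}")
```

      Under this simple geometric assumption, the mean deviation stretches the depth axis by roughly 4% and the largest deviation by roughly 25%. Because the stretch is uniform along the probe, the ordering of sinks and sources along depth is unchanged.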

      We agree with the statement regarding CSD analysis in the thalamus. We originally included the CSD for the thalamus in Figure 2F for completeness. As the reviewer pointed out, thalamic CSD was not used to perform any subsequent analysis and is, therefore, not necessary to back up any claims. As such, we have removed the CSD plot from Figure 2F to avoid any confusion and made a comment to this effect in the legend (lines 1175-1177).

      On the merits of using the z-score normalization for spike rates vs. other strategies like standardizing to maximum firing, I am aware that both procedures have limitations, but the z-score changes the range of the firing rate from [0, +Inf] to [-Inf, +Inf]. This does not seem correct considering that negative spiking rates do not exist. The standardization to maximum rate keeps the range within [0, 1], not creating negative rates. Another point worth discussing is the magnitude of the reported z-scored values. For example, what does it mean to be 54 standard deviations away from the mean? 6 standard deviations is already a big distance from the mean.

      For Figure 2, we chose to represent the neural firing rates as z-scores because we found it important to report the magnitude of both the increase and decrease of the evoked firing rates in the post-stimulus period relative to the pre-stimulus rate. The normalization we used helps to visualize the magnitude of the effects of electrical stimulation on neuronal activity in both directions, which is an important result of the study. Despite the differences between the two normalization methods, the normalization based on the maximum firing does not significantly change the qualitative interpretation of Figure 2 in the manuscript (Author response image 2).

      Author response image 2

      Evoked firing rates for neurons in the areas of interest in response to deep stimulation in MO during the awake state. (Left) Firing rates of all neurons normalized by the average, pre-stimulus firing rate. (Right) Firing rates of all neurons normalized by the maximum post-stimulus firing rate.

      Regarding Figure 3 and the associated text, we would like to clarify that the magnitude metric is not simply a z-score value (with units of s.d.) but rather it is the integrated area under the z-scored response over the response window (with units of s.d.∙seconds). This can help explain why we see values of ~50 s.d.∙s. We chose to z-score firing rates, LFP, and CSD to normalize across the different signals and magnitudes of the evoked responses. We often observed the largest responses in the LFP (see Figure 3A), which may be partly due to the signal naturally having a larger dynamic range than the measured neural firing rates. Then we integrated the z-score response time series to capture the dynamics of the signal over the response window, rather than a static value such as the mean or maximum z-score. After performing a thorough literature search, we found no other ways to capture and compare the magnitudes of the different signals. We have added language to clarify the magnitude metric (lines 155-156) and added the appropriate units.
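
      To make the units concrete, the following Python sketch z-scores a toy response against a pre-stimulus baseline and integrates it over the response window. It is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the use of the population standard deviation, the Riemann-sum integration, the optional rectification, and the toy numbers are all hypothetical.

```python
from statistics import mean, pstdev

def zscore_to_baseline(response, baseline):
    """Express each response bin in units of baseline standard deviations.
    Assumption: population s.d. of the pre-stimulus bins is the reference."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has zero variance; z-score is undefined")
    return [(r - mu) / sigma for r in response]

def integrated_magnitude(z_values, bin_width_s, rectify=False):
    """Area under the z-scored response (units: s.d. * seconds), computed as a
    Riemann sum. `rectify=True` integrates |z| instead (an assumed option)."""
    values = [abs(z) for z in z_values] if rectify else z_values
    return sum(values) * bin_width_s

def normalize_to_max(response):
    """Alternative normalization raised by the reviewer: scale to [0, 1]."""
    peak = max(response)
    return [r / peak for r in response] if peak > 0 else [0.0 for _ in response]

# Toy data (hypothetical): firing rates in Hz, 0.5 s bins.
baseline = [4.0, 6.0, 4.0, 6.0]      # mean 5 Hz, s.d. 1 Hz
response = [15.0, 15.0, 5.0, 5.0]    # 1 s of elevated firing, then baseline

z = zscore_to_baseline(response, baseline)
magnitude = integrated_magnitude(z, bin_width_s=0.5)
print(z, magnitude, normalize_to_max(response))
```

      In this toy case a response held 10 s.d. above baseline for 1 s yields an integrated magnitude of 10 s.d.∙s, which illustrates how a time-integrated metric can reach values in the tens even when no single bin is that far from the mean.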

      In reporting the p-values, I recommend increasing the number of significant digits to four because the p-value seems to be the same for different tests in several places (e.g.: lines 207 to 218), which seems odd. I also wonder whether this could be an artifact of the z-scoring procedure. In the figures, I would like to advise the use of 1 asterisk to denote "weak evidence to reject the null hypothesis (0.05 > p > 0.01)" and two asterisks to denote "strong evidence to reject the null hypothesis (0.01 > p)", and make a note of it accordingly in the manuscript and/or figure legends.

      According to the reviewer’s suggestion, we have changed the statistics language to “* weak evidence to reject null hypothesis (0.05 > p > 0.01), ** strong evidence to reject null hypothesis (0.01 > p > 0.001), *** very strong evidence to reject null hypothesis (0.001 > p)” throughout the manuscript.

      We have also increased the number of significant digits to four throughout the manuscript. It is true that some of the p-values reported for Figure 3 (lines 169-180) are the same for different tests. This is not an artifact of the z-scoring, but rather a consequence of performing the Wilcoxon signed-rank test (an ordinal statistical test) with small sample numbers. Because the p-value depends only on the relative ordering, not the continuous distribution of values, the small sample size (N=6-14) increases the likelihood of obtaining the exact same p-value if the relative ordering of samples is the same.
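
      The point about repeated p-values can be illustrated with a short Python sketch that enumerates the exact null distribution of the signed-rank statistic. It assumes no ties and no zero differences and uses a two-sided p-value defined as twice the smaller tail; the statistical software used for the manuscript may compute p-values differently, and the function names here are hypothetical.

```python
from itertools import product

def signed_rank_null_counts(n):
    """For each value of W+ (sum of positively signed ranks), count how many of
    the 2**n equally likely sign assignments produce it (no ties, no zeros)."""
    max_w = n * (n + 1) // 2
    counts = [0] * (max_w + 1)
    for signs in product((0, 1), repeat=n):
        w_plus = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        counts[w_plus] += 1
    return counts

def exact_two_sided_p(n, w_plus):
    """Exact two-sided p-value: twice the smaller tail probability, capped at 1."""
    counts = signed_rank_null_counts(n)
    lower = sum(counts[: w_plus + 1])   # sign assignments with W+ <= observed
    upper = sum(counts[w_plus:])        # sign assignments with W+ >= observed
    return min(1.0, 2 * min(lower, upper) / 2 ** n)

def attainable_p_values(n):
    """All distinct p-values the test can return for sample size n."""
    max_w = n * (n + 1) // 2
    return sorted({exact_two_sided_p(n, w) for w in range(max_w + 1)})

print(exact_two_sided_p(6, 21))        # all six differences share one sign
print(len(attainable_p_values(6)))     # size of the attainable p-value set
```

      With N = 6 the test can return only a small discrete set of p-values, and any comparison in which all six paired differences share the same sign lands on the same smallest value (0.03125 under these assumptions), regardless of how large the differences are.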

      Line 202: If the magnitude corresponds to z-score data, please add "s.d." after the number, as z-scored values are expressed in standard deviation units. Please update this throughout the paper.

      As stated above, the magnitude metric is the integrated area under the z-scored response over the response window (with units of s.d.∙seconds). We have added the correct units in all places.

      Line 214: Please report how the multiple comparisons correction was performed

      We have added the test used for multiple comparisons in line 169 (formerly line 214) and in the Methods section (line 770).

      Line 462: please replace "Neuropixels activity" with "LFP and single-unit activity".

      We changed the wording to specify “LFP, and single neuron responses…” (now line 337).

      Line 475: a short explanation of the bi-stability phenomena will be helpful for the reader.

      We added the following description: “a state characterized by spontaneous alternation between bouts of activity and periods of silence” (lines 350-351).

      Line 601: It is asserted that "Electrical stimulation directly activates local cells and axons that run near the stimulation site via activation of the axon initial segment" and the paper by Histed et al. 2009 is cited. This does not seem like an appropriate citation, as Histed et al. explicitly state that electrical microstimulation does not activate local neuronal bodies near the electrode tip. See my comment above.

      Upon further reading, we believe we are seeing evidence of direct axonal activation and subsequent antidromic activation of local cell bodies, as you suggested in your above comment and has been proposed by many including Histed et al. (2009) and Nowak and Bullier (1998). We edited our sentence accordingly, kept the Histed et al. citation, and added other relevant citations (lines 487-490).

      References

      • Aasebø, I. E. J., Lepperød, M. E., Stavrinou, M., Nøkkevangen, S., Einevoll, G., Hafting, T., & Fyhn, M. (2017). Temporal Processing in the Visual Cortex of the Awake and Anesthetized Rat. eNeuro, 4(4), 59–76. https://doi.org/10.1523/ENEURO.0059-17.2017

      • Barthó, P., Hirase, H., Monconduit, L., Zugaro, M., Harris, K. D., & Buzsáki, G. (2004). Characterization of Neocortical Principal Cells and Interneurons by Network Interactions and Extracellular Features. Journal of Neurophysiology, 92(1), 600–608. https://doi.org/10.1152/jn.01170.2003

      • Bortone, D. S., Olsen, S. R., & Scanziani, M. (2014). Translaminar Inhibitory Cells Recruited by Layer 6 Corticothalamic Neurons Suppress Visual Cortex. Neuron, 82, 474–485. https://doi.org/10.1016/j.neuron.2014.02.021

      • Bruno, R. M., & Simons, D. J. (2002). Feedforward Mechanisms of Excitatory and Inhibitory Cortical Receptive Fields. The Journal of Neuroscience, 22(24), 10966–10975. https://doi.org/10.1523/JNEUROSCI.22-24-10966.2002

      • Dasilva, M., Camassa, A., Navarro-Guzman, A., Pazienti, A., Perez-Mendez, L., Zamora-López, G., Mattia, M., & Sanchez-Vives, M. V. (2021). Modulation of cortical slow oscillations and complexity across anesthesia levels. NeuroImage, 224, 117415. https://doi.org/10.1016/j.neuroimage.2020.117415

      • Franken, P., Malafosse, A., & Tafti, M. (1999). Genetic Determinants of Sleep Regulation in Inbred Mice. SLEEP, 22(2). https://academic.oup.com/sleep/article/22/2/155/2731698

      • Grenier, F., Timofeev, I., & Steriade, M. (1998). Leading role of thalamic over cortical neurons during postinhibitory rebound excitation. Proceedings of the National Academy of Sciences of the United States of America, 95(23), 13929–13934. https://doi.org/10.1073/pnas.95.23.13929

      • Guido, W., & Weyand, T. (1995). Burst responses in thalamic relay cells of the awake behaving cat. Journal of Neurophysiology, 74(4), 1782–1786. https://doi.org/10.1152/JN.1995.74.4.1782

      • Histed, M. H., Bonin, V., & Reid, R. C. (2009). Direct Activation of Sparse, Distributed Populations of Cortical Neurons by Electrical Microstimulation. Neuron, 63(4), 508–522. https://doi.org/10.1016/j.neuron.2009.07.016

      • Jia, X., Siegle, J., Bennett, C., Gale, S., Denman, D. R., Koch, C., & Olsen, S. (2016). High-density extracellular probes reveal dendritic backpropagation and facilitate neuron classification. Journal of Neurophysiology, 121(5), 1831–1847. https://doi.org/10.1101/376863

      • Kobayashi, G., Tanaka, K. F., & Takata, N. (2023). Pupil Dynamics-derived Sleep Stage Classification of a Head-fixed Mouse Using a Recurrent Neural Network. The Keio Journal of Medicine, 2022-0020-OA. https://doi.org/10.2302/KJM.2022-0020-OA

      • Michelson, N. J., & Kozai, T. D. Y. (2018). Isoflurane and ketamine differentially influence spontaneous and evoked laminar electrophysiology in mouse V1. Journal of Neurophysiology, 120(5), 2232. https://doi.org/10.1152/JN.00299.2018

      • Niell, C. M., & Stryker, M. P. (2008). Highly selective receptive fields in mouse visual cortex. Journal of Neuroscience, 28(30), 7520–7536. https://doi.org/10.1523/JNEUROSCI.0623-08.2008

      • Nowak, L. G., & Bullier, J. (1998). Axons, but not cell bodies, are activated by electrical stimulation in cortical gray matter. II. Evidence from selective inactivation of cell bodies and axon initial segments. Experimental Brain Research, 118(4), 489–500. https://doi.org/10.1007/S002210050305

      • Roux, L., Stark, E., Sjulson, L., & Buzsáki, G. (2014). In vivo optogenetic identification and manipulation of GABAergic interneuron subtypes. Current Opinion in Neurobiology, 26, 88–95. https://doi.org/10.1016/j.conb.2013.12.013

      • Sirota, A., Montgomery, S., Fujisawa, S., Isomura, Y., Zugaro, M., & Buzsáki, G. (2008). Entrainment of Neocortical Neurons and Gamma Oscillations by the Hippocampal Theta Rhythm. Neuron, 60(4), 683–697. https://doi.org/10.1016/j.neuron.2008.09.014

      • Sorrenti, V., Cecchetto, C., Maschietto, M., Fortinguerra, S., Buriani, A., & Vassanelli, S. (2021). Understanding the Effects of Anesthesia on Cortical Electrophysiological Recordings: A Scoping Review. International Journal of Molecular Sciences, 22(3), 1286. https://doi.org/10.3390/IJMS22031286

      • Tehovnik, E. J., & Slocum, W. M. (2013). Two-photon imaging and the activation of cortical neurons. Neuroscience, 245(March), 12–25. https://doi.org/10.1016/j.neuroscience.2013.04.022

      • Tenke, C. E., & Kayser, J. (2012). Generator localization by current source density (CSD): Implications of volume conduction and field closure at intracranial and scalp resolutions. Clinical Neurophysiology, 123(12), 2328–2345. https://doi.org/10.1016/J.CLINPH.2012.06.005

      • Turner, K. L., Gheres, K. W., Proctor, E. A., & Drew, P. J. (2020). Neurovascular coupling and bilateral connectivity during NREM and REM sleep. eLife, 9, 1. https://doi.org/10.7554/ELIFE.62071

      • Yüzgeç, Ö., Prsa, M., Zimmermann, R., & Huber, D. (2018). Pupil Size Coupling to Cortical States Protects the Stability of Deep Sleep via Parasympathetic Modulation. Current Biology, 28(3), 392. https://doi.org/10.1016/J.CUB.2017.12.049

      • Zhang, X., Landsness, E. C., Chen, W., Miao, H., Tang, M., Brier, L. M., Culver, J. P., Lee, J. M., & Anastasio, M. A. (2022). Automated sleep state classification of wide-field calcium imaging data via multiplex visibility graphs and deep learning. Journal of Neuroscience Methods, 366, 109421. https://doi.org/10.1016/J.JNEUMETH.2021.109421

    1. Author response:

      Public Review

      Joint Public Review:

      This manuscript presents an algorithm for identifying network topologies that exhibit a desired qualitative behaviour, with a particular focus on oscillations. The approach is first demonstrated on 3-node networks, where results can be validated through exhaustive search, and then extended to 5-node networks, where the search space becomes intractable. Network topologies are represented as directed graphs, and their dynamical behaviour is classified using stochastic simulations based on the Gillespie algorithm. To efficiently explore the large design space, the authors employ reinforcement learning via Monte Carlo Tree Search (MCTS), framing circuit design as a sequential decision-making process.

      This work meaningfully extends the range of systems that can be explored in silico to uncover non-linear dynamics and represents a valuable methodological advance for the fields of systems and synthetic biology.

      Strengths

      The evidence presented is strong and compelling. The authors validate their results for 3-node networks through exhaustive search, and the findings for 5-node networks are consistent with previously reported motifs, lending credibility to the approach. The use of reinforcement learning to navigate the vast space of possible topologies is both original and effective, and represents a novel contribution to the field. The algorithm demonstrates convincing efficiency, and the ability to identify robust oscillatory topologies is particularly valuable. Expanding the scale of systems that can be systematically explored in silico marks a significant advance for the study of complex gene regulatory networks.

      Weaknesses

      The principal weakness of the manuscript lies in the interpretation of biological robustness. The authors identify network topologies that sustain oscillatory behaviour despite perturbations to the system or parameters. However, in many cases, this persistence is due to the presence of partially redundant oscillatory motifs within the network. While this observation is interesting and of clear value for circuit design, framing it as evidence of evolutionary robustness may be misleading. The "mutant" systems frequently exhibit altered oscillatory properties, such as changes in frequency or amplitude. From a functional cellular perspective, mere oscillation is insufficient - preservation of specific oscillation characteristics is often essential. This is particularly true in systems like circadian clocks, where misalignment with environmental cycles can have deleterious effects. Robustness, from an evolutionary standpoint, should therefore be framed as the capacity to maintain the functional phenotype, not merely the qualitative behaviour.

      A secondary limitation is that, despite the methodological advances, the scale of the systems explored remains modest. While moving from 3- to 5-node systems is non-trivial, five elements still represent a relatively small network. It is somewhat surprising that the algorithm does not scale further, particularly when considering the performance of MCTS in other domains - for instance, modern chess engines routinely explore far larger decision trees. A discussion on current performance bottlenecks and potential avenues for improving scalability would be valuable.

      Finally, it is worth noting that the emergence of oscillations in a model often depends not only on the topology but also critically on parameter choices and the nature of the nonlinearities. The use of Hill functions and high Hill coefficients is a common strategy to induce oscillatory dynamics. Thus, the reported results should be interpreted within the context of the modelling assumptions and parameter regimes employed in the simulations.

      We thank the reviewers for their careful consideration of our work and for the interesting feedback and scientific discussion. We are working on a revised text based on their recommendations, which will include some of the discussion below.

      This work meaningfully extends the range of systems that can be explored in silico to uncover non-linear dynamics and represents a valuable methodological advance for the fields of systems and synthetic biology.

      We thank the reviewers for their positive assessment of our work’s impact!

      The use of reinforcement learning to navigate the vast space of possible topologies is both original and effective, and represents a novel contribution to the field. The algorithm demonstrates convincing efficiency, and the ability to identify robust oscillatory topologies is particularly valuable. Expanding the scale of systems that can be systematically explored in silico marks a significant advance for the study of complex gene regulatory networks.

      We appreciate these kind comments about our work’s merits. We are excited to share our reinforcement learning (RL) based method with the fields of systems and synthetic biology, and we consider it a valuable tool for the systematic analysis and design of larger-scale regulatory networks!

      The principal weakness of the manuscript lies in the interpretation of biological robustness. The authors identify network topologies that sustain oscillatory behaviour despite perturbations to the system or parameters… [However, these] "mutant" systems frequently exhibit altered oscillatory properties, such as changes in frequency or amplitude. From a functional cellular perspective, mere oscillation is insufficient - preservation of specific oscillation characteristics is often essential. This is particularly true in systems like circadian clocks, where misalignment with environmental cycles can have deleterious effects. Robustness, from an evolutionary standpoint, should therefore be framed as the capacity to maintain the functional phenotype, not merely the qualitative behaviour.

      We thank the reviewers for their attention to this point. In the large-scale circuit search, summarized in Figures 4A and 4B, we ran a search for 5-component oscillators that can spontaneously oscillate even when subjected to the deletion of a random gene. Some of the best performing circuits under these conditions exhibited a design feature we call “motif multiplexing,” in which multiple smaller motifs are interleaved in a way that makes oscillation possible under many different mutational scenarios. Interestingly, despite not selecting for preservation of frequency, the 3Ai+3Rep circuit (a 5-gene circuit highlighted in Figure 5) anecdotally appears to have a natural frequency that is robust to partial gene knockdowns, although not to complete gene deletions. As shown in Figure 5C, this circuit has a natural frequency of 6 cycles/hr (with one particular parameterization), and it can sustain a knockdown of any of its 5 genes to 50% of the wild-type transcription rate without altering the natural frequency by more than 20%.

      However, we agree that there are salient differences between this training scenario and natural evolution. The revised text will clarify that these differences limit what conclusions can be drawn about biological evolution by analogy. As the reviewers point out, we use the presence of spontaneous oscillations (with or without the deletion) as a measure of fitness, regardless of frequency, so as to screen for designs with promising behavior. Also, the deletion mutations introduced during training likely represent larger perturbations to the system than a typical mutation encountered during genome replication (for example, a point mutation in a response element leading to a moderate change in binding affinity). Finally, we do not introduce any entrainment. Real circadian oscillators are aligned to a 24-hour period (“entrained”) by environmental inputs such as light and temperature. For this reason, natural circadian clocks may have natural frequencies that are slightly shorter or longer than 24 hours, although a close proximity to the 24-hour period does seem to be an important selective factor [1].

      ...despite the methodological advances, the scale of the systems explored remains modest. While moving from 3- to 5-node systems is non-trivial, five elements still represent a relatively small network. It is somewhat surprising that the algorithm does not scale further, particularly when considering the performance of MCTS in other domains - for instance, modern chess engines routinely explore far larger decision trees. A discussion on current performance bottlenecks and potential avenues for improving scalability would be valuable.

      We thank the reviewers for their attention to this point. The main limitation we encountered to exploring circuits with more than 5 nodes in this work was the poor computational scaling of the Gillespie stochastic simulation algorithm, rather than a limitation of MCTS itself. While the average runtime of a 3-node circuit simulation was roughly 7 seconds, this number increased to 18-20 seconds with 5-node circuits. For this reason, we limited the search to topologies with ≤15 interaction arrows (15 sec/simulation). In general, the simulation time was proportional to the square of the number of transcription factors (TFs). We will revise the text to include the reason for stopping at 5 nodes, which is significant for understanding CircuiTree’s scaling properties.
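
      As a quick consistency check on the quoted runtimes, the Python sketch below applies the stated quadratic scaling to the 3-node reference time. The function name and the choice of the 3-node runtime as the reference point are assumptions made for illustration.

```python
def scaled_runtime(n_nodes, ref_nodes=3, ref_seconds=7.0):
    """Simulation time under the stated assumption that runtime grows with the
    square of the number of transcription factors."""
    return ref_seconds * (n_nodes / ref_nodes) ** 2

for n in (3, 5):
    print(f"{n} nodes: ~{scaled_runtime(n):.1f} s per simulation")
```

      Scaling 7 seconds by (5/3)² gives a value inside the 18-20 second range reported for unrestricted 5-node circuits.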

      With regard to scaling, an important advantage of CircuiTree is its ability to generate useful candidate designs after exploring only a portion of the search space. Like exhaustive search, given enough time, MCTS will comprehensively explore the search space and find all possible solutions. However, for large search spaces, RL-based agents are generally given a finite number of simulations (or time) to learn as much as possible.

      Across machine learning (ML) applications [2] and particularly with RL models [3], this training time tends to obey a power law with respect to the underlying complexity of the problem. Thus we can use the complexity of the 3-node and 5-node searches to infer the current scaling limits of CircuiTree. The first oscillator topology was discovered after 2,280 simulations for the 3-node search, and in the 5-node search, the first oscillator using 5 nodes appeared at ~8e5 simulations, resulting in a power law of Y ~ 84.4 X^0.333. Useful candidate designs may therefore be found for 6-node and 7-node searches after 4.5e7 and 5.26e9 simulations, respectively, even though these spaces contain 1.5e17 and 2.5e23 topologies. Running a 7-node search with the current implementation of CircuiTree would thus require resources close to the current boundaries of computation: roughly 1.8 million CPU-hours, or 2 weeks on 5,000 CPUs, assuming a 1-second simulation. These points will be incorporated into both the results and discussion sections in our revised text.
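
      The extrapolation above can be reproduced approximately with the Python sketch below. It rests on an assumption that is not stated in the text: that X is the number of topologies in the search space, counted as 3^(n²) for n nodes (each of the n² possible directed interactions being activating, inhibiting, or absent), and that Y is the number of simulations before the first oscillator is found. The function names are hypothetical.

```python
import math

def n_topologies(n_nodes):
    """Assumed search-space size: 3 choices for each of n*n directed interactions."""
    return 3 ** (n_nodes ** 2)

def fit_power_law(x1, y1, x2, y2):
    """Fit Y = prefactor * X**exponent through two (X, Y) points."""
    exponent = math.log(y2 / y1) / (math.log(x2) - math.log(x1))
    prefactor = y1 / math.exp(exponent * math.log(x1))
    return prefactor, exponent

def predicted_simulations(n_nodes, prefactor, exponent):
    """Simulations expected before a first hit, extrapolated from the fit."""
    return prefactor * math.exp(exponent * math.log(n_topologies(n_nodes)))

# Two data points quoted in the response: 3-node and 5-node searches.
prefactor, exponent = fit_power_law(n_topologies(3), 2280, n_topologies(5), 8e5)
print(f"Y ~ {prefactor:.1f} * X^{exponent:.3f}")
for n in (6, 7):
    print(f"{n} nodes: ~{predicted_simulations(n, prefactor, exponent):.2e} simulations")
```

      Under this assumption the two-point fit recovers a prefactor near 84.4 and an exponent near 0.333, and the extrapolated simulation counts for 6 and 7 nodes land close to the values quoted above. A fit through only two points carries no uncertainty estimate, so the extrapolation should be read as an order-of-magnitude guide.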

      However, we are optimistic about CircuiTree’s potential to scale to much larger circuits with modifications to its algorithm. CircuiTree uses the original (so-called “vanilla”) implementation of MCTS, which has not been used in professional game-playing AIs in over a decade. Contemporary RL-based game-playing engines leverage deep neural networks to dramatically reduce the training time, using value networks to identify game-winning positions and policy networks to find game-winning moves. AlphaZero, developed by Google DeepMind to learn games by self-play and without domain knowledge, outperformed all other chess AIs after 44 million training games, far fewer than the 10^43 possible chess states [4]. Similarly, the game of Go has 10^170 possible states, but AlphaZero outperformed other AIs after only 140 million games [4]. Large circuits live in similarly large search spaces; for example, 19-node and 20-node circuits represent spaces of 10^172 and 10^190 possible topologies. The revised text will include this discussion and identify value and policy networks, as well as more scalable simulation paradigms such as ODEs and neural ODEs, as our future directions for improving CircuiTree’s scalability.

      Finally, our revised discussion will note some important differences between game-playing and biological circuit design. Unlike deterministic games like chess, the final value of a circuit topology is determined stochastically, by running a simulation whose fitness depends on the parameter set and initial conditions. Thus, state-for-state, it is possible that training an agent for circuit design may inherently require more simulations to achieve the same level of certainty compared to classical games. Additionally, while we often possess a priori knowledge about a game such as its overall difficulty or certain known strategies, we lack this frame of reference when searching for circuit designs. Thus, it remains challenging to know if and when a large space of designs has been “satisfactorily” or “comprehensively” searched, since the answer depends on data that are unknown, namely the quantity, quality, and location of solutions residing in the search space.

      Not accounting for redundancy due to structural symmetries

      Finally, it is worth noting that the emergence of oscillations in a model often depends not only on the topology but also critically on parameter choices and the nature of the nonlinearities. The use of Hill functions and high Hill coefficients is a common strategy to induce oscillatory dynamics. Thus, the reported results should be interpreted within the context of the modelling assumptions and parameter regimes employed in the simulations.

      In our dynamical modeling of transcription factor (TF) networks, we do not rely on continuum assumptions about promoter occupancy such as Hill functions. Rather, we model each reaction - transcription, translation, TF binding/unbinding, and degradation - explicitly, and individual molecules appear and disappear via stochastic birth and death events. Many natural TFs are homodimers that bind cooperatively to regulate transcription; similarly, we assume that pairs of TFs bind more stably to their response element than individual TFs. Thus, our model has similar cooperativity to a Hill function, and it can be shown that in the continuum limit, the effective Hill coefficient is always ≤2. Our revision will clarify this aspect of the modeling and include a derivation of this property. Currently, the parameter values used in the figures are shown in Table 2. In the revised text, these will be displayed in the body of the text as well for clarity.
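
      For readers unfamiliar with explicit stochastic simulation, the Python sketch below shows the Gillespie algorithm for the simplest possible case, a single species with constant production and first-order degradation. It is not the CircuiTree model, which tracks transcription, translation, TF binding/unbinding, and degradation for several genes; the function name, rate constants, and seed are hypothetical.

```python
import random

def gillespie_birth_death(k_birth, k_death, n0, t_end, seed=0):
    """Simulate one trajectory of a birth-death process.

    Reactions: 0 -> X at rate k_birth; X -> 0 at rate k_death * n.
    Returns the event times and the molecule count after each event."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        birth_rate = k_birth
        death_rate = k_death * n
        total = birth_rate + death_rate
        if total == 0:
            break                        # no reaction can fire
        t += rng.expovariate(total)      # waiting time to the next event
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its rate.
        if rng.random() * total < birth_rate:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts

times, counts = gillespie_birth_death(k_birth=10.0, k_death=1.0, n0=0, t_end=20.0, seed=1)
print(len(times), counts[-1])
```

      Each event changes the molecule count by exactly one, so the trajectory is a sequence of discrete births and deaths rather than a smooth curve. The cost of this explicitness is that runtime grows with the number of reaction events, which is one reason stochastic simulation becomes expensive for larger circuits.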

      Bibliography

      (1) Spoelstra, K., Wikelski, M., Daan, S., Loudon, A. S. I., & Hau, M. (2015). Natural selection against a circadian clock gene mutation in mice. PNAS, 113(3), 686–691. https://doi.org/10.1073/pnas.1516442113

      (2) Neumann, O., & Gros, C. (2023). Scaling Laws for a Multi-Agent Reinforcement Learning Model. The Eleventh International Conference on Learning Representations. Retrieved from https://openreview.net/forum?id=ZrEbzL9eQ3W

      (3) Jones, A. L. (2021). Scaling Scaling Laws with Board Games. arXiv [Cs.LG]. Retrieved from http://arxiv.org/abs/2104.03113

      (4) Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., & Hassabis, D. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419), 1140–1144. https://doi.org/10.1126/science.aar6404

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study investigates how the neural representation of individual finger movements changes during the early period of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide incomplete evidence of an early, swift change in the brain regions correlated with sequence learning, including a set of previously unreported frontal cortical regions. The addition of more control analyses to rule out that head movement artefacts influence the findings, and to further explain the proposal of offline contextualization during short rest periods as the basis for performance improvement, would strengthen the manuscript.

      We appreciate the Editorial assessment on our paper’s strengths and novelty. We have implemented additional control analyses to show that neither task-related eye movements nor increasing overlap of finger movements during learning account for our findings, which are that contextualized neural representations in a network of bilateral frontoparietal brain regions actively contribute to skill learning. Importantly, we carried out additional analyses showing that contextualization develops predominantly during rest intervals.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements and accomplish an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain machine interfaces - one now can decode individual elements within a sequence with high precision, but these representations are not static but develop over the course of learning.

      Strengths:

      The work follows a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most of the gains in behaviour (ie speed of finger movements) occur in these so-called micro-offline rest periods. The authors use a range of new decoding techniques, and exhaustively interrogate their data in different ways, using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed, but when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%.

      We have previously shown that neural replay of MEG activity representing the practiced skill was prominent during rest intervals of early learning, and that the replay density correlated with micro-offline gains (Buch et al., 2021). These findings are consistent with recent reports (from two different research groups) that hippocampal ripple density increases during these inter-practice rest periods, and predicts offline learning gains (Chen et al., 2024; Sjøgård et al., 2024). However, decoder performance in our earlier work (Buch et al., 2021) left room for improvement. Here, we reported a strategy to improve decoding accuracy that could benefit future studies of neural replay or BCI using MEG.

      Weaknesses:

      There are a few concerns which the authors may well be able to resolve. These are not weaknesses as such, but factors that would be helpful to address as these concern potential contributions to the results that one would like to rule out. Regarding the decoding results shown in Figure 2 etc, a concern is that within individual frequency bands, the highest accuracy seems to be within frequencies that match the rate of keypresses. This is a general concern when relating movement to brain activity, so is not specific to decoding as done here. As far as reported, there was no specific restraint to the arm or shoulder, and even then it is conceivable that small head movements would correlate highly with the vigor of individual finger movements. This concern is supported by the highest contribution in decoding accuracy being in middle frontal regions - midline structures that would be specifically sensitive to movement artefacts and don't seem to come to mind as key structures for very simple sequential keypress tasks such as this - and the overall pattern is remarkably symmetrical (despite being a unimanual finger task) and spatially broad. This issue may well be matching the time course of learning, as the vigor and speed of finger presses will also influence the degree to which the arm/shoulder and head move. This is not to say that useful information is contained within either of the frequencies or broadband data. But it raises the question of whether a lot is dominated by movement "artefacts" and one may get a more specific answer if removing any such contributions.

      Reviewer #1 expresses concern that the combination of the low-frequency narrow-band decoder results, and the bilateral middle frontal regions displaying the highest average intra-parcel decoding performance across subjects is suggestive that the decoding results could be driven by head movement or other artefacts.

      Head movement artefacts are highly unlikely to contribute meaningfully to our results for the following reasons. First, in addition to ICA denoising, all “recordings were visually inspected and marked to denoise segments containing other large amplitude artifacts due to movements” (see Methods). Second, the response pad was positioned in a manner that minimized wrist, arm or more proximal body movements during the task. Third, while online monitoring of head position was not performed for this study, it was assessed at the beginning and at the end of each recording. The head was restrained with an inflatable air bladder, and head movement between the beginning and end of each scan did not exceed 5mm for all participants included in the study.

      The Reviewer states a concern that “it is conceivable that small head movements would correlate highly with the vigor of individual finger movements”. We agree that despite the steps taken above, it is possible that minor head movements could still contribute to some remaining variance in the MEG data in our study. However, such correlations between small head movements and finger movements could only meaningfully contribute to decoding performance if: (A) they were consistent and pervasive throughout the recording (which might not be the case if the head movements were related to movement vigor and vigor changed over time); and (B) they systematically varied between different finger movements, and also between the same finger movement performed at different sequence locations (see 5-class decoding performance in Figure 4B). The possibility of any head movement artefacts meeting all these conditions is unlikely. Alternatively, for this task design a much more likely confound could be the contribution of eye movement artefacts to the decoder performance (an issue raised by Reviewer #3 in the comments below).

      Remember from Figure 1A in the manuscript that an asterisk marks the current position in the sequence and is updated at each keypress. Since participants make very few performance errors, the position of the asterisk on the display is highly correlated with the keypress being made in the sequence. Thus, it is possible that if participants are attending to the visual feedback provided on the display, they may generate eye movements that are systematically related to the task. Since we did record eye movements simultaneously with the MEG recordings (EyeLink 1000 Plus; Fs = 600 Hz), we were able to perform a control analysis to address this question. For each keypress event during trials in which no errors occurred (which is the same time-point that the asterisk position is updated), we extracted three features related to eye movements: 1) the gaze position at the time of asterisk position update (triggered by a KeyDown event), 2) the gaze position 150ms later, and 3) the peak velocity of the eye movement between the two positions. We then constructed a classifier from these features with the aim of predicting the location of the asterisk (ordinal positions 1-5) on the display. As shown in the confusion matrix below (Author response image 1), the classifier failed to perform above chance levels (overall cross-validated accuracy = 0.21817):

      Author response image 1.

      Confusion matrix showing that three eye movement features fail to predict asterisk position on the task display above chance levels (Fold 1 test accuracy = 0.21718; Fold 2 test accuracy = 0.22023; Fold 3 test accuracy = 0.21859; Fold 4 test accuracy = 0.22113; Fold 5 test accuracy = 0.21373; Overall cross-validated accuracy = 0.2181). Since the ordinal position of the asterisk on the display is highly correlated with the ordinal position of individual keypresses in the sequence, this analysis provides strong evidence that keypress decoding performance from MEG features is not explained by systematic relationships between finger movement behavior and eye movements (i.e. – behavioral artefacts) (end of figure legend).
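
      The three eye-movement features described above can be sketched as follows. This is a minimal illustration only, not the analysis code used for the study: the function name, the list-of-tuples gaze format and the coordinate units are assumptions, and each gaze position is treated as an (x, y) pair, so the three features expand to five numbers per keypress.

```python
import math

FS_HZ = 600    # eye-tracker sampling rate stated in the response
LAG_MS = 150   # delay between the two gaze samples

def eye_features(gaze_xy, event_idx, fs_hz=FS_HZ, lag_ms=LAG_MS):
    """Return (x0, y0, x1, y1, peak_velocity) for one keypress event.

    gaze_xy   : list of (x, y) gaze samples (hypothetical units, e.g. pixels)
    event_idx : sample index of the KeyDown event (asterisk update)
    """
    lag = round(lag_ms * fs_hz / 1000)   # 150 ms at 600 Hz is 90 samples
    end_idx = event_idx + lag
    if event_idx < 0 or end_idx >= len(gaze_xy):
        raise ValueError("event window falls outside the gaze recording")
    x0, y0 = gaze_xy[event_idx]          # feature 1: gaze at the update
    x1, y1 = gaze_xy[end_idx]            # feature 2: gaze 150 ms later
    # Feature 3: largest sample-to-sample speed (units per second) in the window
    window = gaze_xy[event_idx:end_idx + 1]
    speeds = [
        math.hypot(bx - ax, by - ay) * fs_hz
        for (ax, ay), (bx, by) in zip(window[:-1], window[1:])
    ]
    return x0, y0, x1, y1, max(speeds, default=0.0)
```

      Feature vectors of this kind, one per correct keypress, would then be passed to a 5-class classifier evaluated with 5-fold cross-validation, for which chance level is 1/5 = 0.2.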

      Recall that the task display does not provide explicit feedback related to performance, only information about the present position in the sequence. Thus, it is possible that participants did not actively attend to the feedback. In fact, inspection of the eye position data revealed that on the majority of trials, participants displayed random-walk-like gaze patterns around a central fixation point located near the center of the screen. Thus, participants did not attend to the asterisk position on the display, but instead intrinsically generated the action sequence. A similar real-world example would be manually inputting a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks) as provided in the study task – feedback which is typically ignored by the user.

      The minimal participant engagement with the visual task display observed in this study highlights another important point – that the behavior in explicit sequence learning motor tasks is highly generative in nature rather than reactive to stimulus cues as in the serial reaction time task (SRTT). This is a crucial difference that must be carefully considered when designing investigations and comparing findings across studies.

      We observed that initial keypress decoding accuracy was predominantly driven by contralateral primary sensorimotor cortex in the initial practice trials before transitioning to bilateral frontoparietal regions by trials 11 or 12 as performance gains plateaued. The contribution of contralateral primary sensorimotor areas to early skill learning has been extensively reported in humans and non-human animals (Buch et al., 2021; Classen et al., 1998; Karni et al., 1995; Kleim et al., 1998). Similarly, the increased involvement of bilateral frontal and parietal regions to decoding during early skill learning in the non-dominant hand is well known. Enhanced bilateral activation in both frontal and parietal cortex during skill learning has been extensively reported (Doyon et al., 2002; Grafton et al., 1992; Hardwick et al., 2013; Kennerley et al., 2004; Shadmehr & Holcomb, 1997; Toni, Ramnani, et al., 2001), and appears to be even more prominent during early fine motor skill learning in the non-dominant hand (Lee et al., 2019; Sawamura et al., 2019). The frontal regions identified in these studies are known to play crucial roles in executive control (Battaglia-Mayer & Caminiti, 2019), motor planning (Toni, Thoenissen, et al., 2001), and working memory (Andersen & Buneo, 2002; Buneo & Andersen, 2006; Shadmehr & Holcomb, 1997; Toni, Ramnani, et al., 2001; Wolpert et al., 1998) processes, while the same parietal regions are known to integrate multimodal sensory feedback and support visuomotor transformations (Andersen & Buneo, 2002; Buneo & Andersen, 2006; Shadmehr & Holcomb, 1997; Toni, Ramnani, et al., 2001; Wolpert et al., 1998), in addition to working memory (Grover et al., 2022). Thus, it is not surprising that these regions increasingly contribute to decoding as subjects internalize the sequential task. We now include a statement reflecting these considerations in the revised Discussion.

      A somewhat related point is this: when combining voxel and parcel space, a concern is whether a degree of circularity may have contributed to the improved accuracy of the combined data, because it seems to use the same MEG signals twice - the voxels most contributing are also those contributing most to a parcel being identified as relevant, as parcels reflect the average of voxels within a boundary. In this context, I struggled to understand the explanation given, ie that the improved accuracy of the hybrid model may be due to "lower spatially resolved whole-brain and higher spatially resolved regional activity patterns".

      We disagree with the Reviewer’s assertion that the construction of the hybrid-space decoder is circular for the following reasons. First, the base feature set for the hybrid-space decoder constructed for all participants includes whole-brain spatial patterns of MEG source activity averaged within parcels. As stated in the manuscript, these 148 inter-parcel features reflect “lower spatially resolved whole-brain activity patterns” or global brain dynamics. We then independently test how well spatial patterns of MEG source activity for all voxels distributed within individual parcels can decode keypress actions. Again, the tests of these intra-parcel spatial patterns, intended to capture “higher spatially resolved regional brain activity patterns”, are completely independent of one another and of the weighting of individual inter-parcel features. These intra-parcel features could, for example, provide additional information about muscle activation patterns or the task environment. These approximately 1150 intra-parcel voxels (on average, with the total number varying between subjects) are then combined with the 148 inter-parcel features to construct the final hybrid-space decoder. In fact, this varied spatial filter approach shares some similarities to the construction of convolutional neural networks (CNNs) used to perform object recognition in image classification applications (Srinivas et al., 2016). One could also view this hybrid-space decoding approach as a spatial analogue to common time-frequency based analyses such as theta-gamma phase amplitude coupling (θ/γ PAC), which assess interactions between two or more narrow-band spectral features derived from the same time-series data (Lisman & Jensen, 2013).
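
      The way the two feature sets are combined can be illustrated with a toy sketch. It is not the decoder pipeline itself: the function names, the flat-list data layout and the way top-ranked parcels are supplied are all hypothetical, chosen only to show parcel averages (inter-parcel features) being concatenated with the individual voxel values of selected parcels (intra-parcel features).

```python
def parcel_means(voxel_values, parcel_of_voxel, n_parcels):
    """Average voxel activity within each parcel (inter-parcel features)."""
    sums = [0.0] * n_parcels
    counts = [0] * n_parcels
    for value, parcel in zip(voxel_values, parcel_of_voxel):
        sums[parcel] += value
        counts[parcel] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def hybrid_features(voxel_values, parcel_of_voxel, n_parcels, top_parcels):
    """Concatenate whole-brain parcel averages with the voxel-level
    values of the top-ranked parcels (intra-parcel features)."""
    inter = parcel_means(voxel_values, parcel_of_voxel, n_parcels)
    intra = [
        value
        for value, parcel in zip(voxel_values, parcel_of_voxel)
        if parcel in top_parcels
    ]
    return inter + intra

# Toy example: 4 voxels in 2 parcels, parcel 1 selected as top-ranked
features = hybrid_features([1.0, 3.0, 5.0, 7.0], [0, 0, 1, 1], 2, {1})
```

      In this toy case the parcel average for parcel 1 (6.0) and its voxel values (5.0 and 7.0) both enter the feature vector; the average discards the within-parcel spatial pattern that the voxel values retain.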

      We directly tested this hypothesis – that spatially overlapping intra- and inter-parcel features portray different information – by constructing an alternative hybrid-space decoder (Hybrid<sub>Alt</sub>) that excluded average inter-parcel features which spatially overlapped with intra-parcel voxel features, and comparing the performance to the decoder used in the manuscript (Hybrid<sub>Orig</sub>). The prediction was that if the overlapping parcel contained similar information to the more spatially resolved voxel patterns, then removing the parcel features (n=8) from the decoding analysis should not impact performance. In fact, despite making up less than 1% of the overall input feature space, removing those parcels resulted in a significant drop in overall performance greater than 2% (78.15% ± 7.03% SD for Hybrid<sub>Orig</sub> vs. 75.49% ± 7.17% for Hybrid<sub>Alt</sub>; Wilcoxon signed rank test, z = 3.7410, p = 1.8326e-04; Author response image 2).

      Author response image 2.

      Comparison of decoding performances with two different hybrid approaches. Hybrid<sub>Alt</sub>: Intra-parcel voxel-space features of top ranked parcels and inter-parcel features of remaining parcels. Hybrid<sub>Orig</sub>: Voxel-space features of top ranked parcels and whole-brain parcel-space features (i.e. – the version used in the manuscript). Dots represent decoding accuracy for individual subjects. Dashed lines indicate the trend in performance change across participants. Note, that Hybrid<sub>Orig</sub> (the approach used in our manuscript) significantly outperforms the Hybrid<sub>Alt</sub> approach, indicating that the excluded parcel features provide unique information compared to the spatially overlapping intra-parcel voxel patterns (end of figure legend).
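
      For readers unfamiliar with the paired test reported above, the following is a minimal sketch of a Wilcoxon signed-rank z statistic using the normal approximation. It is an assumption that this matches the exact variant used for the reported value (statistical packages differ, for example in whether a continuity correction is applied); the function name is hypothetical.

```python
import math

def wilcoxon_signed_rank_z(a, b):
    """Normal-approximation z statistic for paired samples a and b.

    Zero differences are dropped, tied absolute differences receive
    average ranks, and the variance includes the usual tie correction.
    No continuity correction is applied.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1       # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    return (w_plus - mean_w) / math.sqrt(var_w)
```

      Here `a` and `b` would hold per-subject decoding accuracies for the two decoders being compared, one pair per participant.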

      Firstly, there will be a relatively high degree of spatial contiguity among voxels because of the nature of the signal measured, i.e. nearby individual voxels are unlikely to be independent. Secondly, the voxel data gives a somewhat misleading sense of precision; the inversion can be set up to give an estimate for each voxel, but there will not just be dependence among adjacent voxels, but also substantial variation in the sensitivity and confidence with which activity can be projected to different parts of the brain. Midline and deeper structures come to mind, where the inversion will be more problematic than for regions along the dorsal convexity of the brain, and a concern is that in those midline structures, the highest decoding accuracy is seen.

      We agree with the Reviewer that some inter-parcel features representing neighboring (or spatially contiguous) voxels are likely to be correlated, an important confound in connectivity analyses (Colclough et al., 2015; Colclough et al., 2016), not performed in our investigation.

      In our study, correlations between adjacent voxels effectively reduce the dimensionality of the input feature space. However, as long as there are multiple groups of correlated voxels within each parcel (i.e. – the rank is greater than 1), the intra-parcel spatial patterns could meaningfully contribute to the decoder performance, as shown by the following results:

      First, we obtained higher decoding accuracy with voxel-space features (74.51% ± 7.34% SD) compared to parcel space features (68.77% ± 7.6%; Figure 3B), indicating individual voxels carry more information in decoding the keypresses than the averaged voxel-space features or parcel space features. Second, individual voxels within a parcel showed varying feature importance scores in decoding keypresses (Author response image 3). This finding shows that correlated voxels form mini subclusters that are much smaller spatially than the parcel they reside within.

      Author response image 3.

      Feature importance score of individual voxels in decoding keypresses: MRMR was used to rank the individual voxel space features in decoding keypresses and the min-max normalized MRMR score was mapped to a structural brain surface. Note that individual voxels within a parcel showed different contribution to decoding (end of figure legend).

      Some of these concerns could be addressed by recording head movement (with enough precision) to regress out these contributions. The authors state that head movement was monitored with 3 fiducials, and their time courses ought to provide a way to deal with this issue. The ICA procedure may not have sufficiently dealt with removing movement-related problems, but one could eg relate individual components that were identified to the keypresses as another means for checking. An alternative could be to focus on frequency ranges above the movement frequencies. The accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment.

      We have already addressed the issue of movement related artefacts in the first response above. With respect to a focus on frequency ranges above movement frequencies, the Reviewer states the “accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment”. First, it is important to note that cortical delta-band oscillations measured with local field potentials (LFPs) in macaques are known to contain important information related to end-effector kinematics (Bansal et al., 2011; Mollazadeh et al., 2011), muscle activation patterns (Flint et al., 2012) and temporal sequencing (Churchland et al., 2012) during skilled reaching and grasping actions. Thus, there is a substantial body of evidence that low-frequency neural oscillatory activity in this range contains important information about the skill learning behavior investigated in the present study. Second, our own data show (which the Reviewer also points out) that significant information related to the skill learning behavior is also present in higher frequency bands (see Figure 2A and Figure 3—figure supplement 1). As we pointed out in our earlier response to questions about the hybrid space decoder architecture (see above), it is likely that different, yet complementary, information is encoded across different temporal frequencies (just as it is encoded across different spatial frequencies) (Heusser et al., 2016). Again, this interpretation is supported by our data as the highest performing classifiers in all cases (when holding all parameters constant) were always constructed from broadband input MEG data (Figure 2A and Figure 3—figure supplement 1).

      One question concerns the interpretation of the results shown in Figure 4. They imply that during the course of learning, entirely different brain networks underpin the behaviour. Not only that, but they also include regions that would seem rather unexpected to be key nodes for learning and expressing relatively simple finger sequences, such as here. What then is the biological plausibility of these results? The authors seem to circumnavigate this issue by moving into a distance metric that captures the (neural network) changes over the course of learning, but the discussion seems detached from which regions are actually involved; or they offer a rather broad discussion of the anatomical regions identified here, eg in the context of LFOs, where they merely refer to "frontoparietal regions".

      The Reviewer notes the shift in brain networks driving keypress decoding performance between trials 1, 11 and 36 as shown in Figure 4A. The Reviewer questions whether these shifts in brain network states underpinning the skill are biologically plausible, as well as the likelihood that bilateral superior and middle frontal and parietal cortex are important nodes within these networks.

      First, previous fMRI work in humans assessed changes in functional connectivity patterns while participants performed a similar sequence learning task to our present study (Bassett et al., 2011). Using a dynamic network analysis approach, Bassett et al. showed that flexibility in the composition of individual network modules (i.e. – changes in functional brain region membership of orthogonal brain networks) is up-regulated in novel learning environments and explains differences in learning rates across individuals. Thus, consistent with our findings, it is likely that functional brain networks rapidly reconfigure during early learning of novel sequential motor skills.

      Second, frontoparietal network activity is known to support motor memory encoding during early learning (Albouy et al., 2013; Albouy et al., 2012). For example, reactivation events in the posterior parietal (Qin et al., 1997) and medial prefrontal (Euston et al., 2007; Molle & Born, 2009) cortex (MPFC) have been temporally linked to hippocampal replay, and are posited to support memory consolidation across several memory domains (Frankland & Bontempi, 2005), including motor sequence learning (Albouy et al., 2015; Buch et al., 2021; F. Jacobacci et al., 2020). Further, synchronized interactions between MPFC and hippocampus are more prominent during early as opposed to later learning stages (Albouy et al., 2013; Gais et al., 2007; Sterpenich et al., 2009), perhaps reflecting “redistribution of hippocampal memories to MPFC” (Albouy et al., 2013). MPFC contributes to very early memory formation by learning associations between contexts, locations, events and adaptive responses during rapid learning (Euston et al., 2012). Consistently, coupling between hippocampus and MPFC has been shown during initial memory encoding and during subsequent rest (van Kesteren et al., 2010; van Kesteren et al., 2012). Importantly, MPFC activity during initial memory encoding predicts subsequent recall (Wagner et al., 1998). Thus, the spatial map required to encode a motor sequence memory may be “built under the supervision of the prefrontal cortex” (Albouy et al., 2012), also engaged in the development of an abstract representation of the sequence (Ashe et al., 2006). In more abstract terms, the prefrontal, premotor and parietal cortices support novice performance “by deploying attentional and control processes” required during early learning (Doyon et al., 2009; Hikosaka et al., 2002; Penhune & Steele, 2012). The dorsolateral prefrontal cortex (DLPFC) specifically is thought to engage in goal selection and sequence monitoring during early skill practice (Schendan et al., 2003), all consistent with the schema model of declarative memory in which prefrontal cortices play an important role in encoding (Morris, 2006; Tse et al., 2007). Thus, several prefrontal and frontoparietal regions contributing to long term learning (Berlot et al., 2020) are also engaged in early stages of encoding. Altogether, there is strong biological support for the involvement of bilateral prefrontal and frontoparietal regions to decoding during early skill learning. We now address this issue in the revised manuscript.

      If I understand correctly, the offline neural representation analysis is in essence the comparison of the last keypress vs the first keypress of the next sequence. In that sense, the activity during offline rest periods is actually not considered. This makes the nomenclature somewhat confusing. While it matches the behavioural analysis, having only key presses one can't do it in any other way, but here the authors actually do have recordings of brain activity during offline rest. So at the very least calling it offline neural representation is misleading to this reviewer because what is compared is activity during the last and during the next keypress, not activity during offline periods. But it also seems a missed opportunity - the authors argue that most of the relevant learning occurs during offline rest periods, yet there is no attempt to actually test whether activity during this period can be useful for the questions at hand here.

      We agree with the Reviewer that our previous “offline neural representation” nomenclature could be misinterpreted. In the revised manuscript we refer to this difference as the “offline neural representational change”. Please note that our previous work did link offline neural activity (i.e. – 16-22 Hz beta power (Bonstrup et al., 2019) and neural replay density (Buch et al., 2021) during inter-practice rest periods) to observed micro-offline gains.

      Reviewer #2 (Public review):

      Summary

      Dash et al. asked whether and how the neural representation of individual finger movements is "contextualized" within a trained sequence during the very early period of sequential skill learning by using decoding of MEG signal. Specifically, they assessed whether/how the same finger presses (pressing index finger) embedded in the different ordinal positions of a practiced sequence (4-1-3-2-4; here, the numbers 1 through 4 correspond to the little through the index fingers of the non-dominant left hand) change their representation (MEG feature). They did this by computing either the decoding accuracy of the index finger at the ordinal positions 1 vs. 5 (index_OP1 vs index_OP5) or pattern distance between index_OP1 vs. index_OP5 at each training trial and found that both the decoding accuracy and the pattern distance progressively increase over the course of learning trials. More interestingly, they also computed the pattern distance for index_OP5 for the last execution of a practice trial vs. index_OP1 for the first execution in the next practice trial (i.e., across the rest period). This "off-line" distance was significantly larger than the "on-line" distance, which was computed within practice trials and predicted micro-offline skill gain. Based on these results, the authors conclude that the differentiation of representation for the identical movement embedded in different positions of a sequential skill ("contextualization") primarily occurs during early skill learning, especially during rest, consistent with the recent theory of the "micro-offline learning" proposed by the authors' group. I think this is an important and timely topic for the field of motor learning and beyond.
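
      The distinction between the "on-line" and "off-line" pattern distances summarized above can be made concrete with a small sketch. The distance metric (Euclidean), the within-trial averaging for the on-line distance, the data layout and the function names are assumptions made purely for illustration; they are not taken from the manuscript.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vector(vectors):
    """Element-wise mean of a list of feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def online_distance(trial):
    """Distance between index-finger patterns at ordinal positions 1 and 5,
    averaged over the sequence iterations within one practice trial.

    trial: list of dicts, one per correct sequence, with keys 'op1' and 'op5'.
    """
    op1 = mean_vector([iteration["op1"] for iteration in trial])
    op5 = mean_vector([iteration["op5"] for iteration in trial])
    return euclidean(op1, op5)

def offline_distance(trial, next_trial):
    """Distance between OP5 of the last sequence in one trial and OP1 of
    the first sequence in the next trial (i.e., across the rest period)."""
    return euclidean(trial[-1]["op5"], next_trial[0]["op1"])
```

      Note that both quantities are computed from keypress-locked activity during practice; neither uses signals recorded during the rest interval itself.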

      Strengths

      The specific strengths of the current work are as follows. First, the use of temporally rich neural information (MEG signal) has a large advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Second, through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. As claimed by the authors, this is one of the strengths of the paper (but see my comments). Third, although some potential refinement might be needed, comparing "online" and "offline" pattern distance is a neat idea.

      Weaknesses

      Along with the strengths I raised above, the paper has some weaknesses. First, the pursuit of high decoding accuracy, especially the choice of time points and window length (i.e., 200 msec window starting from 0 msec from key press onset), casts a shadow on the interpretation of the main result. Currently, it is unclear whether the decoding results simply reflect behavioral change or true underlying neural change. As shown in the behavioral data, the key press speed reached 3~4 presses per second already at around the end of the early learning period (11th trial), which means inter-press intervals become as short as 250-330 msec. Thus, in more than 60% of the training period data, the time window for MEG feature extraction (200 msec) spans around 60% of the inter-press intervals. Considering that the preparation/cueing of subsequent presses starts ahead of the actual press (e.g., Kornysheva et al., 2019) and/or potential online planning (e.g., Ariani and Diedrichsen, 2019), the decoder likely has captured this future press information as well as the signal related to the current key press, independent of the formation of genuine sequential representation (e.g., "contextualization" of individual press). This may also explain the gradual increase in decoding accuracy or pattern distance between index_OP1 vs. index_OP5 (Figure 4C and 5A), which co-occurred with performance improvement, as shorter inter-press intervals are more favorable for dissociating the two index finger presses followed by different finger presses. The compromised decoding accuracies for the control sequences can be explained by similar logic. Therefore, more careful consideration and elaborated discussion seem necessary when trying to both achieve high-performance decoding and assess early skill learning, as it can impact all the subsequent analyses.

      The Reviewer raises the possibility that (given the windowing parameters used in the present study) an increase in “contextualization” with learning could simply reflect faster typing speeds as opposed to an actual change in the underlying neural representation.

      We now include a new control analysis that addresses this issue as well as additional re-examination of previously reported results with respect to this issue – all of which are inconsistent with this alternative explanation that “contextualization” reflects a change in mixing of keypress related MEG features as opposed to a change in the underlying representations themselves. As correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged. One must also keep in mind that since participants repeat the sequence multiple times within the same trial, a majority of the index finger keypresses are performed adjacent to one another (i.e. - the “4-4” transition marking the end of one sequence and the beginning of the next). Thus, increased overlap between consecutive index finger keypresses as typing speed increased should increase their similarity and mask contextualization related changes to the underlying neural representations.

      We addressed this question by conducting a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis also affirmed that the possible alternative explanation that contextualization effects are simple reflections of increased mixing is not supported by the data (Adjusted R<sup>2</sup> = 0.00431; F = 5.62). We now include this new negative control analysis in the revised manuscript.
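
      Two ingredients of this control analysis, within-subject z-scoring and the adjusted R<sup>2</sup>, can be sketched as below. This is an illustration under stated assumptions rather than the analysis code: the use of the sample standard deviation for z-scoring and the function names are hypothetical, and the regression fit itself is omitted.

```python
import statistics

def zscore(values):
    """Z-score one subject's values (sample standard deviation assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    where n is the number of observations and p the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Toy example: normalize one subject's transition times (hypothetical values, in seconds)
z_times = zscore([0.25, 0.30, 0.35])
```

      With three predictors (the 4-1, 2-4 and 4-4 transition times), an adjusted R<sup>2</sup> of 0.00431 means the transition times account for well under 1% of the variance in the representational distance score.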

      We also re-examined our previously reported classification results with respect to this issue. We reasoned that if mixing effects reflecting the ordinal sequence structure are an important driver of the contextualization finding, these effects should be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A display a distribution of misclassifications that is inconsistent with an alternative mixing effect explanation of contextualization.

      Based upon the increased overlap between adjacent index finger keypresses (i.e. – “4-4” transition), we also reasoned that the decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position should show decreased performance as typing speed increases. However, Figure 4C in our manuscript shows that this is not the case. The 2-class hybrid classifier actually displays improved classification performance over early practice trials despite greater temporal overlap. Again, this is inconsistent with the idea that the contextualization effect simply reflects increased mixing of individual keypress features.

      In summary, the re-examination of previously reported data and the new control analyses converge on the conclusion that temporal proximity between keypresses does not explain contextualization.

      We do agree with the Reviewer that the naturalistic, generative, self-paced task employed in the present study results in overlapping brain processes related to planning, execution, evaluation and memory of the action sequence. We also agree that there are several tradeoffs to consider in the construction of the classifiers depending on the study aim. Given our aim of optimizing keypress decoder accuracy in the present study, the set of trade-offs resulted in representations reflecting more the latter three processes, and less so the planning component. Whether separate decoders can be constructed to tease apart the representations or networks supporting these overlapping processes is an important future direction of research in this area. For example, work presently underway in our lab constrains the selection of windowing parameters in a manner that allows individual classifiers to be temporally linked to specific planning, execution, evaluation or memory-related processes to discern which brain networks are involved and how they adaptively reorganize with learning. Results from the present study (Figure 4—figure supplement 2) showing hybrid-space decoder prediction accuracies exceeding 74% for temporal windows spanning as little as 25ms and located up to 100ms prior to the KeyDown event strongly support the feasibility of such an approach.

      Related to the above point, testing only one particular sequence (4-1-3-2-4), aside from the control ones, limits the generalizability of the finding. This also may have contributed to the extremely high decoding accuracy reported in the current study.

      The Reviewer raises a question about the generalizability of the decoder accuracy reported in our study. Fortunately, a comparison between decoder performances on Day 1 and Day 2 datasets does provide insight into this issue. As the Reviewer points out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way so as to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3 — figure supplement 3). We observed a reduction in overall accuracy rate to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and 79.44% when they performed several previously unpracticed sequences. Both changes in accuracy are important with regard to the generalizability of our findings. First, 87.11% performance accuracy for the trained sequence data on Day 2 (a reduction of only 3.36%) indicates that the hybrid-space decoder performance is robust over multiple MEG sessions, and thus, robust to variations in SNR across the MEG sensor array caused by small differences in head position between scans. This indicates a substantial advantage over sensor-space decoding approaches. Furthermore, when tested on data from unpracticed sequences, overall performance dropped an additional 7.67%. This difference reflects the performance bias of the classifier for the trained sequence, possibly caused by high-order sequence structure being incorporated into the feature weights. In the future, it will be important to understand in more detail how random or repeated keypress sequence training data impacts overall decoder performance and generalization.
We strongly agree with the Reviewer that the issue of generalizability is extremely important and have added a new paragraph to the Discussion in the revised manuscript highlighting the strengths and weaknesses of our study with respect to this issue.

      In terms of clinical BCI, one of the potential relevance of the study, as claimed by the authors, it is not clear that the specific time window chosen in the current study (up to 200 msec since key press onset) is really useful. In most cases, clinical BCI would target neural signals with no overt movement execution due to patients' inability to move (e.g., Hochberg et al., 2012). Given the time window, the surprisingly high performance of the current decoder may result from sensory feedback and/or planning of subsequent movement, which may not always be available in the clinical BCI context. Of course, the decoding accuracy is still much higher than chance even when using signal before the key press (as shown in Figure 4 Supplement 2), but it is not immediately clear to me that the authors relate their high decoding accuracy based on post-movement signal to clinical BCI settings.

      The Reviewer questions the relevance of the specific window parameters used in the present study for clinical BCI applications, particularly for paretic patients who are unable to produce finger movements or for whom afferent sensory feedback is no longer intact. We strongly agree with the Reviewer that any intended clinical application must carefully consider the specific input feature constraints dictated by the clinical cohort, and in turn impose appropriate and complementary constraints on classifier parameters that may differ from the ones used in the present study. We now highlight this issue in the Discussion of the revised manuscript and relate our present findings to published clinical BCI work within this context.

      One of the important and fascinating claims of the current study is that the "contextualization" of individual finger movements in a trained sequence specifically occurs during short rest periods in very early skill learning, echoing the recent theory of micro-offline learning proposed by the authors' group. Here, I think two points need to be clarified. First, the concept of "contextualization" is kept somewhat blurry throughout the text. It is only at the later part of the Discussion (around line #330 on page 13) that some potential mechanism for the "contextualization" is provided as "what-and-where" binding. Still, it is unclear what "contextualization" actually is in the current data, as the MEG signal analyzed is extracted from 0-200 msec after the keypress. If one thinks something is contextualizing an action, that contextualization should come earlier than the action itself.

      The Reviewer requests that we: 1) more clearly define our use of the term “contextualization” and 2) provide the rationale for assessing it over a 200ms window aligned to the KeyDown event. This choice of window parameters means that the MEG activity used in our analysis was coincident with, rather than preceding, the actual keypresses. We define contextualization as the differentiation of representation for the identical movement embedded in different positions of a sequential skill. That is, representations of individual action elements progressively incorporate information about their relationship to the overall sequence structure as the skill is learned. We agree with the Reviewer that this can be appropriately interpreted as “what-and-where” binding. We now incorporate this definition in the Introduction of the revised manuscript as requested.
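
      To make this definition concrete, the following sketch computes a contextualization distance for one finger as the distance between its mean feature pattern at ordinal position 1 and at ordinal position 5. The Euclidean metric, the function name and the toy feature vectors are assumptions for illustration only; the distance measure used in the manuscript is not restated here.

```python
import numpy as np

def contextualization_distance(features, positions):
    """Distance between the mean feature patterns of the same finger
    at ordinal position 1 (OP1) and ordinal position 5 (OP5).

    features  : array of shape (n_keypresses, n_features)
    positions : ordinal position (1 or 5) of each keypress
    The Euclidean norm is an assumed choice of metric.
    """
    features = np.asarray(features, dtype=float)
    positions = np.asarray(positions)
    mean_op1 = features[positions == 1].mean(axis=0)
    mean_op5 = features[positions == 5].mean(axis=0)
    return float(np.linalg.norm(mean_op1 - mean_op5))

# Toy example: two index-finger keypresses at each ordinal position.
toy_features = np.array([[0.0, 0.0], [0.0, 2.0], [3.0, 5.0], [3.0, 5.0]])
toy_positions = np.array([1, 1, 5, 5])
toy_distance = contextualization_distance(toy_features, toy_positions)
print(toy_distance)
```

      A distance of zero would mean the same movement is represented identically at both sequence positions; larger values indicate that the representation differs with ordinal position.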

      The window parameters for optimizing accurate decoding of individual finger movements were determined using a grid search of the parameter space (a sliding window of variable width between 25-350 ms with 25 ms increments variably aligned from 0 to +100ms with 10ms increments relative to the KeyDown event). This approach generated 140 different temporal windows for each keypress for each participant, with the final parameter selection determined through comparison of the resulting performance between each decoder. Importantly, the decision to optimize for decoding accuracy placed an emphasis on keypress representations characterized by the most consistent and robust features shared across subjects, which in turn maximizes statistical power in detecting common learning-related changes. In this case, the optimal window encompassed a 200ms epoch aligned to the KeyDown event (t<sub>0</sub> = 0 ms). We then asked if the representations (i.e. – spatial patterns of combined parcel- and voxel-space activity) of the same digit at two different sequence positions changed with practice within this optimal decoding window. Of course, our findings do not rule out the possibility that contextualization can also be found before or even after this time window, as we did not directly address this issue in the present study. Future work in our lab, as pointed out above, is investigating contextualization within different time windows tailored specifically for assessing sequence skill action planning, execution, evaluation and memory processes.
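
      The window grid described above can be enumerated with a short sketch. Inclusive endpoints for both ranges are an assumption; under that reading the grid holds 14 widths × 11 onsets = 154 windows, so the 140 windows reported above presumably follow an endpoint convention that is not specified here. The function name is hypothetical.

```python
from itertools import product

def candidate_windows(widths_ms, onsets_ms):
    """Return (start, end) pairs in ms relative to KeyDown
    for every combination of window width and window onset."""
    return [(onset, onset + width) for width, onset in product(widths_ms, onsets_ms)]

widths_ms = range(25, 351, 25)  # 25, 50, ..., 350 ms (14 widths)
onsets_ms = range(0, 101, 10)   # 0, 10, ..., 100 ms (11 onsets, inclusive endpoints assumed)

windows = candidate_windows(widths_ms, onsets_ms)
print(len(windows))
```

      In a full grid search, a decoder would be trained and cross-validated once per window, and the window with the best accuracy retained; the 200 ms window aligned to KeyDown, (0, 200), is one of the candidates in this grid.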

      The second point is that the result provided by the authors is not yet convincing enough to support the claim that "contextualization" occurs during rest. In the original analysis, the authors presented the statistical significance regarding the correlation between the "offline" pattern differentiation and micro-offline skill gain (Figure 5. Supplement 1), as well as the larger "offline" distance than "online" distance (Figure 5B). However, this analysis looks like regressing two variables (monotonically) increasing as a function of the trial. Although some information in this analysis, such as what the independent/dependent variables were or how individual subjects were treated, was missing in the Methods, getting a statistically significant slope seems unsurprising in such a situation. Also, curiously, the same quantitative evidence was not provided for its "online" counterpart, and the authors only briefly mentioned in the text that there was no significant correlation between them. It may be true looking at the data in Figure 5A as the online representation distance looks less monotonically changing, but the classification accuracy presented in Figure 4C, which should reflect similar representational distance, shows a more monotonic increase up to the 11th trial. Further, the ways the "online" and "offline" representation distance was estimated seem to make them not directly comparable. While the "online" distance was computed using all the correct press data within each 10 sec of execution, the "offline" distance is basically computed by only two presses (i.e., the last index_OP5 vs. the first index_OP1 separated by 10 sec of rest). Theoretically, the distance between the neural activity patterns for temporally closer events tends to be closer than that between the patterns for temporally far-apart events. It would be fairer to use the distance between the first index_OP1 vs. the last index_OP5 within an execution period for "online" distance, as well.

      The Reviewer suggests that the current data is not enough to show that contextualization occurs during rest and raises two important concerns: 1) the relationship between online contextualization and micro-online gains is not shown, and 2) the online distance was calculated differently from its offline counterpart (i.e. - instead of calculating the distance between last Index<sub>OP5</sub> and first Index<sub>OP1</sub> from a single trial, the distance was calculated for each sequence within a trial and then averaged).

      We addressed the first concern by performing individual subject correlations between 1) contextualization changes during rest intervals and micro-offline gains; 2) contextualization changes during practice trials and micro-online gains, and 3) contextualization changes during practice trials and micro-offline gains (Figure 5 – figure supplement 4). We then statistically compared the resulting correlation coefficient distributions and found that within-subject correlations between contextualization changes during rest intervals and micro-offline gains were significantly higher than those between online contextualization and micro-online gains (t = 3.2827, p = 0.0015) and those between online contextualization and micro-offline gains (t = 3.7021, p = 5.3013e-04). These results are consistent with our interpretation that micro-offline gains are supported by contextualization changes during the inter-practice rest periods.
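
      The logic of comparing within-subject correlation distributions can be sketched as follows. The data are synthetic, the sample sizes are hypothetical, and the use of a paired t statistic on Fisher z-transformed coefficients is an assumption; the exact test used for the values reported above is not restated here.

```python
import numpy as np

def per_subject_r(a, b):
    """Pearson correlation across trials, computed separately for each subject.
    a, b : arrays of shape (n_subjects, n_trials)."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))

def paired_t(x, y):
    """Paired t statistic for two matched samples (df = n - 1)."""
    d = x - y
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(d.size)))

# Synthetic stand-in data (hypothetical sizes; NOT the study's data).
rng = np.random.default_rng(1)
n_subjects, n_trials = 20, 11
offline_gain = rng.normal(size=(n_subjects, n_trials))
online_gain = rng.normal(size=(n_subjects, n_trials))
# Offline contextualization is built to covary with offline gains;
# online contextualization is unrelated to online gains.
offline_ctx = offline_gain + rng.normal(size=(n_subjects, n_trials))
online_ctx = rng.normal(size=(n_subjects, n_trials))

r_offline = per_subject_r(offline_ctx, offline_gain)
r_online = per_subject_r(online_ctx, online_gain)

# Fisher z-transform before comparing the two coefficient distributions.
t_stat = paired_t(np.arctanh(r_offline), np.arctanh(r_online))
print(f"mean r (offline) = {r_offline.mean():.2f}, "
      f"mean r (online) = {r_online.mean():.2f}, t = {t_stat:.2f}")
```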

      With respect to the second concern, we agree with the Reviewer that one limitation of the analysis comparing online versus offline changes in contextualization, as presented in the original manuscript, is that it does not eliminate the possibility that any differences could simply be explained by the passage of time (which is smaller for the online analysis compared to the offline analysis). The Reviewer suggests an approach that addresses this issue, which we have now carried out. When quantifying online changes in contextualization from the first Index<sub>OP1</sub> to the last Index<sub>OP5</sub> keypress in the same trial, we observed no learning-related trend (Figure 5 – figure supplement 5, right panel). Importantly, offline distances were significantly larger than online distances regardless of the measurement approach and neither predicted online learning (Figure 5 – figure supplement 6).

      A related concern regarding the control analysis, where individual values for max speed and the degree of online contextualization were compared (Figure 5 Supplement 3), is whether the individual difference is meaningful. If I understood correctly, the optimization of the decoding process (temporal window, feature inclusion/reduction, decoder, etc.) was performed for individual participants, and the same feature extraction was also employed for the analysis of representation distance (i.e., contextualization). If this is the case, the distances are individually differently calculated and they may need to be normalized relative to some stable reference (e.g., 1 vs. 4 or average distance within the control sequence presses) before comparison across the individuals.

      The Reviewer makes a good point here. We have now implemented the suggested normalization procedure in the analysis provided in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multiscale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements. Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training and correlates with a performance metric which the authors interpret as an indicator of offline learning.

      Strengths:

      A clear strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of the concurrent distribution of neural coding across local circuits as well as large-scale networks. A further strength of the study is the large number of tested dimension reduction techniques and classifiers (though the manuscript reveals little about the comparison of the latter).

      We appreciate the Reviewer’s comments regarding the paper’s strengths.

      A simple control analysis based on shuffled class labels could lend further support to this complex decoding approach. As a control analysis that completely rules out any source of overfitting, the authors could test the decoder after shuffling class labels. Following such shuffling, decoding accuracies should drop to chance level for all decoding approaches, including the optimized decoder. This would also provide an estimate of actual chance-level performance (which is informative over and beyond the theoretical chance level). Furthermore, currently, the manuscript does not explain the huge drop in decoding accuracies for the voxel-space decoding (Figure 3B). Finally, the authors' approach to cortical parcellation raises questions regarding the information carried by varying dipole orientations within a parcel (which currently seems to be ignored?) and the implementation of the mean-flipping method (given that there are two dimensions - space and time - what do the authors refer to when they talk about the sign of the "average source", line 477?).

      The Reviewer recommends that we: 1) conduct an additional control analysis on classifier performance using shuffled class labels, 2) provide a more detailed explanation regarding the drop in decoding accuracies for the voxel-space decoding following LDA dimensionality reduction (see Fig 3B), and 3) provide additional details on how problems related to dipole solution orientations were addressed in the present study.

      In relation to the first point, we have now implemented a random shuffling approach as a control for the classification analyses. The results of this analysis indicated that the chance level accuracy was 22.12% (± SD 9.1%) for individual keypress decoding (4-class classification), and 18.41% (± SD 7.4%) for individual sequence item decoding (5-class classification), irrespective of the input feature set or the type of decoder used. Thus, the decoding accuracy observed with the final model was substantially higher than these chance levels.
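
      A label-shuffling control of this kind can be sketched as follows. This is an illustration on synthetic data with a simple nearest-centroid classifier standing in for the decoders used in the study; the classifier, the data dimensions, the number of folds and the number of shuffles are all assumptions.

```python
import numpy as np

def nearest_centroid_cv_accuracy(X, y, n_folds=5, rng=None):
    """Cross-validated accuracy of a nearest-centroid classifier."""
    rng = np.random.default_rng() if rng is None else rng
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    classes = np.unique(y)
    correct = 0
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Class centroids are estimated from training samples only.
        centroids = np.stack([X[train_mask & (y == c)].mean(axis=0) for c in classes])
        dists = ((X[test_idx][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        pred = classes[dists.argmin(axis=1)]
        correct += int((pred == y[test_idx]).sum())
    return correct / len(y)

# Synthetic 4-class data (NOT the study's data).
rng = np.random.default_rng(2)
n_per_class, n_features = 100, 10
y = np.repeat(np.arange(4), n_per_class)
class_means = rng.normal(scale=2.0, size=(4, n_features))
X = class_means[y] + rng.normal(size=(y.size, n_features))

true_acc = nearest_centroid_cv_accuracy(X, y, rng=rng)

# Empirical chance level: repeat the full procedure with shuffled labels.
null_acc = np.array([
    nearest_centroid_cv_accuracy(X, rng.permutation(y), rng=rng)
    for _ in range(100)
])
print(f"true = {true_acc:.3f}, shuffled = {null_acc.mean():.3f} ± {null_acc.std(ddof=1):.3f}")
```

      Shuffling breaks the link between features and labels while leaving the rest of the pipeline intact, so the shuffled accuracies estimate the empirical chance level, which can differ from the theoretical 1/(number of classes).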

      Second, please note that the dimensionality of the voxel-space feature set is very high (i.e. – 15684). LDA attempts to map the input features onto a much lower-dimensional space (number of classes – 1; e.g. – 3 dimensions, for 4-class keypress decoding). Given the very high dimension of the voxel-space input features in this case, the resulting mapping exhibits reduced accuracy. Despite this general consideration, please refer to Figure 3—figure supplement 3, where we observe improvement in voxel-space decoder performance when utilizing alternative dimensionality reduction techniques.

      The decoders constructed in the present study assess the average spatial patterns across time (as defined by the windowing procedure) in the input feature space. We now provide additional details in the Methods of the revised manuscript pertaining to the parcellation procedure and how the sign ambiguity problem was addressed in our analysis.

      Weaknesses:

      A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, described below, question the neurobiological implications proposed by the authors and provide a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence, described below, casts doubt on this assumption.

      We thank the Reviewer for giving us the opportunity to address these issues in detail (see below).

      The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence and test the classifier on other sequences that require the same movements, but in different positions (Kornysheva et al., 2019). In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4). As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 4 - Supplement 2 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the key press, up to at least +/-100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress. 
Currently, the manuscript provides no evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context.

      Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. The authors seem to argue that their regression analysis in Figure 5 - Figure Supplement 3 speaks against any influence of tapping speed on "ordinal coding" (even though that argument is not made explicitly in the manuscript). However, Figure 5 - Figure Supplement 3 shows inter-individual differences in a between-subject analysis (across trials, as in panel A, or separately for each trial, as in panel B), and, therefore, says little about the within-subject dynamics of "ordinal coding" across the experiment. A regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject or at a group-level, after averaging across subjects) could address this issue. Given the highly similar dynamics of "ordinal coding" on the one hand (Figure 4C), and tapping speed on the other hand (Figure 1B), I would expect a strong relationship between the two in the suggested within-subject (or group-level) regression. Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. 
Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. To draw that conclusion, the physical context should remain stable (or any changes to the physical context should be controlled for).

      The issues raised by Reviewer #3 here are similar to two issues raised by Reviewer #2 above. We agree they must both be carefully considered in any evaluation of our findings.

      As both Reviewers pointed out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way so as to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3—supplement 3). We observed a reduction in overall accuracy rate to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and 79.44% when they performed several previously unpracticed sequences. This classification performance difference of 7.67% when tested on the Day 2 data could reflect the performance bias of the classifier for the trained sequence, possibly caused by mixed information from temporally close keypresses being incorporated into the feature weights.

      Along these same lines, both Reviewers also raise the possibility that an increase in “ordinal coding/contextualization” with learning could simply reflect an increase in this mixing effect caused by faster typing speeds as opposed to an actual change in the underlying neural representation. The basic idea is that as correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged (assuming this mixing of representations is used by the classifier to differentially tag each index finger press). If this were the case, it follows that such mixing effects reflecting the ordinal sequence structure would also be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A in the previously submitted manuscript do not show this trend in the distribution of misclassifications across the four fingers.

      Following this logic, it’s also possible that if the ordinal coding is largely driven by this mixing effect, the increased overlap between consecutive index finger keypresses during the 4-4 transition marking the end of one sequence and the beginning of the next one could actually mask contextualization-related changes to the underlying neural representations and make them harder to detect. In this case, a decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position might show decreased performance with learning as adjacent keypresses overlapped in time with each other to an increasing extent. However, Figure 4C in our previously submitted manuscript does not support this possibility, as the 2-class hybrid classifier displays improved classification performance over early practice trials despite greater temporal overlap.

      As noted in the above reply to Reviewer #2, we also conducted a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis affirmed that the possible alternative explanation put forward by the Reviewer is not supported by our data (Adjusted R<sup>2</sup> = 0.00431; F = 5.62). We now include this new negative control analysis result in the revised manuscript.

      Finally, the Reviewer hints that one way to address this issue would be to compare MEG responses before and after learning for sequences typed at a fixed speed. However, given that the speed-accuracy trade-off should improve with learning, a comparison between unlearned and learned skill states would dictate that the skill be evaluated at a very low fixed speed. Essentially, such a design presents the problem that the post-training test evaluates the representation in an unlearned behavioral state that is not representative of the acquired skill. Thus, this approach would miss most learning effects on a task in which speed is the main learning metric.

      A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses. Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023).

      The Reviewer argues that the last finger movement of a trial and the first movement of the next trial are performed in different circumstances and contexts. This is an important point and one we tend to agree with. For this task, the first sequence in a practice trial is pre-planned before the first keypress is performed. This occurs in a somewhat different context from the sequence iterations that follow, which involve temporally overlapping planning, execution and evaluation processes. The Reviewer is concerned about a difference in the temporal mixing effect issue raised above between the first and last keypresses performed in a trial. Please note that since neural representations of individual actions are competitively queued during the pre-planning period in a manner that reflects the ordinal structure of the learned sequence (Kornysheva et al., 2019), mixing effects are most likely also present for the first keypress in a trial.

      Separately, the Reviewer suggests that contextualization during early learning may reflect preplanning or online planning. This is an interesting proposal. Given the decoding time-window used in this investigation, we cannot dissect separate contributions of planning, memory and sensory feedback to contextualization. Taking advantage of the superior temporal resolution of MEG relative to fMRI tools, work under way in our lab is investigating decoding time-windows more appropriate to address each of these questions.

      Given these differences in the physical context and associated mental processes, it is not surprising that "offline differentiation", as defined here, is more pronounced than "online differentiation". For the latter, the authors compared movements that were better matched regarding the presence of consistent preceding and subsequent keypresses (online differentiation was defined as the mean difference between all first vs. last index finger movements during practice). It is unclear why the authors did not follow a similar definition for "online differentiation" as for "micro-online gains" (and, indeed, a definition that is more consistent with their definition of "offline differentiation"), i.e., the difference between the first index finger movement of the first correct sequence during practice, and the last index finger of the last correct sequence. While these two movements are, again, not matched for the presence of neighbouring keypresses (see the argument above), this mismatch would at least be the same across "offline differentiation" and "online differentiation", so they would be more comparable.

      This is the same point made earlier by Reviewer #2, and we agree with this assessment. As stated in the response to Reviewer #2 above, we have now carried out quantification of online contextualization using this approach and included it in the revised manuscript. We thank the Reviewer for this suggestion.

      A further complication in interpreting the results regarding "contextualization" stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen, irrespective of whether the keypress was correct or incorrect. As a result, incorrect (e.g., additional, or missing) keypresses could shift the phase of the visual feedback string (of asterisks) relative to the ordinal position of the current movement in the sequence (e.g., the fifth movement in the sequence could coincide with the presentation of any asterisk in the string, from the first to the fifth). Given that more incorrect keypresses are expected at the start of the experiment, compared to later stages, the consistency in visual feedback position, relative to the ordinal position of the movement in the sequence, increased across the experiment. A better differentiation between the first and the fifth movement with learning could, therefore, simply reflect better decoding of the more consistent visual feedback, based either on the feedback-induced brain response, or feedback-induced eye movements (the study did not include eye tracking). It is not clear why the authors introduced this complicated visual feedback in their task, besides consistency with their previous studies.

      We strongly agree with the Reviewer that eye movements related to task engagement are important to rule out as a potential driver of the decoding accuracy or contextualization effect. We address this issue above in response to a question raised by Reviewer #1 about the impact of movement-related artefacts on our findings.

      First, the assumption the Reviewer makes here about the distribution of errors in this task is incorrect. On average across subjects, 2.32% ± 1.48% (mean ± SD) of all keypresses performed were errors, which were evenly distributed across the four possible keypress responses. While errors increased progressively over practice trials, they did so in proportion to the increase in correct keypresses, so that the overall ratio of correct-to-incorrect keypresses remained stable over the training session. Thus, the Reviewer’s assumptions that there is a higher relative frequency of errors in early trials, and a resulting systematic trend in phase shifts between the visual display updates (i.e. – a change in asterisk position above the displayed sequence) and the keypress performed, are not substantiated by the data. On the contrary, the asterisk position on the display and the keypress being executed remained highly correlated over the entire training session. We now include a statement about the frequency and distribution of errors in the revised manuscript.

      Given this high correlation, we firmly agree with the Reviewer that the issue of eye movement related artefacts is still an important one to address. Fortunately, we did collect eye movement data during the MEG recordings so were able to investigate this. As detailed in the response to Reviewer #1 above, we found that gaze positions and eye-movement velocity time-locked to visual display updates (i.e. – a change in asterisk position above the displayed sequence) did not reflect the asterisk location above chance levels (Overall cross-validated accuracy = 0.21817; see Author response image 1). Furthermore, an inspection of the eye position data revealed that most participants on most trials displayed random walk gaze patterns around a center fixation point, indicating that participants did not attend to the asterisk position on the display. This is consistent with intrinsic generation of the action sequence, and congruent with the fact that the display does not provide explicit feedback related to performance. As pointed out above, a similar real-world example would be manually inputting a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user.

      The minimal participant engagement with the visual display in this explicit sequence learning motor task (which is highly generative in nature) contrasts markedly with behavior observed when reactive responses to stimulus cues are needed in the serial reaction time task (SRTT). This is a crucial difference that must be carefully considered when comparing findings across studies using the two sequence learning tasks.

      The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, it would be more informative to correlate trial-by-trial changes in each of the two variables. This would address the question of whether there is a trial-by-trial relation between the degree of "contextualization" and the amount of micro-offline gains - are performance changes (micro-offline gains) less pronounced across rest periods for which the change in "contextualization" is relatively low? Furthermore, is the relationship between micro-offline gains and "offline differentiation" significantly stronger than the relationship between micro-offline gains and "online differentiation"?

      In response to a similar issue raised above by Reviewer #2, we now include new analyses comparing correlation magnitudes between (1) “online differentiation” vs micro-online gains, (2) “online differentiation” vs micro-offline gains and (3) “offline differentiation” vs micro-offline gains (see Figure 5 – figure supplement 4, 5 and 6). These new analyses and results have been added to the revised manuscript. Once again, we thank both Reviewers for this suggestion.

      The authors follow the assumption that micro-offline gains reflect offline learning.

      We disagree with this statement. The original (Bonstrup et al., 2019) paper clearly states that micro-offline gains do not necessarily reflect offline learning in some cases and must be carefully interpreted based upon the behavioral context within which they are observed. Further, the paper lays out the conditions under which one can have confidence that micro-offline gains reflect offline learning. In fact, the excellent meta-analysis of (Pan & Rickard, 2015), which re-interprets the benefits of sleep in overnight skill consolidation from a “reactive inhibition” perspective, was a crucial resource in the experimental design of our initial study (Bonstrup et al., 2019), as well as in all our subsequent work. Pan & Rickard state:

      “Empirically, reactive inhibition refers to performance worsening that can accumulate during a period of continuous training (Hull, 1943). It tends to dissipate, at least in part, when brief breaks are inserted between blocks of training. If there are multiple performance-break cycles over a training session, as in the motor sequence literature, performance can exhibit a scalloped effect, worsening during each uninterrupted performance block but improving across blocks (Brawn et al., 2010; Rickard et al., 2008). Rickard, Cai, Rieth, Jones, and Ard (2008) and Brawn, Fenn, Nusbaum, and Margoliash (2010) demonstrated highly robust scalloped reactive inhibition effects using the commonly employed 30 s–30 s performance break cycle, as shown for Rickard et al.’s (2008) massed practice sleep group in Figure 2. The scalloped effect is evident for that group after the first few 30 s blocks of each session. The absence of the scalloped effect during the first few blocks of training in the massed group suggests that rapid learning during that period masks any reactive inhibition effect.”

      Crucially, Pan & Rickard make several concrete recommendations for reducing the impact of the reactive inhibition confound on offline learning studies. One of these recommendations was to reduce practice times to 10s (most prior sequence learning studies up until that point had employed 30s long practice trials). They state:

      “The traditional design involving 30 s-30 s performance break cycles should be abandoned given the evidence that it results in a reactive inhibition confound, and alternative designs with reduced performance duration per block used instead (Pan & Rickard, 2015). One promising possibility is to switch to 10 s performance durations for each performance-break cycle instead (Pan & Rickard, 2015). That design appears sufficient to eliminate at least the majority of the reactive inhibition effect (Brawn et al., 2010; Rickard et al., 2008).”

      We mindfully incorporated recommendations from (Pan & Rickard, 2015) into our own study designs including 1) utilizing 10s practice trials and 2) constraining our analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur), which are prior to the emergence of the “scalloped” performance dynamics that are strongly linked to reactive inhibition effects.

      However, there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.

      We strongly disagree with the Reviewer’s assertion that “there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.” The initial (Bonstrup et al., 2019) report was followed up by a large online crowd-sourcing study (Bonstrup et al., 2020). This second (and much larger) study provided several additional important findings supporting our interpretation of micro-offline gains in cases where the important behavioral conditions clarified above were met (see Author response image 4 below for further details on these conditions).

      Author response image 4.

      This figure shows that micro-offline gains observed in learning and non-learning contexts are attributed to different underlying causes. Micro-offline and online changes relative to overall trial-by-trial learning. This figure is based on data from (Bonstrup et al., 2019). During early learning, micro-offline gains (red bars) closely track trial-by-trial performance gains (green line with open circle markers), with minimal contribution from micro-online gains (blue bars). The stated conclusion in Bönstrup et al. (2019) is that micro-offline gains reflect rapid memory consolidation only during this Early Learning stage (see also (Bonstrup et al., 2020)). After early learning, at about practice trial 11, skill plateaus. This plateau skill period is characterized by a striking emergence of coupled (and relatively stable) micro-online drops and micro-offline increases. Bönstrup et al. (2019), as well as others in the literature (Brooks et al., 2024; Gupta & Rickard, 2022; Florencia Jacobacci et al., 2020), argue that micro-offline gains during the plateau period likely reflect recovery from inhibitory performance factors such as reactive inhibition or fatigue, and thus must be excluded from analyses relating micro-offline gains to skill learning. The Non-repeating groups in Experiments 3 and 4 from Das et al. (2024) suffer from a lack of consideration of these known confounds (end of Fig legend).
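
      The decomposition described in this legend can be illustrated with a minimal sketch. It assumes the definitions commonly used in this literature: a micro-online gain is the change in tapping speed from the start to the end of one practice trial, and a micro-offline gain is the change from the end of one trial to the start of the next. The function name and the speed values are hypothetical toy inputs, not data from any of the cited studies.

```python
def micro_gains(start_speed, end_speed):
    """Split trial-by-trial learning into micro-online and micro-offline parts.

    start_speed[t] and end_speed[t] are the tapping speeds (e.g. keypresses/s)
    at the beginning and end of practice trial t. Definitions are assumed,
    not taken from the manuscript.
    """
    # Change within each practice trial.
    online = [e - s for s, e in zip(start_speed, end_speed)]
    # Change across each rest period (end of trial t -> start of trial t+1).
    offline = [start_speed[t + 1] - end_speed[t]
               for t in range(len(start_speed) - 1)]
    return online, offline


# Toy speeds for four practice trials (illustrative values only).
start = [1.0, 1.8, 2.4, 2.9]
end = [1.2, 1.9, 2.5, 2.9]
online, offline = micro_gains(start, end)

# Learning between the start of trial 1 and the start of trial 4 equals the
# sum of the intervening online and offline changes.
total_learning = sum(online[:-1]) + sum(offline)
```

      In this toy case the offline terms account for most of the total change, which is the pattern the legend describes for early learning; once performance plateaus, the same arithmetic can produce offsetting online drops and offline increases.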

      Evidence documented in that paper (Bonstrup et al., 2020) showed that micro-offline gains during early skill learning were: 1) replicable and generalized to subjects learning the task in their daily living environment (n=389); 2) equivalent when significantly shortening practice period duration, thus confirming that they are not a result of recovery from performance fatigue (n=118); 3) reduced (along with learning rates) by retroactive interference applied immediately after each practice period relative to interference applied after passage of time (n=373), indicating stabilization of the motor memory at a microscale of several seconds consistent with rapid consolidation; and 4) not modified by random termination of the practice periods, ruling out a contribution of predictive motor slowing (n=71) (Bonstrup et al., 2020). Altogether, our findings were strongly consistent with the interpretation that micro-offline gains reflect memory consolidation supporting early skill learning. This is precisely the portion of the learning curve that Pan & Rickard (2015) refer to when they state “…rapid learning during that period masks any reactive inhibition effect”.

      This interpretation is further supported by brain imaging evidence linking known memory-related networks and consolidation mechanisms to micro-offline gains. First, we reported that the density of fast hippocampo-neocortical skill memory replay events increases approximately three-fold during early learning inter-practice rest periods with the density explaining differences in the magnitude of micro-offline gains across subjects (Buch et al., 2021). Second, Jacobacci et al. (2020) independently reproduced our original behavioral findings and reported BOLD fMRI changes in the hippocampus and precuneus (regions also identified in our MEG study (Buch et al., 2021)) linked to micro-offline gains during early skill learning. These functional changes were coupled with rapid alterations in brain microstructure in the order of minutes, suggesting that the same network that operates during rest periods of early learning undergoes structural plasticity over several minutes following practice (Deleglise et al., 2023). Crucial to this point, Chen et al. (2024) and Sjøgård et al (2024) provided direct evidence from intracranial EEG in humans linking sharp-wave ripple density during rest periods (which are known markers for neural replay (Buzsaki, 2015)) in the human hippocampus (80-120 Hz) to micro-offline gains during early skill learning.

      Thus, there is now substantial converging evidence in humans across different indirect noninvasive and direct invasive recording techniques linking hippocampal activity, neural replay dynamics and offline performance gains in skill learning.

      On the contrary, recent evidence questions this interpretation (Gupta & Rickard, npj Sci Learn 2022; Gupta & Rickard, Sci Rep 2024; Das et al., bioRxiv 2024). Instead, there is evidence that micro-offline gains are transient performance benefits that emerge when participants train with breaks, compared to participants who train without breaks, however, these benefits vanish within seconds after training if both groups of participants perform under comparable conditions (Das et al., bioRxiv 2024).

      The recent work of (Gupta & Rickard, 2022, 2024) does not present any data that directly opposes our finding that early skill learning (Bonstrup et al., 2019) is expressed as micro-offline gains during rest breaks. These studies are an extension of the Rickard et al (2008) paper that employed a massed (30s practice followed by 30s breaks) vs spaced (10s practice followed by 10s breaks) experimental design to assess if recovery from reactive inhibition effects could account for performance gains measured after several minutes or hours. Gupta & Rickard (2022) added two additional groups (30s practice/10s break and 10s practice/10s break as used in the work from our group). The primary aim of the study was to assess whether it was more likely that changes in performance when retested 5 minutes after skill training (consisting of 12 practice trials for the massed groups and 36 practice trials for the spaced groups) had ended reflected memory consolidation effects or recovery from reactive inhibition effects. The Gupta & Rickard (2024) follow-up paper employed a similar design with the primary difference being that participants performed a fixed number of sequences on each trial as opposed to trials lasting a fixed duration. This was done to facilitate the fitting of a quantitative statistical model to the data.

      To reiterate, neither study included any analysis of micro-online or micro-offline gains, and neither included any comparison focused on skill gains during early learning trials (only at retest 5 min later). Instead, Gupta & Rickard (2022) reported evidence for reactive inhibition effects for all groups over much longer training periods than early learning. In fact, we reported the same findings for trials following the early learning period in our original 2019 paper (Bonstrup et al., 2019) (Author response image 4). Please note that we also reported that cumulative micro-offline gains over early learning did not correlate with overnight offline consolidation measured 24 hours later (Bonstrup et al., 2019) (see the Results section and further elaboration in the Discussion). We interpreted these findings as indicating that the mechanisms underlying offline gains over the micro-scale of seconds during early skill learning very likely differ from those operating over minutes or hours.

      In the recent preprint from (Das et al., 2024), the authors make the strong claim that “micro-offline gains during early learning do not reflect offline learning” which is not supported by their own data. The authors hypothesize that if “micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”. The study utilizes a spaced vs. massed practice groups between-subjects design inspired by the reactive inhibition work from Rickard and others to test this hypothesis.

      Crucially, their design incorporates only a small fraction of the training used in other investigations to evaluate early skill learning (Bonstrup et al., 2020; Bonstrup et al., 2019; Brooks et al., 2024; Buch et al., 2021; Deleglise et al., 2023; F. Jacobacci et al., 2020; Mylonas et al., 2024). A direct comparison between the practice schedule designs for the spaced and massed groups in Das et al., and the training schedule all participants experienced in the original Bönstrup et al. (2019) paper highlights this issue as well as several others (Author response image 5):

      Author response image 5.

      This figure shows (A) Comparison of Das et al. Spaced & Massed group training session designs, and the training session design from the original (Bonstrup et al., 2019) paper. Similar to the approach taken by Das et al., all practice is visualized as 10-second practice trials with a variable number (either 0, 1 or 30) of 10-second-long inter-practice rest intervals to allow for direct comparisons between designs. The two key takeaways from this comparison are that (1) the intervention differences (i.e. – practice schedules) between the Massed and Spaced groups from the Das et al. report are extremely small (less than 12% of the overall session schedule) (gaps in the red shaded area) and (2) the overall amount of practice is much less than in the design from the original Bönstrup report (Bonstrup et al., 2019) (which has been utilized in several subsequent studies). (B) Group-level learning curve data from Bönstrup et al. (2019) is used to estimate the performance range accounted for by the equivalent periods covering Test 1, Training 1 and Test 2 from Das et al. (2024). Note that the intervention in the Das et al. study is limited to a period covering less than 50% of the overall learning range (end of figure legend).

      Participants in the original study (Bonstrup et al., 2019) experienced 157.14% more practice time and 46.97% less inter-practice rest time than the Spaced group in the Das et al. study (Author response image 5). Thus, the overall amount of practice and rest differs substantially between studies, with much more limited training occurring for participants in Das et al.

      In addition, the training interventions (i.e. – the practice schedule differences between the Spaced and Massed groups) were designed in a manner that minimized any chance of effectively testing their hypothesis. First, the interventions were applied over an extremely short period relative to the length of the total training session (5% and 12% of the total training session for Massed and Spaced groups, respectively; see gaps in the red shaded area in Author response image 5). Second, the intervention was applied during a period in which only half of the known total learning occurs. Specifically, we know from Bönstrup et al. (2019) that only 46.57% of the total performance gains occur in the practice interval covered by the Das et al. Training 1 intervention. Thus, early skill learning as evaluated by multiple groups (Bonstrup et al., 2020; Bonstrup et al., 2019; Brooks et al., 2024; Buch et al., 2021; Deleglise et al., 2023; F. Jacobacci et al., 2020; Mylonas et al., 2024) is truncated to about half in the Das et al. experiment.

      Furthermore, a substantial amount of learning takes place during Das et al.’s Test 1 and Test 2 periods (32.49% of total gains combined). The fact that substantial learning is known to occur over both the Test 1 (18.06%) and Test 2 (14.43%) intervals presents a fundamental problem described by Pan and Rickard (Pan & Rickard, 2015). They reported that averaging over intervals where substantial performance gains occur (i.e. – performance is not stable) injects crucial artefacts into analyses of skill learning:

      “A large amount of averaging has the advantage of yielding more precise estimates of each subject’s pretest and posttest scores and hence more statistical power to detect a performance gain. However, calculation of gain scores using that strategy runs the risk that learning that occurs during the pretest and (or) posttest periods (i.e., online learning) is incorporated into the gain score (Rickard et al., 2008; Robertson et al., 2004).”

      The above statement indicates that the Test 1 and Test 2 performance scores from Das et al. (2024) are substantially contaminated by the learning rate within these intervals. This is particularly problematic if the intervention design results in different Test 2 learning rates between the two groups. This is, in fact, apparent in their data (Figure 1C,E of the Das et al., 2024 preprint), as the Test 2 learning rate for the Spaced group is negative (indicating a unique interference effect observable only for this group). Specifically, the Massed group continues to show an increase in performance during Test 2 and 4 relative to the last 10 seconds of practice during Training 1 and 2, respectively, while the Spaced group displays a marked decrease. This post-training performance decrease for the Spaced group is in stark contrast to the monotonic performance increases observed for both groups at all other time-points. One possible cause could be related to the structure of the Test intervals, which include 20 seconds of uninterrupted practice. For the Spaced group, this is effectively a switch to a Massed practice environment (i.e., two 10-second-long practice trials merged into one long trial), which interferes with the greater Training 1 interval gains observed for the Spaced group. Interestingly, when statistical comparisons between the groups are made at the time-points when the intervention is present (Figure 1E), the stated hypothesis, “If micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”, is confirmed.

      In summary, the experimental design and analyses used by Das et al. do not contradict the view that early skill learning is expressed as micro-offline gains during rest breaks. The data presented by Gupta and Rickard (2022, 2024) and Das et al. (2024) are in many ways more confirmatory of the constraints employed by our group and others with respect to experimental design, analysis and interpretation of study findings, rather than contradictory. Still, they do highlight a limitation of the current micro-online/offline framework, which was originally only intended to be applied to early skill learning over spaced practice schedules when reactive inhibition effects are minimized (Bonstrup et al., 2019; Pan & Rickard, 2015). Extrapolation of this current framework to post-plateau performance periods, longer timespans, or non-learning situations (e.g. – the Non-repeating groups from Das et al. (2024)), when reactive inhibition plays a more substantive role, is not warranted. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g. - memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I found Figure 2B too small to be useful, as the actual elements of the cells are very hard to read.

      We have removed the grid colormap panel (top-right) from Figure 2B. All of this colormap data is actually a subset of data presented in Figure 2 – figure supplement 1, so can still be found there.

      Reviewer #2 (Recommendations for the authors):

      (1) Related to the first point in my concerns, I would suggest the authors compare decoding accuracy between correct presses followed by correct vs. incorrect presses. This would clarify if the decoder is actually taking the MEG signal for subsequent press into account. I would also suggest the authors use pre-movement MEG features and post-movement features with shorter windows and compare each result with the results for the original post-movement MEG feature with a longer window.

      The present study does not contain enough errors to perform the analysis proposed by the Reviewer. As noted above, we did re-examine our data and now report a new control regression analysis, which indicates that the proximity between keypresses does not explain contextualization effects.

      (2) I was several times confused by the author's use of "neural representation of an action" or "sequence action representations" in understanding whether these terms refer to representation on the level of whole-brain, region (as defined by the specific parcellation used), or voxels. In fact, what is submitted to the decoder is some complicated whole-brain MEG feature (i.e., the "neural representation"), which is a hybrid of voxel and parcel features that is further dimension-reduced and not immediately interpretable. Clarifying this point early in the text and possibly using some more sensible terms, such as adding "brain-wise" before the "sequence action representation", would be the most helpful for the readers.

      We have now clarified this terminology in the revised manuscript.

      (3) Although comparing many different ways in feature selection/reduction, time window selection, and decoder types is undoubtedly a meticulous work, the current version of the manuscript seems still lacking some explanation about the details of these methodological choices, like which decoding method was actually used to report the accuracy, whether or not different decoding methods were chosen for individual participants' data, how training data was selected (is it all of the correct presses in Day 1 data?), whether the frequency power or signal amplitude was used, and so on. I would highly appreciate these additional details in the Methods section.

      The reported accuracies were based on a linear discriminant analysis (LDA) classifier. A comparison of different decoders (Figure 3 – figure supplement 4) shows that LDA was the optimal choice.

      Whether or not different decoding methods were chosen for individual participants' data

      We used the same decoder (LDA) for all participants when reporting the final accuracy.

      How training data was selected (is it all of the correct presses in Day 1 data?),

      Decoder training was conducted as a randomized split of the data (all correct keypresses of Day 1) into training (90%) and test (10%) samples for 8 iterations.

      Whether the frequency power or signal amplitude was used

      Signal amplitude was used for feature calculation.
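
      The repeated random hold-out scheme described above (90% training, 10% test, 8 iterations) can be sketched as follows. This is only an illustration of the splitting step under stated assumptions: the function name, the seed and the sample count are hypothetical, and the responses do not say whether the authors' splits were stratified by keypress class.

```python
import random


def random_splits(n_samples, test_fraction=0.1, n_iterations=8, seed=0):
    """Yield (train_indices, test_indices) for repeated random hold-out splits."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_iterations):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # The first n_test shuffled indices form the test set, the rest train.
        yield sorted(indices[n_test:]), sorted(indices[:n_test])


# Hypothetical count of correct Day 1 keypresses for one participant.
splits = list(random_splits(n_samples=200))
for train_idx, test_idx in splits:
    # A decoder (LDA in the manuscript) would be fit on train_idx and scored
    # on test_idx here; accuracy would then be averaged over the iterations.
    assert not set(train_idx) & set(test_idx)
```

      Because each iteration draws a fresh random test set, a given keypress can appear in the test set of more than one iteration, unlike in k-fold cross-validation.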

      (4) In terms of the Methods, please consider adding some references about the 'F1 score', the 'feature importance score,' and the 'MRMR-based feature ranking,' as the main readers of the current paper would not be from the machine learning community. Also, why did the LDA dimensionality reduction reduce accuracy specifically for the voxel feature?

      We have now added the following statements to the Methods section that provide more detailed descriptions and references for these metrics:

      “The F1 score, defined as the harmonic mean of the precision (percentage of true predictions that are actually true positive) and recall (percentage of true positives that were correctly predicted as true) scores, was used as a comprehensive metric for all one-versus-all keypress state decoders to assess class-wise performance that accounts for both false-positive and false-negative prediction tendencies [REF]. A weighted mean F1 score was then computed across all classes to assess the overall prediction performance of the multi-class model.”

      and

      “Feature Importance Scores

      The relative contribution of source-space voxels and parcels to decoding performance (i.e. – feature importance score) was calculated using minimum redundant maximum relevance (MRMR) and highlighted in topography plots. MRMR, an approach that combines both relevance and redundancy metrics, ranked individual features based upon their significance to the target variable (i.e. – keypress state identity) prediction accuracy and their non-redundancy with other features.”
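
      The F1 definition quoted above can be made concrete with a short worked sketch. The labels below are toy values, not study data, and the weighting by class support (the number of true instances per class) is an assumption, since the quoted text does not state which weights were used.

```python
def f1_per_class(y_true, y_pred, labels):
    """Return one-versus-all F1 scores and class supports."""
    scores, support = {}, {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
        support[label] = tp + fn
    return scores, support


def weighted_f1(scores, support):
    """Average class-wise F1 scores, weighted by class support (assumed)."""
    total = sum(support.values())
    return sum(scores[label] * support[label] for label in scores) / total


# Toy 4-class example in which class 1 occurs twice as often as the others.
y_true = [1, 1, 1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [1, 1, 1, 2, 2, 2, 3, 1, 4, 4]
scores, support = f1_per_class(y_true, y_pred, labels=[1, 2, 3, 4])
overall = weighted_f1(scores, support)
```

      In this toy case class 1 has precision and recall of 0.75 each, so its F1 score is 0.75, and the support-weighted mean across the four classes is about 0.79.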

      As stated in the Reviewer responses above, the dimensionality of the voxel-space feature set is very high (i.e. – 15684). LDA attempts to map the input features onto a much smaller dimensional space (number of classes-1; e.g. – 3 dimensions for 4-class keypress decoding). It is likely that the reduction in accuracy observed only for the voxel-space feature was due to the loss of relevant information during the mapping process that resulted in reduced accuracy. This reduction in accuracy for voxel-space decoding was specific to LDA. Figure 3—figure supplement 3 shows that voxel-space decoder performance actually improved when utilizing alternative dimensionality reduction techniques.
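
      The limit of (number of classes - 1) dimensions follows from the fact that LDA's discriminant axes are derived from the class means centred on the grand mean, and these centred means are linearly dependent. The sketch below checks this numerically on synthetic data; it uses 50 features rather than 15684 to keep it fast, and every name and value in it is illustrative.

```python
import random


def matrix_rank(rows, tol=1e-9):
    """Rank of a small matrix via Gaussian elimination with partial pivoting."""
    m = [list(row) for row in rows]
    rank = 0
    for col in range(len(m[0])):
        if rank == len(m):
            break
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


rng = random.Random(0)
n_classes, n_features, n_per_class = 4, 50, 20

# Synthetic data: one Gaussian cloud per class, shifted by the class index.
class_means = []
for c in range(n_classes):
    samples = [[rng.gauss(c, 1.0) for _ in range(n_features)]
               for _ in range(n_per_class)]
    class_means.append([sum(col) / n_per_class for col in zip(*samples)])

# With equal class sizes the grand mean is the mean of the class means.
grand_mean = [sum(col) / n_classes for col in zip(*class_means)]
centered = [[value - g for value, g in zip(mean, grand_mean)]
            for mean in class_means]

# The centred class means sum to zero, so they span at most 3 dimensions.
n_discriminant_dims = matrix_rank(centered)
```

      However many input features there are, the four centred class means span only three dimensions, so any information that does not lie along those axes is discarded by the projection.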

      (5) Paragraph 9, lines #139-142: "Notably, decoding associated with index finger keypresses (executed at two different ordinal positions in the sequence) exhibited the highest number of misclassifications of all digits (N = 141 or 47.5% of all decoding errors; Figure 3C), raising the hypothesis that the same action could be differentially represented when executed at different learning state or sequence context locations."

      This does not seem to be a fair comparison, as the index finger appears twice as often as the other fingers do in the sequence. To claim this, proper statistical analysis needs to be done taking this difference into account.

      We thank the Reviewer for bringing this issue to our attention. We have now corrected this comparison to evaluate relative false negative and false positive rates between individual keypress state decoders, and have revised this statement in the manuscript as follows:

      “Notably, decoding of index finger keypresses (executed at two different ordinal positions in the sequence) exhibited the highest false negative (0.116 per keypress) and false positive (0.043 per keypress) misclassification rates compared with all other digits (false negative rate range = [0.067 0.114]; false positive rate range = [0.020 0.037]; Figure 3C), raising the hypothesis that the same action could be differentially represented when executed within different contexts (i.e. - different learning states or sequence locations).”
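      A minimal sketch of how per-class misclassification rates can be normalized so that a class occurring twice as often is compared fairly. It assumes the false negative rate is normalized by the number of true keypresses of that class and the false positive rate by the number of keypresses of all other classes; the exact normalization used in the manuscript is not stated here, and the confusion counts are hypothetical.

```python
def per_class_error_rates(confusion):
    """Compute per-class false negative and false positive rates.

    confusion[i][j] is the number of keypresses of true class i that
    were predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    rates = {}
    for k in range(n):
        n_true = sum(confusion[k])
        tp = confusion[k][k]
        fn = n_true - tp  # class k keypresses predicted as something else
        fp = sum(confusion[i][k] for i in range(n)) - tp  # others predicted as k
        n_other = total - n_true
        rates[k] = {
            "fnr": fn / n_true if n_true else 0.0,
            "fpr": fp / n_other if n_other else 0.0,
        }
    return rates


# Hypothetical counts: class 0 occurs twice as often as the other classes.
confusion = [
    [180, 8, 6, 6],
    [5, 92, 2, 1],
    [4, 2, 93, 1],
    [3, 1, 2, 94],
]
rates = per_class_error_rates(confusion)
```

      Raw error counts would favor the rarer classes simply because they have fewer opportunities for error, whereas rates of this kind are comparable across classes with different numbers of keypresses.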

      (6) Finally, the authors could consider acknowledging in the Discussion that the contribution of micro-offline learning to genuine skill learning is still under debate (e.g., Gupta and Rickard, 2023; 2024; Das et al., bioRxiv, 2024).

      We have added a paragraph in the Discussion that addresses this point.

      Reviewer #3 (Recommendations for the authors):

      In addition to the additional analyses suggested in the public review, I have the following suggestions/questions:

      (1) Given that the authors introduce a new decoding approach, it would be very helpful for readers to see a distribution of window sizes and window onsets eventually used across individuals, at least for the optimized decoder.

      We have now included a new supplemental figure (Figure 4 – figure supplement 2) that provides this information.

      (2) Please explain in detail how you arrived at the (interpolated?) group-level plot shown in Figure 1B, starting from the discrete single-trial keypress transition times. Also, please specify what the shading shows.

      Instantaneous correct sequence speed (skill measure) was quantified as the inverse of time (in seconds) required to complete a single iteration of a correctly generated full 5-item sequence. Individual keypress responses were labeled as members of correct sequences if they occurred within a 5-item response pattern matching any possible circular shifts of the 5-item sequence displayed on the monitor (41324). This approach allowed us to quantify a measure of skill within each practice trial at the resolution of individual keypresses. The dark line indicates the group mean performance dynamics for each trial. The shaded region indicates the 95% confidence limit of the mean (see Methods).
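      The labeling of correct sequences and the speed measure described above can be sketched as follows. The sketch assumes that the duration of one sequence iteration is measured from the first to the last keypress of a matching 5-item window; that choice, the function names, and the example keypress times are assumptions made for illustration.

```python
def circular_shifts(seq):
    """All rotations of the target sequence, e.g. 41324 -> 13244, 32441, ..."""
    return {seq[i:] + seq[:i] for i in range(len(seq))}


def correct_sequence_speed(keys, times, target="41324"):
    """Return (time, speed) for each keypress that completes a correct window.

    keys  : list of single-character key labels in press order
    times : keypress times in seconds, same length as keys
    speed : correct sequences per second for that 5-item window
    """
    shifts = circular_shifts(target)
    n = len(target)
    out = []
    for end in range(n - 1, len(keys)):
        window = "".join(keys[end - n + 1 : end + 1])
        if window in shifts:
            duration = times[end] - times[end - n + 1]
            out.append((times[end], 1.0 / duration))
    return out


# Hypothetical trial: two error-free repetitions, one keypress every 300 ms.
keys = list("4132441324")
times = [0.3 * i for i in range(len(keys))]
speeds = correct_sequence_speed(keys, times)

# A window containing an error matches no rotation and yields no estimate.
no_match = correct_sequence_speed(list("41334"), [0.0, 0.3, 0.6, 0.9, 1.2])
```

      Sliding the window one keypress at a time is what gives the measure its keypress-level resolution within a trial; group-level curves would then be obtained by resampling and averaging these per-participant estimates.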

      (3) Similarly, please explain how you arrived at the group-level plot shown in Figure 1C. What are the different colored lines (rows) within each trial? How exactly did the authors reach the conclusion that KTT variability stabilizes by trial 6?

      Figure 1C provides additional information to the correct sequence speed measure above, as it also tracks individual transition speed composition over learning. Figure 1C, thus, represents both changes in overall correct sequence speed dynamics (indicated by the overall narrowing of the horizontal speed lines moving from top to bottom) and the underlying composition of the individual transition patterns within and across trials. The coloring of the lines is a shading convention used to discriminate between different keypress transitions. These curves were sampled with 1 ms resolution, as in Figure 1B. Addressing the underlying keypress transition patterns requires within-subject normalization before averaging across subjects. The distribution of KTTs was normalized to the median correct sequence time for each participant and centered on the mid-point for each full sequence iteration during early learning.

      (4) Maybe I missed it, but it was not clear to me which of the tested classifiers was eventually used. Or was that individualized as well? More generally, a comparison of the different classifiers would be helpful, similar to the comparison of dimension reduction techniques.

      We have now included a new supplemental figure that provides this information.

      (5) Please add df and effect sizes to all statistics.

      Done.

      (6) Please explain in more detail your power calculation.

      The study was powered to determine the minimum sample size needed to detect a significant change in skill performance following training using a one-sample t-test (two-sided; alpha = 0.05; 95% statistical power; Cohen’s d effect size = 0.8115 calculated from previously acquired data in our lab). The calculated minimum sample size was 22. The included study sample size (n = 27) exceeded this minimum.

      This information is now included in the revised manuscript.
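      As a rough cross-check of the power calculation above, the following sketch uses the standard normal approximation for a two-sided one-sample test. This is not the procedure the authors report using: the approximation ignores the heavier tails of the t distribution and therefore gives a slightly smaller sample size than an exact noncentral-t calculation would be expected to give. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def approx_n_one_sample(d, alpha=0.05, power=0.95):
    """Normal-approximation sample size for a two-sided one-sample test.

    n ~= ((z_{1 - alpha/2} + z_{power}) / d) ** 2, rounded up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(((z_alpha + z_power) / d) ** 2)


n_approx = approx_n_one_sample(0.8115)
```

      The approximation gives a lower bound of 20 participants, which sits just below the reported minimum of 22 and well below the enrolled sample of 27.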

      (7) The cut-off for the high-pass filter is unusually high and seems risky in terms of potential signal distortions (de Cheveigne, Neuron 2019). Why did the authors choose such a high cut-off?

      The 1 Hz high-pass cut-off frequency for the 1-150 Hz band-pass filter applied to the continuous raw MEG data during preprocessing has been used in multiple previous MEG publications (Barratt et al., 2018; Brookes et al., 2012; Higgins et al., 2021; Seedat et al., 2020; Vidaurre et al., 2018).

      (8) "Furthermore, the magnitude of offline contextualization predicted skill gains while online contextualization did not", lines 336/337 - where is that analysis?

      Additional details pertaining to this analysis are now provided in the Results section (Figure 5 – figure supplement 4).

      (9) How were feature importance scores computed?

      We have now added a new subheading in the Methods section with a more detailed description of how feature importance scores were computed.

      (10) Please add x and y ticks plus tick labels to Figure 5 – figure supplement 3, panel A.

      Done.

      (11) Line 369, what does "comparable" mean in this context?

      The sentence in the “Study Participants” part of the Methods section referred to here has now been revised for clarity.

      (12) In lines 496/497, please specify what t=0 means (KeyDown event, I guess?).

      Yes, the KeyDown event occurs at t = 0. This has now been clarified in the revised manuscript.

      (13) Please specify consistent boundaries between alpha- and beta-bands (they are currently not consistent in the Results vs. Methods (14/15 Hz or 15/16 Hz)).

      We thank the Reviewer for alerting us to this discrepancy caused by a typographic error in the Methods. We have now corrected this so that the alpha (8-14 Hz) and beta-band (15-24 Hz) frequency limits are described consistently throughout the revised manuscript.

      References

      Albouy, G., Fogel, S., King, B. R., Laventure, S., Benali, H., Karni, A., Carrier, J., Robertson, E. M., & Doyon, J. (2015). Maintaining vs. enhancing motor sequence memories: respective roles of striatal and hippocampal systems. Neuroimage, 108, 423-434. https://doi.org/10.1016/j.neuroimage.2014.12.049

      Albouy, G., King, B. R., Maquet, P., & Doyon, J. (2013). Hippocampus and striatum: dynamics and interaction during acquisition and sleep-related motor sequence memory consolidation. Hippocampus, 23(11), 985-1004. https://doi.org/10.1002/hipo.22183

      Albouy, G., Sterpenich, V., Vandewalle, G., Darsaud, A., Gais, S., Rauchs, G., Desseilles, M., Boly, M., Dang-Vu, T., Balteau, E., Degueldre, C., Phillips, C., Luxen, A., & Maquet, P. (2012). Neural correlates of performance variability during motor sequence acquisition. NeuroImage, 60(1), 324-331. https://doi.org/10.1016/j.neuroimage.2011.12.049

      Andersen, R. A., & Buneo, C. A. (2002). Intentional maps in posterior parietal cortex. Annu Rev Neurosci, 25, 189-220. https://doi.org/10.1146/annurev.neuro.25.112701.142922

      Ashe, J., Lungu, O. V., Basford, A. T., & Lu, X. (2006). Cortical control of motor sequences. Curr Opin Neurobiol, 16(2), 213-221. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16563734

      Bansal, A. K., Vargas-Irwin, C. E., Truccolo, W., & Donoghue, J. P. (2011). Relationships among low-frequency local field potentials, spiking activity, and three-dimensional reach and grasp kinematics in primary motor and ventral premotor cortices. J Neurophysiol, 105(4), 1603-1619. https://doi.org/10.1152/jn.00532.2010

      Barratt, E. L., Francis, S. T., Morris, P. G., & Brookes, M. J. (2018). Mapping the topological organisation of beta oscillations in motor cortex using MEG. NeuroImage, 181, 831-844. https://doi.org/10.1016/j.neuroimage.2018.06.041

      Bassett, D. S., Wymbs, N. F., Porter, M. A., Mucha, P. J., Carlson, J. M., & Grafton, S. T. (2011). Dynamic reconfiguration of human brain networks during learning. Proc Natl Acad Sci U S A, 108(18), 7641-7646. https://doi.org/10.1073/pnas.1018985108

      Battaglia-Mayer, A., & Caminiti, R. (2019). Corticocortical Systems Underlying High-Order Motor Control. J Neurosci, 39(23), 4404-4421. https://doi.org/10.1523/JNEUROSCI.2094-18.2019

      Berlot, E., Popp, N. J., & Diedrichsen, J. (2020). A critical re-evaluation of fMRI signatures of motor sequence learning. Elife, 9. https://doi.org/10.7554/eLife.55241

      Bonstrup, M., Iturrate, I., Hebart, M. N., Censor, N., & Cohen, L. G. (2020). Mechanisms of offline motor learning at a microscale of seconds in large-scale crowdsourced data. NPJ Sci Learn, 5, 7. https://doi.org/10.1038/s41539-020-0066-9

      Bonstrup, M., Iturrate, I., Thompson, R., Cruciani, G., Censor, N., & Cohen, L. G. (2019). A Rapid Form of Offline Consolidation in Skill Learning. Curr Biol, 29(8), 1346-1351 e1344. https://doi.org/10.1016/j.cub.2019.02.049

      Brawn, T. P., Fenn, K. M., Nusbaum, H. C., & Margoliash, D. (2010). Consolidating the effects of waking and sleep on motor-sequence learning. J Neurosci, 30(42), 13977-13982. https://doi.org/10.1523/JNEUROSCI.3295-10.2010

      Brookes, M. J., Woolrich, M. W., & Barnes, G. R. (2012). Measuring functional connectivity in MEG: a multivariate approach insensitive to linear source leakage. NeuroImage, 63(2), 910-920. https://doi.org/10.1016/j.neuroimage.2012.03.048

      Brooks, E., Wallis, S., Hendrikse, J., & Coxon, J. (2024). Micro-consolidation occurs when learning an implicit motor sequence, but is not influenced by HIIT exercise. NPJ Sci Learn, 9(1), 23. https://doi.org/10.1038/s41539-024-00238-6

      Buch, E. R., Claudino, L., Quentin, R., Bonstrup, M., & Cohen, L. G. (2021). Consolidation of human skill linked to waking hippocampo-neocortical replay. Cell Rep, 35(10), 109193. https://doi.org/10.1016/j.celrep.2021.109193

      Buneo, C. A., & Andersen, R. A. (2006). The posterior parietal cortex: sensorimotor interface for the planning and online control of visually guided movements. Neuropsychologia, 44(13), 2594-2606. https://doi.org/10.1016/j.neuropsychologia.2005.10.011

      Buzsaki, G. (2015). Hippocampal sharp wave-ripple: A cognitive biomarker for episodic memory and planning. Hippocampus, 25(10), 1073-1188. https://doi.org/10.1002/hipo.22488

      Chen, P.-C., Stritzelberger, J., Walther, K., Hamer, H., & Staresina, B. P. (2024). Hippocampal ripples during offline periods predict human motor sequence learning. bioRxiv, 2024.10.06.614680. https://doi.org/10.1101/2024.10.06.614680

      Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Foster, J. D., Nuyujukian, P., Ryu, S. I., & Shenoy, K. V. (2012). Neural population dynamics during reaching. Nature, 487(7405), 51-56. https://doi.org/10.1038/nature11129

      Classen, J., Liepert, J., Wise, S. P., Hallett, M., & Cohen, L. G. (1998). Rapid plasticity of human cortical movement representation induced by practice. J Neurophysiol, 79(2), 1117-1123. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=9463469

      Colclough, G. L., Brookes, M. J., Smith, S. M., & Woolrich, M. W. (2015). A symmetric multivariate leakage correction for MEG connectomes. NeuroImage, 117, 439-448. https://doi.org/10.1016/j.neuroimage.2015.03.071

      Colclough, G. L., Woolrich, M. W., Tewarie, P. K., Brookes, M. J., Quinn, A. J., & Smith, S. M. (2016). How reliable are MEG resting-state connectivity metrics? NeuroImage, 138, 284-293. https://doi.org/10.1016/j.neuroimage.2016.05.070

      Das, A., Karagiorgis, A., Diedrichsen, J., Stenner, M.-P., & Azanon, E. (2024). “Micro-offline gains” convey no benefit for motor skill learning. bioRxiv, 2024.07.11.602795. https://doi.org/10.1101/2024.07.11.602795

      Deleglise, A., Donnelly-Kehoe, P. A., Yeffal, A., Jacobacci, F., Jovicich, J., Amaro, E., Jr., Armony, J. L., Doyon, J., & Della-Maggiore, V. (2023). Human motor sequence learning drives transient changes in network topology and hippocampal connectivity early during memory consolidation. Cereb Cortex, 33(10), 6120-6131. https://doi.org/10.1093/cercor/bhac489

      Doyon, J., Bellec, P., Amsel, R., Penhune, V., Monchi, O., Carrier, J., Lehéricy, S., & Benali, H. (2009). Contributions of the basal ganglia and functionally related brain structures to motor learning. [Review]. Behavioural brain research, 199(1), 61-75. https://doi.org/10.1016/j.bbr.2008.11.012

      Doyon, J., Song, A. W., Karni, A., Lalonde, F., Adams, M. M., & Ungerleider, L. G. (2002). Experience-dependent changes in cerebellar contributions to motor sequence learning. Proc Natl Acad Sci U S A, 99(2), 1017-1022. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11805340

      Euston, D. R., Gruber, A. J., & McNaughton, B. L. (2012). The role of medial prefrontal cortex in memory and decision making. Neuron, 76(6), 1057-1070. https://doi.org/10.1016/j.neuron.2012.12.002

      Euston, D. R., Tatsuno, M., & McNaughton, B. L. (2007). Fast-forward playback of recent memory sequences in prefrontal cortex during sleep. Science, 318(5853), 1147-1150. https://doi.org/10.1126/science.1148979

      Flint, R. D., Ethier, C., Oby, E. R., Miller, L. E., & Slutzky, M. W. (2012). Local field potentials allow accurate decoding of muscle activity. J Neurophysiol, 108(1), 18-24. https://doi.org/10.1152/jn.00832.2011

      Frankland, P. W., & Bontempi, B. (2005). The organization of recent and remote memories. Nat Rev Neurosci, 6(2), 119-130. https://doi.org/10.1038/nrn1607

      Gais, S., Albouy, G., Boly, M., Dang-Vu, T. T., Darsaud, A., Desseilles, M., Rauchs, G., Schabus, M., Sterpenich, V., Vandewalle, G., Maquet, P., & Peigneux, P. (2007). Sleep transforms the cerebral trace of declarative memories. Proc Natl Acad Sci U S A, 104(47), 18778-18783. https://doi.org/10.1073/pnas.0705454104

      Grafton, S. T., Mazziotta, J. C., Presty, S., Friston, K. J., Frackowiak, R. S., & Phelps, M. E. (1992). Functional anatomy of human procedural learning determined with regional cerebral blood flow and PET. J Neurosci, 12(7), 2542-2548.

      Grover, S., Wen, W., Viswanathan, V., Gill, C. T., & Reinhart, R. M. G. (2022). Long-lasting, dissociable improvements in working memory and long-term memory in older adults with repetitive neuromodulation. Nat Neurosci, 25(9), 1237-1246. https://doi.org/10.1038/s41593-022-01132-3

      Gupta, M. W., & Rickard, T. C. (2022). Dissipation of reactive inhibition is sufficient to explain post-rest improvements in motor sequence learning. NPJ Sci Learn, 7(1), 25. https://doi.org/10.1038/s41539-022-00140-z

      Gupta, M. W., & Rickard, T. C. (2024). Comparison of online, offline, and hybrid hypotheses of motor sequence learning using a quantitative model that incorporate reactive inhibition. Sci Rep, 14(1), 4661. https://doi.org/10.1038/s41598-024-52726-9

      Hardwick, R. M., Rottschy, C., Miall, R. C., & Eickhoff, S. B. (2013). A quantitative meta-analysis and review of motor learning in the human brain. NeuroImage, 67, 283-297. https://doi.org/10.1016/j.neuroimage.2012.11.020

      Heusser, A. C., Poeppel, D., Ezzyat, Y., & Davachi, L. (2016). Episodic sequence memory is supported by a theta-gamma phase code. Nat Neurosci, 19(10), 1374-1380. https://doi.org/10.1038/nn.4374

      Higgins, C., Liu, Y., Vidaurre, D., Kurth-Nelson, Z., Dolan, R., Behrens, T., & Woolrich, M. (2021). Replay bursts in humans coincide with activation of the default mode and parietal alpha networks. Neuron, 109(5), 882-893 e887. https://doi.org/10.1016/j.neuron.2020.12.007

      Hikosaka, O., Nakamura, K., Sakai, K., & Nakahara, H. (2002). Central mechanisms of motor skill learning. Curr Opin Neurobiol, 12(2), 217-222. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12015240

      Jacobacci, F., Armony, J. L., Yeffal, A., Lerner, G., Amaro, E., Jr., Jovicich, J., Doyon, J., & Della-Maggiore, V. (2020). Rapid hippocampal plasticity supports motor sequence learning. Proc Natl Acad Sci U S A, 117(38), 23898-23903. https://doi.org/10.1073/pnas.2009576117

      Karni, A., Meyer, G., Jezzard, P., Adams, M. M., Turner, R., & Ungerleider, L. G. (1995). Functional MRI evidence for adult motor cortex plasticity during motor skill learning. Nature, 377(6545), 155-158. https://doi.org/10.1038/377155a0

      Kennerley, S. W., Sakai, K., & Rushworth, M. F. (2004). Organization of action sequences and the role of the pre-SMA. J Neurophysiol, 91(2), 978-993. https://doi.org/10.1152/jn.00651.2003

      Kleim, J. A., Barbay, S., & Nudo, R. J. (1998). Functional reorganization of the rat motor cortex following motor skill learning. J Neurophysiol, 80, 3321-3325.

      Kornysheva, K., Bush, D., Meyer, S. S., Sadnicka, A., Barnes, G., & Burgess, N. (2019). Neural Competitive Queuing of Ordinal Structure Underlies Skilled Sequential Action. Neuron, 101(6), 1166-1180 e1163. https://doi.org/10.1016/j.neuron.2019.01.018

      Lee, S. H., Jin, S. H., & An, J. (2019). The difference in cortical activation pattern for complex motor skills: A functional near-infrared spectroscopy study. Sci Rep, 9(1), 14066. https://doi.org/10.1038/s41598-019-50644-9

      Lisman, J. E., & Jensen, O. (2013). The theta-gamma neural code. Neuron, 77(6), 1002-1016. https://doi.org/10.1016/j.neuron.2013.03.007

      Mollazadeh, M., Aggarwal, V., Davidson, A. G., Law, A. J., Thakor, N. V., & Schieber, M. H. (2011). Spatiotemporal variation of multiple neurophysiological signals in the primary motor cortex during dexterous reach-to-grasp movements. J Neurosci, 31(43), 15531-15543. https://doi.org/10.1523/JNEUROSCI.2999-11.2011

      Molle, M., & Born, J. (2009). Hippocampus whispering in deep sleep to prefrontal cortex--for good memories? Neuron, 61(4), 496-498. https://doi.org/10.1016/j.neuron.2009.02.002

      Morris, R. G. M. (2006). Elements of a neurobiological theory of hippocampal function: the role of synaptic plasticity, synaptic tagging and schemas. [Review]. The European journal of neuroscience, 23(11), 2829-2846. https://doi.org/10.1111/j.1460-9568.2006.04888.x

      Mylonas, D., Schapiro, A. C., Verfaellie, M., Baxter, B., Vangel, M., Stickgold, R., & Manoach, D. S. (2024). Maintenance of Procedural Motor Memory across Brief Rest Periods Requires the Hippocampus. J Neurosci, 44(14). https://doi.org/10.1523/JNEUROSCI.1839-23.2024

      Pan, S. C., & Rickard, T. C. (2015). Sleep and motor learning: Is there room for consolidation? Psychol Bull, 141(4), 812-834. https://doi.org/10.1037/bul0000009

      Penhune, V. B., & Steele, C. J. (2012). Parallel contributions of cerebellar, striatal and M1 mechanisms to motor sequence learning. Behav. Brain Res., 226(2), 579-591. https://doi.org/10.1016/j.bbr.2011.09.044

      Qin, Y. L., McNaughton, B. L., Skaggs, W. E., & Barnes, C. A. (1997). Memory reprocessing in corticocortical and hippocampocortical neuronal ensembles. Philos Trans R Soc Lond B Biol Sci, 352(1360), 1525-1533. https://doi.org/10.1098/rstb.1997.0139

      Rickard, T. C., Cai, D. J., Rieth, C. A., Jones, J., & Ard, M. C. (2008). Sleep does not enhance motor sequence learning. J Exp Psychol Learn Mem Cogn, 34(4), 834-842. https://doi.org/10.1037/0278-7393.34.4.834

      Robertson, E. M., Pascual-Leone, A., & Miall, R. C. (2004). Current concepts in procedural consolidation. Nat Rev Neurosci, 5(7), 576-582. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15208699

      Sawamura, D., Sakuraba, S., Suzuki, Y., Asano, M., Yoshida, S., Honke, T., Kimura, M., Iwase, Y., Horimoto, Y., Yoshida, K., & Sakai, S. (2019). Acquisition of chopstick-operation skills with the non-dominant hand and concomitant changes in brain activity. Sci Rep, 9(1), 20397. https://doi.org/10.1038/s41598-019-56956-0

      Schendan, H. E., Searl, M. M., Melrose, R. J., & Stern, C. E. (2003). An FMRI study of the role of the medial temporal lobe in implicit and explicit sequence learning. Neuron, 37(6), 1013-1025. https://doi.org/10.1016/s0896-6273(03)00123-5

      Seedat, Z. A., Quinn, A. J., Vidaurre, D., Liuzzi, L., Gascoyne, L. E., Hunt, B. A. E., O'Neill, G. C., Pakenham, D. O., Mullinger, K. J., Morris, P. G., Woolrich, M. W., & Brookes, M. J. (2020). The role of transient spectral 'bursts' in functional connectivity: A magnetoencephalography study. NeuroImage, 209, 116537. https://doi.org/10.1016/j.neuroimage.2020.116537

      Shadmehr, R., & Holcomb, H. H. (1997). Neural correlates of motor memory consolidation. Science, 277, 821-824.

      Sjøgård, M., Baxter, B., Mylonas, D., Driscoll, B., Kwok, K., Tolosa, A., Thompson, M., Stickgold, R., Vangel, M., Chu, C., & Manoach, D. S. (2024). Hippocampal ripples mediate motor learning during brief rest breaks in humans. bioRxiv. https://doi.org/10.1101/2024.05.02.592200

      Srinivas, S., Sarvadevabhatla, R. K., Mopuri, K. R., Prabhu, N., Kruthiventi, S. S. S., & Babu, R. V. (2016). A Taxonomy of Deep Convolutional Neural Nets for Computer Vision [Technology Report]. Frontiers in Robotics and AI, 2. https://doi.org/10.3389/frobt.2015.00036

      Sterpenich, V., Albouy, G., Darsaud, A., Schmidt, C., Vandewalle, G., Dang Vu, T. T., Desseilles, M., Phillips, C., Degueldre, C., Balteau, E., Collette, F., Luxen, A., & Maquet, P. (2009). Sleep promotes the neural reorganization of remote emotional memory. J Neurosci, 29(16), 5143-5152. https://doi.org/10.1523/JNEUROSCI.0561-09.2009

      Toni, I., Ramnani, N., Josephs, O., Ashburner, J., & Passingham, R. E. (2001). Learning arbitrary visuomotor associations: temporal dynamic of brain activity. Neuroimage, 14(5), 1048-1057. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11697936

      Toni, I., Thoenissen, D., & Zilles, K. (2001). Movement preparation and motor intention. NeuroImage, 14(1 Pt 2), S110-117. https://doi.org/10.1006/nimg.2001.0841

      Tse, D., Langston, R. F., Kakeyama, M., Bethus, I., Spooner, P. A., Wood, E. R., Witter, M. P., & Morris, R. G. (2007). Schemas and memory consolidation. Science, 316(5821), 76-82. https://doi.org/10.1126/science.1135935

      van Kesteren, M. T., Fernandez, G., Norris, D. G., & Hermans, E. J. (2010). Persistent schema-dependent hippocampal-neocortical connectivity during memory encoding and postencoding rest in humans. Proc Natl Acad Sci U S A, 107(16), 7550-7555. https://doi.org/10.1073/pnas.0914892107

      van Kesteren, M. T., Ruiter, D. J., Fernandez, G., & Henson, R. N. (2012). How schema and novelty augment memory formation. Trends Neurosci, 35(4), 211-219. https://doi.org/10.1016/j.tins.2012.02.001

      Vidaurre, D., Hunt, L. T., Quinn, A. J., Hunt, B. A. E., Brookes, M. J., Nobre, A. C., & Woolrich, M. W. (2018). Spontaneous cortical activity transiently organises into frequency specific phase-coupling networks. Nat Commun, 9(1), 2987. https://doi.org/10.1038/s41467-018-05316-z

      Wagner, A. D., Schacter, D. L., Rotte, M., Koutstaal, W., Maril, A., Dale, A. M., Rosen, B. R., & Buckner, R. L. (1998). Building memories: remembering and forgetting of verbal experiences as predicted by brain activity. [Comment]. Science (New York, N.Y.), 281(5380), 1188-1191. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=9712582&retmode=ref&cmd=prlinks

      Wolpert, D. M., Goodbody, S. J., & Husain, M. (1998). Maintaining internal representations: the role of the human superior parietal lobe. Nat Neurosci, 1(6), 529-533. https://doi.org/10.1038/2245

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Based on previous publications suggesting a potential role for miR-26b in the pathogenesis of metabolic dysfunction-associated steatohepatitis (MASH), the researchers aim to clarify its function in hepatic health and explore the therapeutic potential of lipid nanoparticles (LNPs) to treat this condition. First, they employed both whole-body and myeloid cell-specific miR-26b KO mice and observed elevated hepatic steatosis features in these mice compared to WT controls when subjected to WTD. Moreover, livers from whole-body miR-26b KO mice also displayed increased levels of inflammation and fibrosis markers. Kinase activity profiling analyses revealed distinct alterations, particularly in kinases associated with inflammatory pathways, in these samples. Treatment with LNPs containing miR-26b mimics restored lipid metabolism and kinase activity in these animals. Finally, similar anti-inflammatory effects were observed in the livers of individuals with cirrhosis, whereas elevated miR-26b levels were found in the plasma of these patients in comparison with healthy controls. Overall, the authors conclude that miR-26b plays a protective role in MASH and that its delivery via LNPs efficiently mitigates MASH development.

      The study has some strengths, most notably its use of a combination of animal models, analyses of potential underlying mechanisms, and innovative treatment delivery methods with significant promise. However, it also presents numerous weaknesses that leave the research work somewhat incomplete. The precise role of miR-26b in a human context remains elusive, hindering direct translation to clinical practice. Additionally, the evaluation of the kinase activity, although innovative, does not provide a clear mechanism-based molecular explanation for the protective role of this miRNA.

      Therefore, to fortify the solidity of their conclusions, these concerns require careful attention and resolution. Once these issues are comprehensively addressed, the study stands to make a significant impact on the field.

      We would like to thank the reviewer for his/her careful evaluation of our manuscript and appreciate his/her recognition of the strengths of our study. Regarding the weaknesses, we have addressed these as well as possible during the revision of our manuscript.

      We can already state that miR-26b has clear anti-inflammatory effects on human liver slices, which is in line with our results demonstrating that miR-26b plays a protective role in MASH development in mice. The finding that patients with liver cirrhosis have increased plasma levels of miR-26b seems contradictory at first glance. However, we believe that this increased miR-26b expression is a compensatory mechanism to counteract the MASH/cirrhotic effects. The exact source of this miR-26b remains to be elucidated in future studies.

      The performed kinase activity analysis revealed that miR-26b affects kinases that play a particularly important role in inflammation and angiogenesis. Strikingly, and supporting these data, these effects could be reversed by LNP treatment. Combined, these results already provide strong mechanistic insights at the molecular and intracellular signalling level. Although the exact target of miR-26b remains elusive and its identification is probably beyond the scope of the current manuscript due to its complexity, we believe that the kinase activity results already provide a solid mechanistic basis.

      Reviewer #1 (Recommendations For The Authors):

      A list of recommendations for the authors is presented below:

      (1) The title should emphasize that the majority of experiments were conducted in mice to accurately reflect the scope of the study.

      As suggested, we have updated our title to include the statement that we primarily used a murine model:

      “MicroRNA-26b protects against MASH development in mice and can be efficiently targeted with lipid nanoparticles.”

      (2) It would be useful to know more about miR-26b function, including its target genes, tissue-specific expression, and tissue vs. circulating levels. Is it expected that the two strands of the miRNA (i.e., -3p and -5p) act so similarly? Also, miR-26b expression in the liver of individuals with cirrhosis should be determined.

      The function of miR-26b is still rather elusive, making functional studies using this miR very interesting. In a previous study describing the mouse model used here (Van der Vorst et al. BMC Genom Data, 2021), we alluded to several functions of miR-26b that have already been investigated, particularly in carcinogenesis and the neurological field.

      Regarding target genes, several targets are already described in miRBase. However, for our experiments we feel that determination of the specific target genes is beyond the scope of the current manuscript and rather a focus of follow-up projects.

      Regarding the expression of miR-26b, the liver and blood have rather high and similar expression levels of both miR-26b-3p and miR-26b-5p, as shown in Author response image 1.

      Author response image 1.

      Expression of miR-26b-3p and -5p. Expression of miR-26b-3p (left) and miR-26b-5p (right), generated by using the miRNATissueAtlas 2025 (Rishik et al. Nucleic Acids Research, 2024).

      Unfortunately, due to restrictions in tissue availability and the lack of stored RNA samples, we are unable to measure miR-26b expression in the human livers. However, based on the potency of the miR-26b mimic-loaded LNPs in the mice (Revised Supplemental Figure 2A-B), we are confident that these LNPs also resulted in an overexpression of miR-26b in the human livers.

      (3) Please explain the rationale behind primarily using whole-body miR-26b KO mice rather than the myeloid cell-specific KO model for the studies.

      The main goal of our study is the elucidation of the general role of miR-26b in MASH formation. Therefore, we decided to primarily focus on the whole-body KO model. While we used the myeloid cell-specific KO model to highlight that myeloid cells play an important role in the observed phenotypes, we believe the whole-body KO model is more appropriate as the main focus, particularly in light of the LNP targeting used, which is also a whole-body approach. Furthermore, this focus on the whole-body model also reflects a more therapeutically relevant approach.

      (4) The authors claim that treatment with LNPs containing miR-26b "replenish the miR-26b level in the whole-body deficient mouse" but the results of this observation are not presented.

      This is indeed a valid point that we have now addressed. We have measured the mir26b-3p and mir26b-5p expression levels in livers from mice after 4-week WTD with simultaneous injection with either empty LNPs as vehicle control (eLNP) or LNPs containing miR-26b mimics (mLNP) every 3 days. As shown in Revised Supplemental Figure 2A-B, mLNP treatment clearly results in an overexpression of the mir26b in the livers of these mice. We have rephrased the text accordingly by stating that mLNP results in an “overexpression” rather than “replenishment”.

      (5) The number of 3 human donors for the precision-cut liver slices is clearly insufficient and clinical parameters need to be shown. Additionally, inconsistencies in individual values in Figures 8B-E need clarification.

      Unfortunately, due to restrictions in tissue availability, we are unable to increase our n-number for these experiments. Clinical parameters are not available, but the liver slices were from healthy tissue.

      We have performed these experiments in duplicates for each individual donor. We have now specified this also in the figure legend to explain the individual values in the graphs:

      “…(3 individual donors, cultured in duplicates).”

      (6) Figure 2D: Please include representative images.

      As suggested we have included representative images in our revised manuscript.

      (7) Address discrepancies in the findings across different experimental settings. For example, the expression levels of the lipid metabolism-related genes are not significantly modulated in whole-body miR-26b KO mice (except for Sra), but they are in the myeloid cell-specific model (but not Sra), and none of them are restored after LNPs injections.

      Although Cd36 is not significantly increased in the whole-body miR-26b KO mice, there is a clear tendency towards increased expression, which is now also validated on protein level (Revised Figure 1K-L). In the myeloid cell-specific model we see a similar tendency, although Sra expression is not significantly changed. This could be due to the difference in the model, since only myeloid cells are affected, suggesting that the effects on Sra are to a large extent driven by non-myeloid cells. This would also fit with the tendency towards decreased Sra expression in the mimic-LNP treated mice. Due to the larger variation, this difference did not reach significance, which is rather a statistical issue due to relatively small n-numbers. At this moment, we cannot exclude that these receptors are differentially regulated by different cell types. For this, future studies are needed focussing on cell-specific targeting of miR-26b in somatic cells, like hepatocytes.

      (8) Figure 4A the images are not representative of the quantification.

      We have selected another representative image that is exactly reflecting the average Sirius red positive area, to reflect the quantification appropriately.

      (9) Figures 5 and 7: Are there not significantly decreased/increased kinases? A deeper analysis of these kinase alterations is necessary to understand how miR-26b exerts its role. A comparison analysis of these two datasets might clarify this regard.

      In these kinome analyses we indeed very often see that the general tendency of kinase activity is unidirectional. We believe that this is caused by the highly interconnected nature of kinases: activation of one signalling cascade will also result in the activation of many other cascades. However, it is interesting to see which pathways are affected in our study, and we find it particularly interesting that the tendencies are exactly opposite between the two comparisons: KO vs. WT shows increased kinase activities, while KO-LNP vs. KO shows a decrease again. This further indicates that the method reflects a true biological effect that is mediated by miR-26b.

      (10) Determinations of the effect of LNPs containing miR-26b in the KO mice are limited to only a few observations (that are not only significant). More extensive findings are needed to conclusively demonstrate the effectiveness of this treatment method. Similar to the experiments with human liver samples (Figures 8A-E).

      We have now extended our observations in the mouse model using LNPs by also analysing the effects on inflammation and fibrosis. However, it seems that the treatment time was not long enough to see pronounced changes in these later stages of disease development. Interestingly, the expression of Tgfb was significantly reduced, suggesting that the LNPs already affect fibrotic processes at least at the gene expression level. It can therefore be suggested that longer mLNP treatment may also result in more effects at the protein level, which remains to be determined in future studies.

      Unfortunately, due to restrictions in tissue availability, we are unable to increase our n-number or read-outs for these experiments at this moment.

      (11) In Figures 8F-H, the observed increase in circulating miR-26b levels in the plasma of cirrhotic individuals seems contradictory to its proposed protective role. This discrepancy requires clarification.

      In the revised discussion (second to last paragraph), we have now elaborated more on the findings in the plasma of cirrhotic individuals in comparison to our murine in-vivo results, to highlight and discuss this discrepancy.

      (12) Figures 8F-H legend mentions using 8-11 patients per group, but the methods section lacks corresponding information about these individuals.

      These patients, together with inclusion/exclusion criteria and definition of cirrhosis are described in the method section 2.14.

      (13) Figure 8G has 7 data points in the cirrhosis group, instead of 8. Any data exclusion should be justified in the methods section.

      As defined in method section 2.15, we have identified outliers using the ROUT = 1 method, which is the reason why Figure 8G only has 7 data points instead of 8.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript by Peters, Rakateli, et al. aims to characterize the contribution of miR-26b in a mouse model of metabolic dysfunction-associated steatohepatitis (MASH) generated by a Western-type diet on the background of Apoe knock-out. In addition, the authors provide a rescue of the miR-26b using lipid nanoparticles (LNPs), with potential therapeutic implications. In addition, the authors provide useful insights into the role of macrophages and some validation of the effect of miR-26b LNPs on human liver samples.

      Strengths:

      The authors provide a well-designed mouse model, that aims to characterize the role of miR-26b in a mouse model of metabolic dysfunction-associated steatohepatitis (MASH) generated by a Western-type diet on the background of Apoe knock-out. The rescue of the phenotypes associated with the model used using miR-26b using lipid nanoparticles (LNPs) provides an interesting avenue to novel potential therapeutic avenues.

      Weaknesses:

      Although the authors provide a new and interesting avenue to understand the role of miR-26b in MASH, the study needs some additional validations and mechanistic insights in order to strengthen the author's conclusions.

      (1) Analysis of the expression of miRNAs based on miRNA-seq of human samples (see https://ccb-compute.cs.uni-saarland.de/isomirdb/mirnas) suggests that miR-26b-5p is highly abundant both on liver and blood. It seems hard to reconcile that despite miRNA abundance being similar in both tissues, the physiological effects claimed by the authors in Figure 2 come exclusively from the myeloid (macrophages).

      We agree with the reviewer that the effects observed in the whole-body KO model are most likely a combination of cellular effects, particularly since miR-26b is also highly expressed in the liver. However, with the LysM-model we merely want to demonstrate that the myeloid cells at least play an important, though not exclusive, role in the phenotype. In the discussion, we also further elaborate on the fact that the observed changes in the liver can be mediated by hepatic changes.

      To stress this, we have adjusted the conclusion of Figure 2:

      “Interestingly, mice that have a myeloid-specific lack of miR-26b also show increased hepatic cholesterol levels and lipid accumulation demonstrated by Oil-red-O staining, coinciding with an increased hepatic Cd36 expression (Figure 2), demonstrating that myeloid miR-26b plays a major, but not exclusive, role in the observed steatosis.”

      (2) Similarly, the miRNA-seq expression from isomirdb suggests also that expression of miR-26a-5p is indeed 4-fold higher than miR-26b-5p both in the liver and blood. Since both miRNAs share the same seed sequence, and most of the supplemental regions (only 2 nt difference), their endogenous targets must be highly overlapped. It would be interesting to know whether deletion of miR-26b is somehow compensated by increased expression of miR-26a-5p loci. That would suggest that the model is rather a depletion of miR-26.

      UUCAAGUAAUUCAGGAUAGGU mmu-miR-26b-5p mature miRNA

      UUCAAGUAAUCCAGGAUAGGCU mmu-miR-26a-5p mature miRNA
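
      The reviewer's point about the shared seed can be checked directly from the two sequences quoted above. The following is a minimal sketch, assuming the conventional seed definition of nucleotides 2-8; that definition and the helper names `seed` and `mismatch_positions` are illustrative and are not taken from the manuscript.

```python
# Mature sequences exactly as quoted in the review comment above.
MIR_26B_5P = "UUCAAGUAAUUCAGGAUAGGU"
MIR_26A_5P = "UUCAAGUAAUCCAGGAUAGGCU"


def seed(seq: str) -> str:
    """Return nucleotides 2-8 (1-based); an assumed seed definition."""
    return seq[1:8]


def mismatch_positions(a: str, b: str) -> list[int]:
    """Return 1-based positions that differ over the shared (shorter) length."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


same_seed = seed(MIR_26B_5P) == seed(MIR_26A_5P)
diffs = mismatch_positions(MIR_26B_5P, MIR_26A_5P)
length_gap = abs(len(MIR_26A_5P) - len(MIR_26B_5P))

print("identical seed:", same_seed)
print("mismatch positions:", diffs)
print("length difference (nt):", length_gap)
```

      Under this assumption the two seeds are identical, and the aligned sequences differ at two positions (11 and 21), with miR-26a-5p carrying one additional 3' nucleotide. This is consistent with the "only 2 nt difference" noted by the reviewer.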

      This is a very valid point raised by the reviewer, which we already explored in a previous study describing the mouse model used here (Van der Vorst et al. BMC Genom Data, 2021). In that study, we showed that miR-26a is not affected by the deficiency of miR-26b (Figure 1G in: Van der Vorst et al. BMC Genom Data, 2021).

      (3) Similarly, the miRNA-seq expression from isomirdb suggests also that expression of miR-26b-5p is indeed 50-fold higher than miR-26b-3p in the liver and blood. This difference in abundance of the two strands is usually regarded as one of them being the guide strand (in this case the 5p) and the other being the passenger (in this case the 3p). In some cases, passenger strands can be a byproduct of miRNA biogenesis, thus the rescue experiments using LNPs with both strands in equimolar amounts would not reflect the physiological abundance miR-26b-3p. The non-physiological overabundance of miR-26b-3p would constitute a source of undesired off-targets.

      We agree with the reviewer on this aspect and this is something we had to consider while generating the mimic LNPs. However, we believe that we do not observe any undesired off-target effects, as the effects of the mimic LNPs, at least on functional outcomes, are relatively mild and restricted to the expected effects on lipids. Furthermore, the effects on the kinase profile due to the mimic LNP treatment are in line with our expectations. Combined, these results suggest at least that potential off-target effects are minor.

      (4) It would also be valuable to check the miRNA levels on the liver upon LNP treatment, or at least the signatures of miR-26b-3p and miR-26b-5p activity using RNA-seq on the RNA samples already collected.

      This is indeed a valid point that we have now addressed. We have measured the mir26b-3p and mir26b-5p expression levels in livers from mice after 4-week WTD with simultaneous injection with either empty LNPs as vehicle control (eLNP) or LNPs containing miR-26b mimics (mLNP) every 3 days. As shown in Supplemental Figure 2A-B, mLNP treatment clearly results in an overexpression of the mir26b in the livers of these mice. We have rephrased the text accordingly by stating that mLNP results in an “overexpression” rather than “replenishment”.

      (5) Some of the phenotypes described, such as the increase in cholesterol, overlap with the previous publication by van der Vorst et al. BMC Genom Data (2021), despite in this case the authors are doing their model in Apoe knock-out and Western-type diet. I would encourage the authors to investigate more or discuss why the initial phenotypes don't become more obvious despite the stressors added in the current manuscript.

      In our previous publication (BMC Genom Data; 2021), we actually did not see any changes in circulating lipid levels. However, in that study we did not evaluate the livers of the mice, so we do not have any information about the hepatic lipid levels.

      As mentioned by the reviewer, we believe that we see much more pronounced phenotypes in the current model because we use the combined stressor of Apoe-/- and Western-type diet, which cannot be compared to the wildtype and chow-fed mice used in the BMC Genom Data manuscript.

      (6) The authors have focused part of their analysis on a few gene markers that show relatively modest changes. Deeper characterization using RNA-seq might reveal other genes that are more profoundly impacted by miR-26 depletion. It would strengthen the conclusions proposed if the authors validated that changes in mRNA abundance (Sra, Cd36) do impact the protein abundance. These relatively small changes or trends in mRNA expression might not translate into changes in protein abundance.

      As suggested by the reviewer we have now also confirmed that the protein expression of CD36 and SRA is significantly increased upon miR-26b depletion, visualized as Figure 1K-L in the revised manuscript. Unfortunately, we do not have enough material left to perform similar analysis for the LysM-model or the LNP-model, although based on the whole-body effects we are confident that at least for CD36/SRA in this case the gene expression matches effects observed on protein level.

      (7) In Figures 5 and 7, the authors run a phosphorylation array (STK) to analyze the changes in the activity of the kinome. It seems that a relatively large number of signaling pathways are being altered, I think that should be strengthened by further validations by Western blot on the collected tissue samples. For quite a few of the kinases, there might be antibodies that recognise phosphorylation. The two figures lack a mechanistic connection to the rest of the manuscript.

      On this point we respectfully have to disagree with the reviewer. We have used a kinase activity profiling approach (PamGene) to analyse the real-time activity of kinases in our lysates. This approach is different than the classical Western blot approach in which only the presence or absence of a specific phosphorylation is detected. Thereby, Western blot analysis does not analyse phosphorylation in real-time, but rather determines whether there has been phosphorylation in the past. Our approach actually determines the real-time, current activity of the kinases, which we believe is a different and perhaps even more reliable read-out measurement. Therefore, validation by Western blot would not strengthen these observations.

      We have particularly tried to connect these observations to the rest of the manuscript by highlighting the observed signalling cascades that are affected, highlighting a role in inflammation and angiogenesis, thereby providing some mechanistic insights.

      Reviewer #2 (Recommendations For The Authors):

      I would encourage the authors to follow-up on some of the more miRNA focused comments made above, which would strengthen the mechanistic part of the work presented.

      I suggest the authors tone down some of the claims made (eg. "clearly increased expression", "exacerbated hepatic fibrosis"), given that some of it might need further validation.

      Wherever needed, we have toned down some claims, although we believe that most claims are already written carefully enough and are in line with the observed results.

      Some of the panels that are supposed to have the same amount of animals have variable N, despite they come from the same exact number of RNA samples or tissue lysates (eg. 1G and 1H, vs 1I and 1J).

      This is indeed correct and is caused by the fact that some analyses contained statistical outliers, which were identified using the ROUT = 1 method, as also specified in section 2.15 of the method section.

      It would be nice to have representative images of oil-red-o in all the figures where it is quantified (or at least in the supplementary figures).

      As suggested by the reviewer, we have now included representative images for the LysM-model (Revised Figure 2D) and the LNP-model (Revised Figure 6D) as well.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):  

      Summary:

      In this manuscript, Shao et al. investigate the contribution of different cortical areas to working memory maintenance and control processes, an important topic involving different ideas about how the human brain represents and uses information when it is no longer available to sensory systems. In two fMRI experiments, they demonstrate that the human frontal cortex (area sPCS) represents stimulus (orientation) information both during typical maintenance, but even more so when a categorical response demand is present. That is, when participants have to apply an added level of decision control to the WM stimulus, sPCS areas encode stimulus information more than conditions without this added demand. These effects are then expanded upon using multi-area neural network models, recapitulating the empirical gradient of memory vs control effects from visual to parietal and frontal cortices. In general, the experiments and analyses provide solid support for the authors' conclusions, and control experiments and analyses are provided to help interpret and isolate the frontal cortex effect of interest. However, I suggest some alternative explanations and important additional analyses that would help ensure an even stronger level of support for these results and interpretations.

      Strengths:

      -  The authors use an interesting and clever task design across two fMRI experiments that is able to parse out contributions of WM maintenance alone along with categorical, rule-based decisions. Importantly, the second experiment only uses one fixed rule, providing both an internal replication of Experiment 1's effects and extending them to a different situation when rule-switching effects are not involved across mini-blocks.

      - The reported analyses using both inverted encoding models (IEM) and decoders (SVM) demonstrate the stimulus reconstruction effects across different methods, which may be sensitive to different aspects of the relationship between patterns of brain activity and the experimental stimuli.

      - Linking the multivariate activity patterns to memory behavior is critical in thinking about the potential differential roles of cortical areas in sub-serving successful working memory. Figure 3 nicely shows a similar interaction to that of Figure 2 in the role of sPCS in the categorization vs. maintenance tasks.

      - The cross-decoding analysis in Figure 4 is a clever and interesting way to parse out how stimulus and rule/category information may be intertwined, which would have been one of the foremost potential questions or analyses requested by careful readers. However, I think additional text in the Methods and Results to lay out the exact logic of this abstract category metric will help readers better interpret the potential importance of this analysis and result.

      We thank the reviewer for the positive assessment of our manuscript. Please see lines 366-372, 885-894 in the revised manuscript for a detailed description of the abstract category index, and see below for a detailed point-by-point response.

      Weaknesses:

      - Selection and presentation of regions of interest: I appreciate the authors' care in separating the sPCS region as "frontal cortex", which is not necessarily part of the prefrontal cortex, on which many ideas of working memory maintenance activity are based. However, to help myself and readers interpret these findings, at a minimum the boundaries of each ROI should be provided as part of the main text or extended data figures. Relatedly, the authors use a probabilistic visual atlas to define ROIs in the visual, parietal, and frontal cortices. But other regions of both lateral frontal and parietal cortices show retinotopic responses (Mackey and Curtis, eLife, 2017: https://elifesciences.org/articles/22974) and are perhaps worth considering. Do the inferior PCS regions or inferior frontal sulcus show a similar pattern of effects across tasks? And what about the middle frontal gyrus areas of the prefrontal cortex, which are most analogous to the findings in NHP studies that the authors mention in their discussion, but do not show retinotopic responses? Reporting the effects (or lack thereof) in other areas of the frontal cortex will be critical for readers to interpret the role of the frontal cortex in guiding WM behavior and supporting the strongly worded conclusions of broad frontal cortex functioning in the paper. For example, to what extent can sPCS results be explained by visual retinotopic responses? (Mackey and Curtis, eLife, 2017: https://elifesciences.org/articles/22974).

      We thank the reviewer for the suggestions. We have added a Supplemental Figure 1 to better illustrate the anatomical locations of ROIs.  

      Following the reviewer’s suggestion, we defined three additional subregions in the frontal cortex based on the HCP atlas [1], including the inferior precentral sulcus (iPCS, generated by merging 6v, 6r, and PEF), inferior frontal sulcus (IFS, generated by merging IFJp, IFJa, IFSp, IFSa, and p47r), and middle frontal gyrus (MFG, generated by merging 9-46d, 46, a9-46v, and p9-46v). We then performed the same analyses as in the main text using both mixed-model and within-condition IEMs. Overall, we found that none of the ROIs demonstrated significant orientation representation in Experiment 1, for either IEM analysis (Author response image 1A and 1C). In Experiment 2, however, the IFS and MFG (but not iPCS) demonstrated a similar pattern to sPCS for orientation representation, though these results did not persist in the within-condition IEM with lower SNR (Author response image 1B and 1D). Moreover, when we performed the abstract category decoding analysis in the three ROIs, only the MFG in Experiment 2 showed significant abstract category decoding results, with no significant difference between experiments (Author response image 1E). To summarize, the orientation and category results observed in sPCS in the original manuscript were largely absent in other frontal regions. There was some indication that the MFG might share some results for orientation representation and category decoding, although this pattern was weaker and was only observed in some analyses in Experiment 2. Therefore, although we did not perform retinotopic mapping and cannot obtain a direct measure of retinotopic responses in the frontal cortex, these results suggest that our findings are unlikely to be explained by visual retinotopic responses: the iPCS, which is another retinotopic region, did not show the observed pattern in any of the analyses. Notably, the iPCS results are consistent with our previous work demonstrating that orientation information cannot be decoded from iPCS during working memory delay [2]. We have included these results on lines 395-403, 563-572 in the revised manuscript to provide a more comprehensive understanding of the current findings.
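
      The ROI construction described above (merging several atlas parcels into one region) can be sketched as follows. This is an illustration only: the parcel names are those listed in the response, but the integer label IDs, the toy label vector, and the helpers `build_label_ids` and `roi_mask` are hypothetical placeholders rather than the authors' pipeline or the atlas's real label coding.

```python
# Parcel names as listed in the response; everything else is hypothetical.
ROI_PARCELS = {
    "iPCS": ["6v", "6r", "PEF"],
    "IFS": ["IFJp", "IFJa", "IFSp", "IFSa", "p47r"],
    "MFG": ["9-46d", "46", "a9-46v", "p9-46v"],
}


def build_label_ids(roi_parcels):
    """Assign an arbitrary integer ID (1, 2, ...) to each parcel name.

    A real analysis would read the IDs from the atlas label table instead.
    """
    names = [name for parcels in roi_parcels.values() for name in parcels]
    return {name: idx for idx, name in enumerate(names, start=1)}


def roi_mask(vertex_labels, parcel_names, label_ids):
    """Return a boolean mask that is True where a vertex falls in any listed parcel."""
    wanted = {label_ids[name] for name in parcel_names}
    return [label in wanted for label in vertex_labels]


label_ids = build_label_ids(ROI_PARCELS)

# Toy per-vertex label vector; 0 stands for "outside all parcels of interest".
toy_labels = [0, 1, 2, 3, 4, 9, 12, 0]

ipcs_mask = roi_mask(toy_labels, ROI_PARCELS["iPCS"], label_ids)
mfg_mask = roi_mask(toy_labels, ROI_PARCELS["MFG"], label_ids)

print("iPCS vertices:", sum(ipcs_mask))
print("MFG vertices:", sum(mfg_mask))
```

      The resulting mask would then select the voxels or vertices passed to the same IEM and decoding analyses used for the original ROIs.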

      Author response image 1.

      Orientation reconstruction and abstract category decoding results in iPCS, IFS, and MFG.

      - When looking at the time course of effects in Figure 2, for example, the sPCS maintenance vs categorization effects occur very late into the WM delay period. More information is needed to help separate this potential effect from that of the response period and potential premotor/motor-related influences. For example, are the timecourses shifted to account for hemodynamic lag, and if so, by how much? Do the sPCS effects blend into the response period? This is critical, too, for a task that does not use a jittered delay period, and potential response timing and planning can be conducted by participants near the end of the WM delay. For example, the authors say that " significant stimulus representation in EVC even when memoranda had been transformed into a motor format (24)". But, I *think* this paper shows the exact opposite interpretation - EVC stimulus information is only detectable when a motor response *cannot* be planned (https://elifesciences.org/articles/75688). Regardless, parsing out the timing and relationship to response planning is important, and an ROI for M1 or premotor cortex could also help as a control comparison point, as in reference (24).

      We thank the reviewer for raising this point. We agree that examining the contribution of response-related activity in our study is crucial, as we detail below:

      First, the time course results in the manuscript are presented without time shifting. The difference in orientation representation in Figure 2 emerged at around 7 s after task cue onset and 1 s before probe onset. Considering a 4-6 s hemodynamic response lag, the difference should occur around 1-3 s after task cue onset and 5-7 s prior to probe onset. This suggests that a substantial portion of the effect likely occurred during the delay rather than response period.

      Second, our experimental design makes it unlikely that response planning would have influenced our results, as participants were unable to plan their motor responses in advance due to randomized response mapping at the probe stage on a trial-by-trial basis. Moreover, even if response planning had impacted the results in sPCS, it would have affected both conditions similarly, which again, would not explain the observed differences between conditions.

      Third, following the reviewer’s suggestion, we defined an additional ROI (the primary motor cortex, M1) using the HCP atlas and repeated the IEM analysis. No significant orientation representation was observed in either condition in M1, even during the response period (Figure S3), further suggesting that our results are unlikely to be explained by motor responses or motor planning.

      Based on the evidence above, we believe motor responses or planning are unlikely to account for our current findings. We have included these results on lines 264-267 to further clarify this issue.

      Lastly, upon re-reading the Henderson et al. paper [3], we confirmed that stimulus information was still decodable in EVC when a motor response could be planned (Figure 2 of Henderson et al.). In fact, the authors also discussed this result in paragraph 5 of their discussion. This finding, together with our results in EVC, indicates that EVC maintains stimulus information in working memory even when the information is no longer task-relevant, the functional relevance of which warrants further investigation in future research.
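
      The timing argument in the first point of this response is simple arithmetic and can be reproduced in a few lines. The sketch below assumes only the numbers stated there (an effect visible about 7 s after task cue onset, 1 s before probe onset, and a 4-6 s hemodynamic lag); the function and variable names are illustrative.

```python
def neural_window(observed_s, lag_range_s):
    """Shift a BOLD-observed time back by the longest and shortest assumed lag.

    Returns (earliest, latest) plausible neural onset in seconds.
    """
    shortest, longest = lag_range_s
    return (observed_s - longest, observed_s - shortest)


OBSERVED_AFTER_CUE_S = 7.0                       # onset in the unshifted time course
PROBE_AFTER_CUE_S = OBSERVED_AFTER_CUE_S + 1.0   # effect appears 1 s before the probe
HRF_LAG_RANGE_S = (4.0, 6.0)                     # assumed hemodynamic lag

earliest, latest = neural_window(OBSERVED_AFTER_CUE_S, HRF_LAG_RANGE_S)
before_probe = (PROBE_AFTER_CUE_S - latest, PROBE_AFTER_CUE_S - earliest)

print("neural onset after cue (s):", (earliest, latest))
print("neural onset before probe (s):", before_probe)
```

      With these inputs the estimated neural onset falls 1-3 s after the task cue, that is, 5-7 s before the probe, matching the values given in the response.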

      - Interpreting effect sizes of IEM and decoding analysis in different ROIs. Here, the authors are interested in the interaction effects across maintenance and categorization tasks (bar plots in Figure 2), but the effect sizes in even the categorization task (y-axes) are always larger in EVC and IPS than in the sPCS region... To what extent do the authors think this representational fidelity result can or cannot be compared across regions? For example, a reader may wonder how much the sPCS representation matters for the task, perhaps, if memory access is always there in EVC and IPS? Or perhaps late sPCS representations are borrowing/accessing these earlier representations? Giving the reader some more intuition for the effect sizes of representational fidelity will be important. Even in Figure 3 for the behavior, all effects are also seen in IPS as well. More detail or context at minimum is needed about the representational fidelity metric, which is cited in ref (35) but not given in detail. These considerations are important given the claims of the frontal cortex serving such an important role in flexible control, here.

      We thank the reviewer for raising this point. We agree that the effect sizes are always larger in EVC and IPS. This is because the specific decoding method we adopted, IEM, is based on the concept of population-level feature-selective responses, and decoding results would be most robust in regions with strong feature-tuning responses, such as EVC and parts of IPS. Therefore, to minimize the impact of effect size on our results, we avoided direct comparisons of representational strength across ROIs, focusing instead on differences in representational strength between conditions within the same ROI. With this approach, we found that EVC and IPS showed high representational fidelity throughout the trial, but only in sPCS did we observe significantly higher fidelity in the categorization condition, where orientation was actually not a behavioral goal but was manipulated in working memory to achieve the goal. Moreover, although representational fidelity in the EVC was the highest, its behavioral predictability decreased during the delay period, unlike sPCS. These results suggest that the magnitude of fidelity alone is not the determining factor for the observed categorization vs. maintenance effect or for behavioral performance. We have included further discussion on this issue on lines 208-211 of the revised manuscript.

      The reviewer also raised a good point that IPS showed similar behavioral correlation results as sPCS. In the original manuscript, we discussed the functional similarities and distinctions between IPS and sPCS in the discussion. We have expanded on this point on lines 610-627 in the revised manuscript:

      “While many previous WM studies have focused on the functional distinction between sensory and frontoparietal cortex, it has remained less clear how frontal and parietal cortex might differ in terms of WM functions. Some studies have reported stimulus representations with similar functionality in frontal and parietal cortex [4, 5], while others have observed differential patterns [6-8]. We interpret the differential patterns as reflecting a difference in the potential origin of the corresponding cognitive functions. For example, in our study, sPCS demonstrated the most prominent effect for enhanced stimulus representation during categorization as well as the tradeoff between stimulus difference and category representation, suggesting that sPCS might serve as the source region for such effects. On the other hand, IPS did show visually similar patterns to sPCS in some analyses. For instance, stimulus representation in IPS was visually but not statistically higher in the categorization task. Additionally, stimulus representation in IPS also predicted behavioral performance in the categorization task. These results together support the view that our findings in sPCS do not occur in isolation, but rather reflect a dynamic reconfiguration of functional gradients along the cortical hierarchy from early visual to parietal and then to frontal cortex.”

      Lastly, following the reviewer’s suggestion, we have included more details on the representational fidelity metric on lines 201-206, 856-863 in the revised manuscript for clarity.

      Recommendations:

      Figure 3 layout - this result is very interesting and compelling, but I think could be presented to have the effect demonstrated more simply for readers. The scatter plots in the second and third rows take up a lot of space, and perhaps having a barplot as in Figure 2 showing the effects of brain-behavior correlations collapsed across the WM delay period timing would make the effect stand out more.

      We thank the reviewer for the suggestion. We have added a subplot (C) to Figure 3 to demonstrate the brain-behavior correlation collapsed across the late task epoch.

      When discussing the link between sPCS representations and behavior, I think this paper should likely be cited (https://www.jneurosci.org/content/24/16/3944), which shows univariate relationships between sPCS delay activity and memory-guided saccade performance.

      We thank the reviewer for the suggestion and have included this citation on lines 278-279 in the revised manuscript.

      Interpretation of "control" versus categorization - the authors interpret that "It would be of interest to further investigate whether this active control in the frontal cortex could be generalized to tasks that require other types of WM control such as mental rotation." I think more discussion on the relationship between categorization and "control" is needed, especially given the claim of "flexible control" throughout. Is stimulus categorization a form of cognitive control, and if so, how?  

      We thank the reviewer for raising this point. Cognitive control is generally defined as the process by which behavior is flexibly adapted based on task context and goals, and most theories agree that this process occurs within working memory [9, 10]. With this definition, we consider stimulus categorization to be a form of cognitive control, because participants needed to adapt the stimulus based on the categorization rule in working memory for subsequent category judgements. With two categorization rules, the flexibility in cognitive control increased, because participants need to switch between the two rules multiple times throughout the experiment, instead of being fixed on one rule. We now clarify these two types of controls on lines 112-116 in the introduction.

      However, we agree that the latter form of control could be more related to rule switching that might not be specific to categorization per se. For instance, if participants perform rule switching in another type of WM task that requires WM control such as mental rotation, it remains to be tested whether similar results would be observed and/or whether same brain regions would be recruited. We have included further information on this issue on lines 572-575 in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors provide evidence that helps resolve long-standing questions about the differential involvement of the frontal and posterior cortex in working memory. They show that whereas the early visual cortex shows stronger decoding of memory content in a memorization task vs a more complex categorization task, the frontal cortex shows stronger decoding during categorization tasks than memorization tasks. They find that task-optimized RNNs trained to reproduce the memorized orientations show some similarities in neural decoding to people. Together, this paper presents interesting evidence for differential responsibilities of brain areas in working memory.

      Strengths:

      This paper was strong overall. It had a well-designed task, best-practice decoding methods, and careful control analyses. The neural network modelling adds additional insight into the potential computational roles of different regions.

      We thank the reviewer for the positive assessment of our manuscript.

      Weaknesses:

      While the RNN model matches some of the properties of the task and decoding, its ability to reproduce the detailed findings of the paper was limited. Overall, the RNN model was not as well-motivated as the fMRI analyses.

      We are grateful for the reviewer’s suggestions on improving our RNN results. Please see below for a detailed point-by-point response.

      Recommendations:

      Overall, I thought that this paper was excellent. I have some conceptual concerns about the RNN model, and minor recommendations for visualization.

      (1) I think that the RNN modelling was certainly interesting and well-executed. However, it was not clear how much it contributed to the results. On the one hand, it wasn't clear why reproducing the stimulus was a critical objective of the task (ie could be more strongly motivated on biological grounds). On the other hand, the agreement between the model and the fMRI results is not that strong. The model does not reproduce stronger decoding in 'EVC' for maintenance vs categorization. Also, the pattern of abstract decoding is very different from the fMRI (eg the RNN has stronger categorical encoding in 'EVC' than 'PFC' and larger differences between fixed and flexible rules in earlier areas than is evident in the fMRI). Together, the RNN modelling comes across as a little ad hoc, without really nailing the performance.

      We thank the reviewer for prompting us to further elaborate on the rationale for our RNN analysis. In our fMRI results, we observed a tradeoff between maintaining stimulus information in more flexible tasks (Experiment 1) and maintaining abstract category information in less flexible tasks (Experiment 2). This led to the hypothesis that participants might have employed different coding strategies in the two experiments. Specifically, in flexible environments, stimulus information might be preserved in its original identity in the higher-order cortex, potentially reducing processing demands in each task and thereby facilitating efficiency and flexibility; whereas in less flexible tasks, participants might generate more abstract category representations based on task rules to facilitate learning. To directly test this idea, we examined whether explicitly placing a demand for the RNN to preserve stimulus representation would recapitulate our fMRI findings in frontal cortex by having stimulus information as an output, in comparison to a model that did not specify such a demand. Meanwhile, we fully agree with the reviewer that there are alternative ways to implement this objective in the model. For instance, changing the network encoding weights (lazy vs. rich regime) can make feedforward neural networks produce either high-dimensional stimulus or low-dimensional category representations [11]. However, we feel that exploring these alternatives may fall outside the scope of the current study.

      Regarding the alignment between the fMRI and RNN results: for the stimulus decoding results in EVC, we found that with an alternative decoding method (IEM), a similar maintenance > categorization pattern was observed in the EVC-equivalent module, suggesting that our RNN was capable of reproducing EVC results, albeit in a weaker manner (please see our response to the reviewer’s next point). For the category decoding results, we would like to clarify that the category decoding results in EVC were not necessarily better than those in sPCS. Although category decoding accuracy was numerically higher in EVC, it was more variable compared to IPS and sPCS. To illustrate this point, we calculated the Bayes factor for the category decoding results of RNN2 in Figure 6C, and found that the amount of evidence for category decoding as well as for the decoding difference between RNNs in IPS and sPCS modules was high, whereas the evidence in the EVC was insufficient (Author response table 1).

      Author response table 1.

      Bayes factors for category decoding and decoding differences in Figure 6C lower panel.

      Nevertheless, we agree with the reviewer that all three modules demonstrated the category decoding difference between experiments, which differs from our fMRI results. This discrepancy may be partially due to differences in signal sensitivity. RNN signals typically have a higher SNR compared to fMRI signals, as fMRI aggregates signals from multiple neurons and single-neuron tuning effects can be reduced. We have acknowledged this point on lines 633-636 in the revised manuscript. Nonetheless, the current RNNs effectively captured our key fMRI findings, including increased stimulus representation in frontal cortex as well as the tradeoff in category representation with varying levels of flexible control. We believe the RNN results remain valuable in this regard.

      Honestly, I think the paper would have a very similar impact without the modelling results, but I appreciate that you put a lot of work into the modeling, and this is an interesting direction for future research. I have a few suggestions, but nothing that I feel too strongly about.

      - It might be informative to use IEM to better understand the RNN representations (and how similar they are to fMRI). For example, this could show whether any of the modules just encode categorical information. 

      - You could try providing the task and/or retro cue directly to the PFC units. This is a little unrealistic, but may encourage a stronger role for PFC.

      - You might adjust the ratio of feedforward/feedback connections, if you can find good anatomical guidance on what these should be.

      Obviously, I don't have much - it's a tricky problem!

      We thank the reviewer for the suggestions. To better align the fMRI and RNN results, we first performed the same IEM analyses used in the fMRI analyses on the RNN data. We found that with IEM, the orientation representation in the EVC module demonstrated a pattern similar to that in the fMRI data, showing a negative trend for the difference between categorization and maintenance, although the trend did not reach statistical significance (Author response image 2A). Meanwhile, the difference between categorization and maintenance remained a positive trend in the sPCS module.

      Second, following the reviewer’s suggestion, we adjusted the ratio of feedforward/feedback connections between modules to 1:2, such that between Modules 1 and 2 and between Modules 2 and 3, there were always more feedback than feedforward connections, consistent with recent theoretical proposals [12]. We found that this change preserved the positive trend for orientation differences in the sPCS module, but at the same time also made the orientation difference in the EVC and IPS modules more positive (Author response image 2B).

      To summarize, we found that the positive difference between categorization and maintenance in the sPCS module was robust across different RNNs and analytical approaches, further supporting that RNNs with stimulus outputs can replicate our key fMRI findings in the frontal cortex. By contrast, the negative difference between categorization and maintenance in EVC was much weaker. It was weakly present using some analytical methods (i.e., the IEM) but not others (i.e., SVMs), and increasing the feedback ratio of the entire network further weakened this difference. We believe that this could be because the positive difference was mainly caused by top-down, feedback modulations from higher cortex during categorization, such that increasing the feedback connections strengthens this pattern across modules. We speculate that enhancing the negative difference in the EVC module might require additional modules or inputs to strengthen fine-grained stimulus representation in EVC, a mechanism that might be of interest to future research. We have added a paragraph to the discussion on the limitations of the RNN results on lines 629-644.

      Author response image 2.

      Stimulus difference across RNN modules.  (A). Results using IEM (p-values from Module 1 to 3: 0.10, 0.48, 0.01). (B). Results using modified RNN2 with changed connection ratio (p-values from Module 1 to 3: 0.12, 0.22, 0.08). All p-values remain uncorrected.

      (2) Can you rule out that during the categorization task, the orientation encoding in PFC isn't just category coding? You had good controls for category coding, but it would be nice to see something for orientation coding. e.g., fit your orientation encoding model after residualizing category encoding, or show that category encoding has worse CV prediction than orientation encoding.

      We thank the reviewer for raising this point. To decouple orientation and category representations, we performed representational similarity analysis (RSA) in combination with linear mixed-effects modeling (LMEM) on the fMRI data. Specifically, we constructed three hypothesized representational dissimilarity matrices (RDMs), one for graded stimulus (increasing distance between orientations as they move farther apart, corresponding to graded feature tuning responses), one for abstract category (0 for all orientations within the same category and 1 for different categories), and another for discrete stimulus (indicating equidistant orientation representations). We then fit the three model RDMs together using LMEM with subject as the random effect (Author response image 3A). This approach is intended to minimize the influence of collinearity between RDMs on the results [13].
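      The three hypothesized RDMs described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the eight equally spaced orientations, the two-category split, the scaling of the graded RDM to the range 0 to 1, and all function names are hypothetical. The LMEM fit with subject as a random effect is not shown.

```python
def circular_distance(a, b, period=180.0):
    """Smallest angular distance between two orientations, in degrees."""
    d = abs(a - b) % period
    return min(d, period - d)


def build_model_rdms(orientations, categories, period=180.0):
    """Return (graded, category, discrete) model RDMs as nested lists."""
    n = len(orientations)
    max_d = period / 2.0  # largest possible orientation distance
    # Graded stimulus: dissimilarity grows with orientation distance.
    graded = [[circular_distance(orientations[i], orientations[j], period) / max_d
               for j in range(n)] for i in range(n)]
    # Abstract category: 0 within a category, 1 across categories.
    category = [[0.0 if categories[i] == categories[j] else 1.0
                 for j in range(n)] for i in range(n)]
    # Discrete stimulus: every pair of distinct orientations is equidistant.
    discrete = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    return graded, category, discrete


# Hypothetical design: 8 orientations, category boundary between 67.5 and 90 deg.
orientations = [0.0, 22.5, 45.0, 67.5, 90.0, 112.5, 135.0, 157.5]
categories = [0, 0, 0, 0, 1, 1, 1, 1]
graded, category, discrete = build_model_rdms(orientations, categories)
```

      In such a sketch, the off-diagonal entries of the three matrices would serve as simultaneous predictors of the observed neural dissimilarities, which is what allows stimulus and category contributions to be estimated jointly.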

      Overall, the LMEM results (Author response image 3B-D) replicated the decoding results in the main text, with significant stimulus but not category representation in sPCS in Experiment 1, and marginally significant category representation in the same brain region in Experiment 2. These results further support the validity of our main findings and emphasize the contribution of stimulus representation independent of category representation.

      Author response image 3.

      Delineating stimulus and category effects using LMEM.  (A) Schematic illustration of this method. (B) Results for late epoch in Experiment 1, showing the fit of each model RDM. (C) Results for early epoch in Experiment 2. (D) Results for late epoch in Experiment 2.

      (3) Is it possible that this region of PFC is involved in categorization in particular and not 'control-demanding working memory'? 

      We thank the reviewer for raising this possibility. Cognitive control is generally defined as the process by which behavior is flexibly adapted based on task context and goals, and most theories agree that this process occurs within working memory [9, 10]. With this definition, we consider stimulus categorization to be a form of cognitive control, because participants need to adapt the stimulus based on the categorization rule in working memory for subsequent category judgements.  However, in the current study we only used one type of control-demanding working memory task (categorization) to test our hypothesis, and therefore it remains unclear whether the current results in sPCS can generalize to other types of WM control tasks.

      We have included a discussion on this issue on lines 572-575 in the revised manuscript.

      (4) Some of the figures could be refined to make them more clear:

      a.  Figure 4 b/c should have informative titles and y-axis labels.

      b.  Figure 5, the flexible vs fixed rule isn't used a ton up to this point - it would help to (also include? Replace?) with something like exp1/exp2 in the legend. It would also help to show the true & orthogonal rule encoding in these different regions (in C, or in a separate panel), especially to the extent that this is a proxy for stimulus encoding.

      c.  Figure 6: B and C are very hard to parse right now. (i) The y-axis on B could use a better label. (ii) It would be useful to include an inset of the relevant data panel from fMRI that you are reproducing. (iii) Why aren't there fixed rules for RNN1?

      We thank the reviewer for the suggestions and have updated the figures accordingly.

      Overall I think this is excellent - my feedback is mostly on interpretation and presentation. I think the work itself is really well done, congrats!

      References

      (1) Glasser, M.F., et al., A multi-modal parcellation of human cerebral cortex. Nature, 2016. 536(7615): p. 171-178.

      (2) Yu, Q. and Shim, W.M., Occipital, parietal, and frontal cortices selectively maintain task-relevant features of multi-feature objects in visual working memory. Neuroimage, 2017. 157: p. 97-107.

      (3) Henderson, M.M., Rademaker, R.L., and Serences, J.T., Flexible utilization of spatial- and motor-based codes for the storage of visuo-spatial information. Elife, 2022. 11.

      (4) Christophel, T.B., et al., Cortical specialization for attended versus unattended working memory. Nat Neurosci, 2018. 21(4): p. 494-496.

      (5) Yu, Q. and Shim, W.M., Temporal-Order-Based Attentional Priority Modulates Mnemonic Representations in Parietal and Frontal Cortices. Cereb Cortex, 2019. 29(7): p. 3182-3192.

      (6) Li, S., et al., Neural Representations in Visual and Parietal Cortex Differentiate between Imagined, Perceived, and Illusory Experiences. J Neurosci, 2023. 43(38): p. 6508-6524.

      (7) Hu, Y. and Yu, Q., Spatiotemporal dynamics of self-generated imagery reveal a reverse cortical hierarchy from cue-induced imagery. Cell Rep, 2023. 42(10): p. 113242.

      (8) Lee, S.H., Kravitz, D.J., and Baker, C.I., Goal-dependent dissociation of visual and prefrontal cortices during working memory. Nat Neurosci, 2013. 16(8): p. 997-9.

      (9) Miller, E.K. and Cohen, J.D., An integrative theory of prefrontal cortex function. Annu Rev Neurosci, 2001. 24: p. 167-202.

      (10) Badre, D., et al., The dimensionality of neural representations for control. Curr Opin Behav Sci, 2021. 38: p. 20-28.

      (11) Flesch, T., et al., Orthogonal representations for robust context-dependent task performance in brains and neural networks. Neuron, 2022. 110(7): p. 1258-1270 e11.

      (12) Wang, X.J., Theory of the Multiregional Neocortex: Large-Scale Neural Dynamics and Distributed Cognition. Annu Rev Neurosci, 2022. 45: p. 533-560.

      (13) Bellmund, J.L.S., et al., Mnemonic construction and representation of temporal structure in the hippocampal formation. Nat Commun, 2022. 13(1): p. 3395.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript co-authored by Pál Barzó et al is very clear and very well written, demonstrating the electrophysiological and morphological properties of human cortical layer 2/3 pyramidal cells across a wide age range, from age 1 month to 85 years using whole-cell patch clamp. To my knowledge, this is the first study that looks at the cross-age differences in biophysical and morphological properties of human cortical pyramidal cells. The community will also appreciate the significant effort involved in recording data from 485 cells, given the challenges associated with collecting data from human tissue. Understanding the electrophysiological properties of individual cells, which are essential for brain function, is crucial for comprehending human cortical circuits. I think this research enhances our knowledge of how biophysical properties change over time in the human cortex. I also think that by building models of human single cells at different ages using these data, we can develop more accurate representations of brain function. This, in turn, provides valuable insights into human cortical circuits and function and helps in predicting changes in biophysical properties in both health and disease.

      Strengths:

      The strength of this work lies in demonstrating how the electrophysiological and morphological features of human cortical layer 2/3 pyramidal cells change with age, offering crucial insights into brain function throughout life.

      Weaknesses:

      One potential weakness of the paper is that the methodology could be clearer, especially in how different cells were used for various electrophysiological measurements and the conditions under which the recordings were made. Clarifying these points would improve the study's rigor and make the results easier to interpret.

      Reviewer #2 (Public review):

      Summary:

      In this study, Barzo and colleagues aim to establish an appraisal for the development of basal electrophysiology of human layer 2/3 pyramidal cells across life and compare their morphological features at the same ages.

      Strengths:

      The authors have generated recordings from an impressive array of patient samples, allowing them to directly compare the same electrophysiological features as a function of age and other biological features. These data are extremely robust and well organised.

      Weaknesses:

      The use of spine density and shape characteristics is performed from an extremely limited sample (2 individuals). How reflective these data are of the population is not possible to interpret. Furthermore, these data assume that spines fall into discrete types - which is an increasingly controversial assumption.

      Many data are shown according to somewhat arbitrary age ranges. It would have been more informative to plot by absolute age, and then perform more rigorous statistics to test age-dependent effects.

      Overall, the authors achieve their aims by assessing the physiological and morphological properties of human L2/3 pyramidal neurons across life. Their findings have extremely important ramifications for our understanding of human life and implications for how different neuronal properties may influence neurological conditions.

      Reviewer #3 (Public review):

      Summary:

      To understand the specificity of age-dependent changes in the human neocortex, this paper investigated the electrophysiological and morphological characteristics of pyramidal cells in a wide age range from infants to the elderly.

      The results show that some electrophysiological characteristics change with age, particularly in early childhood. In contrast, the larger morphological structures, such as the spatial extent and branching frequency of dendrites, remained largely stable from infancy to old age. On the other hand, the shape of dendritic spines is considered immature in infancy, i.e., the proportion of mushroom-shaped spines increases with age.

      Strengths:

      Whole-cell recordings and intracellular staining of pyramidal cells in defined areas of the human neocortex allowed the authors to compare quantitative parameters of electrophysiological and morphological properties between finely divided age groups.

      They succeeded in finding symmetrical changes specific to both infants and the elderly, and asymmetrical changes specific to either infants or the elderly. The similarity of pyramidal cell characteristics between areas is unexpected.

      Weaknesses:

      Human L2/3 pyramidal cells are thought to be heterogeneous, as L2/3 has expanded to a high degree during the evolution from rodents to humans. However, the diversity (subtyping) is not revealed in this paper.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors):

      The manuscript co-authored by Pál Barzó et al is very clear and very well written, demonstrating the electrophysiological and morphological properties of the human cortical layer 2/3 pyramidal cells across a wide age range, from age 1 month to 85 years using whole-cell patch clamp. To my knowledge, this is the first study that looks at the cross-age differences in morphological and electrophysiological properties of human cortical pyramidal cells. The community will also appreciate the significant effort involved in recording data from 485 cells, given the challenges associated with collecting data from human tissue. Understanding the electrophysiological properties of individual cells, which are essential for brain function, is crucial for comprehending human cortical circuits. I think this research enhances our knowledge of how biophysical properties change over time in the human cortex. I also think that by building models of human single cells at different ages using these data, we can develop more accurate representations of brain function. This, in turn, provides valuable insights into human cortical circuits and function and helps in predicting changes in biophysical properties in both health and disease.

      We are grateful for the positive evaluation of our work. We also thank the reviewers for their comments and believe that our manuscript has improved significantly with their help. In addition to the reviewer’s suggestions for improvement, further cell reconstructions were performed to make the anatomical data more robust (n = 1, 2, 3, 3, 4, 3, 2 additional reconstructions in age groups infant, early childhood, late childhood, adolescence, young adulthood, middle adulthood and late adulthood, respectively; Σn = 18). Four additional cells were added to the spine analysis and the statistics associated with each additional dataset were updated.

      I have some comments, particularly regarding the methodology and data presentation, to improve the clarity of the paper

      (1) I assume the tissue is from the resected area adjacent to the tumor. Could you please clarify this in the Methods section?

      Thank you for this comment. It has been clarified in the Methods section with the following sentence: “We used human cortical tissue adjacent to the pathological lesion that had to be surgically removed from patients (n = 63 female, n = 45 male) as part of the treatment for tumors, hydrocephalus, apoplexy, cysts, and arteriovenous malformation.”

      (2) Regarding the presentation of data in the Methods section, could you please clarify whether the authors used different cells for measuring the various electrophysiological properties? The number of recorded cells for calculating subthreshold properties (e.g., late adulthood: n = 113) differs from the number the cells used for calculating suprathreshold properties (e.g., late adulthood: n = 83). If this is the case, it may make it difficult to compare the electrophysiological properties. Could you please clarify this?

      The different element numbers are indeed due to the fact that different quality criteria were defined for the analysis of fast and slow signals. For the analysis of fast signals (e.g. AP half-width, AP upstroke velocity, AP amplitude), higher quality requirements were established; therefore, cells with high series resistance (> 30 MΩ) were excluded. We have updated and clarified the recording conditions in the text, figures, and methodology section accordingly.

      (3) Additionally, they mentioned that their recordings were done at zero holding current and at more than -50 pA. Could you clarify whether the data from these two sets of experiments were combined? If so, please provide an explanation in the methods section.

      Our aim was to determine the parameters of the membrane potential changes at rest. However, for technical reasons related to the biological amplifier, a continuous holding current was present during the measurement in some of the experiments (3.5% of all experiments). The holding currents were in the range of -50 pA to +60 pA. Within this range, in a check previously performed on mouse neurons, we found no linear correlation between the electrophysiological properties and the holding current. This is reported in the Methods section.

      (4) This section needs revision. It is unclear why different series resistances (Rs) or different cells were used to compute various electrophysiological properties: "To calculate passive membrane properties (resting membrane potential, input resistance, time constant, and sag) either cells with series resistance (Rs): 22.85 ± 9.04 MΩ (ranging between -4.55 MΩ and 56.76 MΩ) and 0 pA holding current (n = 154), or cells with holding current > -50 pA (-7.46 ± 28.56 pA, min: -49.89 pA, max: 59.68 pA) and Rs < 30 MΩ (18.96 ± 6.48 MΩ) (n = 23) were used. For the analysis of high frequency action potential features (AP half-width, AP up-stroke velocity, AP amplitude and rheobase) cells with Rs < 30 MΩ (n = 331 cells with Rs 19.2 ± 6.6 MΩ) and holding current > -50 pA (n = 308 with 0 pA holding current and Rs: 19.22 ± 6.59 MΩ, n = 23 with holding current: -7.46 ± 28.56 pA and Rs: 18.96 ± 6.48 MΩ) were used."

      To make the chapter clearer, we simplified the cell groups used to analyse the different electrophysiological properties and revised the Methods section as follows: “For the analysis of the electrophysiological recordings n = 457 recordings with a series resistance (Rs) of 24.93 ± 11.18 MΩ (max: 63.77 MΩ) were used. For the analysis of fast parameters related to the action potential (AP half-width, AP upstroke velocity, AP amplitude and rheobase), higher quality requirements were set and cells with Rs > 30 MΩ were excluded. This reduced the data set to n = 331 cells with Rs 19.42 ± 6.2 MΩ.”
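      The two-tier inclusion rule quoted above can be expressed as a simple filter. The following is a minimal sketch, assuming each recording is stored as a dictionary with a hypothetical `rs_mohm` field; the 30 MΩ cutoff is the only value taken from the text, and the toy records are made up for illustration.

```python
RS_CUTOFF_MOHM = 30.0  # series-resistance cutoff quoted in the revised Methods text


def select_cells(cells, fast_features=False):
    """Return the recordings eligible for a given analysis.

    All recordings are eligible for the slow (passive) parameters;
    recordings with Rs above the cutoff are dropped for the fast
    action-potential features.
    """
    if not fast_features:
        return list(cells)
    return [c for c in cells if c["rs_mohm"] <= RS_CUTOFF_MOHM]


# Toy records with made-up values, for illustration only.
cells = [
    {"id": "a", "rs_mohm": 18.2},
    {"id": "b", "rs_mohm": 30.0},
    {"id": "c", "rs_mohm": 41.5},
]
passive_set = select_cells(cells)
fast_set = select_cells(cells, fast_features=True)
```

      Applying one filter per analysis type makes it explicit why the number of cells differs between the passive and the action-potential measurements.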

      (5) The authors recorded the sag ratio using a -100 pA injected current. Is there a technical reason why they did not inject more than -100 pA?

      There is no particular technical reason. Like others, we have used this current amplitude for voltage response recordings over the years.

      (6) In the abstract, the authors mentioned that data were recorded from ages 1 month to 85 years. However, in the results, they stated that data were recorded from ages 0 to 85 years. Could you please clarify this discrepancy?

      We corrected this discrepancy.

      (7) Additionally, the results mention that data were collected from 485 human cortical layer 2/3 (L2/3) pyramidal cells, but subthreshold membrane features such as resting membrane potential, input resistance, time constant (tau), and sag ratio were calculated in 475 cortical pyramidal cells from 99 patients. Could you please clarify these discrepancies? In the discussion: "We recorded from n = 457 human cortical excitatory pyramidal cells from the supragranular layer from birth to 85 years."

      Thank you for pointing this out, we have corrected the error. Although our full data set contained 485 pyramidal cells, 28 recordings were excluded from the electrophysiological analysis and were used for morphological evaluation only, therefore 457 recordings were used for passive parameter measurements.

      (8) Regarding the distance from the pia to the border layer L1/L2, did the authors notice any differences across ages?

      To investigate whether the thickness of cortical layer 1 changes throughout life, we measured the L1 thickness and found no significant differences between age groups (P = 0.09, Kruskal-Wallis test) (Author response image 1).

      Author response image 1.

      Thickness of cortical layer 1 at different life stages. (A) Boxplot shows the thickness of layer 1. (B) Scatter plot shows the distribution of L1 thickness measured on the reconstructed cells. Age is shown in years on a logarithmic scale, dots are color-coded according to the corresponding age groups.
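      For readers unfamiliar with the test used for the L1 thickness comparison, the Kruskal-Wallis H statistic can be computed as sketched below. This is an illustrative pure-Python version with a standard tie correction, not the analysis code of the study; the function name is hypothetical and the thickness values are invented toy numbers. The p-value, which is commonly approximated from a chi-square distribution with k - 1 degrees of freedom, is left to a statistics library.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Return (H, degrees_of_freedom) for a list of samples."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j

    # H from the squared rank sums of the groups.
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - tie_sum / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


# Invented toy thickness values (micrometres) for three hypothetical age groups.
toy_groups = [[210.0, 225.0, 240.0], [250.0, 260.0, 270.0], [280.0, 290.0, 300.0]]
h_stat, dof = kruskal_wallis_h(toy_groups)
```

      Because the statistic depends only on ranks, it makes no assumption that thickness values are normally distributed within each age group.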

      (9) I am not sure why they referred to the data as layer 2/3 when most of the data, based on Figure 1E, were recorded from a distance of 0-200 µm from the L1/L2 border. Could it be that there is no significant depth-dependent variation in electrophysiological properties, as reported by Berg (2021), Kalmbach (2018), and Chameh (2021)?

      Although the vast majority of our data comes from a distance of less than 200 μm from the L1/L2 border, we cannot neglect the fact that our dataset also contains a small number of cells deeper than this, which are layer 3 cells. Apart from some differences shown in Supplementary Figures 7-9, we found no general difference between cells located at a distance of less than 200 μm and more than 200 μm from the L1 border.

      (10) In Figure 1, there is variability in resting membrane potential (RMP), tau, and input resistance (IR) within the infant age group. However, this trend is not observed in the sag ratio. Could you please discuss this finding?

      The large variance in the data is due to dramatic changes in these three parameters during the first year of life. Supplementary Figure 3 shows the comparisons of parameter distributions of patients between 0-6 months and 6-12 months. The sag amplitude in these cells is generally low; therefore, no such large changes could have occurred in them.

      (11) Did the authors use a K-Nearest Neighbors (KNN) test to assess the accuracy of the infant cluster in Figure 3F?

      Based on eight electrophysiological features of the cells (resting Vm, input resistance, tau, sag ratio, rheobase, AP half-width, AP up-stroke, and AP amplitude), the infant pyramidal cells on a UMAP form a distinct group (Author response image 2A) represented by cluster 4 on Author response image 2B. When calculating the sum of the Euclidean distances of cells within the cluster from the centroid, the isolated infant group (cluster 4) shows the smallest distance value from the centroid (cluster 1: 40.2, cluster 2: 36.21, cluster 3: 39.96, cluster 4: 5.72, cluster 5: 39.2, cluster 6: 55.74, cluster 7: 54.27), demonstrating that infant cells create a discrete cluster distinct from other age groups (Author response image 2B).
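
      The compactness measure described above (the sum of Euclidean distances of the cells in a cluster from that cluster's centroid) can be sketched as follows. This is a minimal illustration, not the analysis code used for the response: the two-dimensional toy coordinates and the names `centroid`, `sum_distance_to_centroid`, and `clusters` are assumptions introduced here, and the centroid is taken as the coordinate-wise mean, as in k-means.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sum_distance_to_centroid(points):
    """Sum of Euclidean distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points)

# Hypothetical 2D embedding coordinates for two toy clusters (not real data).
clusters = {
    "tight": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)],
    "loose": [(0.0, 0.0), (0.0, 4.0), (4.0, 0.0), (4.0, 4.0)],
}

# A smaller value indicates a more compact cluster.
compactness = {name: sum_distance_to_centroid(pts) for name, pts in clusters.items()}
```

      Because a sum grows with the number of cells in a cluster, a per-cell mean distance may be a useful complement when cluster sizes differ.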

      Author response image 2.

      (A) Uniform Manifold Approximation and Projection (UMAP) of 8 selected electrophysiological properties (resting Vm, input resistance, tau, sag ratio, rheobase, AP half-width, AP up-stroke, and AP amplitude) with data points for 331 cortical L2/3 pyramidal cells, colored with the corresponding age groups. (B) UMAP colored by k-means clustering with 7 clusters, red crosses represent the centroids of the clusters.

      (12) Missing citation: 'Previous research has shown that the biophysical properties of human pyramidal cells show depth-related correlations throughout L2/3 (Berg et al., 2021).' Please include citations for Kalmbach (2018) and Chameh (2021).

      We thank the reviewer for the additional references; these studies are now cited.

      (13) Have the authors noticed any differences in morphological properties among the different cortical lobes (Parietal, Temporal, Frontal, and Occipital)? It would be beneficial to present this data, especially since they have a sufficient sample size from each cortical lobe.

      The majority of our data set on the morphological properties of pyramidal cells comes from the parietal (n = 17 cells) and temporal lobe (n = 15). We found no significant differences in the morphological properties of cells from these two brain regions and no differences between age groups in the same cortical lobes.

      (14) Have the authors found differences in spine characteristics among different cortical areas, as reported previously (10.1023/a:1024134312173)?

      We found morphological differences in dendritic spines in the different brain regions; however, our data are too limited to draw definitive conclusions.

      Reviewer #2 (Recommendations for the authors):

      Major

      (1) I believe that these data presented in all main text figures would be more intuitive to be plotted on a log(age) scale, such as shown in supplementary Figure 13. The bounds of the ages used for different groups, as summarised in Figure 1 feel somewhat arbitrary.

      Recent neuroscientific studies on postnatal ageing mainly use the age-group comparison format (Kang 2011, Bethlehem 2022), which has been defined based on milestones in the cognitive, motor, social-emotional, and language/communications domains of observable behaviour (Zubler et al. 2022, for detailed definitions see Kang 2011). Since many parameters do not vary linearly but take a U-shape (or inverted U-shape), statistical quantification of these is not straightforward, so we would retain the age-group format for the main graphs. However, at the reviewer's suggestion, electrophysiological and morphological parameters are presented on a log(age) scale as supplementary figures (Supplementary Figures 2, 4, and 6), and further statistical analysis was also carried out without grouping the data (see response 5).

      (2) The authors present a lot of data values in the text, which is also shown in the figures. This makes reading of the manuscript somewhat difficult in places. For brevity, it may be best to present this data as supplementary tables.

      Thank you for this suggestion. We have inserted these data as tables.

      (3) I am unclear why the authors excluded cells that fired doublets or triplets in Figure 4? Were these included in the passive and AP-specific analysis - but excluded from F-I plots? Please clarify the rationale and the relative abundance of these physiological types based on age - one might predict that more initial-burst firing types are associated with older neurons?

      Thank you for drawing attention to this anomaly. We have updated the figures and text by adding the cells with initial burst firing. These cells are also included in the analysis of passive and action potential properties. In our overall dataset, 6.78% of cells show burst firing. By age group: infant: 0%, early childhood: 3.57% (1 cell), late childhood: 0%, adolescence: 11.11% (6 cells), young adulthood: 10.11% (9 cells), middle adulthood: 10.71% (6 cells), late adulthood: 7.96% (9 cells) of all cells in the respective age group.

      (4) The statistical analyses performed in Figure 6 are not justified. From the authors' description of these data, they derive spine density measurements from 1 infant and 1 aged adult, then perform pseudoreplicated analysis in these individuals. These data would require greater replication from infant and aged groups - with the possible inclusion of a younger adult group also. It would be ideal to have n=3/age group to allow robust statistical analysis.

      Thank you for this point. Accordingly, we have expanded our data set to include n = 3 infant pyramidal cells (83 days old, from one patient) and n = 3 pyramidal cells from three late adulthood patients (64.3 ± 2.08 years old).

      (5) Given the high number of individuals and replicates throughout this manuscript, a more circumspect approach to statistics would be appreciated, e.g. a generalised linear mixed effects model - with age as a fixed effect and sex, patient, etc as random effects. This may reveal the greatest statistical power of these important and rich data.

      We used a Generalized Additive Mixed Model (GAMM) to describe the relationship between age and the various passive and active electrophysiological features. We defined age, with a cubic spline smoothing term, as the fixed effect and gender, brain area, surgical procedure, and hemisphere as random effects. With the GAMM incorporating these four random effects, we found that the age dependence of the examined parameters (resting membrane potential, input resistance, tau, sag ratio, rheobase current, AP half-width, AP up-stroke velocity, AP amplitude, first AP latency, adaptation) was significant, except for F-I slope. We also observed correlations with gender, brain area, hemisphere, and surgical procedure in various intrinsic properties. Author response table 1 below shows the statistical values of the GAMM alongside the statistical tests used in the manuscript, for comparison.

      Author response table 1.

      Statistical significance of patient attributes. *In the pairwise comparison, the age of cells in the two groups was significantly different: female (subthreshold: 37.36 ± 26.25 years old, suprathreshold: 38.3 ± 25.6 y.o.) - male (subthreshold: 24.86 ± 23.7 y.o., suprathreshold: 25.7 ± 23.93 y.o.), subthreshold: P = 1.96*10<sup>-6</sup>, suprathreshold: P = 3.25*10<sup>-5</sup>, Mann-Whitney test. **In the pairwise comparison, the age of cells in the two groups was significantly different: surgical procedure: tumor removal (subthreshold: 33.72 ± 24.33 y.o., suprathreshold: 36.43 ± 27.07 y.o.) - VP shunt (subthreshold: 27.38 ± 29.69 y.o., suprathreshold: 27.07 ± 29.37 y.o.), subthreshold: P = 3.68*10<sup>-3</sup>, suprathreshold: P = 1.64*10<sup>-3</sup>, Mann-Whitney test.

      (6) Regarding the morphological diversity of dendritic spines. There is some debate in the field as to whether the distinction of specific dendritic spine types - as conveyed in this manuscript - are true subtypes or reflect a continuum of diverse morphology (see Tønneson et al., 2014 Nature Neuroscience). It is appreciated that the approach taken by the authors is the dogma within the field - however, dogma should continue to be challenged. Given that the authors have used DAB labelling combined with light microscopy, the possibility of accurately measuring spine morphology required for determining this continuum is extremely limited (e.g. Li et al., (2023) ACS Chemical Neuroscience). I would suggest that alongside the inclusion of further replicates for their spine analysis, the authors tone down their discussion of spine subtypes given the absence of any synaptic data presented in this current study to support the maturation (or otherwise) of dendritic spine synapses.

      Many thanks to the reviewer for this comment. We agree with the drawbacks of our method for testing spine categorization. To increase the reliability of our results, we increased the number of pyramidal cells in the infant and late adult groups. We also revised the figure and, as suggested by Reviewer #3, added photos of spines to each category in addition to schematic drawings to give an impression of the phenotype. In the discussion, we only address the differences between the two readily separable forms, mushroom and filopodial, and highlight results that only confirm findings already known in the literature. Although the concerns are valid, we would cite the statement from the above Li et al. (2023) reference: “...the most sophisticated equipment may not always be necessary for answering some research questions”. We believe that it is worth sharing our data and the somewhat subjective grouping, which we hope to report in more detail in the future.

      Minor

      (1) The order of the supplemental materials is out of order with their introduction in the text. These should be revised to reflect the order mentioned in the text.

      Thank you for your comment, we have corrected the order of the supplementary figures.

      (2) In Supplementary Figure 13, it would be informative to include some form of linear regression to confirm whether an age-dependent effect on neuronal morphology exists.

      We have added linear regression to the figure.

      (3) Figure 3D = should this be AP - not Ap?

      Thank you for drawing attention to this; we have corrected the typo in the figure.

      (4) For UMAP analysis in Figure 3, please provide a table of the features that were used for the 32 & 8-parameter UMAPs respectively.

      We have added a table to the Materials and methods section of all the electrophysiological features included in the UMAP.

      (5) For morphology, please include pia and L1/2 border for reconstructions shown for clarity.

      We indicated both the pia mater and the L1/2 border on the figure showing all the reconstructions (Supplementary Figure 10).

      Reviewer #3 (Recommendations for the authors):

      Major:

      (1) Data were obtained from different cortical areas of human patients of different ages. The electrophysiological characteristics were largely independent of other attributes such as disease, gender, and cortical areas (Supplementary Figure 2). To support the conclusion that age is one of the key attributes responsible for change, a similar morphological analysis would be necessary for gender.

      We updated the text and the supplementary section with Supplementary Figures 18-21 to determine if age-related differences in biophysical characteristics are affected by the patient's gender.

      (2) 'mushroom-shaped, thin, filopodial, branched, and stubby spines'

      Show photographs of individual typical spine types to make the classification easier to understand.

      To make the classification more understandable, we have updated the corresponding figure (Figure 6) with representative photos of the dendritic spine types.

      (3) Some electrophysiological parameters of the infant group showed higher deviations compared to other age groups. A UMAP (Supplementary Figure 2) shows that some infant neurons form a small cluster, while other infant neurons are scattered with neurons of other ages. Are there any differences between infant neurons in the small cluster and other infant neurons with respect to attributes other than age?

      For most of the electrophysiological parameters, the infant age group showed age-dependent variability, as illustrated in Supplementary Figures 2, 3, 4, and 6. The small group of infant cells is not clustered by gender, brain region, or medical condition, as shown in Supplementary Figure 5.

      (4) A recent paper (Benavides-Piccione et al. 2024, doi:10.1093/cercor/bhae180) reported that some morphological parameters of human layer 3 neurons differ between occipital and temporal regions. Area-dependent morphological differences have been also reported in non-human primates. Discussion of potential contradictions may therefore be requested.

      Most of the cells we reconstructed originated from the parietal and temporal regions (parietal: n = 20, temporal: n = 23, frontal: n = 15, occipital: n = 5). We found no differences in morphological features between these two regions, and we also found no significant differences when we compared the cells from the same brain regions by age group.

      (5) L2/3 cells of rodents are morphologically differentiated according to cortical depth. If individual L2/3 cells of humans are less differentiated than those of rodents, this point should be discussed.

      Depth-related morphological heterogeneity has been reported previously (Berg 2021); however, our dataset on the morphological characteristics of pyramidal cells is from the upper L2/3 region, with their soma located at a distance of 117.85 ± 65.3 μm (range: 11.05 to 243.3 μm) from the L1/L2 border. Therefore, we cannot conclude from our data whether human L2/3 cells are less differentiated than those of rodents.

      Minor:

      (1) Cell body morphology may affect electrophysiological properties. However, morphological quantification of cell bodies has not been reported. It may be added.

      In our DAB-labeled samples, we could not perfectly measure the total volume of the cell body in the reconstructions, therefore our measurements regarding the soma morphology are not shown in the manuscript. When comparing the cell body area of the middle sections of the soma of the reconstructed cells between the age groups, we found no significant differences (P = 0.082, Kruskal–Wallis test).

      (2) 'The adaptation of the AP frequency response'

      Describe how this parameter was obtained.

      The adaptation of the AP frequency response or adaptation was calculated as the average adaptation of the interspike interval between consecutive APs.
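
      One way to compute such a measure from spike times is sketched below. The response does not give the exact formula, so the normalized per-step change between consecutive interspike intervals (ISIs) used here is an assumption, as are the function names and the example spike times; this is an illustration, not the authors' implementation.

```python
def interspike_intervals(spike_times):
    """Differences between consecutive spike times (same unit as the input)."""
    return [t1 - t0 for t0, t1 in zip(spike_times, spike_times[1:])]

def mean_isi_adaptation(spike_times):
    """Average normalized change between consecutive ISIs.

    Assumed definition: for each pair of consecutive ISIs (a, b), the step
    adaptation is (b - a) / (b + a); the result is the mean over all pairs.
    Positive values mean the ISIs lengthen (firing slows) during the train.
    Returns None when fewer than three spikes (two ISIs) are available.
    """
    isis = interspike_intervals(spike_times)
    if len(isis) < 2:
        return None
    steps = [(b - a) / (b + a) for a, b in zip(isis, isis[1:])]
    return sum(steps) / len(steps)

# Hypothetical spike times in ms: ISIs are 10, 20, 40, so each step is 1/3.
example_spike_times_ms = [0.0, 10.0, 30.0, 70.0]
adaptation = mean_isi_adaptation(example_spike_times_ms)
```

      Normalizing each step by the sum of the two ISIs keeps the value between -1 and 1 regardless of the overall firing rate, which is why this form was chosen for the sketch.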

      (3) 'we excluded cells showing initial duplet or triplet action potential bursts'

      Why were the burst cells excluded from the analysis?

      We have modified the figures and text to include cells with initial burst firing.

      (4) Electrophysiological characteristics to be analyzed:

      Spike thresholds and afterhyperpolarizations

      We found age-related differences in the amplitude of the afterhyperpolarization (P = 2.56*10<sup>-30</sup>, Kruskal-Wallis test) and in the threshold of the action potential (P = 5.24*10<sup>-12</sup>, Kruskal-Wallis test) (Author response image 3).

      Author response image 3.

      Age-dependence of afterhyperpolarization and AP threshold. (A-B) Boxplots show the differences in afterhyperpolarization (AHP) amplitude (A) and AP threshold (B) between age groups. Asterisks indicate statistical significance (* P < 0.05, ** P < 0.01, *** P < 0.001, Kruskal-Wallis test with post-hoc Dunn test). (C-D) Scatter plots show AHP amplitude (C) and AP threshold (D) across the lifespan. Age is shown on a logarithmic scale, dots are colored according to the corresponding age group.

      (5) 'We identified and labeled each spine on n = 2 fully 3D-reconstructed cells'

      To which cortical area do these cells belong?

      At what depths are they distributed?

      Is it possible to report the number of spines, in addition to the density per unit length?

      We increased the number of cells in which we analyzed dendritic spine density. The data shown in Figure 6 are from pyramidal cells from an infant patient (n = 3 from a single patient) and late adulthood patients (n = 3 from 3 patients) (Supplementary Figure 13). The infant cells are from the same patient, the sample is from the right parietal lobe, and the patient is 83 days old. The older cells are from three different patients (#1: 65 years old, right temporal lobe; #2: 66 years old, right parietal lobe; #3: 62 years old, right frontal lobe). Infant cells are located 144.43 ± 45.26 µm (#1: 109.3, #2: 128.49, #3: 195.5 µm), late adult cells 161.22 ± 66.22 µm (#1: 183.5, #2: 213.42, #3: 86.73 µm) from the L1/2 border. We provide the number of spines in an additional supplementary table (Supplementary table 2).
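
      As a worked check, the summary values quoted above can be recomputed from the individual measurements listed in the same paragraph. The short sketch below uses only Python's standard library; the variable and function names are introduced here for illustration. The quoted values match the sample standard deviation (n - 1 denominator).

```python
from statistics import mean, stdev

# Individual depths from the L1/2 border (µm) and patient ages (years),
# copied from the response text above.
depth_um = {
    "infant": [109.3, 128.49, 195.5],
    "late_adult": [183.5, 213.42, 86.73],
}
late_adult_age_years = [65, 66, 62]

def mean_sd(values):
    """Mean and sample standard deviation, rounded to two decimals."""
    return round(mean(values), 2), round(stdev(values), 2)

depth_summary = {group: mean_sd(values) for group, values in depth_um.items()}
age_summary = mean_sd(late_adult_age_years)

print(depth_summary)  # {'infant': (144.43, 45.26), 'late_adult': (161.22, 66.22)}
print(age_summary)    # (64.33, 2.08)
```

      The recomputed late-adulthood age (64.33 ± 2.08 years) agrees with the 64.3 ± 2.08 years reported in the earlier response on spine density.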

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their careful review of our manuscript and the constructive comments. We have addressed the majority of comments with either new experiments, analyses, and/or text revisions. A summary of the major changes is listed below, followed by our point-by-point responses to the reviewer comments.

      Major changes:

      (1) We sought to gain insight into the potential mechanistic cause of the increased intrinsic excitability of Cntnap2<sup>-/-</sup> dSPNs. Given that Kv1.1 and 1.2 potassium channels are known to interact with Caspr2 (the protein encoded by Cntnap2), we hypothesized that altered number, location, and/or function of these channels may underlie the excitability change in these cells. To investigate this, we performed new analyses of the initial dataset to assess action potential (AP) properties known to be impacted by potassium channel function. Indeed, we found that AP frequency was increased, and rheobase current, AP latency and AP threshold were decreased in Cntnap2<sup>-/-</sup> dSPNs, suggestive of altered Kv1.2 function. These data are in the new Supplemental Fig. 4. We also performed new electrophysiology experiments in which we pharmacologically blocked Kv1.1 and 1.2 to assess whether the effects of blocking these channels would be occluded in Cntnap2<sup>-/-</sup> dSPNs. We found that 1) WT dSPNs responded to blockade of Kv1.1/1.2 channels by increasing their excitability but Cntnap2<sup>-/-</sup> dSPNs did not and 2) Kv1.1/1.2 channels were more important contributors to the excitability of dSPNs compared to iSPNs. These new data are presented in the revised Fig. 4 and Supplemental Figs. 5 and 6.

      (2) We performed additional experiments to assess excitatory synaptic properties, specifically AMPA/NMDA receptor ratio. This has been added to Fig. 1.

      (3) We performed more rigorous statistical analyses of the initial physiology datasets to align with the statistics performed for the revision experiments. This applies to Fig. 1, Fig. 2, Fig. 3, Fig. 5, and Supp. Fig. 2.

      (4) In the discussion section, we now highlight potential limitations of the study and further discuss the variable impact that Cntnap2 loss has on different cell types and brain regions.  

      Reviewer #1 (Public Review):

      Summary:

      Cording et al. investigated how deletion of CNTNAP2, a gene associated with autism spectrum disorder, alters corticostriatal engagement and behavior. Specifically, the authors present slice electrophysiology data showing that striatal projection neurons (SPNs) are more readily driven to fire action potentials in response to stimulation of corticostriatal afferents, and this is due to increases in SPN intrinsic excitability rather than changes in excitatory or inhibitory synaptic inputs. The authors show that CNTNAP2 mice display repetitive behaviors, enhanced motor learning, and cognitive inflexibility. Overall the authors' conclusions are supported by their data, but a few claims could use some more evidence to be convincing.

      Strengths:

      The use of multiple behavioral techniques, both traditional and cutting-edge machine learning-based analyses, provides a powerful means of assessing repetitive behaviors and behavioral transitions/rigidity.

      Characterization of both excitatory and inhibitory synaptic responses in slice electrophysiology experiments offers a broad survey of the synaptic alterations that may lead to increased corticostriatal engagement of SPNs.

      Weaknesses:

      (1) The authors conclude that increased cortical engagement of SPNs is due to changes in SPN intrinsic excitability rather than synaptic strength (either excitatory or inhibitory). One weakness is that only AMPA receptor-mediated responses were measured. Though the holding potential used for experiments in Figure 1F-I wasn't clear, recordings were presumably performed at a hyperpolarized potential that limits NMDA receptor-mediated responses. Because the input-output experiments used to conclude that corticostriatal engagement of SPNs is elevated (Figure 1B-E) were conducted in the current clamp, it is possible that enhanced NMDA receptor engagement contributed to increased SPN responses to cortical stimulation. Confirming that NMDA receptor-mediated EPSC components are not altered would strengthen the main conclusion.

      The reviewer is correct, the initial optically-evoked EPSC assessments were performed at a hyperpolarized potential (-70mV), thus measuring primarily AMPAR-mediated currents. We agree that assessing potential changes in the NMDAR-mediated EPSC component is important and we have completed new experiments to assess this. We find no differences in NMDAR-mediated EPSCs assessed at +40mV or the AMPA:NMDA ratio.
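
      The two summaries used for these data (a ratio per cell, and the same ratio averaged by animal) can be sketched as follows. This is a hedged illustration only: the current amplitudes and mouse identifiers are hypothetical, the function names are introduced here, and the sketch assumes that a single AMPA amplitude (recorded at -70 mV) and a single NMDA amplitude (recorded at +40 mV) have already been extracted for each cell, since the response does not state how the NMDA component was measured.

```python
from collections import defaultdict
from statistics import mean

def ampa_nmda_ratio(ampa_pa, nmda_pa):
    """Ratio of absolute current amplitudes (inward AMPA currents are negative)."""
    return abs(ampa_pa) / abs(nmda_pa)

# Hypothetical cells: (mouse id, AMPA amplitude at -70 mV, NMDA amplitude at +40 mV), in pA.
cells = [
    ("m1", -200.0, 100.0),
    ("m1", -300.0, 100.0),
    ("m2", -150.0, 150.0),
]

# Per-cell summary: one ratio for every recorded cell.
per_cell = [(mouse, ampa_nmda_ratio(ampa, nmda)) for mouse, ampa, nmda in cells]

def average_by_animal(per_cell_values):
    """Collapse per-cell values to one mean value per mouse."""
    grouped = defaultdict(list)
    for mouse, value in per_cell_values:
        grouped[mouse].append(value)
    return {mouse: mean(values) for mouse, values in grouped.items()}

# Per-animal summary: each mouse contributes a single value.
per_animal = average_by_animal(per_cell)
```

      Reporting both levels is useful because cells from the same animal are not independent; the per-animal average treats the mouse as the unit of observation.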

      These results have been added to Fig. 1. An expanded analysis of these results is shown in Author response image 1. We note that the previous AMPAR-mediated EPSC results have been replicated in this additional experiment, again showing no change in Cntnap2<sup>-/-</sup> SPNs. 

      Author response image 1.

      AMPA and NMDA receptor-mediated EPSCs are unchanged in Cntnap2<sup>-/-</sup> SPNs. (A) Quantification (mean ± SEM) of AMPA:NMDA ratio per cell for Cntnap2<sup>+/+</sup> and Cntnap2<sup>-/-</sup> dSPNs, p=0.9537, Mann-Whitney test. (B) dSPN AMPA current per cell, p=0.6172, Mann-Whitney test. (C) dSPN NMDA current per cell, p=0.6009, Mann-Whitney test. (D) dSPN AMPA:NMDA ratio averaged by animal, p=0.8413, Mann-Whitney test. (E) dSPN AMPA current averaged by animal, p>0.9999, Mann-Whitney test. (F) dSPN NMDA current averaged by animal, p=0.6905, Mann-Whitney test. (G) Quantification (mean ± SEM) of AMPA:NMDA ratio per cell for Cntnap2<sup>+/+</sup> and Cntnap2<sup>-/-</sup> iSPNs, p=0.4104, Mann-Whitney test. (H) iSPN AMPA current per cell, p=0.9010, Mann-Whitney test. (I) iSPN NMDA current per cell, p=0.9512, two-tailed unpaired t test. (J) iSPN AMPA:NMDA ratio averaged by animal, p=0.3095, Mann-Whitney test. (K) iSPN AMPA current averaged by animal, p>0.9999, Mann-Whitney test. (L) iSPN NMDA current averaged by animal, p=0.8413, Mann-Whitney test. All values were recorded using 20% blue light intensity. For dSPNs: Cntnap2<sup>+/+</sup> n=22 cells from 5 mice, Cntnap2<sup>-/-</sup> n=22 cells from 5 mice. For iSPNs: Cntnap2<sup>+/+</sup> n=21 cells from 5 mice, Cntnap2<sup>-/-</sup> n=21 cells from 5 mice.

      (2) Data clearly show that SPN intrinsic excitability is increased in knockout mice. Given that CNTNAP2 has been linked to potassium channel regulation, it would be helpful to show and quantify additional related electrophysiology data such as negative IV curve responses and action potential hyperpolarization.

      We appreciate this suggestion. As indicated by the reviewer, Caspr2 has previously been shown to control the clustering of Kv1-family potassium channels in axons isolated from optic nerve and corpus callosum (PMIDs: 10624965, 12963709, 29300891). In particular, Caspr2 is known to associate directly with Kv1.2 (PMID: 29300891). To assess a potential contribution of Kv1.2 to the excitability phenotype, we performed additional analyses of our original dataset to quantify AP properties known to be impacted by changes in Kv1.2 function (i.e. latency to fire and AP threshold, new Supp. Fig. 4). We identified several changes in Cntnap2<sup>-/-</sup> dSPNs resembling those that occur in wild-type cells when Kv1.2 is blocked (i.e. reduced threshold and reduced latency to fire, Supp. Fig. 4).

      We then performed a pharmacological experiment, blocking Kv1.2 using α-dendrotoxin (α-DTX) while recording intrinsic excitability to assess whether the effects of this drug on dSPN excitability were occluded in Cntnap2<sup>-/-</sup> cells. Indeed, we found that while blocking Kv1.2 in wild-type dSPNs significantly reduced threshold and increased intrinsic excitability, these effects were not seen in Cntnap2<sup>-/-</sup> dSPNs (new Fig. 4). We believe that this suggests an altered contribution of Kv1.2 to the intrinsic excitability of mutant dSPNs, owing to a change in the clustering, number, or function of these channels. Therefore, loss-of-function of Kv1.2 is a likely explanation for the enhanced intrinsic excitability of Cntnap2<sup>-/-</sup> dSPNs. Interestingly, we found that α-DTX had only subtle effects on iSPNs (Cntnap2 WT or mutant), suggesting a lesser contribution of this channel in controlling the excitability of indirect pathway cells. This finding can account for the relatively stronger effect of Cntnap2 loss on dSPN physiology. The results of these new experiments and analyses are presented in the new Fig. 4, Supp. Fig. 5 and Supp. Fig. 6. 

      (3) As it stands, the reported changes in dorsolateral striatum SPN excitability are only correlative with reported changes in repetitive behaviors, motor learning, and cognitive flexibility.

      We agree that we have not identified a causative relationship between the change in dorsolateral dSPN excitability and the behaviors that we measured in Cntnap2<sup>-/-</sup> mice. That said, in a previous study, we showed that selective deletion of the autism spectrum disorder (ASD) risk gene Tsc1 from dorsal striatal dSPNs resulted in increased corticostriatal drive and this was sufficient to increase rotarod motor learning (PMID: 34380034). Therefore, while we have not demonstrated causality in this study, we hypothesize that changes in dSPN excitability are likely to contribute to the behavioral phenotypes observed in Cntnap2<sup>-/-</sup> mice. 

      Reviewer #2 (Public Review):

      Summary:

      This is an important study characterizing striatal dysfunction and behavioral deficits in Cntnap2<sup>-/-</sup> mice. There is growing evidence suggesting that striatal dysfunction underlies core symptoms of ASD but the specific cellular and circuit level abnormalities disrupted by different risk genes remain unclear. This study addresses how the deletion of Cntnap2 affects the intrinsic properties and synaptic connectivity of striatal spiny projection neurons (SPN) of the direct (dSPN) and indirect (iSPN) pathways. Using Thy1-ChR2 mice and optogenetics the authors found increased firing of both types of SPNs in response to cortical afferent stimulation. However, there was no significant difference in the amplitude of optically-evoked excitatory postsynaptic currents (EPSCs) or spine density between Cntnap2<sup>-/-</sup> and WT SPNs, suggesting that the increased corticostriatal coupling might be due to changes in intrinsic excitability. Indeed, the authors found Cntnap2<sup>-/-</sup> SPNs, particularly dSPNs, exhibited higher intrinsic excitability, reduced rheobase current, and increased membrane resistance compared to WT SPNs. The enhanced spiking probability in Cntnap2<sup>-/-</sup> SPNs is not due to reduced inhibition. Despite previous reports of decreased parvalbumin-expressing (PV) interneurons in various brain regions of Cntnap2<sup>-/-</sup> mice, the number and function (IPSC amplitude and intrinsic excitability) of these interneurons in the striatum were comparable to WT controls.

      This study also includes a comprehensive behavioral analysis of striatal-related behaviors. Cntnap2<sup>-/-</sup> mice demonstrated increased repetitive behaviors (RRBs), including more grooming bouts, increased marble burying, and increased nose poking in the holeboard assay. MoSeq analysis of behavior further showed signs of altered grooming behaviors and sequencing of behavioral syllables. Cntnap2<sup>-/-</sup> mice also displayed cognitive inflexibility in a four-choice odor-based reversal learning assay. While they performed similarly to WT controls during acquisition and recall phases, they required significantly more trials to learn a new odor-reward association during reversal, consistent with potential deficits in corticostriatal function.

      Strengths:

      This study provides significant contributions to the field. The finding of altered SPN excitability, the detailed characterization of striatal inhibition, and the comprehensive behavioral analysis are novel and valuable to understanding the pathophysiology of Cntnap2<sup>-/-</sup> mice.

      Weaknesses:

      (1) The approach based on Thy1-ChR2 mice has the advantage of overcoming issues caused by injection efficiency and targeting variability. However, the spread of oEPSC amplitudes across mice shown in panels of Figure 1 G/I is very high with almost one order of magnitude difference between some mice. Given this is one of the most important points of the study it will be important to further analyze and discuss what this variability might be due to. Typically, in acute slice recordings, the within-animal variability is larger than the variability across animals. From the sample sizes reported it seems the authors sampled a large number of animals, but with a relatively low number of neurons per animal (per condition). Could this be one of the reasons for this variability?

      We agree with the reviewer that the variability in these experiments is quite large. We have replicated these experiments in the process of performing AMPA:NMDA ratio recordings (see above response to Reviewer 1’s comment). We again find no differences in AMPAR-mediated EPSC amplitude between WT and mutant SPNs (Author response image 2). Notably, these experiments also demonstrate a large amount of variability. In the original dataset, a small number of cells were collected from each animal (~1-3 cells/mouse). However, the variability remains in the new dataset, in which more cells were collected from each animal (~4-6 cells/mouse). We find both within-animal and between-animal variability, as can be seen in Author response image 2 (recordings made from the same animal are color-coordinated). Potential sources of variability in this experiment include: 1) variable expression of ChR2 per mouse, 2) variable innervation of ChR2-expressing terminals onto any given recorded cell, and/or 3) differences in prior plasticity state between cells (i.e. some neurons may have recently undergone corticostriatal LTP or LTD).

      Author response image 2.

      Optically-evoked AMPAR EPSCs exhibit within- and between-animal variability. (A) Quantification of EPSC amplitude evoked in dSPNs at different light intensities from the original dataset, plotted by cell (line represents the mean, dots/squares represent average EPSC amplitude for each recorded cell). Cntnap2<sup>+/+</sup> n=17 cells from 8 mice, Cntnap2<sup>-/-</sup> n=13 cells from 5 mice. Repeated measures two-way ANOVA p values are shown; g x s F (2, 56) = 0.3879, geno F (1, 28) = 0.8098, stim F (1.047, 29.32) = 76.56. (B) Quantification of EPSC amplitude evoked in dSPNs, averaged by mouse (line represents the mean, dots/squares represent average EPSC amplitude for each mouse). Cntnap2<sup>+/+</sup> n=8 mice, Cntnap2<sup>-/-</sup> n=5 mice. Repeated measures two-way ANOVA p values are shown; g x s F (2, 22) = 0.2154, geno F (1, 11) = 0.2585, stim F (1.053, 11.58) = 49.68. (C) Quantification of EPSC amplitude in dSPNs from the revision dataset, plotted by cell (line represents the mean, dots/squares represent average EPSC amplitude for each recorded cell). Cntnap2<sup>+/+</sup> n=22 cells from 5 mice, Cntnap2<sup>-/-</sup> n=22 cells from 5 mice. Repeated measures two-way ANOVA p values are shown; g x s F (2, 84) = 0.01885, geno F (1, 42) = 0.002732, stim F (1.863, 78.26) = 20.93. (D) Quantification of EPSC amplitude in dSPNs from the revision dataset, averaged by mouse (line represents the mean, dots/squares represent average EPSC amplitude for each mouse). Cntnap2<sup>+/+</sup> n=5 mice, Cntnap2<sup>-/-</sup> n=5 mice. Repeated measures two-way ANOVA p values are shown; g x s F (2, 16) = 0.06288, geno F (1, 8) = 0.006548, stim F (1.585, 12.68) = 16.97. (E) Quantification of EPSC amplitude evoked in iSPNs from the original dataset, plotted by cell (line represents the mean, dots/squares represent average EPSC amplitude for each recorded cell). Cntnap2<sup>+/+</sup> n=13 cells from 6 mice, Cntnap2<sup>-/-</sup> n=11 cells from 5 mice. Repeated measures two-way ANOVA p values are shown; g x s F (2, 44) = 0.9414, geno F (1, 22) = 1.333, stim F (1.099, 24.18) = 52.26. (F) Quantification of EPSC amplitude evoked in iSPNs from the original dataset, averaged by mouse (line represents the mean, dots/squares represent average EPSC amplitude for each mouse). Cntnap2<sup>+/+</sup> n=6 mice, Cntnap2<sup>-/-</sup> n=5 mice. Repeated measures two-way ANOVA p values are shown; g x s F (2, 18) = 0.4428, geno F (1, 9) = 0.5635, stim F (1.095, 9.851) = 23.82. (G) Quantification of EPSC amplitude evoked in iSPNs from the revision dataset, plotted by cell (line represents the mean, dots/squares represent average EPSC amplitude for each recorded cell). Cntnap2<sup>+/+</sup> n=21 cells from 5 mice, Cntnap2<sup>-/-</sup> n=21 cells from 5 mice. Repeated measures two-way ANOVA p values are shown; g x s F (2, 80) = 0.04134, geno F (1, 40) = 0.007025, stim F (1.208, 48.31) = 102.9. (H) Quantification of EPSC amplitude evoked in iSPNs from the revision dataset, averaged by mouse (line represents the mean, dots/squares represent average EPSC amplitude for each mouse). Cntnap2<sup>+/+</sup> n=5 mice, Cntnap2<sup>-/-</sup> n=5 mice. Repeated measures two-way ANOVA p values are shown; g x s F (2, 16) = 0.001865, geno F (1, 8) = 0.1004, stim F (1.179, 9.433) = 61.31.

      (2) This is particularly important because the analysis of corticostriatal evoked APs in panels C and E is performed on pooled data without considering the variability in evoked current amplitudes across animals shown in G and I. Were the neurons in panels C/E recorded from the same mice as shown in G/I? If so, it would be informative to regress AP firing data (say at 20% LED) to the average oEPSC amplitude recorded on those mice at the same light intensity. However, if the low number of neurons recorded per mouse is due to technical limitations, then increasing the sample size of these experiments would strengthen the study.

      We appreciate this point; however, the evoked AP experiment and the evoked EPSC experiment were performed on different mice, so it is not possible to correlate the data across experiments. While the evoked AP experiments were performed using potassium-based internal, we used a cesium-based internal to measure AMPAR-mediated EPSCs to more accurately detect synaptic currents. We note that the evoked AP experiments share a similar amount of variability as the evoked EPSC experiments, again possibly owing to variable expression of channelrhodopsin per mouse, variable innervation of ChR2-positive terminals onto individual cells, and/or differences in prior plasticity status between cells.  

      (3) On a similar note, there is no discussion of why iSPNs also show increased corticostriatal evoked firing in Figure 1E, despite the difference in intrinsic excitability shown in Figure 3. This suggests other potential mechanisms that might underlie altered corticostriatal responses. Given the role of Caspr2 in clustering K channels in axons, altered presynaptic function or excitability could also contribute to this phenotype, but potential changes in PPR have not been explored in this study.

      We have now performed more rigorous statistics on the data in Fig. 1 (repeated measures two-way ANOVA) such that the difference in corticostriatal evoked firing in Cntnap2<sup>-/-</sup> iSPNs no longer reaches statistical significance. This is consistent with the modest but statistically non-significant effect of Cntnap2 loss on iSPN intrinsic excitability. We agree with the reviewer that presynaptic alterations could potentially contribute to the changes in cortically-driven action potentials, especially as this experiment was performed without any synaptic blockers present, and Cntnap2 is deleted from all cells. That said, if changes in presynaptic release probability accounted for the increased corticostriatal drive, we would expect to see differences in cortically-evoked EPSCs onto SPNs. 

      While we can’t rule out the possibility of pre-synaptic changes, a straightforward explanation for our findings is that loss or alteration of Kv1.2 channel function is responsible for the increased excitability of Cntnap2<sup>-/-</sup> dSPNs, resulting in enhanced spiking in response to cortical input. Given the fact that Kv1.2 channels appear less important for regulating iSPN excitability (see new Fig. 4 and Supp. Fig. 6), this can explain the greater impact of Cntnap2 loss on dSPN physiology.

      (4) Male and female SPNs have different intrinsic properties but the number and/or balance of M/F mice used for each experiment is not reported.

      We agree that this is an important consideration. Author response table 1 provides the sex breakdown for the intrinsic excitability experiments. While we did not explicitly power the experiments to test for sex differences, Author response image 3 shows the data separated by sex and genotype for the intrinsic excitability experiments. Within genotype, we find no significant differences between males and females, except for Cntnap2<sup>-/-</sup> iSPNs which showed a significant interaction between sex and current step (Author response image 3F). Interestingly, while present in both sexes, the excitability shift of Cntnap2<sup>-/-</sup> dSPNs may be slightly more pronounced in females compared to males (Author response image 3C and D). However, this result would require further validation with a greater sample size.

      Author response table 1.

      Numbers of male and female mice used for the intrinsic excitability experiments.

      Author response image 3.

      Enhanced excitability of Cntnap2<sup>-/-</sup> dSPNs is present in both males and females. (A) Quantification (mean ± SEM) of the number of APs evoked in dSPNs in Cntnap2<sup>+/+</sup> males and females at different current step amplitudes. Cntnap2<sup>+/+</sup> males n=12 cells from 4 mice, Cntnap2<sup>+/+</sup> females n=8 cells from 4 mice. Repeated measures two-way ANOVA p values are shown; s x c F (28, 560) = 0.8992, sex F (1, 20) = 0.3754, current F (1.279, 25.57) = 56.85. (B) Quantification (mean ± SEM) of the number of APs evoked in dSPNs in Cntnap2<sup>-/-</sup> males and females at different current step amplitudes. Cntnap2<sup>-/-</sup> males n=12 cells from 4 mice, Cntnap2<sup>-/-</sup> females n=11 cells from 4 mice. Repeated measures two-way ANOVA p values are shown; s x c F (28, 588) = 0.6752, sex F (1, 21) = 0.04534, current F (2.198, 46.15) = 78.89. (C) Quantification (mean ± SEM) of the number of APs evoked in dSPNs in Cntnap2<sup>+/+</sup> males and Cntnap2<sup>-/-</sup> males at different current step amplitudes. Cntnap2<sup>+/+</sup> males n=12 cells from 4 mice, Cntnap2<sup>-/-</sup> males n=12 cells from 4 mice. Repeated measures two-way ANOVA p values are shown; g x c F (28, 672) = 2.233, geno F (1, 24) = 3.746, current F (1.708, 40.98) = 79.82. (D) Quantification (mean ± SEM) of the number of APs evoked in dSPNs in Cntnap2<sup>+/+</sup> females and Cntnap2<sup>-/-</sup> females at different current step amplitudes. Cntnap2<sup>+/+</sup> females n=8 cells from 4 mice, Cntnap2<sup>-/-</sup> females n=11 cells from 4 mice. Repeated measures two-way ANOVA p values are shown; g x c F (28, 476) = 1.547, geno F (1, 17) = 5.912, current F (1.892, 32.17) = 58.76. (E) Quantification (mean ± SEM) of the number of APs evoked in iSPNs in Cntnap2<sup>+/+</sup> males and females at different current step amplitudes. Cntnap2<sup>+/+</sup> males n=10 cells from 4 mice, Cntnap2<sup>+/+</sup> females n=12 cells from 4 mice. 
Repeated measures two-way ANOVA p values are shown; s x c F (28, 560) = 1.236, sex F (1, 20) = 1.074, current F (2.217, 44.34) = 179.6. (F) Quantification (mean ± SEM) of the number of APs evoked in iSPNs in Cntnap2<sup>-/-</sup> males and females at different current step amplitudes. Cntnap2<sup>-/-</sup> males n=12 cells from 4 mice, Cntnap2<sup>-/-</sup> females n=9 cells from 4 mice. Repeated measures two-way ANOVA p values are shown; s x c F (28, 532) = 2.513, sex F (1, 19) = 2.639, current F (1.858, 35.31) = 152.5. (G) Quantification (mean ± SEM) of the number of APs evoked in iSPNs in Cntnap2<sup>+/+</sup> males and Cntnap2<sup>-/-</sup> males at different current step amplitudes. Cntnap2<sup>+/+</sup> males n=10 cells from 4 mice, Cntnap2<sup>-/-</sup> males n=12 cells from 4 mice. Repeated measures two-way ANOVA p values are shown; g x c F (28, 560) = 0.4723, geno F (1, 20) = 0.5675, current F (2.423, 48.47) = 301.7. (H) Quantification (mean ± SEM) of the number of APs evoked in iSPNs in Cntnap2<sup>+/+</sup> females and Cntnap2<sup>-/-</sup> females at different current step amplitudes. Cntnap2<sup>+/+</sup> females n=12 cells from 4 mice, Cntnap2<sup>-/-</sup> females n=9 cells from 4 mice. Repeated measures two-way ANOVA p values are shown; g x c F (28, 532) = 1.655, geno F (1, 19) = 0.2322, current F (2.081, 39.55) = 99.45.

      (5) There is no mention of how membrane resistance was calculated, and no I/V plots are shown.

      Passive properties were calculated from the average of five -5 mV, 100 ms long test pulse steps applied at the beginning of every experiment. Membrane resistance was calculated from the double exponential curve fit. This has now been added to the methods section.

      (6) It would be interesting to see which behavior transitions most contribute to the decrease in entropy. Are these caused by repeated or perseverative grooming bouts? Or is this inflexibility also observed across other behaviors? The transition map in Figure S5 shows the overall number of syllables and transitions but not their sequence during behavior. Can this be analyzed by calculating the ratio of individual 𝑢𝑖 × 𝑝𝑖,𝑗 × log2 𝑝𝑖,𝑗 factors across genotypes?

      We thank the reviewer for raising an insightful question. Here we use a finite state Markov chain model to describe the syllable transitions in animal behavior. To quantify the randomness in the system, we calculated the entropy of the Markov chain (see methods section). The reviewer suggested calculating the partial entropy of the transition matrix, which would allow us to estimate the contribution of a subset of states to the entropy of the whole system, given by the equation:

      The partial entropy can indeed quantify the stochasticity, or “flexibility” in our context, of the sub-system containing only a subset of the behavior syllables. However, there are two main limitations to this approach:

      (1) The partial entropy fails to account for the transitions connecting the subset with the rest of the states in the system.

      (2) The stationary distribution may not reflect the actual probabilities in the isolated sub-system S.

      Consequently, the partial entropy cannot be directly interpreted as the fraction of contributions from specific syllable pairs or sub-systems to the entropy of the whole system. To be more specific, while a significant difference between the same sub-system in WT and KO groups could indicate that the sub-system contributes significantly to the difference in overall entropy, a non-significant result does not mean that the sub-system does not contribute to the overall entropy difference, as interactions between the sub-system and the states not considered are not accounted for.
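      The quantities discussed above can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names are hypothetical, the stationary distribution is approximated by power iteration (which assumes an irreducible, aperiodic chain), the leading minus sign follows the usual entropy convention, and the partial entropy is assumed to be the same sum restricted to syllable pairs inside the chosen subset while still using the stationary distribution of the full chain.

```python
import math


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Approximate u with u = u P by power iteration (assumes an ergodic chain)."""
    n = len(P)
    u = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(u[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, u)) < tol:
            return nxt
        u = nxt
    return u


def pair_terms(P):
    """Per-pair terms -u_i * p_ij * log2(p_ij); transitions with p_ij = 0 contribute 0."""
    u = stationary_distribution(P)
    n = len(P)
    return [
        [-u[i] * P[i][j] * math.log2(P[i][j]) if P[i][j] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def entropy_rate(P):
    """Entropy of the whole Markov chain: the sum of all per-pair terms."""
    return sum(sum(row) for row in pair_terms(P))


def partial_entropy(P, subset):
    """Sum of per-pair terms with both syllables in `subset` (hypothetical definition)."""
    terms = pair_terms(P)
    return sum(terms[i][j] for i in subset for j in subset)


if __name__ == "__main__":
    # Toy 3-syllable transition matrix (rows sum to 1); values are illustrative only.
    P = [
        [0.6, 0.3, 0.1],
        [0.2, 0.5, 0.3],
        [0.1, 0.4, 0.5],
    ]
    print("whole-chain entropy:", entropy_rate(P))
    print("partial entropy of {0, 1}:", partial_entropy(P, [0, 1]))
```

      In this sketch the partial entropy of a subset ignores every transition into or out of the subset, which is the first limitation listed above, and it reuses the full-chain stationary distribution, which is the second.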

      Author response image 4.

      Grooming syllables contribute to some but not all differences in syllable transitions in Cntnap2<sup>-/-</sup> mice. We calculated the entropy of each syllable pair using 𝑢𝑖 × 𝑝𝑖,𝑗 × log2 𝑝𝑖,𝑗 for every syllable pair and every animal. We then statistically tested the difference between genotypes for each syllable pair using Mann-Whitney tests. This plot displays those adjusted p-values for each syllable pair between WT and KO groups. The significant p-values suggest that the transitions to syllables 24 and 25 are different between genotypes (note that these correspond to grooming syllables, see Fig. 5N). However, since the overall entropy is a summation of every pair, it is difficult to conclude that syllables 24 and 25 are the sole contributors to the different entropy we observed.

      Reviewer #3 (Public Review):

      Summary:

      The authors analyzed Cntnap2 KO mice to determine whether loss of the ASD risk gene CNTNAP2 alters the dorsal striatum's function.

      Strengths:

      The results demonstrate that loss of Cntnap2 results in increased excitability of striatal projection neurons (SPNs) and altered striatal-dependent behaviors, such as repetitive, inflexible behaviors. Unlike other brain areas and cell types, synaptic inputs onto SPNs were normal in Cntnap2 KO mice. The experiments are well-designed, and the results support the authors' conclusions.

      Weaknesses:

      The mechanism underlying SPN hyperexcitability was not explored, and it is unclear whether this cellular phenotype alone can account for the behavioral alterations in Cntnap2 KO mice. No clear explanation emerges for the variable phenotype in different brain areas and cell types.

      We agree that identifying the mechanism by which Cntnap2 loss affects intrinsic excitability is interesting and important. We have added experiments to address this and conclude that the improper clustering, number, or function of Kv1.2 channels in Cntnap2<sup>-/-</sup> dSPNs is likely responsible for their increased excitability. These channels are known to be clustered/organized in part by Caspr2 (PMIDs: 10624965, 12963709, 29300891), and Kv1.2 channels are known to play an important role in regulating excitability in SPNs (PMIDs: 13679409, 32075716). In the case of dSPNs, blocking these channels with α-DTX significantly increased the excitability of WT cells (as has been previously reported); however, this effect was occluded in mutant cells, perhaps owing to a decreased contribution of Kv1.2 channels to excitability in Cntnap2<sup>-/-</sup> dSPNs. In addition, we found that blockade of these channels with α-DTX only modestly affected the excitability of iSPNs. Therefore, this can explain why loss of Cntnap2 more strongly affects the excitability of dSPNs. Please see new Fig. 4, Supp. Fig. 5 and Supp. Fig. 6 for these new data. 

      We agree with the reviewer that we have not identified a causative relationship between the change in dSPN excitability and the behavioral alterations in Cntnap2<sup>-/-</sup> mice. This is a limitation of the study. 

      It is interesting to speculate on the root of the varying impacts to excitability that occur across different brain regions and cell types in Cntnap2<sup>-/-</sup> mice. Increased excitability, as we see in dSPNs, has been identified in cerebellar Purkinje cells and L2/3 pyramidal neurons in somatosensory cortex in the context of Cntnap2 loss (PMIDs: 34593517, 30679017, 36793543). However, other cell types in Cntnap2<sup>-/-</sup> mice have exhibited no change in excitability (mPFC, L2/3 pyramidal neurons, PMID: 31141683) or hypoexcitability (subset of L5/6 pyramidal neurons, PMID: 29112191). While all of these cell types express Kv1.2 channels, they fundamentally vary in their intrinsic properties, owing to the role that other ion channels play in membrane excitability. As a result, loss of Cntnap2 is expected to have a variable effect on excitability depending on the cell type and the complement of other ion channels that are present. In addition, an initial change in excitability may drive secondary, potentially compensatory, changes in other channels that lead to a different excitability state. These changes are also expected to be cell type-specific. We do note that both of the cell types that show increased excitability in the context of Cntnap2 loss have been shown to exhibit an α-DTX-sensitive Kv1 channel current, such that application of α-DTX results in increased firing of these cells (cerebellar Purkinje cells; PMIDs: 17087603, 16210348 and L2/3 pyramidal neurons in somatosensory cortex; PMID: 17215507). These findings are consistent with our results in Cntnap2<sup>-/-</sup> dSPNs. 

      Reviewer #1 (Recommendations For The Authors):

      More thorough analysis of some of the manually quantified behaviors would be helpful. For example, only the grooming bout number was presented- what about the duration of bouts and total time grooming? Similarly, for the open field the number of center entries was reported but what about the total time in the center?

      We have quantified the time spent grooming and total time spent in the center during the open field test from our original data (Author response image 5). These data were not originally included in the manuscript because they were recorded for only a subset of the total animals. For each of these measures we find trend level changes, which are consistent with the primary measures reported in the main manuscript. 

      Author response image 5.

      Time in center and time spent grooming trend towards an increase in Cntnap2<sup>-/-</sup> mice.  (A) Quantification (mean ± SEM) of total time spent in the center of the open field during a 60 minute test, p=0.0656, Mann-Whitney test. (B) Time spent grooming during the first 20 minutes of the open field test, p=0.0611, Mann-Whitney test. For both measurements, Cntnap2<sup>+/+</sup> n=18 mice, Cntnap2<sup>-/-</sup> n=19 mice.

      Reviewer #3 (Recommendations For The Authors):

      What accounts for the hyperexcitability observed in Cntnap2-deficient SPNs? The authors noted that excitability is reportedly increased, reduced, or unchanged in different brain areas. What accounts for this disparity? Is it about the subcellular localization of Kv1 channels? The authors may want to test this possibility experimentally. At least, they may want to test whether Kv1 channels are mislocalized in SPNs.

      We agree that this is an important point, and we have performed additional experiments to address this. We find that the Kv1.2 blocker α-DTX significantly increases the excitability of WT dSPNs but not Cntnap2<sup>-/-</sup> dSPNs. This suggests that the mechanism underlying dSPN hyperexcitability in Cntnap2 mutants is the improper clustering, number, or function of Kv1.2 channels. These channels are known to be clustered and organized in part by Caspr2 (PMIDs: 10624965, 12963709, 29300891) and have been shown to play an important role in regulating the excitability of SPNs (PMIDs: 13679409, 32075716). Interestingly, we find that α-DTX has less of an effect on the excitability of iSPNs, which may account for the greater impact of Cntnap2 loss on dSPNs. Please see new Fig. 4, Supp. Fig. 5 and Supp. Fig. 6 for these added data and analyses. 

      Please see above response to Reviewer #3 for our speculation on the variable impact of Cntnap2 loss on different cell types and brain regions. 

      We agree with the reviewer that assessing potential differences in subcellular localization of Kv1 channels in our model would bolster the conclusion that these channels are mislocalized in the Cntnap2<sup>-/-</sup> striatum. We piloted these experiments using immunohistochemistry to stain for Kv1.1 and Kv1.2 but found that without very high-resolution imaging, it would be challenging to accurately quantify Kv1 puncta in a cell type-specific manner. We instead chose to investigate the functional contribution of Kv1 channels to the dSPN hyperexcitability phenotype through the α-DTX experiments outlined above. α-DTX strongly inhibits Kv1.2 channels, but also Kv1.1 channels to some extent (PMIDs: 12042352, 13679409). We find that the effects of α-DTX on SPN excitability are occluded in Cntnap2<sup>-/-</sup> dSPNs; therefore, we conclude that Kv1.2 (and possibly Kv1.1) channels have reduced function in these cells. Further work will be needed to determine if this is a result of channel mislocalization or another type of alteration. 

      The authors did not detect synaptic changes in Cntnap2-deficient SPNs. This important observation should be briefly discussed in the context of previous work in other brain regions and cell types. For example, some studies reported structural and functional changes at excitatory synapses. The variable impact on synapses suggests distinct compensatory mechanisms in different brain areas.

      Given the prior literature showing effects of Cntnap2 loss on synapses in other brain regions, we were surprised that striatal synapses were not impacted in our model. We agree with the reviewer that the variable changes in synaptic properties across brain regions in Cntnap2 mutant mice are likely a result of distinct compensatory changes in these regions. Differences may also arise depending on whether the synaptic changes originate from the post-synaptic cell or from pre-synaptic changes. An interesting direction for future studies would be to explore the developmental trajectory of excitability and synaptic changes to determine which may be initial perturbations versus those that are secondary and potentially compensatory.

      Line 138: "synaptic excitability". How is this term defined? Consider "synaptic changes" instead.

      “Synaptic excitability” was used to mean a change in the number and/or function of glutamate receptors. We have now changed this term to “excitatory synaptic changes.”

      Consider a short paragraph to highlight some limitations of this study. For example, it is unclear whether SPN hyperexcitability results from a compensatory change in Cntnap2 KO mice and whether the behavioral phenotype is solely due to this cellular phenotype. The study focuses on cortical projections onto SPNs, but these cells receive inputs from other brain areas that were not explored. Lastly, no clear explanation emerges for the variable phenotype in different brain areas and cell types.

      We thank the reviewer for this suggestion and have added several paragraphs to the discussion highlighting some limitations of this study.

      We hypothesize that the dSPN hyperexcitability in Cntnap2<sup>-/-</sup> mice is a primary change, due to the direct relationship between Caspr2 and Kv1.2 channels. The results of our α-DTX experiments suggest that the function and/or contribution of these channels to excitability is altered in Cntnap2<sup>-/-</sup> dSPNs. However, it is possible that there are additional changes in dSPNs that occur as a result of Cntnap2 loss and contribute to the hyperexcitability of these cells. Rather surprisingly, we don’t find evidence for altered excitatory (specifically from cortical inputs) or inhibitory synaptic function, suggesting lack of engagement of homeostatic mechanisms at the synaptic level.

      We have not yet determined whether there is a causative relationship between the change in dSPN excitability and the behavioral alterations in Cntnap2<sup>-/-</sup> mice. This is a limitation of the current study. In our discussion section, we highlight that the dSPN changes we observe in dorsolateral striatum (DLS) are known to be sufficient to enhance rotarod learning in other mouse models and thus support a connection between this cellular change and behavior. For the other behaviors we measured, we acknowledge that both DLS and other striatal or extra-striatal brain regions have been implicated in these behaviors, and therefore less of a direct connection can be made. 

      In terms of the inputs, we focused on cortical inputs given their known role in mediating motor and habit learning (PMID: 15242609, 16237445, 19198605). Notably, corticostriatal synapses have been shown to be altered across a variety of mouse models with mutations in ASD risk genes and therefore may be a point of convergence for disparate genetic insults (PMID: 31758607). We agree that the striatum receives inputs from a variety of brain regions, notably the thalamus, which we did not explore in this study. This would be an interesting area for future studies.

      Finally, it is difficult to speculate on the root of the varying impacts to excitability that occur across different brain regions and cell types in Cntnap2<sup>-/-</sup> mice. Please see above response to Reviewer #3 for some speculation on this point in regard to the potential involvement of Kv1.2 in the excitability changes in various Cntnap2<sup>-/-</sup> cell types. To expand upon this, it is known that ASD-associated mutations can have varying impacts on cell function even across similar cell types within a given brain region – we have seen this between dSPNs and iSPNs (this study, PMIDs: 34380034, 39358043), as have other groups studying ASD risk gene mutations in striatum (PMID: 24995986). This differential impact of the same mutation on intrinsic and/or synaptic physiology across cell types has been identified in other brain regions as well (PMID: 22884327, 26601124). Differences in transcriptional programs, protein expression, neuronal morphology, synaptic inputs and plasticity state make up a non-exhaustive set of variables that will impact the physiological function of a neuron, both in terms of the direct but also indirect consequences of an ASD risk gene mutation. To better address this important question, future studies would benefit from a systematic approach to assessing physiological changes in a given ASD mouse model, both across development and across brain regions.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study by Wang et al. identifies a new type of deacetylase, CobQ, in Aeromonas hydrophila. Notably, the identification of this deacetylase reveals a lack of homology with eukaryotic counterparts, thus underscoring its unique evolutionary trajectory within the bacterial domain.

      Strengths:

      The manuscript convincingly illustrates CobQ's deacetylase activity through robust in vitro experiments, establishing its distinctiveness from known prokaryotic deacetylases. Additionally, the authors elucidate CobQ's potential cooperation with other deacetylases in vivo to regulate bacterial cellular processes. Furthermore, the study highlights CobQ's significance in the regulation of acetylation within prokaryotic cells.

      Weaknesses:

      The problem I raised has been well resolved. I have no further questions.

      Thank you very much for your valuable comments.

      Reviewer #2 (Public review):

      In recent years, many researchers have tried to identify new acetyltransferases and deacetylases using specific antibody enrichment technologies and high-resolution mass spectrometry. This study is an example of that effort. Yuqian Wang et al. studied a novel Zn2+- and NAD+-independent KDAC protein, AhCobQ, in Aeromonas hydrophila. They studied the biological function of AhCobQ using biochemical methods, with MS identification to confirm it. These results extend our understanding of the regulatory mechanism of bacterial lysine acetylation modifications. However, I find the conclusion a little speculative, and unfortunately the work does not fully support the conclusion the authors provide.

      Major concerns:

      - It is a little arbitrary to arrive at the title "Aeromonas hydrophila CobQ is a new type of NAD+- and Zn2+-independent protein lysine deacetylase in prokaryotes." The title should be modified to delete "in prokaryotes" unless the authors obtain further evidence for the existence of AhCobQ in other prokaryotes.

      Thank you for your suggestion. However, I believe there has been some confusion regarding the title. In the revised manuscript we have already updated the title to: "Aeromonas hydrophila CobQ is a new type of NAD+- and Zn2+-independent protein lysine deacetylase."

      This title does not include the phrase "in prokaryotes," as you mentioned. We kindly suggest verifying the version of the manuscript that was reviewed to ensure you are reviewing the most recent changes.

      - I was confused about the arrangement of the supplementary results, because there are no citations for Figures S9-S19.

      Thank you for your feedback. It appears there may have been a misunderstanding, possibly due to reviewing an outdated version of the manuscript. In the revised manuscript we revised the supplementary figures and now have only 12 figures, all of which are correctly cited in the manuscript on pages 12 to 15. Below is a detailed list of the updated figure citations:

      Figures S1: page 8, line 148;

      Figures S2: page 9, line 168;

      Figures S3 and S4: page 10, line 178;

      Figures S5: page 10, line 186;

      Figures S6: page 10, line 189;

      Figures S7: page 12, line 221;

      Figures S8-S10: page 13, line 245;

      Figures S11: page 11, line 282;

      Figures S12: page 15, line 286.

      - As above, there are no data for Tables S1-S6.

      Thank you for your attention to the supplementary materials. As with the figures, we have already uploaded the data for Tables S1-S6 in the revised manuscript on November 19, 2024, and properly cited Tables S1 – S6 in the manuscript. Below is the citation information:

      Tables S1: page 10, line 194;

      Tables S2: page 13, line 245;

      Tables S3: page 21, line 438;

      Tables S4: page 22, line 439;

      Tables S5: page 22, line 445;

      Tables S6: page 27, line 564.

      Please note that Tables S3 – S4 include the chemical reagents, primers, and other experimental materials, which are not intended to be cited in the results section.

      - All the load control is not integrated. Please provide all of the load controls with whole PAGE gel or whole membrane western blot results. Without these whole results, it is not convincing to come to the conclusion that the authors state in the text.

      Thank you for your comment. Please note that the full membrane western blot results were included in the revised manuscript. We hope this satisfies your request. If you need further clarification or additional data, please do not hesitate to let us know.

      - Thoroughly review the materials & methods section. It is unclear to me what exactly the authors describe in the method. All the experimental designs and protocols should be described in detail, including growth conditions, assay conditions, and purification conditions, etc.

      Thank you for your valuable suggestion. In response to your comment and previous feedback, we have already thoroughly revised the Materials & Methods section in the revised manuscript. The experimental details, including growth conditions, assay protocols, and purification procedures, are described in full on pages 22 to 30 of the revised manuscript.

      - Include relevant information about the experiments performed in the figure legends, such as experimental conditions, replicates, etc. Often it is not clear what was done based on the figure legend description.

      Thank you very much for your detailed feedback and suggestions. We have made sure to describe what each data point represents in the figure legends, as per the previous feedback. However, we would like to clarify that while we have provided detailed descriptions in the legends, the inclusion of every specific experimental condition in the figure legends could result in redundancy, as these details are already thoroughly outlined in the Materials & Methods section.

      We hope this explanation addresses your concern.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have no further revision comments.

      Thank you very much.

      Reviewer #2 (Recommendations for the authors):

      I carefully read the point-to-point response from the author. Although they listed lots of the reasons for the ugly results, it still can not persuade me to accept their conclusions. While, as I know, it is impossible to reject their work in eLife as it was sent out for peer-review. I also can't accuse them of being wrong, but I have my opinion on this point. That is not the results, but the attitude.

      Thank you for your feedback. However, I must express some concerns regarding the nature of your comments. Based on the issues you've raised, it seems that you may have reviewed an outdated version of the manuscript. In the updated revision we addressed all the points you've raised, including the figure and table citations, experimental methods, and data integration.

      We understand that differing opinions are part of the peer-review process, but we respectfully believe that your conclusion regarding our attitude is based on a misunderstanding, possibly caused by reviewing an incorrect version of the manuscript. We have always strived to approach this manuscript with utmost professionalism and have diligently responded to each of your concerns.

      We sincerely suggest reviewing the latest version of our manuscript, and we welcome any further constructive feedback. We hope this clarifies any misunderstandings and look forward to your continued support.

      Thank you for your time and thoughtful consideration.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study by Wang et al. identifies a new type of deacetylase, CobQ, in Aeromonas hydrophila. Notably, the identification of this deacetylase reveals a lack of homology with eukaryotic counterparts, thus underscoring its unique evolutionary trajectory within the bacterial domain.

      Strengths:

      The manuscript convincingly illustrates CobQ's deacetylase activity through robust in vitro experiments, establishing its distinctiveness from known prokaryotic deacetylases. Additionally, the authors elucidate CobQ's potential cooperation with other deacetylases in vivo to regulate bacterial cellular processes. Furthermore, the study highlights CobQ's significance in the regulation of acetylation within prokaryotic cells.

      Weaknesses:

      The problem I raised has been well resolved. I have no further questions.

      Reviewer #2 (Public review):

      In recent years, many researchers have tried to identify new acetyltransferases and deacetylases by using specific antibody enrichment technologies and high-resolution mass spectrometry. This study is one example of that effort. Yuqian Wang et al. studied a novel Zn2+- and NAD+-independent KDAC protein, AhCobQ, in Aeromonas hydrophila. They studied the biological function of AhCobQ by using biochemical methods and confirmed it by MS identification. These results extend our understanding of the regulatory mechanism of bacterial lysine acetylation modifications. However, I find this conclusion a little speculative, and unfortunately, the data do not fully support the conclusion the authors provide.

      Reviewer #3 (Public review):

      Summary:

      This study reports on a novel NAD+ and Zn2+-independent protein lysine deacetylase (KDAC) in Aeromonas hydrophila, termed as AhCobQ (AHA_1389). This protein is annotated as a CobQ/CobB/MinD/ParA family protein and does not show similarity with known NAD+-dependent or Zn2+-dependent KDACs. The authors showed that AhCobQ has NAD+ and Zn2+-independent deacetylase activity with acetylated BSA by western blot and MS analyses. They also provided evidence that the 195-245 aa region of AhCobQ is responsible for the deacetylase activity, which is conserved in some marine prokaryotes and has no similarity with eukaryotic proteins. They identified target proteins of AhCobQ deacetylase by proteomic analysis and verified the deacetylase activity using site-specific Kac proteins. Finally, they showed that AhCobQ activates isocitrate dehydrogenase by deacetylation at K388.

      Strengths:

      The finding of a new type of KDAC has a valuable impact on the field of protein acetylation. The characteristics (NAD+ and Zn2+-independent deacetylase activity in an unknown domain) shown in this study are very unexpected.

      Weaknesses:

      (1) The characteristics (NAD+ and Zn2+-independent deacetylase activity in an unknown domain) shown in this study are very unexpected. To convince readers, MS/MS data are necessary to accurately detect (de)acetylation at the target site in the deacetylase activity assay. The authors showed the MS/MS data in assays with acetylated BSA, but the other assays rely only on western blot.

      (2) They prepared site-specific Kac proteins and used them in deacetylase activity assays. Incorporation of acetyllysine at the target site should be confirmed by MSMS and shown as supplementary data.

      (3) The authors imply that the 195-245 aa region of AhCobQ may represent a new domain responsible for deacetylase activity. The feature of the region would be of interest but is not sufficiently described in Figure 5. The amino acid sequence alignments with representative proteins with conserved residues would be informative. It would be also informative if the modeled structure predicted by AlphaFold is shown and the structural similarity with known deacetylases is discussed.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The problem I raised has been well resolved. I have no further questions.

      Reviewer #2 (Recommendations for the authors):

      Questions on the response to "- The load control is not all integrated. All of the load controls with whole PAGE gel or whole membrane western blot results should be provided. Without these whole results, it is not convincing to come to the conclusion that the authors have."

      As the authors answered, they have provided the Coomassie Blue R-350 staining results from the PVDF membranes. That is a good control for the experiment. However, I still have several questions about it:

      (1) The first is the quality of these Western blots. Why are all the bands in these Western blots so ugly? To tell the truth, it is very difficult to come to a conclusion from these poor Western blots.

      We appreciate your feedback regarding the quality of the Western blots presented in Figure 7. We believe the “ugly bands” you referred to reflect our results validating the functions of CobQ through the use of recombinant site-specific Kac protein substrates.

      In our study, we meticulously engineered these recombinant site-specific Kac proteins using a two-plasmid system, based on foundational research published in Nature Chemical Biology (2017, 13(12): 1253-1260), which introduced the genetic encoding of Nε-acetyllysine into recombinant proteins. However, we faced a common challenge: protein truncation due to premature translation termination at the reassigned codon. This issue not only hampers protein yields, as discussed in ChemBioChem (2017, 18(20): 1973-1983), but also contributes to the suboptimal appearance of the Western blot results.

      Despite conducting at least two independent repetitions for the Western blot analysis of the site-specific Kac proteins, which yielded consistent results, we recognize that the overall quality remains less than ideal. This variability is inherently related to the characteristics of the target proteins. Nevertheless, the primary aim of our manuscript is to validate the novel deacetylase activity of CobQ. We have provided multiple lines of evidence, including mass spectrometry (MS/MS) and Western blot analyses, to substantiate this claim. In response to your comments, we have decided to remove the ambiguous Western blot results from Figure 7, retaining only four figures that demonstrate significant differences across at least two independent replicates (Author response images 1-5). Additionally, we have included four biological replicates of the Western blot results for ICD Kac388 + CobQ in the supplementary materials (Author response image 5) to further validate the deacetylase function of CobQ.

      Author response image 1.

      Western blot validation of the Kac26 ArcA-2 protein substrates regulated by the three KDACs in two biological replicates.

      Author response image 2.

      Western blot validation of the Kac48 Sun protein substrates regulated by the three KDACs in two biological replicates.

      Author response image 3.

      Western blot validation of the Kac103 Sun protein substrates regulated by the three KDACs in two biological replicates.

      Author response image 4.

      Western blot validation of the Kac195 Eno protein substrates regulated by the three KDACs in three biological replicates.

      Author response image 5.

      Western blot validation of the Kac388 ICD protein substrates regulated by AhCobQ in this study. Each sample was independently repeated at least three times.

      (2) The second is why some of the results do not appear to come from the same PVDF membrane when the Coomassie staining is compared with the WB results, contrary to what the authors responded. For example, the HrpA-K816 (ac), Eno-K195 (ac), ArcA-2-K26 (ac), ArcA-2-K26 (ac), IscS-K93(ac), A0KJ75-K81(ac), GyrB-K331(ac), GyrB-K449(ac), FtsA-K320(ac), FtsA-K409(ac), RecA-K279(ac), and the RecA-K306(ac). All of them are clearly not from the same stained PVDF membrane but from a new PVDF membrane.

      We assure you that the R-350 stained PVDF membranes originate from the same Western blot membranes. However, we acknowledge that visual discrepancies may arise due to differences in imaging techniques. The Western blot results were scanned using a ChemiDoc MP (Bio-Rad, Hercules, CA, USA), while the Coomassie R-350 stained PVDF membranes were captured using a standard camera. These differences can create a misleading appearance, making it seem as though they come from different membranes.

      It is also important to note that the intensity of the protein marker cannot be directly compared between the two imaging methods. As illustrated in Author response image 6, the protein marker at 70 kDa is clearly detectable in the Coomassie R-350 image, whereas it may not be as apparent in the Western blot result due to inherent differences in detection sensitivity.

      Author response image 6.

      Comparison of the Western blotting and R-350 stained results for the same protein marker on the same PVDF membrane. The protein marker at 70 kDa is easily detected with Coomassie R-350, but is difficult to see in the WB result.

      Additionally, we have removed some of the so-called "ugly" Western blot results in the updated manuscript and provided the original full film of the relevant images as an attachment. This documentation demonstrates that all the data you referenced originate from the same film, as shown in Figures 1-5.

      (3) The third is why there is no replication for any of these WB results. We should draw conclusions with a serious attitude, not from a single repeat, to say nothing of the poor quality of the results.

      Thank you for your valuable suggestion. In the second version of the manuscript, we have included the original full film of the relevant images. While we previously explained the reasons behind the "ugly" Western blot results, we have decided to remove some, or even all, of these results from Figure 7 in the updated version. The related images will be updated in the supplementary materials (Figures 1-5 in the response letter and Figure 7 in the revised manuscript).

      Furthermore, we have provided a more detailed discussion regarding the poor results in the updated manuscript to ensure clarity and transparency. We appreciate your understanding and hope these changes meet your expectations.

      Questions on the response to "L174-187, L795 (Please show the whole membrane (or PAGE gel) of the loading control of CobB, and CobQ, except for the Kac-BSA)".

      (1) As we were all taught, no control, no biology. Where is the CobQ band? Why was the same PVDF membrane not stained with R-350, instead of a new membrane?

      Thank you for your insightful feedback. As noted in our previous response, the absence of visible bands for AhCobQ and AhCobB on the Coomassie R-350 stained PVDF membrane is primarily due to the low loading amounts and protein loss during the Western blotting process.

      To reinforce our findings, we repeated the analysis of the protein samples via SDS-PAGE, using the same loading quantity as in the previous Western blot shown in Figure 2 of the manuscript. As illustrated in Author response image 7, the bands for CobB and CobQ are discernible, albeit with significantly lower intensities compared to the Kac-BSA bands. Upon examining the full Coomassie R-350 stained PVDF membranes provided in Supplementary Material 1, we observe that the CobB and CobQ bands are not easily visible. This aligns with your observations and can be attributed to potential protein loss during the transfer from SDS-PAGE to the PVDF membrane.

      Author response image 7.

      The SDS-PAGE gel displayed the loading amounts of Kac-BSA and CobB/CobQ.

      To enhance the visibility of the CobQ/CobB bands, we increased the loading of CobQ/CobB in a new Western blot experiment, using 2 µg of Kac-BSA in combination with 0.8 µg of CobQ/CobB. As shown in Author response image 8, while the increased amount of Kac-BSA resulted in a more blurred signal, the bands for the recombinant CobQ and CobB proteins were clearly detectable. This indicates that both proteins were indeed present in the in vitro protein deacetylation assay.

      Author response image 8.

      Western blot verified the deacetylase activity assay of AhCobQ and AhCobB on Kac-BSA.

      Furthermore, we conducted a mass spectrometry analysis comparing Kac-BSA and Kac-BSA incubated with CobQ, as well as BSA without acetylation, against the A. hydrophila database with a cut-off of unique matched peptides >1. It is challenging to completely avoid contaminant detection during protein purification, especially when using high-resolution mass spectrometry. Our findings revealed that CobQ has the highest number of unique matched peptides (Author response table 1), while contaminants such as AHA_3036, AHA_0497, AHA_1279, and valS could be excluded, as they were present in Kac-BSA or BSA samples. Additionally, Tuf1, RplQ, GroEL, RpsF, RpsU, RpsB, RpsO, and RpsJ are known ribosomal subunits or chaperonins that are abundantly expressed in cells and may interact with various proteins, leading to contaminant detection.

      Author response table 1.

      LC MS/MS results of selected peptide quantification among Kac-BSA and Kac-BSA incubated with CobQ and BSA without acetylation against A. hydrophila database (unique matched peptides>1).

      Although AceE, a pyruvate dehydrogenase E1 component, theoretically possesses deacetylase activity, this possibility is low. First, in the SDS-PAGE gel of the purified recombinant protein, CobQ is the major band, with other proteins present at very low levels (less than 1/10 of CobQ). This suggests that significant deacetylation by contaminants is unlikely. Second, we purified His-tagged AhCobQ and GST-fused AhCobQ separately and tested their deacetylase activities. As shown in Figure S4 of the updated manuscript, both purified AhCobQ proteins exhibited deacetylase activity, while the negative control (purified GST protein only) did not, further supporting our conclusion that enzyme activity is not attributable to contaminating proteins (Figure S5).

      (2) Without the CobB and CobQ bands, it is impossible to say the function of CobQ is a new deacetylase. To avoid this confusion, it is easy to run a new gel and stain it with anti-His antibody to show these deacetylases.

      Thank you very much for your suggestion. We have performed the suggested experiment, as described in our response to comment (1).

      (3) The explanation for why the CobB/CobQ bands are not visible is not acceptable. Because the molecular weights of CobB and CobQ are smaller than that of BSA, it is impossible that these bands would be lost during membrane transfer.

      Thank you for your valuable feedback. I completely agree that the loss of CobB and CobQ proteins during membrane transfer is unlikely due to their smaller molecular weight compared to BSA. As shown in Author response image 7, the bands for CobB and CobQ are detectable in the SDS-PAGE gel but not visible on the Coomassie R-350 stained PVDF membrane.

      Several factors could contribute to this issue. One possibility is that the detection sensitivity of Coomassie R-350 may be lower than that of Coomassie R-250 used in the gel. Additionally, the Western blot results using an anti-His antibody further indicate low loading amounts of CobB and CobQ proteins on the PVDF membrane (Author response image 8). This suggests that the observed low levels may indeed be due to protein loss during the membrane transfer process, despite their relatively small size.

      Reviewer #3 (Recommendations for the authors):

      (1) I found Tables S1 and S2 in the revised manuscript. It is strange to me that the intensity of Kac-BSA+CobQ is zero, completely nothing. Typically, a portion of the acetylated peptide remains after the deacetylation reaction.

      Thank you for your observation. When we report an intensity of zero, it does not imply a complete absence of signal; rather, it indicates that the signal for the target peptide is below the detectable threshold. This is likely due to the minimum cut-off setting in the MaxQuant (MQ) software, which is determined by parameters like "peptide_mass_tolerance" (as discussed in MQ user groups online, though it may not be explicitly listed in the parameters file).

      In our study, we performed a deacetylase assay that demonstrated CobQ's rapid activity; for instance, it can deacetylate ICD-K388ac within just four minutes. This leads me to hypothesize that the CobQ + Kac-BSA sample may have undergone near-complete enzymatic hydrolysis during the reaction.

      Furthermore, Table S1 in manuscript presents only a selection of the mass spectrometry results to illustrate CobQ's activity. In addition to the 15 acetylated peptides shown, there are many more (27 peptides) that exhibit significantly reduced acetylation levels without reaching zero intensity. The overall acetylation level of BSA peptides incubated with CobQ is calculated to be only 0.13 times that of Kac-BSA (Diagnostic peak: yes, peptide score: >100, Localization probability: >0.95) (Author response image 9).

      Based on these findings, we believe our mass spectrometry results are reliable and effectively support our conclusions. Thank you for your understanding.

      Author response image 9.

      The intensities of all Kac peptides of Kac-BSA with or without AhCobQ incubation in LC MS/MS.

      (2) It would be better to provide the information about ArcA and ArcA-2 as mentioned in the authors' response. It would be helpful for readers to understand that they are different proteins.

      Thank you for your suggestion. In the A. hydrophila ATCC 7966 dataset, there are indeed two distinct proteins referred to as ArcA: ArcA-1, which functions as an aerobic respiration control protein, and ArcA-2, which acts as an arginine deiminase. Importantly, these two proteins do not share any sequence homology; they are only similarly named due to their acronyms. While we believe this distinction does not require extensive explanation in the current study, we appreciate your input. Additionally, in response to Reviewer 2’s feedback, we have decided to remove the Western blot result for ArcA-2 due to its poor quality in the updated manuscript.

      (3) Line 409-416. Despite my comment, the citation of related papers on ICD acetylation in E. coli is still missing.

      Thank you for your suggestion. It has been added and highlighted in red. (Venkat S, et al, 2018, 430(13): 1901-1911)

      (4) The image resolution of Figure 3C and 3D is still bad. I could not evaluate that Kac was exactly incorporated at the target site.

      Thank you for your feedback regarding the image resolution of Figures 3C and 3D. We have now displayed these figures with improved clarity, as you suggested.

      To further validate the reliability of our MS2 data, we employed Proteome Discoverer 2.4 (Thermo) to analyze the raw data and provide theoretical mass information. As shown in Author response images 10-13, the MS2 spectra and fragment match lists for both unmodified and acetylated peptides offer additional confirmation of the reliability of our mass spectrometry results.

      Author response image 10.

      MS2 spectrum of unmodified peptide using PD v2.4 software.

      Author response image 11.

      The theoretical mass of unmodified peptide by PD 2.4.

      Author response image 12.

      MS2 spectrum of acetylated peptide using PD v2.4 software.

      Author response image 13.

      The theoretical mass of acetylated peptide by PD 2.4.

      (5) Again, in Figure 8D, it should be shown the significance between ICD-Kac388 and ICD-Kac388+AhCobB to support the authors' conclusion that AhCobQ activates ICD by deacetylation at K388.

      Thank you for your suggestion. We have updated Figure 8D in the revised manuscript accordingly.

      (6) It was nice that the authors presented the mass spectrum data of ICD-K388 acetylation (Figure 2 in responding letter). However, the data did not convince me that K388 is acetylated. In the figure, two b-ion peaks are detected, 285.1557 and 386.2034, which may correspond to NK (theoretical mass, 260.15) and NKT (theoretical mass, 361.20) peptides, respectively. If K388 is acetylated, an increase in the mass of 42 should be observed, but the difference between the detected and theoretical mass is 25. I also could not understand what the peak of 126.0913 mass is, indicated with acK* in red.

      Thank you for your detailed observation. The data presented in the MS2 spectrum for ICD-K388 acetylation in Figure 2 of the previous response letter were generated using Proteome Discoverer 2.4 (PD, Thermo) to ensure accurate mass calculations. Similar to the results from MaxQuant, ICD-K388 was identified again (Author response image 14).

      Regarding the b-ion peaks you mentioned, the values 285.1557 and 386.2034 correspond to NK<sup>ac</sup> and NK<sup>ac</sup>T peptides, respectively. The theoretical masses for these peptides are as follows: NK<sup>ac</sup> (285.15 = 115.05020 + 128.095 + 42.01) and NK<sup>ac</sup>T (386.20 = NK<sup>ac</sup> + 101.04768). The differences between the theoretical and detected masses for the relevant b-ions (b2*-NK, b52+-NH3, and b3) are minimal, at 0.00 Da and 2.1 ppm, respectively, which is consistent with the incorporation of an NH3 group (Author response image 15).

      Author response image 14.

      The MS2 of ICD-K388 peptide by PD 2.4.

      Author response image 15.

      The theoretical mass of ICD-K388 peptide by PD 2.4.

      The peak at 126.0913 m/z, indicated as acK*, represents immonium ions of ε-N-acetyllysine, which are generated during the fragmentation of acetyllysine. This diagnostic ion is widely recognized as a marker for identifying acetylated peptides (Nakayasu et al., A method to determine lysine acetylation stoichiometries. International Journal of Proteomics. 2014;2014(1):730725; Trelle et al., Utility of immonium ions for assignment of ε-N-acetyllysine-containing peptides by tandem mass spectrometry. Analytical Chemistry. 2008;80(9):3422-30). Additionally, it is a default parameter in MaxQuant for identifying Kac peptides (Author response image 16).
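
      The mass arithmetic discussed in this exchange can be checked with a short script. The sketch below is illustrative only: it uses standard monoisotopic masses that are not quoted in the response letter, and the helper names (`b_ion_mz`, `neutral_peptide_mass`, `ppm_error`) are hypothetical. It suggests that the reviewer's values of 260.15 and 361.20 correspond to the neutral peptides (residues plus H2O), whereas a singly charged b-ion is the residue sum plus one proton, so the 25 Da gap equals 42.011 (acetyl) + 1.007 (proton) - 18.011 (water). It also assumes that the 126.0913 peak is the acetyllysine immonium ion (m/z 143.1179) after loss of NH3.

```python
# Standard monoisotopic masses in Da (assumed values, not from the response letter).
RESIDUE = {"N": 114.042927, "K": 128.094963, "T": 101.047679}
ACETYL = 42.010565   # C2H2O added to the lysine side chain
PROTON = 1.007276
WATER = 18.010565
CO = 27.994915
NH3 = 17.026549


def b_ion_mz(residues, n_acetyl=0):
    """Singly charged b-ion m/z: residue masses (+ acetyl groups) + one proton."""
    return sum(RESIDUE[r] for r in residues) + n_acetyl * ACETYL + PROTON


def neutral_peptide_mass(residues):
    """Neutral, unmodified peptide mass: residue masses + one water."""
    return sum(RESIDUE[r] for r in residues) + WATER


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


# Values the reviewer used as "theoretical" (neutral peptides).
nk_neutral = neutral_peptide_mass("NK")     # about 260.15
nkt_neutral = neutral_peptide_mass("NKT")   # about 361.20

# b-ions carrying one acetyl group on K388.
b2_ac = b_ion_mz("NK", n_acetyl=1)          # about 285.1557
b3_ac = b_ion_mz("NKT", n_acetyl=1)         # about 386.2034

# Acetyllysine immonium ion and its NH3-loss derivative.
immonium_ack = RESIDUE["K"] + ACETYL - CO + PROTON   # about 143.1179
immonium_ack_minus_nh3 = immonium_ack - NH3          # about 126.0913

if __name__ == "__main__":
    print(f"neutral NK  = {nk_neutral:.4f}, neutral NKT = {nkt_neutral:.4f}")
    print(f"b2 (Kac) = {b2_ac:.4f}, error vs 285.1557: {ppm_error(285.1557, b2_ac):.2f} ppm")
    print(f"b3 (Kac) = {b3_ac:.4f}, error vs 386.2034: {ppm_error(386.2034, b3_ac):.2f} ppm")
    print(f"acK immonium - NH3 = {immonium_ack_minus_nh3:.4f}")
```

      Under these assumptions, both observed b-ions agree with acetylated K388 to well below 1 ppm.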

      Based on these findings, we believe the evidence supporting ICD-K388 acetylation is robust.

      Author response image 16.

      The default parameter for Kac peptide identification in MaxQuant v1.6 software.

      (7) As mentioned by other reviewers, some of the figures and tables are incomplete. Some panels (ex. Figure 7C and 7D) and explanations (ex. What are lanes 1, 2, and 3 in Figure S3) are still missing.

      Thank you for your suggestion. It has been added.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      The authors of the study investigated the generalization capabilities of a deep learning brain age model across different age groups within the Singaporean population, encompassing both elderly individuals aged 55 to 88 years and children aged 4 to 11 years. The model, originally trained on a dataset primarily consisting of Caucasian adults, demonstrated a varying degree of adaptability across these age groups. For the elderly, the authors observed that the model could be applied with minimal modifications, whereas for children, significant fine-tuning was necessary to achieve accurate predictions. Through their analysis, the authors established a correlation between changes in the brain age gap and future executive function performance across both demographics. Additionally, they identified distinct neuroanatomical predictors for brain age in each group: lateral ventricles and frontal areas were key in elderly participants, while white matter and posterior brain regions played a crucial role in children. These findings underscore the authors' conclusion that brain age models hold the potential for generalization across diverse populations, further emphasizing the significance of brain age progression as an indicator of cognitive development and aging processes.

      Strengths: 

      (1) The study tackles a crucial research gap by exploring the adaptability of a brain age model across Asian demographics (Chinese, Malay, and Indian Singaporeans), enriching our knowledge of brain aging beyond Western populations.

      (2) It uncovers distinct anatomical predictors of brain aging between elderly and younger individuals, highlighting a significant finding in the understanding of age-related changes and ethnic differences.

      Weaknesses: 

      (1) Clarity in describing the fine-tuning process is essential for improved comprehension.

      (2) The analysis often limits its findings to p-values, omitting the effect sizes crucial for understanding the relationship with cognition.

      (3) Employing a predictive framework for cognition using brain age could offer more insight than mere statistical correlations.

      (4) Expanding the study's scope to evaluate the model's generalisability to unseen Caucasian samples is vital for establishing a comparative baseline.

      In summary, this paper underscores the critical need to include diverse ethnicities in model testing and estimation.

      Reviewer #1 (Recommendations for the authors): 

      Comment #1 - Fine-Tuning Process Clarity: Enhanced clarity in the fine-tuning process documentation is crucial for understanding how models are adapted to new datasets. This involves explaining parameter adjustments and choices, which facilitates replication and application in further research.

      We thank Reviewer #1 for this pertinent point. As advised, we have added a Supplementary Methods section with more details on the finetuning process. This includes the addition of Supplementary Figure S6, which shows examples of learning curves that helped inform our parameter adjustments and choices. We have added a reference to this section in Section 5.2 of the Methods.

      Comment #2 - Effect Sizes Reporting: The emphasis on reporting effect sizes alongside p-values addresses the need to quantify the strength of observed effects, particularly the relationship between brain age and cognition. Effect sizes provide insights into the practical significance of findings, crucial for clinical and practical applications.

      We thank Reviewer #1 for raising this important comment. As suggested, we have added standardized regression coefficients (as measures of effect size) alongside p-values in Figures 3 – 4, Supplementary Figures S2 – S4, Supplementary Tables S4 – S15, and the text of Sections 2.2 – 2.3 of the Results. We have additionally added 95% confidence intervals to Supplementary Tables S4 – S15.

      Comment #3 - Predictive Framework for Cognition: Adopting a predictive framework for cognition using brain age moves the research from mere correlation to actionable prediction, offering potentials based on predictive analytics.

      We thank Reviewer #1 for this insightful suggestion. Adopting a predictive framework would certainly be a useful and exciting avenue for the application of brain age. However, we note that the current study was primarily interested in the generalizability and interpretability of brain age in Asian children and older adults, as well as the added value of longitudinal measures of brain age. Thus, we believe our correlation-based analysis effectively demonstrated that deviations of brain age from chronological age were not merely random errors, but were informative of cognition. Furthermore, ongoing changes to these deviations were informative of future cognition. This helps to establish the brain age gap as a biomarker for aging, independent of chronological age. Additionally, we expect that the accurate prediction of future cognition would require a multitude of factors, in addition to T1-based brain age, as well as a large sample size to train and test. We believe such a dataset would be a promising avenue for future work, but it is outside the scope of the current study.

      Nonetheless, we were able to conduct a preliminary analysis using the current longitudinal data from SLABS and GUSTO. We extracted the same variables used in the original analyses of future cognition, corresponding to Figures 3D and 4B in the main text. To implement a predictive framework, we split the data into 10 stratified cross-validation folds. We also used kernel ridge regression (KRR) as the predictive model, as it has previously shown promising performance in behavioral and cognitive prediction [1]. We used a cosine kernel and nested 5-fold cross-validation to pick the optimal regularization strength (alpha).

      To investigate the added value of BAG and longitudinal changes in BAG, we compared 3 predictive models for each cognitive domain. The baseline model consisted of the demographic covariates used in the original analyses (i.e. chronological age, sex, and years of education for older adults). A second model combined demographics with baseline BAG, and the third model incorporated demographics, baseline BAG, and the (early) annual rate of change in BAG. Predictions were extracted from each test fold, and performance was measured by the correlation between test predictions and actual values of future cognition (or change in cognition). Models were statistically compared using the corrected resampled t-test for machine learning models [1], [2], [3]. The Benjamini-Hochberg procedure was used to correct for multiple comparisons.
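
      The model comparison step can be sketched as follows. This is a minimal illustration rather than the analysis code used for the response: it assumes the variance correction of Nadeau and Bengio, which is what "corrected resampled t-test" usually refers to, and a standard Benjamini-Hochberg adjustment. The function names and the example fold scores are hypothetical.

```python
import math
from statistics import mean, variance


def corrected_resampled_t(diffs, n_train, n_test):
    """t statistic for per-fold performance differences between two models.

    The naive variance term 1/k is inflated by n_test/n_train to account
    for the overlap between training sets across folds. The statistic is
    compared against a t distribution with k - 1 degrees of freedom.
    """
    k = len(diffs)
    s2 = variance(diffs)  # sample variance (ddof = 1)
    se = math.sqrt((1.0 / k + n_test / n_train) * s2)
    return mean(diffs) / se


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-fold gains in test correlation from adding change in BAG
# (10-fold cross-validation, so each test fold is 1/9 the size of its training set).
fold_gains = [0.02, 0.04, 0.03, 0.05, 0.01, 0.03, 0.04, 0.02, 0.03, 0.03]
t_corrected = corrected_resampled_t(fold_gains, n_train=90, n_test=10)
t_naive = mean(fold_gains) / math.sqrt(variance(fold_gains) / len(fold_gains))

# Hypothetical raw p-values, one per cognitive domain.
p_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])

if __name__ == "__main__":
    print(f"corrected t = {t_corrected:.3f}, naive paired t = {t_naive:.3f}")
    print("BH-adjusted p-values:", [round(p, 4) for p in p_adjusted])
```

      The corrected statistic is always smaller in magnitude than the naive paired t statistic, which is why it is preferred when cross-validation folds share training data.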

      Author response image 1 shows the prediction results for SLABS and GUSTO. Notably, adding the early change in BAG significantly improves the prediction of future change in executive function in SLABS. There is also an improvement in predicting the future inhibition score in GUSTO, but this is not significant after multiple comparison correction. Encouragingly, these are the same domains that showed significant associations with the change in BAG in the original analyses. This suggests that longitudinal brain age continues to contribute information, independent of baseline factors, in a predictive framework. We hope that future work can expand on this analysis with, for instance, larger sample sizes, more varied and informative predictors, and state-of-the-art prediction methods, in order to establish actionable predictions of future cognition.

      Author response image 1.

      Predictive framework for cognition similarly suggests value of longitudinal change in BAG. Prediction performance (Pearson's correlation) of KRR across future cognitive outcomes. Each boxplot shows the distribution of performance over cross-validation folds. Model performances are statistically compared for each outcome. Significant outcomes from the original analyses are bolded. (A) Results for SLABS using the early change in BAG and future change in cognitive scores (non-overlapping). Early change in BAG again shows benefit for predicting future change in executive function. (B) Results for GUSTO using the early change in BAG (from 4.5-7.5 years old) and future cognitive score (at 8.5 years old). Early change in BAG again shows benefit for predicting future inhibition, but it is not significant after multiple comparison correction. Key - **: p < 0.01; * (ns): p < 0.05 but p<sub>corr</sub> > 0.05 after multiple comparison correction; ns: p > 0.05

      Comment #4 - Generalizability to Unseen Caucasian Samples: Evaluating the model's performance on unseen (longitudinal) Caucasian samples is important for benchmarking.

      We thank Reviewer #1 for this important comment. We agree that generalizability should be benchmarked against performance on unseen Caucasian samples. The SFCN model paper [4] reported an out-of-sample test on unseen Caucasian samples aged 13 to 95, with a high correlation (r = 0.975) and low MAE (MAE = 3.90) in this age range. This favorable generalization performance was verified in adults by independent evaluations [5], [6]. This is also in line with what we observed in Asian older adults, taking into account the different age ranges and sample sizes involved [7].

      However, this also highlights the difficulty in evaluating on younger ages in the range of GUSTO (4.5 – 10.5 years old). Most accessible developmental datasets (e.g. HBN, PING) were already included in model training, preventing an unbiased evaluation on these samples. Datasets such as PNC and ABCD were not included in training, but they primarily consist of an older age range than GUSTO. Holm et al. [8] previously tested the SFCN model in ABCD and reported satisfactory performance (low MAE) from 9 – 13 years old. However, to the best of our knowledge, there are no reported generalization results (for any ethnicity) from 4.5 – 7.5 years old, which is where we found the most performance degradation in GUSTO. We are also not aware of any datasets in this age range we could access to test this, unfortunately, but it would be an important area for future work.

      While benchmarking in Caucasian children is difficult, we were able to conduct a preliminary analysis with older adults using the ADNI dataset (which was not included in the model training [4]). We selected a longitudinal subset with cognitive data available and no dementia at baseline (N = 137). We used composite cognitive scores covering memory, executive function, language, and visuospatial function [9], [10], [11]. We followed the same methodology (e.g. preprocessing, finetuning, statistical analysis) as the main analyses on EDIS, SLABS, and GUSTO. To maximize the data available, we tested associations with future cognition (taken at the last available time point), similar to GUSTO. We again included chronological age, sex, and years of education as demographic covariates.

      Author response image 2 shows the brain age predictions for the pretrained and finetuned models on ADNI. Similar to Singaporean older adults, the pretrained model performs well, producing a high correlation (r = 0.8053; compared to r = 0.7389 for EDIS and r = 0.8136 for SLABS) and somewhat low MAE (MAE = 4.9735; compared to MAE = 3.9895 for EDIS and MAE = 3.4668 for SLABS). After finetuning, the MAE improves (MAE = 3.6837; compared to MAE = 3.3232 for EDIS and MAE = 3.2653 for SLABS) with a similar correlation (r = 0.7854; compared to r = 0.7445 for EDIS and r = 0.8138 for SLABS). This suggests that generalization to unseen Singaporean older adults is in line with the generalization to unseen Caucasian older adults.

      Author response image 2. 

      Brain age predictions on unseen Caucasian sample of older adults. Predictions from the A) pretrained and B) finetuned brain age models on ADNI participants. Compare to Figure 2 of the main text.

      For the associations with future cognition, we again find that baseline BAG does not associate with future cognition (Author response tables 1 and 2). However, encouragingly, we find that the early annual rate of change in BAG does associate with future memory, which is significant after multiple comparison correction for the finetuned model (Author response tables 3 and 4). This suggests a degree of replicability to the original results, but interestingly, in a different domain (memory vs. executive function). In contrast to SLABS, which consists of healthy older adults recruited from the community, ADNI consists of participants at risk of AD recruited from memory clinics. Thus, this difference in domain could be due to factors such as a stronger signal for memory in the testing battery or greater variations in memory function and decline. However, it could also reflect other population differences between ADNI and SLABS. This is an intriguing area for future study, ideally with larger sample sizes and more diverse populations included.

      Author response table 1.

      Linear relationship between pretrained baseline BAG and future cognitive score in ADNI. Compare to Supplementary Tables S4 – S15 of the original text.

      Author response table 2. 

      Linear relationship between finetuned baseline BAG and future cognitive score in ADNI. Compare to Supplementary Tables S4 – S15 of the original text.

      Author response table 3.

      Linear relationship between pretrained change in BAG and future cognitive score in ADNI. Compare to Supplementary Tables S4 – S15 of the original text.

      Author response table 4. 

      Linear relationship between finetuned change in BAG and future cognitive score in ADNI. Compare to Supplementary Tables S4 – S15 of the original text.

      References

      (1) L. Q. R. Ooi et al., “Comparison of individualized behavioral predictions across anatomical, diffusion and functional connectivity MRI,” NeuroImage, vol. 263, p. 119636, Nov. 2022, doi: 10.1016/j.neuroimage.2022.119636.

      (2) C. Nadeau and Y. Bengio, “Inference for the Generalization Error,” Mach. Learn., vol. 52, no. 3, pp. 239–281, Sep. 2003, doi: 10.1023/A:1024068626366.

      (3) R. R. Bouckaert and E. Frank, “Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms,” in Advances in Knowledge Discovery and Data Mining, H. Dai, R. Srikant, and C. Zhang, Eds., Berlin, Heidelberg: Springer, 2004, pp. 3–12. doi: 10.1007/978-3-540-24775-3_3.

      (4) E. H. Leonardsen et al., “Deep neural networks learn general and clinically relevant representations of the ageing brain,” NeuroImage, vol. 256, p. 119210, Aug. 2022, doi: 10.1016/j.neuroimage.2022.119210.

      (5) R. P. Dörfel et al., “Prediction of brain age using structural magnetic resonance imaging: A comparison of accuracy and test-retest reliability of publicly available software packages,” Neuroscience, preprint, Jan. 2023. doi: 10.1101/2023.01.26.525514.

      (6) J. L. Hanson, D. J. Adkins, E. Bacas, and P. Zhou, “Examining the reliability of brain age algorithms under varying degrees of participant motion,” Brain Inform., vol. 11, no. 1, p. 9, Apr. 2024, doi: 10.1186/s40708-024-00223-0.

      (7) A.-M. G. de Lange et al., “Mind the gap: Performance metric evaluation in brain-age prediction,” Hum. Brain Mapp., vol. 43, no. 10, pp. 3113–3129, Jul. 2022, doi: 10.1002/hbm.25837.

      (8) M. C. Holm et al., “Linking brain maturation and puberty during early adolescence using longitudinal brain age prediction in the ABCD cohort,” Dev. Cogn. Neurosci., vol. 60, p. 101220, Feb. 2023, doi: 10.1016/j.dcn.2023.101220.

      (9) P. K. Crane et al., “Development and assessment of a composite score for memory in the Alzheimer’s Disease Neuroimaging Initiative (ADNI),” Brain Imaging Behav., vol. 6, no. 4, pp. 502–516, Dec. 2012, doi: 10.1007/s11682-012-9186-z.

      (10) L. E. Gibbons et al., “A composite score for executive functioning, validated in Alzheimer’s Disease Neuroimaging Initiative (ADNI) participants with baseline mild cognitive impairment,” Brain Imaging Behav., vol. 6, no. 4, pp. 517–527, Dec. 2012, doi: 10.1007/s11682-012-9176-1.

      (11) S.-E. Choi et al., “Development and validation of language and visuospatial composite scores in ADNI,” Alzheimers Dement. Transl. Res. Clin. Interv., vol. 6, no. 1, p. e12072, 2020, doi: 10.1002/trc2.12072.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Dong et al. have studied the impact of the small Ras-like GTPase Rab10 on the exocytosis of dense core vesicles (DCVs), which are important mediators of neuropeptide signaling in the brain. They use optical imaging to show that lentiviral depletion of Rab10 in cultured mouse hippocampal neurons hampers DCV exocytosis independently of the established defects in neurite outgrowth. They further demonstrate that such defects are paralleled by changes in ER morphology and defective ER-based calcium buffering as well as reduced ribosomal protein expression in Rab10-depleted neurons. Re-expression of Rab10 or supplementation of exogenous L-leucine to restore defective neuronal protein synthesis rescues impaired DCV secretion. Based on these results they propose that Rab10 regulates DCV release by maintaining ER calcium homeostasis and neuronal protein synthesis.

      Strengths:

      This work provides interesting and potentially important new insights into the connection between ER function and the regulated secretion of neuropeptides via DCVs. The authors combine advanced optical imaging with light and electron microscopy, biochemistry, and proteomics approaches to thoroughly assess the effects of Rab10 knockdown at the cellular level in primary neurons. The proteomic dataset provided may be valuable in facilitating future studies regarding Rab10 function. This work will thus be of interest to neuroscientists and cell biologists.

      We appreciate the positive evaluation of our manuscript.

      Weaknesses:

      While the main conclusions of this study are comparably well supported by the data, I see three major weaknesses:

      (1) For some of the data the statistical basis for analysis remains unclear. I.e. is the statistical assessment based on N= number of experiments or n = number of synapses, images, fields of view etc.? As the latter cannot be considered independent biological replicates, they should not form the basis of statistical testing.

      This is an important point and we agree that multiple samples from the same biological replicate are not independent observations. We reanalyzed all nested data using a linear mixed model and indicated this in the Methods section and the relevant figure legends (Brunner et al., 2022). In brief, biological replicates (individual neuronal cultures) were used as a linear predictor. Outliers were identified and excluded using the ROUT method in GraphPad. A fixed linear regression model was then fitted to the data using the lm() function in R. A one-way ANOVA (analysis of variance) was used to assess whether including the experimental group as a second linear predictor (formula = y ~ Group + Culture) statistically improved the fit of a model without group information (formula = y ~ 1 + Culture). Post-hoc analysis was performed using the emmeans() function with Tukey’s adjustment when more than two experimental groups were present. Importantly, our conclusions remain unchanged.
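
      The nested model comparison described above can be illustrated with a short sketch. It is written in Python rather than R and is not the authors' analysis script: the helper names (fit_rss, dummies, nested_f_test) and the example measurements are hypothetical. It assumes treatment-coded dummy variables for culture and group and an ordinary least-squares fit, and it compares y ~ 1 + Culture against y ~ Group + Culture with an F statistic.

```python
def fit_rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    p = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    c = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for j in range(col, p):
                A[r][j] -= f * A[col][j]
            c[r] -= f * c[col]
    # Back substitution
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (c[i] - tail) / A[i][i]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(X, y))


def dummies(labels):
    """Treatment-coded dummy columns; the first sorted level is the reference."""
    levels = sorted(set(labels))[1:]
    return [[1.0 if lab == lev else 0.0 for lev in levels] for lab in labels]


def nested_f_test(y, group, culture):
    """F statistic for adding Group to a model that already contains Culture."""
    cult, grp = dummies(culture), dummies(group)
    X0 = [[1.0] + c for c in cult]                    # y ~ 1 + Culture
    X1 = [[1.0] + c + g for c, g in zip(cult, grp)]   # y ~ Group + Culture
    rss0, rss1 = fit_rss(X0, y), fit_rss(X1, y)
    df_num = len(X1[0]) - len(X0[0])
    df_den = len(y) - len(X1[0])
    f_stat = ((rss0 - rss1) / df_num) / (rss1 / df_den)
    return f_stat, df_num, df_den


# Hypothetical measurements: two cultures, two groups, two cells each.
y = [1.0, 1.2, 0.5, 0.7, 2.0, 2.2, 1.5, 1.7]
group = ["control", "control", "KD", "KD"] * 2
culture = ["A"] * 4 + ["B"] * 4
print(nested_f_test(y, group, culture))
```

      Because culture enters as a predictor, a consistent shift between cultures does not inflate the group effect, and the denominator degrees of freedom reflect the number of fitted parameters. The p-value would be obtained from an F distribution with the returned degrees of freedom, which the standard library does not provide.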

      (2) As it stands the paper reports on three partially independent phenotypic observations, the causal interrelationship of which remains unclear. Based on prior studies (e.g. Mercan et al 2013 Mol Cell Biol; Graves et al JBC 1997) it is conceivable that defective ER-based calcium signaling and the observed reduction in protein synthesis are causally related. For example, ER calcium release is known to promote pS6K1 phosphorylation, a major upstream regulator of protein synthesis and ribosome biogenesis. Conversely, L-leucine supplementation is known to trigger calcium release from ER stores via IP3Rs. Given the reported impact of Rab10 on axonal transport of autophagosomes and, possibly, lysosomes via JIP3/4 or other mediators (see e.g. Cason and Holzbaur JCB 2023) and the fact that mTORC1, the alleged target of leucine supplementation, is located on lysosomes, which in turn form membrane contacts with the ER, it seems worth analyzing whether the various phenotypes observed are linked at the level of mTORC1 signaling.

      This is a great suggestion that could indeed further clarify the potential interplay between ER-based Ca2+ signaling and protein synthesis. To address this, we assessed the phosphorylation level of pS6K1 in control and Rab10 knockdown (KD) neurons with or without leucine treatment. These data are included in the new Figure 8—figure supplement 1 in the revised manuscript. Our results indicate that pS6K1 phosphorylation was not upregulated in Rab10 KD neurons, suggesting that the level of mTORC1 signaling does not differ between wild-type and KD neurons. Furthermore, leucine treatment increased the pS6K1 phosphorylation level, as expected, but this effect was similar in both groups. Hence, we conclude that differences in mTORC1 signaling induced by Rab10 loss are not a major factor in the observed impairment in protein synthesis.

      Author response image 1.

      Rab10 depletion does not upregulate the mTORC1 pathway. (A) Typical immunoblot showing pS6K1 levels in each condition. (B) Quantification of relative pS6K1 levels in each condition. All data are plotted as mean±s.e.m. (C) Control, Control + Leu: N = 2, n = 2, Rab10 KD, Rab10 KD + Leu: N = 2, n = 4.

      (3) The claimed lack of effect of Rab10 depletion on SV exocytosis is solely based on very strong train stimulation with 200 APs, a condition not very well suited to analyze defects in SV fusion. The conclusion that Rab10 loss does not impact SV fusion thus seems premature.

      We agree that 200 APs stimulation might be too strong to detect specific effects on evoked synaptic vesicle release, although this stimulation pattern is an established pattern in hundreds of studies (Emperador-Melero et al., 2018; Granseth et al., 2006; Ivanova et al., 2021; Kwon and Chapman, 2011; Reshetniak et al., 2020). We have toned down our conclusions and clarified in the revised manuscript that Rab10 is dispensable for SV exocytosis evoked by intense stimulations. The corresponding statements in the text have been modified accordingly (p. 5, l. 98, 124) and in figure legend (p. 17, 490).

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors assess the function of Rab10 in dense core vesicle (DCV) exocytosis using RNAi and cultured neurons. The authors provide evidence that their knockdown (KD) is effective and that DCV release is compromised. They also perform proteomic analysis to identify potential pathways that are affected upon KD of Rab10 that may be involved in DCV release. Upon focusing on ER morphology and protein synthesis, the authors conclude that defects in protein synthesis and ER Ca2+ homeostasis contribute to the DCV release defect upon Rab10 KD. The authors claim that Rab10 is not involved in synaptic vesicle (SV) release and membrane homeostasis in mature neurons.

      Strengths:

      The data related to Rab10's role in DCV release seem to be strong and carried out with rigor. While the paper lacks in vivo evidence that this gene is indeed involved in DCV release in a living mammalian organism, I feel the cellular studies have value. The identification of an ER defect upon Rab10 manipulation is not truly novel but it is a good confirmation of studies performed in other systems. The finding that the DCV release defect and protein synthesis defect seen upon Rab10 KD can be significantly suppressed by leucine supplementation is also a strength of this work.

      We appreciate the positive evaluation of our manuscript.

      Weaknesses:

      The data showing Rab10 is NOT involved in SV exocytosis seem a bit weak to me. Since the proteomic analysis revealed so many proteins involved in SV exo/endocytosis to be affected upon Rab10 KD, it is a bit strange that they didn't see an obvious defect. Perhaps this could have been because of the protocol that the authors used to trigger SV release (I am not an E-phys expert but perhaps this could have been a 'sledge-hammer' manipulation that may mask any subtle defects)? Perhaps the authors can claim that DCV is more sensitive to Rab10 KD than SV, but I am not sure whether the authors should make a strong claim about Rab10 not being important for SV exocytosis.

      We agree that 200 APs stimulation might be too strong to see specific effects on evoked synaptic vesicle release, although this stimulation pattern is an established pattern in hundreds of studies. We have toned down our conclusions and clarified in the revised manuscript that Rab10 is dispensable for SV exocytosis evoked by intense stimulations. The corresponding statements in the text have been modified accordingly (p. 5, l. 98, 124) and in figure legend (p. 17, 490).

      Also, the authors mention "Rab10 does not regulate membrane homeostasis in mature neurons" but I feel this is an overstatement. Since the authors only performed KD experiments, not knock-out (KO) experiments, I believe they should not make any conclusion about it not being required, especially since there is some level of Rab10 present in their cells. If they want to make these claims, I believe the authors will need to perform conditional KO experiments, which are not performed in this study.

      This is a valid point. We have changed the statement to “membrane homeostasis in mature neurons was unaffected by Rab10 knockdown” (p. 13, l.376-377).

      Finally, the authors show that protein synthesis and ER Ca2+ defects seem to contribute to the defect but they do not discuss the relationship between the two defects. If the authors treat the Rab10 KD cells with both ionomycin and Leucine, do they get a full rescue? Or is one defect upstream of the other (e.g. can they see rescue of ER morphology upon Leucine treatment)? While this is not critical for the conclusions of the paper, several additional experiments could be performed to clarify their model, especially considering there is no clear model that explains how Rab10, protein synthesis, ER homeostasis, and Ca2+ are related to DCV (but not SV) exocytosis.

      This is an important point and a great suggestion. We have now tested the rescue effects of leucine treatment on ER morphology, as suggested. These data are included in the new Figure 8—figure supplement 2 in the revised manuscript. Our results indicate that the same dose of leucine that rescues DCV fusion and protein translation failed to rescue ER morphology. Hence, the defects in ER morphology appear to be independent of the impaired protein translation.

      Author response image 2.

      Leucine supplementation does not rescue ER morphological deficiency in Rab10 KD neurons. (A) Typical examples showing the KDEL signals in each condition. (B) Quantification of RTN4 intensity in MAP2-positive dendrites. (C) The ratio of neuritic to somatic RTN4 intensity (N/S). All data are plotted as mean±s.e.m. (B, C) Control: N = 3, n = 10; Rab10 KD: N = 3, n = 11; Rab10 KD + Leu: N = 3, n = 11. A one-way ANOVA tested the significance of adding experimental group as a predictor. **** = p<0.0001, ns = not significant.

      Reviewer #3 (Public Review):

      In the submitted manuscript, Dong and colleagues set out to dissect the role of the Rab10 small GTPase in the intracellular trafficking and exocytosis of dense core vesicles (DCVs). While the authors have already shown that Rab3 plays a central role in the exocytosis of DCVs in mammalian neurons, the roles of several other Rab members have been identified genetically, but their precise mechanism of action in mammalian neurons remains unclear. In this study, the authors use a carefully designed and thoroughly executed series of experiments, including live-cell imaging, functional calcium imaging, proteomics, and electron microscopy, to identify that impaired DCV secretion upon Rab10 depletion in adult neurons is primarily a result of dysregulated protein synthesis and, to a lesser extent, disrupted intracellular calcium buffering. Given that the full deletion of Rab10 has a deleterious effect on neurons and that Rab10 has a major role in axonal development, the authors cautiously employed the knock-down strategy from 7 DIV, to focus on the functional impact of Rab10 in mature neurons. The experiments in this study were meticulously conducted, incorporating essential controls and thoughtful considerations, ensuring rigorous and comprehensive results.

      We are grateful for the positive evaluation of our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The work by Dong et al provides interesting and potentially important new insights into the connection between ER function and the regulated secretion of neuropeptides via DCVs. I suggest that the authors address the following points experimentally to increase the impact of this potentially important study.

      Major points:

      (1) As alluded to above, for some of the data the statistical basis for analysis remains unclear (examples are Figures 1C-F, J,K; Figure 2 1B-D,I-K; Figure 2 - Supplement 1D-F; Figure 2 - Supplement 2J,K, etc). I.e. is the statistical assessment based on N = number of experiments or n = number of synapses, images, fields of view etc.? As the latter cannot be considered independent biological replicates, they should not form the basis of statistical testing. The Ms also lacks a dedicated paragraph on statistics in the methods section.

      See reply to reviewer 1 above. We fully agree and solved this point.

      (2) A main weakness of the paper is the missing connection between neuronal protein synthesis and the observed structural and signaling defects at the level of the ER. I suggest that the authors analyze mTORC1 signaling in Rab10-depleted neurons and under rescue conditions (+Leu or re-expression of Rab10), as ribosome biogenesis is a major downstream target of mTORC1 and mTORC1 activity is related to lysosome position, which may be affected upon Rab10 loss, either directly or via effects on the ER that forms tight contacts with lysosomes.

      See reply to reviewer 1 above. We agreed and followed up experimentally.

      (3) Related to the above: Does overexpression of SERCA2 restore normal DCV exocytosis in Rab10-depleted neurons? This would help to distinguish whether calcium storage and release at the level of the ER indeed contribute to the exocytosis defect.

      This is an important point and a great suggestion. We have now tested the rescue effects of overexpression of SERCA2 on DCV fusion. These data are included in the new Figure 8—figure supplement 3 in the revised manuscript. SERCA2 OE failed to rescue the DCV fusion defects in Rab10 KD neurons.

      Author response image 3.

      Overexpression of SERCA2 does not rescue DCV fusion deficits in Rab10 KD neurons. (A) Typical examples showing the SERCA2 signals in each condition. (B) Cumulative plot of DCV fusion events per cell. (C) Summary graph of DCV fusion events per cell. (D) Total number of DCVs (total pool) per neuron, measured as the number of NPY-pHluorin puncta upon NH4Cl perfusion. (E) Fraction of NPY-pHluorin-labeled DCVs fusing during stimulation. All data are plotted as mean±s.e.m. (C-E) Control: N = 2, n = 10; Rab10 KD: N = 2, n = 13; SERCA2 OE: N = 2, n = 15. A one-way ANOVA tested the significance of adding experimental group as a predictor. *** = p<0.001, ** = p<0.01, ns = not significant.

      (4) The claimed lack of effect of Rab10 depletion on SV exocytosis is solely based on very strong train stimulation with 200 APs, a condition not very well suited to analyze defects in SV fusion. The conclusion that Rab10 loss does not impact SV fusion thus seems premature. The authors should conduct additional experiments under conditions of single or few APs (e.g. 4 or 10 APs) to really assess whether or not Rab10 depletion alters SV exocytosis at the level of pHluorin analysis in cultured neurons.

      See reply to reviewer 2 above. We agreed and made textual adjustments to address this point.

      (5) Related to the above: I am puzzled by the data shown in Figure 1H-J: From the pHluorin traces shown I would estimate a tau value of about 20-30 s (e.g. decay to 1/e = 37% of the peak value). The bar graph in Figure 1K claims 3-4 s, clearly clashing with the data shown. Were these experiments conducted at RT (where expected tau values are in the range of 30 s) or at 37 °C (one would expect taus of around 10 s in this case for Syp-pH)? I ask the authors to carefully check and possibly re-analyze their datasets.

      This is indeed a mistake. We thank the reviewer for flagging this miscalculation. Our original MATLAB script used for calculating the tau value contained an error and the datasets were normalized twice by mistake. We have now reanalyzed the data, and the corresponding figures and text have been updated. Our conclusion that Rab10 KD does not affect SV endocytosis remains unchanged: the tau values for control (28.5 s) and Rab10 KD (32.8 s) were affected by the same systematic error and were not, and are not, significantly different.
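
      For illustration, a decay time constant of the kind discussed here can be estimated as sketched below. This is not the authors' MATLAB script and does not reproduce its error; the function name and the synthetic trace are hypothetical. The sketch assumes a baseline-subtracted trace that starts at the fluorescence peak and decays mono-exponentially, normalizes it to the peak exactly once, and fits a straight line to the log-transformed signal, whose slope equals -1/tau.

```python
import math


def estimate_tau(times, signal):
    """Estimate a mono-exponential decay time constant (same units as times).

    times, signal: samples of a baseline-subtracted trace, starting at the peak.
    """
    peak = signal[0]
    # Normalize to the peak once; skip non-positive samples before the log.
    points = [(t, math.log(s / peak)) for t, s in zip(times, signal) if s > 0]
    if len(points) < 2:
        raise ValueError("need at least two positive samples")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_l = sum(l for _, l in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in points)
    slope = sxy / sxx  # equals -1 / tau for an exponential decay
    return -1.0 / slope


# Synthetic trace with a known time constant of 30 s, sampled every 5 s.
times = [5.0 * i for i in range(13)]
trace = [math.exp(-t / 30.0) for t in times]
print(estimate_tau(times, trace))
```

      At t = tau the normalized trace has fallen to 1/e, roughly 37% of its peak, which is the visual check the reviewer applied to the plotted traces.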

      (6) How many times was the proteomics experiment shown in Figure 3 conducted? I noticed that the data in panel H missed statistical analysis and error bars. Given the typical variation in these experiments, I suggest to only include data for proteins identified in at least 3 out of 4 experimental replicates.

      We agree that this information has not been clear. We have now explained replication in the Methods section (p. 42, l. 879-885). In brief, the proteomics experiment presented in Fig 3 was conducted with two independent cultures (‘biological replicates’), hence, formally only two independent observations. For each biological replicate, we performed four technical replicates. For our analysis, we only included peptides that were consistently detected across all samples (not only three as this reviewer suggests). Proteins in Panel H are ER-related proteins that are significantly different from control neurons with an adjusted FDR ≤ 0.01 and Log2 fold change ≥ 0.56. The primary purpose of our proteomics experiments was to generate hypotheses and guide subsequent experiments and the main findings were corroborated by other experiments presented in the manuscript.

      Minor:

      (7) Figure 2 - supplement 3 and Figure 4 - supplement 3 are only mentioned in the discussion. The authors should consider referring to these data in the results section.

      This is a valid point. We have now added a new statement “Moreover, only 10% of DCVs co-transport with Rab10” in the Results (p. 6-7, l. 162-164).

      (8) Where is the pHluorin data shown in Figure 1 bleach-corrected? If so, this should be stated somewhere in the Ms. Moreover, the timing of the NH4Cl pulse should be indicated in the scheme in panel I.

      We thank the reviewer for pointing out these omissions. We have now included information about the timing of the NH4Cl pulse in panel I. We did not bleach-correct the pHluorin data shown in Figure 1. It has been shown that pHluorin is very stable, with a bleaching rate of 0.06% per second in the alkaline state and 0.0024% per second in the quenched state (Balaji and Ryan, 2007). Indeed, we did not observe obvious photobleaching in the first 30 s of our imaging, as indicated by the average trace of pHluorin intensity in panel I.
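
      As a rough check of what these bleaching rates imply over the first 30 s, the arithmetic can be sketched as below. The function name is hypothetical, and the calculation assumes that the quoted per-second rates compound over time and that illumination is continuous, which may not match the actual acquisition settings.

```python
def fraction_bleached(rate_percent_per_s, seconds):
    """Cumulative fraction of signal lost, assuming a compounding per-second rate."""
    rate = rate_percent_per_s / 100.0
    return 1.0 - (1.0 - rate) ** seconds


alkaline_loss = fraction_bleached(0.06, 30)    # unquenched (alkaline) pHluorin
quenched_loss = fraction_bleached(0.0024, 30)  # quenched pHluorin
print(f"alkaline: {alkaline_loss:.2%}, quenched: {quenched_loss:.3%}")
```

      Under these assumptions the expected loss over 30 s is about 1.8% in the alkaline state and well below 0.1% in the quenched state.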

      (9) Page 3/ lines 59-60: "...strongest inhibition of neuropeptide accumulation...". What is probably meant is "...strongest inhibition of neuropeptide release".

      We agree this statement is unclear. Sasidharan et al. used a coelomocyte uptake assay as an indirect readout for DCV release. The ‘strongest inhibition of neuropeptide accumulation’ in coelomocytes in the Rab10 mutant indicates DCV fusion deficits. We have now replaced the text with “Rab10 deficiency produces the strongest inhibition of neuropeptide release in C. elegans” to make it clearer.

      Reviewer #3 (Recommendations For The Authors):

      I strongly recommend the publishing of this study as a VOR with minor comments directed to the authors.

      (1) In Figure 4, the authors should include examples of tubular ER at the synapse, especially as this is an interesting point discussed in ln 226-229. Are there noticeable changes in the ER-mitochondria contacts at the synaptic boutons?

      We agree that examples of tubular ER at the synapse would improve the manuscript. We have now replaced the Figure 4A with such examples. We found it challenging to quantify ER-mitochondria contacts based on the electron microscopy (EM) images we currently have. The ER-mitochondria contact sites are quite rare in the cross-sections of our samples, making it difficult to perform a reliable quantitative analysis.

      (2) The limited impairment of calcium-ion homeostasis in Rab10 KD neurons is very interesting. Would the overexpression of Rab10T23N mimic the effect of a KD scenario? Is there a separation of function for Rab10 in calcium homeostasis vs. the regulation of protein synthesis?

      This is an interesting possibility. We tested this and expressed Rab10T23N in a new series of experiments. These data are presented as a new Figure 5 in the revised manuscript (p. 29). We observed that Ca2+ refilling after caffeine treatment was delayed to a similar extent in Rab10T23N-expressing and Rab10 KD neurons. While impaired Ca2+ homeostasis may affect protein synthesis through ER stress or mTORC1 activation, our findings indicate otherwise in Rab10 KD neurons. First, ATF4 levels, a marker of ER stress, were unaffected in Rab10 KD neurons. This indicates that any ER stress present is minimal or insufficient to significantly impact protein synthesis through this pathway. Second, we did not observe significant changes in mTORC1 activation in Rab10 KD neurons as indicated by a normal pS6K1 phosphorylation (see above). Based on these observations, we conclude that Rab10's roles in calcium homeostasis and protein synthesis are most likely separate.

      (3) The authors indicate that the internal release of calcium ions from the ER has no effect on DCV trafficking and fusion without showing the data. It is important to include this data as the major impact of the study is the dissecting of the calcium effects in mammalian neurons from the previous studies in invertebrates.

      We agree this is an important aspect of our reasoning. We are submitting the related manuscript on internal calcium stores to bioRxiv. The link will be added to the consolidated version of our manuscript.

      (4) The distinction between Rab3 and Rab10 co-trafficking on DCVs should be reported in the Results (currently, Figure 2 - supplement 3 is only mentioned in the Discussion) as it helps to understand the effects on DCV fusion.

      We agree. We have now added a new statement “Moreover, only 10% of DCVs co-transport with Rab10” in the Results (p. 6, l. 162-163).

      Reference:

      Balaji, J., Ryan, T.A., 2007. Single-vesicle imaging reveals that synaptic vesicle exocytosis and endocytosis are coupled by a single stochastic mode. Proceedings of the National Academy of Sciences 104, 20576–20581. https://doi.org/10.1073/pnas.0707574105

      Brunner, J.W., Lammertse, H.C.A., Berkel, A.A. van, Koopmans, F., Li, K.W., Smit, A.B., Toonen, R.F., Verhage, M., Sluis, S. van der, 2022. Power and optimal study design in iPSC-based brain disease modelling. Molecular Psychiatry 28, 1545. https://doi.org/10.1038/s41380-022-01866-3

      Emperador-Melero, J., Huson, V., van Weering, J., Bollmann, C., Fischer von Mollard, G., Toonen, R.F., Verhage, M., 2018. Vti1a/b regulate synaptic vesicle and dense core vesicle secretion via protein sorting at the Golgi. Nat Commun 9, 3421. https://doi.org/10.1038/s41467-018-05699-z

      Granseth, B., Odermatt, B., Royle, S.J., Lagnado, L., 2006. Clathrin-Mediated Endocytosis Is the Dominant Mechanism of Vesicle Retrieval at Hippocampal Synapses. Neuron 51, 773–786. https://doi.org/10.1016/j.neuron.2006.08.029

      Ivanova, D., Dobson, K.L., Gajbhiye, A., Davenport, E.C., Hacker, D., Ultanir, S.K., Trost, M., Cousin, M.A., 2021. Control of synaptic vesicle release probability via VAMP4 targeting to endolysosomes. Science Advances 7, eabf3873. https://doi.org/10.1126/sciadv.abf3873

      Kwon, S.E., Chapman, E.R., 2011. Synaptophysin Regulates the Kinetics of Synaptic Vesicle Endocytosis in Central Neurons. Neuron 70, 847–854. https://doi.org/10.1016/j.neuron.2011.04.001

      Reshetniak, S., Fernández-Busnadiego, R., Müller, M., Rizzoli, S.O., Tetzlaff, C., 2020. Quantitative Synaptic Biology: A Perspective on Techniques, Numbers and Expectations. International Journal of Molecular Sciences 21, 7298. https://doi.org/10.3390/ijms21197298

    1. Author response:

      To Reviewer #1:

      Thank you for your thorough review and comments on our work, in which you noted that “the role of neuritin in T cell biology studied here is new and interesting.” We have grouped your comments into two categories: (i) biology and investigation approach, and (ii) experimental rigor and data presentation.

      Biology and Investigation approach comments:

      (1) Questions regarding the T cell anergy model:

      Major point “(4) Figure 1E-H. The authors assume that this immunization protocol induces anergic cells, but they provide no experimental evidence for this. It would be useful to show that T cells are indeed anergic in this model, especially those that are OVA-specific. The lack of IL-2 production by Cltr cells could be explained by the presence of fewer OVA-specific cells, rather than by an anergic status.”

      T cell anergy is a well-established concept first described by Schwartz’s group. It refers to the hyporesponsive functional state of antigen-experienced CD4 T cells (Chappert and Schwartz, 2010; Fathman and Lineberry, 2007; Jenkins and Schwartz, 1987; Quill and Schwartz, 1987). Anergic T cells are characterized by their inability to expand and to produce IL2 upon subsequent antigen re-challenge. In this paper, we adopted the existing in vivo T cell anergy induction model used by Mueller’s group (Vanasek et al., 2006). Specifically, Thy1.1+ Ctrl or Nrn1-/- TCR transgenic OTII cells were co-transferred with congenically marked Thy1.2+ WT polyclonal Treg cells into TCR-/- mice. After anergy induction, the congenically marked TCR transgenic T cells were recovered by sorting on the Thy1.1+ congenic marker and subsequently restimulated ex vivo with OVA323-339 peptide. We evaluated the T cell anergic state based on OTII cell expansion in vivo and IL2 production upon OVA323-339 restimulation ex vivo.

      “The authors assume that this immunization protocol induces anergic cells, but they provide no experimental evidence for this.”

      Because the anergy model by Mueller's group is well established (Vanasek et al., 2006), we did not feel that additional effort was required to validate this model as the reviewer suggested. Moreover, the limited IL2 production among the control cells upon restimulation confirms the validity of this model.

      “The lack of IL-2 production by Cltr cells could be explained by the presence of fewer OVA-specific cells, rather than by an anergic status.”

      Cells from Ctrl and Nrn1-/- mice on a homogeneous TCR transgenic (OTII) background were used in these experiments. It is therefore unlikely that substantial variability in TCR expression, or differing expression levels of the transgenic TCR, rather than anergy induction, accounts for the limited IL2 production.

      Overall, we used this in vivo anergy model to evaluate the Nrn1-/- T cell functional state in comparison to Ctrl cells under the anergy induction condition following the evaluation of Nrn1 expression, particularly in anergic T cells.  Through studies using this anergy model, we observed a significant change in Treg induction among OTII cells. We decided to pursue the role of Nrn1 in Treg cell development and function rather than the biology of T cell anergy as evidenced by subsequent experiments.

      Minor points “(6) On which markers are anergic cells sorted for RNAseq analysis?”

      Cells were sorted out based on their congenic marker marking Ctrl or Nrn1-/- OTII cells transferred into the host mice.  We did not specifically isolate anergic cells for sequencing.

      (2) Question regarding the validity of iTreg differentiation model.

      Major point: “(5) Figure 2A-C and Figure 3. The use of iTregs to try to understand what is happening in vivo is problematic. iTregs are cells that have probably no equivalent in vivo, and so may have no physiological relevance. In any case, they are different from pTreg cells generated in vivo. Working with pTreg may be challenging, that is why I would suggest generating data with purified nTreg. Moreover, it was shown in the article of Gonzalez-Figueroa 2021 that Nrn1-/- nTreg retained a normal suppressive function, which would not be what is concluded by the authors of this manuscript. Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctlr and Nrn1 KO cells.”

      We thank Reviewer #1 for their feedback. While it is true that iTregs made in vitro and pTregs generated in vivo display several distinctions (e.g., differences in Foxp3 expression stability), we strongly disagree with this statement by Reviewer #1: “The use of iTregs to try to understand what is happening in vivo is problematic. iTregs are cells that have probably no equivalent in vivo, and so may have no physiological relevance.” The induced Treg cell (iTreg) model was established over 20 years ago (Chen et al., 2003; Zheng et al., 2002), and the model is widely adopted, with over 2000 citations. Further, it has been instrumental in understanding different aspects of regulatory T cell biology (Hurrell et al., 2022; John et al., 2022; Schmitt and Williams, 2013; Sugiura et al., 2022).

      Because we observed reduced pTreg generation in vivo, we chose to use the in vitro iTreg model system to understand the mechanistic changes involved in Treg cell differentiation and function and, specifically, neuritin’s role in this process. We have made no claim that iTreg cell biology is identical to that of pTreg cells generated in vivo or of nTreg cells. However, the iTreg culture system has proved to be a good in vitro system for deciphering molecular events involved in complex processes. As such, it remains a commonly used approach by many research groups in the Treg cell field (Hurrell et al., 2022; John et al., 2022; Sugiura et al., 2022). Moreover, applying the iTreg in vitro culture system has been instrumental in helping us identify the cell electrical state change in Nrn1-/- CD4 cells and has revealed the biological link between Nrn1 and the ionotropic AMPA receptor (AMPAR), which we discuss later in this response. It is technically challenging to use nTreg cells for T cell electrical state studies because of their heterogeneous nature, which arises from development in an in vivo environment, and because of the effect of manipulation during the nTreg cell isolation process; both can affect the T cell electrical state.

      “Moreover, it was shown in the article of Gonzalez-Figueroa 2021 that Nrn1-/- nTreg retained a normal suppressive function, which would not be what is concluded by the authors of this manuscript.” 

      We have also carried out nTreg studies in vitro in addition to iTreg cells. Similar to Gonzalez-Figueroa et al.'s findings, we did not observe differences in suppression function between Nrn1-/- and WT nTreg using the in vitro suppression assay. However, Nrn1-/- nTreg cells revealed reduced suppression function in vivo (Fig. 2D-L). In fact, Gonzalez-Figueroa et al. observed reduced plasma cell formation after OVA immunization in Treg-specific Nrn1-/- mice, implicating reduced suppression from Nrn1-/- follicular regulatory T (Tfr) cells. Thus, our observation of the reduced suppression function of Nrn1-/- nTreg toward effector T cell expansion, as presented in Fig. 2D-L, does not contradict the results from Gonzalez-Figueroa et al. Rather, the conclusions of these two studies agree that Nrn1 can play important roles in immune suppression observable in vivo that are not captured readily by the in vitro suppression assay.

      “Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctlr and Nrn1 KO cells.”

      We have stated in the manuscript on page 7 line 208 that “Similar proportions of Foxp3+ cells were observed in Nrn1-/- and Ctrl cells under the iTreg culture condition, suggesting that Nrn1 deficiency does not significantly impact Foxp3+ cell differentiation”. In the revised manuscript, we will include the data on the proportion of Foxp3+ cells before iTreg restimulation.

      (3) Confirmation of transcriptomic data regarding amino acids or electrolytes transport change

      Minor point “(3) Would not it be possible to perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane? This would be a more interesting demonstration than transcriptomic data.”

      We appreciate Reviewer #1’s suggestion to “perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane”. We have indeed already performed such experiments, which corroborate the transcriptomics data on differential amino acid and nutrient transporter expression. Specifically, we loaded either iTreg or Th0 cells with membrane potential (MP) dye and measured the change in MP level after adding the complete set of amino acids (complete AA). Upon entry, the charge carried by AAs may transiently affect cell membrane potential. Different AA transporter expression patterns may produce different MP change patterns upon AA entry, as shown in Author response image 1. We observed a reduced MP change in Nrn1-/- iTreg cells compared with Ctrl cells, whereas among Th0 cells, Nrn1-/- cells showed an enhanced MP change compared with Ctrl cells. We can certainly include these data in the revised manuscript.

      Author response image 1.

      Membrane potential change induced by amino acid entry. a. Nrn1-/- or WT iTreg cells were loaded with MP dye, and MP change was measured upon the addition of a complete set of AAs. b. Nrn1-/- or WT Th0 cells were loaded with MP dye, and MP change was measured upon the addition of a complete set of AAs.

      (4) EAE experiment data assessment

      Minor point “(5) Figure 5F. How are cells re-stimulated? If polyclonal stimulation is used, the experiment is not interesting because the analysis is done with lymph node cells. This analysis should either be performed with cells from the CNS or with MOG restimulation with lymph node cells.”

      In the EAE study, the Nrn1-/- mice exhibit similar disease onset but a protracted, non-resolving disease phenotype compared with the WT control mice. Several factors may contribute to this phenotype: 1. Enhanced T effector cell infiltration/persistence in the central nervous system (CNS); 2. Reduced Treg cell-mediated suppression of T effector cells in the CNS; 3. Protracted, non-resolving inflammation at the immunization site, which has the potential to continue sending T effector cells into the CNS, contributing to persistent inflammation. Based on this reasoning, we examined the infiltrating T effector cell number and Treg cell proportion in the CNS. We also restimulated cells from draining lymph nodes close to the inflammation site, looking for evidence of persistent inflammation. When mice were harvested around day 16 after immunization, the inflammation at the local draining lymph node should have been at the contraction stage. We stimulated cells with PMA and ionomycin in order to observe all potential T effector cells in the draining lymph node rather than only MOG antigen-specific cells. We disagree with Reviewer #1’s assumption that “This analysis should either be performed with cells from the CNS or with MOG restimulation with lymph node cells.” We think the experimental approach we have taken is appropriately tailored to the biological questions we intended to answer.

      Experimental rigor and data presentation.

      (1) Data labeling and additional supporting data

      Major points (2) The authors use Nrn1+/+ and Nrn1+/- cells indiscriminately as control cells on the basis of similar biology between Nrn1+/+ and Nrn1+/- cells at homeostasis. However, it is quite possible that the Nrn1+/- cells have a phenotype in situations of in vitro activation or in vivo inflammation (cancer, EAE). It would be important to discriminate Nrn1+/- and Nrn1+/+ cells in the data or to show that both cell types have the same phenotype in these conditions too.

      (3) Figure 1A-D. Since the authors are using the Nrp1 KO mice, it would be important to confirm the specificity of the anti-Nrn1 mAb by FACS. Once verified, it would be important to add FACS results with this mAb in Figures 1A-C to have single-cell and quantitative data as well.

      Minor points  

      (1) Line 119, 120 of the text. It is said that one of the most up-regulated genes in anergic cells is Nrn1 but the data is not shown.

      (2) For all figures showing %, the titles of the Y axes are written in an odd way. For example, it is written "Foxp3% CD4". It would be more conventional and clearer to write "% Foxp3+ / CD4+" or "% Foxp3+ among CD4+".

      (4) For certain staining (Figure 3E, H) it would be important to show the raw data, in addition to MFI or % values.

      We can adapt the labeling and provide additional data, including Nrn1 staining on Treg cells and flow graphs for pmTOR and pS6 staining (Fig. 3H), as requested by Reviewer #1.

      (2) Experimental rigor:

      General comments:

      “However, it is disappointing that reading this manuscript leaves an impression of incomplete work done too quickly.”

      We were discouraged to receive the comment, “this manuscript leaves an impression of incomplete work done too quickly.” Our study of this novel molecule began without any existing biological tools such as antibodies, knockout mice, etc. Over the past several years, we have established our own antibodies for Nrn1 detection, obtained and characterized Nrn1 knockout mice, and utilized multiple approaches to identify the molecular mechanism of Nrn1 function. Through the use of the in vitro iTreg system described in this manuscript, we identified the association of Nrn1 deficiency with cell electrical state change, potentially connected to AMPAR function. We have further corroborated our findings by generating Nrn1 and T cell-specific AMPAR double knockout mice and confirmed that T cell-specific AMPAR deletion could abrogate the phenotype caused by the Nrn1 deficiency (see Author response image 2). We did not include the double knockout data in the current manuscript because AMPAR function has not yet been studied thoroughly in T cell biology, and we feel this topic warrants examination in its own right. However, the unpublished data support the finding that Nrn1 modulates the T cell electrical state and, consequently, metabolism, ultimately influencing tolerance and immunity. In its current form, the manuscript represents the first characterization of the novel molecule Nrn1 in anergic cells, Tregs, and effector T cells. While this work has led to several exciting additional questions, we disagree that the novel characterization we have presented is incomplete. We feel that our present data set, which squarely highlights Nrn1’s role as an important immune regulator while shedding unprecedented light on the molecular events involved, will be of considerable interest to a broad field of researchers.

      “Multiple models have been used, but none has been studied thoroughly enough to provide really conclusive and unambiguous data. For example, 5 different models were used to study T cells in vivo. It would have been preferable to use fewer, but to go further in the study of mechanisms.”

      We have indeed used multiple in vivo models to reveal Nrn1's function in Treg differentiation, Treg suppression function, T effector cell differentiation and function, and the overall impact on autoimmune disease. Because the impact of ion channel function is often context-dependent, we examined the biological outcome of Nrn1 deficiency in several in vivo contexts. We would appreciate it if Reviewer #1 could provide a specific example, given the Nrn1 phenotype, of how to investigate the electrical change more deeply in the in vivo models.

      “Major points (1) A real weakness of this work is the fact that in most of the results shown, there are few biological replicates with differences that are often small between Ctrl and Nrn1 -/-. The systematic use of student's t-test may lead to thinking that the differences are significant, which is often misleading given the small number of samples, which makes it impossible to know whether the distributions are Gaussian and whether a parametric test can be used. RNAseq bulk data are based on biological duplicates, which is open to criticism.”

      We respectfully disagree with Reviewer #1 on the question of statistical power and significance in our work. We have used 5-8 mice/group for each in vivo model and 3-4 technical replicates for the in vitro studies, with a minimum of 2-3 replicate experiments. These group sizes and replication numbers are in line with those seen in high-impact publications. While some differences between Ctrl and Nrn1-/- appear small, they have significant biological consequences, as evidenced by the various Nrn1-/- in vivo phenotypes. Furthermore, we believe we have subjected our data to the appropriate statistical tests to ensure rigorous analysis and representation of our findings.

      To Reviewer #2.

      We thank Reviewer #2 for the careful review of the manuscript. We especially appreciate the comments that “The characterizations of T cell Nrn1 expression both in vitro and in vivo are comprehensive and convincing. The in vivo functional studies of anergy development, Treg suppression, and EAE development are also well done to strengthen the notion that Nrn1 is an important regulator of CD4 responsiveness.”

      “The major weakness of this study stems from a lack of a clear molecular mechanism involving Nrn1.”

      We fully understand this comment from Reviewer #2. The main mechanism we identified contributing to the functional defect of Nrn1-/- T cells involves novel effects on the electric and metabolic state of the cells. Although we referenced neuronal studies that indicate Nrn1 is the auxiliary protein for the ionotropic AMPA-type glutamate receptor (AMPAR) and may affect AMPAR function, we did not provide any evidence in this manuscript as the topic requires further in-depth study.   

      For the benefit of this discussion, we include our preliminary Nrn1 and AMPAR double knockout data (Author response image 2), which indicate that abrogating AMPAR expression can compensate for the defect caused by Nrn1 deficiency in vitro and in vivo. These preliminary data support the notion that Nrn1 modulates AMPAR function, which causes changes in T cell electric and metabolic state, influencing T cell differentiation and function.

      Author response image 2.

      Deletion of AMPAR expression in T cells compensates for the defect caused by Nrn1 deficiency. Nrn1-/- mice were crossed with T cell-specific AMPAR knockout (AMPARfl/flCD4Cre+) mice. The following mice were generated and used in the experiment: T cell-specific AMPAR knockout and Nrn1 knockout mice (AKONKO), Nrn1 knockout mice (AWTNKO), and Ctrl mice (AWTNWT). a. Deletion of AMPAR compensates for the iTreg cell defect observed in Nrn1-/- CD4 cells. iTreg live cell proportion, cell number, and Ki67 expression among Foxp3+ cells 3 days after aCD3 restimulation. b. Deletion of AMPAR in T cells abrogates the enhanced autoimmune response of Nrn1-/- mice in the EAE disease model. Mouse relative weight change and disease score progression after EAE disease induction.

      Ion channels can influence cell metabolism through multiple means (Vaeth and Feske, 2018; Wang et al., 2020). First, ion channels are involved in maintaining cell resting membrane potential. This electrical potential difference across the cell membrane is essential for various cellular processes, including metabolism (Abdul Kadir et al., 2018; Blackiston et al., 2009; Nagy et al., 2018; Yu et al., 2022). Second, ion channels facilitate the movement of ions across cell membranes. These ions are essential for various metabolic processes. For example, ions like calcium (Ca2+), potassium (K+), and sodium (Na+) play crucial roles in signaling pathways that regulate metabolism (Kahlfuss et al., 2020). Third, ion channel activity can influence cellular energy balance due to ATP consumption associated with ion transport to maintain ion balances (Erecińska and Dagani, 1990; Gerkau et al., 2019). This, in turn, can impact processes like ATP production, which is central to cellular metabolism. Thus, ion channel expression and function determine the cell’s bioelectric state and contribute to cell metabolism (Levin, 2021).

      Because the AMPAR function has not been thoroughly studied using a genetic approach in T cells, we do not intend to include the double knockout data in this manuscript before fully characterizing the T cell-specific AMPAR knockout mice.  

      “Although the biochemical and informatics studies are well-performed, it is my opinion that these results are inconclusive in part due to the absence of key "naive" control groups. This limits my ability to understand the significance of these data.

      Specifically, studies of the electrical and metabolic state of Nrn1-/- inducible Treg cells (iTregs) would benefit from similar data collected from wild-type and Nrn1-/- naive CD4 T cells.”

      We appreciate the reviewer’s comments. This comment reflects two concerns in data interpretation:

      (1) Are Nrn1-/- naïve T cells fundamentally different from WT cells? Does this fundamental difference contribute to the observed electrical and metabolic phenotype in iTreg or Th0 cells? This is a very good question, and we will perform the experiments as the reviewer suggested. While Nrn1 is expressed at a basal (low) level in naïve T cells, deletion of Nrn1 may cause changes in naïve T cell phenotype.

      (2) Is the Nrn1-/- phenotype caused by Nrn1 functional deficiency or due to the secondary effect of Nrn1 deletion, such as non-physiological cell membrane structure changes?

      We performed the following experiment to address this concern. We cultured WT T cells in the presence of Nrn1 antibody and compared the outcome with that of Nrn1-/- iTreg cells (Author response image 3). WT iTreg cells under antibody blockade exhibited changes similar to those of Nrn1-/- iTreg cells, confirming the physiological relevance of the Nrn1-/- phenotype.

      Author response image 3.

      Nrn1 antibody blockade in WT iTreg cell culture caused phenotypic changes similar to those in Nrn1-/- iTreg cells. Nrn1-/- and WT CD4 cells were differentiated under iTreg conditions in the presence of anti-Nrn1 (aNrn1) antibody or isotype control for 3 days. Cells were restimulated with anti-CD3 in the presence of aNrn1 or isotype control. a. MP measured 18 hr after anti-CD3 restimulation. b. Live CD4 cell number and proportion of Ki67 expression among live cells three days after restimulation. c. The proportion of Foxp3+ cells among live cells three days after restimulation.

      Reference:

      Abdul Kadir, L., M. Stacey, and R. Barrett-Jolley. 2018. Emerging Roles of the Membrane Potential: Action Beyond the Action Potential. Front Physiol 9:1661.

      Blackiston, D.J., K.A. McLaughlin, and M. Levin. 2009. Bioelectric controls of cell proliferation: ion channels, membrane voltage and the cell cycle. Cell Cycle 8:3527-3536.

      Chappert, P., and R.H. Schwartz. 2010. Induction of T cell anergy: integration of environmental cues and infectious tolerance. Current opinion in immunology 22:552-559.

      Chen, W., W. Jin, N. Hardegen, K.J. Lei, L. Li, N. Marinos, G. McGrady, and S.M. Wahl. 2003. Conversion of peripheral CD4+CD25- naive T cells to CD4+CD25+ regulatory T cells by TGF-beta induction of transcription factor Foxp3. The Journal of experimental medicine 198:1875-1886.

      Erecińska, M., and F. Dagani. 1990. Relationships between the neuronal sodium/potassium pump and energy metabolism. Effects of K+, Na+, and adenosine triphosphate in isolated brain synaptosomes. J Gen Physiol 95:591-616.

      Fathman, C.G., and N.B. Lineberry. 2007. Molecular mechanisms of CD4+ T-cell anergy. Nat Rev Immunol 7:599-609.

      Gerkau, N.J., R. Lerchundi, J.S.E. Nelson, M. Lantermann, J. Meyer, J. Hirrlinger, and C.R. Rose. 2019. Relation between activity-induced intracellular sodium transients and ATP dynamics in mouse hippocampal neurons. The Journal of physiology 597:5687-5705.

      Hurrell, B.P., D.G. Helou, E. Howard, J.D. Painter, P. Shafiei-Jahani, A.H. Sharpe, and O. Akbari. 2022. PD-L2 controls peripherally induced regulatory T cells by maintaining metabolic activity and Foxp3 stability. Nature communications 13:5118.

      Jenkins, M.K., and R.H. Schwartz. 1987. Antigen presentation by chemically modified splenocytes induces antigen-specific T cell unresponsiveness in vitro and in vivo. The Journal of experimental medicine 165:302-319.

      John, P., M.C. Pulanco, P.M. Galbo, Jr., Y. Wei, K.C. Ohaegbulam, D. Zheng, and X. Zang. 2022. The immune checkpoint B7x expands tumor-infiltrating Tregs and promotes resistance to anti-CTLA-4 therapy. Nature communications 13:2506.

      Kahlfuss, S., U. Kaufmann, A.R. Concepcion, L. Noyer, D. Raphael, M. Vaeth, J. Yang, P. Pancholi, M. Maus, J. Muller, L. Kozhaya, A. Khodadadi-Jamayran, Z. Sun, P. Shaw, D. Unutmaz, P.B. Stathopulos, C. Feist, S.B. Cameron, S.E. Turvey, and S. Feske. 2020. STIM1-mediated calcium influx controls antifungal immunity and the metabolic function of nonpathogenic Th17 cells. EMBO molecular medicine 12:e11592.

      Levin, M. 2021. Bioelectric signaling: Reprogrammable circuits underlying embryogenesis, regeneration, and cancer. Cell 184:1971-1989.

      Nagy, E., G. Mocsar, V. Sebestyen, J. Volko, F. Papp, K. Toth, S. Damjanovich, G. Panyi, T.A. Waldmann, A. Bodnar, and G. Vamosi. 2018. Membrane Potential Distinctly Modulates Mobility and Signaling of IL-2 and IL-15 Receptors in T Cells. Biophys J 114:2473-2482.

      Quill, H., and R.H. Schwartz. 1987. Stimulation of normal inducer T cell clones with antigen presented by purified Ia molecules in planar lipid membranes: specific induction of a long-lived state of proliferative nonresponsiveness. Journal of immunology (Baltimore, Md. : 1950) 138:3704-3712.

      Schmitt, E.G., and C.B. Williams. 2013. Generation and function of induced regulatory T cells. Frontiers in immunology 4:152.

      Sugiura, A., G. Andrejeva, K. Voss, D.R. Heintzman, X. Xu, M.Z. Madden, X. Ye, K.L. Beier, N.U. Chowdhury, M.M. Wolf, A.C. Young, D.L. Greenwood, A.E. Sewell, S.K. Shahi, S.N. Freedman, A.M. Cameron, P. Foerch, T. Bourne, J.C. Garcia-Canaveras, J. Karijolich, D.C. Newcomb, A.K. Mangalam, J.D. Rabinowitz, and J.C. Rathmell. 2022. MTHFD2 is a metabolic checkpoint controlling effector and regulatory T cell fate and function. Immunity 55:65-81.e69.

      Vaeth, M., and S. Feske. 2018. Ion channelopathies of the immune system. Current opinion in immunology 52:39-50.

      Vanasek, T.L., S.L. Nandiwada, M.K. Jenkins, and D.L. Mueller. 2006. CD25+Foxp3+ regulatory T cells facilitate CD4+ T cell clonal anergy induction during the recovery from lymphopenia. Journal of immunology (Baltimore, Md. : 1950) 176:5880-5889.

      Wang, Y., A. Tao, M. Vaeth, and S. Feske. 2020. Calcium regulation of T cell metabolism. Current opinion in physiology 17:207-223.

      Yu, W., Z. Wang, X. Yu, Y. Zhao, Z. Xie, K. Zhang, Z. Chi, S. Chen, T. Xu, D. Jiang, X. Guo, M. Li, J. Zhang, H. Fang, D. Yang, Y. Guo, X. Yang, X. Zhang, Y. Wu, W. Yang, and D. Wang. 2022. Kir2.1-mediated membrane potential promotes nutrient acquisition and inflammation through regulation of nutrient transporters. Nature communications 13:3544.

      Zheng, S.G., J.D. Gray, K. Ohtsuka, S. Yamagiwa, and D.A. Horwitz. 2002. Generation ex vivo of TGF-beta-producing regulatory T cells from CD4+CD25- precursors. Journal of immunology (Baltimore, Md. : 1950) 169:4183-4189.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      Li et al investigated how adjuvants such as MPLA and CpG influence antigen presentation at the level of the Antigen-presenting cell and MHCII : peptide interaction. They found that the use of MPLA or CpG influences the exogenous peptide repertoire presented by MHC II molecules. Additionally, their observations included the finding that peptides with low-stability peptide:MHC interactions yielded more robust CD4+ T cell responses in mice. These phenomena were illustrated specifically for 2 pattern recognition receptor activating adjuvants. This work represents a step forward for how adjuvants program CD4+ Th responses and provides further evidence regarding the expected mechanisms of PRR adjuvants in enhancing CD4+ T cell responses in the setting of vaccination.

      Strengths:

      The authors use a variety of systems to analyze this question. Initial observations were collected in an H. pylori model of vaccination with a demonstration of immunodominance differences simply by adjuvant type, followed by analysis of MHC:peptide as well as proteomic analysis with comparison by adjuvant group. Their analysis returns to peptide immunization and analysis of strength of relative CD4+ T cell responses, through calculation of IC50 values and strength of binding. This is a comprehensive work. The logical sequence of experiments makes sense and follows an unexpected observation through to trying to understand that process further with peptide immunization and its impact on Th responses. This work will premise further studies into the mechanisms of adjuvants on T cells.

      Weaknesses:

      Comment 1. While MDP has a different manner of interaction as an adjuvant compared to CpG and MPLA, it is unclear why MDP has a different impact on peptide presentation and it should be further investigated, or at minimum highlighted in the discussion as an area that requires further investigation.

      Thank you for the suggestion. We investigated the reasons for the different effects of MDP on peptide presentation compared with those of CpG and MPLA. We found that the expression of some proteins involved in antigen processing and presentation, such as CTSS, H2-DM, Ifi30, and CD74, was substantially lower in the MDP-treated group than in the CpG- and MPLA-treated groups. To further test whether these proteins play a key role in adjuvant modification of peptide presentation, we knocked them down using shRNA and then performed immunopeptidomics. The original mass spectra and peptide spectrum matches have been deposited in the public proteomics repository iProX (https://www.iprox.cn/page/home.html) under accession number IPX0007611000. Unfortunately, the expected changes in the peptide presentation repertoires were not observed. Thus, we hypothesized that the different effects of MDP on peptide presentation might not result from differences in the expression of these proteins. We cannot exclude the possibility that other proteins important in this process were overlooked. We are still working on the mechanisms and have not yet reached a firm conclusion. Thus, we did not present the related data in this manuscript.

      The related statements were added in the Discussion section on page 13, lines 292–299: “In this study, we found that the peptide repertoires presented by APCs were significantly affected by the adjuvants CpG and MPLA, but not MDP. All three adjuvants belong to the PRR ligand adjuvant family. CpG and MPLA bind to TLRs and MDP is recognized by NOD2. Although the receptors are different, many common molecules are involved both in TLR and NLD pathway activation. Unfortunately, we did not demonstrate why the MDP had different impacts on peptide presentation compared with other adjuvants. Further investigation is required to clarify the mechanism by which MPLA, CpG, and MDP adjuvants modulate the presentation of peptides with different stabilities.”

      Comment 2. It is alluded by the authors that TLR activating adjuvants mediate selective, low affinity, exogenous peptide binding onto MHC class II molecules. However, this was not demonstrated to be related specifically to TLR binding. I wonder if some work with TLR deficient mice (TLR 4KO for example) could evaluate this phenomenon more specifically.

      Thank you for the suggestion. This is an important point that was overlooked in this study. Based on published research on the mechanisms of the PRR adjuvants CpG and MPLA, we believe that their effect on selective epitope presentation by APCs requires binding to the corresponding receptors, although we did not draw a definitive conclusion in the manuscript.

      To confirm that TLR-activating adjuvants affect the peptides presented on MHC molecules specifically through TLR binding, we have used CRISPR-Cas9 to knock out TLR4 and TLR9 in A20 cells and are repeating the experiments, as suggested. We chose TLR4- and TLR9-knockout A20 cell lines instead of TLR-deficient mice because a large number of APCs are required for immunopeptidomics. Moreover, the data observed in this study were based on the A20 cell line. However, these experiments are time-consuming, and unfortunately we were unable to provide the data in time. In addition, we believe that elucidating the downstream molecular mechanisms of TLR activation is necessary, as mentioned in comment 1. All these data will be combined and reported in our upcoming publications.

      Comment 3. It is unclear to me if this observation is H. pylori model/antigen-specific. It may have been nice to characterize the phenomenon with a different set of antigens as supplemental. Lastly, it is unclear if the peptide immunization experiment reveals a clear pattern related to high and low-stability peptides among the peptides analyzed.

      Q1: It is unclear to me if this observation is H. pylori model/antigen-specific. It may have been nice to characterize the phenomenon with a different set of antigens as supplemental.

      Thank you for the comment. To confirm the effect of the adjuvant on the exogenous peptide repertoire presented by MHC II molecules, a set of antigens from another bacterium, Pseudomonas aeruginosa, was used, and the experiments were repeated. The A20 cells were treated with CpG and pulsed with Pseudomonas aeruginosa antigens. Twelve hours later, MHC-II–peptide complexes were immunoprecipitated, and immunopeptidomics was performed. The data are shown below (Author response image 1). Information on the MHC-peptides from Pseudomonas aeruginosa is given in the Supplementary Table named “Table S3 Response to comment3”. A total of 713 and 205 bacterial peptides were identified in the PBS and CpG groups, respectively (Author response image 1A). The number of exogenous peptides in the CpG-treated group was significantly lower than that in the PBS-treated control group (Author response image 1B). A total of 568 bacterial peptides were presented only in the PBS group, 60 bacterial peptides were presented only in the CpG-treated group, and 145 bacterial peptides were presented in both groups (Author response image 1C). We then used the IEDB website to compare the predicted MHC-binding stability of the peptides presented in the adjuvant-treated group with that of the peptides that were lost (deficient) after adjuvant stimulation. We found that the IC50 values of the peptides in the adjuvant-treated group were much higher than those of the deficient peptides, which indicates that the peptides presented in the CpG-treated group have lower binding stability for MHC-II (Author response image 1D). These results indicate that CpG adjuvant affects the presentation of exogenous peptides with high binding stability, which is consistent with the data reported in our manuscript. Using another set of antigens, we confirmed that our observations were not H. pylori model- or antigen-specific.

      Author response image 1.

      MHC-II peptidome measurements in adjuvant-treated APCs pulsed with Pseudomonas aeruginosa antigens. (A) Total number of bacterial peptides identified in the PBS- and CpG-treated groups. (B) The number and length distribution of bacterial peptides in different groups were compared. (C) Venn diagrams showing the distribution of bacterial peptides in different groups. (D) IC50 of the presented, deficient, and co-presented peptides post-adjuvant stimulation from immunopeptidome binding to H2-IA and H2-IE were predicted using the IEDB website. High IC50 means low binding stability. *p<0.05, **p<0.01.
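
      As a quick consistency check on the counts reported for Author response image 1A and 1C, the following minimal sketch recomputes the group totals from the Venn regions. It assumes that the 60 peptides are those found only in the CpG-treated group (the reading under which the regions add up to the reported totals of 713 and 205); the variable names are illustrative and not taken from the authors' analysis.

```python
# Venn regions as reported in the response (bacterial peptides, P. aeruginosa antigens).
pbs_only = 568   # presented only in the PBS group
cpg_only = 60    # assumption: presented only in the CpG-treated group
shared = 145     # presented in both groups

# Group totals are the exclusive region plus the overlap.
pbs_total = pbs_only + shared
cpg_total = cpg_only + shared

# Distinct peptides across both groups.
union_total = pbs_only + cpg_only + shared

# Share of the PBS repertoire that is no longer detected after CpG treatment.
fraction_lost = pbs_only / pbs_total

print(pbs_total, cpg_total, union_total, round(fraction_lost, 3))
```

      Under this reading, the Venn regions reproduce the totals given for panel A, and roughly four in five of the peptides detected in the PBS group are absent from the CpG-treated group.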

      Q2: Lastly, it is unclear if the peptide immunization experiment reveals a clear pattern related to high and low-stability peptides among the peptides analyzed.

      In this study, we used a peptide immunization experiment to evaluate the responses induced by the screened peptides with different stabilities. In addition to this method, tetramer staining and ELISA have been used to assess epitope-specific T-cell proliferation and cytokine secretion. Among these, tetramer staining is often used in studies involving model antigens. However, as many peptides were screened in our study, synthesizing a sufficient number of tetramers was difficult. We nevertheless believe that the experimental data obtained in this study support the conclusion, although we agree that applying additional methods would make the pattern clearer.

      Reviewer #2 (Public Review):

      Adjuvants boost antigen-specific immune responses to vaccines. However, whether adjuvants modulate the epitope immunodominance and the mechanisms involved in adjuvant's effect on antigen processing and presentation are not fully characterized. In this manuscript, Li et al report that immunodominant epitopes recognized by antigen-specific T cells are altered by adjuvants.

      Using MPLA, CpG, and MDP adjuvants and H. pylori antigens, the authors screened the dominant epitopes of Th1 responses in mice post-vaccination with different adjuvants and found that adjuvants altered antigen-specific CD4+ T cell immunodominant epitope hierarchy. They show that adjuvants, MPLA and CpG especially, modulate the peptide repertoires presented on the surface of APCs. Surprisingly, adjuvant favored the presentation of low-stability peptides rather than high-stability peptides by APCs. As a result, the low stability peptide presented in adjuvant groups elicits T cell response effectively.

      Thanks a lot for your comments.

      Reviewer #1 (Recommendations For The Authors):

      Recommendation 1. Figure 6: The peptides considered low affinity- it would be helpful to specify from which adjuvant they were collected from. When they are pooled it is unclear if we are analyzing peptides collected from adjuvanting with any of the three adjuvants studied.

      Thank you for the suggestion. The related description in Figure 6 has been modified in the revised manuscript. Data for the peptides identified from the adjuvants MPLA- and CpG-treated groups are shown separately.

      Recommendation 2. It is unclear to me why the A20 cell line is less preferred to the J774 line for the immunopeptidome analysis - can the authors expand on this?

      We apologize for not clearly explaining this in the original manuscript. In fact, the A20 cell line is better suited than the J774A.1 cell line for immunopeptidomics experiments. Compared to J774A.1 cells, more MHC-II peptides were obtained from a smaller number of A20 cells using immunopeptidomics. At the beginning of this study, we chose the J774A.1 cell line as it is a macrophage cell line. J774A.1 cells (up to 5×10⁸) were pulsed with the antigens, and MHC-II–peptide complexes were eluted from the cell surface for immunopeptidomics. Unfortunately, only a few hundred peptides from the host were detected and no exogenous peptides were detected. Next, we tested the A20 cell line. In total, 10⁸ A20 cells were used in this study. More than 3500 host peptides and approximately 50 exogenous peptides were identified. These data indicate that the A20 cell line was better.

      To investigate the reasons for this, we measured MHC-II expression on cell surfaces using FACS. Our purpose was to elute peptides from MHC–peptide complexes present on the cell surface, so low MHC expression results in the elution of few peptides. We found that the MFI of MHC-II molecules on J774A.1 cells was about 500, whereas the MFI of MHC-II molecules on A20 cells was more than 300,000. These data indicate that MHC-II expression on A20 cells was much higher than that on J774A.1 cells. J774A.1 is a macrophage cell line. Macrophages have excellent antigen phagocytic capabilities; however, their ability to present antigens is relatively weak. MHC molecules on the macrophage cell surface can be upregulated upon stimulation with some cytokines, for example, IFN-γ. In this study, we used adjuvants as stimulators and did not want to use additional cytokine stimulators. Thus, J774A.1 cells were not used in the present study.

      The related statements are reflected on page 6 lines 120–128 “We also selected another H-2d cell J774A.1, a macrophage cell line, for immunopeptidome analysis in this study. Briefly, 5×10⁸ J774A.1 cells were used for immunopeptidomics. Moreover, fewer than 350 peptides were observed at a peptide spectrum match (PSM) level of < 1.0% false discovery rate (FDR). However, more than 5500 peptides were detected in 10⁸ A20 cells at FDR < 1.0% (Figure S2A). CD86 and MHC-II molecule expression on J774A.1 cells was substantially lower than that on A20 cells (Figure S2B). Low MHC-II expression on J774A.1 cells could be the reason for the lack of peptides identified by LC–MS/MS. Thus, A20 cells instead of J774A.1 cells were used for the subsequent experiments.”

      Recommendation 3. Lines 172-177, can more details be provided about the whole proteome analysis? The plots are shown for relative representation of protein expression to PBS, but it is unclear to me what examples of these proteins are (IFN pathway, Ubiquitination pathway). Could these be confirmed by protein expression analyses in supplemental?

      Thank you for the suggestion. In this study, we conducted whole proteome analysis to investigate changes in protein expression across different pathways in the adjuvant groups. Through KEGG enrichment analysis, we compared the differential expression of MHC presentation pathway proteins (such as H2-M, Ifi30, CD74, CTSS, proteasome, and peptidase subunits) between the PBS- and adjuvant-treated groups using our proteome data. In addition, we focused on IFN and ubiquitination pathways that play crucial roles in antigen presentation modification and immune response. The proteins and their relative expression in these pathways are shown in Figure S4B. Details regarding the protein names and expressions are provided in Supplemental Table S2 of the revised manuscript.

      The original statements in the results “Then, we analyzed the whole proteome data to determine whether the proteins involved in antigen presentation and processing were altered. We found that proteins involved in antigen processing, peptidase function, ubiquitination pathway, and interferon (IFN) signaling were altered post adjuvants treatment, especially in MPLA and CpG groups (Figure 5C; Figure S4B and S4C). These data suggest that adjuvants MPLA and CpG may affect the antigen processing of APCs, resulting in fewer peptides presentation.” This has been revised on page 8 lines 172–182 as “We then investigated whole-proteome data to determine the evidence of adjuvant modification of antigen presentation. We focused on the proteins involved in antigen processing, peptidase function, ubiquitination pathway, and IFN signaling. The ubiquitination pathway and IFN signaling play crucial roles in the modification of antigen presentation and immune responses. Through KEGG enrichment analysis, we found that many proteins involved in antigen processing, peptidase function, ubiquitination pathways, and IFN signaling were altered after adjuvant treatment, particularly in the MPLA- and CpG-treated groups (Figure 5C; Figure S4B). The expression of each protein is shown in Figure S4C and Supplementary Table 2. These data suggest that MPLA and CpG adjuvants may affect the antigen processing of APCs, resulting in fewer peptide presentations.”

      Recommendation 4. Lines 212-218: I think there needs to be more discussion of interpretation here. Only one of the low-stability peptides required low concentrations for CD4+ T cell responses in vitro. What about the other peptides in the analysis? Perhaps if the data is taken together there is not a clear pattern?

      Thank you for the comment. In this study, epitope-specific CD4+ T cells were expanded in vitro from the spleens of peptide-pool-immunized mice. T-cell responses to individual peptides were detected using ICS and FACS. Only one peptide with low binding stability, recA #23, and one high-stability peptide, ureA #2, induced effective T-cell responses. Peptide ureA #3, which has high stability, induced weak Th1 responses. The other peptides did not induce IFN-γ-secreting CD4+ T cells (data are shown in Author response image 2). Thus, we compared the strength of the IFN-γ responses induced by these three peptides at a set of low concentrations. Data for the other peptides, which elicited no response, could not be included in this comparison.

      Author response image 2.

      Expanded CD4+ T cells from peptide-immunized mice were screened for their responses to the peptides in an ICS assay.

      In this study, we used a peptide pool containing four low-stability peptides to vaccinate mice; however, only one peptide induced an effective CD4+ T-cell response. We speculate that the possible reasons are as follows. First, the number of peptides used for vaccination was too small. Only four low-stability peptides were synthesized and used to immunize mice. Three of these could not induce an effective T-cell response, possibly because of their low immunogenicity. If more peptides were synthesized and used, more peptides that induce T-cell responses might be observed. Second, epitope-specific T-cell responses are variable. Responses to subdominant peptides can be inhibited by the dominant peptide, and a subdominant peptide can become dominant when the peptide dose is changed or the dominant peptide is absent. Thus, we believe that responses to the other three peptides may be detected if mice are immunized with a peptide pool that does not contain the responding epitope.

      The corresponding statements have been added to the Discussion section on page 13 lines 287–291 as “Unfortunately, only one peptide, recA #23, with low binding stability and induced significant Th1 responses, was identified in this study. To further confirm that low-stability peptides can induce stronger and higher TCR-affinity antigen-specific T-cell clonotype responses than high-stability peptides, further studies should monitor more peptides with different stabilities.”

      Recommendation 5. There are some areas where additional editing to text would be beneficial due to grammar (eg lines 122-126; line 116, etc).

      The manuscript has been edited by a professional language editing company.

      Reviewer #2 (Recommendations For The Authors):

      Recommendation 1. It is interesting that there was no difference in IFNg responses induced by different adjuvants.

      Thank you for the comment. Possible reasons for the lack of difference in IFN-γ responses could be as follows. First, all adjuvants used in this study have been confirmed to effectively induce Th1 responses. Second, in this study, IFN-γ responses were examined using expanded antigen-specific T cells in vitro. The in vitro cell expansion efficiency may have affected these results.

      Recommendation 2. The data to support the claim that changes in exogenous peptide presentation among adjuvant groups were not due to differences in antigen phagocytosis is insufficient.

      Thank you for the comment. In this study, proteomics of A20 cells pulsed with antigens in different adjuvant-treated groups were used to determine exogenous antigens phagocytosed by cells. In addition, we used fluorescein isothiocyanate (FITC)-labeled OVA to pulse APCs and detected antigen phagocytosis by APCs after treatment with different adjuvants. The MFI of FITC was detected by FACS at different time points. The data are shown below (Author response image 3). No obvious differences in FITC MFI were detected after adjuvant stimulation, indicating that antigen phagocytosis among the adjuvant groups was almost the same.

      A20 cells, used as APCs, are a B-cell line. Antigen recognition and phagocytosis by B cells depend on the B-cell receptor (BCR) on the cell surface. The ability of BCRs to bind to different antigens varies, leading to significant differences in the phagocytosis of different antigens by B cells. Therefore, detecting the phagocytosis of a single antigen may not reflect the overall phagocytic state of the B cells. Thus, in this study, we used proteomics to detect exogenous proteins in B cells pulsed with H. pylori antigens, which contain thousands of components, to evaluate their overall phagocytic capacity. Only the proteomic data are presented in our manuscript.

      Author response image 3.

      Antigen phagocytosis by A20 cells was measured using FITC-labeled OVA. (A) A20 cells were pulsed with FITC-labeled OVA. MFI of FITC was measured after 1 h. (B) MFI of FITC was examined at different time points after adjuvant stimulation.

      Recommendation 3. It is not clear how MPLA, CpG, and MDP adjuvants modulate the presentation of low vs high stability peptides.

      Thank you for pointing this out. We acknowledge that we did not clarify the mechanisms by which adjuvants affect the stability of the peptide presentations of APCs.

      We performed experiments to detect the expression of proteins involved in antigen processing and presentation in the different adjuvant-treated groups. Furthermore, shRNAs were used to knock down the expression of key molecules. Immunopeptidomics was used to detect peptide presentation. Unfortunately, the expected results for peptide presentation repertoires were not observed. We are still working on the mechanisms.

      Please also see our response to comment 1 of reviewer 1.

      The related statements were added in the Discussion section on page 13, lines 292–299: “In this study, we found that the peptide repertoires presented by APCs were significantly affected by the adjuvants CpG and MPLA, but not MDP. All three adjuvants belong to the PRR ligand adjuvant family. CpG and MPLA bind to TLRs and MDP is recognized by NOD2. Although the receptors are different, many common molecules are involved both in TLR and NLD pathway activation. Unfortunately, we did not demonstrate why the MDP had different impacts on peptide presentation compared with other adjuvants. Further investigation is required to clarify the mechanism by which MPLA, CpG, and MDP adjuvants modulate the presentation of peptides with different stabilities.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review): 

      The reviewer retained most of their comments from the previous reviewing round. In order to meet these comments and to further examine the dynamic nature of threat omission-related fMRI responses, we now re-analyzed our fMRI results using the single trial estimates. The results of these additional analyses are added below in our response to the recommendations for the authors of reviewer 1. However, we do want to reiterate that there was a factually incorrect statement concerning our design in the reviewer’s initial comments. Specifically, the reviewer wrote that “25% of shocks are omitted, regardless of whether subjects are told that the probability is 100%, 75%, 50%, 25%, or 0%.” We want to repeat that this is not what we did. 100% trials were always reinforced (100% reinforcement rate); 0% trials were never reinforced (0% reinforcement rate). For all other instructed probability levels (25%, 50%, 75%), the stimulation was delivered in 25% of the trials (25% reinforcement rate). We have elaborated on this misconception in our previous letter and have added this information more explicitly in the previous revision of the manuscript (e.g., lines 125-129; 223-224; 486-492).   

      Reviewer #1 (Recommendations For The Authors): 

      I do not have any further recommendations, although I believe an analysis of learning-related changes is still possible with the trial-wise estimates from unreinforced trials. The authors' response does not clarify whether they tested for interactions with run, and thus the fact that there are main effects does not preclude learning. I kept my original comments regarding limitations, with the exception of the suggestion to modify the title. 

      We thank the reviewer for this recommendation. In line with their suggestion, we have now reanalyzed our main ROI results using the trial-by-trial estimates we obtained from the first-level omission>baseline contrasts. Specifically, we extracted beta-estimates from each ROI and entered them into the same Probability x Intensity x Run LMM we used for the relief and SCR analyses. Results from these analyses (in the full sample) were similar to our main results. For the VTA/SN model, we found main effects of Probability (F = 3.12, p = .04), and Intensity (F = 7.15, p < .001) (in the model where influential outliers were rescored to 2SD from mean). There was no main effect of Run (F = 0.92, p = .43) and no Probability x Run interaction (F = 1.24, p = .28). If the experienced contingency had interfered with the instructions, there should have been a Probability x Run interaction (with the effect of Probability only being present in the first runs). Since we did not observe such an interaction, our results indicate that even though some learning might still have taken place, the main effect of Probability remained present throughout the task.

      There is an important side note regarding these analyses: For the first level GLM estimation, we concatenated the functional runs and accounted for baseline differences between runs by adding run-specific intercepts as regressors of no-interest. Hence, any potential main effect of run was likely modeled out at first level. This might explain why, in contrast to the rating and SCR results (see Supplemental Figure 5), we found no main effect of Run. Nevertheless, interaction effects should not be affected by including these run-specific intercepts.

      Note that when we ran the single-trial analysis for the ventral putamen ROI, the effect of intensity became significant (F = 3.89, p = .02). Results neither changed for the NAc, nor the vmPFC ROIs.  

      Reviewer #2 (Public Review): 

      Comments on revised version: 

      I want to thank the authors for their thorough and comprehensive work in revising this manuscript. I agree with the authors that learning paradigms might not be a necessity when it comes to study the PE signals, but I don't particularly agree with some of the responses in the rebuttal letter ("Furthermore, conditioning paradigms generally only include one level of aversive outcome: the electrical stimulation is either delivered or omitted."). This is of course correct description for the conditioning paradigm, but the same can be said for an instructed design: the aversive outcome was either delivered or not. That being said, adopting the instructed design itself is legitimate in my opinion. 

      We thank the reviewer for this comment. We have now modified the phrasing of this argument to clarify our reasoning (see lines 102-104: “First, these only included one level of aversive outcome: the electrical stimulation was either delivered at a fixed intensity, or omitted; but the intensity of the stimulation was never experimentally manipulated within the same task.”).  

      The reason why we mentioned that “the aversive outcome is either delivered or omitted” is because in most contemporary conditioning paradigms only one level of aversive US is used. In these cases, it is therefore not possible to investigate the effect of US Intensity. In our paradigm, we included multiple levels of aversive US, allowing us to assess how the level of aversiveness influences threat omission responding. It is indeed true that each level was delivered or not. However, our data clearly (and robustly across experiments, see Willems & Vervliet, 2021) demonstrate that the effects of the instructed and perceived unpleasantness of the US (as operationalized by the mean reported US unpleasantness during the task) on the reported relief and the omission fMRI responses are stronger than the effect of instructed probability.  

      My main concern, which the authors spent quite some length in the rebuttal letter to address, still remains about the validity for different instructed probabilities. Although subjects were told that the trials were independent, the big difference between 75% and 25% would more than likely confuse the subjects, especially given that most of us would fall prey to the Gambler's fallacy (or the law of small numbers) to some degree. When the instruction and subjective experience collides, some form of inference or learning must have occurred, making the otherwise straightforward analysis more complex. Therefore, I believe that a more rigorous/quantitative learning modeling work can dramatically improve the validity of the results. Of course, I also realize how much extra work is needed to append the computational part but without it there is always a theoretical loophole in the current experimental design. 

      We agree with the reviewer that some learning may have occurred in our task. However, we believe the most important question in relation to our study is: to what extent did this learning influence our manipulations of interest?  

      In our reply to reviewer 1, we already showed that a re-analysis of the fMRI results using the trial-by-trial estimates of the omission contrasts revealed no Probability x Run interaction, suggesting that – overall – the probability effect remained stable over the course of the experiment. However, inspired by the alternative explanation that was proposed by this reviewer, we now also assessed the role of the Gambler’s fallacy in a separate set of analyses. Indeed, it is possible that participants start to expect a stimulation more after more time has passed since the last stimulation was experienced. To test this alternative hypothesis, we specified two new regressors that calculated for each trial of each participant how many trials had passed since the last stimulation (or since the beginning of the experiment) either overall (across all trials of all probability types; henceforth called the overall-lag regressor) or per probability level (across trials of each probability type separately; henceforth called the lag-per-probability regressor). For both regressors a value of 0 indicates that the previous trial was either a stimulation trial or the start of the experiment, a value of 1 means that the last stimulation trial was 2 trials ago, etc.

      The results of these additional analyses are added in a supplemental note (see supplemental note 6), and referred to in the main text (see lines 231-236: “Likewise, a post-hoc trial-by-trial analysis of the omission-related fMRI activations confirmed that the Probability effect for the VTA/SN activations was stable over the course of the experiment (no Probability x Run interaction) and remained present when accounting for the Gambler’s fallacy (i.e., the possibility that participants start to expect a stimulation more when more time has passed since the last stimulation was experienced) (see supplemental note 6). Overall, these post-hoc analyses further confirm the PE-profile of omission-related VTA/SN responses”).

      Addition to supplemental material (pages 16-18)

      Supplemental Note 6: The effect of Run and the Gambler’s Fallacy 

      A question that was raised by the reviewers was whether omission-related responses could be influenced by dynamical learning or the Gambler’s Fallacy, which might have affected the effectiveness of the Probability manipulation.  

      Inspired by this question, we exploratorily assessed the role of the Gambler’s Fallacy and the effects of Run in a separate set of analyses. Indeed, it is possible that participants start to expect a stimulation more when more time has passed since the last stimulation was experienced. To test this alternative hypothesis, we specified two new regressors that calculated for each trial of each participant how many trials had passed since the last stimulation (or since the beginning of the experiment) either overall (across all trials of all probability types; henceforth called the overall-lag regressor) or per probability level (across trials of each probability type separately; henceforth called the lag-per-probability regressor). For both regressors a value of 0 indicates that the previous trial was either a stimulation trial or the start of the experiment, a value of 1 means that the last stimulation trial was 2 trials ago, etc.
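
      The two lag regressors described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the trial representation (a list of probability-label and stimulation-flag pairs in presentation order for one participant), and the choice to let every stimulation trial reset the overall counter are assumptions made for the example.

```python
def lag_regressors(trials):
    """Compute overall-lag and lag-per-probability values for one participant.

    trials: list of (probability_label, stimulated) tuples in presentation order.
    Returns two lists with one value per trial. A value of 0 means the previous
    (relevant) trial was a stimulation trial or that no trial preceded it.
    """
    overall_lag = []
    lag_per_probability = []
    since_last_overall = 0    # non-stimulated trials since the last stimulation
    since_last_by_prob = {}   # same counter, kept separately per probability level

    for probability, stimulated in trials:
        # The regressor value is determined before the current outcome is known.
        overall_lag.append(since_last_overall)
        lag_per_probability.append(since_last_by_prob.get(probability, 0))

        if stimulated:
            since_last_overall = 0
            since_last_by_prob[probability] = 0
        else:
            since_last_overall += 1
            since_last_by_prob[probability] = since_last_by_prob.get(probability, 0) + 1

    return overall_lag, lag_per_probability


# Hypothetical trial sequence, for illustration only.
example = [("25", False), ("50", False), ("25", True), ("50", False), ("25", False)]
overall, per_prob = lag_regressors(example)
print(overall)   # [0, 1, 2, 0, 1]
print(per_prob)  # [0, 0, 1, 1, 0]
```

      In the example, the last trial gets an overall lag of 1 because the most recent stimulation occurred two trials earlier, which matches the coding convention stated above.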

      The new models including these regressors for each omission response type (i.e., omission-related activations for each ROI, relief, and omission-SCR) were specified as follows:   

      (1) For the overall lag:

      Omission response ~ Probability * Intensity * Run + US-unpleasantness + Overall-lag + (1|Subject).  

      (2) For the lag per probability level:

      Omission response ~ Probability * Intensity * Run + US-unpleasantness + Lag-per-probability : Probability + (1|Subject).

      Where US-unpleasantness scores were mean-centered across participants; “*” represents main effects and interactions, and “:” represents an interaction (without main effect). Note that we only included an interaction for the lag-per-probability model to estimate separate lag-parameters for each probability level.  
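
      To make the formula notation concrete, the sketch below expands the “*” operator into the main effects and interactions it stands for and lists the fixed-effect terms of both models. It only illustrates the notation; the helper name and the underscore-style term labels are assumptions, and the random intercept (1|Subject) is left out because it is not a fixed effect.

```python
from itertools import combinations


def expand_star(factors):
    """Expand A * B * C into all main effects and interactions, joined by ':'."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms


core_terms = expand_star(["Probability", "Intensity", "Run"])

# Model 1: a single lag slope shared across probability levels.
overall_lag_model = core_terms + ["US_unpleasantness", "Overall_lag"]

# Model 2: ':' adds only the interaction, i.e. one lag slope per probability
# level and no separate lag main effect.
lag_per_probability_model = core_terms + [
    "US_unpleasantness",
    "Lag_per_probability:Probability",
]

for term in core_terms:
    print(term)
```

      The three-way “*” term therefore contributes seven fixed-effect terms (three main effects, three two-way interactions, and one three-way interaction), to which each model adds the covariate and its lag term.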

      The results of these analyses are presented in the tables below. Overall, we found that adding these lag-regressors to the model did not alter our main results. That is: for the VTA/SN, relief and omission-SCR, the main effects of Probability and Intensity remained. Interestingly, the overall-lag-effect itself was significant for VTA/SN activations and omission SCR, indicating that VTA/SN activations were larger when more time had passed since the last stimulation (beta = 0.19), whereas SCRs were smaller when more time had passed (beta = -0.03). This pattern is reminiscent of the Perruchet effect, namely that the explicit expectancy of a US increases over a run of non-reinforced trials (in line with the gambler’s fallacy effect) whereas the conditioned physiological response to the conditional stimulus declines (in line with an extinction effect, Perruchet, 1985; McAndrew, Jones, McLaren, & McLaren, 2012). Thus, the observed dissociation between the VTA/SN activations and omission SCR might similarly point to two distinctive processes where VTA/SN activations are more dependent on a consciously controlled process that is subjected to the gambler’s fallacy, whereas the strength of the omission SCR responses is more dependent on an automatic associative process that is subjected to extinction. Importantly, however, even though the temporal distance to the last stimulation had these opposing effects on VTA/SN activations and omission SCRs, the main effects of the probability manipulation remained significant for both outcome variables. This means that the core results of our study still hold.

      Next to the overall-lag effect, the lag-per-probability regressor was only significant for the vmPFC. A follow-up of the beta estimates of the lag-per-probability regressors for each probability level revealed that vmPFC activations increased with increasing temporal distance from the stimulation, but only for the 50% trials (beta = 0.47, t = 2.75, p < .01), and not the 25% (beta = 0.25, t = 1.49, p = .14) or the 75% trials (beta = 0.28, t = 1.62, p = .10).

      Author response table 1.

      F-statistics and corresponding p-values from the overall lag model. (*) F-test and p-values were based on the model where outliers were rescored to 2SD from the mean. Note that when retaining the influential outliers for this model, the p-value of the probability effect was p = .06. For all other outcome variables, rescoring the outliers did not change the results. Significant effects are indicated in bold.

      Author response table 2.

      F-statistics and corresponding p-values from the lag per probability level model. (*) F-test and p-values were based on the model where outliers were rescored to 2SD from the mean. Note that when retaining the influential outliers for this model, the p-value of the Intensity x Run interaction was p = .05. For all other outcome variables, rescoring the outliers did not change the results. Significant effects are indicated in bold.

      As the authors mentioned in the rebuttal letter, "selecting participants only if their anticipatory SCR monotonically increased with each increase in instructed probability 0% < 25% < 50% < 75% < 100%, N = 11 participants", only ~1/3 of the subjects actually showed strong evidence for the validity of the instructions. This further raises the question of whether the instructed design, due to the interference of false instruction and the dynamic learning among trials, is solid enough to test the hypothesis.

      We agree with the reviewer that a monotonic increase in anticipatory SCR with increasing probability instructions would provide the strongest evidence that the manipulation worked. However, it is well known that SCR is a noisy measure, and so the chances to see this monotonic increase are rather small, even if the underlying threat anticipation increases monotonically. Furthermore, between-subject variation is substantial in physiological measures, and it is not uncommon to observe, e.g., differential fear conditioning in one measure, but not in another (Lonsdorf & Merz, 2017). It is therefore not so surprising that ‘only’ 1/3 of our participants showed the perfect pattern of monotonically increasing SCR with increasing probability instructions. That being said, it is also important to note that not all participants were considered for these follow-up analyses because valid SCR data was not always available.

      Specifically, N = 4 participants were identified as anticipation non-responders (i.e. participants with a smaller average SCR to the clock on 100% than on 0% trials; pre-registered criterion) and were excluded from the SCR-related analyses, and N = 1 participant had missing data due to technical difficulties. This means that only 26 (and not 31) participants were considered for the post hoc analyses. Taking this information into account, 21 out of 26 participants (approximately 80%) showed stronger anticipatory SCR following 75% instructions compared to 25% instructions, and 11 out of 26 participants (approximately 40%) even showed the monotonic increase in their anticipatory SCR (see supplemental figure 4). Furthermore, although anticipatory SCR gradually decreased over the course of the experiment, there was no Run x Probability interaction, indicating that the effect of the instructions remained stable throughout the task (see supplemental figure 3).
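      The participant counts and proportions quoted above can be re-derived with a few lines of arithmetic. The snippet below is only a convenience check; the variable names are introduced here for illustration, and all numbers are taken from the paragraph above.

```python
# Participant counts reported in the response letter.
total_participants = 31
anticipation_non_responders = 4   # excluded by the pre-registered criterion
missing_scr_data = 1              # technical difficulties

# Participants with usable anticipatory SCR data.
included = total_participants - anticipation_non_responders - missing_scr_data

# Share showing a stronger SCR after 75% than after 25% instructions.
stronger_75_than_25 = 21
pct_stronger = 100 * stronger_75_than_25 / included

# Share showing the fully monotonic pattern 0% < 25% < 50% < 75% < 100%.
monotonic = 11
pct_monotonic = 100 * monotonic / included

print(included, round(pct_stronger), round(pct_monotonic))
```

      The rounded values (about 81% and 42%) are consistent with the "approximately 80%" and "approximately 40%" given in the text.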

      Reviewer #2 (Recommendations For The Authors):

      A more operational approach might be to break the trials into different sections along the timeline and examine how much the results might have been affected across time. I expect the manipulation checks would hold for the first one or two runs and the authors then would have good reasons to focus on the behavioral and imaging results for those runs. 

      This recommendation resembles the recommendation by reviewer 1. In our reply to reviewer 1, we showed the results of a re-analysis of the fMRI data using the trial-by-trial estimates of the omission contrasts, which revealed no Probability x Run interaction, suggesting that, overall, the probability effect remained (more or less) stable over the course of the experiment. For a more in-depth discussion of the results of this additional analysis, we refer to our answer to reviewer 1.

      Reviewer #3 (Public Review): 

      Comments on revised version: 

      The authors were extremely responsive to the comments and provided a comprehensive rebuttal letter with a lot of detail to address the comments. The authors clarified their methodology, and rationale for their task design, which required some more explanation (at least for me) to understand. Some of the design elements were not clear to me in the original paper. 

      The initial framing for their study is still in the domain of learning. The paper starts off with a description of extinction as the prime example of when threat is omitted. This could lead a reader to think the paper would speak to the role of prediction errors in extinction learning processes. But this is not their goal, as they emphasize repeatedly in their rebuttal letter. The revision also now details how using a conditioning/extinction framework doesn't suit their experimental needs. 

      We thank the reviewer for pointing out this potential cause of confusion. We have now rewritten the starting paragraph of the introduction to more closely focus on prediction errors, and only discuss fear extinction as a potential paradigm that has been used to study the role of threat omission PE for fear extinction learning (see lines 40-55). We hope that these adaptations are sufficient to prevent any false expectations. However, as we have mentioned in our previous response letter, not talking about fear extinction at all would also not make sense in our opinion, since most of the knowledge we have gained about threat omission prediction errors to date is based on studies that employed these paradigms.  

      Adaptation in the revised manuscript (lines 40-55):  

      “We experience pleasurable relief when an expected threat stays away1. This relief indicates that the outcome we experienced (“nothing”) was better than we expected it to be (“threat”). Such a mismatch between expectation and outcome is generally regarded as the trigger for new learning, and is typically formalized as the prediction error (PE) that determines how much there can be learned in any given situation2. Over the last two decades, the PE elicited by the absence of expected threat (threat omission PE) has received increasing scientific interest, because it is thought to play a central role in learning of safety. Impaired safety learning is one of the core features of clinical anxiety4. A better understanding of how the threat omission PE is processed in the brain may therefore be key to optimizing therapeutic efforts to boost safety learning. Yet, despite its theoretical and clinical importance, research on how the threat omission PE is computed in the brain is only emerging.  

      To date, the threat omission PE has mainly been studied using fear extinction paradigms that mimic safety learning by repeatedly confronting a human or animal with a threat predicting cue (conditional stimulus, CS; e.g. a tone) in the absence of a previously associated aversive event (unconditional stimulus, US; e.g., an electrical stimulation). These (primarily non-human) studies have revealed that there are striking similarities between the PE elicited by unexpected threat omission and the PE elicited by unexpected reward.”

      It is reasonable to develop a new task to answer their experimental questions. By no means is there a requirement to use a conditioning/extinction paradigm to address their questions. As they say, "it is not necessary to adopt a learning paradigm to study omission responses", which I agree with.  But the authors seem to want to have it both ways: they frame their paper around how important prediction errors are to extinction processes, but then go out of their way to say how they can't test their hypotheses with a learning paradigm.

      Part of their argument that they needed to develop their own task "outside of a learning context" goes as follows: 

      (1) "...conditioning paradigms generally only include one level of aversive outcome: the electrical stimulation is either delivered or omitted. As a result, the magnitude-related axiom cannot be tested." 

      (2) "....in conditioning tasks people generally learn fast, rendering relatively few trials on which the prediction is violated. As a result, there is generally little intra-individual variability in the PE responses" 

      (3) "...because of the relatively low signal to noise ratio in fMRI measures, fear extinction studies often pool across trials to compare omission-related activity between early and late extinction, which further reduces the necessary variability to properly evaluate the probability axiom" 

      These points seem to hinge on how tasks are "generally" constructed. However, there are many adaptations to learning tasks:

      (1) There is no rule that conditioning can't include different levels of aversive outcomes following different cues. In fact, their own design uses multiple cues that signal different intensities and probabilities. Saying that conditioning "generally only include one level of aversive outcome" is not an explanation for why "these paradigms are not tailored" for their research purposes. There are also several conditioning studies that have used different cues to signal different outcome probabilities. This is not uncommon, and in fact is what they use in their study, only with an instruction rather than through learning through experience, per se.

      (2) Conditioning/extinction doesn't have to occur fast. Just because people "generally learn fast" doesn't mean this has to be the case. Experiments can be designed to make learning more challenging or take longer (e.g., partial reinforcement). And there can be intra-individual differences in conditioning and extinction, especially if some cues have a lower probability of predicting the US than others. Again, because most conditioning tasks are usually constructed in a fairly simplistic manner doesn't negate the utility of learning paradigms to address PE axioms.

      (3) Many studies have tracked trial-by-trial BOLD signal in learning studies (e.g., using parametric modulation). Again, just because other studies "often pool across trials" is not an explanation for these paradigms being ill-suited to study prediction errors. Indeed, most computational models used in fMRI are predicated on analyzing data at the trial level. 

      We thank the reviewer for these remarks. The “fear conditioning and extinction paradigms” that we were referring to in this paragraph were the ones that have been used to study threat omission PE responses in previous research (e.g., Raczka et al., 2011; Thiele et al. 2021; Lange et al. 2020; Esser et al., 2021; Papalini et al., 2021; Vervliet et al. 2017). These studies have mainly used differential/multiple-cue protocols where either one (or two) CS+ and one CS- are trained in an acquisition phase and extinguished in the next phase. Thus, in these paradigms: (1) only one level of aversive US is used; (2) as safety learning develops over the course of extinction, there are relatively few omission trials during which “large” threat omission PEs can be observed (e.g. of the 24 CS+ trials that were used during extinction in Esser et al., the steepest decreases in expectancy, and thus the largest PEs, were found in the first 6 trials); and (3) there was never absolute certainty that the stimulation would no longer follow. Some of these studies have indeed estimated the threat omission PE during the extinction phase based on learning models, and have entered these estimates as parametric modulators to CS-offset regressors. This is very informative. However, the exact model that was used differed per study (e.g. Rescorla-Wagner in Raczka et al. and Thiele et al.; or a Rescorla-Wagner–Pearce-Hall hybrid model in Esser et al.). We wanted to analyze threat omission responses without commitment to a particular learning model. Thus, in order to examine how threat omission responses vary as a function of probability-related expectations, a paradigm that has multiple probability levels is recommended (e.g. Rutledge et al., 2010; Ojala et al., 2022).
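      The point that large omission PEs are confined to the first few extinction trials can be illustrated with a minimal Rescorla-Wagner simulation. This is only a sketch under stated assumptions: the learning rate (0.3) and the initial US expectancy (1.0) are arbitrary illustrative values, the function name is hypothetical, and the code is not the model fitted in any of the cited studies. Only the trial count (24) and the "first 6 trials" window are taken from the text above.

```python
def rescorla_wagner_extinction(n_trials=24, alpha=0.3, v0=1.0):
    """Simulate US expectancy and prediction errors over unreinforced trials.

    alpha (learning rate) and v0 (expectancy at the start of extinction)
    are assumed values chosen for illustration only.
    """
    v = v0
    expectancies, prediction_errors = [], []
    for _ in range(n_trials):
        outcome = 0.0                 # the US is omitted on every extinction trial
        pe = outcome - v              # negative PE: outcome better than expected
        expectancies.append(v)
        prediction_errors.append(pe)
        v = v + alpha * pe            # Rescorla-Wagner update of the expectancy
    return expectancies, prediction_errors


expectancies, prediction_errors = rescorla_wagner_extinction()

# Fraction of the summed absolute PE that falls in the first 6 of 24 trials.
early_share = (sum(abs(pe) for pe in prediction_errors[:6])
               / sum(abs(pe) for pe in prediction_errors))
print(f"share of total |PE| in the first 6 trials: {early_share:.2f}")
```

      Under these assumed parameters the absolute PE shrinks geometrically, so most of the PE signal sits in the first handful of trials, and how many trials are informative depends on the (individually varying) learning rate.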

      The reviewer rightfully pointed out that conditioning paradigms (more generally) can be tailored to fit our purposes as well. Still, when doing so, the same adaptations as we outlined above need to be considered: i.e. include different levels of US intensity; different levels of probability; and conditions with full certainty about the US (non)occurrence. In our attempt to keep the experimental design as simple and straightforward as possible, we decided to rely on instructions for this purpose, rather than to train 3 (US levels) x 5 (reinforcement levels) = 15 different CSs. It is certainly possible to train multiple CSs of varying reinforcement rates (e.g. Grings et al. 1971, Ojala et al., 2022). However, given that US expectation on each trial would primarily depend on the individual learning processes of the participants, using a conditioning task would make it more difficult to maintain experimental control over the level of US expectation elicited by each CS. As a result, this would likely require more extensive training, and thus prolong the study procedure considerably. Furthermore, even though previous studies have trained different CSs for different reinforcement rates, most of these studies have only used one level of US. Thus, in order not to complicate our task too much, we decided to rely on instructions rather than to train CSs for multiple US levels (in addition to multiple reinforcement rates).

      We have tried to clarify our reasoning in the revised version of the manuscript (see introduction, lines 100-113):  

      “The previously discussed fear conditioning and extinction studies have been invaluable for clarifying the role of the threat omission PE within a learning context. However, these studies were not tailored to create the varying intensity and probability-related conditions that are required to systematically evaluate the threat omission PE in the light of the PE axioms. First, these only included one level of aversive outcome: the electrical stimulation was either delivered or omitted; but the intensity of the stimulation was never experimentally manipulated within the same task. As a result, the magnitude-related axiom could not be tested. Second, as safety learning progressively developed over the course of extinction learning, the most informative trials to evaluate the probability axiom (i.e. the trials with the largest PE) were restricted to the first few CS+ offsets of the extinction phase, and the exact number of these informative trials likely differed across participants as a result of individually varying learning rates. This limited the experimental control and necessary variability to systematically evaluate the probability axiom. Third, because CS-US contingencies changed over the course of the task (e.g. from acquisition to extinction), there was never complete certainty about whether the US would (not) follow. This precluded a direct comparison of fully predicted outcomes. Finally, within a learning context, it remains unclear whether brain responses to the threat omission are in fact responses to the violation of expectancy itself, or whether they are the result of subsequent expectancy updating.”

      Again, the authors are free to develop their own task design that they think is best suited to address their experimental questions. For instance, if they truly believe that omission-related responses should be studied independent of updating. The question I'm still left puzzling is why the paper is so strongly framed around extinction (the word appears several times in the main body of the paper), which is a learning process, and yet the authors go out of their way to say that they can only test their hypotheses outside of a learning paradigm. 

      As we have mentioned before, the reason why we refer to extinction studies is because most evidence on threat omission PE to date comes from fear extinction paradigms.  

      The authors did address other areas of concern, to varying extents. Some of these issues were somewhat glossed over in the rebuttal letter by noting them as limitations. For example, the issue with comparing 100% stimulation to 0% stimulation, when the shock contaminates the fMRI signal. This was noted as a limitation that should be addressed in future studies, bypassing the critical point. 

      It is unclear to us what the reviewer means with “bypassing the critical point”. We argued in the manuscript that the contrast we initially specified and preregistered to study axiom 3 (fully predicted outcomes elicit equivalent activation) could not be used for this purpose, as it was confounded by the delivery of the stimulation. Because 100% trials always included the stimulation and 0% trials never included stimulation, there was no way to disentangle activations related to full predictability from activations related to the stimulation as such.

      Reviewer #3 (Recommendations For The Authors): 

      I'm not sure the new paragraph explaining why they can't use a learning task to test their hypotheses is very convincing, as I noted in my review. Again, it is not a problem to develop a new task to address their questions. They can justify why they want to use their task without describing (incorrectly in my opinion) that other tasks "generally" are constructed in a way that doesn't suit their needs. 

      For an overview of the changes we made in response to this recommendation, we refer to our reply to the public review.   

      We look forward to your reply and are happy to provide answers to any further questions or comments you may have.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study describes a new computational method for unsupervised (i.e., non-artificial intelligence) segmentation of objects in grayscale images that contain substantial noise, to differentiate object, no object, and noise. Such problems are essential in biology because they are commonly encountered in the analysis of microscope images of biological samples and have recently been addressed with artificial intelligence, especially deep neural networks. However, training artificial intelligence for specific sample images is a difficult task and not every biological laboratory can handle it. Therefore, the proposed method is particularly appealing to laboratories with little computational background. The method was shown to achieve better performance than a threshold-based method for artificial and natural test images. To demonstrate the usability, the authors applied the method to high-power confocal images of the thalamus for the identification and quantification of immunostained potassium ion channel clusters formed in the proximity of large axons in the thalamic neuropil and verified the results in comparison to electron micrographs.

      Strengths:

      The authors claim that the proposed method has higher pixel-wise accuracy than the threshold-based method when applied to gray-scale images with substantial noise.

      Since the method does not use artificial intelligence, training and testing are not necessary, which would be appealing to biologists who are not familiar with machine learning technology.

      The method does not require extensive tuning of adjustable parameters (trying different values of "Moran's order") given that the size of the object in question can be estimated in advance.

      We appreciate the positive assessment of our approach.

      Weaknesses:

      It is understood that the strength of the method is that it does not depend on artificial intelligence and therefore the authors wanted to compare the performance with another non-AI method (i.e. the threshold-based method; TBM). However, the TBM used in this work seems too naive to be fairly compared to the expensive computation of "Moran's I" used for the proposed method. To provide convincing evidence that the proposed method advances object segmentation technology and can be used practically in various fields, it should be compared to other advanced methods, including AI-based ones, as well.

      Protein localization studies revealed that protein distributions are frequently inhomogeneous in a cell. This is very common in neurons which are highly polarized cell types with distinct axo-somato-dendritic functions. Moreover, due to the nature of the cell-to-cell interactions among neurons (e.g. electrical and chemical synapses) the cell membrane includes highly variable microdomains with unique protein assemblies (i.e. clusters). Protein clusters are defined as membrane segments with higher protein densities compared to neighboring membrane regions. However, protein density can continuously change between “clusters” and “non-clusters”. As a consequence, differentiating proteins involved vs not involved in clusters is a challenging task.  Indeed, our analysis showed that the boundaries of protein clusters varied remarkably when 23 human experts delineated them.

      Despite the fact that protein clusters can only be vaguely defined, numerous studies have demonstrated the functional relevance of inhomogeneous protein distribution. Thus, there is a high relevance and need for an observer-independent, “operative” segmentation method that can be applied and compared among different conditions and specimens. The strength of the Moran’s I analysis we propose here, as pointed out by our reviewers and editors, is that it can extract the relevant signals from an image generated in different, often noisy conditions using a simple algorithm that allows quantitative characterization and identification of changes in many biological and non-biological samples.

      In AI-based analysis the ground truth is known by an observer, and using a large training set the AI learns to extract the relevant information for image segmentation. As outlined above, however, the “ground truth” cannot be unequivocally defined for protein clusters. There is no doubt that with sufficient resource investment an AI-based analysis of the same problem could be developed. In our view, however, in an average laboratory setting, generating a training set using hundreds of images examined by many experts may not be feasible. Moreover, generalization from one training set to another set of clusters, resistance to noise, or robustness to different levels of background could not be guaranteed either.

      This method was claimed to be better than the TBM when the noise level was high. Related to the above, TBMs can be used in association with various denoising methods as a preprocess. It is questionable whether the claim is still valid when compared to the methods with adequate complexity used together with denoising. Consider for example, Weigert et al. (2018) https://doi.org/10.1038/s41592-018-0216-7; or Lehtinen et al (2018) https://doi.org/10.48550/arXiv.1803.04189.

      In Weigert et al., the AI was trained with high-quality images of the same object obtained with extreme photon exposure in a confocal microscope. As delineated above, AI systems cannot be used for such purposes without training. The Lehtinen paper is unfortunately no longer available at this DOI.

      We must emphasize that in our work we did not intend to compare the image segmentation method based on local Moran’s I with all other available segmentation techniques. Rather, we wanted to demonstrate a straightforward method of grouping pixels with similar intensities and in spatial proximity, which does not require a priori knowledge of the objects. We used TBM to benchmark the method. We agree that with more advanced TBM methods the difference between Moran’s and TBM might have been smaller. The critical point here is, however, that even with the most advanced TBM an artificial threshold needs to be defined. The optimal threshold may change from sample to sample depending on the experimental conditions, which makes quantification questionable. Moran’s method overcomes this problem and allows more objective segmentation of images even if the exact conditions (background labeling, noise, intensity etc.) are not identical among the samples.
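      To make the idea of "grouping pixels with similar intensities and in spatial proximity" concrete, the sketch below computes a textbook local Moran's I map for a grayscale image with NumPy. It is an illustration under stated assumptions, not the authors' MATLAB implementation: the function names are hypothetical, the neighbourhood is assumed to be a square window of half-width `order` with equal, row-standardised weights, pixels outside the image are treated as having the mean intensity, and no significance testing is performed.

```python
import numpy as np


def local_morans_i(image, order=1):
    """Local Moran's I per pixel for a square (2*order+1)^2 neighbourhood.

    Assumptions (illustrative): equal weights for all neighbours, the centre
    pixel excluded, and out-of-image neighbours set to the image mean.
    """
    x = np.asarray(image, dtype=float)
    z = x - x.mean()                      # deviations from the global mean
    m2 = (z ** 2).mean()                  # global variance used for scaling
    if m2 == 0:
        return np.zeros_like(z)           # constant image: no spatial structure

    h, w = z.shape
    padded = np.pad(z, order, mode="constant", constant_values=0.0)
    neighbour_sum = np.zeros_like(z)
    size = 2 * order + 1
    for dy in range(size):
        for dx in range(size):
            if dy == order and dx == order:
                continue                  # skip the centre pixel itself
            neighbour_sum += padded[dy:dy + h, dx:dx + w]

    spatial_lag = neighbour_sum / (size * size - 1)   # mean deviation of neighbours
    return z * spatial_lag / m2


def bright_cluster_mask(image, order=1):
    """Pixels that are brighter than average AND sit among similar pixels."""
    x = np.asarray(image, dtype=float)
    return (local_morans_i(x, order) > 0) & (x > x.mean())


# Toy example: a 4x4 bright patch on a dark background.
demo = np.zeros((12, 12))
demo[4:8, 4:8] = 1.0
demo_mask = bright_cluster_mask(demo, order=1)
```

      Note that local Moran's I is positive both for bright pixels among bright neighbours and for dark pixels among dark neighbours, which is why the sketch additionally requires the pixel itself to lie above the mean; an isolated bright pixel surrounded by background receives a negative value and is not selected. How the published method turns the Moran's I map into a final segmentation may differ from this simple rule.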

      The computational complexity of the method, determined by the convolution matrix size (Moran's order), linearly increases as the object size increases (Fig. S2b). Given that the convolution must be run separately for each pixel, the computation seems quite demanding for scale-up, e.g. when the method is applied for 3D image volumes. It will be helpful if the requirement for computer resources and time is provided.

      Here we provide the required data concerning the hardware and the computational time:

      Hardware used for performing the analysis:

      Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz, 2594 MHz, 4-core CPU, 64GB RAM, NVIDIA GeForce GTX 1080 graphics card.

      MATLAB R2021b software was used for implementation.

      Author response table 1.

      Computation times:

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by David et al. describes a novel image segmentation method, implementing Local Moran's method, which determines whether the value of a datapoint or a pixel is randomly distributed among all values, in differentiating pixel clusters from the background noise. The study includes several proof-of-concept analyses to validate the power of the new approach, revealing that implementation of Local Moran's method in image segmentation is superior to threshold-based segmentation methods commonly used in analyzing confocal images in neuroanatomical studies.

      Strengths:

      Several proof-of-concept experiments are performed to confirm the sensitivity and validity of the proposed method. Using composed images with varying levels of background noise and analyzing them in parallel with the Local Moran's or a Threshold-Based Method (TBM), the study is able to compare these approaches directly and reveal their relative power in isolating clustered pixels.     

      Similarly, dual immuno-electron microscopy was used to test the biological relevance of a colocalization that was revealed by Local Moran's segmentation approach on dual-fluorescent labeled tissue using immuno-markers of the axon terminal and a membrane-protein (Figure 5). The EM revealed that the two markers were present in terminals and their post-synaptic partners, respectively. This is a strong approach to verify the validity of the new approach for determining object-based colocalization in fluorescent microscopy. 

      The methods section is clear in explaining the rationale and the steps of the new method (however, see the weaknesses section). Figures are appropriate and effective in illustrating the methods and the results of the study. The writing is clear; the references are appropriate and useful.

      We are grateful for the constructive assessment of our results.

      Weaknesses:

      While the steps of the mathematical calculations to implement Local Moran's principles for analyzing high-resolution images are clearly written, the manuscript currently does not provide a computation tool that could facilitate easy implementation of the method by other researchers. Without a user-friendly tool, such as an ImageJ plugin or a code, the use of the method developed by David et al by other investigators may remain limited.

      The code for the analysis is now available online as a user-friendly MATLAB script at: https://github.com/dcsabaCD225/Moran_Matlab/blob/main/moran_local.m

      Recommendations for the authors:

      Summary of reviews:

      Both reviewers acknowledge the potential significance and practicality of the newly proposed image segmentation method. This method uses Local Moran's principles, offering an advantage over traditional intensity thresholding approaches by providing more sensitivity, particularly in reducing background noise and preserving biologically relevant pixels.

      Strengths Highlighted:

      • The proposed method can provide more accurate results, especially for grayscale images with significant noise.

      • The method is not dependent on artificial intelligence, making it appealing for researchers with minimal computational background.    

      • The approach can operate without the need for extensive tuning, given that the size of the object is known.

      • Several proof-of-concept experiments were carried out, revealing the effectiveness of the method in comparison with the threshold-based segmentation methods.

      • The manuscript is clear in terms of methodology, and the results are supported by effective illustrations and references.

      Weaknesses Noted:

      • The study lacked a comparative analysis with advanced segmentation methods, especially those that employ artificial intelligence.

      See our response above to the same question of Reviewer 1.

      • There are concerns about computational complexity, especially when dealing with larger data sets or 3D image volumes.

      See our response about the calculations of computation times above to the similar question of Reviewer 1.

      • Both reviewers noted the absence of a data/code availability statement in the manuscript, which might restrict the method's adoption by other researchers.

      The code availability is provided now.

      • Reviewer 2 suggested that some results, particularly related to Kv4.2 in the thalamus, might be better presented in a separate study due to their significance.

      We thank our reviewers for this suggestion. We carefully evaluated the pros and cons of publishing the Kv4.2 data separately. We finally decided to keep the segmentation and experimental data together for the following reason: we believe that the ultrastructural localization provides strong experimental proof for the relevance of our novel segmentation method. To make the potassium channel data more visible, we added a subtitle to the title. In this way, we think scientists interested in the imaging method as well as those interested in the neurobiology will both find and cite the paper. The new title now reads:

      “An image segmentation method based on the spatial correlation coefficient of Local Moran’s I - identification of A-type potassium channel clusters in the thalamus.”

      Reviewer Recommendations:

      (1) Provide details about the data and program code availability.

      See our response above.

      (2) Offer practical recommendations and provide clarity on software packages and coding for the proposed method to enhance its adoption.

      Done.

      (3) Consider presenting the findings about Kv4.2 in the thalamus separately as they hold significant importance on their own.

      See our response above.

      Given the reviews, the proposed image segmentation method presents a promising advancement in the domain of image analysis. The technique offers tangible benefits, especially for researchers dealing with biological microscopy data. However, for this method to see a broader application, it's imperative to provide clearer practical guidance and make data or code easily accessible. Additionally, while the findings regarding Kv4.2 in the thalamus are intriguing, they might achieve more impact if detailed in a dedicated paper.

      Reviewer #1 (Recommendations For The Authors):

      The availability of data or program code was not stated in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) While the principles of the method are explained clearly in a step-by-step fashion in the Methods section, the practical aspects of running sequential computations over a large matrix of pixel values are not well described. It would be very useful if the authors could provide recommendations on how to set the data structure and clarify which software and programming package for Local Moran's analysis they used. In addition, providing the code for the sequential implementation described in the Methods section would facilitate the adoption of the method by other researchers, and thus, the impact of the study. Currently, there is no data or code availability statement included in the manuscript.

      See our response above.
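
      As an illustration of the data-structure question raised above, the following is a minimal sketch of Local Moran's I computed over a pixel grid. It is not the authors' implementation: the 8-connected neighbourhood, the row-standardised weights, the handling of edge pixels, and the function name `local_morans_i` are all assumptions made for this example. For large images an array library would normally replace the nested loops.

```python
def local_morans_i(image):
    """Return Local Moran's I for every pixel of a 2D list of intensities.

    Assumptions for this sketch (not taken from the manuscript):
    8-connected neighbours, row-standardised weights, and edge pixels
    that simply have fewer neighbours.
    """
    rows, cols = len(image), len(image[0])
    n = rows * cols
    mean = sum(sum(row) for row in image) / n
    # Deviation of each pixel from the global mean.
    z = [[value - mean for value in row] for row in image]
    # Second moment (population variance) used to scale the statistic.
    m2 = sum(v * v for row in z for v in row) / n
    result = [[0.0] * cols for _ in range(rows)]
    if m2 == 0:
        return result  # A flat image carries no spatial structure.
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [z[r + dr][c + dc] for dr, dc in offsets
                          if 0 <= r + dr < rows and 0 <= c + dc < cols]
            if neighbours:
                # Spatial lag: mean deviation of the neighbouring pixels.
                lag = sum(neighbours) / len(neighbours)
                result[r][c] = z[r][c] * lag / m2
    return result


# Toy image: a 2x2 bright cluster on a dark 6x6 background.
image = [[0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        image[r][c] = 1
scores = local_morans_i(image)
```

      In this toy image, pixels inside the bright cluster receive positive values (high intensity surrounded by high intensity), whereas background pixels that touch the cluster receive negative values. Thresholding such a map is one possible route to a segmentation mask.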

      (2) Figure 4 illustrates an experiment in which transmission electron microscopy and freeze-fracture replica labeling approaches were used to demonstrate that a potassium channel marker, Kv4.2, was selective to synapses forming on larger caliber dendrites in the thalamus. As impressive as the EM approaches utilized in this figure are, the results of this experiment have a somewhat tangential bearing on the segmentation method that is the focus of this study. In fact, the experiments illustrated in Figure 5, dual immuno-EM, are more than sufficient to confirm what the dual-confocal imaging coupled with Local Moran's segmentation analysis reveals. Furthermore, the authors' findings about the localization and selectivity of Kv4.2 in the thalamus are too important and exciting to bury in a paper focusing on the methodology. Those results may have a wider impact if they are presented and discussed in a separate experimental paper.

      See our response above

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #3 (Public Review):

      The iron manipulation experiments are in the whole animal and it is likely that this affects general feeding behaviour, which is known to affect NB exit from quiescence and proliferative capacity. The loss of ferritin in the gut and iron chelators enhancing the NB phenotype are used as evidence that glia provide iron to NB to support their number and proliferation. Since the loss of NB is a phenotype that could result from many possible underlying causes (including low nutrition), this specific conclusion is one of many possibilities.

      We investigated the feeding behavior of flies using Brilliant Blue (Sigma, 861146)[1]. The amount of dye in the fly body was similar between the control group and the BPS group, suggesting that BPS had little effect on feeding behavior (Figure 3—figure supplement 1A).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There was a gap between the Pros nuclear localization and downstream targets of ferritin, particularly NADH dehydrogenase and biosynthesis. Could overexpression of Ndi1 restore Pros localization in NBs?

      A ferritin defect lowers the iron level, which leads to cell cycle arrest of NBs via ATP shortage, and cell cycle arrest of NBs probably results in NB differentiation[2, 3]. We have added the requested experiment in Figure 5—figure supplement 2. The result showed that overexpression of Ndi1 could significantly restore Pros localization in NBs.

      The abstract requires revision to cover the major findings of the manuscript, particularly the second half.

      We revised the abstract to add more major findings of the manuscript in the second half as follows:

      “Abstract

      Stem cell niche is critical for regulating the behavior of stem cells. Drosophila neural stem cells (Neuroblasts, NBs) are encased by glial niche cells closely, but it still remains unclear whether glial niche cells can regulate the self-renewal and differentiation of NBs. Here we show that ferritin produced by glia, cooperates with Zip13 to transport iron into NBs for the energy production, which is essential to the self-renewal and proliferation of NBs. The knockdown of glial ferritin encoding genes causes energy shortage in NBs via downregulating aconitase activity and NAD+ level, which leads to the low proliferation and premature differentiation of NBs mediated by Prospero entering nuclei. More importantly, ferritin is a potential target for tumor suppression. In addition, the level of glial ferritin production is affected by the status of NBs, establishing a bicellular iron homeostasis. In this study, we demonstrate that glial cells are indispensable to maintain the self-renewal of NBs, unveiling a novel role of the NB glial niche during brain development.”

      In Figure 2B Mira appeared to be nuclear in NBs, which is inconsistent with its normal localization. Was it Dpn by mistake?

      We confirmed that the staining in Figure 2B is Mira. Moreover, we provide a magnified picture in Figure 2B’, showing that Mira mainly localizes to the cortex or the cytoplasm, as previously reported.

      Figure 2C, Fer1HCH-GFP/mCherry localization was non-uniform in the NBs revealing 1-2 regions devoid of protein localization potentially corresponding to the nucleus and Mira crescent enrichment. It is important to co-label the nucleus in these cells and discuss the intracellular localization pattern of Ferritin.

      We have revised the picture in Figure 2C to include the nuclear marker DAPI. The result showed that Fer1HCH-GFP/Fer2LCH-mCherry did not co-localize with DAPI, indicating that Drosophila ferritin is predominantly distributed in the cytosol[4, 5]. Regarding the reviewer's concern, the GFP/mCherry signal in NBs came from ferritin overexpressed in glia, which probably accounts for the non-uniform signal.

      In Figure 3-figure supplement 3F, glial cells in Fer1HCH RNAi appeared to be smaller in size. This should be quantified. Given the significance of ferritin in cortex glial cells, examining the morphology of cortex glial cells is essential.

      In Figure 3—figure supplement 3F, we did not label single glial cells, so it was difficult to determine whether their size had changed. However, the chamber formed by the cellular processes of glial cells appears to become smaller in Fer1HCH RNAi. The glial chamber undergoes remodeling during neurogenesis in response to NB signals, enclosing the NB and its progeny[6]. Thus, the size of the glial chamber is regulated by the size of the NB lineage. In our study, the ferritin defect leads to low proliferation and therefore a smaller lineage for each NB, which likely makes the chamber smaller.

      Since the authors showed that the reduced NB number was not due to apoptosis, a time-course experiment for glial ferritin KD is recommended to identify the earliest stage when the phenotype in NB number /proliferation manifests during larval brain development.

      We observed brains at different larval stages upon glial ferritin KD. The result showed that NB proliferation decreased significantly, whereas NB number declined only slightly, at the second-instar larval stage (Figure 5—figure supplement 1E and F), suggesting that the brain defect caused by glial ferritin KD first manifests at the second-instar larval stage.

      Transcriptome analysis on ferritin glial KD identified genes in mitochondrial functions, while the in vivo EM data suggested no defects in mitochondria morphology. A short discussion on the inconsistency is required.

      When examining mitochondrial morphology in the in vivo EM data, we focused on whether cristae were visible, a criterion used to determine whether ferroptosis occurs[7]. It is possible that other details of mitochondrial morphology changed, but we did not examine them. To describe this result more accurately, we replaced “However, our observation revealed no discernible defects in the mitochondria of NBs after glial ferritin knockdown” with “However, our result showed that the mitochondrial double membrane and cristae were clearly visible whether in the control group or glial ferritin knockdown group, which suggested that ferroptosis was not the main cause of NB loss upon glial ferritin knockdown” in lines 207-209.

      The statement “we found no obvious defects of brain at the first-instar larval stage (0-4 hours after larval hatching) when knocking down glial ferritin (Figure 5-figure supplement 1C).” lacks quantification of NB number and proliferation, making it challenging to conclude.

      We have provided the quantification of NB number and proliferation rate of the first-instar larval stage in Figure 5—figure supplement 1C and D. The data showed that there is no significant change in NB number and proliferation rate when knocking down ferritin, suggesting that no brain defect manifests at the first-instar larval stage.

      A wild-type control is necessary for Figure 6A-C as a reference for normal brain sizes.

      We have added Insc>mCherry RNAi as a reference in Figure 6A-D, which shows that the brain of the tumor model is larger than a normal brain. Moreover, we moved the brat RNAi data from Figure 6A-D to Figure 6—figure supplement 1A-D for a better layout.

      In Figures 6B, D, “Tumor size” should be corrected to “Larval brain volume”.

      Here, we used ImageJ to measure the brain area, rather than 3D brain volume, to assess the severity of the tumor. We therefore think “Larval brain size” is more appropriate than “Larval brain volume”. Thus, we have corrected “Tumor size” to “Larval brain size” in Figure 6B and D and in Figure 6—figure supplement 1B and D.

      Considering that asymmetric division defects in NBs may lead to premature differentiation, it is advisable to explore the potential involvement of ferritin in asymmetric division.

      aPKC is a classic marker for detecting asymmetric division defects in NBs. We performed aPKC staining and found that, judging by the position of the daughter cell, it formed a crescent at the apical cortex in both control and glial ferritin knockdown brains (Figure 5—figure supplement 3A). This result indicated that there was no obvious asymmetric division defect after glial ferritin knockdown.

      In the statement "Secondly, we examined the apoptosis in glial cells via Caspase-3 or TUNEL staining, and found the apoptotic signal remained unchanged after glial ferritin knockdown (Figure 3-figure supplement 3A-D).", replace "the apoptosis in glial cells" with "the apoptosis in larval brain cells".

      We have replaced "the apoptosis in glial cells" with "the apoptosis in larval brain cells" in line 216.

      Include a discussion on the involvement of ferritin in mammalian brain development and address the limitations associated with considering ferritin as a potential target for tumor suppression.

      We have added a discussion of ferritin in mammalian brain development in lines 428-430 and of the limitations of ferritin as a target for tumor suppression in lines 441-444.

      Indicate Insc-GAL4 as BDSC#8751, even if obtained from another source. Additionally, provide information on the extensively used DsRed fly stock used in this study within the methods section.

      We have provided the stock information for Insc-GAL4 and DsRed in lines 673-674.

      Reviewer #2 (Recommendations For The Authors):

      Major points:

      The number of NBs differs a lot between experiments. For example, in Fig 1B and 1K controls present less than 100 NBs whereas in Figure 1 Supplementary 2B it can be seen that controls have more than 150. Then, depending on which control you compare the number of NBs in flies silencing Fer1HCH or Fer2LCH, the results might change. The authors should explain this.

      Figure 1 Supplementary 2B (Figure 1 Supplementary 3B in the revised version) shows NB number in the VNC region, whereas Fig 1B and 1K show NB number in the CB region. We first described the general phenotype by showing NB number in the CB and the VNC separately (Fig 1 and Fig 1-Supplementary 1 and 3 in the revised version), and the NB number is consistent within each region. Thereafter, we focused on NB number in the CB for convenience.

      This reviewer encourages the authors to use better Gal4 lines to describe the expression patterns of ferritins and Zip13 in the developing brain. On the one hand, the authors do not state which lines they are using (including supplementary table). On the other hand, new Trojan GAL4 (or at least InSite GAL4) lines are a much better tool than classic enhancer trap lines. The authors should perform this experiment.

      All stock sources and numbers are documented in Table 2. The ferritin GAL4 and Zip13 GAL4 lines in this study are InSite GAL4 lines. In addition, we used another Fer2LCH enhancer-trap GAL4 (DGRC104255) to verify our result, which is provided in Figure 2—figure supplement 1. Our data showed that DsRed driven by Fer2LCH-GAL4 co-localized with the glial nuclear protein Repo rather than the NB nuclear protein Dpn, consistent with the result for Fer1HCH/Fer2LCH GAL4. We will also try to obtain the Trojan GAL4 lines (Fer1HCH/Fer2LCH GAL4 and Zip13 GAL4) and validate this result in the future.

      The authors exclude very rapidly the possibility of ferroptosis based only on some mitochondrial morphological features without analysing the other hallmarks of this iron-driven cell death. The authors should at least measure lipid peroxidation levels in their experimental scenario, either with a kit to quantify by-products of lipid peroxidation such as malondialdehyde (MDA) or using an anti-4-HNE antibody.

      We combined multiple experiments to exclude the possibility of ferroptosis. Firstly, ferroptosis can be terminated by an iron chelator. We fed flies an iron chelator upon glial ferritin knockdown, but NB number and proliferation were not restored, which suggested that ferroptosis was probably not the cause of the NB loss induced by glial ferritin knockdown (Figure 3B and C). Secondly, Zip13 transports iron into the secretory pathway and further out of the cells in the Drosophila gut[8]. Our data showed that knocking down the iron transporter Zip13 in glia resulted in a decline in NB number and proliferation, consistent with the phenotype upon glial ferritin knockdown (Figure 3E-G). More importantly, simultaneous knockdown of Zip13 and ferritin aggravated the phenotype in NB number and proliferation (Figure 3E-G). These results suggested that the phenotype was induced by iron deficiency in NBs, which excluded iron overload or ferroptosis as the main cause of NB loss upon glial ferritin knockdown. Finally, we examined the mitochondrial double membrane and cristae, whose integrity is a critical hallmark in assessing ferroptosis, but found no significant damage (Figure 3-figure supplement 2E and F).

      In addition, we have added the 4-HNE determination in Figure 3—figure supplement 2G and H. The 4-HNE level did not change significantly, suggesting that lipid peroxidation was stable, which further argues against ferroptosis as the cause of NB loss upon glial ferritin knockdown.

      All of the above results together indicate that ferroptosis is not the cause of NB loss after ferritin knockdown.

      A major flaw of the manuscript is related to the chapter Glial ferritin defects result in impaired Fe-S cluster activity and ATP production and the results displayed in Figure 4. The authors talk about the importance of FeS clusters for energy production in the mitochondria. Surprisingly, the authors do not analyse the genes involved in this process such as but they present the interaction with the cytosolic FeS machinery that has a role in some extramitochondrial proteins but no role in the synthesis of FeS clusters incorporated in the enzymes of the TCA cycle and the respiratory chain. The authors should repeat the experiments incorporating the genes Nfs1 (CG12264), ISCU (CG9836), ISD11 (CG3717), and fh (CG8971) or remove (or at least rewrite) this entire section.

      Thanks for this constructive advice and we have revised this in Figure 4B and C. We repeated the experiment with blocking mitochondrial Fe-S cluster biosynthesis by knocking down Nfs1 (CG12264), ISCU(CG9836), ISD11 (CG3717), and fh (CG8971), respectively. Nfs1 knockdown in NB led to a low proliferation, which was consistent with CIA knockdown. However, we did not observe the obvious brain defect in ISCU(CG9836), ISD11 (CG3717), and fh (CG8971) knockdown in NB. Our interpretation of these results is that Nfs1 probably is a necessary core component in Fe-S cluster assembly while others are dispensable[9].

      The presence and aim of the mouse model is unclear to this reviewer. On the one hand, it is not used to corroborate the fly findings regarding iron needs from neuroblasts. On the other hand, and without further explanation, the authors migrate from a fly tumor model based on modifying all neuroblasts to a mammalian model based exclusively on a glioma. The authors should clarify those issues.

      Although iron transporters probably differ between Drosophila and mammals, the function of iron as an essential nutrient for cell growth and proliferation is conserved from Drosophila to mammals. The fly data suggested that iron is critical for brain tumor growth, and we therefore tested this in a mammalian model. Glioma is the most common form of central nervous system neoplasm that originates from neuroglial stem or progenitor cells[10]. Therefore, we tested the effect of the iron chelator DFP on glioma in mice and found that DFP could suppress glioma growth and prolong the survival of tumor-bearing mice.

      Minor points

      Although they refer to adult flies, the authors did not include, either in the introduction or in the discussion, existing literature about expression of ferritins in glia or alterations of iron metabolism in fly glia cells (PMID: 21440626 and 25841783, respectively) or usage of the iron chelator DFP in Drosophila (PMID: 23542074). The authors should check these manuscripts and consider the possibility of incorporating them into their manuscript.

      Thank you for the reminder. We have incorporated all recommended papers into our manuscript in lines 65-67 and 168.

      The number of experiments in each figure is missing.

      All experiments were repeated at least three times, and we have stated this in the Quantifications and Statistical Analysis section of Materials and methods.

      If graphs are expressed as mean +/- sem, it is difficult to understand the significance stated by the authors in Figure 2E.

      We apologize for this mistake and have revised this in Quantifications and Statistical Analysis. All statistical results were presented as means ± SD.

      When authors measure aconitase activity, are they measuring all (cytosolic and mitochondrial) or only one of them? This is important to better understand the experiments done by the authors to describe any mitochondrial contribution (see above in major points).

      In this experiment, we measured total aconitase activity. We also tried to measure mitochondrial aconitase activity, but the attempt failed, possibly because of the low biomass of the tissue sample.

      In this line, why do controls in aconitase and atp lack an error bar? Are the statistical tests applied the correct ones? It is not the same to have paired or unpaired observations.

      This is a consequence of normalization. We repeated these experiments at least three times, in different weeks, because the whole process (collecting brains, protein determination, and ATP or aconitase determination) was time-consuming and laborious. The efficiency of the aconitase and ATP kits also changed over time, so we could not keep the experimental conditions identical across batches. Therefore, we normalized within each batch to present a more accurate result: the control group was set to 1 by dividing it by itself, and the other groups were divided by the control. This normalization was repeated for each of the three replicates, which is why there is no error bar in the control group. We think it is appropriate to apply ANOVA with a Bonferroni test to the three groups.
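
      The per-batch normalization described above can be sketched as follows. This is only an illustration of the arithmetic, not the authors' analysis script: the group names, the function names, and all measurement values are hypothetical. It shows why the control group has no spread after each batch is divided by its own control.

```python
from statistics import mean, stdev


def normalise_to_control(batches, control="control"):
    """Divide every group in a batch by that batch's control value."""
    normalised = {}
    for batch in batches:
        reference = batch[control]
        for group, value in batch.items():
            normalised.setdefault(group, []).append(value / reference)
    return normalised


def summarise(normalised):
    """Return (mean, sample SD) per group across batches."""
    return {group: (mean(values), stdev(values))
            for group, values in normalised.items()}


# Hypothetical raw readings from three independent batches (arbitrary units).
batches = [
    {"control": 2.0, "knockdown": 1.0, "rescue": 1.8},
    {"control": 4.0, "knockdown": 2.4, "rescue": 3.6},
    {"control": 1.0, "knockdown": 0.55, "rescue": 0.95},
]
summary = summarise(normalise_to_control(batches))
```

      Because the control is divided by itself in every batch, its normalized value is always 1 and its SD is 0, while batch-to-batch variation remains visible in the other groups.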

      In some cases, further rescue experiments would be appreciated. For example, expression of Ndi restores control NAD+ levels or number of NBs, it would be interesting to know if this is accompanied by restoring mitochondrial integrity and its ability to produce ATP.

      We have determined ATP production after overexpressing Ndi1 and provided this result in Figure 4—figure supplement 1B. The data showed that expression of Ndi1 could restore ATP production upon glial Fer2LCH knockdown, which was consistent with our conclusion.

      Lines 293-299 on page 7 are difficult to understand.

      According to our results above, the decrease in NB number and proliferation upon glial ferritin knockdown (KD) was caused by energy deficiency. As shown in the schematic diagram (Author response image 1), “T” represents the total energy used for NB maintenance and proliferation, “N” indicates the energy for maintaining NB number, and “P” indicates the energy for NB proliferation, so that “T” equals “N” plus “P”. When ferritin was knocked down in glia, “T”, “N” and “P” all declined in “Ferritin KD” compared to “wildtype (WT)”. Knockdown of pros can prevent the differentiation of NBs, but it cannot supply energy to NBs, which probably explains the rescue of NB number but not proliferation. Specifically, NB number increased significantly in “Ferritin KD Pros KD” compared to “Ferritin KD”, so more energy was consumed for NB maintenance in “Ferritin KD Pros KD”. As shown in the schematic diagram, “T” did not change between “Ferritin KD Pros KD” and “Ferritin KD”, whereas “N” increased in “Ferritin KD Pros KD” compared to “Ferritin KD”. Thus, “P” decreased, meaning that less energy remained for proliferation, which led to the failure to rescue NB proliferation. Indeed, the level of proliferation in “Ferritin KD Pros KD” seemed even lower than in “Ferritin KD”.

      Author response image 1.

      The schematic diagram of relationship between energy and NB function in different groups. “T” represents total energy for NB maintenance and proliferation. “N” represents the energy for NB maintenance. “P” represents the energy for NB proliferation. T=N+P 

      Line 601 should indicate that Tables 2 and 3 are part of the supplementary material.

      We have revised this in line 678.

      Figure 4-supplement 1. Only validation of 2 genes from a RNAseq seems too little.

      Because of the low biomass of the fly brain, we dissected hundreds of brains to sort NBs, which is difficult and laborious work. Most of the NBs were used for RNA-seq, so only a small amount of sample was left for validation, which was not enough to test more genes.

      Figure 6E, the authors indicate that 10 mg/ml DFP injection could significantly prolong the survival time. Which increase in % is produced by DFP?

      We have provided the bar graph in Author response image 2. The increase is about 16.67% by DFP injection.

      Author response image 2.

      The bar graph of survival time of mice treated with DFP. (The unpaired two-sided Student’s t test was employed to assess statistical significance. Statistical results were presented as means ± SD. n=7,6; *: p<0.05)

      Reviewer #3 (Recommendations For The Authors):

      As I read the initial results that built the story (glia make ferritin>release it> NBs take them up>use it for TCA and ETC) I kept thinking about what it meant for NBs to be 'lost'. This led me to consider alternate possibilities that the results might point to, other than the ones the authors were suggesting. It was only in Figure 5 that the authors ruled out some of those possibilities. I would suggest that they first illustrate how NBs are lost upon glial ferritin loss of function before they delve into the mechanism. This would also be a place to similarly address that glial numbers and general morphology are unchanged upon ferritin loss.

      This recommendation provides a valuable guideline to build this story especially for researchers who are interested in neural stem cell studies. Actually, we tried this logic to present our study but found that there are several gaps in the middle of the manuscript, such as the relationship between glial ferritin and Pros localization in NB, so that the whole story cannot be fluently presented. Therefore, we decided to present this study in the current way.

      More details of the screen would be useful to know. How many lines did they screen, what was the assay? This is not mentioned anywhere in the text.

      We have added this information to the Screen section of Materials and methods. We screened about 200 lines targeting components of classical signaling pathways, genes highly expressed in glial cells, or genes encoding secretory proteins. UAS-RNAi lines were crossed with repo-Gal4, and third-instar F1 larvae were dissected. The brains were immunostained for Dpn and PH3 and then examined under a confocal microscope.

      Many graphs seem to be repeated in the main figures and the supplementary data. This is unnecessary, or at least should be mentioned.

      We appreciate your kind reminder. However, we carefully went through all the figures and did not find the repeated graphs, though some of them look similar.

      The authors mention that they tested which glial subtypes ferritin is needed in, but don't show the data. Could they please show the data? Same with the other iron transport/storage/regulation. Also, in both this and later sections, the authors could mention which Gal4 was used to label what cell types. The assumption is that the reader will know this information.

      We have added the result of ferritin knockdown in glial subpopulations in Figure 1—figure supplement 2. However, given the number of iron-related genes, we did not take images for them; the results are recorded in Table 3.

      For all their images showing colocalisation, magnified, single-colour images shown in grayscale will be useful. For example, without the magnification, it is not possible to see the NB expression of the protein trap line in Figure 2B. A magnified crop of a few NBs (not a single one like in 2C) would be more useful.

      We have provided Figure 2A’, B’, D’ and Figure 3D’ as suggested.

      There are a lot of very specific assays used to detect ROS, NAD, aconitase activity, among others. It would be nice to have a brief but clear description of how they work in the main text. I found myself having to refer to other sources to understand them. (I believe SoNAR should be attributed to Zhao et al 206 and not Bonnay et al 2020.)

      We have added a brief description about ROS, aconitase activity, NAD in line 198-199, 229-231, and 269 as suggested.

      I did not understand the normalisation done with respect to SoNAR. Is this standard practice? Is the assumption that 'overall protein levels will be higher in slowly proliferating NBs' reasonable? This is why they state the need to normalise.

      The SoNar normalization is not standard practice. However, we think that our normalization of SoNar is reasonable. According to our results, the expression levels of Dpn and Mira seemed higher upon glial ferritin knockdown, so we speculated that some proteins accumulate in slowly proliferating NBs. We therefore used Insc-GAL4 to drive DsRed as an indicator of Insc expression and found that DsRed rose after glial ferritin knockdown, suggesting that Insc expression was indeed increased. Therefore, we had to normalize SoNar driven by Insc-GAL4 to DsRed driven by Insc-GAL4, which eliminates the effect of increased Insc expression upon glial ferritin knockdown.

      FAC is mentioned as a chelator? But the authors seem to use it oppositely. Is there an error?

      FAC is a type of iron salt, which is used to supply iron. We have also indicated that in line 156 according to your advice. 

      The lack of any cell death in the L3 brain surprised me. There should be plenty of hemilineages that die, as do many NBs, particularly in the abdominal segments. Is the stain working? Related to this, P35 is not the best method for rescuing cell death. H99 might be a better way to go.

      We were also surprised by this result and repeated the experiment several times with both negative and positive controls. Moreover, we used TUNEL to validate this result and obtained the same outcome. We will try to use H99 to rescue NB loss in the future, because it first needs to be integrated and recombined with our current genetic tools.

      It would be nice to see the aconitase activity signal as opposed to just the quantification.

      This method can only determine the absorbance for indicating aconitase activity, so our result is just the quantification.

      Glia are born after NBs are specified. In fact, they arise from NBs (and glioblasts). So, it's unlikely that the knockdown of ferritin in glia can at all affect initial NB specification.

      We completely agree with this statement.

      The section on tumor suppression seems out of place. The fly data on which the authors base this as an angle to chase is weak. Dividing cells will be impaired if they have inadequate energy production. As a therapeutic, this will affect every cell in the body. I'm not sure that cancer therapeutics is pursuing such broadly acting lines of therapies anymore.

      Our data suggested that iron/ferritin is more critical for highly proliferative cells. Tumor cells have a high expression of TfR (Transferrin Receptor)[11], which can bind both transferrin and ferritin[12], and ferritin specifically targets tumor cells[11]. Thus, we think iron/ferritin is essential for tumor cells. If an appropriate dose of an iron/ferritin inhibitor can be found that suppresses tumor growth while maintaining normal cell growth, iron/ferritin might be an effective target for tumor treatment.

      The feedback from NB to glial ferritin is also weak data. The increased cell numbers (of unknown identity) could well be contributing to the increase in ferritin. I would omit the last two sections from the MS.

      In brat RNAi and numb RNAi, increased cells are NB-like cells, which cannot undergo further differentiation and are not expected to produce ferritin. More importantly, we used Repo (glia marker) as the reference and quantified the ratio of ferritin level to Repo level, which can exclude the possibility that increased glial cells lead to the increase in ferritin.

      References

      (1) Tanimura T, Isono K, Takamura T, et al. Genetic Dimorphism in the Taste Sensitivity to Trehalose in Drosophila-Melanogaster. J Comp Physiol, 1982,147(4):433-7

      (2) Myster DL, Duronio RJ. Cell cycle: To differentiate or not to differentiate? Current Biology, 2000,10(8):R302-R4

      (3) Dalton S. Linking the Cell Cycle to Cell Fate Decisions. Trends in Cell Biology, 2015,25(10):592-600

      (4) Nichol H, Law JH, Winzerling JJ. Iron metabolism in insects. Annu Rev Entomol, 2002,47:535-59

      (5) Pham DQ, Winzerling JJ. Insect ferritins: Typical or atypical? Biochim Biophys Acta, 2010,1800(8):824-33

      (6) Speder P, Brand AH. Systemic and local cues drive neural stem cell niche remodelling during neurogenesis in Drosophila. Elife, 2018,7

      (7) Mumbauer S, Pascual J, Kolotuev I, et al. Ferritin heavy chain protects the developing wing from reactive oxygen species and ferroptosis. PLoS Genet, 2019,15(9):e1008396

      (8) Xiao G, Wan Z, Fan Q, et al. The metal transporter ZIP13 supplies iron into the secretory pathway in Drosophila melanogaster. Elife, 2014,3:e03191

      (9) Marelja Z, Leimkühler S, Missirlis F. Iron Sulfur and Molybdenum Cofactor Enzymes Regulate the Drosophila Life Cycle by Controlling Cell Metabolism. Front Physiol, 2018,9

      (10) Morgan LL. The epidemiology of glioma in adults: a "state of the science" review. Neuro-Oncology, 2015,17(4):623-4

      (11) Fan K, Cao C, Pan Y, et al. Magnetoferritin nanoparticles for targeting and visualizing tumour tissues. Nat Nanotechnol, 2012,7(7):459-64

      (12) Li L, Fang CJ, Ryan JC, et al. Binding and uptake of H-ferritin are mediated by human transferrin receptor-1. Proc Natl Acad Sci U S A, 2010,107(8):3505-10

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study sought to reveal the potential roles of m6A RNA methylation in gene dosage regulatory mechanisms, particularly in the context of aneuploid genomes in Drosophila. Specifically, this work looked at the relationships between the expression of m6A regulatory factors, RNA methylation status, classical and inverse dosage effects, and dosage compensation. Using RNA sequencing and m6A mapping experiments, an in-depth analysis was performed to reveal changes in m6A status and expression changes across multiple aneuploid Drosophila models. The authors propose that m6A methylation regulates MOF and, in turn, deposition of H4K16Ac, critical regulators of gene dosage in the context of genomic imbalance.

      Strengths:

      This study seeks to address an interesting question with respect to gene dosage regulation and the possible roles of m6A in that process. Previous work has linked m6A to X-inactivation in humans through the Xist lncRNA, and to the regulation of the Sxl in flies. This study seeks to broaden that understanding beyond these specific contexts to more broadly understand how m6A impacts imbalanced genomes in other contexts.

      Weaknesses:

      The methods being used particularly for analysis of m6A at both the bulk and transcript-specific level are not sufficiently specific or quantitative to be able to confidently draw the conclusions the authors seek to make. MeRIP m6A mapping experiments can be very valuable, but differential methylation is difficult to assess when changes are small (as they often are, in this study but also m6A studies more broadly). For instance, based on the data presented and the methods described, it is not clear that the statement that "expression levels at m6A sites in aneuploidies are significantly higher than that in wildtype" is supported. MeRIP experiments are not quantitative, and since there are far fewer peaks in aneuploidies, it stands to reason that more antibody binding sites may be available to enrich those fewer peaks to a larger extent. But based on the data as presented (figure 2D) this conclusion was drawn from RPKM in IP samples, which may not fully account for changing transcript abundances in absolute (expression level changes) and relative (proportion of transcripts in input RNA sample) terms.

      Methylated RNA immunoprecipitation followed by sequencing (MeRIP-seq) is a commonly used strategy for genome-wide mapping of the m6A modification. This method uses an anti-m6A antibody to immunoprecipitate RNA fragments, which selectively enriches methylated RNA. The RNA fragments are then subjected to deep sequencing, and regions enriched in the immunoprecipitate relative to the input samples are identified as m6A peaks by a peak-calling algorithm. We identified m6A peaks in the different samples with the exomePeak2 program and determined the common m6A peaks for each genotype from the intersection of biological replicates. Figure 2D shows the RPM values of m6A peaks in the MeRIP samples for each genotype, indicating that read levels in the m6A peak regions were significantly higher in the aneuploid IP samples than in wildtype. When the enrichment of IP samples relative to Input samples (RPM.IP/RPM.Input) was taken into account, the values for all three aneuploidies were still significantly higher than those of wildtype (Mann Whitney U test p-values < 0.001). This analysis does not concern changes in transcript abundance; rather, from the MeRIP perspective, it shows that relatively more m6A-modified reads map to the m6A peaks in aneuploidies than in wildtype. In addition, we have added the IP/Input results to the main text and revised the description in the manuscript to make it more precise and reduce possible misunderstandings.
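
      As an illustration of the comparison described above, the following is a minimal sketch of how per-peak RPM values, the IP/Input enrichment ratio, and a two-sided Mann-Whitney U test could be computed. It is not the authors' pipeline: the function names, the optional pseudocount, and the use of a normal approximation without tie or continuity correction are all assumptions made for this example.

```python
import math

def rpm(read_count, total_mapped_reads):
    """Reads per million mapped reads for one peak region."""
    return read_count * 1_000_000 / total_mapped_reads

def ip_over_input(ip_reads, ip_total, input_reads, input_total, pseudocount=1.0):
    """IP/Input ratio of RPM values. The pseudocount is an assumption of this
    sketch; it guards against division by zero for peaks with no input reads."""
    return (rpm(ip_reads, ip_total) + pseudocount) / (rpm(input_reads, input_total) + pseudocount)

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation
    (midranks for ties, no tie or continuity correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1          # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2   # U statistic for sample x
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value
```

      In practice a library routine with exact p-values and tie handling would be preferable for small samples; the hand-written version is only meant to make the steps of the comparison explicit.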

      The bulk-level m6A measurements as performed here also cannot effectively support these conclusions, as they are measured in total RNA. The focus of the work is mRNA m6A regulators, but m6A levels measured from total RNA samples will not reflect mRNA m6A levels as there are other abundant RNAs that contain m6A (including rRNA). As a result, conclusions about mRNA m6A levels from these measurements are not supported.

      According to some published articles, m6A levels of purified mRNA or total RNA can be detected by different methods (such as mass spectrometry, 2D thin-layer chromatography, etc.) in Drosophila cells or tissues [1-3].

      Here, we used the EpiQuik m6A RNA Methylation Quantification Kit (Colorimetric) (Epigentek, NY, USA, Cat # P-9005), which is suitable for detecting m6A methylation status directly using total RNA isolated from any species such as mammals, plants, fungi, bacteria, and viruses. This kit has previously been used by researchers to detect the m6A/A ratio in total RNA [4, 5] or purified mRNA [6] from different species.

      To compare m6A levels between total RNA and mRNA, we enriched mRNA from total RNA using the Dynabeads™ mRNA Purification Kit (Invitrogen, Cat # 61006). The m6A levels of the enriched mRNA did not differ significantly from those of total RNA (Figure 1). For this reason, most of the m6A levels reported in the manuscript were measured in total RNA.

      Author response image 1.

      The m6A levels of total RNA and mRNA

      As suggested, we will try to extract and purify mRNA from the different genotypes to verify our conclusions based on the m6A levels of total RNA, if necessary. In addition, m6A modification in RNA types other than mRNA (e.g., lncRNA, rRNA) is not necessarily meaningless. We will also add a discussion of this issue to the manuscript.

      (1) Lence T, et al. (2016) m6A modulates neuronal functions and sex determination in Drosophila. Nature 540(7632):242-247.

      (2) Haussmann IU, et al. (2016) m(6)A potentiates Sxl alternative pre-mRNA splicing for robust Drosophila sex determination. Nature 540(7632):301-304.

      (3) Kan L, et al. (2017) The m(6)A pathway facilitates sex determination in Drosophila. Nat Commun 8:15737.

      (4) Zhu C, et al. (2023) RNA Methylome Reveals the m(6)A-mediated Regulation of Flavor Metabolites in Tea Leaves under Solar-withering. Genomics Proteomics Bioinformatics 21(4):769-787.

      (5) Song H, et al. (2021) METTL3-mediated m(6)A RNA methylation promotes the anti-tumour immunity of natural killer cells. Nat Commun 12(1):5522.

      (6) Yin H, et al. (2021) RNA m6A methylation orchestrates cancer growth and metastasis via macrophage reprogramming. Nat Commun 12(1):1394.

      Reviewer #2 (Public Review):

      Summary:

      The authors have tested the effects of partial- or whole-chromosome aneuploidy on the m6A RNA modification in Drosophila. The data reveal that overall m6A levels trend up but that the number of sites found by meRIP-seq trend down, which seems to suggest that aneuploidy causes a subset of sites to become hyper-methylated. Subsequent bioinformatic analysis of other published datasets establish correlations between the activity of the H4K16 acetyltransferase dosage compensation complex (DCC) and the expression of m6A components and m6A abundance, suggesting that DCC and m6A can act in a feedback loop on each other. Overall, this paper uses bioinformatic trends to generate a candidate model of feedback between DCC and m6A. It would be improved by functional studies that validate the effect in vivo.

      Strengths:

      • Thorough bioinformatic analysis of their data.

      • Incorporation of other published datasets that enhance scope and rigor.

      • Finds trends that suggest that a chromosome counting mechanism can control m6A, as fits with pub data that the Sxl mRNA is m6A modified in XX females and not XY males.

      • Suggests this counting mechanism may be due to the effect of chromatin-dependent effects on the expression of m6A components.

      Weaknesses:

      • The linkage between H4K16 machinery and m6A is indirect and based on bioinformatic trends with little follow-up to test the mechanistic bases of these trends.

      We found a set of H4K16ac ChIP-seq data (GSE109901) from female and male Drosophila larvae in a public database and analyzed whether H4K16ac is directly associated with m6A regulator genes. ChIP-seq is a standard method for studying transcription factor binding and histone modification that uses efficient and specific antibodies for immunoprecipitation. The results showed H4K16ac peaks in the 5' region of the gene encoding the m6A reader Ythdc1 in both males and females. In addition, most of the genomic loci where the other m6A regulator genes are located are acetylated at H4K16 in both sexes, except that Ime4 shows sexual dimorphism and contains an H4K16ac peak only in females. These results indicate that the m6A regulator genes themselves are acetylated at H4K16, so there is a direct relationship between H4K16ac and m6A regulators. We have added these contents to the text.

      Besides the above conclusion from the sequencing data, we also plan experiments to test the linkage between H4K16 and m6A, for example measuring m6A levels when MOF is overexpressed and H4K16Ac levels are increased, measuring H4K16Ac levels when YT521B is knocked down or overexpressed, and measuring the relative expression levels of the important regulatory genes involved.

      • The paper lacks sufficient in vivo validation of the effects of DCC alleles on m6A and vice versa. For example, Is the Ythdc1 genomic locus a direct target of the DCC component Msl-2 ? (see Figure 7).

      To study whether the Ythdc1 genomic locus is a direct target of a DCC component, we first analyzed published MSL2 ChIP-seq data from Drosophila (GSE58768). Since MSL2 is only expressed in males under normal conditions, this dataset is from male Drosophila. According to the results, the majority (99.1%) of MSL2 peaks are located on the X chromosome, while few MSL2 peaks are found on other chromosomes. This is consistent with the fact that MSL2 is enriched on the X chromosome in male Drosophila [1, 2]. The Ythdc1 gene is located on chromosome 3L, and there is no MSL2 peak near it. Similarly, the other m6A regulator genes are not X-linked, and none has an MSL2 peak. We then analyzed the MOF ChIP-seq data (GSE58768) from male Drosophila. We found that 61.6% of MOF peaks were located on the X chromosome, which was also expected [3, 4]. Although there are more MOF peaks than MSL2 peaks on autosomes, MOF peaks are absent from the autosomal m6A regulator genes. Therefore, at present, there is no evidence that the gene loci of m6A regulators are direct targets of the DCC components MSL2 and MOF, which may be because most MSL2 and MOF are tethered to the X chromosome by the MSL complex under physiological conditions. Whether there are other direct or indirect interactions between Ythdc1 and MSL2 is an issue worthy of further study.

      (1) Bashaw GJ & Baker BS (1995) The msl-2 dosage compensation gene of Drosophila encodes a putative DNA-binding protein whose expression is sex specifically regulated by Sex-lethal. Development 121(10):3245-3258.

      (2) Kelley RL, et al. (1995) Expression of msl-2 causes assembly of dosage compensation regulators on the X chromosomes and female lethality in Drosophila. Cell 81(6):867-877.

      (3) Kind J, et al. (2008) Genome-wide analysis reveals MOF as a key regulator of dosage compensation and gene expression in Drosophila. Cell 133(5):813-828.

      (4) Conrad T, et al. (2012) The MOF chromobarrel domain controls genome-wide H4K16 acetylation and spreading of the MSL complex. Dev Cell 22(3):610-624.
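
      The peak-distribution check described in the response above (the share of ChIP-seq peaks on each chromosome, and whether any peak lies near a gene of interest) can be sketched as follows. This is a hedged illustration only: peaks are represented as hypothetical `(chromosome, start, end)` tuples, the function names are invented for this example, and the 1 kb flank is an arbitrary assumption rather than a value taken from the authors' analysis.

```python
from collections import Counter

def chrom_fractions(peaks):
    """Fraction of peaks located on each chromosome.
    `peaks` is a list of (chromosome, start, end) tuples."""
    counts = Counter(chrom for chrom, _, _ in peaks)
    total = sum(counts.values())
    return {chrom: n / total for chrom, n in counts.items()}

def peaks_near_gene(peaks, gene, flank=1000):
    """Peaks overlapping a gene interval extended by `flank` bp on each side.
    `gene` is a (chromosome, start, end) tuple; the flank size is an assumption."""
    g_chrom, g_start, g_end = gene
    return [
        (chrom, start, end)
        for chrom, start, end in peaks
        if chrom == g_chrom and start <= g_end + flank and end >= g_start - flank
    ]
```

      With real data, the peak list would come from the peak caller's output file and the gene coordinates from the genome annotation; both are left out here because they are not given in the response.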

      Quite a bit of technical detail is omitted from the main text, making it difficult for the reader to interpret outcomes.

      (1) Please add the tissues to the labels in Figure 1D.

      Figure 1D shows the subcellular localization of FISH probe signals in Drosophila embryos. Arrowheads indicate the foci of probe signals. The corresponding tissue types are (1) blastoderm nuclei; (2) yolk plasm and pole cells; (3) brain and midgut; (4) salivary gland and midgut; (5) blastoderm nuclei and yolk cortex; (6) blastoderm nuclei and pole cells; (7) blastoderm nuclei and yolk cortex; (8) germ band. We have added these to the manuscript.

      (2) In the main text, please provide detail on the source tissues used for meRIP; was it whole larvae? adult heads? Most published datasets are from S2 cells or adult heads and comparing m6A across tissues and developmental stages could introduce quite a bit of variability, even in wt samples. This issue seems to be what the authors discuss in lines 197-199.

      In this article, the material used for MeRIP-seq was whole third instar larvae. Because trisomy 2L and metafemale Drosophila die before developing into adults, it was not possible to use adult heads for MeRIP-seq of the aneuploids. For the other experiments described here, m6A abundance was measured using whole larvae or adult heads; the material used for RT-qPCR analysis was whole larvae, larval brains, or adult heads; and Drosophila embryos at different developmental stages were used for fluorescence in situ hybridization (FISH) experiments. We provide a detailed description of the experimental material for each assay in the manuscript.

      (3) In the main text, please identify the technique used to measure "total m6A/A" in Fig 2A. I assume it is mass spec.

      We used the EpiQuik m6A RNA Methylation Quantification Kit (Colorimetric) (Epigentek, NY, USA, Cat # P-9005) to measure the m6A/A ratio in RNA samples. This commercially available kit quantifies m6A RNA methylation with a colorimetric assay, and is suitable for detecting m6A methylation status directly in total RNA isolated from any species, such as mammals, plants, fungi, bacteria, and viruses.

      (4) Line 190-191: the text describes annotating m6A sites by "nearest gene" which is confusing. The sites are mapped in RNAs, so the authors must unambiguously know the identity of the gene/transcript, right?

      When m6A peaks are annotated using the R package ChIPseeker, the output includes two items: "genomic annotation" and "nearest gene annotation". "Genomic annotation" indicates which genomic feature the peak is annotated to, such as 5’UTR, 3’UTR, exon, etc. "Nearest gene annotation" indicates which specific gene/transcript the peak is matched to. We modified the description in the main text to make it easier to understand.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #3:

      Comments on current version:

      As mentioned in my first review, this work is significantly underpowered for the following reasons: 1) n=4 for each treatment group.; 2) no randomization of the surgical sites receiving treatments; 3) implants surgically inserted without precision/guided surgery. The authors have not addressed these concerns.

      On a minor note: not sure why the authors present a methodology to evaluate the dynamic bone formation (line 272) but do not present results (i.e. by means of histomorphometrical analyses) utilizing this methodology.

      We sincerely appreciate your thorough review and valuable feedback. We have carefully considered your comments and would like to address them as follows:

      As mentioned in my first review, this work is significantly underpowered for the following reasons:

      (1) n=4 for each treatment group.;

      We acknowledge your concern regarding the limited sample size (n=4 per group). While we understand this may affect statistical power, our choice was influenced by ethical considerations in animal experimentation and resource constraints. Increasing the sample size would undoubtedly strengthen the statistical power of our study. However, the logistical and ethical constraints associated with using a larger number of animals in such invasive procedures were significant limiting factors. Specifically, increasing the number of medium to large experimental animals could raise ethical issues, so we used the minimum number possible. Additionally, our study design was reviewed and approved by the animal IRB, which dictated the minimum number of animals we could use. Nevertheless, we conducted a power analysis to ensure that our sample size, although limited, was sufficient to detect significant differences given the high variability typically observed in biological responses. The results obtained from our n=4 samples showed consistent trends and significant differences between groups, indicating the robustness of our findings. We will include this point in the limitations section of the discussion. Thank you.

      (2) no randomization of the surgical sites receiving treatments;

      Thank you for pointing out this issue. We agree that randomization is essential when considering individual differences and the anatomical variations of the jawbone, such as those found in humans. However, this study is an animal experiment in which other conditions were controlled, and the interventions were applied after complete bone healing following tooth extraction. Therefore, the impact of randomization of surgical sites was likely minimal, and it is challenging to determine whether it significantly influenced the experimental results. We note that the twelve female OVX beagles were randomly assigned to three groups (Methods section, line 298). Regarding your concern, we would like to present the robustness of the histological results from different surgical sites, as shown below. We will also include this point in the limitations section of the discussion.

      Histologic analysis of the different surgical sites showed significant differences in bone formation and osseointegration among the three treatment groups: vehicle control, rhPTH(1-34), and dimeric Cys25PTH(1-34). Goldner trichrome staining (Figure A-C) showed enhanced bone formation in both the rhPTH(1-34) and dimeric Cys25PTH(1-34) groups compared to the vehicle control group. The rhPTH(1-34) group showed the most pronounced bone mass gain around the implant. Both treatment groups showed improved bone-to-implant contact compared to the control group, as indicated by the red arrows.

      Masson trichrome staining (Figure D-F) further confirmed these results, showing an increase in bone matrix (blue staining) in the rhPTH(1-34) and dimeric Cys25PTH(1-34) groups, with the dimeric rhPTH(1-34) group showing the most extensive and dense bone formation.

      TRAP staining (Figure G-I and G'-I') was used to assess osteoclast activity. Interestingly, both the rhPTH(1-34) and dimeric Cys25PTH(1-34) groups showed an increase in TRAP-positive cells compared to the vehicle control, suggesting enhanced bone remodeling activity. The rhPTH(1-34) group showed the highest number of TRAP-positive cells and the highest trabecular number, indicating the most active bone remodeling.

      To summarize the results, histological analyses revealed that both rhPTH(1-34) and dimeric Cys25PTH(1-34) treatments significantly enhanced osseointegration and bone formation around titanium implants in a postmenopausal osteoporosis model compared to the control. The rhPTH(1-34) group demonstrated superior outcomes, exhibiting the most substantial increase in bone volume, bone-to-implant contact, and osteoclastic activity, indicating its greater efficacy in promoting bone regeneration and implant integration in this experimental context.

      Author response image 1.

      Histological analysis using Goldner trichrome, Masson trichrome, and TRAP staining

      (3) implants surgically inserted without precision/guided surgery. The authors have not addressed these concerns.

      The primary purpose of precision guides is to prevent damage to various anatomical structures and to ensure perfect placement at the desired location. Even disregarding the potential inaccuracies of precision guides in actual clinical settings, the primary goal of this animal experiment was not to achieve perfect placement or prevent damage to anatomical structures. Instead, the objective was to histologically measure the integrity of the bone surrounding the titanium fixture's platform after pharmacological intervention, ensuring the fixture was fully seated in the alveolar bone. To this end, we secured sufficient visibility through periosteal dissection to confirm the placement of the implant and adhered to the principle of maintaining sufficient mesiodistal distance between fixtures. Using precision guides in this animal experiment, which is not an evaluation of 'implant precision guides,' could potentially introduce inaccuracies and contradict the experimental objectives. Furthermore, since this experiment was conducted on an edentulous ridge where all teeth had been extracted, achieving the same placement as in a presurgical simulation would be impossible, even with the use of precision guides. Thank you once again for your constructive feedback. We will include this point in the limitations section of the discussion.

      On a minor note: not sure why the authors present a methodology to evaluate the dynamic bone formation (line 272) but do not present results (i.e. by means of histomorphometrical analyses) utilizing this methodology.

      As the reviewer mentioned, we confirmed that the sentence was included in the Methods section despite the analysis not actually being performed. We sincerely apologize for this oversight and will make the necessary corrections immediately. Thank you very much for your keen observation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper presents a compelling and comprehensive study of decision-making under uncertainty. It addresses a fundamental distinction between belief-based (cognitive neuroscience) formulations of choice behaviour with reward-based (behavioural psychology) accounts. Specifically, it asks whether active inference provides a better account of planning and decision-making, relative to reinforcement learning. To do this, the authors use a simple but elegant paradigm that includes choices about whether to seek both information and rewards. They then assess the evidence for active inference and reinforcement learning models of choice behaviour, respectively. After demonstrating that active inference provides a better explanation of behavioural responses, the neuronal correlates of epistemic and instrumental value (under an optimised active inference model) are characterised using EEG. Significant neuronal correlates of both kinds of value were found in sensor and source space. The source space correlates are then discussed sensibly, in relation to the existing literature on the functional anatomy of perceptual and instrumental decision-making under uncertainty.

      Strengths:

      The strengths of this work rest upon the theoretical underpinnings and careful deconstruction of the various determinants of choice behaviour using active inference. A particular strength here is that the experimental paradigm is designed carefully to elicit both information-seeking and reward-seeking behaviour; where the information-seeking is itself separated into resolving uncertainty about the context (i.e., latent states) and the contingencies (i.e., latent parameters), under which choices are made. In other words, the paradigm - and its subsequent modelling - addresses both inference and learning as necessary belief and knowledge-updating processes that underwrite decisions.

      The authors were then able to model belief updating using active inference and then look for the neuronal correlates of the implicit planning or policy selection. This speaks to a further strength of this study; it provides some construct validity for the modelling of belief updating and decision-making; in terms of the functional anatomy as revealed by EEG. Empirically, the source space analysis of the neuronal correlates licences some discussion of functional specialisation and integration at various stages in the choices and decision-making.

      In short, the strengths of this work rest upon a (first) principles account of decision-making under uncertainty in terms of belief updating that allows them to model or fit choice behaviour in terms of Bayesian belief updating - and then use relatively state-of-the-art source reconstruction to examine the neuronal correlates of the implicit cognitive processing.

      Response: We are deeply grateful for your careful review of our work and for the thoughtful feedback you have provided. Your dedication to ensuring the quality and clarity of the work is truly admirable. Your comments have been invaluable in guiding us towards improving the paper, and we appreciate your time and effort in not just offering suggestions but also providing specific revisions that we can implement. Your insights have helped us identify areas where we can strengthen the arguments and clarify the methodology.

      Comment 1:

      The main weakness of this report lies in the communication of the ideas and procedures. Although the language is generally excellent, there are some grammatical lapses that make the text difficult to read. More importantly, the authors are not consistent in their use of some terms; for example, uncertainty and information gain are sometimes conflated in a way that might confuse readers. Furthermore, the descriptions of the modelling and data analysis are incomplete. These shortcomings could be addressed in the following way.

      First, it would be useful to unpack the various interpretations of information and goal-seeking offered in the (active inference) framework examined in this study. For example, it will be good to include the following paragraph:

      "In contrast to behaviourist approaches to planning and decision-making, active inference formulates the requisite cognitive processing in terms of belief updating in which choices are made based upon their expected free energy. Expected free energy can be regarded as a universal objective function, specifying the relative likelihood of alternative choices. In brief, expected free energy can be regarded as the surprise expected following some action, where the expected surprise comes in two flavours. First, the expected surprise is uncertainty, which means that policies with a low expected free energy resolve uncertainty and promote information seeking. However, one can also minimise expected surprise by avoiding surprising, aversive outcomes. This leads to goal-seeking behaviour, where the goals can be regarded as prior preferences or rewarding outcomes.

      Technically, expected free energy can be expressed in terms of risk plus ambiguity - or rearranged to be expressed in terms of expected information gain plus expected value, where value corresponds to (log) prior preferences. We will refer to both decompositions in what follows; noting that both decompositions accommodate information and goal-seeking imperatives. That is, resolving ambiguity and maximising information gain have epistemic value, while minimising risk or maximising expected value have pragmatic or instrumental value. These two kinds of values are sometimes referred to in terms of intrinsic and extrinsic value, respectively [1-4]."

      Response 1: We deeply thank you for your comments and corresponding suggestions about our interpretations of active inference. In response to your identified weaknesses and suggestions, we have added corresponding paragraphs in the Methods section (The free energy principle and active inference, line 95-106):

      “Active inference formulates the necessary cognitive processing as a process of belief updating, where choices depend on agents' expected free energy. Expected free energy serves as a universal objective function, guiding both perception and action. In brief, expected free energy can be seen as the expected surprise following some policies. The expected surprise can be reduced by resolving uncertainty, and one can select policies with lower expected free energy, which encourages information-seeking and resolves uncertainty. Additionally, one can minimize expected surprise by avoiding surprising or aversive outcomes (Oudeyer et al., 2007; Schmidhuber et al., 2010). This leads to goal-seeking behavior, where goals can be viewed as prior preferences or rewarding outcomes.

      Technically, expected free energy can also be expressed as expected information gain plus expected value, where the value corresponds to (log) prior preferences. We will refer to both formulations in what follows. Resolving ambiguity, minimizing risk, and maximizing information gain have epistemic value, while maximizing expected value has pragmatic or instrumental value. These two types of values can be referred to in terms of intrinsic and extrinsic value, respectively (Barto et al., 2013; Schwartenbeck et al., 2019).”

      Oudeyer, P. Y., & Kaplan, F. (2007). What is intrinsic motivation? A typology of computational approaches. Frontiers in neurorobotics, 1, 108.

      Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE transactions on autonomous mental development, 2(3), 230-247.

      Barto, A., Mirolli, M., & Baldassarre, G. (2013). Novelty or surprise?. Frontiers in psychology, 4, 61898.

      Schwartenbeck, P., Passecker, J., Hauser, T. U., FitzGerald, T. H., Kronbichler, M., & Friston, K. J. (2019). Computational mechanisms of curiosity and goal-directed exploration. elife, 8, e41703.
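
      For readers who want the two decompositions mentioned above written out, the following shows the standard textbook form of expected free energy for a policy π. It is not the manuscript's own equation (which is not reproduced in this response), and the notation is an assumption: Q denotes the agent's approximate posterior, P(o) the prior preferences over outcomes, and P(o | s) the likelihood mapping. The second line uses the usual approximation P(s | o) ≈ Q(s | o, π); the last equality holds when Q(o, s | π) = P(o | s) Q(s | π).

```latex
\begin{aligned}
G(\pi) &= \mathbb{E}_{Q(o,s\mid\pi)}\big[\ln Q(s\mid\pi) - \ln P(o,s)\big] \\
&\approx -\underbrace{\mathbb{E}_{Q(o\mid\pi)}\Big[D_{\mathrm{KL}}\big[Q(s\mid o,\pi)\,\|\,Q(s\mid\pi)\big]\Big]}_{\text{expected information gain}}
 \;-\; \underbrace{\mathbb{E}_{Q(o\mid\pi)}\big[\ln P(o)\big]}_{\text{expected value}} \\
&= \underbrace{D_{\mathrm{KL}}\big[Q(o\mid\pi)\,\|\,P(o)\big]}_{\text{risk}}
 \;+\; \underbrace{\mathbb{E}_{Q(s\mid\pi)}\big[\mathrm{H}[P(o\mid s)]\big]}_{\text{ambiguity}}
\end{aligned}
```

      Minimizing G(π) therefore favours policies with high expected information gain and high expected (log) preference, or equivalently low risk and low ambiguity.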

      Comment 2:

      The description of the modelling of choice behaviour needs to be unpacked and motivated more carefully. Perhaps along the following lines:

      "To assess the evidence for active inference over reinforcement learning, we fit active inference and reinforcement learning models to the choice behaviour of each subject. Effectively, this involved optimising the free parameters of active inference and reinforcement learning models to maximise the likelihood of empirical choices. The resulting (marginal) likelihood was then used as the evidence for each model. The free parameters for the active inference model scaled the contribution of the three terms that constitute the expected free energy (in Equation 6). These coefficients can be regarded as precisions that characterise each subjects' prior beliefs about contingencies and rewards. For example, increasing the precision or the epistemic value associated with model parameters means the subject would update her beliefs about reward contingencies more quickly than a subject who has precise prior beliefs about reward distributions. Similarly, subjects with a high precision over prior preferences or extrinsic value can be read as having more precise beliefs that she will be rewarded. The free parameters for the reinforcement learning model included..."

      Response 2: We deeply thank you for your comments and corresponding suggestions about our description of the behavioral modelling. In response to your identified weaknesses and suggestions, we have added corresponding content in the Results section (Behavioral results, line 279-293):

      “To assess the evidence for active inference over reinforcement learning, we fit active inference (Eq.9), model-free reinforcement learning, and model-based reinforcement learning models to the behavioral data of each participant. This involved optimizing the free parameters of active inference and reinforcement learning models. The resulting likelihood was used to calculate the Bayesian Information Criterion (BIC) (Vrieze 2012) as the evidence for each model. The free parameters for the active inference model (AL, AI, EX, prior, and α) scaled the contribution of the three terms that constitute the expected free energy in Eq.9. These coefficients can be regarded as precisions that characterize each participant's prior beliefs about contingencies and rewards. For example, increasing α means participants would update their beliefs about reward contingencies more quickly, increasing AL means participants would like to reduce ambiguity more, and increasing AI means participants would like to learn the hidden state of the environment and avoid risk more. The free parameters for the model-free reinforcement learning model are the learning rate α and the temperature parameter γ, and the free parameters for the model-based reinforcement learning model are the learning rate α, the temperature parameter γ, and prior (the details for the model-free reinforcement learning model can be seen in Eq.S1-11 and the details for the model-based reinforcement learning model can be seen in Eq.S12-23 in the Supplementary Method). The parameter fitting for these three models was conducted using the "BayesianOptimization" package in Python (Frazier 2018), first randomly sampling 1000 times and then iterating for an additional 1000 times.”

      Vrieze, S. I. (2012). Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychological methods, 17(2), 228.

      Frazier, P. I. (2018). A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811.
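
      As a rough illustration of what "optimizing the free parameters" against choice data involves, the sketch below computes the negative log-likelihood of a choice sequence under a generic delta-rule learner with a softmax choice rule. This is a textbook form and an assumption on our part: it is not the model specified in Eq.S1-11, the function names are hypothetical, and whether the temperature divides or multiplies the values is a convention chosen here for illustration.

```python
import math

def softmax(values, temperature):
    """Choice probabilities; a larger temperature gives more random choices (assumed convention)."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def negative_log_likelihood(choices, rewards, learning_rate, temperature, n_options=2):
    """Negative log-likelihood of observed choices under a delta-rule learner."""
    q = [0.0] * n_options  # value estimates start at zero (an assumption)
    nll = 0.0
    for choice, reward in zip(choices, rewards):
        probs = softmax(q, temperature)
        nll -= math.log(probs[choice])  # score the choice before learning from it
        q[choice] += learning_rate * (reward - q[choice])  # delta-rule update
    return nll

# Toy data invented for illustration only (not experimental data).
toy_choices = [0, 1, 0, 0, 1, 0]
toy_rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
toy_nll = negative_log_likelihood(toy_choices, toy_rewards, learning_rate=0.3, temperature=1.0)
```

      An optimizer (Bayesian optimization in the response above) would then search over the learning rate and temperature for the values that minimize this quantity for each participant.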

      Comment 3:

      In terms of the time-dependent correlations with expected free energy - and its constituent terms - I think the report would benefit from overviewing these analyses with something like the following:

      "In the final analysis of the neuronal correlates of belief updating - as quantified by the epistemic and intrinsic values of expected free energy - we present a series of analyses in source space. These analyses tested for correlations between constituent terms in expected free energy and neuronal responses in source space. These correlations were over trials (and subjects). Because we were dealing with two-second timeseries, we were able to identify the periods of time during decision-making when the correlates were expressed.

      In these analyses, we focused on the induced power of neuronal activity at each point in time, at each brain source. To illustrate the functional specialisation of these neuronal correlates, we present whole-brain maps of correlation coefficients and pick out the most significant correlation for reporting fluctuations in selected correlations over two-second periods. These analyses are presented in a descriptive fashion to highlight the nature and variety of the neuronal correlates, which we unpack in relation to the existing EEG literature in the discussion. Note that we did not attempt to correct for multiple comparisons; largely, because the correlations observed were sustained over considerable time periods, which would be almost impossible under the null hypothesis of no correlations."

      Response 3: We deeply thank you for your comments and corresponding suggestions about our description of the regression analysis in the source space. In response to your suggestions, we have added corresponding content in the Results section (EEG results at source level, line 331-347):

      “In the final analysis of the neural correlates of the decision-making process, as quantified by the epistemic and intrinsic values of expected free energy, we presented a series of linear regressions in source space. These analyses tested for correlations over trials between constituent terms in expected free energy (the value of avoiding risk, the value of reducing ambiguity, extrinsic value, and expected free energy itself) and neural responses in source space. We also investigated the neural correlates of (the degree of) risk, (the degree of) ambiguity, and prediction error. Because we were dealing with a two-second time series, we were able to identify the periods of time during decision-making when the correlates were expressed. The linear regression was run with the "mne.stats.linear_regression" function in the MNE package (Activity ~ Regressor + Intercept). Activity is the activity amplitude of the EEG signal in the source space and Regressor is one of the regressors that we mentioned (e.g., expected free energy, the value of reducing ambiguity, etc.).

      In these analyses, we focused on the induced power of neural activity at each time point, in the brain source space. To illustrate the functional specialization of these neural correlates, we presented whole-brain maps of correlation coefficients and picked out the brain region with the most significant correlation for reporting fluctuations in selected correlations over two-second periods. These analyses were presented in a descriptive fashion to highlight the nature and variety of the neural correlates, which we unpacked in relation to the existing EEG literature in the discussion. Note that we did not attempt to correct for multiple comparisons; largely, because the correlations observed were sustained over considerable time periods, which would be almost impossible under the null hypothesis of no correlations.”

      Comment 4:

      There was a slight misdirection in the discussion of priors in the active inference framework. The notion that active inference requires a pre-specification of priors is a common misconception. Furthermore, it misses the point that the utility of Bayesian modelling is to identify the priors that each subject brings to the table. This could be easily addressed with something like the following in the discussion:

      "It is a common misconception that Bayesian approaches to choice behaviour (including active inference) are limited by a particular choice of priors. As illustrated in our fitting of choice behaviour above, priors are a strength of Bayesian approaches in the following sense: under the complete class theorem [5, 6], any pair of choice behaviours and reward functions can be described in terms of ideal Bayesian decision-making with particular priors. In other words, there always exists a description of choice behaviour in terms of some priors. This means that one can, in principle, characterise any given behaviour in terms of the priors that explain that behaviour. In our example, these were effectively priors over the precision of various preferences or beliefs about contingencies that underwrite expected free energy."

      Response 4: We deeply thank you for your comments and corresponding suggestions about the prior of Bayesian methods. In response to your suggestions, we have added corresponding content in the Discussion section (The strength of the active inference framework in decision-making, line 447-453):

      “However, it may be the opposite. As illustrated in our fitting results, priors can be a strength of Bayesian approaches. Under the complete class theorem (Wald 1947; Brown 1981), any pair of behavioral data and reward functions can be described in terms of ideal Bayesian decision-making with particular priors. In other words, there always exists a description of behavioral data in terms of some priors. This means that one can, in principle, characterize any given behavioral data in terms of the priors that explain that behavior. In our example, these were effectively priors over the precision of various preferences or beliefs about contingencies that underwrite expected free energy.”

      Wald, A. (1947). An essentially complete class of admissible decision functions. The Annals of Mathematical Statistics, 549-555.

      Brown, L. D. (1981). A complete class theorem for statistical problems with finite sample spaces. The Annals of Statistics, 1289-1300.

      Reviewer #2 (Public Review):

      Summary:

      Zhang and colleagues use a combination of behavioral, neural, and computational analyses to test an active inference model of exploration in a novel reinforcement learning task.

      Strengths:

      The paper addresses an important question (validation of active inference models of exploration). The combination of behavior, neuroimaging, and modeling is potentially powerful for answering this question.

      Response: We want to express our sincere gratitude for your thorough review of our work and for the valuable comments you have provided. Your attention to detail and dedication to improving the quality of the work are truly commendable. Your feedback has been invaluable in guiding us towards revisions that will strengthen the work. We have made targeted modifications based on most of the comments. However, due to factors such as time and energy constraints, we have not added corresponding analyses for several comments.

      Comment 1:

      The paper does not discuss relevant work on contextual bandits by Schulz, Collins, and others. It also does not mention the neuroimaging study of Tomov et al. (2020) using a risky/safe bandit task.

      Response 1:

      We deeply thank you for your suggestions about the relevant work. We now discuss and cite these representative papers in the Introduction section (line 42-55):

      “The decision-making process frequently involves grappling with varying forms of uncertainty, such as ambiguity - the kind of uncertainty that can be reduced through sampling, and risk - the inherent uncertainty (variance) presented by a stable environment. Studies have investigated these different forms of uncertainty in decision-making, focusing on their neural correlates (Daw et al., 2006; Badre et al., 2012; Cavanagh et al., 2012).

      These studies utilized different forms of multi-armed bandit tasks, e.g., restless multi-armed bandit tasks (Daw et al., 2006; Guha et al., 2010), risky/safe bandit tasks (Tomov et al., 2020; Fan et al., 2023; Payzan-LeNestour et al., 2013), and contextual multi-armed bandit tasks (Schulz et al., 2015; Schulz et al., 2015; Molinaro & Collins, 2023). However, these tasks either separate risk from ambiguity in uncertainty, or separate action from state (perception). In our work, we develop a contextual multi-armed bandit task to enable participants to actively reduce ambiguity, avoid risk, and maximize rewards using various policies (see Section 2.2 and Figure 4(a)). Our task makes it possible to study whether the brain represents these different types of uncertainty distinctly (Levy et al., 2010) and whether the brain represents both the value of reducing uncertainty and the degree of uncertainty. The active inference framework presents a theoretical approach to investigate these questions. Within this framework, uncertainties can be reduced to ambiguity and risk. Ambiguity is represented by the uncertainty about model parameters associated with choosing a particular action, while risk is signified by the variance of the environment's hidden states. The value of reducing ambiguity, the value of avoiding risk, and extrinsic value together constitute expected free energy (see Section 2.1).”

      Daw, N. D., O'doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876-879.

      Badre, D., Doll, B. B., Long, N. M., & Frank, M. J. (2012). Rostrolateral prefrontal cortex and individual differences in uncertainty-driven exploration. Neuron, 73(3), 595-607.

      Cavanagh, J. F., Figueroa, C. M., Cohen, M. X., & Frank, M. J. (2012). Frontal theta reflects uncertainty and unexpectedness during exploration and exploitation. Cerebral cortex, 22(11), 2575-2586.

      Guha, S., Munagala, K., & Shi, P. (2010). Approximation algorithms for restless bandit problems. Journal of the ACM (JACM), 58(1), 1-50.

      Tomov, M. S., Truong, V. Q., Hundia, R. A., & Gershman, S. J. (2020). Dissociable neural correlates of uncertainty underlie different exploration strategies. Nature communications, 11(1), 2371.

      Fan, H., Gershman, S. J., & Phelps, E. A. (2023). Trait somatic anxiety is associated with reduced directed exploration and underestimation of uncertainty. Nature Human Behaviour, 7(1), 102-113.

      Payzan-LeNestour, E., Dunne, S., Bossaerts, P., & O’Doherty, J. P. (2013). The neural representation of unexpected uncertainty during value-based decision making. Neuron, 79(1), 191-201.

      Schulz, E., Konstantinidis, E., & Speekenbrink, M. (2015, April). Exploration-exploitation in a contextual multi-armed bandit task. In International conference on cognitive modeling (pp. 118-123).

      Schulz, E., Konstantinidis, E., & Speekenbrink, M. (2015, November). Learning and decisions in contextual multi-armed bandit tasks. In CogSci.

      Molinaro, G., & Collins, A. G. (2023). Intrinsic rewards explain context-sensitive valuation in reinforcement learning. PLoS Biology, 21(7), e3002201.

      Levy, I., Snell, J., Nelson, A. J., Rustichini, A., & Glimcher, P. W. (2010). Neural representation of subjective value under risk and ambiguity. Journal of neurophysiology, 103(2), 1036-1047.

      Comment 2:

      The statistical reporting is inadequate. In most cases, only p-values are reported, not the relevant statistics, degrees of freedom, etc. It was also not clear if any corrections for multiple comparisons were applied. Many of the EEG results are described as "strong" or "robust" with significance levels of p<0.05; I am skeptical in the absence of more details, particularly given the fact that the corresponding plots do not seem particularly strong to me.

      Response 2: We deeply thank you for your comments about our statistical reporting. We have optimized the fitting model and rerun all the statistical analyses. As can be seen (Figures 6, 7, 8, S3, S4, S5), the new regression results are significantly improved compared to the previous ones. Due to space limitations, we have placed the other relevant statistical results, including t-values, standard errors, etc., on our GitHub (https://github.com/andlab-um/FreeEnergyEEG). Currently, we have not applied multiple comparison corrections, following Reviewer 1’s suggestion (Comment 3): “Note that we did not attempt to correct for multiple comparisons; largely, because the correlations observed were sustained over considerable time periods, which would be almost impossible under the null hypothesis of no correlations”.

      Author response image 1.

      Comment 3:

      The authors compare their active inference model to a "model-free RL" model. This model is not described anywhere, as far as I can tell. Thus, I have no idea how it was fit, how many parameters it has, etc. The active inference model fitting is also not described anywhere. Moreover, you cannot compare models based on log-likelihood, unless you are talking about held-out data. You need to penalize for model complexity. Finally, even if active inference outperforms a model-free RL model (doubtful given the error bars in Fig. 4c), I don't see how this is strong evidence for active inference per se. I would want to see a much more extensive model comparison, including model-based RL algorithms which are not based on active inference, as well as model recovery analyses confirming that the models can actually be distinguished on the basis of the experimental data.

      Response 3: We deeply thank you for your comments about the model comparison details. Because classical reinforcement learning is not the focus of our work, we previously omitted some information about the comparison model and left the specific details to the supplementary materials. We have now placed the relevant information regarding the model comparison in the main text (see the part highlighted in yellow), in the Results section (Behavioral results, line 279-293):

      “To assess the evidence for active inference over reinforcement learning, we fit active inference (Eq.9), model-free reinforcement learning, and model-based reinforcement learning models to the behavioral data of each participant. This involved optimizing the free parameters of active inference and reinforcement learning models. The resulting likelihood was used to calculate the Bayesian Information Criterion (BIC) as the evidence for each model. The free parameters for the active inference model (AL, AI, EX, prior, and α) scaled the contribution of the three terms that constitute the expected free energy in Eq.9. These coefficients can be regarded as precisions that characterize each participant's prior beliefs about contingencies and rewards. For example, increasing α means participants would update their beliefs about reward contingencies more quickly, increasing AL means participants would like to reduce ambiguity more, and increasing AI means participants would like to learn the hidden state of the environment and avoid risk more. The free parameters for the model-free reinforcement learning model are the learning rate α and the temperature parameter γ, and the free parameters for the model-based reinforcement learning model are the learning rate α, the temperature parameter γ, and prior (the details for the model-free reinforcement learning model can be found in Eq.S1-11 and the details for the model-based reinforcement learning model can be found in Eq.S12-23 in the Supplementary Method). The parameter fitting for these three models was conducted using the "BayesianOptimization" package in Python, first randomly sampling 1000 times and then iterating for an additional 1000 times.”

      We have now incorporated model-based reinforcement learning into our comparison models and placed the descriptions of both model-free and model-based reinforcement learning algorithms in the supplementary materials. We have also changed the criterion for model comparison to the Bayesian Information Criterion. As indicated by the results, the active inference model significantly outperforms both comparison models.
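
      To make the complexity penalty concrete, the following minimal sketch computes BIC = k ln(n) - 2 ln(L) for the three models, using the parameter counts stated in the quoted passage (five for active inference, two for model-free and three for model-based reinforcement learning). The log-likelihood value and the number of observations are placeholders invented for illustration, not results from the study, and the dictionary keys are hypothetical names.

```python
import math

def bic(log_likelihood, n_params, n_observations):
    """Bayesian Information Criterion; lower values indicate a better trade-off."""
    return n_params * math.log(n_observations) - 2.0 * log_likelihood

# Parameter counts as listed in the quoted manuscript text.
N_PARAMS = {"active_inference": 5, "model_free_rl": 2, "model_based_rl": 3}

# Placeholder values (assumptions): the same log-likelihood for every model,
# so that any difference in BIC comes from the penalty term alone.
placeholder_log_likelihood = -70.0
placeholder_n_observations = 120

scores = {
    name: bic(placeholder_log_likelihood, k, placeholder_n_observations)
    for name, k in N_PARAMS.items()
}
lowest = min(scores, key=scores.get)
```

      With equal likelihoods, the model with the fewest parameters obtains the lowest BIC, so a more heavily parameterized model is preferred only if its likelihood gain exceeds its penalty.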

      We had not performed model recovery previously; we have now placed the relevant results in the supplementary materials. The result figures show that each model fits its own simulated data well:

      “To demonstrate how reliable our models are (the active inference model, model-free reinforcement learning model, and model-based reinforcement learning model), we ran simulation experiments for model recovery. We used each of these three models, with its own fitted parameters, to generate simulated data. We then fit all three sets of simulated data using all three models.

      The model recovery results are shown in Fig.S6. This is the confusion matrix of models: the percentage of all subjects simulated based on a certain model that is fitted best by a certain model. The goodness-of-fit was compared using the Bayesian Information Criterion. We can see that the result of model recovery is very good, and the simulated data generated by a model can be best explained by this model.”
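
      The confusion matrix described above can be sketched as follows, assuming a table of BIC values is already available for every simulated subject and every fitting model. The function name, the data layout, and the toy numbers are all hypothetical and serve only to show the bookkeeping; they are not taken from Fig.S6.

```python
def recovery_confusion_matrix(bic_table, model_names):
    """Percentage of simulated subjects best fit (lowest BIC) by each model.

    bic_table[generating_model] is a list with one dict per simulated subject,
    mapping each fitting model's name to its BIC on that subject's data.
    """
    matrix = {}
    for generating in model_names:
        subjects = bic_table[generating]
        wins = {fitting: 0 for fitting in model_names}
        for subject_bics in subjects:
            winner = min(model_names, key=lambda m: subject_bics[m])
            wins[winner] += 1
        matrix[generating] = {
            fitting: 100.0 * wins[fitting] / len(subjects) for fitting in model_names
        }
    return matrix

# Toy BIC values invented for illustration (two generic models, two subjects each).
toy_models = ["model_a", "model_b"]
toy_bics = {
    "model_a": [{"model_a": 100.0, "model_b": 110.0}, {"model_a": 120.0, "model_b": 115.0}],
    "model_b": [{"model_a": 130.0, "model_b": 105.0}, {"model_a": 140.0, "model_b": 125.0}],
}
toy_matrix = recovery_confusion_matrix(toy_bics, toy_models)
```

      Good recovery corresponds to large values on the diagonal of this matrix, meaning that data simulated from a model are usually best explained by that same model.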

      Author response image 2.

      Comment 4:

      Another aspect of the behavioral modeling that's missing is a direct descriptive comparison between model and human behavior, beyond just plotting log-likelihoods (which are a very impoverished measure of what's going on).

      Response 4: We deeply thank you for your comments about the comparison between the model and human behavior. Due to the slight differences between our simulation experiments and real behavioral experiments (the "you can ask" stage), we cannot directly compare the model and participants' behaviors. However, we can observe that in the main text's simulation experiment (Figure 3), the active inference agent's behavior is highly consistent with humans (Figure 4), exhibiting an effective exploration strategy and a desire to reduce uncertainty. Moreover, we have included two additional simulation experiments in the supplementary materials, which demonstrate that active inference may potentially fit a wide range of participants' behavioral strategies.

      Author response image 3.

      (An active inference agent with AL=AI=EX=0. It can accomplish tasks efficiently like a human being, reducing the uncertainty of the environment and maximizing the reward.)

      Author response image 4.

      (An active inference agent with AL=AI=0, EX=10. It will only pursue immediate rewards (not choosing the "Cue" option due to additional costs), but it can also gradually optimize its strategy due to random effects.)

      Author response image 5.

      (An active inference agent with EX=0, AI=AL=10. It will only pursue environmental information to reduce the uncertainty of the environment. Even in "Context 2" where immediate rewards are scarce, it will continue to explore.) (a) shows the decision-making of active inference agents in the Stay-Cue choice. Blue corresponds to agents choosing the "Cue" option and acquiring "Context 1"; orange corresponds to agents choosing the "Cue" option and acquiring "Context 2"; purple corresponds to agents choosing the "Stay" option and not knowing the information about the hidden state of the environment. The shaded areas below correspond to the probability of the agents making the respective choices. (b) shows the decision-making of active inference agents in the Stay-Cue choice. The shaded areas below correspond to the probability of the agents making the respective choices. (c) shows the rewards obtained by active inference agents. (d) shows the reward prediction errors of active inference agents. (e) shows the reward predictions of active inference agents for the "Risky" path in "Context 1" and "Context 2".

      Comment 5:

      The EEG results are intriguing, but it wasn't clear that these provide strong evidence specifically for the active inference model. No alternative models of the EEG data are evaluated.

      Overall, the central claim in the Discussion ("we demonstrated that the active inference model framework effectively describes real-world decision-making") remains unvalidated in my opinion.

      Response 5: We deeply thank you for your comments. We applied the active inference model to analyze EEG results because it best fit the participants' behavioral data among our models, including the newly added results. Further, our EEG results serve only to verify that the active inference model can be used to analyze the neural mechanisms of decision-making in uncertain environments (if possible, we could certainly design a better reinforcement learning model with a similar exploration strategy). We aim to emphasize the consistency between active inference and human decision-making in uncertain environments, as we have discussed in the article. Active inference emphasizes both perception and action, which is also what we wish to highlight: during the decision-making process, participants not only passively receive information, but also actively adopt different strategies to reduce uncertainty and maximize rewards.

      Reviewer #3 (Public Review):

      Summary:

      This paper aims to investigate how the human brain represents different forms of value and uncertainty that participate in active inference within a free-energy framework, in a two-stage decision task involving contextual information sampling, and choices between safe and risky rewards, which promotes a shift from exploration to exploitation. They examine neural correlates by recording EEG and comparing activity in the first vs second half of trials and between trials in which subjects did and did not sample contextual information, and perform a regression with free-energy-related regressors against data "mapped to source space." Their results show effects in various regions, which they take to indicate that the brain does perform this task through the theorised active inference scheme.

      Strengths:

      This is an interesting two-stage paradigm that incorporates several interesting processes of learning, exploration/exploitation, and information sampling. Although scalp/brain regions showing sensitivity to the active-inference-related quantities do not necessarily suggest what role they play, it can be illuminating and useful to search for such effects as candidates for further investigation. The aims are ambitious, and methodologically it is impressive to include extensive free-energy theory, behavioural modelling, and EEG source-level analysis in one paper.

      Response: We would like to express our heartfelt thanks to you for carefully reviewing our work and offering insightful feedback. Your attention to detail and commitment to enhancing the overall quality of our work are deeply admirable. Your input has been extremely helpful in guiding us through the necessary revisions to enhance the work. We have implemented focused changes based on a majority of your comments. Nevertheless, owing to limitations such as time and resources, we have not included corresponding analyses for a few comments.

      Comment 1:

      Though I could surmise the above general aims, I could not follow the important details of what quantities were being distinguished and sought in the EEG and why. Some of this is down to theoretical complexity - the dizzying array of constructs and terms with complex interrelationships, which may simply be part and parcel of free-energy-based theories of active inference - but much of it is down to missing or ambiguous details.

      Response 1: We deeply thank you for your comments about our work’s readability. We have significantly revised the descriptions of active inference, models, research questions, etc. Focusing on active inference and the free energy principle, we have added relevant basic descriptions and unified the terminology. We have added information related to model comparison in the main text and supplementary materials. We presented our regression results in clearer language. Our research focused on the brain's representation of decision-making in uncertain environments, including expected free energy, the value of reducing ambiguity, the value of avoiding risk, extrinsic value, ambiguity, and risk.

      Comment 2:

      In general, an insufficient effort has been made to make the paper accessible to readers not steeped in the free energy principle and active inference. There are critical inconsistencies in key terminology; for example, the introduction states that aim 1 is to distinguish the EEG correlates of three different types of uncertainty: ambiguity, risk, and unexpected uncertainty. But the abstract instead highlights distinctions in EEG correlates between "uncertainty... and... risk" and between "expected free energy .. and ... uncertainty." There are also inconsistencies in mathematical labelling (e.g. in one place 'p(s|o)' and 'q(s)' swap their meanings from one sentence to the very next).

      Response 2: We deeply thank you for your comments about the problem of inconsistent terminology. First, we have unified the symbols and letters (P, Q, s, o, etc.) that appeared in the article and described their respective meanings more clearly. We have also revised the relevant expressions of "uncertainty" throughout the text. In our work, uncertainty refers to ambiguity and risk. Ambiguity can be reduced through continuous sampling and is referred to as uncertainty about model parameters in our work. Risk, on the other hand, is the inherent variance of the environment and cannot be reduced through sampling, which is referred to as uncertainty about hidden states in our work. In the analysis of the results, we focused on how the brain encodes the value of reducing ambiguity (Figure 8), the value of avoiding risk (Figure 6), and (the degree of) ambiguity (Figure S5) during action selection. We also analyzed how the brain encodes reducing ambiguity and avoiding risk during belief update (Figure 7).

      Comment 3:

      Some basic but important task information is missing, and makes a huge difference to how decision quantities can be decoded from EEG. For example:

      - How do the subjects press the left/right buttons - with different hands or different fingers on the same hand?

      Response 3: We deeply thank you for your comments about the missing task information. We have added the relevant content in the Methods section (Contextual two-armed bandit task and Data collection, line 251-253):

      “Each stage was separated by a jitter ranging from 0.6 to 1.0 seconds. The entire experiment consists of a single block with a total of 120 trials. The participants are required to use any two fingers of one hand to press the buttons (left arrow and right arrow on the keyboard).”

      Comment 4:

      - Was the presentation of the Stay/cue and safe/risky options on the left/right sides counterbalanced? If not, decisions can be formed well in advance especially once a policy is in place.

      Response 4: The presentation of the Stay/cue and safe/risky options on the left/right sides was not counterbalanced. It is true that participants may have made decisions ahead of time. However, to better study the state of participants during decision-making, our choice stages consist of two parts. In the first two seconds, we ask participants to consider which option they would choose, and after these two seconds, participants are allowed to make their choice (by pressing the button).

      We also updated the figure of the experiment procedure as below (We circled the time that the participants spent on making decisions).

      Author response image 6.

      Comment 5:

      - What were the actual reward distributions ("magnitude X with probability p, magnitude y with probability 1-p") in the risky option?

      Response 5: We deeply thank you for your comments about the missing task information. We have placed the relevant content in the Methods section (Contextual two-armed bandit task and Data collection, line 188-191):

      “The actual reward distribution of the risky path in "Context 1" was [+12 (55%), +9 (25%), +6 (10%), +3 (5%), +0 (5%)] and the actual reward distribution of the risky path in "Context 2" was [+12 (5%), +9 (5%), +6 (10%), +3 (25%), +0 (55%)].”
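
      As a quick check on these distributions, the short sketch below computes the expected reward and the variance of the risky path in each context directly from the probabilities quoted above. The variable and function names are ours and purely illustrative.

```python
# Reward magnitudes and probabilities of the risky path, copied from the quoted text.
RISKY_REWARDS = {
    "Context 1": {12: 0.55, 9: 0.25, 6: 0.10, 3: 0.05, 0: 0.05},
    "Context 2": {12: 0.05, 9: 0.05, 6: 0.10, 3: 0.25, 0: 0.55},
}

def mean_and_variance(distribution):
    """Expected value and variance of a discrete reward distribution."""
    mean = sum(reward * prob for reward, prob in distribution.items())
    variance = sum(prob * (reward - mean) ** 2 for reward, prob in distribution.items())
    return mean, variance

summary = {context: mean_and_variance(dist) for context, dist in RISKY_REWARDS.items()}
```

      The expected reward of the risky path is 9.6 in "Context 1" and 2.4 in "Context 2", and because the two distributions mirror each other, the variance is the same (11.34) in both contexts.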

      Comment 6:

      The EEG analysis is not sufficiently detailed and motivated.

      For example,

      - why the high lower-filter cutoff of 1 Hz, and shouldn't it be acknowledged that this removes from the EEG any sustained, iteratively updated representation that evolves with learning across trials?

      Response 6: We deeply thank you for your comments about our EEG analysis. The 1Hz high-pass filter may indeed filter out some useful information. We chose a 1Hz high-pass filter to filter out most of the noise and prevent the noise from affecting our results analysis. Additionally, there are also many decision-related works that have applied 1Hz high-pass filtering in EEG data preprocessing (Yau et al., 2021; Cortes et al., 2021; Wischnewski et al., 2022; Schutte et al., 2017; Mennella et al., 2020; Giustiniani et al., 2020).

      Yau, Y., Hinault, T., Taylor, M., Cisek, P., Fellows, L. K., & Dagher, A. (2021). Evidence and urgency related EEG signals during dynamic decision-making in humans. Journal of Neuroscience, 41(26), 5711-5722.

      Cortes, P. M., García-Hernández, J. P., Iribe-Burgos, F. A., Hernández-González, M., Sotelo-Tapia, C., & Guevara, M. A. (2021). Temporal division of the decision-making process: An EEG study. Brain Research, 1769, 147592.

      Wischnewski, M., & Compen, B. (2022). Effects of theta transcranial alternating current stimulation (tACS) on exploration and exploitation during uncertain decision-making. Behavioural Brain Research, 426, 113840.

      Schutte, I., Kenemans, J. L., & Schutter, D. J. (2017). Resting-state theta/beta EEG ratio is associated with reward-and punishment-related reversal learning. Cognitive, Affective, & Behavioral Neuroscience, 17, 754-763.

      Mennella, R., Vilarem, E., & Grèzes, J. (2020). Rapid approach-avoidance responses to emotional displays reflect value-based decisions: Neural evidence from an EEG study. NeuroImage, 222, 117253.

      Giustiniani, J., Nicolier, M., Teti Mayer, J., Chabin, T., Masse, C., Galmès, N., ... & Gabriel, D. (2020). Behavioral and neural arguments of motivational influence on decision making during uncertainty. Frontiers in Neuroscience, 14, 583.

      Comment 7:

      - Since the EEG analysis was done using an array of free-energy-related variables in a regression, was multicollinearity checked between these variables?

      Response 7: We deeply thank you for your comments about our regression. Indeed, we didn't specify our regression formula in the main text. We conducted regression on one variable each time, so there was no need for a multicollinearity check. We have now added the relevant content in the Results section (“EEG results at source level” section, line 337-340):

      “The linear regression was run by the "mne.stats.linear_regression" function in the MNE package (Activity ~ Regressor + Intercept). Activity is the activity amplitude of the EEG signal in the source space and regressor is one of the regressors that we mentioned (e.g., expected free energy, the value of reducing ambiguity, etc.).”
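
      The model form quoted above (Activity ~ Regressor + Intercept, fitted for one regressor at a time) can be sketched in plain Python as follows. This is only an illustration of a single-predictor least-squares fit, not the MNE implementation; the function name and the per-trial numbers are hypothetical.

```python
def fit_single_regressor(activity, regressor):
    """Ordinary least squares for Activity ~ Regressor + Intercept.

    Returns (slope, intercept). Plain-Python illustration only; this is not
    the MNE implementation.
    """
    n = len(activity)
    if n != len(regressor) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 trials")
    mean_x = sum(regressor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in regressor)
    if sxx == 0:
        raise ValueError("regressor is constant across trials")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(regressor, activity))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical per-trial values for one source point at one time sample.
expected_free_energy = [0.2, 0.5, 0.9, 1.4, 1.8]
source_amplitude = [1.4, 2.0, 2.8, 3.8, 4.6]  # constructed as 2 * x + 1

slope, intercept = fit_single_regressor(source_amplitude, expected_free_energy)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

      Because each fit contains a single predictor plus an intercept, the fit would be repeated once per regressor (and per source point and time sample) rather than entering all regressors jointly.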

      Comment 8:

      - In the initial comparison of the first/second half, why just 5 clusters of electrodes, and why these particular clusters?

      Response 8: We deeply thank you for your comments about our sensor-level analysis. These five clusters (left frontal, right frontal, central, left parietal, and right parietal) are commonly analyzed scalp EEG regions, and we followed previous work that analyzed these five clusters of electrodes (Laufs et al., 2006; Ray et al., 1985; Cole et al., 1985). In addition, our work pays more attention to the analysis in source space, exploring the corresponding functions of specific brain regions based on active inference models.

      Laufs, H., Holt, J. L., Elfont, R., Krams, M., Paul, J. S., Krakow, K., & Kleinschmidt, A. (2006). Where the BOLD signal goes when alpha EEG leaves. Neuroimage, 31(4), 1408-1418.

      Ray, W. J., & Cole, H. W. (1985). EEG activity during cognitive processing: influence of attentional factors. International Journal of Psychophysiology, 3(1), 43-48.

      Cole, H. W., & Ray, W. J. (1985). EEG correlates of emotional tasks related to attentional demands. International Journal of Psychophysiology, 3(1), 33-41.

      Comment 9:

      How many different variables are systematically different in the first vs second half, and how do you rule out less interesting time-on-task effects such as engagement or alertness? In what time windows are these amplitudes being measured?

      Response 9 (and the Response for Weaknesses 11): There were no systematic differences between the first half and the second half of the trials, with the only difference being the participants' experience. In the second half, participants had a better understanding of the reward distribution of the task (less ambiguity). The simulation results can well describe these.

      Author response image 7.

      As shown in Figure (a), agents can only learn about the hidden state of the environment ("Context 1" (green) or "Context 2" (orange)) by choosing the "Cue" option. If agents choose the "Stay" option, they will not be able to know the hidden state of the environment (purple). The risk of agents is only related to whether they choose the "Cue" option, not the number of rounds. Figure (b) shows the Safe-Risky choices of agents, and Figure (e) is the reward prediction of agents for the "Risky" path in "Context 1" and "Context 2". We can see that agents update the expected reward and reduce ambiguity by sampling the "Risky" path. The ambiguity of agents is not related to the "Cue" option, but to the number of times they sample the "Risky" path (rounds).

      In our choosing stages, participants were required to think about their choices for the first two seconds (during which they could not press buttons). Then, they were asked to make their choices (press buttons) within the next two seconds. This setup effectively kept participants' attention focused on the task. For the sensor-level analysis, we measured the two seconds of the “Second choice” stage during which participants decide which option to choose (and cannot yet press buttons).

      Comment 10:

      In the comparison of asked and not-asked trials, what trial stage and time window is being measured?

      Response 10: We have added relevant descriptions in the main text. For the sensor-level analysis, we measured the two seconds of the “Second choice” stage during which participants decide which option to choose (and cannot yet press buttons).

      Author response image 8.

      Comment 11:

      Again, how many different variables, of the many estimated per trial in the active inference model, are different in the asked and not-asked trials, and how can you know which of these differences is the one reflected in the EEG effects?

      Response 11: The difference between asked trials and not-asked trials lies only in whether participants know the specific context of the risky path (the level of risk for the participants). A simple comparison indeed cannot tell us which of these differences is reflected in the EEG effects. Therefore, we subsequently conducted model-based regression analysis in the source space.

      Comment 12:

      The authors choose to interpret that on not-asked trials the subjects are more uncertain because the cue doesn't give them the context, but you could equally argue that they don't ask because they are more certain of the possible hidden states.

      Response 12: Our task design involves randomly varying the context of the risky path. Only by choosing to inquire can participants learn about the context. Participants can only become increasingly certain about the reward distribution of different contexts of the risky path, but cannot determine which specific context it is. Below are the task instructions given to participants (lines 226-231).

      "You are on a quest for apples in a forest, beginning with 5 apples. You encounter two paths: 1) The left path offers a fixed yield of 6 apples per excursion. 2) The right path offers a probabilistic reward of 0/3/6/9/12 apples, and it has two distinct contexts, labeled "Context 1" and "Context 2," each with a different reward distribution. Note that the context associated with the right path will randomly change in each trial. Before selecting a path, a ranger will provide information about the context of the right path ("Context 1" or "Context 2") in exchange for an apple. The more apples you collect, the greater your monetary reward will be."

      Comment 13:

      - The EEG regressors are not fully explained. For example, an "active learning" regressor is listed as one of the 4 at the beginning of section 3.3, but it is the first mention of this term in the paper and the term does not arise once in the methods.

      Response 13: We have accordingly revised the relevant content in the main text (as in Eq.8). Our regressors now include expected free energy, the value of reducing ambiguity, the value of avoiding risk, extrinsic value, prediction error, (the degree of) ambiguity, reducing ambiguity, and avoiding risk.

      Comment 14:

      - In general, it is not clear how one can know that the EEG results reflect that the brain is purposefully encoding these very parameters while implementing this very mechanism, and not other, possibly simpler, factors that correlate with them since there is no engagement with such potential confounds or alternative models. For example, a model-free reinforcement learning model is fit to behaviour for comparison. Why not the EEG?

      Response 14: We deeply thank you for your comments. Due to constraints of time and effort, and because the active inference model best fits the behavioral data of the participants, we did not use other models to analyze the EEG data. At both the sensor and source level, we observed EEG signals and brain regions that can encode different levels of uncertainty (risk and ambiguity). The brain's uncertainty-driven exploration mechanism cannot be explained solely by a simple model-free reinforcement learning approach.

      Recommendations for the authors:

      Response: We have made point-to-point revisions according to the reviewer's recommendations, and as these revisions are relatively minor, we have only responded to the longer recommendations here.

      Reviewer #1 (Recommendations For The Authors)

      I enjoyed reading this sophisticated study of decision-making. I thought your implementation of active inference and the subsequent fitting to choice behaviour - and study of the neuronal (EEG) correlates - was impressive. As noted in my comments on strengths and weaknesses, some parts of your manuscript were difficult to read because of slight collapses in grammar and an inconsistent use of terms when referring to the mathematical quantities. In addition to the paragraphs I have suggested, I would recommend the following minor revisions to your text. In addition, you will have to fill in some of the details that were missing from the current version of the manuscript. For example:

      Recommendation 1:

      Which RL model did you use to fit the behavioural data? What were its free parameters?

      Response 1: We have now added information related to the comparison models in the behavioral results and supplementary materials. We applied both simple model-free reinforcement learning and model-based reinforcement learning. The free parameters for the model-free reinforcement learning model are the learning rate α and the temperature parameter γ, while the free parameters for the model-based approach are the learning rate α, the temperature parameter γ, and the prior.
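
      For readers unfamiliar with these two free parameters, the following is a minimal sketch of a generic model-free learner with a learning rate and a softmax temperature. It is not the comparison model fitted in the paper: the delta-rule update, the convention that the temperature divides the action values, and all names and numbers are assumptions made for illustration.

```python
import math
import random


def softmax(values, temperature):
    """Convert action values into choice probabilities.

    Assumption: the temperature divides the values, so a higher temperature
    gives more random choices. The response does not state the exact form.
    """
    scaled = [v / temperature for v in values]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def update_value(value, reward, learning_rate):
    """Delta-rule update: move the value toward the observed reward."""
    return value + learning_rate * (reward - value)


def simulate(rewards_per_action, learning_rate, temperature, n_trials, seed=0):
    """Run a toy learner on fixed (hypothetical) rewards; return final values."""
    rng = random.Random(seed)
    values = [0.0] * len(rewards_per_action)
    for _ in range(n_trials):
        probs = softmax(values, temperature)
        action = rng.choices(range(len(values)), weights=probs)[0]
        reward = rewards_per_action[action]
        values[action] = update_value(values[action], reward, learning_rate)
    return values


final_values = simulate([6.0, 2.0], learning_rate=0.3, temperature=1.0, n_trials=200)
print(final_values)
```

      In this sketch the learning rate controls how quickly values track observed rewards, and the temperature controls how strongly choices favour the currently higher-valued action.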

      Recommendation 2:

      When you talk about neuronal activity in the final analyses (of time-dependent correlations) what was used to measure the neuronal activity? Was this global power over frequencies? Was it at a particular frequency band? Was it the maximum amplitude within some small window et cetera? In other words, you need to provide the details of your analysis that would enable somebody to reproduce your study at a certain level of detail.

      Response 2: In the final analyses, we used the activity amplitude at each point in the source space. We had also already planned to make our data and models available on GitHub to facilitate replication of our work.

      Reviewer #3 (Recommendations For The Authors)

      Recommendation 1:

      It might help to explain the complex concepts up front, to use the concrete example of the task itself - presumably, it was designed so that the crucial elements of the active inference framework come to the fore. One could use hypothetical choice patterns in this task to exemplify different factors such as expected free energy and unexpected uncertainty at work. It would also be illuminating to explain why behaviour on this task is fit better by the active inference model than a model-free reinforcement learning model.

      Response 1: Thank you for your suggestions. We have given clearer explanations to the three terms in the active inference formula: the value of reducing ambiguity, the value of avoiding risk, and the extrinsic value (Eq.8), which makes it easier for readers to understand active inference.

      In addition, we can simply view active inference as a computational model similar to model-based reinforcement learning, where the expected free energy represents a subjective value, without needing to understand its underlying computational principles or neurobiological background. In our discussion, we have argued why the active inference model fits the participants' behavior better than our reinforcement learning model, as the active inference model has an inherent exploration mechanism that is consistent with humans, who instinctively want to reduce environmental uncertainty (line 435-442).

      “Active inference offers a superior exploration mechanism compared with basic model-free reinforcement learning  (Figure 4 (c)). Since traditional reinforcement learning models determine their policies solely on the state, this setting leads to difficulty in extracting temporal information (Laskin et al., 2020) and increases the likelihood of entrapment within local minima. In contrast, the policies in active inference are determined by both time and state. This dependence on time (Wang et al., 2016) enables policies to adapt efficiently, such as emphasizing exploration in the initial stages and exploitation later on. Moreover, this mechanism prompts more exploratory behavior in instances of state ambiguity. A further advantage of active inference lies in its adaptability to different task environments (Friston et al., 2017). It can configure different generative models to address distinct tasks, and compute varied forms of free energy and expected free energy.”

      Laskin, M., Lee, K., Stooke, A., Pinto, L., Abbeel, P., & Srinivas, A. (2020). Reinforcement learning with augmented data. Advances in neural information processing systems, 33, 19884-19895.

      Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., ... & Botvinick, M. (2016). Learning to reinforcement learn. arXiv preprint arXiv:1611.05763.

      Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., & Pezzulo, G. (2017). Active inference: a process theory. Neural computation, 29(1), 1-49.

      Recommendation 2:

      Figure 1A provides a key example of the lack of effort to help the reader understand. It suggests the possibility of a concrete example but falls short of providing one. From the caption and text, applied to the figure, I gather that by choosing either to run or to raise one's arms, one can control whether it is daytime or nighttime. This is clearly wrong but it is what I am led to think by the paper.

      Response 2: Thank you for your suggestion, which we had not considered before. In this figure, we aim to illustrate that "the agent receives observations and optimizes his cognitive model by minimizing variational free energy → the agent makes the optimal action by minimizing expected free energy → the action changes the environment → the environment generates new observations for the agent." We have now modified the image to be simpler to prevent any possible confusion for readers. Correspondingly, we removed the figure of a person raising their hand and the shadowed house in Figure a.

      Author response image 9.

      Recommendation 3:

      I recommend an overhaul in the labelling and methodological explanations for consistency and full reporting. For example, line 73 says sensory input is 's' and the cognitive model is 'q(s),' and the cause of the sensory input is 'p(s|o)' but on the very next line, the cognitive model is 'p(s|o)' and the causes of sensory input are 'q(s).' How this sensory input s relates to 'observations' or 'o' is unclear, and meanwhile, capital S is the set of environmental states. P seems to refer to the generative distribution, but it also means probability.

      Response 3: Thank you for your advice. Now we have revised the corresponding labeling and methodological explanations in our work to make them consistent. However, we are not sure how to make a good modification to P here. In many works, P can refer to a certain probability distribution or some specific probabilities.

      Recommendation 4:

      Even the conception of a "policy" is unclear (Figure 2B). They list 4 possible policies, which are simply the 4 possible sequences of steps, stay-safe, cue-risky, etc, but with no contingencies in them. Surely a complete policy that lists 'cue' as the first step would entail a specification of how they would choose the safe or risky option BASED on the information in that cue.

      Response 4: Thank you for your suggestion. In active inference, a policy actually corresponds to a sequence of actions. The policy of "first choosing 'Cue' and then making the next decision based on specific information" differs from the meaning of policy in active inference.
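
      The sense of "policy" used here, a fixed sequence of actions, can be made concrete by enumerating the sequences for the two choice stages. The action labels below follow the task description in this response ("Stay"/"Cue" for the first choice, "Safe"/"Risky" for the second); the variable names are illustrative.

```python
from itertools import product

FIRST_STEP = ("Stay", "Cue")     # first choice: skip or pay for the context cue
SECOND_STEP = ("Safe", "Risky")  # second choice: which path to take

# Each policy is one fixed action sequence, with no branching on observations.
policies = list(product(FIRST_STEP, SECOND_STEP))

for first, second in policies:
    print(f"{first} -> {second}")
```

      This yields the four policies mentioned by the reviewer. A contingent rule such as "choose 'Cue', then pick the path depending on the revealed context" is not a single element of this set; in this formulation the dependence on the cue arises because the evaluation of the remaining sequences is updated after the cue is observed.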

      Recommendation 5:

      I assume that the heavy high pass filtering of the EEG (1 Hz) is to avoid having to baseline-correct the epochs (of which there is no mention), but the authors should directly acknowledge that this eradicates any component of decision formation that may evolve in any way gradually within or across the stages of the trial. To take an extreme example, as Figure 3E shows, the expected rewards for the risky path evolve slowly over the course of 60 trials. The filter would eliminate this.

      Response 5: Thank you for your suggestion. The heavy high pass filtering of the EEG (1 Hz) is to minimize the noise in the EEG data as much as possible.

      Recommendation 6:

      There is no mention of the regression itself in the Methods section - the section is incomplete.

      Response 6: Thank you for your suggestion. We have now added the relevant content in the Results section (EEG results at source level, line 337-340):

      “The linear regression was run by the "mne.stats.linear_regression" function in the MNE package (Activity ∼ Regressor + Intercept, Activity is the activity amplitude of the EEG signal in the source space and regressor is one of the regressors that we mentioned).”

      Recommendation 7:

      On Lines 260-270 the same results are given twice.

      Response 7: Thank you for your suggestion. We have now deleted redundant content.

      Recommendation 8:

      Frequency bands are displayed in Figure 5 but there is no mention of those in the Methods. In Figure 5b Theta in the 2nd half is compared to Delta in the 1st half- is this an error?

      Response 8: Thank you for your suggestion. It indeed was an error (they should all be Theta) and now we have corrected it.

      Author response image 10.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      In this study, Nishi et al. claim that the ratio of long-term hematopoietic stem cell (LT-HSC) versus short-term HSC (ST-HSC) determines the lineage output of HSCs and reduced ratio of ST-HSC in aged mice causes myeloid-biased hematopoiesis. The authors used Hoxb5 reporter mice to isolate LT-HSC and ST-HSC and performed molecular analyses and transplantation assays to support their arguments. How the hematopoietic system becomes myeloid-biased upon aging is an important question with many implications in the disease context as well. However, their study is descriptive with remaining questions.

      Weaknesses:

      Comment #1-1: The authors may need conceptual re-framing of their main argument because whether the ST-HSCs used in this study are functionally indeed short-term "HSCs" is questionable. The data presented in this study and their immunophenotypic definition of ST-HSCs (Lineage negative/Sca-1+/c-Kit+/Flk2-/CD34-/CD150+/Hoxb5-) suggest that authors may find hematopoietic stem cell-like lymphoid progenitors as previously shown for megakaryocyte lineage (Haas et al., Cell stem cell. 2015) or, as the authors briefly mentioned in the discussion, Hoxb5- HSCs could be lymphoid-biased HSCs.

      The authors disputed the idea of Hoxb5- HSCs as lymphoid-biased HSCs based on their previous 4 weeks post-transplantation data (Chen et al., 2016). However, they overlooked the possibility of myeloid reprogramming of lymphoid-biased population during regenerative conditions (Pietras et al., Cell stem cell., 2015). In other words, early post-transplant ST-HSCs (Hoxb5- HSCs) can be seen as lacking the phenotypic lymphoid-biased HSCs.

      Thinking of their ST-HSCs as hematopoietic stem cell-like lymphoid progenitors or lymphoid-biased HSCs makes more sense conceptually as well.

      Response #1-1: We appreciate this important suggestion and recognize the significance of the debate on whether Hoxb5- HSCs are ST-HSCs or lymphoid-biased HSCs.

      HSCs are defined by their ability to retain hematopoietic potential after a secondary transplantation (1, 2). If Hoxb5- HSCs were indeed lymphoid-biased HSCs, they would exhibit predominantly lymphoid hematopoiesis even after secondary transplantation. However, functional experiments demonstrate that these cells lose their hematopoietic output after secondary transplantation (3) (see Fig. 2 in this paper). Based on the established definition of HSCs in this field, it is appropriate to classify Hoxb5- HSCs as ST-HSCs rather than lymphoid-biased HSCs.

      Additionally, it has been reported that myeloid reprogramming may occur in the early post-transplant period, around 2-4 weeks after transplantation, even in lymphoid-biased populations within the MPP fraction, due to high inflammatory conditions (4). However, when considering the post-transplant hematopoiesis of Hoxb5- HSC fractions as ST-HSCs, they exhibit almost the same myeloid hematopoietic potential as LT-HSCs not only during the early 4 weeks after transplantation but also at 8 weeks post-transplantation (3), when the acute inflammatory response has largely subsided. Therefore, it is difficult to attribute the myeloid production by ST-HSCs post-transplant solely to myeloid reprogramming.

      References

      (1) Morrison, S. J. & Weissman, I. L. The long-term repopulating subset of hematopoietic stem cells is deterministic and isolatable by phenotype. Immunity 1, 661–673 (1994).

      (2) Challen, G. A., Boles, N., Lin, K. K. Y. & Goodell, M. A. Mouse hematopoietic stem cell identification and analysis. Cytom. Part A 75, 14–24 (2009).

      (3) Chen, J. Y. et al. Hoxb5 marks long-term haematopoietic stem cells and reveals a homogenous perivascular niche. Nature 530, 223–227 (2016).

      (4) Pietras, E. M. et al. Functionally Distinct Subsets of Lineage-Biased Multipotent Progenitors Control Blood Production in Normal and Regenerative Conditions. Cell Stem Cell 17, 35–46 (2015).

      Comment #1-2: ST-HSCs come from LT-HSCs and further differentiate into lineage-biased multipotent progenitor (MPP) populations including myeloid-biased MPP2 and MPP3. Based on the authors' claim, LT-HSCs (Hoxb5+ HSCs) have no lineage bias even in aged mice. Then these LT-HSCs make ST-HSCs, which produce mostly memory T cells. These memory T cell-producing ST-HSCs then produce MPPs including myeloid-biased MPP2 and MPP3.

      This differentiation trajectory is hard to accept. If we think Hoxb5- HSCs (ST-HSCs by authors) as a sub-population of immunophenotypic HSCs with lymphoid lineage bias or hematopoietic stem cell-like lymphoid progenitors, the differentiation trajectory has no flaw.

      Response #1-2: Thank you for this comment, and we apologize for the misunderstanding regarding the predominance of memory T cells in ST-HSCs after transplantation. 

      Our data show that ST-HSCs are not biased HSCs that predominantly produce memory T cells, but rather, ST-HSCs are multipotent hematopoietic cells. ST-HSCs lose their ability to self-renew within a short period, resulting in the cessation of ST-HSC-derived hematopoiesis. As a result, myeloid lineage with a short half-life disappears from the peripheral blood, and memory lymphocytes with a long half-life remain (see Figure 5 in this paper). 

      Comment #1-3: Authors' experimental designs have some caveats to support their claims. Authors claimed that aged LT-HSCs have no myeloid-biased clone expansion using transplantation assays. In these experiments, authors used 10 HSCs and young mice as recipients. Given the huge expansion of old HSC by number and known heterogeneity in immunophenotypically defined HSC populations, it is questionable how 10 out of so many old HSCs can faithfully represent the old HSC population. The Hoxb5+ old HSC primary and secondary recipient mice data (Figure 2C and D) support this concern. In addition, they only used young recipients. Considering the importance of the inflammatory aged niche in the myeloid-biased lineage output, transplanting young vs old LT-HSCs into aged mice will complete the whole picture.

      Response #1-3: We appreciate the reviewer for the comments. We acknowledge that using ten HSCs may not capture the heterogeneity of aging HSCs.

      However, although most of our experiments have used a small number of transplanted cells (e.g., 10 cells), we have conducted functional experiments across Figures 2, 3, 5, 6, S3, and S6, totaling n = 126, equivalent to over 1260 cells. Previous studies have reported that myeloid-biased HSCs constitute more than 50% of the aged HSC population (1, 2). If myeloid-biased HSCs increase with age, they should be detectable in our experiments. Our functional experiments have consistently shown that Hoxb5+ HSCs exhibit unchanged lineage output throughout life. In contrast, the data presented in this paper indicate that changes in the ratio of LT-HSCs and ST-HSCs may contribute to myeloid-biased hematopoiesis.
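
      The detectability argument above can be illustrated with a back-of-the-envelope calculation. The sketch assumes, purely for illustration, that every recipient received exactly 10 cells and that each transplanted cell is an independent random draw from a population in which 50% of cells are myeloid-biased; neither assumption is established by the response, and the function name is hypothetical.

```python
def prob_at_least_one(fraction_biased, cells_per_graft):
    """Probability that a graft contains at least one myeloid-biased cell,
    assuming independent random sampling of cells (an idealisation)."""
    return 1.0 - (1.0 - fraction_biased) ** cells_per_graft


cells_per_graft = 10    # cells per recipient, as in the 10-cell transplants
n_recipients = 126      # total recipients quoted in the response
fraction_biased = 0.5   # lower bound quoted from the cited studies

total_cells = cells_per_graft * n_recipients
p_graft = prob_at_least_one(fraction_biased, cells_per_graft)
expected_biased_cells = fraction_biased * total_cells

print(f"total cells: {total_cells}")
print(f"P(at least one biased cell per graft): {p_graft:.4f}")
print(f"expected biased cells overall: {expected_biased_cells:.0f}")
```

      Under these assumptions, almost every 10-cell graft (about 99.9%) would contain at least one myeloid-biased cell, and roughly 630 of the 1260 transplanted cells would be myeloid-biased.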

      We believe that transplanting aged HSCs into aged recipient mice is crucial to analyzing not only the differentiation potential of aged HSCs but also the changes in their engraftment and self-renewal abilities. We aim to clarify further findings through these experiments in the future.

      References

      (1) Dykstra B, Olthof S, Schreuder J, Ritsema M, Haan G De. Clonal analysis reveals multiple functional defects of aged murine hematopoietic stem cells. J Exp Med. 2011 Dec 19;208(13):2691–703. 

      (2) Yamamoto R, Wilkinson AC, Ooehara J, Lan X, Lai CY, Nakauchi Y, et al. Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell [Internet]. 2018;22(4):600-607.e4. Available from: https://doi.org/10.1016/j.stem.2018.03.013

      Comment #1-4: The authors' molecular data analyses need more rigor with unbiased approaches. They claimed that neither aged LT-HSCs nor aged ST-HSCs exhibited myeloid or lymphoid gene set enrichment but aged bulk HSCs, which are just a sum of LT-HSCs and ST-HSCs by their gating scheme (Figure 4A), showed the "tendency" of enrichment of myeloid-related genes based on the selected gene set (Figure 4D). Although the proportion of ST-HSCs is reduced in bulk HSCs upon aging, since ST-HSCs do not exhibit lymphoid gene set enrichment based on their data, it is hard to understand how aged bulk HSCs have more myeloid gene set enrichment compared to young bulk HSCs. This bulk HSC data rather suggests that there could be a trend toward certain lineage bias (although not significant) in aged LT-HSCs or ST-HSCs. The authors need to verify the molecular lineage priming of LT-HSCs and ST-HSCs using another comprehensive dataset.

      Response #1-4: Thank you for pointing out that neither aged LT-HSCs nor aged ST-HSCs exhibited myeloid or lymphoid gene set enrichment, although aged bulk HSCs showed a tendency towards enrichment of myeloid-related genes.

      The actual GSEA result had an FDR > 0.05. Therefore, we cannot claim that bulk HSCs showed significant enrichment of myeloid-related genes with age. Consequently, we have revised the following sentences:

      [P11, L251] Neither aged LT-HSCs nor aged ST-HSCs exhibited myeloid/lymphoid gene set enrichment, while shared myeloid-related genes tended to be enriched in aged bulk-HSCs, although this enrichment was not statistically significant (Fig. 4, F and G).

      In addition to the above, we also found that the GSEA results differ among myeloid gene sets (Fig. 4, D-F; Fig. 4S, C-D). These findings suggest that discussing lineage bias in HSCs using GSEA is challenging. We believe that functional experimental data is crucial. In our functional experiments, when the ratio of LT-HSCs to ST-HSCs was reconstituted to match the ratio in young bulk-HSCs (LT:ST = 2:8) or aged bulk-HSCs (LT:ST = 5:5), myeloid-biased hematopoiesis was observed with the aged bulk-HSC ratio. Based on these data, we concluded that age-related changes in the ratio between LT-HSCs and ST-HSCs in bulk-HSCs cause myeloid-biased hematopoiesis, rather than an increase in myeloid gene expression in the aged bulk-HSCs.

      Comment #1-5: Some data are too weak to fully support their claims. The authors claimed that age-associated extramedullary changes are the main driver of myeloid-biased hematopoiesis based on no major differences in progenitor populations upon transplantation of 10 young HSCs into young or old recipient mice (Figure 7F) and relatively low donor-derived cells in thymus and spleen in aged recipient mice (Figure 7G-J). However, they used selected mice to calculate the progenitor populations in recipient mice (8 out of 17 from young recipients denoted by * and 8 out of 10 from aged recipients denoted by * in Figure 7C). In addition, they calculated the progenitor populations as frequency in c-kit positive cells. Given that they transplanted 10 LT-HSCs into "sub-lethally" irradiated mice and 8.7 Gy irradiation can have different effects on bone marrow clearance in young vs old mice, it is not clear whether this data is reliable enough to support their claims. The same concern applies to the data Figure 7G-J. Authors need to provide alternative data to support their claims.

      Response #1-5: Thank you for your useful comments. Our claim regarding Fig. 7 is that age-associated extramedullary changes are merely additional drivers of myeloid-biased hematopoiesis and are not the main drivers. Nevertheless, we address the issues raised below.

      Regarding the reason for analyzing the asterisk mice

      We performed two independent experiments for Fig. 7. In the first experiment, we planned to analyze the BM of recipients 16 weeks after transplantation. However, as shown in Fig. 7B, many of the aged mice died before 16 weeks. Therefore, we decided to examine the BM of the recipient mice at 12 weeks in the second experiment. Below are the peripheral blood results 11-12 weeks after transplantation for the mice used in the second experiment.

      Author response image 1.

      For the second experiment, we analyzed the BM of all eight aged recipients. Then, we selected the same number of young recipients for analysis to ensure that the donor myeloid output would be comparable to that of the entire young group. Indeed, the donor myeloid lineage output of the selected mice was 28.1 ± 22.9%, closely matching the 23.5 ± 23.3% (p = 0.68) observed in the entire young recipient population.

      That being said, as the reviewer pointed out, it is a limitation that the BM, thymus, and spleen were not analyzed in all mice. Hence, we have added the following sentences:

      [P14, L327] We performed BM analysis for the mice denoted by † in Figure 7C because many of the aged mice had died before the analysis.

      [P15, L338] The thymus and spleen analyses were also performed on the mice denoted by † in Figure 7C.

      Regarding the reason for 8.7 Gy.

      Thank you for your question about whether 8.7 Gy is myeloablative. In our previous report1, we demonstrated that none of the mice subjected to pre-treatment with 8.7 Gy could survive when non-LKS cells were transplanted, suggesting that 8.7 Gy is enough to be myeloablative with the radiation equipment at our facility.

      Author response image 2.

      Reference

      (1)  Nishi K, Sakamaki T, Sadaoka K, Fujii M, Takaori-Kondo A, Chen JY, et al. Identification of the minimum requirements for successful haematopoietic stem cell transplantation. Br J Haematol. 2022;196(3):711–23. 

      Regarding the normalization of c-Kit in Figure 7F.  

      Firstly, as shown in Supplemental Figures S1B and S1C, we analyze the upstream (HSC, MPP, Flk2+) and downstream (CLP, MEP, CMP, GMP) fractions in different panels. Therefore, normalization is required to assess the differentiation of HSCs from upstream to downstream. Additionally, the reason for normalizing by c-Kit+ is that the bone marrow analysis was performed after enrichment using the Anti-c-Kit antibody for both upstream and downstream fractions. Based on this, we calculated the progenitor populations as a frequency within the c-Kit positive cells.

      Next, the results of normalizing to whole bone marrow cells (live cells) are shown below.

      Author response image 3.

      As with normalization to c-Kit+ cells, the pattern for myeloid progenitors was unchanged, including the statistically significant decrease in CMP in aged mice. Additionally, there were no significant differences in CLP. In conclusion, we obtained similar results with normalization to c-Kit+ cells and normalization to whole bone marrow cells (live cells).

      However, as the reviewer pointed out, it is necessary to explain the reason for normalization with c-Kit. Therefore, we will add the following description.

      [P21, L502] For the combined analysis of the upstream (HSC, MPP, Flk2+) and downstream (CLP, MEP, CMP, GMP) fractions in Figures 1B and 7F, we normalized by c-Kit+ cells because we performed a c-Kit enrichment for the bone marrow analysis.

      Reviewer #2:

      Summary:  

      Nishi et al. investigate the well-known and previously described phenomenon of age-associated myeloid-biased hematopoiesis. Using a previously established HoxB5mCherry mouse model, they used HoxB5+ and HoxB5- HSCs to discriminate cells with long-term (LT-HSCs) and short-term (ST-HSCs) reconstitution potential and compared these populations to immunophenotypically defined 'bulk HSCs' that consist of a mixture of LT-HSCs and ST-HSCs. They then isolated these HSC populations from young and aged mice to test their function and myeloid bias in non-competitive and competitive transplants into young and aged recipients. Based on quantification of hematopoietic cell frequencies in the bone marrow, peripheral blood, and in some experiments the spleen and thymus, the authors argue against the currently held belief that myeloid-biased HSCs expand with age.

      Comment #2-1: While aspects of their work are fascinating and might have merit, several issues weaken the overall strength of the arguments and interpretation. Multiple experiments were done with a very low number of recipient mice, showed very large standard deviations, and had no statistically detectable difference between experimental groups. While the authors conclude that these experimental groups are not different, the displayed results seem too variable to conclude anything with certainty. The sensitivity of the performed experiments (e.g. Figure 3; Figure 6C, D) is too low to detect even reasonably strong differences between experimental groups and is thus inadequate to support the authors' claims. This weakness of the study is not acknowledged in the text and is also not discussed. To support their conclusions the authors need to provide higher n-numbers and provide a detailed power analysis of the transplants in the methods section.

      Response #2-1: Thank you for your important remarks. The power analysis for this experiment shows that power = 0.319, suggesting that a larger sample size may be needed. On the other hand, our method for determining the sample size in Figure 3 is as follows:

      (1) First, we checked whether a myeloid-biased change is detected in the bulk-HSC fraction (Figure S3). The results showed that the difference in myeloid output at 16 weeks after transplantation was statistically significant (young vs. aged = 7.2 ± 8.9 vs. 42.1 ± 35.5%, p = 0.01), even though n = 10.

      (2) Next, myeloid-biased HSCs have been reported to be a fraction with high self-renewal ability (2004, Blood). If myeloid-biased HSCs increase with aging, the increase in myeloid-biased HSCs in the LT-HSC fraction would be detected with higher sensitivity than in the bulk-HSC fraction used in Figure S3.

      (3) However, the groups differed neither by p-value nor in the means themselves (young vs. aged = 51.4 ± 31.5% vs. 47.4 ± 39.0%, p = 0.82), even though n = 8 in Figure 3. Since the means themselves did not differ, it is highly likely that no difference would be detected even if n were further increased.
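
      The sample-size reasoning in steps (1) to (3) can be illustrated with a rough calculation. The following Python sketch is not the power analysis reported in this response. It assumes independent groups of equal size, treats the quoted n as mice per group, and uses a normal approximation to a two-sided two-sample test, which tends to overstate power at small n. All function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference for two groups of equal size."""
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return abs(mean_a - mean_b) / pooled_sd


def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample comparison."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def n_for_power(d, power=0.8, alpha=0.05):
    """Approximate number of mice per group needed to reach the given power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return 2 * ((z_crit + z_power) / d) ** 2


# Summary statistics quoted in the response (percent myeloid output).
d_bulk = cohens_d(7.2, 8.9, 42.1, 35.5)   # bulk-HSC fraction, Figure S3
d_lt = cohens_d(51.4, 31.5, 47.4, 39.0)   # LT-HSC fraction, Figure 3

print(f"bulk-HSC: d = {d_bulk:.2f}, power at n = 10: {approx_power(d_bulk, 10):.2f}")
print(f"LT-HSC:   d = {d_lt:.2f}, power at n = 8: {approx_power(d_lt, 8):.2f}")
print(f"LT-HSC:   n per group for 80% power: {n_for_power(d_lt):.0f}")
```

      Under these assumptions, the standardized difference in the LT-HSC comparison is small, so the approximate group size needed to detect it is far larger than in a typical transplantation experiment, whereas the bulk-HSC difference is large enough to be detectable with about ten mice per group.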

      Regarding Figure 6, we obtained a statistically significant difference and consider the sample size to be sufficient. 

      In addition, we have performed various functional experiments (Figures 2, 5, 6 and S6), and have obtained consistent results showing that expansion of myeloid-biased HSCs does not occur with aging in the Hoxb5+ HSC fraction. Based on the above, we conclude that the LT-HSC fraction does not differ in myeloid differentiation potential with aging.

      Comment #2-2: As the authors attempt to challenge the current model of the age-associated expansion of myeloid-biased HSCs (which has been observed and reproduced by many different groups), ideally additional strong evidence in the form of single-cell transplants is provided.

      Response #2-2: Thank you for the comments. As the reviewer pointed out, we hope to reconfirm our results using single-cell level technology in the future.

      On the other hand, we have reported that the ratio of myeloid to lymphoid cells in the peripheral blood changes when the number of HSCs transplanted, or the number of supporting cells transplanted with HSCs, is varied1-2. Therefore, single-cell transplant data need to be interpreted very carefully to determine differentiation potential.

      From this viewpoint, future experiments will combine the Hoxb5 reporter system with a lineage tracing system that can track HSCs at the single-cell level over time. This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells. We have reflected this comment by adding the following sentences in the manuscript.

      [P19, L451] In contrast, our findings should be considered in light of some limitations. In this report, we primarily performed ten to twenty cell transplantation assays. Therefore, the current theory should be revalidated using single-cell technology with lineage tracing system3-4. This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells. 

      References

      (1) Nishi K, Sakamaki T, Sadaoka K, Fujii M, Takaori-Kondo A, Chen JY, et al. Identification of the minimum requirements for successful haematopoietic stem cell transplantation. Br J Haematol. 2022;196(3):711–23. 

      (2) Sakamaki T, Kao KS, Nishi K, Chen JY, Sadaoka K, Fujii M, et al. Hoxb5 defines the heterogeneity of self-renewal capacity in the hematopoietic stem cell compartment. Biochem Biophys Res Commun [Internet]. 2021;539:34–41. Available from: https://doi.org/10.1016/j.bbrc.2020.12.077

      (3) Yamamoto R, Wilkinson AC, Ooehara J, Lan X, Lai CY, Nakauchi Y, et al. Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell [Internet]. 2018;22(4):600-607.e4. Available from: https://doi.org/10.1016/j.stem.2018.03.013

      (4) Rodriguez-Fraticelli AE, Weinreb C, Wang SW, Migueles RP, Jankovic M, Usart M, et al. Single-cell lineage tracing unveils a role for TCF15 in haematopoiesis. Nature [Internet]. 2020;583(7817):585–9. Available from: http://dx.doi.org/10.1038/s41586-020-2503-6

      Comment #2-3: It is also unclear why the authors believe that the observed reduction of ST-HSCs relative to LT-HSCs explains the myeloid-biased phenotype observed in the peripheral blood. This point seems counterintuitive and requires further explanation.

      Response #2-3: Thank you for your comment. We apologize for the insufficient explanation. Our data, as shown in Figures 3 and 4, demonstrate that the differentiation potential of LT-HSCs remains unchanged with age. Therefore, rather than suggesting that an increase in LT-HSCs with a consistent differentiation capacity leads to myeloid-biased hematopoiesis, it seems more accurate to highlight that the relative decrease in the proportion of ST-HSCs, which remain in peripheral blood as lymphocytes, leads to a relative increase in myeloid cells in peripheral blood and thus causes myeloid-biased hematopoiesis.

      However, if we focus on the increase in the ratio of LT-HSCs, it is also plausible to explain that “with aging, the proportion of LT-HSCs capable of long-term myeloid hematopoiesis increases. As a result, from 16 weeks after transplantation, the influence of LT-HSCs maintaining the long-term ability to produce myeloid cells becomes relatively more significant, leading to an increase in the ratio of myeloid cells in the peripheral blood and causing myeloid-biased hematopoiesis.”

      Comment #2-4: Based on my understanding of the presented data, the authors argue that myeloid-biased HSCs do not exist, as a) they detect no difference between young/aged HSCs after transplant (mind low n-numbers and large std!); b) myeloid progenitors downstream of HSCs only show minor or no changes in frequency; and c) aged LT-HSCs do not outperform young LT-HSCs in myeloid output in competitive transplants (mind low n-numbers and large std!).

      Response #2-4: We appreciate the comments. As mentioned above, we will correct the manuscript regarding the sample size.

      Regarding the interpretation of the lack of increase in the percentage of myeloid progenitor cells in the bone marrow with age, it is instead possible that various confounding factors, such as differentiation shortcuts or changes in the microenvironment, are involved.

      However, even when aged LT-HSCs and young LT-HSCs are transplanted into the same recipient mice, the timing of the appearance of different cell fractions in peripheral blood is similar (Figure 3 of this paper). Therefore, we have not obtained data suggesting that clear shortcuts exist in the differentiation process of aged HSCs into neutrophils or monocytes. Additionally, it is currently generally accepted that myeloid cells, including neutrophils and monocytes, differentiate from GMPs1. Since there is no change in the proportion of GMPs in the bone marrow with age, we concluded that the differentiation potential into myeloid cells remains consistent with aging.

      Reference

      (1) Akashi K and others, ‘A Clonogenic Common Myeloid Progenitor That Gives Rise to All Myeloid Lineages’, Nature, 404.6774 (2000), 193–97.

      Strengths: 

      The authors present an interesting observation and offer an alternative explanation of the origins of aged-associated myeloid-biased hematopoiesis. Their data regarding the role of the microenvironment in the spleen and thymus appears to be convincing. 

      Weaknesses: 

      Comment #2-5: "Then, we found that the myeloid lineage proportions from young and aged LT-HSCs were nearly comparable during the observation period after transplantation (Figure 3, B and C)." Given the large standard deviation and low n-numbers, the power of the analysis to detect differences between experimental groups is very low. Experimental groups with too large standard deviations (as displayed here) are difficult to interpret and might be inconclusive. The absence of clearly detectable differences between young and aged transplanted HSCs could thus simply be a false-negative result. The shown experimental results hence do not provide strong evidence for the authors' interpretation of the data. The authors should add additional transplants and include a detailed power analysis to be able to detect differences between experimental groups with reasonable sensitivity.

      Response #2-5: Thank you for providing these insights. Regarding the sample size, we have addressed this in Response #2-1.

      Comment #2-6: Line 293: "Based on these findings, we concluded that myeloid-biased hematopoiesis observed following transplantation of aged HSCs was caused by a relative decrease in ST-HSC in the bulk-HSC compartment in aged mice rather than the selective expansion of myeloid-biased HSC clones." Couldn't that also be explained by an increase in myeloid-biased HSCs, as repeatedly reported and seen in the expansion of CD150+ HSCs? It is not intuitively clear why a reduction of ST-HSC clones would lead to a myeloid bias. The authors should try to explain more clearly where they believe the increased number of myeloid cells comes from. What is the source of myeloid cells if the authors believe they are not derived from the expanded population of myeloid-biased HSCs?

      Response #2-6: Thank you for pointing this out. We apologize for the insufficient explanation. We will explain using Figure 8 from the paper.

      First, our data show that LT-HSCs maintain their differentiation capacity with age, while ST-HSCs lose their self-renewal capacity earlier, so that only long-lived memory lymphocytes remain in the peripheral blood after the loss of self-renewal capacity in ST-HSCs (Figure 8, upper panel). In mouse bone marrow, the proportion of LT-HSCs increases with age, while the proportion of ST-HSCs relatively decreases (Figure 8, lower panel and Figure S5).

      Our data show that merely reproducing the ratio of LT-HSCs to ST-HSCs observed in aged mice using young LT-HSCs and ST-HSCs can replicate myeloid-biased hematopoiesis. This suggests that the increase in LT-HSC and the relative decrease in ST-HSC within the HSC compartment with aging are likely to contribute to myeloid-biased hematopoiesis.

      As mentioned earlier, since the differentiation capacity of LT-HSCs remains unchanged with age, it seems more accurate to say that the relative decrease in the proportion of ST-HSCs, which retain long-lived memory lymphocytes in peripheral blood, leads to a relative increase in myeloid cells in peripheral blood and thus causes myeloid-biased hematopoiesis.

      However, focusing on the increase in the proportion of LT-HSCs, it is also possible to explain that “with aging, the proportion of LT-HSCs capable of long-term myeloid hematopoiesis increases. As a result, from 16 weeks after transplantation, the influence of LT-HSCs maintaining the long-term ability to produce myeloid cells becomes relatively more significant, leading to an increase in the ratio of myeloid cells in the peripheral blood and causing myeloid-biased hematopoiesis.”
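
      The ratio argument in this response can be made concrete with a toy calculation. The Python sketch below is purely illustrative and is not derived from the data in the paper: it assumes that each HSC subset contributes to donor blood in proportion to its share of the graft and that each subset has a fixed, age-independent myeloid output. The output fractions and the LT-HSC shares are hypothetical numbers chosen only to show the direction of the effect.

```python
def blood_myeloid_percent(lt_fraction, lt_myeloid=0.5, st_myeloid=0.05):
    """Myeloid share of donor blood for a given LT-HSC share of the graft.

    Toy assumption: each HSC subset contributes to blood in proportion to its
    share of the graft and has a fixed, age-independent myeloid output
    (lt_myeloid and st_myeloid are hypothetical values).
    """
    if not 0.0 <= lt_fraction <= 1.0:
        raise ValueError("lt_fraction must lie between 0 and 1")
    st_fraction = 1.0 - lt_fraction
    return 100.0 * (lt_fraction * lt_myeloid + st_fraction * st_myeloid)


# Hypothetical LT-HSC shares for a young-like and an aged-like graft.
for label, lt_fraction in [("young-like", 0.2), ("aged-like", 0.8)]:
    print(f"{label}: {blood_myeloid_percent(lt_fraction):.0f}% myeloid")
```

      In this toy model the per-cell myeloid output of LT-HSCs never changes, yet the myeloid share of blood rises when the LT-HSC share of the graft rises, which is the qualitative point of the explanation above.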

      Reviewer #3:

      Summary:

      In this manuscript, Nishi et al. propose a new model to explain the previously reported myeloid-biased hematopoiesis associated with aging. Traditionally, this phenotype has been explained by the expansion of myeloid-biased hematopoietic stem cell (HSC) clones during aging. Here, the authors question this idea and show how their Hoxb5 reporter model can discriminate long-term (LT) and short-term (ST) HSCs and characterize their lineage output after transplant. From these analyses, the authors conclude that changes during aging in the LT/ST HSC proportion explain the myeloid bias observed.

      Although the topic is appropriate and the new model provides a new way to think about lineage-biased output observed in multiple hematopoietic contexts, some of the experimental design choices, as well as some of the conclusions drawn from the results could be substantially improved. Also, they do not propose any potential mechanism to explain this process, which reduces the potential impact and novelty of the study. Specific concerns are outlined below. 

      Major 

      Comment #3-1: As a general comment, there are experimental details that are either missing or not clear. The main one is related to transplantation assays. What is the irradiation dose? The Methods section indicates "recipient mice were lethally irradiated with single doses of 8.7 or 9.1 Gy". The only experimental schematic indicating the irradiation dose is Figure 7A, which uses 8.7 Gy. Also, although there is not a "standard", 11 Gy split in two doses is typically considered lethal irradiation, while 9.5 Gy is considered sublethal.

      Response #3-1: We agree with the reviewer's concern about whether 8.7 Gy is myeloablative. To confirm this, it would typically be necessary to irradiate mice with different doses and observe whether they survive. However, such an experiment is not ethically permissible at our facility. Instead, in our previous report1, we demonstrated that none of the mice subjected to pretreatment with 8.7 Gy could survive when non-LKS cells were transplanted, suggesting that 8.7 Gy is enough to be myeloablative with the radiation equipment at our facility.

      Reference

      (1) Nishi K, Sakamaki T, Sadaoka K, Fujii M, Takaori-Kondo A, Chen JY, et al. Identification of the minimum requirements for successful haematopoietic stem cell transplantation. Br J Haematol. 2022;196(3):711–23. 

      Comment #3-2:  Is there any reason for these lower doses? Same question for giving a single dose and for performing irradiation a day before transplant. 

      Response #3-2: We thank the reviewer for these important comments. Although the 8.7 Gy dose used at our facility is lower than in other reports, we selected this dose to maintain consistency with our previous experiments. For the same reason, we used a single dose rather than a split dose. Regarding the timing of irradiation, the Methods section specifies that irradiation is performed 12-24 hours prior to transplantation. In most experiments, irradiation was performed 12 hours before transplantation. However, owing to how some experiments proceeded, nearly 24 hours occasionally elapsed between irradiation and transplantation. We provide this information to ensure accuracy.

      Comment #3-3: The manuscript would benefit from the inclusion of references to recent studies discussing hematopoietic biases and differentiation dynamics at a single-cell level (e.g., Yamamoto et al., 2018; Rodriguez-Fraticelli et al., 2020). Also, when discussing the discrepancy between studies claiming different biases within the HSC pool, the authors mentioned that Montecino-Rodriguez et al. 2019 showed preserved lymphoid potential with age. It would be good to acknowledge that this study used busulfan as the conditioning method instead of irradiation.

      Response #3-3: We agree with this comment and have incorporated this suggestion into the manuscript.

      [P19, L451] In contrast, our findings should be considered in light of some limitations. In this report, we primarily performed ten to twenty cell transplantation assays. Therefore, the current theory should be revalidated using single-cell technology with lineage tracing system1-2. This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells. Additionally, in this report we purified LT-HSCs by Hoxb5 reporter system. In contrast, various LT-HSC markers have been previously reported2-3. Therefore, it is ideal to validate our findings using other LT-HSC markers.

      [P16, L368] Other studies suggest that blockage of lymphoid hematopoiesis in aged mice results in myeloid-skewed hematopoiesis through alternative mechanisms. However, this result should be interpreted carefully, since Busulfan was used for myeloablative treatment in this study4.   

      References

      (1) Yamamoto R, Wilkinson AC, Ooehara J, Lan X, Lai CY, Nakauchi Y, et al. Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell [Internet]. 2018;22(4):600-607.e4. Available from: https://doi.org/10.1016/j.stem.2018.03.013

      (2) Rodriguez-Fraticelli AE, Weinreb C, Wang SW, Migueles RP, Jankovic M, Usart M, et al. Single-cell lineage tracing unveils a role for TCF15 in haematopoiesis. Nature [Internet]. 2020;583(7817):585–9. Available from: http://dx.doi.org/10.1038/s41586-020-2503-6

      (3) Sanjuan-Pla A, Macaulay IC, Jensen CT, Woll PS, Luis TC, Mead A, et al. Platelet-biased stem cells reside at the apex of the haematopoietic stem-cell hierarchy. Nature. 2013;502(7470):232–6.

      (4) Montecino-Rodriguez E, Kong Y, Casero D, Rouault A, Dorshkind K, Pioli PD. Lymphoid-Biased Hematopoietic Stem Cells Are Maintained with Age and Efficiently Generate Lymphoid Progeny. Stem Cell Reports. 2019 Mar 5;12(3):584–96. 

      Comment #3-4: When representing the contribution to PB from transplanted cells, the authors show the % of each lineage within the donor-derived cells (Figures 3B-C, 5B, 6B-D, 7C-E, and S3 B-C). To have a better picture of total donor contribution, total PB and BM chimerism should be included for each transplantation assay. Also, for Figures 2C-D and Figures S2A-B, do the graphs represent 100% of the PB cells? Are there any radioresistant cells?

      Response #3-4: Thank you for highlighting this point. Indeed, donor contribution to total peripheral blood (PB) is important information. We have included the donor contribution data for each of the figures mentioned above.

      Author response image 4.

      In Figure 2C-D and Figure S2A-B, the percentage of donor chimerism in PB was defined as the percentage of CD45.1-CD45.2+ cells among total CD45.1-CD45.2+ and CD45.1+CD45.2+ cells, as described in the Methods section.
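
      The chimerism definition above amounts to a simple ratio. A minimal Python sketch follows, assuming that gated event counts for the two populations are available; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def donor_chimerism_percent(donor_events, host_events):
    """Percent donor cells among donor plus host events.

    donor_events: number of CD45.1-CD45.2+ events in the gate
    host_events:  number of CD45.1+CD45.2+ events in the gate
    """
    total = donor_events + host_events
    if total == 0:
        raise ValueError("no donor or host events in the gate")
    return 100.0 * donor_events / total


# Hypothetical event counts, for illustration only.
print(donor_chimerism_percent(donor_events=1500, host_events=8500))
```

      Note that with this definition the denominator contains only these two populations, so any cells with a different CD45 phenotype do not enter the percentage.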

      Comment #3-5: For BM progenitor frequencies, the authors present the data as the frequency of cKit+ cells. This normalization might be misleading as changes in the proportion of cKit+ between the different experimental conditions could mask differences in these BM subpopulations. Representing this data as the frequency of BM single cells or as absolute numbers (e.g., per femur) would be valuable.

      Response #3-5: We appreciate the reviewer's comment on this point. 

      Firstly, as shown in Supplemental Figures S1B and S1C, we analyze the upstream (HSC, MPP, Flk2+) and downstream (CLP, MEP, CMP, GMP) fractions in different panels. Therefore, normalization is required to assess the differentiation of HSCs from upstream to downstream. Additionally, the reason for normalizing by c-Kit+ is that the bone marrow analysis was performed after enrichment using the Anti-c-Kit antibody for both upstream and downstream fractions. Based on this, we calculated the progenitor populations as a frequency within the c-Kit positive cells. Next, the results of normalizing to whole bone marrow cells (live cells) are shown in Author response image 2.

      As with normalization to c-Kit+ cells, the pattern for myeloid progenitors was unchanged, including the statistically significant decrease in CMP in aged mice. Additionally, there were no significant differences in CLP. In conclusion, similar results were obtained with normalization to c-Kit+ cells and normalization to whole bone marrow cells (live cells).

      However, as the reviewer pointed out, it is necessary to explain the reason for normalization with c-Kit. Therefore, we will add the following description.

      [P21, L502] For the combined analysis of the upstream (HSC, MPP, Flk2+) and downstream (CLP, MEP, CMP, GMP) fractions in Figures 1B and 7F, we normalized by c-Kit+ cells because we performed a c-Kit enrichment for the bone marrow analysis.
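
      The difference between the two normalizations discussed in this response can be shown with a small calculation. The Python sketch below uses hypothetical event counts, not data from the paper, to illustrate why the reviewer asked for both: a population can have the same frequency within c-Kit+ cells in two samples while its frequency among live cells differs, if the c-Kit+ fraction itself differs.

```python
def frequency_percent(population_events, parent_events):
    """Percent of parent-gate events that fall in the population gate."""
    if parent_events <= 0:
        raise ValueError("parent gate is empty")
    return 100.0 * population_events / parent_events


# Hypothetical event counts for two samples, for illustration only.
samples = {
    "sample_A": {"live": 1_000_000, "ckit_pos": 20_000, "cmp": 2_000},
    "sample_B": {"live": 1_000_000, "ckit_pos": 40_000, "cmp": 4_000},
}

for name, counts in samples.items():
    of_ckit = frequency_percent(counts["cmp"], counts["ckit_pos"])
    of_live = frequency_percent(counts["cmp"], counts["live"])
    print(f"{name}: CMP = {of_ckit:.1f}% of c-Kit+, {of_live:.2f}% of live cells")
```

      Reporting both denominators, as done in this response, makes it possible to check whether a conclusion depends on the choice of parent gate.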

      Comment #3-6: Regarding Figure 1B, the authors argue that if myeloid-biased HSC clones increase with age, they should see increased frequency of all components of the myeloid differentiation pathway (CMP, GMP, MEP). This would imply that their results (no changes or reduction in these myeloid subpopulations) suggest the absence of myeloid-biased HSC clones expansion with age. This reviewer believes that differentiation dynamics within the hematopoietic hierarchy can be more complex than a cascade of sequential and compartmentalized events (e.g., accelerated differentiation at the CMP level could cause exhaustion of this compartment and explain its reduction with age and why GMP and MEP are unchanged) and these conclusions should be considered more carefully.

      Response #3-6: We wish to thank the reviewer for this comment. We agree that the differentiation pathway may not be a cascade of sequential events but could be influenced by various factors, such as extrinsic factors.

      In Figure 1B, we hypothesized that there may be other mechanisms causing myeloid-biased hematopoiesis besides the age-related increase in myeloid-biased HSCs, given that the percentage of myeloid progenitor cells in the bone marrow did not change with age. However, we do not discuss the presence or absence of myeloid-biased HSCs based on the data in Figure 1B.

      Our newly proposed theories, namely that the differentiation capacity of LT-HSCs remains unchanged with age and that age-related myeloid-biased hematopoiesis is due to changes in the ratio of LT-HSCs to ST-HSCs, are based on functional experiment results. As the reviewer pointed out, to discuss the presence or absence of myeloid-biased HSCs based on the data in Figure 1B, it is necessary to apply a system that can track HSC differentiation at the single-cell level. Such technology would clarify changes in the self-renewal capacity of individual HSCs and their differentiation into progenitor cells and peripheral blood cells. We believe that these single-cell technologies will be beneficial in understanding the differentiation of HSCs. Based on the above, the following statement has been added to the text.

      [P19, L451] In contrast, our findings should be considered in light of some limitations. In this report, we primarily performed ten to twenty cell transplantation assays. Therefore, the current theory should be revalidated using single-cell technology with lineage tracing system1-2. This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells. 

      References

      (1) Yamamoto R, Wilkinson AC, Ooehara J, Lan X, Lai CY, Nakauchi Y, et al. Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell [Internet]. 2018;22(4):600-607.e4. Available from: https://doi.org/10.1016/j.stem.2018.03.013

      (2) Rodriguez-Fraticelli AE, Weinreb C, Wang SW, Migueles RP, Jankovic M, Usart M, et al. Single-cell lineage tracing unveils a role for TCF15 in haematopoiesis. Nature [Internet]. 2020;583(7817):585–9. Available from: http://dx.doi.org/10.1038/s41586-020-2503-6

      Comment #3-7: Within the few recipients showing good donor engraftment in Figure 2C, there is a big proportion of T cells that are "amplified" upon secondary transplantation (Figure 2D). Is this expected?

      Response #3-7: We wish to express our deep appreciation to the reviewer for the insightful comment on this point. As the reviewer pointed out, in Figure 2D, a few recipients show a very high percentage of T cells. We had the same question and considered this phenomenon as follows:

      (1) One reason for the very high percentage of T cells is that we used 1 × 10⁷ whole bone marrow cells in the secondary transplantation. Consequently, the donor cells in the secondary transplantation contained more T-cell progenitor cells, leading to a greater increase in T cells compared to the primary transplantation.

      (2) We also consider that this phenomenon may be influenced by the reduced self-renewal capacity of aged LT-HSCs, resulting in decreased sustained production of myeloid cells in the secondary recipient mice. As a result, long-lived memory-type lymphocytes may preferentially remain in the peripheral blood, increasing the percentage of T cells in the secondary recipient mice.

      We have discussed our hypothesis regarding this interesting phenomenon. To further clarify the characteristics of the increased T-cell count in the secondary recipient mice, we will analyze TCR clonality and diversity in the future.

      Comment #3-8: Do the authors have any explanation for the high level of variability within the recipients of Hoxb5+ cells in Figure 2C?

      Response #3-8: We appreciate the reviewer's comment on this point. As noted in our previous report, transplantation of a sufficient number of HSCs results in stable donor chimerism, whereas a small number of HSCs leads to increased variability in donor chimerism1. Additionally, other studies have observed high variability when fewer than 10 HSCs are transplanted2-3. Based on this evidence, we consider that the transplantation of a small number of cells (10 cells) is the primary cause of the high level of variability observed.

      References

      (1) Nishi K, Sakamaki T, Sadaoka K, Fujii M, Takaori-Kondo A, Chen JY, et al. Identification of the minimum requirements for successful haematopoietic stem cell transplantation. Br J Haematol. 2022;196(3):711–23. 

      (2) Dykstra B, Olthof S, Schreuder J, Ritsema M, Haan G De. Clonal analysis reveals multiple functional defects of aged murine hematopoietic stem cells. J Exp Med. 2011 Dec 19;208(13):2691–703. 

      (3) Yamamoto R, Wilkinson AC, Ooehara J, Lan X, Lai CY, Nakauchi Y, et al. Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell [Internet]. 2018;22(4):600-607.e4. Available from: https://doi.org/10.1016/j.stem.2018.03.013

      Comment #3-9: Can the results from Figure 2E be interpreted as Hoxb5+ cells having a myeloid bias? (differences are more obvious/significant in neutrophils and monocytes).

      Response #3-9: Thank you for your insightful comments. Firstly, we have not so far obtained any data indicating that young LT-HSCs are myeloid-biased HSCs. Therefore, we classify young LT-HSCs as balanced HSCs1. Secondly, our current data demonstrate no significant difference in differentiation capacity between young and aged LT-HSCs (see Figure 3 in this paper). Based on these findings, we interpret that aged LT-HSCs are balanced HSCs, similar to young LT-HSCs.

      Reference

      (1)  Chen JY, Miyanishi M, Wang SK, Yamazaki S, Sinha R, Kao KS, et al. Hoxb5 marks long-term haematopoietic stem cells and reveals a homogenous perivascular niche. Nature. 2016 Feb 10;530(7589):223–7. 

      Comment #3-10: Is Figure 2G considering all primary recipients or only the ones that were used for secondary transplants? The second option would be a fairer comparison.

      Response #3-10: We appreciate the reviewer's comment on this point. We considered all primary recipients in Figure 2G to ensure a fair comparison, given the influence of various factors such as the radiosensitivity of individual recipient mice1. Comparing only the primary recipients used in the secondary transplantation would result in n = 3 (primary recipient) vs. n = 12 (secondary recipient). Including all primary recipients yields n = 11 vs. n = 12, providing a more balanced comparison. Therefore, we analyzed all primary recipient mice to ensure the reliability of our results.

      Reference

      (1) Duran-Struuck R, Dysko RC. Principles of bone marrow transplantation (BMT): providing optimal veterinary and husbandry care to irradiated mice in BMT studies. J Am Assoc Lab Anim Sci. 2009; 48:11–22

      Comment #3-11: When discussing the transcriptional profile of young and aged HSCs, the authors claim that genes linked to myeloid differentiation remain unchanged in the LT-HSC fraction while there are significant changes in the ST-HSCs. However, 2 out of the 4 genes shown in Figure S4B show ratios higher than 1 in LT-HSCs.

      Response #3-11: Thank you for highlighting this important point. As the reviewer pointed out, when we analyze the expression of myeloid-related genes, some genes are elevated in aged LT-HSCs compared to young LT-HSCs. However, the GSEA analysis using myeloid-related gene sets, which include several hundred genes, shows no significant difference between young and aged LT-HSCs (see Figure S4C in this paper). Furthermore, functional experiments using the co-transplantation system show no difference in differentiation capacity between young and aged LT-HSCs (see Figure 3 in this paper). Based on these results, we conclude that LT-HSCs do not exhibit any change in differentiation capacity with aging.

      Comment #3-12: When determining the lymphoid bias in ST-HSCs, the authors focus on the T-cell subtype, not considering any other lymphoid population. Could the authors explain this?

      Response #3-12: We thank the reviewer for this comment. We conducted the experiments in Figure 5 to demonstrate that the hematopoiesis observed 16 weeks post-transplantation, when ST-HSCs are believed to lose their self-renewal capacity, is not due to de novo production of T cells from ST-HSCs. Instead, it is attributed to long-lived memory cells which can persistently remain in the peripheral blood.

      As noted by the reviewer, various memory cell types are present in peripheral blood. Our analysis focused on memory T cells due to the broad consensus on memory T cell markers (1).

      Our findings show that transplanted Hoxb5- HSCs do not continuously produce lymphoid cells, unlike lymphoid-biased HSCs. Rather, the loss of self-renewal capacity in Hoxb5- HSCs makes the presence of long-lived memory cells in the peripheral blood more apparent.

      Reference

      (1)  Yenyuwadee S, Sanchez-Trincado Lopez JL, Shah R, Rosato PC, Boussiotis VA. The evolving role of tissue-resident memory T cells in infections and cancer. Sci Adv. 2022;8(33). 

      Comment #3-13: Based on the reduced frequency of donor cells in the spleen and thymus, the authors conclude "the process of lymphoid lineage differentiation was impaired in the spleens and thymi of aged mice compared to young mice". An alternative explanation could be that differentiated cells do not successfully migrate from the bone marrow to these secondary lymphoid organs. Please consider this possibility when discussing the data.

      Response #3-13: We strongly appreciate the reviewer's comment on this point. In accordance with the reviewer's comment, we have incorporated this suggestion into our manuscript.

      [P15, L343] These results indicate that the process of lymphoid lineage differentiation is impaired in the spleens and thymi of aged mice compared to young mice, or that differentiating cells in the bone marrow do not successfully migrate into these secondary lymphoid organs. These factors contribute to the enhanced myeloid-biased hematopoiesis in peripheral blood due to a decrease in de novo lymphocyte production.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Recommendation #2-1: To support their conclusions the authors need to provide higher n-numbers and provide a detailed power analysis of the transplants in the methods section.

      Response to Recommendation #2-1: Thank you for your important remarks. The power analysis for this experiment shows that power = 0.319, suggesting that a larger sample size may be needed. On the other hand, our method for determining the sample size in Figure 3 is as follows:

      (1) First, we checked whether a myeloid-biased change is detectable in the bulk-HSC fraction (Figure S3). The results showed that the difference in myeloid output at 16 weeks after transplantation was statistically significant (young vs. aged = 7.2 ± 8.9 vs. 42.1 ± 35.5%, p = 0.01), even though n = 10.

      (2) Next, myeloid-biased HSCs have been reported to be a fraction with high self-renewal ability (2004, Blood). If myeloid-biased HSCs increase with aging, their increase in the LT-HSC fraction would be detected with higher sensitivity than in the bulk-HSC fraction used in Figure S3.

      (3) However, neither the p-value nor the means themselves differed (young vs. aged = 51.4 ± 31.5% vs. 47.4 ± 39.0%, p = 0.82), with n = 8 in Figure 3. Because the means are nearly identical, it is highly likely that no difference would be detected even if n were increased further.

      Regarding Figures S3, 5, 6, S6, and 7, we obtained statistically significant differences and consider the sample sizes to be sufficient.
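
      The sample-size reasoning in points (1) to (3) can be illustrated with a minimal sketch. It takes the means and standard deviations quoted above, converts them to a standardized effect size, and estimates power and required group size with a normal approximation to the two-sample test. The function names, the equal-weight pooling of the two SDs, the reading of n = 8 as a per-group size, and the 80% power target are all assumptions made for illustration. The sketch does not attempt to reproduce the reported power of 0.319, whose inputs are not stated in the response.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference, pooling the two SDs with equal weight."""
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return abs(mean_a - mean_b) / pooled_sd


def approx_power(d, n_per_group, alpha=0.05):
    """Two-sided, two-sample power under a normal approximation."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def n_per_group_for_power(d, power=0.8, alpha=0.05):
    """Approximate group size needed to reach the target power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


# Means and SDs quoted in the response (percent myeloid output).
d_fig3 = cohens_d(51.4, 31.5, 47.4, 39.0)   # Figure 3, LT-HSCs
d_figS3 = cohens_d(7.2, 8.9, 42.1, 35.5)    # Figure S3, bulk-HSCs

# Assumption: n = 8 is treated as the per-group size.
power_fig3 = approx_power(d_fig3, n_per_group=8)
n_needed_fig3 = n_per_group_for_power(d_fig3)

print(f"Figure 3 effect size:  {d_fig3:.2f}")
print(f"Figure S3 effect size: {d_figS3:.2f}")
print(f"Approximate power at n = 8 per group (Figure 3): {power_fig3:.3f}")
print(f"Approximate n per group for 80% power (Figure 3): {n_needed_fig3}")
```

      Under these assumptions the Figure 3 effect size is roughly a tenth of a pooled standard deviation, against more than one pooled standard deviation for Figure S3, and the group size needed for 80% power at the Figure 3 effect size comes out above a thousand per group. This is consistent with the argument in point (3) that a modest increase in n would be unlikely to change the result.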

      Recommendation #2-2: As the authors attempt to challenge the current model of the age-associated expansion of myeloid-biased HSCs (which has been observed and reproduced by many different groups), ideally additional strong evidence in the form of single-cell transplants is provided.

      Response to Recommendation #2-2: Thank you for the comments. As the reviewer suggests, we hope to reconfirm our results using single-cell-level technology in the future.

      On the other hand, we have reported that the ratio of myeloid to lymphoid cells in the peripheral blood changes when the number of HSCs transplanted, or the number of supporting cells transplanted with HSCs, is varied (1, 2). Therefore, single-cell transplant data need to be interpreted very carefully to determine differentiation potential.

      From this viewpoint, future experiments will combine the Hoxb5 reporter system with a lineage tracing system that can track HSCs at the single-cell level over time. This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells. We have reflected this comment by adding the following sentences in the manuscript.

      [P19, L451] In contrast, our findings should be considered in light of some limitations. In this report, we primarily performed ten to twenty transplantation assays. Therefore, the current theory should be revalidated using single-cell technology. This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells.

      References

      (1) Nishi K, Sakamaki T, Sadaoka K, Fujii M, Takaori-Kondo A, Chen JY, et al. Identification of the minimum requirements for successful haematopoietic stem cell transplantation. Br J Haematol. 2022;196(3):711–23. 

      (2) Sakamaki T, Kao KS, Nishi K, Chen JY, Sadaoka K, Fujii M, et al. Hoxb5 defines the heterogeneity of self-renewal capacity in the hematopoietic stem cell compartment. Biochem Biophys Res Commun [Internet]. 2021;539:34–41. Available from: https://doi.org/10.1016/j.bbrc.2020.12.077

      Minor points:

      Recommendation #2-3: Figure 1: "Comprehensive analysis of hematopoietic alternations with age shows a discrepancy of age-associated changes between peripheral blood and bone marrow"

      [Comment to the authors]: For clarity, the nature of the discrepancy should be stated clearly.

      Response to Recommendation #2-3: Thank you for this important comment. Following the reviewer’s recommendation, we have revised the manuscript as follows:

      [P7, L139] Our analysis of hematopoietic alternations with age revealed that age-associated transition patterns of immunophenotypically defined HSC and CMP in BM were not paralleled with myeloid cell in PB (Fig. 1 C).

      Recommendation #2-4: Figure 1B "(B) Average frequency of immunophenotypically defined HSC and progenitor cells in BM of 2-3-month mice (n = 6), 6-month mice (n = 6), 12-13-month mice (n = 6), ≥ 23-month mice (n = 7)."

      [Comment to the authors]: It should be stated in the figure and legend that the values are normalized to the 2-3-month-old mice.

      Response to Recommendation #2-4: Thank you for this comment. Figure 1B presents the actual measured values of each fraction in c-Kit positive cells in the bone marrow, without any normalization.

      Recommendation #2-5: "We found that the frequency of immunophenotypically defined HSC in BM rapidly increased up to the age of 12 months. After the age, they remained plateaued throughout the observation period (Fig. 1 B)."

      [Comment to the authors]: The evidence for a 'plateau', where HSC numbers don't change after 12 months is weak. It appears that the numbers increase continuously (although less steep) after 12 months. I thus recommend adjusting the wording to better reflect the data.

      Response to Recommendation #2-5: We thank the reviewer for the comments above and have incorporated these suggestions in our revision as follows. 

      [P6, L126] We found that the frequency of immunophenotypically defined HSC in BM rapidly increased up to the age of 12 months. After the age, the rate of increase in their frequency appeared to slow down.

      Recommendation #2-6: Figure 2G: [Comment to the authors]: Please add the required statistics, please check carefully all figures for missing statistical tests.

      Response to Recommendation #2-6: Thank you for these important comments. In response, we have added the results of the significance tests for Figures 1A, 1C, 4C, and S5.

      Recommendation #2-7: "If bulk-HSCs isolated from aged mice are already enriched by myeloid-biased HSC clones, we should see more myeloid-biased phenotypes 16 weeks after primary and the secondary transplantation. However, we found that kinetics of the proportion of myeloid cells in PB were similar across primary and the secondary transplantation and that the proportion of myeloid cells gradually decreased over time (Fig. 2 G). These results suggest the following two possibilities: either myeloid-biased HSCs do not expand in the LT-HSC fraction, or the expansion of myeloid-biased clones in 2-year-old mice has already peaked."

      [Comment to the authors]: Other possible explanations include that the observed reduction in myeloid reconstitution over 16 weeks reflects the time required to return to homeostasis. In other words, it takes time until the blood system approaches a balanced output.

      Response to Recommendation #2-7: We agree with the reviewer's comment. As the reviewer pointed out, the gradual decrease in the proportion of myeloid cells over time is not related to our two hypotheses in this part of the manuscript but rather to the hematopoietic system's process of returning to a homeostatic state after transplantation. Therefore, the original sentence could be misleading, as it is part of the section discussing whether age-associated expansion of myeloid-biased HSCs is observed. Based on the above, we have revised the sentence as follows.

      [P8, L179] However, we found that kinetics of the proportion of myeloid cells in PB were similar across the primary and the secondary transplantation (Fig. 2 G). These results suggest the following two possibilities: either myeloid-biased HSCs do not expand in the LT-HSC fraction, or the expansion of myeloid-biased clones in 2-year-old mice has already peaked.

      Recommendation #2-8: It is also important to consider that the transplant results are highly variable (see large standard deviation), therefore the sensitivity to detect smaller but relevant changes is low in the shown experiments. As the statistical analysis of these experiments is missing and the power seems low these results should be interpreted with caution. For instance, it appears that the secondary transplants on average produce more myeloid cells as expected and predicted by the classical clonal expansion model.

      Regarding "expansion of myeloid-biased clones in 2-year-old mice has already peaked". This is what the author suggested above. It might thus not be surprising that HSCs from 2-year-old mice show little to no increased myeloid expansion.

      Response to Recommendation #2-8: Thank you for providing these insights. The primary findings of our study are based on functional experiments presented in Figures 2, 3, 5, 6, and 7. In Figure 3, there was no significant difference between young and aged LT-HSCs, with mean values of 51.4±31.5% and 47.4±39.0%, respectively (p = 0.82). Given the lack of difference in the mean values, it is unlikely that increasing the sample size would reveal a significant change. For ethical reasons, to minimize the use of additional animals, we conclude that LT-HSCs exhibit no change in lineage output throughout life based on the data in Figure 3. Statistically significant differences observed in Figures 2, 5, 6, and 7 further support our conclusions.

      Additionally, because whole bone marrow cells were transplanted in the secondary transplantation, there may be various confounding factors beyond the differentiation potential of HSCs. Therefore, we consider that caution is necessary when evaluating the differentiation capacity of HSCs in the context of the second transplantation.

      Recommendation #2-9: Figure 7C: [Comment to the authors]: The star * indicates with analyzed BM. As stars are typically used as indicators of significance, this can be confusing for the reader. I thus suggest using another symbol.

      Response to Recommendation #2-9: We thank the reviewer for this comment and have incorporated the suggestion in the revised manuscript. We now use † instead of the star (*).

      Reviewer #3 (Recommendations For The Authors):

      Recommendation #3.1: In Figure 1A, the authors show the frequency of PB lineages (lymphoid vs myeloid) in mice of different ages. It would be great if they could show the same data for each subpopulation including these two main categories individually (granulocytes, monocytes, B cells, T cells...).

      Response to Recommendation #3-1: We thank the reviewer for this suggestion. We provide the frequency of PB lineages (granulocytes, monocytes, B cells, T cells, and NK cells) in mice of different ages.

      Author response image 5.

      Average frequency of neutrophils, monocytes, B cells, T cells, and NK cells in PB analyzed in Figure 1A. Dots show all individual mice. *P < 0.05. **P < 0.01. Data and error bars represent means ± standard deviation. 

      Recommendation #3.2: It would be great if data from young mice could be shown in parallel to the graphs in Figure 2A.

      Response to Recommendation #3-2: We thank the reviewer for the comments above and have incorporated these suggestions in Figure 2A. 

      [P34, L916] (A) Hoxb5 reporter expression in bulk-HSC, MPP, Flk2+, and Lin-Sca1-c-Kit+ populations in the 2-year-old Hoxb5-tri-mCherry mice (Upper panel) and 3-month-old Hoxb5-tri-mCherry mice (Lower panel). Values indicate the percentage of mCherry+ cells ± standard deviation in each fraction (n = 3).

      Recommendation #3.3: Do the authors have any explanation for the high level of variability within the recipients of Hoxb5+ cells in Figure 2C?

      Response to Recommendation #3-3: Thank you for providing these insights. As noted in our previous report, transplantation of a sufficient number of HSCs results in stable donor chimerism, whereas a small number of HSCs leads to increased variability in donor chimerism (1). Additionally, other studies have observed high variability when fewer than 10 HSCs are transplanted (2, 3). Based on this evidence, we consider that the transplantation of a small number of cells (10 cells) is the primary cause of the high level of variability observed.

      References

      (1) Nishi K, Sakamaki T, Sadaoka K, Fujii M, Takaori-Kondo A, Chen JY, et al. Identification of the minimum requirements for successful haematopoietic stem cell transplantation. Br J Haematol. 2022;196(3):711–23. 

      (2) Dykstra B, Olthof S, Schreuder J, Ritsema M, Haan G De. Clonal analysis reveals multiple functional defects of aged murine hematopoietic stem cells. J Exp Med. 2011 Dec 19;208(13):2691–703. 

      (3) Yamamoto R, Wilkinson AC, Ooehara J, Lan X, Lai CY, Nakauchi Y, et al. Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell [Internet]. 2018;22(4):600-607.e4. Available from: https://doi.org/10.1016/j.stem.2018.03.013

      Recommendation #3.4: Are the differences in Figure 3D statistically significant? If yes, please add statistics. Same for Figure 4C.

      Response to Recommendation #3-4: Thank you for providing these insights. For Figure 3D, we performed an ANOVA analysis for each fraction; however, the results were not statistically significant. In contrast, for Figure 4C, we have added the results of significance tests for comparisons between Young LT-HSC vs. Young Bulk-HSC.

      Recommendation #3.5: As a general comment, although the results in this study are interesting, the use of a Hoxb5 lineage tracing mouse model would be more valuable for this purpose than the Hoxb5 reporter used here. The lineage tracing model would allow for the assessment of lineage bias without the caveats introduced by the transplantation assays.

      Response to Recommendation #3-5: We thank the reviewer for the important comments. Following the reviewer’s recommendation, we have revised the manuscript as follows:

      [P19, L451] In contrast, our findings should be considered in light of some limitations. In this report, we primarily performed ten to twenty transplantation assays. Therefore, the current theory should be revalidated using single-cell technology with lineage tracing system (1, 2). This approach will investigate changes in the self-renewal capacity of individual HSCs and their subsequent differentiation into progenitor cells and peripheral blood cells.

      References

      (1) Yamamoto R, Wilkinson AC, Ooehara J, Lan X, Lai CY, Nakauchi Y, et al. Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell [Internet]. 2018;22(4):600-607.e4. Available from: https://doi.org/10.1016/j.stem.2018.03.013

      (2) Rodriguez-Fraticelli AE, Weinreb C, Wang SW, Migueles RP, Jankovic M, Usart M, et al. Single-cell lineage tracing unveils a role for TCF15 in haematopoiesis. Nature [Internet]. 2020;583(7817):585–9. Available from: http://dx.doi.org/10.1038/s41586-020-2503-6

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Comment #1: It is unclear how the fraction of NK cell populations is quantified in the spatial-seq datasets. Figures display spatial data with expression scores, but the method for calculating the score and determining NK cell presence in tumor tissue is ambiguous. Clarification is needed on whether the identification relied solely on visual inspection or if quantitative analyses using other criteria were conducted.

      Thank you for your questions. We removed the background and made the corresponding modifications as requested. We used the AddModuleScore function in Seurat to quantify the main immune subpopulations in the spatial-seq data, using the gene sets identified in single-cell-seq. Additionally, the tumor and non-tumor regions were identified by immunohistochemistry as well as by cell clusters in the spatial-seq data; this delineation is coarse, so we cannot precisely quantify NK cell presence in each region. Nevertheless, the difference in NK cell presence between tumor and non-tumor regions is observable by visual inspection. The methodology has been supplemented in the revised manuscript (line 190-193).
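
      To make the scoring step concrete, the following is a simplified Python sketch of the idea behind a module score: for each spot, the average expression of a gene set minus the average expression of a set of control genes. It is not Seurat's AddModuleScore (an R function), which additionally selects its control genes from expression-matched bins; that step is omitted here. The function name, the control genes (ACTB, GAPDH), and the expression values are hypothetical and chosen only for illustration; NKG7 and KLRD1 are the NK cell markers named elsewhere in this response.

```python
from statistics import mean


def module_score(expr, gene_set, control_genes):
    """Per-spot score: mean expression of gene_set minus mean of control_genes.

    expr maps each gene name to a list of expression values, one per spot.
    Simplification: control genes are supplied by the caller rather than
    sampled from expression-matched bins.
    """
    n_spots = len(next(iter(expr.values())))
    scores = []
    for spot in range(n_spots):
        target = mean(expr[gene][spot] for gene in gene_set)
        control = mean(expr[gene][spot] for gene in control_genes)
        scores.append(target - control)
    return scores


# Made-up toy values for two spots; not data from the study.
toy_expr = {
    "NKG7": [2.0, 0.0],
    "KLRD1": [3.0, 1.0],
    "ACTB": [1.0, 1.0],    # hypothetical control gene
    "GAPDH": [1.0, 2.0],   # hypothetical control gene
}

nk_scores = module_score(toy_expr, ["NKG7", "KLRD1"], ["ACTB", "GAPDH"])
print(nk_scores)
```

      A positive score marks a spot where the gene set is expressed above the control background, and a negative score marks a spot where it is expressed below it. Comparing such scores between annotated regions is one way a visual impression could be backed by numbers.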

      Comment #2: The authors do not provide a clear definition of "resting" NK cells. It remains unclear whether they refer to a senescent state or a non-matured NK cell population. Furthermore, the criteria used to define resting and activated cells based on the expression of KIR2DL4, GPR183, GRP171, CD69, IFNG, GZMK, TTC38, CD160, and PLEKNF1 in Figure 4 are not well-defined. The expression patterns of these genes in Figure 4D are not distinct, and it is unclear which combination of genes was used to classify the populations. Clarification is needed on whether the presence of GZMK alone defines resting NK cells, or if the presence of any of the described genes (GZMK, TTC38, or CD160) is sufficient. Additionally, the method used for this classification, whether visual or algorithm-based, should be described.

      Thank you for your question. Resting and activated NK cells were defined by the preferential expression of the resting NK genes (AZU, BPI, CAMP, CD160, CD2, CDHR1, CEACAM8, DEFA4, ELANE, GFI1, GZMK, KLRC4, MGAM, MS4A3, NME8, PLEKHF1, TEP1, TRBC1, TTC38, ZNF135) and activated NK genes (APOBEC3G, APOL6, CCL4, CCND2, CD69, CDK6, CSF2, DPP4, FASLG, GPR171, GPR18, GRAP2, IFNG, KIR2DL4, KIR2DS4, LTA, LTB, NCR3, OSM, PTGER2, SOCS1, TNFSF14) described in CIBERSORT. In fact, these marker genes were not specifically expressed in a single NK cell subset. In addition, together with verification by flow cytometric analysis, the resting NK cells tend to be decidual-like and tumor-infiltrated NK cells with higher expression of CD9, CD49a, and PD-1.

      Comment #3: Criteria used to define high or low NK cell presence/infiltration in Figure 5 are not described in the main text or figure legend. Since, the claim that the presence of the resting or activated NK cells predicts cancer prognosis is based on this figure, this needs to be clearly described.

      Thank you for your questions. The percentages of activated and resting NK cells in TCGA and GSE29623 were determined by CIBERSORT. Additionally, the infiltration of activated and resting NK cells was determined with the AddModuleScore function, using the gene sets of activated and resting NK cells identified in single-cell-seq; the differences in activated and resting NK cell presence between tumor and non-tumor regions were also assessed by visual inspection. We have amended the main text and figure legend accordingly in the revised manuscript.

      Comment #4: The absence of FMO controls for KIR2DL4 or GZMK and the lack of increase in GZMK expression during co-culture with tumour lines raises concerns since GZMK was used as a defining feature of resting NK cells.

      Thank you for your questions. We performed a new batch of flow cytometry experiments in which FMO controls were set up for all markers used, in order to define the positive gate locations precisely.

      Author response image 1.

      The positive gate locations of CD56, GZMK, KIR2DL4, CD9, CD49a, PD-1 defined according to the FMO control.

      Comment #5: All the co-cultures were performed with tumour cell line only and no healthy cells, such as human foreskin fibroblasts, were used as control. In the absence of a non-tumour cell line, it is very difficult to draw any conclusions. Furthermore, to claim that resting or activated NK cells are responsible for tumour migration or proliferation, it is important to at least isolate resting and activated NK cells ex vivo and culture with tumour lines, instead of NK cell lines.

      Thank you for your questions. Following your suggestion, NK cells were co-cultured with human foreskin fibroblasts (HFF), and their phenotype was identified by flow cytometry. When co-cultured with HFF in direct contact (CN group), NK cells also tended towards a tissue-infiltration state (high expression of CD9). However, the domestication effect was significantly weaker than when NK cells were co-cultured with tumor cells. Additionally, whereas supernatant from the NK-HCT direct-contact co-culture (CNS group) significantly increased the migration of fresh HCT cells, supernatant from the NK-HFF direct-contact co-culture (CNS group) produced only a limited, statistically non-significant increase in migration, and no increase was seen with supernatant from NK cells cultured in HFF supernatant (SNS group) or with fresh medium (MNS group). Finally, we tried to isolate resting and activated NK cells from fresh colon cancer surgical specimens. Unfortunately, the NK cells were too few to perform further functional experiments such as migration and proliferation assays.

      Author response image 2.

      Phenotype switch of NK cells in different co-culture systems and the corresponding NK cell-mediated effect on migration of fresh colon cancer cells (HCT-116). A-B: NK cells underwent a phenotype switch (high expression of CD9) when co-cultured with HCT or HFF; the switch was more pronounced with HCT. CN: NK cells co-cultured with HCT/HFF; SN: NK cells cultured with supernatant of HCT/HFF; MN: NK cells cultured in fresh medium. C-E: Transwell assays showed that only tumor-co-cultured NK cells induced migration of colon cancer cells (HCT-116). CNS: colon cancer cells cultured in supernatant from the co-culture system in which NK cells and HCT/HFF were in direct contact; SNS: colon cancer cells cultured in supernatant from the system in which NK cells were cultured with supernatant of HCT/HFF; MNS: colon cancer cells cultured in fresh medium.

      Comment #6: It seems that flow cytometric analyses and GZMK and KIR2DL4 staining were performed without cell permeabilization. Could authors confirm if this is accurate, or if they performed intracellular staining instead?

      Thank you for your questions. For GZMK, which is a secreted protein, flow cytometric analyses were performed both with (Fig. 3) and without cell fixation and permeabilization, and no significant differences were found among the groups under either condition. However, nearly all cells were GZMK-negative without fixation and permeabilization, whereas all cells were GZMK-positive with fixation and permeabilization. The conditions for flow cytometric analysis of GZMK may need further optimization, or GZMK may not be a suitable flow cytometric marker for resting NK cells. For membrane proteins such as CD56, CD9, CD49a, KIR2DL4, and PD-1, staining was performed without cell permeabilization.

      Author response image 3.

      Phenotype switch (CD56+, GZMK+) of NK cells was analyzed by FACS after fixation and permeabilization in different co-cultured groups. CN: NK cells cocultured with colon cancer cells; SN: NK cells cocultured with supernatant of cancer cells; MN: NK cells cocultured in fresh medium.

      Comment #7: The identity of the published datasets used for analysis is not provided, and references are not cited in the results section.

      Thank you for your questions. We apologize for this omission in our previous version. We have added the information to the revised manuscript (section of Materials and Methods) (Line 123-128).

      Comment #8: References are difficult to locate, as the main text follows APA style while the reference section is organized numerically with no clear order.

      Thank you for your questions. We have modified the format of the references in the revised manuscript.

      Comment #9: Figure 3 shows volcano plots showing DEG genes between tumor and healthy tissue NK cells are not described clearly, and authors did not discuss the significance of these genes, highlighted in the plot.

      Thank you for your questions. The volcano plots in Figure 3 show the DEGs between colon cancer with and without metastasis in the TCGA database. We focused on the genes enriched in the “Natural killer cell mediated cytotoxicity” pathway and found that nearly all of them were down-regulated in colon cancer with metastasis. We have modified the description in the Results section and added a description of the importance of these genes to the Discussion section in the revised manuscript (Line 322-326).

      Comment #10: The meaning of "M0" and "M1" in Figures 5A and 5B is unclear and should be defined in the text.

      Thank you for your questions. "M0" and "M1" in Figures 5A and 5B denote “colon cancer without metastasis” and “colon cancer with metastasis”, respectively. We have clarified this in the revised manuscript (Line 350-354).

      Comment #11: Terms such as "dynamic remodelling of NK cells" and "landscape of NK cells" are used without explanation, necessitating clarification of their meaning.

      Thank you for your questions. We have clarified these terms in the revised manuscript (Line 331-334).

      Comment #12: In vitro assays are described vaguely, making it difficult for readers to understand. More clarity is needed in describing these assays.

      Thank you for your questions. We have added clarification in the revised manuscript (Line 205-211).

      Reviewer #2:

      Comment #1: This manuscript investigates the role of the abundant NK cells that are observed in colon cancer liver metastasis using sequencing and spatial approaches in an effort to clarify the pro and anti-tumorigenic properties of NK cells. This descriptive study characterises different categories of NK cells in tumor and tumor-adjacent tissues and some correlations. An attempt has been made using pseudotime trajectory analysis but no models around how these NK cells might be regulated are provided.

      Thank you for your questions. The single-cell sequencing data included in this study comprise CD45-positive immune cells and do not include tumor cells, so cell-cell communication analysis between NK cells and tumor cells could not be conducted. The transition process of NK cells could only be predicted through pseudotime trajectory analysis. Our hypothesis is that tumor cells domesticate NK cells into tumor-infiltrated NK cells through direct contact; flow cytometry experiments also confirmed that tumor cells exert this domestication only through direct contact with NK cells (with prominently high expression of CD9). However, the detailed mechanism remains unclear.

      Comment #2: A small number of patients are analyzed in this study. The descriptive gene markers, while interesting, need to be further validated to understand how strong this analysis might be and its potential application.

      Thank you for your questions. The sample size included in this study is indeed small, which is a limitation of our study. However, this is the only large-sample single-cell sequencing dataset we could find that includes primary colon cancer tissues, paired paratumor normal colon tissues, paired liver metastatic cancer tissues, and paired paratumor normal liver tissues. We will expand the sample size to further verify the current conclusions in subsequent experiments. In addition, the marker genes of the different NK groups used in this study follow CIBERSORT's classification of activated and resting NK cells, which is a widely recognized indicator. We will verify the expression and clinical application value of the screened genes in tissues in subsequent studies.

      Comment #3: Figure 1C and other figures throughout the paper. It is not clear how marker genes were selected.

      Thank you for your questions. The marker genes displayed in Figure 3C were the highly variable genes of each cell group as well as the marker genes of each immune cell type, such as T cells (CD3D, CD3E), NK cells (NKG7, KLRD1), monocytes (LYZ, S100A8, S100A9), B cells (CD79A), plasma cells (JCHAIN, IGHA1, IGHA2), and neutrophils (CXCL8, FCGR3B).

      Comment #4: Figure 1E. P and T have not been defined. Lines should not connect the datasets as they are independent assessments.

      Thank you for your questions. P and T denote paratumor normal tissues and tumor tissues, respectively; these definitions have been added to the caption of Figure 1E. Additionally, the single-cell sequencing samples included in the study were paired: primary colon cancer tissues, normal tissues adjacent to colon cancer, liver metastatic cancer tissues, and normal liver tissues were obtained from 20 colon cancer patients with liver metastasis. A paired test was therefore performed.

      Comment #5: Figure 2C. It is unclear what ST-P1 means. This is not a particularly informative figure.

      Thank you for your questions. We apologize for the annotation error. The figure shows the spatial transcriptomes of the primary colon cancer tissue and liver metastasis tissue of four patients. We have made the corresponding modifications in the revised manuscript.

      Comment #6: Multiple figures - abbreviations are used but not provided in the legend. They occur in the text but are not directly related to the figures where they are used to label axes or groups.

      Thank you for your questions. We have rechecked and made corresponding modifications in the revised manuscript.

      Comment #6: Patients: it is not clear what other drugs patients have been exposed to or basic data (sex, age, underlying conditions etc)

      Thank you for your questions. The baseline data of the patients in the SC and ST datasets are shown in Table 1 and Table 2 below, respectively. They were not presented previously because no analysis related to patient characteristics was performed in the current study.

      Author response table 1.

      Baseline data of the patients in the single-cell sequencing dataset.

      Author response table 2.

      Baseline data of the patients in the spatial transcriptome dataset.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      (1) In the "Introduction" section, an important aspect that requires attention pertains to the discussion surrounding the heterodimerization of CXCR4 and CCR5. Notably, the manuscript overlooks a recent study (https://doi.org/10.1038/s41467-023-42082-z) elucidating the mechanism underlying the formation of functional dimers within these G protein-coupled receptors (GPCRs)…The inclusion of this study within the manuscript would significantly enrich the contextual framework of the work, offering readers a comprehensive understanding of the current knowledge surrounding the structural dynamics and functional implications of CXCR4 and CCR5 heterodimerization.

      We thank the reviewer for his/her recommendation to enrich the contextual framework of our study. The Nature Communications paper by Di Marino et al. was published after we sent the first version of our manuscript to eLife, and therefore was not included in the discussion. As the reviewer rightly indicates, this paper elucidates the mechanism underlying the formation of functional dimers within CCR5 and CXCR4. Using metadynamics approaches, the authors emphasize the importance of distinct transmembrane regions for dimerization of the two receptors. In particular, CXCR4 shows two low energy dimer structures and the TMVI-TMVII helices are the preferred interfaces involved in the protomer interactions in both cases. Although the study uses in silico techniques, it also includes the molecular binding mechanism of CCR5 and CXCR4 in the membrane environment, as the authors generate a model in which the receptors are immersed in a 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) phospholipid bilayer with 10% cholesterol. This is an important point in this study, as membrane lipids also interact with membrane proteins, and the lipid composition affects CXCR4 oligomerization (Gardeta S.R. et al. Front. Immunol. 2023). In particular, Di Marino et al. find a cholesterol molecule placed in-between the two CXCR4 protomers where it engages a series of hydrophobic interactions with residues including Leu132, Val214, Leu216 and Phe249. Then, the polar head of cholesterol forms an H-bond with Tyr135 that further stabilizes protomer binding. In our hands, the F249L mutation in CXCR4 reverted the antagonism of AGR1.137, suggesting that the compound binds, among others, this residue. We should, nonetheless, indicate that we analyzed receptor oligomerization and not CXCR4 dimerization, which was the main object of the Di Marino et al. study. 
It is therefore also plausible that other residues than those described as essential for CXCR4 dimerization might participate in receptor oligomerization. We can speculate that AGR1.137 might affect cholesterol binding to CXCR4 and, therefore, alter dimerization/oligomerization. Additionally, the CXCR4 x-ray structure with PDB code 3ODU (Wu B. et al. Science, 2010) experimentally shows the presence of two fatty acid molecules in contact with both TMV and TMVI. These molecules closely interact with hydrophobic residues in the protein, thereby stabilizing it in a hydrophobic environment. Although more experiments will be needed to clarify the mechanism involved, our results suggest that cholesterol and/or other lipids also play an important role in CXCR4 oligomerization and function, as seen for other GPCRs (Jakubik J. & ElFakahani E.E. Int J Mol Sci. 2021). However, we should also consider that other factors not included in the analysis by Di Marino et al. can also affect CXCR4 oligomerization; for instance, the co-expression of other chemokine receptors and/or other GPCRs that heterodimerize with CXCR4 might affect CXCR4 dynamics at the cell membrane, similar to other membrane proteins such as CD4, which also forms complexes with CXCR4 (Martinez-Muñoz L. et al. Mol. Cell 2018).

      The revised discussion contains references to the study by Di Marino et al. to enrich the contextual framework of our data.

      (2) In "various sections" of the manuscript, there appears to be confusion surrounding the terminology used to refer to antagonists. It is recommended to provide a clearer distinction between allosteric and orthosteric antagonists to enhance reader comprehension. An orthosteric antagonist typically binds to the same site as the endogenous ligand, directly blocking its interaction with the receptor. On the other hand, an allosteric antagonist binds to a site distinct from the orthosteric site, inducing a conformational change in the receptor that inhibits the binding of the endogenous ligand. By explicitly defining the terms "allosteric antagonist" and "orthosteric antagonist" within the manuscript, readers will be better equipped to discern the specific mechanisms discussed in the context of the study.

      The behavior of the compounds described in our manuscript (AGR1.135 and AGR1.137) fits the definition of allosteric antagonists in that they bind at a site distinct from the orthosteric site, although they block only some ligand-mediated functions and not others. This would mean that they are not formally antagonists and should not be considered allosteric antagonists, as their binding to CXCR4 does not alter CXCL12 binding, although they might affect its affinity. In this sense, our compounds fit much better with the concept of negative allosteric modulators (Gao Z.-G. & Jacobson K.A. Drug Discov. Today Technol. 2013). They act by binding at a site distinct from the orthosteric site and selectively block some downstream signaling pathways but not others induced by the same endogenous agonist.

      To avoid confusion and to clarify the role of the compounds described in this study, we now refer to them as negative allosteric modulators throughout the manuscript.

      (3) In the Results section, the computational approach employed for "screening small compounds targeting CXCR4, particularly focusing on the inhibition of CXCL12-induced CXCR4 nanoclustering", requires clarification due to several points of incomprehension. The following recommendations aim to address these concerns and enhance the overall clarity of the section:

      (1) Computational Approach and Binding Mode Description: 

      -Explicitly describe the methodology for identifying the pocket/cleft area in angstroms (Å) on the CXCR4 protein structure. Include details on how the volume of the cleft enclosed by TMV and TMVI was determined, as this information is not readily apparent in the provided reference (https://doi.org/10.1073/pnas.1601278113).

      The identification of the cleft was based on the observations by Wu et al. (Wu B. et al. Science 2010), who described the presence of bound lipids in the area formed by TMV and TMVI, and those of Wescott et al. (Wescott M.P. et al. Proc. Natl. Acad. Sci. 2016) on the importance of TMVI in transmitting the conformational changes promoted by CXCL12 on CXCR4 towards the cytoplasmic surface of the receptor, linking the binding site with signaling activation. Collectively, these results, and our previous data on the critical role of the N-terminal region of TMVI in CXCR4 oligomerization (Martinez-Muñoz L. et al. Mol. Cell 2018), focused our in silico screening on this region. Once we detected that several compounds bound CXCR4 in this region, the cleft properties were calculated by subtracting the compound structure. The resulting PDB file was analyzed using the PDBsum server (Laskowski R.A. et al. Protein Sci. 2018). Cleft volumes were obtained from the server's surface-cleft analysis, which uses SURFNET (Laskowski R. A. J. Mol. Graph. 1995). The theoretical interaction surface between the selected compounds and CXCR4 and the atomic distances between the protein residues and the compounds were calculated using the PISA server (Krissinel E. & Henrick K. J. Mol. Biol. 2007) (Fig. I, only for review purposes). The analysis of the cleft occupied by AGR1.135 showed two independent cavities of 434 Å³ and 1,381 Å³ that were not connected to the orthosteric site. In the case of AGR1.137, the data revealed two distinct clefts of 790 Å³ and 580 Å³ (Fig. I, only for review purposes). These details have been included in the revised manuscript (New Fig. 1A, Supplementary Fig 8A, B).

      (4) Clarify the statement regarding the cleft being "surface exposed for interactions with the plasma membrane," particularly in the context of its embedding within the membrane.

      For GPCRs, transmembrane domains represent binding sites for bioactive lipids that play important functional and physiological roles (Huwiler A. & Zangemeister-Wittke U. Pharmacol. Ther. 2018). The channel between TMV and TMVI connects the orthosteric chemokine binding pocket to the lipid bilayer and is occupied by an oleic acid molecule, according to the CXCR4 structure published in 2010 (Wu B. et al. Science 2010). In addition, the target region contains residues involved in cholesterol (and perhaps other lipids) engagement (Di Marino et al. Nat. Commun. 2023). Taken together, these data support our statement that the cleft supports interactions between CXCR4 molecules and the plasma membrane. 

      Moreover, the data of Di Marino et al. also support that CCR5 and CXCR4 have a symmetric and an asymmetric binding mode. Therefore, either dimeric structure has the possibility to form trimers, tetramers, and even oligomers by using the free binding interface to complex with another protomer. This hypothesis suggests that the interaction of dimers to form oligomers should involve residues distinct from those included in the dimeric conformation.

      The sentence has been modified in the revised manuscript to clarify comprehension.

      (5) Discuss the rationale behind targeting the allosteric binding pocket instead of the orthosteric pocket, outlining potential advantages and disadvantages.

      The advantages and disadvantages of using negative allosteric modulators vs orthosteric antagonists have been now included in the revised discussion. 

      The majority of GPCR-targeted drugs function by binding to the orthosteric site of the receptor, and are agonists, partial agonists, antagonists or inverse agonists. These orthosteric compounds can have off-target effects and poor selectivity due to highly homologous receptor orthosteric sites and to abrogation of spatial and/or temporal endogenous signaling patterns. 

      The alternative is to use allosteric modulators, which can tune the functions associated with the receptors without affecting the orthosteric site. They can be positive, negative or neutral modulators, depending on their effect on the functionality of the receptor (Foster D.J. & Conn P.J. Neuron 2017). For example, using a negative allosteric modulator of a chemokine receptor to dampen pathological signaling events while retaining full signaling for non-pathological activities might limit adverse effects (Kohout T.A. et al. J. Biol. Chem. 2004). In this case, the negative allosteric modulator 873140 blocks CCL3 binding on CCR5 but does not alter CCL5 binding (Watson C. et al. Mol. Pharmacol. 2005). In other cases, allosteric modulators can stabilize a particular receptor conformation and block others. The mechanism of action of maraviroc, the FDA-approved anti-HIV-1 CCR5 allosteric modulator, is attributed to its ability to modulate CCR5 dimer populations and their subsequent subcellular trafficking and localization to the cell membrane (Jin J. et al. Sci. Signal. 2018). Two CCR5 dimeric conformations that are imperative for membrane localization were present in the absence of maraviroc; however, an additional CCR5 dimer conformation was discovered after the addition of maraviroc, and all homodimeric conformations were further stabilized. This finding is consistent with the observation that CCR5 dimers and oligomers inhibit HIV host-cell entry, likely by preventing HIV-1 co-receptor formation.

      It is well known that GPCRs activate G proteins, but they also recruit additional proteins (e.g., β-arrestins) that induce signaling cascades which, in turn, can direct specific subsets of cellular responses independent of G protein activation (Eichel K. et al. Nature 2018) and are responsible for either therapeutic or adverse effects. Allosteric modulators can thus be used to block these adverse effects without influencing the therapeutic benefits. This was the case in the design of G protein-biased agonists for the kappa opioid receptor, which maintain the desirable antinociceptive and antipruritic effects and eliminate the sedative and dissociative effects in rodent models (Brust T.F. et al. Sci. Signal 2016).

      (6) Provide the PDB ID of the CXCR4 structure used as a template for modeling with SwissModel. Explain the decision to model the structure from the amino acid sequence and suggest an alternative approach, such as utilizing AlphaFold structures and performing classical molecular dynamics with subsequent clustering for the best representative structure.

      The PDB entry used as a template for modeling CXCR4 was 3ODU. This information was already included in the Materials and Methods section. At the time we performed these analyses, several crystallographic structures of CXCR4 in complex with different molecules and peptides had been deposited in the PDB. None of them included a full construct containing the complete receptor sequence, because the N- and C-terminal ends of CXCR4 are very flexible loops that hinder X-ray structure resolution. In addition, the CXCR4 constructs contained T4 lysozyme inserted between helices TMV and TMVI to increase the stability of the protein, a common strategy used to facilitate crystallogenesis of GPCRs (Zou Y. et al. PLoS One 2012). Therefore, we generated a CXCR4 homology model using the SWISS-MODEL server (Waterhouse A. et al. Nucleic Acids Res. 2018). This program reconstructed the loop between TMV and TMVI, a domain particularly important in this study that was not present in any of the crystal structures available in the PDB. The model structure was, nonetheless, still incomplete, as it began at P27 and ended at S319 because the terminal ends were not resolved in the crystal structure used as a template. Nevertheless, we considered that these terminal ends were not involved in CXCR4 oligomerization. 

      As AlphaFold was not available when we initiated this project, we did not use it. However, we have now updated our workflow to current methods and predicted the structure of the target using AlphaFold (Jumper J. et al. Nature 2021) and the sequence available under UniProt entry P61073. We prepared the ligands using OpenBabel (O’Boyle N.M. et al., J. Cheminformatics 2011), with Gasteiger charge assignment, and generated 10 conformers for each input ligand using the OpenBabel genetic algorithm. We then prepared the target structure with OpenMM, removing all waters and possible heteroatoms, and adding all missing atoms. We next predicted the target binding pockets with fPocket (Le Guilloux V. et al. BMC Bioinformatics 2009), p2rank (Krivak R. & Hoksza, J. Cheminformatics 2018), and AutoDock autosite (Ravindranath P.A. & Sanner M.F. Bioinformatics 2016). We chose only those pockets between TMV and TMVI (see answer to point 3). We merged the results of the three programs into so-called consensus pockets, where two pockets are considered sufficiently similar if at least 75% of their surfaces are shared (del Hoyo D. et al. J. Chem. Inform. Model. 2023). Of the consensus pockets, one was significantly larger than the others and was therefore selected. We then docked the ligand conformers in this pocket using AutoDock GPU (Santos-Martins D. et al. J. Chem. Theory Comput. 2021), LeDock (Liu N & Xu Z., IOP Conf. Ser. Earth Environ. Sci. 2019), and Vina (Eberhardt J. et al. J. Chem. Inf. Model. 2021). The number of docked poses ranged from 210 to 287. We scored each pose with the Vina score using ODDT (Wójcikowski M. et al. J. Cheminform. 2015). Then, we clustered the different solutions into groups whose maximum RMSD was 1 Å. This resulted in 40 clusters; the representative of each cluster was the pose with the maximum Vina score. This analysis confirmed that the selected compounds bound this pocket (Author response image 1). 
When required, we calculated the binding affinity using Schrödinger’s MM-GBSA procedure (Greenidge P.A. et al. J. Chem. Inf. Model. 2013), in two ways: first, assuming that the ligand and target are fixed; second, with an energy minimization of all the atoms within a distance of 3 Å from the ligand. This information has now been included in the revised version of the manuscript.
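
      The clustering step described above (groups with a maximum RMSD of 1 Å, each represented by its best-scored pose) can be sketched as follows. This is only an illustrative sketch, not the authors' actual code: the function names `rmsd` and `cluster_poses` are hypothetical, poses are assumed to share the same atom ordering and reference frame, and the greedy best-first strategy is one of several ways to enforce a maximum intra-cluster RMSD. Whether a "maximum Vina score" means the most negative energy is also an assumption, exposed here through the `higher_is_better` flag.

```python
import numpy as np


def rmsd(a, b):
    """RMSD between two (n_atoms, 3) coordinate arrays, without realignment."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def cluster_poses(coords, scores, max_rmsd=1.0, higher_is_better=False):
    """Greedy clustering of docking poses (hypothetical helper).

    Poses are visited from best to worst score. A pose joins the first
    cluster in which it lies within max_rmsd of every existing member,
    so the maximum pairwise RMSD inside a cluster never exceeds max_rmsd.
    Otherwise it starts a new cluster. Because poses are visited
    best-first, the first member of each cluster is its representative.
    """
    order = np.argsort(scores)          # ascending: most negative first
    if higher_is_better:
        order = order[::-1]
    clusters = []
    for i in order:
        for cluster in clusters:
            if all(rmsd(coords[i], coords[j]) <= max_rmsd
                   for j in cluster["members"]):
                cluster["members"].append(int(i))
                break
        else:
            # No compatible cluster found: this pose seeds a new one.
            clusters.append({"rep": int(i), "members": [int(i)]})
    return clusters
```

      With the default `higher_is_better=False`, lower (more negative) scores are treated as better, which matches the usual sign convention for Vina energies in kcal/mol; set the flag to `True` if the scores are reported the other way round.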

      Author response image 1.

      AGR1.135 docking in CXCR4 using the updated protocol for ligand docking. Cartoon representation colored in gray with TMV and TMVI shown in blue and pink, respectively. AGR1.135 is shown in stick representation with carbons in yellow, oxygens in red and nitrogens in blue.

      (7) Specify the meaning of "minimal interaction energy" and where (if present) the interaction scores are reported in the text.

      By minimal interaction energy we mean the best docking score, that is, the best score obtained in our docking studies. These data were not included in the previous manuscript due to space restrictions but are now included in the revised manuscript.

      (8) You performed docking studies using GLIDE to identify potential binding sites for the small compounds on the CXCR4 protein. The top-scoring binders were then subjected to further refinement using PELE simulations. However, I realize that a detailed description of the specific binding modes of these compounds was not provided in the text. Please make the description of binding poses more detailed

      Firstly, to assess the reliability of this method, a PELE study was carried out for the control molecule IT1t, which is a small drug-like isothiourea derivative that has been crystallized in complex with CXCR4 (PDB code: 3ODU). IT1t is a CXCR4 antagonist that binds to the CXCL12 binding cavity and inhibits HIV-1 infection (Das D. Antimicrob. Agents Chemother. 2015; Dekkers S. et al. J. Med. Chem. 2023). From the best five trajectories, two of them had clearly better binding energies, and corresponded to almost the same predicted pose of the molecule. Although the predicted binding mode was not exactly the same as the one in the crystal structure, the approximation was very good, giving validation to the approach. Although PELE is a suitable technique to find potential binding sites, the predicted poses must be subsequently refined using docking programs.

      Analyzing the best trajectories for the remaining ligands, at least one of the best-scored poses was always located at the orthosteric binding site of CXCR4. Even though these poses showed good binding energies, they were discarded as the in vitro biological experiments indicated that the compounds were unable to block CXCL12 binding or CXCL12-mediated inhibition of cAMP release or CXCR4 internalization. Collectively, these data indicated that the selected compounds did not behave as orthosteric inhibitors of CXCR4. The CXCL12 binding pocket is the biggest cavity in CXCR4, and so PELE may tend to place the molecules near it. However, all the compounds presented other feasible binding sites with a comparable binding energy.

      AGR1.135 and AGR1.137 showed interesting poses between TMV and TMVI with very good binding energy (-51.4 and -37.2 kcal/mol, respectively). This was precisely the region we had previously selected for the in silico screening, as previously described (see response to point 3).

      AGR1.131 showed two poses with low binding energy that were placed between helices TMI and TMVII (-43.6 kcal/mol) and between helices TMV and TMVI (-39.8 kcal/mol). This compound was unable to affect CXCL12-mediated chemotaxis and was therefore used as an internal negative control, as it was selected in the in silico screening with the same criteria as the other compounds but failed to alter any CXCL12-mediated functions. PELE studies nonetheless provided different binding sites for each molecule, which had to be further studied using docking to obtain a more accurate binding mode. In agreement with the previous commentary, we repeated the analysis using AlphaFold and the rest of the procedure described (see our response to point 6) and calculated the binding energies for all the compounds using Schrödinger’s MM-GBSA procedure (Greenidge P.A. et al. J. Chem. Inf. Model. 2013). Calculations were performed in two ways: first, assuming that the ligand and target are fixed; second, with an energy minimization of all the atoms within a distance of 3 Å from the ligand. With the first method, AGR1.135 and AGR1.137 showed poses between TMV and TMVI with -56.4 and -62.4 kcal/mol, respectively, and AGR1.131 had a pose between TMI and TMVII with -61.6 kcal/mol. With the second method, AGR1.135 and AGR1.137 showed poses between TMV and TMVI with -57.9 and -67.6 kcal/mol, respectively, and AGR1.131 had a pose between TMI and TMVII with -62.2 kcal/mol.

      This information is now included in the text.

      (9) (2) Experimental Design:

      -Justify the choice of treating Jurkat cells with a concentration of 50 μM of the selected compound. Consider exploring different concentrations and provide a rationale for the selected dosage. Additionally, clearly identify the type of small compound used in the initial experiment.

      The revised version contains a new panel in Fig. 1B to show a more detailed kinetic analysis with different concentrations (1-100 µM) of the compounds in the Jurkat migration experiments. In all cases, 100 µM nearly completely abrogated cell migration, but in order to reduce the amount of DMSO added to the cells we selected 50 µM for further experiments, as it was the concentration that inhibits 50-75% of ligand-induced cell migration. Regarding the type of small compounds used in the initial experiments, they were compounds included in the library described in reference #24 (Sebastian-Pérez V. et al Med. Biol. Chem. 2017), which contains heterocyclic compounds. We would note that we do not consider AGR1.137 a final compound. We think that there is scope to develop AGR1.137-based second-generation compounds with greater solubility in water, greater specificity or affinity for CXCR4, and to evaluate delivery methods to hopefully increase activity.  

      (10) Avoid reporting details in rounded parentheses within the text; consider relocating such information to the Materials and Methods section or figure captions for improved readability.

      Most of the rounded parentheses within the text have been eliminated in the revised version of the manuscript to improve readability.

      (11) Elaborate on the virtual screening approach using GLIDE software, specifying the targeted site and methodology employed.

      For the virtual screening, we used the Glide module (SP and XP function scoring) included in the Schrödinger software package, utilizing the corresponding 3D target structure and our MBC library (Sebastián-Pérez V et al. J. Chem. Inf. Model. 2017).  The center of the catalytic pocket was selected as the centroid of the grid. In the grid generation, a scaling factor of 1.0 in van der Waals radius scaling and a partial charge cutoff of 0.25 were used. A rescoring of the SP poses of each compound was then performed with the XP scoring function of the Glide. The XP mode in Glide was used in the virtual screening, the ligand sampling was flexible, epik state penalties were added and an energy window of 2.5 kcal/mol was used for ring sampling. In the energy minimization step, the distance-dependent dielectric constant was 4.0 with a maximum number of minimization steps of 100,000. In the clustering, poses were considered as duplicates and discarded if both RMS deviation is less than 0.5 Å and maximum atomic displacement is less than 1.3 Å.
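
      The duplicate-pose criterion stated above (RMS deviation below 0.5 Å and maximum atomic displacement below 1.3 Å) can be illustrated with a short sketch. This is not Glide's implementation: the function name `is_duplicate_pose` is hypothetical, and the sketch assumes both poses list the same atoms in the same order, with no handling of symmetry-equivalent atoms.

```python
import numpy as np


def is_duplicate_pose(pose_a, pose_b, rms_cutoff=0.5, max_disp_cutoff=1.3):
    """Return True if two poses count as duplicates under both criteria.

    pose_a, pose_b: (n_atoms, 3) coordinate arrays in Å, same atom order.
    A pair is a duplicate only if BOTH the RMS deviation is below
    rms_cutoff AND the largest single-atom displacement is below
    max_disp_cutoff.
    """
    a = np.asarray(pose_a, dtype=float)
    b = np.asarray(pose_b, dtype=float)
    displacements = np.linalg.norm(a - b, axis=1)   # per-atom distance
    rms = np.sqrt(np.mean(displacements ** 2))
    return bool(rms < rms_cutoff and displacements.max() < max_disp_cutoff)
```

      Requiring both conditions prevents two poses from being merged when most atoms overlap but a single substituent points in a clearly different direction, a case that a low overall RMS deviation alone would miss.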

      (12) Provide clarity on the statement that AGR1.131 "theoretically" binds the same motif, explaining the docking procedure used for this determination.

      In the in silico screening, AGR1.131 was one of the 40 selected compounds that showed, according to the PELE analysis (see answer to point 8), a pose with low binding energy (-39.8 kcal/mol) between the TMV and TMVI helices, which is the area selected for the screening. Nonetheless, using the initial workflow, its best pose was placed between helices TMI and TMVII (-43.7 kcal/mol). In conclusion, although AGR1.131 can also dock at the TMV-TMVI cleft, its most favorable pose was in the area between TMI and TMVII. In addition, the compound was included in the biological screening, where it did not affect CXCL12-mediated chemotaxis. We thus decided to use it as an internal negative control, as it has a skeleton very similar to AGR1.135 and AGR1.137 and can interact with the TM domains of CXCR4 without promoting biological effects. This statement has been clarified in the revised text.

      (13) Toxicity Testing:

      -Enhance the explanation of the approach to testing the toxicity of the compound in Jurkat cells. Consider incorporating positive controls to strengthen the assessment and clarify the experimental design.

      All the selected compounds in the in silico screening were initially tested for propidium iodide incorporation in treated cells in a toxicity assay, and some of them were discarded for further experiments (e.g., AGR1.103 and VSP3.1).

      Further evaluation of Jurkat cell viability was determined by cell cycle analysis using propidium iodide.  Supplementary Fig. 1B included the percentage of each cell cycle phase, and data indicated no significant differences between the treatments tested. Nevertheless, at the suggestion of the reviewer, and to clarify this issue, positive controls inducing Jurkat cell death (staurosporine and hydrogen peroxide) have also been included in the new Supplementary Fig. 2. The new figure also includes a table showing the percentage of cells in each cell-cycle phase.  

      (14) In the Results section concerning "AGR1.135 and AGR1.137 blocking CXCL12-mediated CXCR4 nanoclustering and dynamics", several points can be improved to enhance clarity and coherence: 1. Specificity of Low Molecular Weight Compounds:  

      -Clearly articulate how AGR1.135 and AGR1.137 specifically target homodimeric CXCR4 and provide an explanation for their lack of impact on heterodimeric CXCR4-CCR5 in that region.

      First of all, we should clarify that when we talk about receptor nanoclustering, oligomers refer to complexes including 3 or more receptors and, therefore, the residues involved in these interactions can differ from those involved in receptor dimerization. Moreover, our FRET experiments did not indicate that the compounds alter receptor dimerization (see new Supplementary Fig. 7). Of note, mutant receptors unable to oligomerize can still form dimers (Martínez-Muñoz L. et al. Mol. Cell 2018; García-Cuesta E.M. et al. Proc. Natl. Acad. Sci. USA 2022). Additionally, we believe that these oligomers can also include other chemokine receptors/proteins expressed at the cell membrane, which we are currently studying using different models and techniques.

      We have results supporting the existence of CCR5/CXCR4 heterodimers (Martínez-Muñoz L et al. Proc. Natl. Acad. Sci. USA 2014), in line with the data published by Di Marino et al. However, in the current study we have not evaluated the impact of the selected compounds on other CXCR4 complexes distinct from CXCR4 oligomers. Our Jurkat cells do not express CCR5 and, therefore, we cannot discuss whether AGR1.137 affects CCR5/CXCR4 heterodimers. The chemokine field is very complex and most receptors can form dimers (homo- and heterodimers) as well as oligomers (Martinez-Muñoz L., et al Pharmacol & Therap. 2011) when co-expressed. To evaluate different receptor combinations in the same experiment is a complex task, as the number of potential combinations between distinct expressed receptors makes the analysis very difficult. We started with CXCR4 as a model, to continue later with other possible CXCR4 complexes. In addition, for the analysis of CCR5/CXCR4 dynamics, it is much better to use dual-TIRF techniques, which allow the simultaneous detection of two distinct molecules coupled to different fluorochromes.

      Regarding the data of Di Marino et al., it is possible that the compounds might also affect heterodimeric conformations of CXCR4. This aspect has also been broached in the revised discussion. We would again note that we evaluated CXCR4 oligomers and not monomers or dimers; this is especially relevant when we compare the residues involved in these processes as they might differ depending on the receptor conformation considered. This issue was also hypothesized by Di Marino et al. (see our response to point 4).

      (15) When referring to "unstimulated" cells, provide a more detailed explanation to elucidate the experimental conditions and cellular state under consideration.

      Unstimulated cells refer to the cells in basal conditions, that is, cells in the absence of CXCL12. For TIRF-M experiments, transiently-transfected Jurkat cells were plated on glass-bottomed microwell dishes coated with fibronectin; these are the unstimulated cells. To observe the effect of the ligand, dishes were coated as above plus CXCL12 (stimulated cells). We have clarified this point in the material and methods section of the revised version.

      (16) 2. Paragraph Organization

      -Reorganize the second paragraph to eliminate redundancy and improve overall flow. A more concise and fluid presentation will facilitate reader comprehension and engagement.

      The second paragraph has been reorganized to improve overall flow.

      (17) Ensure that each paragraph contributes distinct information, avoiding repetition and redundancy.

      We have carefully revised each paragraph of the manuscript to avoid redundancy.

      (18) 3. Claim of Allosteric Antagonism:

      -Exercise caution when asserting that "AGR1.135 and AGR1.137 behave as allosteric antagonists of CXCR4" based on the presented results. Consider rephrasing to reflect that the observed effects suggest the potential allosteric nature of these compounds, acknowledging the need for further investigations and evidence.

      To avoid misinterpretations on the effect of the compounds on CXCR4, as we have commented in our response to point 2, we have substituted the term allosteric inhibitors with negative allosteric modulators, which refer to molecules that act by binding a site distinct from the orthosteric site, and selectively block some downstream signaling pathways, whereas others induced by the same endogenous or orthosteric agonist are unaffected (Gao Z.-G. & Jacobson K.A. Drug Discov. Today Technol. 2013). Our data indicate that the selected small compounds do not block ligand binding or G protein activation or receptor internalization, but inhibit receptor oligomerization and ligand-mediated directed cell migration.

      (19) In the Results section discussing the "incomplete abolition of CXCR4-mediated responses in Jurkat cells by AGR1.135 and AGR1.137", several points can be refined for better clarity and completeness:  1. Inclusion of Positive Controls: 

      -Consider incorporating positive controls in relevant experiments to provide a comparative benchmark for assessing the impact of AGR1.135 and AGR1.137. This addition will strengthen the interpretation of results and enhance the experimental rigor. 

      The in vivo experiments (Fig. 7E,F) used AMD3100, an orthosteric antagonist of CXCR4, as a positive control. We also included AMD3100, as a positive control of inhibition when evaluating the effect of the compounds on CXCL12 binding (Fig. 3, new Supplementary Fig. 3). The revised version of the manuscript also includes the effect of this inhibitor on other relevant CXCL12-mediated responses such as cell migration (Fig. 1B), receptor internalization (Fig. 3A), cAMP production (Fig. 3C), ERK1/2 and AKT phosphorylation (Supplementary Fig. 4), actin polymerization (Fig. 4A), cell polarization (Fig. 4B, C) and cell adhesion (Fig. 4D), to facilitate the interpretation of the results and improve the experimental rigor.

      (20) 2. Clarification of Terminology: 

      -Clarify the term "CXCR4 internalizes" by providing context, perhaps explaining the process of receptor internalization and its relevance to the study.

      We refer to CXCR4 internalization as a CXCL12-mediated endocytosis process that reduces CXCR4 levels on the cell surface. We use CXCR4 internalization in this study for two purposes. First, for CXCR4 and other chemokine receptors, internalization is mediated by ligand-induced clathrin vesicles (Venkatesan et al 2003), a process that triggers CXCR4 aggregation in these vesicles. We have previously determined that the receptor oligomers detected by TIRF-M remain unaltered in cells treated with inhibitors of clathrin vesicle formation and of internalization processes (Martinez-Muñoz L. et al. Mol. Cell 2018). Moreover, we have described a mutant CXCR4 that cannot form oligomers but internalizes normally in response to CXCL12 (Martinez-Muñoz L. et al. Mol. Cell 2018). The observation in this manuscript of normal CXCL12-mediated endocytosis in the presence of the negative allosteric modulators of CXCR4 that abrogate receptor oligomerization reinforces the idea that the oligomers detected by TIRF-M are not related to the receptor aggregates involved in endocytosis. Second, receptor internalization is not affected by the allosteric compounds, indicating that they downregulate some CXCL12-mediated signaling events but not others (new Fig. 3).

      All these data have been included in the revised discussion of the manuscript.

      (21) Elaborate on the meaning of "CXCL12 triggers normal CXCR4mut internalization" to enhance reader understanding.

      We have previously described a triple-mutant CXCR4 (K239L/V242A/L246A; CXCR4mut). The mutant residues are located in the N-terminal region of TMVI, close to the cytoplasmic region, thus limiting the CXCR4 pocket described in this study (see our response to point 3). This mutant receptor dimerizes but neither oligomerizes in response to CXCL12 nor supports CXCL12-induced directed cell migration, although it can still trigger some Ca2+ flux and is internalized after ligand activation (Martinez-Muñoz L. et al. Mol. Cell 2018).  We use the behavior of this mutant (CXCR4mut) to show that the CXCR4 oligomers and the complexes involved in internalization processes are not the same and to explain why we evaluated CXCR4 endocytosis in the presence of the negative allosteric modulators.

      As we indicated in a previous answer to the reviewer, these issues have been re-elaborated in the revised version.

      (22) 3. Discrepancy in CXCL12 Concentration:

      -Address the apparent discrepancy between the text stating, "...were stimulated with CXCL12 (50 nM, 37 °C)," and the figure caption (Fig. 3A) reporting a concentration of 12.5 nM. Rectify this inconsistency and provide an accurate and clear explanation.

      We apologize for this error, which is now corrected in the revised manuscript. With the exception of the cell migration assays in Transwells, where the optimal concentration was established at 12.5 nM, the optimal concentration of CXCL12 used in the remaining experiments was 50 nM. These concentrations were optimized in previous work from our laboratory using the same type of experiment. We should also note that in the experiments using lipid bilayers or TIRF-M, CXCL12 is used to coat the plates, and it is therefore difficult to determine the real concentration of ligand retained on the surface of the plates after the washing steps performed prior to adding the cells. In addition, we use 100 nM CXCL12 to create the gradient in the chambers used to perform the directed cell migration experiments.

      (23) 4. Speculation on CXCL12 Binding:

      -Refrain from making speculative statements, such as "These data suggest that none of the antagonists alters CXCL12 binding to CXCR4," unless there is concrete evidence presented up to that point. Clearly outline the results that support this conclusion.

      Figure 3B and Supplementary Figure 3 show CXCL12-ATTO700 binding by flow cytometry in cells pretreated with the negative allosteric modulators. We have also included AMD3100, the orthosteric antagonist, as a control for inhibition. While these experiments showed no major effect of the compounds on CXCL12 binding, we cannot rule out small changes in the affinity of the interaction between CXCL12 and CXCR4. Consequently, we have rewritten these statements.

      (24) 5. Corroboration of Data:

      -Specify where the corroborating data from immunostaining and confocal analysis are reported, ensuring readers can access the relevant information to support the conclusions drawn in this section.

      In agreement with the suggestion of the reviewer, the revised manuscript includes data from immunostaining and confocal analysis to complement Fig. 4B (new Fig. 4C). The revised version also includes some representative videos for the TIRF experiments shown in Figure 2 to improve clarity.

      (25) In the Results section concerning "AGR1.135 and AGR1.137 antagonists and their direct binding to CXCR4", several aspects need clarification and refinement for a more comprehensive and understandable presentation: 1. Workflow Clarification:

      -Clearly articulate the workflow used for assessing the binding of AGR1.135 and AGR1.137 to CXCR4. Address the apparent contradiction between the inability to detect a direct interaction and the utilization of Glide for docking in the TMV-TMVI cleft.

      To address the direct interaction of the compounds with CXCR4, we intentionally avoided modifying the small compounds with labels, which could affect their properties. We therefore attempted a fluorescence spectroscopy strategy to formally prove the ability of the small compounds to bind CXCR4, but this failed because AGR1.135 is yellow in color, which interfered with the determinations. We also tried a FRET strategy (see new Supplementary Fig. 7) and detected a significant increase in FRET efficiency of CXCR4 homodimers when AGR1.135 was evaluated, but again the yellow color interfered with FRET determinations. Moreover, AGR1.137 did not modify FRET efficiency of CXCR4 dimers. Therefore, we were unable to directly detect the interaction of the compounds with CXCR4.

      We elected to develop an indirect strategy; in silico, we evaluated the binding-site using docking and molecular dynamics to predict the most promising CXCR4 binding residues involved in the interaction with the selected compounds. Next, we generated point mutant receptors of the predicted residues and re-evaluated the behavior of the allosteric antagonists in a CXCL12-induced cell migration experiment. Obviously, we first discarded those CXCR4 mutants that were not expressed on the cell membrane as well as those that were not functional when activated with CXCL12. Using this strategy, we eliminated the interference due to the physical properties of the compounds and demonstrated that if the antagonism of a compound is reversed in a particular CXCR4 mutant it is because the mutated residue participates or interferes with the interaction between CXCR4 and the compound, thus assuming (albeit indirectly) that the compound binds CXCR4. 

      To select the specific mutations included in the analysis, our strategy was to generate point mutations in residues present in the TMV-TMVI pocket of CXCR4 that were not directly proposed as critical residues involved in chemokine engagement, signal initiation, signal propagation, or G protein-binding, based on the extensive mutational study published by Wescott et al. (Wescott M.P. et al. Proc. Natl. Acad. Sci. U S A. 2016).

      (26) Provide a cohesive explanation of the transition from docking evaluation to MD analysis, ensuring a transparent representation of the methodology.

      Based on the aim of this work, the workflow shown in Author response image 2 was proposed to predict the binding mode of the selected molecules. First, a CXCR4 model was generated to reconstruct some unresolved parts of the protein structure; then, a binding site search using PELE software was performed to identify the most promising binding sites; subsequently, docking studies were performed to refine the binding mode of the molecules; and finally, molecular dynamics simulations were run to determine the most stable poses and predict the residues that we should mutate to test whether the compounds interact with CXCR4.

      Author response image 2.

      Workflow followed to determine the binding mode of the  studied compounds.

      (27) 2. Choice of Software and Techniques:

      -Justify the use of "AMBER14" and the PELE approach, considering  their potential obsolescence.

      These experiments were performed five years ago when the project was initiated. As the reviewer indicates, AMBER14 and PELE approaches might perhaps be considered obsolescent. Thus, we have predicted the structure of the target using AlphaFold (Jumper J. et al, Nature 2021) and the sequence available under UniProt entry P61073. The complete analysis performed (see our response to point 4) confirmed that the compounds bound the selected pocket, as we had originally determined using PELE. These new analyses have been incorporated into the revised manuscript.

      (28) -Discuss the role of the membrane in the receptor-ligand interaction. Elaborate on how the lipidic double layer may influence the binding of small compounds to GPCRs embedded in the membrane.

      Biological membranes are vital components of living organisms, providing a diffusion barrier that separates cells from the extracellular environment, and compartmentalizing specialized organelles within the cell. In order to maintain the diffusion barrier and to keep it electrochemically sealed, a close interaction of membrane proteins with the lipid bilayer is necessary. It is well known that this is important, as many membrane proteins undergo conformational changes that affect their transmembrane regions and that may regulate their activity, as seen with GPCRs (Daemen F.J. & Bonting S.L., Biophys. Struct. Mech. 1977; Gether U. et al. EMBO J. 1997). The lateral and rotational mobility of membrane lipids supports the sealing function while allowing for the structural rearrangement of membrane proteins, as they can adhere to the surface of integral membrane proteins and flexibly adjust to a changing microenvironment. In the case of the first atomistic structure of CXCR4 (Wu B. et al. Science 2010), it was indicated that for dimers, monomers interact only at the extracellular side of helices V and VI, leaving at least a 4-Å gap between the intracellular regions, which is presumably filled by lipids. In particular, they indicated that the channel between TMV and TMVI that connects the orthosteric chemokine binding pocket to the lipid bilayer is occupied by an oleic acid molecule. Recently, Di Marino et al., analyzing the dimeric structure of CXCR4, found a cholesterol molecule placed in between the two protomers, where it engages a series of hydrophobic interactions with residues located in the area between TMI and TMVI (Leu132, Val214, Leu216, Leu246, and Phe249). The polar head of cholesterol forms an H-bond with Tyr135 that further stabilizes its binding mode. This finding confirms that cholesterol might play an important role in mediating and stabilizing receptor dimerization, as seen in other GPCRs (Pluhackova, K., et al. PLoS Comput. Biol. 2016). 

      In addition, we have previously observed that, independently of the structural changes on CXCR4 triggered by lipids, the local lipid environment also regulates CXCR4 organization, dynamics and function at the cell membrane and modulates chemokine-triggered directed cell migration. Prolonged treatment of T cells with bacterial sphingomyelinase promoted the complete and sustained breakdown of sphingomyelins and the accumulation of the corresponding ceramides, which altered both membrane fluidity and CXCR4 nanoclustering and dynamics. Under these conditions, CXCR4 retained some CXCL12-mediated signaling activity but failed to promote efficient directed cell migration (Gardeta S.R. et al. Front. Immunol. 2022). Collectively, these data demonstrate the key role that lipids play in stabilizing CXCR4 conformations and in regulating its lateral mobility, thereby influencing its associated functions. These considerations have been included in the revised version of the manuscript.

      (29) 3. Stable Trajectories and Binding Mode Superimposition:

      -Specify the criteria for defining "stable trajectories" to enhance reader understanding.

      The stability of an MD simulation can be described in several ways, based on the convergence of energies, distances or ligand-target interactions, among others. In this work, we use the expression “stable trajectories” to refer to simulations in which the ligand trajectory converges and the ligand RMSD does not fluctuate by more than 0.25 Å. This definition is now included in the revised text.
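
      The stability criterion above can be illustrated with a minimal sketch. It assumes that "does not fluctuate by more than 0.25 Å" means that, over the final part of the trajectory, every ligand RMSD value lies within 0.25 Å of the mean for that part. The function name, the choice of the last half of the frames, and the example values are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def is_stable(rmsd, tail_fraction=0.5, tolerance=0.25):
    """Check a ligand RMSD series (in Å, one value per frame) for stability.

    Assumption (not from the manuscript): the trajectory counts as stable
    if every RMSD value in the final `tail_fraction` of the frames lies
    within `tolerance` Å of the mean RMSD of that final segment.
    """
    if not rmsd:
        raise ValueError("empty RMSD series")
    # Index of the first frame belonging to the final segment.
    start = int(len(rmsd) * (1 - tail_fraction))
    tail = rmsd[start:]
    center = mean(tail)
    return all(abs(value - center) <= tolerance for value in tail)


# Hypothetical RMSD series: one that settles and one that keeps jumping.
converging = [3.0, 2.0, 1.5, 1.4, 1.5, 1.6]
fluctuating = [1.0, 1.2, 1.0, 2.0, 1.0, 2.0]

print(is_stable(converging))
print(is_stable(fluctuating))
```

      Other readings of the criterion (for example, a bound on the standard deviation or on the maximum minus the minimum) would need only a change to the final comparison.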

      (30)  Clarify the meaning behind superimposing the two small compounds and ensure that the statement in the figure caption aligns with the information presented in the main text.

      We apologize for the error in the previous Fig. 5A and in its legend. The figure was created by superimposing the protein component of the poses for the two compounds, AGR1.135 and AGR1.137, rather than the compounds themselves. As panel 5A was confusing, we have modified the whole of Fig. 5 in the revised manuscript to improve clarity.

      (31) 4. Volume Analysis and Distances:

      -Provide details on how the volume analysis was computed and how distances were accounted for. Consider adding a figure to illustrate these analyses, aiding reader comprehension.

      The cleft search and analysis were performed using the default settings of SURFNET (Laskowski R.A. J. Mol. Graph. 1995) included in the PDBsum server (Laskowski R.A. et al. Trends Biochem. Sci. 1997). The first run of the input model for CXCR4 3ODU identified a promising cleft of 870 Å³ in the lower half of the region flanked by TMV and TMVI, highlighting this area as a possible small molecule binding site (Fig. I, only for review purposes). Analysis of the cleft occupied by AGR1.135 showed two independent cavities of 434 Å³ and 1381 Å³ that were not connected to the orthosteric site. The same procedure for AGR1.137 revealed two distinct clefts of 790 Å³ and 580 Å³, respectively (Fig. I, only for review purposes). Analysis of the atomic distances between the protein residues and the compounds was performed using the PISA server (Krissinel E. & Henrick K. J. Mol. Biol. 2007). (Please see our response to point 3 and the corresponding figure).

      (32) 5. Mutant Selection and Relevance:

      -Clarify the rationale behind selecting the CXCR4 mutants used in the study. Consider justifying the choice and exploring the possibility of performing an alanine (ALA) scan for a more comprehensive mutational analysis.  

      The residues to be mutated along the cleft were selected, first, on the basis of their presence in the proposed cleft and the direct interaction of the compounds with them, either by hydrogen bonding or by hydrophobic interactions. Second, none of the mutated residues belonged to the critical residues involved in transmitting the signal generated by the interaction of CXCL12 with the receptor. In any case, mutants producing a non-functional CXCR4 at the cell membrane were discarded after FACS analysis and chemotaxis experiments. Finally, the length and nature of the resulting mutations were designed mainly to occlude the cleft by introducing long residues such as lysines (I204K, L208K) or to alter hydrophobic interactions by changing the carbon side chain composition of the residues in the cleft. We agree that an alanine scan would have been an alternative strategy to evaluate the residues involved in the interactions of the compounds.

      (33) Reevaluate the statement regarding the relevance of the Y256F mutation for the binding of AGR1.137. If there is a significant impact on migration in the mutant (Fig. 6B), elaborate on the significance in the context of AGR1.137 binding.

      In the revised discussion we provide more detail on the relevance of Y256F mutation for the binding of AGR1.137 as well as for the partial effect of G207I and R235L mutations. The predicted interactions for each compound are depicted in new Fig. 6 C, D after LigPlot+ analysis (Laskowski R.A. & Swindells M.B. J. Chem. Inf. Model. 2011), showing that AGR1.135 interacted directly with the receptor through a hydrogen bond with Y256. When this residue was mutated to F, one of the anchor points for the compound was lost, weakening the potential interaction in the region of the upper anchor point.

      It is not clear how the Y256F mutation affects the binding of AGR1.137, but other potential contacts cannot be ruled out, since that portion of the compound is identical in AGR1.135 and AGR1.137. This is especially true for its neighboring residues in the alpha helix, F249 and L208, as shown in the 3ODU structure (Fig. 6D), which are directly implicated in the interaction of both compounds. Alternatively, we cannot rule out that Y256 interacts with other TMs or lipids stabilizing the overall structure, which could reverse the effect of the mutant at a later stage (Author response image 3).

      Author response image 3.

      Cartoon representation of Y256 and its intramolecular interactions in the CXCR4 X-ray structure 3ODU. The TMV helix is colored in blue and TMVI in pink.

      (34) Address the apparent discrepancy in residue involvement between AGR1.135 and AGR1.137, particularly if they share the same binding mode in the same cleft.

      AGR1.135 and AGR1.137 exhibit comparable yet distinct binding modes, engaging with CXCR4 within a molecular cavity formed by TMV and TMVI. AGR1.135 binds to CXCR4 through three hydrogen bonds, two on the apical side of the compound that interact with residues TMV-G207 and TMVI-Y256 and one on the basal side that interacts with TMVI-R235 (Fig. 5A). This results in a more extended and rigid conformation when sharing hydrogen bonds, with both TMs occupying a surface area of 400 Å² and a length of 20 Å in the cleft between TMV and TMVI (Supplementary Fig. 8A). AGR1.137 exhibits a distinct binding profile, interacting with a more internal region of the receptor. This interaction involves the formation of a hydrogen bond with TMIII-V124, which induces a conformational shift in the TMVI helix towards an active conformation (Fig. 5B; Supplementary Fig. 13). Moreover, AGR1.137 may utilize the carboxyl group of V124 in TMIII and overlap with AGR1.135 binding in the cavity, interacting with the other 19 residues dispersed between TMV and TMVI to create an interaction surface of 370 Å² along 20 Å (Supplementary Fig. 8B). This is illustrated in the new Fig. 5B. AGR1.137 lacks the phenyl ring present in AGR1.135, resulting in a shorter compound with greater difficulty in reaching the lower part of TMVI where R235 sits.

      Author response image 4.

      AGR1.135 and AGR1.137 interaction with TMV and TMVI.  The model shows the location of the compounds within the TMV-VI cleft, illustrated by a ribbon and stick representation. The CXCR4 segments of TMV and TMVI are represented in blue and pink ribbons respectively, and side chains for some of the residues defining the cavity are shown in sticks. AGR1.135 and AGR1.137 are shown in stick representation with carbon in yellow, nitrogen in blue, oxygen in red, and fluorine in green. Hydrogen bonds are indicated by dashed black lines, while hydrophobic interactions are shown in green. The figure reproduces the panels A, B of Fig. 5 in the revised manuscript.

      (35) In the Results section regarding "AGR1.137 treatment in a zebrafish xenograft model", the following points can be refined for clarity and completeness: 1. Cell Line Choice for Zebrafish Xenograft Model:

      -Explain the rationale behind the choice of HeLa cells for the zebrafish xenograft model when the previous experiments primarily focused on Jurkat cells. Address any specific biological or experimental considerations that influenced this decision.

      As far as we know, there are no available models of tumors in zebrafish using Jurkat cells. We looked for a tumor cell system that expresses CXCR4 and could be transplanted into zebrafish. HeLa cells are derived from a human cervical tumor, express a functional CXCR4, and have been previously used for tumorigenesis analyses in zebrafish (Brown H.K. et al. Expert Opin. Drug Discover. 2017; You Y. et al Front. Pharmacol. 2020). These cells grow in the fish and disseminate through the ventral area and can be used to determine primary tumor growth and metastasis. Nonetheless, we first analyzed in vitro the expression of a functional CXCR4 in these cells (Supplementary Fig. 10A), whether AGR1.137 treatment specifically abrogated CXCL12-mediated directed cell migration (Fig. 7A, B), and whether it affected cell proliferation (Supplementary Fig. 10B). As HeLa cells reproduce the in vitro effects detected for the compounds in Jurkat cells, we used this model in zebrafish. These issues were already discussed in the first version of our manuscript.

      (36) 2. Toxicity Assessment in Zebrafish Embryos: 

      -Clarify the basis for stating that AGR1.137 is not toxic to zebrafish embryos. Consider referencing the Zebrafish Embryo Acute Toxicity Test (ZFET) and provide relevant data on lethal concentration (LC50) and non-lethal toxic phenotypes such as pericardial edema, head and tail necrosis, malformation, brain hemorrhage, or yolk sac edema.

      Tumor growth and metastasis kinetics within the zebrafish model have been extensively evaluated in many publications (White R. et al. Nat. Rev. Cancer. 2013; Astell K.R. and Sieger D. Cold Spring Harb. Perspect. Med. 2020; Chen X. et al. Front. Cell Dev. Biol. 2021; Weiss J.M. et al. eLife 2022; Lindhal G. et al NPJ Precis. Oncol. 2024). Our previous experience with this model shows that tumors start to show more pronounced proliferation and a lower degree of apoptosis from day 4 onwards, but we cannot keep the tumor-bearing larvae for that long for ethical reasons and also because we do not see much scientific benefit in unnecessarily extending the experiments. Anti-proliferative or pro-apoptotic effects of drugs can still be observed within the three days, even if this is then commonly seen as a larger reduction (instead of the smaller growth commonly seen in, for example, mouse tumor models) compared to controls. Initially, we characterized the evolution of implanted tumors in our system and how much they metastasize over time in the absence of treatment before testing the compounds (Author response image 5).

      The in vivo experiments were planned to validate efficacious concentrations of the investigated drugs rather than to derive in vivo IC50 or other values, which require testing of multiple doses. We have, however, included an additional concentration to show concentration-dependence and therefore on-target specificity of the drugs in the revised version of the manuscript (data also being elaborated in ongoing experiments). At this stage, we believe that adding the LC50 does not provide interesting new knowledge, and it is standard to only show results from the experimental endpoint (in our case 3 days post implantation). We agree that showing these new data points strengthens the manuscript and facilitates independent evaluation and conclusions to be drawn from the presented data. We have created new graphs where datapoints for each compound dose are shown.  

      Author response image 5.

      Evolution of the tumors and metastasis over time in the absence of any treatment. HeLa cells were labeled with 8 µg/mL Fast-DiI™ oil and then implanted in the dorsal perivitelline space of 2-day-old zebrafish embryos. Tumors were imaged within 2 hours of implantation and re-imaged every 24 h for three days. Changes in tumor size were evaluated as tumor area at day 1, 2 and 3 divided by tumor area at day 0, and metastasis was evaluated as the number of cells disseminated to the caudal hematopoietic plexus at day 1, 2 and 3 divided by the number of cells at day 3.
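
      The two normalizations described in this legend can be sketched as follows. This is only an illustration of the arithmetic as stated in the legend (tumor area relative to day 0; disseminated cells relative to day 3); the function names and the example measurements are hypothetical and are not data from the study.

```python
def relative_tumor_size(areas):
    """Normalize tumor areas to the day-0 area.

    `areas` lists the tumor area at day 0, 1, 2 and 3 (same units).
    Returns the ratios for days 1, 2 and 3, as described in the legend.
    """
    baseline = areas[0]
    if baseline <= 0:
        raise ValueError("day-0 tumor area must be positive")
    return [area / baseline for area in areas[1:]]


def relative_metastasis(cell_counts):
    """Normalize disseminated cell counts to the day-3 count.

    `cell_counts` lists the number of cells found in the caudal
    hematopoietic plexus at day 1, 2 and 3. The legend states that
    counts are divided by the number of cells at day 3.
    """
    endpoint = cell_counts[-1]
    if endpoint <= 0:
        raise ValueError("day-3 cell count must be positive")
    return [count / endpoint for count in cell_counts]


# Hypothetical measurements for a single larva (not data from the study).
tumor_areas = [100.0, 120.0, 150.0, 200.0]   # days 0, 1, 2, 3
disseminated = [2, 5, 10]                    # days 1, 2, 3

print(relative_tumor_size(tumor_areas))
print(relative_metastasis(disseminated))
```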

      Regarding the statement that AGR1.137 was not toxic, this was based on visual inspection of the zebrafish larvae at the end of the experiment, which also revealed a lack of drug-related mortality in these experiments. There are a number of differences between how our experiment was run and the standardized ZFET. The ZFET evaluates toxicity from 0 hours post-fertilization to 1 or 2 days post-fertilization, whereas here we exposed zebrafish from 2 days post-fertilization to 5 days post-fertilization. The ZFET furthermore requires that the embryos are raised at 26 °C, whereas we kept the temperature as close as possible to a physiologically relevant temperature for the tumor cells (36 °C). In the ZFET, embryos are incubated in 96-well plates, whereas for our studies we required larger wells to be able to manipulate the larvae and avoid well edge-related imaging artefacts, and we therefore used 24-well plates. As such, the ZFET was for various reasons not applicable to our experimental settings. Because our focus was on efficacy rather than on rigorously determining the LD50 or other toxicity-related measurements, and because we found that the targeted dose was tolerated, we did not evaluate multiple doses, including lethal doses of the drug, and are therefore not able to determine an LD50/LC50. We also did not find drug-induced non-lethal toxic phenotypes in this study, and so we cannot elaborate further on such phenotypes other than to state that the drug is well tolerated at the given doses. Therefore, the reference to ZFET in the manuscript was eliminated.

      (37) If supplementary information is available, consider providing it for a comprehensive understanding of toxicity assessments. 

      The effective concentration used in the zebrafish study was derived from the in vitro experiments. That being said, and as elaborated in our response to comment 36, we have added data for one additional dose to show the dose-dependent regulation of tumor growth and metastasis. 

      (38) 3. Optimization and Development of AGR1.137: 

      -Justify the need for further optimization and development of AGR1.137 if it has a comparable effect to AMD3100. Explain the specific advantages or improvements that AGR1.137 may offer over AMD3100. 

      AGR1.137 is highly hydrophobic and is very difficult to handle, particularly in in vivo assays; thus, for the negative allosteric modulators to be used clinically, it would be very important to increase their solubility in water. Contrastingly, AMD3100 is a water-soluble compound. Before using the zebrafish model, we performed several experiments in mice using AGR1.137, but the inhibitory results were highly variable, probably due to its hydrophobicity. We also believe that it would be important to increase the affinity of AGR1.137 for CXCR4, as the use of lower concentrations of the negative allosteric modulator would limit potential in vivo side effects of the drug. On the other hand, we are also evaluating distinct administration alternatives, including encapsulation of the compounds in different vehicles. These alternatives may also require modifications of the compounds. 

      AMD3100 is an orthosteric inhibitor and therefore blocks all the signaling cascades triggered by CXCL12. For instance, we observed that AMD3100 treatment blocked CXCL12 binding, cAMP inhibition, calcium flux, cell adhesion and cell migration (Fig. 3, Fig. 4), whereas the effects of AGR1.137 were restricted to CXCL12-mediated directed cell migration. Although AMD3100 was well tolerated by healthy volunteers in a single-dose study, it also promoted some mild and reversible events, including white blood cell count elevations and variations of urine calcium just beyond the reported normal range (Hendrix C.W. et al. Antimicrob. Agents Chemother. 2000). To treat viral infections, continuous daily dosing requirements of AMD3100 were impractical due to severe side effects including cardiac arrhythmias (De Clercq E. Front Immunol. 2015). For AMD3100 to be used clinically, it would be critical to control the timing of administration. In addition, side effects after long-term administration are a potential problem. Shorter-term usage and lower doses would be fundamental keys to its success in clinical use (Liu T.Y. et al. Exp. Hematol. Oncol. 2016). The use of a negative allosteric modulator that blocks cell migration but does not affect other signaling pathways triggered by CXCL12 would be, at least in theory, more specific and produce fewer side effects. These ideas have been incorporated into the revised discussion to reflect potential advantages or improvements that AGR1.137 may offer over AMD3100.

      (39) 4. Discrepancy in AGR1.137 and AMD3100 Effects:

      -Discuss the observed discrepancy where AGR1.137 exhibits similar effects to AMD3100 but only after 48 hours. Provide insights into the temporal dynamics of their actions and potential implications for the experimental design.

      Images and data shown in Fig. 7E, F correspond to days 0 and 3 after HeLa cell implantation (tumorigenesis) and only to day 3 in the case of metastasis data. The revised version contains the effect of two distinct doses of the compounds (10 and 50 µM, for AGR1.135 and AGR1.137 and 1 and 10 µM for AMD3100). 

      (40) In the "Discussion" section, there are several points that require clarification and refinement to enhance the overall coherence and depth of the analysis:  1. Reduction of Side-Effects: 

      -Provide a more detailed explanation of how the identified compounds, specifically AGR1.135 and AGR1.137, contribute to the reduction of side effects. Consider discussing specific mechanisms or characteristics that differentiate these compounds from existing antagonists.

      The sentence indicating that AGR1.135 and AGR1.137 contribute to reducing side effects is entirely speculative, as we have no experimental evidence to support it. We have therefore corrected this in the revised version. The origin of the sentence was that orthosteric antagonists typically bind to the same site as the endogenous ligand, thus blocking its interaction with the receptor. Orthosteric inhibitors (e.g., AMD3100) therefore block all signaling cascades triggered by the ligand, along with their functional consequences. However, the compounds described in this project are essentially negative allosteric modulators; that is, they bind to a site distinct from the orthosteric site, inducing a conformational change in the receptor that does not alter the binding of the endogenous ligand, and therefore block some specific receptor-associated functions without altering others. We observed that AGR1.137 blocked receptor oligomerization and directed cell migration, whereas CXCL12 still bound CXCR4, triggered calcium mobilization, inhibited cAMP release and promoted receptor internalization. This is why we speculated on the limitation of side effects. The statements have nonetheless been revised in the new version of the manuscript.

      (41) 2. Binding Site Clarification:

      -Address the apparent discrepancy between docking the small compounds in a narrow cleft formed by TMV and TMVI helices and the statement that AGR1.131 binds elsewhere. Clarify the rationale behind this assertion

      After the in silico screening, a total of 40 compounds were selected. These compounds showed distinct degrees of interaction with the cleft formed by TMV and TMVI and even with other potential interaction sites on CXCR4, with the exception of the ligand binding site according to the data described by Wescott et al. (PNAS 2016 113:9928-9933), as this possibility was discarded in the initial approach of the in silico screening. According to PELE analysis, AGR1.131 was one of the 40 selected compounds that showed a pose with low binding energy, -39.8 kcal/mol, between the TMV and TMVI helices; that is, it might interact with CXCR4 through the area selected for the screening. It nonetheless showed its best pose between helices TMI and TMVII, at -43.7 kcal/mol. In any case, the compound was included in the biological screening, where it was unable to impact CXCL12-mediated chemotaxis (Fig. 1B). We then focused on AGR1.135 and AGR1.137, as they showed a higher inhibitory effect on CXCL12-mediated migration, and on AGR1.131 as an internal negative control. AGR1.131 has a skeleton very similar to the other compounds (Fig. 1C) and can interact with the TM domains of CXCR4 without promoting effects. None of the three compounds affected CXCL12 binding, CXCL12-mediated inhibition of cAMP release, or receptor internalization. However, whereas AGR1.135 and AGR1.137 blocked CXCL12-mediated CXCR4 oligomerization and directed cell migration towards CXCL12 gradients, AGR1.131 had no effect in these experiments (Fig. 3, Fig. 4). 

      Next, we performed additional theoretical calculations (PELE, docking, MD) to inspect in detail the potential binding modes of active and inactive molecules. Based on these additional calculations, we identified that whereas AGR1.135 and AGR1.137 showed preferential binding in the molecular pocket between TMV and TMVI, the best pose for AGR1.131 was located between TMI and TMVII, as the initial experiments indicated. These observations and data have been clarified in the revised discussion. 

      (42) 3. Impact of Chemical Modifications:

      -Discuss the consequences of the distinct chemical groups in AGR1.135, AGR1.137, and AGR1.131, specifically addressing how variations in amine length and chemical nature may influence binding affinity and biological activity. Provide insights into the potential effects of these modifications on cellular responses and the observed outcomes in zebrafish. 

      The main difference between AGR1.131 and the other two compounds is the higher flexibility of AGR1.131 due to the additional CH2 linker, together with the lack of a piperazine ring. The additional CH2 linking the phenyl ring increases the flexibility of AGR1.131 when compared with AGR1.135 and AGR1.137, and the absence of the piperazine ring might be responsible for its lack of activity, as it makes this compound able to bind to CXCR4 (Fig. 1C).

      AGR1.137 was chosen in a second round. The additional presence of the tertiary amine (in the piperazine ring) allows the formation of quaternary ammonium salts in the aqueous medium and its substituents to increase its solubility (Fig 1C). This characteristic might be related to the absence of toxic effects of the compound in the zebrafish model.

      (43) 4. Existence of Distinct CXCR4 Conformational States: 

      -Provide more detailed support for the statement suggesting the "existence of distinct CXCR4 conformational states" responsible for activating different signaling pathways. Consider referencing relevant studies or experiments that support this claim.

      Classical models of GPCR allostery and activation, which describe an equilibrium between a single inactive and a single signaling-competent active conformation, cannot account for the complex pharmacology of these receptors. The emerging view is that GPCRs are highly dynamic proteins, and ligands with varying pharmacological properties differentially modulate the balance between multiple conformations.

      Just as a single photograph from one angle cannot capture all aspects of an object in movement, no one biophysical method can visualize all aspects of GPCR activation. In general, there is a tradeoff between high-resolution information on the entire protein versus dynamic information on limited regions. In the former category, crystal and cryo-electron microscopy (cryoEM) structures have provided comprehensive, atomic-resolution snapshots of scores of GPCRs both in inactive and active conformations, revealing conserved conformational changes associated with activation. However, different GPCRs vary considerably in the magnitude and nature of the conformational changes in the orthosteric ligand-binding site following agonist binding (Venkatakrishnan A.J.V. et al. Nature 2016). Spectroscopic and computational approaches provide complementary information, highlighting the role of conformational dynamics in GPCR activation (Latorraca N.R.V. et al. Chem. Rev. 2017). In the absence of agonists, the receptor population is typically dominated by conformations closely related to those observed in inactive-state crystal structures (Manglik A. et al. Cell 2015). While agonist binding drives the receptor population towards conformations similar to those in active-state structures, a mixture of inactive and active conformations remains, reflecting “loose” or incomplete allosteric coupling between the orthosteric and transducer pockets (Dror R.O. et al. Proc. Natl. Acad. Sci. USA 2011). Surprisingly, for some GPCRs, and under some experimental conditions, a substantial fraction of unliganded receptors already reside in an active-like conformation, which may be related to their level of basal or constitutive signaling (Staus D.P. et al. J. Biol. Chem. 2019; Ye L. et al. Nature 2016). In our case, the negative allosteric modulators did not alter ligand binding and had only minor effects on specific CXCL12-mediated functions such as inhibition of cAMP release or receptor internalization, among others, but blocked CXCL12-mediated actin dynamics and receptor oligomerization. Collectively, these data suggest that the described compounds alter the active conformation of CXCR4 and therefore support the presence of distinct receptor conformations that explain a partial activation of the signaling cascade.

      All these observations are now included in the revised discussion of the manuscript.

      (44) 5. Equilibrium Shift and Allosteric Ligands: 

      -Clarify the statement about "allosteric ligands shifting the equilibrium to favor a particular receptor conformation". Support this suggestion with references or experimental evidence

      In a previous answer (see our response to point 2), we explain why we define the compounds as negative allosteric modulators. These compounds bind neither the orthosteric binding site nor a site distinct from the orthosteric site that alters the ligand-binding site. Their effect should be due to changes in the active conformation of CXCR4, which allow some signaling events whereas others are blocked. Our functional data thus support that, through the same receptor, the compounds separate distinct receptor-mediated signaling cascades; that is, our data suggest that CXCR4 is conformationally heterogeneous. It is known that GPCRs exhibit more than one “inactive” and “active” conformation, and the endogenous agonists stabilize a mixture of multiple conformations. Biased ligands or allosteric modulators can achieve their distinctive signaling profiles by modulating this distribution of receptor conformations (Wingler L.M. & Lefkowitz R.J. Trends Cell Biol. 2020). For instance, some analogs of angiotensin II do not appreciably activate Gq signaling (e.g., increases in IP3 and Ca2+) but still induce receptor phosphorylation, internalization, and mitogen-activated protein kinase (MAPK) signaling (Wei H. et al. Proc. Natl. Acad. Sci. USA 2003). Some of these ligands activate Gi and G12 in bioluminescence resonance energy transfer (BRET) experiments (Namkung Y. et al. Sci. Signal. 2018). A similar observation was described in the case of CCR5, where some chemokine analogs promoted G protein subtype-specific signaling bias (Lorenzen E. et al. Sci. Signal. 2018). Structural analyses of distinct GPCRs in the presence of different ligands show that the conformational changes in the orthosteric ligand-binding site following agonist binding vary considerably in magnitude and nature (Venkatakrishnan A.J.V. et al. Nature 2016). Yet, these changes modify conserved motifs in the interior of the receptor core and induce common conformational changes in the intracellular site involved in signal transduction. That is, these modifications might be considered distinct receptor conformations. 

      The revised discussion contains some of these interpretations to support our statement about the stabilization of a particular receptor conformation triggered by the negative allosteric modulators. 

      (45) 6. Refinement of Binding Mode: 

      -Clarify the workflow for obtaining the binding mode, particularly the role of GLIDE and PELE. Clearly explain how these software tools were used in tandem to refine the binding mode. 

      The sequential computational workflow applied in this project included: i) protein model construction, ii) virtual screening (Glide), iii) PELE, iv) docking (AutoDock and Glide), and v) molecular dynamics (AMBER).

      Glide was applied for the structure-based virtual screening to explore which compounds could fit and interact with the previously selected binding site.

      After the identification of theoretically active compounds (modulators of CXCR4), additional calculations were done to identify a potential binding site. PELE was used for this purpose, to study how the compounds could bind across the whole surface of the target (TMV-TMVI). By applying PELE, we avoided biasing the calculation, and we found that the trajectories with better interaction energies identified the cleft between TMV and TMVI as the binding site for AGR1.135 and AGR1.137, and not for AGR1.131. AGR1.131 showed a pose with low binding energy, -39.8 kcal/mol, between the TMV and TMVI helices; that is, it might interact with CXCR4 in the area selected for the screening. But it also showed a better pose between helices TMI and TMVII, at -43.7 kcal/mol (see our response to point 41). These data have now been confirmed using Schrödinger’s MM-GBSA procedure (see our response to points 6 and 8). In any case, the compound was included in the biological screening, where it was unable to affect CXCL12-mediated chemotaxis (Fig. 1B). Docking and MD simulations were then performed to study and refine the specific binding mode in this cavity. These data were important for choosing the CXCR4 mutations required to test whether the compounds reversed their behavior. In these experiments we also confirmed that AGR1.131 had a better pose in the TMI-TMVII region. 

      (46) 7. Impact of Compound Differences on CXCR4-F249L mutant: 

      -Provide visual aids, such as figures, and additional experiments to support the statement about differences in the behavior of AGR1.135 and AGR1.137 on cells expressing CXCR4-F249L mutant. Elaborate on the closer interaction suggested between the triazole group of AGR1.137 and the F249 residue

      At the reviewer’s suggestion, Fig. 5 has been modified to incorporate a closer view of the interactions identified and new panels in new Fig. 6 have been added to show in detail the effect of the mutations selected on the structure of the cleft between TMV and TMVI. The main difference between AGR1.135 and AGR1.137 is how the triazole group interacts with F249 and L216 (Author response image 6). In AGR1.137, the three groups are aligned in a parallel organization, which appears to be more effective. This might be due to a better adaptation of this compound to the cleft since there is only one hydrogen bond with V124. In AGR1.135, the compound interacts with the phenyl ring of F249 and has a stronger interaction at the apical edge to stabilize its position in the cleft. However, there is still an additional interaction present. When changing F249 to L (Fig. VIIA, B, only for review purposes) and showing the two most likely rotamers resulting from the mutation, it is observed that rotamer B is in close proximity to the compound, which may cause the binding to either displace or adopt an alternative conformation that is easier to bind into the cleft. As previously mentioned, it is likely that AGR1.135 can displace the mutant rotamer and bind into the cleft more easily due to its higher affinity.

      Author response image 6.

      Cartoon representation of the interaction of CXCR4 F249L mutant with AGR1.135 (A) and AGR1.137 (B). The two most probable conformations of Leucine rotamers are represented in cyan A and B conformations. Van der Waals interactions are depicted in blue cyan dashed lines, hydrogen bonds in black dashed lines. CXCR4 segments of TMV and TMVI are colored in blue and pink, respectively.

      (47) In the "Materials and Methods" section, the computational approach for the "discovery of CXCR4 modulators" requires significant revision and clarification. The following suggestions aim to address the identified issues: 1. Structural Modeling: 

      -Reconsider the use of SWISS-MODEL if there is an available PDB code for the entire CXCR4 structure. Clearly articulate the rationale for choosing one method over the other and explain any limitations associated with the selected approach. 

      The SWISS-MODEL server allows automated comparative modeling of 3D protein structures and pioneered the field of automated modeling. At the time we started this project, it was the most accurate method to generate reliable 3D protein structure models.

      As explained above, we have now predicted the structure of the target using AlphaFold (Jumper J. et al, Nature 2021) and performed several additional experiments that confirm that the small compounds bind the selected pocket as the original strategy indicated (see our response to point 6). (Fig. II, only for review purposes).

      (48) 2. Parametrization of Small Compounds: 

      -Provide a detailed description of the parametrization process for the small compounds used in the study. Specify the force field and parameters employed, considering the obsolescence of AMBER14 and ff14SB. Consider adopting more contemporary force fields and parameterization strategies. 

      When we performed these experiments, some years ago, the force fields applied (ff14SB and AMBER14 in MD, or OPLS2004 in docking with Glide) were well accepted and were gold standards. It is, however, true that force fields have evolved in the past few years. Moreover, in the case of the MD simulations, to obtain the ligand parameters that are not contained within the force field, we performed an additional parameterization as a standard methodology. We first performed an ab initio optimization of the ligand geometry at the B3LYP/6-311+G(d) level using Gaussian 09, Revision A.02, and then a single-point energy calculation of ESP charges at the HF/6-311+G(d) level on the optimized structure. As the last step of the parametrization, the antechamber module was used to adapt these charges and additional parameters for MD simulations.

      (49) 3. Treatment of Lipids and Membrane: 

      -Elaborate on how lipids were treated in the system. Clearly describe whether a membrane was included in the simulations and provide details on its composition and structure. Address the role of the membrane in the study and its relevance to the interactions between CXCR4 and small compounds 

      To stabilize CXCR4 and more accurately reproduce the real environment in the MD simulation, the system was embedded in a lipid bilayer using the Membrane Builder tool (Sunhwan J. et al. Biophys. J. 2009) from the CHARMM-GUI server. The membrane was composed of 175 molecules of the phospholipid 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) in each leaflet. The protein-membrane complex was solvated with TIP3 water molecules. Chloride ions were added up to a concentration of 0.15 M in water, and sodium ions were added to neutralize the system. This information was previously described in detail.

      (50) 4. Molecular Dynamics Protocol: 

      -Provide a more detailed and coherent explanation of the molecular dynamics protocol. Clarify the specific steps, parameters, and conditions used in the simulations. Ensure that the protocol aligns with established best practices in the field.

      Simulations were run on an Asus 1151 h170 LVX-GTX-980Ti workstation, with an Intel Core i7-6500 K Processor (12 M Cache, 3.40 GHz) and 16 GB DDR4 2133 MHz RAM, equipped with an Nvidia GeForce GTX 980Ti available for GPU (Graphics Processing Unit) computations. MD simulations were performed using AMBER14 (Case D.A. et al. AMBER 14, Univ. of California, San Francisco, USA, 2014) with the ff14SB (Maier J.A. et al. J. Chem. Theory Comput. 2015) and lipid14 (Dickson C. J. et al. J. Chem. Theory Comput. 2014) force fields in the NPT thermodynamic ensemble (constant pressure and temperature). Minimization was performed using 3500 Steepest Descent steps and 4500 Conjugate Gradient steps three times, first considering only hydrogens, next considering only water molecules and ions, and finally minimizing all atoms. During equilibration, the system temperature was raised from 0 to 300 K at constant volume, fixing everything but ions and water molecules. After thermalization, several density equilibration phases were performed. In the production phase, 50 ns MD simulations without position restraints were calculated using a time step of 2 fs. Trajectories of the most interesting poses were extended to 150 ns. All bonds involving hydrogen atoms were constrained with the SHAKE algorithm (Lippert R.A. et al. J. Chem. Phys. 2007). A cutoff of 8 Å was used for the Lennard-Jones interaction and the short-range electrostatic interactions. A Berendsen barostat (Berendsen H.J. et al. J. Chem. Phys. 1984) and a Langevin thermostat were used to regulate the system pressure and temperature, respectively. All trajectories were processed using CPPTRAJ (Roe D.R. & Cheatham III T.E. J. Chem. Theory Comput. 2013) and visualized with VMD (Visual Molecular Dynamics) (Humphrey W. et al. J. Mol. Graphics. 1996). To reduce the complexity of the data, Principal Component Analysis (PCA) was performed on the trajectories using CPPTRAJ.
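
      The production settings listed above can be turned into step counts and an input file with a short script. This is only a sketch: the `&cntrl` keywords are standard AMBER input flags, but the collision frequency (`gamma_ln`), the restart flags, and the isotropic pressure coupling (`ntp=1`) are assumptions that are not stated in this response, and the function names are hypothetical.

```python
def production_steps(length_ns: float, timestep_fs: float = 2.0) -> int:
    """Number of MD steps needed to cover length_ns at the given time step."""
    return round(length_ns * 1_000_000 / timestep_fs)


def production_mdin(length_ns: float, timestep_fs: float = 2.0) -> str:
    """Build an AMBER-style production input (NPT, SHAKE on H bonds, 8 A cutoff)."""
    nstlim = production_steps(length_ns, timestep_fs)
    dt_ps = timestep_fs / 1000.0  # AMBER expects the time step in ps
    lines = [
        f"Production MD, {length_ns:g} ns (illustrative sketch)",
        " &cntrl",
        "  imin=0, irest=1, ntx=5,",        # restart from equilibration (assumed)
        f"  nstlim={nstlim}, dt={dt_ps:.3f},",
        "  ntc=2, ntf=2,",                  # SHAKE on bonds involving hydrogen
        "  cut=8.0,",                       # 8 A non-bonded cutoff
        "  ntt=3, gamma_ln=1.0, temp0=300.0,",  # Langevin; gamma_ln is assumed
        "  ntb=2, ntp=1, barostat=1,",      # Berendsen barostat; ntp=1 is assumed
        " /",
    ]
    return "\n".join(lines) + "\n"


steps_50 = production_steps(50)    # initial production runs
steps_150 = production_steps(150)  # extended trajectories
mdin_50 = production_mdin(50)
```

      With a 2 fs time step, the 50 ns and 150 ns trajectories correspond to 25 and 75 million integration steps, respectively.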

      (51) Consider updating the molecular dynamics protocol to incorporate more contemporary methodologies, considering advancements in simulation techniques and software.

      In our answer to points 6 and 47, we describe why we used the technology based on SWISS-MODEL and PELE analysis and how we have now used AlphaFold and other more contemporary methodologies to confirm that the small compounds bind the selected pocket.

      (52) Figure 1A: 

      •  Consider switching to a cavity representation for CXCL12 to enhance clarity and emphasize the cleft.

      Fig. 1A has been modified to emphasize the cleft.

      (53) Explicitly show the TMV-TMVI cleft in the figure for a more comprehensive visualization. 

      In Fig. 1A we have added an insert to facilitate TMV-TMVI visualization.

      (54) Figure 1B: 

      •  Clearly explain the meaning of the second DMSO barplot to avoid confusion. 

      To clarify this panel, we have modified the figure and the figure legend. Panel B now includes a complete titration of the three compounds analyzed in the manuscript.  The first bar shows cell migration in the absence of both treatment with AMD3100 and stimulation with CXCL12.  The second bar shows migration in response to CXCL12 in the absence of AMD3100. The third bar shows the effect of AMD3100 on CXCL12-induced migration, as a known control of inhibition of migration.  We hope that this new representation of the data results is clearer.

      (55) Figure 1C: 

      •  Provide a clear legend explaining the significance of the green shading on the small compounds. 

      The legend for Fig. 1C has been modified accordingly to the reviewer’s suggestion.

      (56) Figure 2: 

      •  Elaborate on the role of fibronectin in the experiment and explain the specific contribution of CD86-AcGFP.

      The ideal situation for TIRF-M determinations is to employ cells on a physiological substrate complemented with or without chemokines. Fibronectin is a substrate widely used in different studies that allows cell adhesion, mimicking a physiological situation. Jurkat cells express alpha4beta1 and alpha5beta1 integrins that mediate adhesion to fibronectin (Seminario M.C. et al. J. Leuk. Biol. 1999).

      Regarding the use of CD86-AcGFP in TIRF-M experiments, we determine the number of receptors in individual trajectories of CXCR4 using, as a reference, the MSI value of CD86-AcGFP particles that strictly showed a single photobleaching step (Dorsch S. et al. Nat Methods 2009).

      We preferred to use CD86-AcGFP in cells instead of AcGFP on glass, to exclude any potential effect on the different photodynamics exhibited by AcGFP when bound directly to glass. In any case, this issue has been clarified in the revised version.
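
      As an illustration of this calibration, the sketch below divides the intensity of each CXCR4 particle by a monomer reference obtained from CD86-AcGFP particles. It assumes that MSI is a per-particle mean spot intensity and that intensity scales linearly with the number of fluorophores; the function name and all numerical values are hypothetical and are not data from the manuscript.

```python
import numpy as np


def receptors_per_particle(particle_msi, monomer_msi):
    """Estimate receptors per particle as intensity / monomer reference intensity."""
    monomer_ref = float(np.mean(monomer_msi))
    if monomer_ref <= 0:
        raise ValueError("monomer reference intensity must be positive")
    return np.asarray(particle_msi, dtype=float) / monomer_ref


# Hypothetical intensities (arbitrary units), for illustration only.
cd86_monomer_msi = [980.0, 1020.0, 1000.0]   # single-bleaching-step CD86-AcGFP spots
cxcr4_particle_msi = [1000.0, 2100.0, 3900.0]
estimated = receptors_per_particle(cxcr4_particle_msi, cd86_monomer_msi)
```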

      (57) Figure 3D: 

      •  Include a plot for the respective band intensity to enhance data presentation 

      The plot showing the band intensity analysis of the experiments shown in Fig. 3D was already included in the original version (see old Supplementary Fig. 3). However, in the revised version, we include these plots in the same figure as panels 3E and 3F.  As a control of inhibition of CXCL12 stimulation, we have also included a new figure (Supplementary Fig. 4) showing the effect of AMD3100 on CXCL12-induced activation of Akt and ERK as analyzed by western blot.

      (58) Consider adding AMD3100 as a control for comparison. 

      In agreement with the reviewer’s suggestion, we have added the effect of AMD3100 in most of the functional experiments performed.

      (59) Figure 4: 

      •  Address the lack of positive controls in Figure 4 and consider their inclusion for a more comprehensive analysis. 

      DMSO bars correspond to the control of the experiment, as they represent the effect of CXCL12 in the absence of any allosteric modulator. As previously described in this point-by-point reply, DMSO bars correspond to the control performed with the solvent with which the small compounds, at maximum concentration, are diluted.  Therefore, they show the effect of the solvent on CXCL12 responses. In any case, and in order to facilitate the comprehension of the figure we have also added the controls in the absence of DMSO to demonstrate that the solvent does not affect CXCL12-mediated functions, together with the effect of the orthosteric inhibitor AMD3100. In addition, we have also included representative images of the effect of the different compounds on CXCL12-induced polarization (Fig. 4C).

      (60) In Figure 4A, carefully assess overlapping error bars and ensure accurate interpretation. If necessary, consider alternative representation. 

      We have tried alternative representations of data in Fig. 4A, but in all cases the figure was unclear. We believe that the way we represent the data in the original manuscript is the clearest and most appropriate. Nevertheless, we have now included significance values as a table annexed to the figure, as well as the effect of AMD3100, as a control of inhibition.

      (61) Supplementary Figure 1A: 

      •  Improve the clarity of bar plots for better understanding. Consider reordering them from the most significant to the least. 

      This was a good idea, and therefore Supplementary Fig. 1A has been reorganized to improve clarity.

      (62) Supplementary Figure 1C: 

      •  Clarify the rationale behind choosing the 12.5 nM concentration and explain if different concentrations of CXCL12 were tested. 

      In old Supplementary Fig. 1C, we used untreated cells, that is, CXCL12 was not present in the assay.  These experiments were performed to test the potential toxicity of DMSO (solvent) or the negative allosteric modulators on Jurkat cells. The 12.5 nM concentration of CXCL12 mentioned in the figure legend applied only to panels A and B, as indicated in the figure legend. We previously optimized this concentration for Jurkat cells using different concentrations of CXCL12 between 5 and 100 nM.  Nevertheless, we have reorganized old supplementary fig. 1 and clarified the figure legend to avoid misinterpretations (see Supplementary Fig 1A, B and Supplementary Fig. 2A, B).

      (63) Explain the observed reduction in fluorescence intensity for AGR1.135. 

      The cell cycle analysis has been moved from Supplementary Fig. 1C to a new Supplementary Fig. 2.  It now includes the flow cytometry panels to show fluorescence intensity as a function of the number of cells analyzed (Panel 1A) as well as a table (panel B) with the percentage of cells in each phase of the cell cycle. We believe that the apparent reduction in fluorescence that the reviewer observes is mainly due to the number of events analyzed. However, we have changed the flow cytometry panels for others that are more representative and included a table with the mean of the different results. When we determined the percentage of cells in each cell cycle phase, we observed that it looks very similar in all the experimental conditions. That is, none of the compounds affected any of the cell cycle phases. We have also included the effect of H2O2 and staurosporine as control compounds inducing cell death and cell cycle alteration of Jurkat cells.

      (64) Supplementary Table 1: 

      •  Include a column specifying the scoring for each compound to provide a clear reference for readers. 

      To provide a clear reference for readers, we have now included the inhibitory effect of each compound on Jurkat cell migration in the revised version of this table. 

      (65) Minor Points 

      Page 2 - Abstract: Rephrase the first sentence of the abstract to enhance fluidity. 

      Although the entire manuscript was revised by a professional English editor, we appreciate the valuable comments of this reviewer and we have corrected these issues accordingly.

      (66) Page 2 - Abstract: Explicitly define "CXCR4" as "C-X-C chemokine receptor type 4" the first time it appears.

      We have not used C-X-C chemokine receptor type 4 the first time it appears in the abstract. CXCR4 is an acronym normally accepted to identify this chemokine receptor, and it is used as CXCR4 in many articles published in eLife. However, we introduce the complete name the first time it appears in the introduction.

      (67) Page 2 - Abstract: Explicitly define "CXCL12" as "C-X-C motif chemokine 12" the first time it is mentioned. 

      As we have discussed in the previous response, we have not used C-X-C motif chemokine 12 the first time CXCL12 appears in the abstract, as it is a general acronym normally accepted to identify this specific chemokine, even in eLife papers. However, we introduce the complete name the first time it appears in the introduction section.

      (68) Page 2 - Abstract: Explicitly define "TMV and TMVI" upon its first mention.

      The acronym TM has been defined as “Transmembrane” in the revised version

      (69) Page 2 - Abstract: Review the use of "in silico" in the sentence for accuracy and consider revising if necessary.

      With the term “in silico” we want to refer to those experiments performed on a computer or via computer simulation software. We have carefully reviewed its use in the new version of the manuscript.

      (70) Page 2 - Abstract: Add a comma after "compound" in the sentence, "We identified AGR1.137, a small compound that abolishes...".

      A comma after “compound” has been added in the revised sentence.

      (71) Page 2 - Significance Statement: Rephrase the first sentence of the "Significance Statement" to avoid duplication with the abstract.

      The first sentence of the Significance Statement has been revised to avoid duplication with the abstract. 

      (72) Page 2 - Significance Statement: Break down the lengthy sentence, "Here, we performed in silico analyses..." for better readability. 

      The sentence starting with “Here, we performed in silico analyses…” has been broken down in the revised manuscript.

      (73) Page 2 - Introduction: Replace "Murine studies" with a more specific term for clarity.

      The term “murine studies” is normally used to refer to experimental studies developed in mice. We have nonetheless rephrased the sentence.

      (74) Page 3 - Introduction: Rephrase the sentence for clarity: "Finally, using a zebrafish model, ..."

      The sentence has been now rephrased for clarity.

      (75) Results-AGR1.135 and AGR1.137 block CXCL12-mediated CXCR4 nanoclustering and dynamics: 

      Rephrase the sentence for clarity: "Retreatment with AGR1.135 and AGR1.137, but not with AGR1.131, substantially impaired CXCL12-mediated receptor nanoclustering.”

      The sentence has been rephrased for clarity.

      (76) Results - AGR1.135 and AGR1.137 incompletely abolish CXCR4-mediated responses in Jurkat cells: Clarify the sentence: "In contrast to the effect promoted by AMD3100, a binding-site antagonist of CXCR4..."

      The sentence has been modified for clarity.

      (77) Consider using "orthosteric" instead of "binding-site" antagonist.

      The term orthosteric is now used throughout to refer to a binding site antagonist.

      (78) Discussion: Use the term "in silico" only when necessary.

      We have carefully reviewed the use of “in silico” in the manuscript.

      (79) Discussion: Clarify the sentence: "...not affect neither CXCR2-mediated cell migration...". Confirm if "CXCL12" is intended.

      The sentence refers to the chemokine receptor CXCR2, which binds the chemokine CXCL2. To test the specificity of the compounds for the CXCL12/CXCR4 axis, we evaluated CXCL2-mediated cell migration.  The results indicated that CXCL2/CXCR2 axis was not affected by the negative allosteric modulators, whereas CXCL12-mediated cell migration was blocked.  The sentence has been clarified in the new version of the manuscript.

      (80) Figure 4B: Bold the "B" in the figure label for consistency.

      The “B” in Fig. 4B has been bolded.

      Reviewer #2

      (1) Fig 2. The SPT data is sub-optimal in its presentation as well as analysis. Example images should be shown. The analysis and visualization of the data should be reconsidered for improvements. Graphs with several hundreds, in some conditions over 1000 tracks, per condition are very hard to compare. The same (randomly selected representative set) number of data points should be shown for better visualization. Also, more thorough analyses like MSD or autocorrelation functions are lacking - they would allow enhanced overall representation of the data.

      In agreement with the reviewer’s commentary, we have modified the representation of Fig. 2. We have carefully read the paper published by Lord S.J. and colleagues (Lord S. J. et al., J. Cell Biol. 2020) and we apply their recommendations for this type of data. We have also included as supplementary material representative videos for the TIRF-M experiments performed to allow readers to visualize the original images. Regarding the MSD analyses, they were performed to determine all D1-4 values. According to the data published by Manzo & García-Parajo (Manzo C. & García-Parajo M.F. Rep. Prog. Phys. 2015), due to the finite trajectory length, the MSD curve at large tlag has poor statistics and deviates from linearity. However, the Diffusion Coefficient (D1-4) can be estimated by fitting the short tlag region of the MSD plot, which gives a more accurate idea of the behavior of particles. Accordingly, we show D1-4 values and not MSD data. 
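
      The short-lag estimate described above can be sketched as follows. The code assumes two-dimensional diffusion, for which MSD = 4·D·tlag plus an offset, and takes D1-4 as the slope of a linear fit through the first four MSD points divided by 4. The function names, the simulated trajectory, and the frame time are illustrative assumptions rather than the pipeline used in the manuscript.

```python
import numpy as np


def msd(track, max_lag):
    """Time-averaged MSD of one 2D trajectory for lags 1..max_lag (in frames)."""
    track = np.asarray(track, dtype=float)
    out = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        disp = track[lag:] - track[:-lag]          # displacements at this lag
        out[lag - 1] = np.mean(np.sum(disp ** 2, axis=1))
    return out


def d_1_4(track, frame_time_s):
    """Short-lag diffusion coefficient from a linear fit of MSD points 1-4."""
    lags_s = np.arange(1, 5) * frame_time_s
    slope, _intercept = np.polyfit(lags_s, msd(track, 4), 1)
    return slope / 4.0                              # 2D: MSD = 4 * D * t


# Simulated free diffusion with a known coefficient (hypothetical values).
rng = np.random.default_rng(0)
d_true = 0.1          # um^2/s
frame_time = 0.05     # s per frame
steps = rng.normal(scale=np.sqrt(2 * d_true * frame_time), size=(20000, 2))
track = np.cumsum(steps, axis=0)
d_est = d_1_4(track, frame_time)
```

      Restricting the fit to the first four lags uses the region of the curve with the best statistics, which is the rationale given above for reporting D1-4 rather than full MSD curves.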

      Due to the space restrictions, it is very difficult to include all the figures generated, but, only for review purposes, we included in this point-by-point reply some representative plots of the MSD values as a function of the time from individual trajectories showing different types of motion obtained in our experiments (Author response image 7).

      Author response image 7.

      Representative MSD plots from individual trajectories of CXCR4-AcGFP showing different types of motion: A) confined, B) Brownian/Free, C) direct transport of CXCR4-AcGFP particles diffusing at the cell membrane detected by SPT-TIRF in resting JKCD4 cells.

      Further analysis, such as the classification based on particle motion, has not been included in this article. This classification uses the moment scaling spectrum (MSS), described by Ewers H. et al. 2005 PNAS, and requires particles with longer trajectories (>50 frames). Only for review purposes, we include a figure showing the percentage of the MSS-based particle motion classification for each condition. As expected, most long-trajectory particles are confined, with a slight increase in the percentage upon CXCL12 stimulation in all conditions except in cells treated with AGR1.137 (Author response image 8).
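
      For readers unfamiliar with the MSS, the sketch below shows the general idea under simplifying assumptions: for each moment order v, the v-th moment of the displacement is taken to scale as lag^gamma_v, and the MSS slope is the slope of gamma_v against v (about 0.5 for free diffusion, lower for confined motion, higher for directed motion). The function names, the number of lags and the classification tolerance are hypothetical choices, not the settings used for Author response image 8.

```python
import math


def _slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def mss_slope(track, max_lag=10, orders=range(0, 7)):
    """Slope of the moment scaling spectrum for a 2D track [(x, y), ...].

    Assumes the particle moves at every lag (a zero moment would make the
    logarithm undefined) and that the track is much longer than max_lag.
    """
    lags = list(range(1, max_lag + 1))
    log_lags = [math.log(lag) for lag in lags]
    gammas = []
    for v in orders:
        log_moments = []
        for lag in lags:
            steps = [
                math.hypot(track[i + lag][0] - track[i][0],
                           track[i + lag][1] - track[i][1])
                for i in range(len(track) - lag)
            ]
            log_moments.append(math.log(sum(s ** v for s in steps) / len(steps)))
        # gamma_v: scaling exponent of the v-th moment with lag time
        gammas.append(_slope(log_lags, log_moments))
    return _slope(list(orders), gammas)


def classify(slope, tol=0.1):
    """Label a track using hypothetical thresholds around 0.5."""
    if slope < 0.5 - tol:
        return "confined"
    if slope > 0.5 + tol:
        return "directed"
    return "Brownian/free"
```

      A track moving at constant velocity gives a slope of 1, which is a convenient sanity check before applying such a classifier to experimental trajectories.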

      Author response image 8.

      Effects of the negative allosteric modulators on the types of motion of CXCR4. Percentage of single trajectories with different types of motion, classified by MSS (DMSO: 58 particles in 59 cells on FN; 314 in 63 cells on FN+CXCL12; AGR1.131: 102 particles in 71 cells on FN; 258 in 69 cells on FN+CXCL12; AGR1.135: 86 particles in 70 cells on FN; 120 in 77 cells on FN+CXCL12; AGR1.137: 47 particles in 66 cells on FN; 74 in 64 cells on FN+CXCL12) n = 3.

      (2) Fig 3. The figure legends have inadequate information on concentrations and incubation times used, both for the compounds and other treatments like CXCL12 and forskolin. For the Western blot data, the quantification should also be added to the main figure. The compounds, particularly AGR1.137, seem to lead to augmented stimulation of pAKT and pERK. This should be discussed.

      The Fig. 3 legend has been corrected in the revised manuscript. Fig. 3D now contains representative western blots and the densitometry evaluation of these experiments. As the reviewer indicates, the western blot shown also suggests augmented stimulation of pAKT and pERK in cells treated with AGR1.137. However, as shown in the densitometry analysis, no significant differences were noted between the data obtained with each compound. As a control for inhibition of CXCL12 stimulation, we have included a new Supplementary Fig. 4 showing the effect of AMD3100 on CXCL12-induced activation of Akt and ERK, as analyzed by western blot.

      (3) Fig. 4 immunofluorescence data on polarization as well as the flow chamber data lack the representative images of the data. The information on the source of the T cells is missing. Not clear if this experiment was done on bilayers or on static surfaces.

      Representative images for the data shown in Figure 4B have been added in the revised figure (Fig. 4C). The experiments in Fig. 4B were performed on static surfaces. As indicated in the materials and methods section, primary T cell blasts were added to fibronectin-coated glass slides and were then stimulated or not with CXCL12 (5 min at 37ºC) prior to fixing, permeabilizing and staining them with phalloidin. Primary T cell blasts were generated from PBMCs isolated from buffy coats and activated in vitro with IL-2 and PHA, as indicated in the materials and methods section.

      (4) The data largely lacks titration of different concentrations of the compounds. How were the effective concentration and treatment times determined? What happens at higher concentrations? It is important to show, for instance, if the CXCL12 binding gets inhibited at higher concentrations. Most experiments were performed with 50 µM, but HeLa cell data with 100 µM. Why and how was this determined?

      The revised version contains a new panel in Fig. 1B to show a more detailed kinetic analysis with different concentrations (1-100 µM) of the compounds in the migration experiments using Jurkat cells. We chose 50 µM for further studies as it was the concentration that inhibited 50-75% of the ligand-induced cell migration.

      We have also included the effect of two doses of the compounds (10 and 50 µM) in the zebrafish model, as well as AMD3100 (1 and 10 µM) as a control (new Fig. 7D, E). Tumors were imaged within 2 hours of implantation, and tumor-bearing embryos were treated with either vehicle (DMSO) alone, AGR1.131 or AGR1.137 at 10 and 50 µM, or AMD3100 at 1 and 10 µM for three days, followed by re-imaging.

      Regarding the amount of CXCL12 used in these experiments, with the exception of cell migration assays in Transwells, where the optimal concentration was established at 12.5 nM, in all the other experiments the optimal concentration of CXCL12 employed was 50 nM. In the case of the directional cell migration assays, we used 100 nM to create the chemokine gradient in the device. These concentrations have been optimized in previous work from our laboratory using these types of experiments. It should also be noted that in the experiments using lipid bilayers or TIRF-M, CXCL12 is used to coat the plates, and it is therefore difficult to determine the real concentration that is retained on the surface after the washing steps performed prior to adding the cells.

      (5) The authors state that they could not detect direct binding of the compounds to CXCR4. It should be reported what approaches were tried and discussed why this was not possible.

      We attempted a fluorescence spectroscopy strategy to formally prove the ability of AGR1.135 to bind CXCR4, but this strategy failed because the compound has a yellow color that interfered with the determinations. We also tried a FRET strategy (see supplementary Fig. 7) and detected a significant increase in FRET efficiency of CXCR4 homodimers in cells treated with AGR1.135; this effect was due to the yellow color of this compound, which interferes with FRET determinations. In the same assays, AGR1.137 did not modify FRET efficiency for CXCR4 homodimers, and therefore we cannot assume that AGR1.137 binds to CXCR4. All these data have been considered in the revised discussion.

      (6) The proliferation data in Supplementary Figure 1 lacks controls that affect proliferation and indication of different cell cycle stages. What is the conclusion of this data? More information on the effects of the drug to cell viability would be important.

      Toxicity in Jurkat cells was first determined by propidium iodide incorporation. Some compounds (i.e., AGR1.103 and VSP3.1) were discarded from further analysis as they were toxic for cells. In a deeper analysis of cell toxicity, even if these compounds did not kill the cells, we checked whether they could alter the cell cycle of the cells. New Supplementary Fig. 2 includes a table (panel B) with the percentage of cells in each cell cycle phase, and no differences between any of the treatments tested were detected. 

      Nevertheless, to clarify this issue the revised version of the figure also includes H2O2 and staurosporine stimuli to induce cell death and cell cycle alterations as controls of these assays.

      (7) The flow data in Supplementary Figure 2 should be statistically analysed. 

      Bar graphs corresponding to the old Supplementary Fig. 2 (new Supplementary Fig. 3) are shown in Fig. 3B. We have also incorporated the corresponding statistical analysis to this figure. 

      (8) In general, the authors should revise the figure legends to ensure that critical details are added. 

      We have carefully revised all the figure legends in the new version of the manuscript.

      (9) Bar plots are very poor in showing the heterogeneity of the data. Individual data points should be shown whenever feasible. Superplot-type of representation is strongly advised (https://doi.org/10.1083/jcb.202001064).

      We have carefully read the paper by Lord et al. (Lord S. J. et al., J. Cell Biol. 2020) and have applied their recommendations to our TIRF-M data (see revised Fig. 2).
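
      The core of the SuperPlot recommendation is to show every individual measurement while summarizing, and testing, at the level of independent replicates. A minimal sketch of that summary step follows; the function name, the input layout and the condition labels are hypothetical and do not reflect how Fig. 2 was actually generated.

```python
from collections import defaultdict
from statistics import mean


def replicate_means(measurements):
    """Average values per (condition, replicate).

    `measurements` is an iterable of (condition, replicate_id, value)
    tuples, e.g. one tuple per tracked particle. The returned mapping
    {condition: {replicate_id: mean_value}} holds the replicate-level
    points that would be overlaid on the individual data points.
    """
    grouped = defaultdict(list)
    for condition, replicate, value in measurements:
        grouped[(condition, replicate)].append(value)

    summary = defaultdict(dict)
    for (condition, replicate), values in grouped.items():
        summary[condition][replicate] = mean(values)
    return {condition: dict(reps) for condition, reps in summary.items()}
```

      Statistical comparisons would then be run on the replicate means (here, n = number of experiments) rather than on the pooled individual values.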

    1. Author response:

      Reviewer #1 (Public Review):

      Summary: 

      BMP signaling is, arguably, best known for its role in the dorsoventral patterning, but not in nematodes, where it regulates body size. In their paper, Vora et al. analyze ChIP-Seq and RNA-Seq data to identify direct transcriptional targets of SMA-3 (Smad) and SMA-9 (Schnurri) and understand the respective roles of SMA-3 and SMA-9 in the nematode model Caenorhabditis elegans. The authors use publicly available SMA-3 and SMA-9 ChIP-Seq data, their own RNA-Seq data from SMA-3 and SMA-9 mutants, and bioinformatic analyses to identify the genes directly controlled by these two transcription factors (TFs) and find approximately 350 such targets for each. They show that all SMA-3-controlled targets are positively controlled by SMA-3 binding, while SMA-9-controlled targets can be either up or downregulated by SMA-9. 129 direct targets were shared by SMA-3 and SMA-9, and, curiously, the expression of 15 of them was activated by SMA-3 but repressed by SMA-9. Since genes responsible for cuticle collagen production were eminent among the SMA-3 targets, the authors focused on trying to understand the body size defect known to be elicited by the modulation of BMP signaling. Vora et al. provide compelling evidence that this defect is likely to be due to problems with the BMP signaling-dependent collagen secretion necessary for cuticle formation.

      We thank the reviewer for this supportive summary. We would like to clarify the status of the publicly available ChIP-seq data. We generated the GFP tagged SMA-3 and SMA‑9 strains and submitted them to be entered into the queue for ChIP-seq processing by the modENCODE (later modERN) consortium. Due to the nature of the consortium’s funding, the data were required to be released publicly upon completion. Nevertheless, we have provided the first comprehensive analysis of these datasets.

      Strengths: 

      Vora et al. provide a valuable analysis of ChIP-Seq and RNA-Seq datasets, which will be very useful for the community. They also shed light on the mechanism of the BMP-dependent body size control by identifying SMA-3 target genes regulating cuticle collagen synthesis and by showing that downregulation of these genes affects body size in C. elegans. 

      Weaknesses: 

      (1) Although the analysis of the SMA-3 and SMA-9 ChIP-Seq and RNA-Seq data is extremely useful, the goal "to untangle the roles of Smad and Schnurri transcription factors in the developing C. elegans larva", has not been reached. While the role of SMA-3 as a transcriptional activator appears to be quite straightforward, the function of SMA-9 in the BMP signaling remains obscure. The authors write that in SMA-9 mutants, body size is affected, but they do not show any data on the mechanism of this effect. 

      We thank the reviewer for directing our attention to the lack of clarity about SMA-9’s function. We will revise the text to highlight what this study and others demonstrate about SMA-9’s role in body size. We also plan to analyze additional target genes to deepen our model for how SMA-3 and SMA-9 interact functionally to produce a given transcriptional response.

      (2) The authors clearly show that both TFs can bind independently of each other, however, by using distances between SMA-3 and SMA-9 ChIP peaks, they claim that when the peaks are close these two TFs act as complexes. In the absence of proof that SMA-3 and SMA-9 physically interact (e.g. that they co-immunoprecipitate - as they do in Drosophila), this is an unfounded claim, which should either be experimentally substantiated or toned down. 

      A physical interaction between Smads and Schnurri has been amply demonstrated in other systems. The limitation in the previous work is that only a small number of target genes was analyzed. Our goal in this study was to determine how widespread this interaction is on a genomic scale.  Our analyses demonstrate for the first time that a Schnurri transcription factor has significant numbers of both Smad-dependent and Smad-independent target genes. We will revise the text to clarify this point.

      (3) The second part of the paper (the collagen story) is very loosely connected to the first part. dpy-11 encodes an enzyme important for cuticle development, and it is a differentially expressed direct target of SMA-3. dpy-11 can be bound by SMA-9, but it is not affected by this binding according to RNA-Seq. Thus, technically, this part of the paper does not require any information about SMA-9. However, this can likely be improved by addressing the function of the 15 genes, with the opposing mode of regulation by SMA-3 and SMA-9. 

      We appreciate this suggestion and will clarify how SMA-9 and its target genes contribute to collagen organization and body size regulation.

      (4) The Discussion does not add much to the paper - it simply repeats the results in a more streamlined fashion. 

      We thank the reviewer for this suggestion. We will add more context to the Discussion.

      Reviewer #2 (Public Review): 

      In the present study, Vora et al. elucidated the transcription factors downstream of the BMP pathway components Smad and Schnurri in C. elegans and their effects on body size. Using a combination of a broad range of techniques, they compiled a comprehensive list of genome-wide downstream targets of the Smads SMA-3 and SMA-9. They found that both proteins have an overlapping spectrum of transcriptional target sites they control, but also unique ones. Thereby, they also identified genes involved in one-carbon metabolism or the endoplasmic reticulum (ER) secretory pathway. In an elaborate effort, the authors set out to characterize the effects of numerous of these targets on the regulation of body size in vivo as the BMP pathway is involved in this process. Using the reporter ROL-6::wrmScarlet, they further revealed that not only collagen production, as previously shown, but also collagen secretion into the cuticle is controlled by SMA-3 and SMA-9. The data presented by Vora et al. provide in-depth insight into the means by which the BMP pathway regulates body size, thus offering a whole new set of downstream mechanisms that are potentially interesting to a broad field of researchers. 

      The paper is mostly well-researched, and the conclusions are comprehensive and supported by the data presented. However, certain aspects need clarification and potentially extended data. 

      (1) The BMP pathway is active during development and growth. Thus, it is logical that the data shown in the study by Vora et al. is based on L2 worms. However, it raises the question of if and how the pattern of transcriptional targets of SMA-3 and SMA-9 changes with age or in the male tail, where the BMP pathway also has been shown to play a role. Is there any data to shed light on this matter or are there any speculations or hypotheses? 

      We agree that these are intriguing questions and we are interested in the roles of transcriptional targets at other developmental stages and in other physiological functions, but these analyses are beyond the scope of the current study.

      (2) As it was shown that SMA-3 and SMA-9 potentially act in a complex to regulate the transcription of several genes, it would be interesting to know whether the two interact with each other or if the cooperation is more indirect. 

      A physical interaction between Smads and Schnurri has been amply demonstrated in other systems. Our goal in this study was not to validate this physical interaction, but to analyze functional interactions on a genome-wide scale.

      (3) It would help the understanding of the data even more if the authors could specifically state if there were collagens among the genes regulated by SMA-3 and SMA-9 and which. 

      We thank the reviewer for this suggestion and will add the requested information in the text.

      (4) The data on the role of SMA-3 and SMA-9 in the regulation of the secretion of collagens from the hypodermis is highly intriguing. The authors use ROL-6 as a reporter for the secretion of collagens. Is ROL-6 a target of SMA-9 or SMA-3? Even if this is not the case, the data would gain even more strength if a comparable quantification of the cuticular levels of ROL-6 were shown in Figure 6, and potentially a ratio of cuticular versus hypodermal levels. By that, the levels of secretion versus production can be better appreciated. 

      rol-6 has been identified as a transcriptional target of this pathway. The level of ROL-6 protein, however, is not changed in sma-3 and sma-9 mutants, indicating that there is post-transcriptional compensation. We will include these data in the revised manuscript.

      (5) It is known that the BMP pathway controls several processes besides body size. The discussion would benefit from a broader overview of how the identified genes could contribute to body size. The focus of the study is on collagen production and secretion, but it would be interesting to have some insights into whether and how other identified proteins could play a role or whether they are likely to not be involved here (such as the ones normally associated with lipid metabolism, etc.). 

      We will add this information to the Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Work by Brosseau et al. combines NMR, biochemical assays, and MD simulations to characterize the influence of the C-terminal tail of EmrE, a model multi-drug efflux pump, on proton leak. The authors compare the WT pump to a C-terminal tail deletion, delta_107, finding that the mutant has increased proton leak in proteoliposome assays, shifted pH dependence with a new titratable residue, faster-alternating access at high pH values, and reduced growth, consistent with proton leak of the PMF.

      Strengths:

      The work combines thorough experimental analysis of structural, dynamic, and electrochemical properties of the mutant relative to WT proteins. The computational work is well aligned in vision and analysis. Although all questions are not answered, the authors lay out a logical exploration of the possible explanations.

      Weaknesses:

      There are a few analyses that are missing and important data left out. For example, the relative rate of drug efflux of the mutant should be reported to justify the focus on proton leak. Additionally, the correlation between structural interactions should be directly analyzed and the mutant PMF also analyzed to justify the claims based on hydration alone. Some aspects of the increased dynamics at high pH due to a potential salt bridge are not clear.

      Reviewer #2 (Public review):

      Summary:

      This manuscript explores the role of the C-terminal tail of EmrE in controlling uncoupled proton flux. Leakage occurs in the wild-type transporter under certain conditions but is amplified in the C-terminal truncation mutant D107. The authors use an impressive combination of growth assays, transport assays, NMR on WT and mutants with and without key substrates, classical MD, and reactive MD to address this problem. Overall, I think that the claims are well supported by the data, but I am most concerned about the reproducibility of the MD data, initial structures used for simulations, and the stochasticity of the water wire formation. These can all be addressed in a revision with more simulations as I point out below. I want to point out that the discussion was very nicely written, and I enjoyed reading the summary of the data and the connection to other studies very much.

      Strengths:

      The Henzler-Wildman lab is at the forefront of using quantitative experiments to probe the peculiarities in transporter biophysics, and the MD work from the Voth lab complements the experiments quite well. The sheer number of different types of experimental and computational approaches performed here is impressive.

      Weaknesses:

      The primary weaknesses are related to the reproducibility of the MD results with regard to the formation of water wires in the WT and truncation mutant. This could be resolved with simulations starting from structures built using very different loops and C-terminal tails.

      The water wire gates identified in the MD should be tested experimentally with site-directed mutagenesis to determine if those residues do impact leak.

      We appreciate the reviewers' thoughtful consideration of our manuscript, and their recognition of the variety of experimental and computational approaches we have brought to bear in probing the very challenging question of uncoupled proton leak through EmrE.

      We did record SSME measurements with MeTPP+, a small molecule substrate, at two different protein:lipid ratios. These experiments report the rate of net flux when both proton-coupled substrate antiport and substrate-gated proton leak are possible. We will add these data to the revision, including data acquired with different lipid:protein ratios that confirm we are detecting transport rather than binding. In brief, these data show that the net flux is highly dependent on both proton concentration (pH) and drug-substrate concentration, as predicted by our mechanistic model. This demonstrates that both types of transport contribute to net flux when small molecule substrates are present.

      In the absence of drug-substrate, proton leak is the only possible transport pathway. The pyranine assay directly assesses proton leak under these conditions and unambiguously shows faster proton entry into proteoliposomes through the ∆107-EmrE mutant than through WT EmrE, with the rate of proton entry into ∆107-EmrE proteoliposomes matching the rate of proton entry achieved by the protonophore CCCP. We have revised the text to more clearly emphasize how this directly measures proton leak independently of any other type of transport activity. The SSME experiments with a proton gradient only (no small molecule substrate present) provide additional data on shorter timescales that is consistent with the pyranine data. The consistency of the data across multiple LPRs and comparison of transport to proton leak in the SSME assays further strengthens the importance of the C-terminal tail in determining the rate of flux.

      None of the current structural models have good resolution (crystallography, EM) or sufficient restraints (NMR) to define the loop and tail conformations sufficiently for comparison with this work. We are in the process of refining an experimental structure of EmrE with better resolution of the loop and tail regions implicated in proton-entry and leak. Direct assessment of structural interactions via mutagenesis is complicated because of the antiparallel homodimer structure of EmrE. Any point mutation necessarily affects both subunits of the dimer, and mutations designed to probe the hydrophobic gate on the more open face of the transporter also have the potential to disrupt closure on the opposite face, particularly in the absence of sufficient resolution in the available structures. Thus, mutagenesis to test specific predicted structural features is deferred until our structure is complete so that we can appropriately interpret the results.

      In our simulation setup, the MD results can be considered representative and meaningful for two reasons. First, the C-terminal tail, not present in the prior structure and thus modeled by us, is only 4 residues long. We will show in the revision and detailed response that the system loses memory of its previous conformation very quickly, such that velocity initialization alone is enough to provide diverse starting points. Second, our simulation is more like simulated annealing, starting from a high free energy state to show that, given such random initialization, the tail conformation we obtain in the end is consistent with what we reported. It is also difficult to sample back-and-forth tail motion within a realistic MD timescale. Therefore, it can be inconclusive to causally infer the allosteric motions from unbiased MD of the wild type alone. The most viable approach is to look at the equilibrium statistics of the most stable states of WT- and ∆107-EmrE and compare the differences.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The work is well done and well presented. In my opinion, the authors must address the following questions.

      (1) It is unclear to a non-SSME-expert why the net charge translocated in delta_107 is larger than in WT. For such small pH gradients (0.5-1 pH unit), it seems that only a few protons would leave the liposome before the internal pH is adjusted to be the same as the external. This number can be estimated given the size of the liposomes. What is it? Once the pH gradient is dissipated, no more net proton transport should be observed. So, why would more protons flow out of the mutant relative to WT?

      We appreciate the complexity of both the system and assay and have made revisions to both the main text and SI to address these points more clearly. While we can estimate liposome size, we cannot easily quantify the number of liposomes on the sensor surface and so cannot calculate the amount of charge movement as suggested by the reviewer. We have revised Fig. 3.2 and added additional data at low and high pH with different lipid to protein ratios to distinguish pre-steady state (proton release from the protein) and steady state processes (transport). An extended Fig. 3.2 caption and revised discussion in the main text clarify these points.

      We have also revised SI figure 3.2 to include an example of transport driven by an infinite drug gradient. Drug-proton antiport results in net charge build-up in the liposome since two protons will be driven out for every +1 drug transported in. This also creates a pH gradient (higher proton concentration outside). The negative-inside potential inhibits further antiport of drug. However, both the negative-inside potential and the proton gradient will drive protons back into the liposome if there is a leak pathway available. This is clearly visible as a reversal of current from negative (antiport) to positive (proton backflow), and the magnitude of this backflow is larger for ∆107-EmrE, which lacks the regulatory elements provided by the C-terminal tail. We have amended the main text and SI to include this discussion.

      (2) Given the estimated rate of transport, size of liposomes, and pH gradient, how quickly would the SSME liposomes reach pH balance?

      Since SSME measurements are due to capacitive coupling and represent the net charge movement, including pre-steady state contributions, the current values will be highly sensitive to the individual rates of alternating access and to the proton and drug on- and off-rates. Time to pH balance would, therefore, differ based on the construct, LPR, absolute pH or drug concentrations, as well as the magnitude of the given gradients. For this reason, we necessarily use integrated currents (transported charge over time) when comparing mutants, as this reflects kinetic differences inherent to the mutant without over-processing the data, for example by normalizing to peak currents, which would overemphasize certain properties that differ across mutants. This process allows for qualitative comparisons by subjecting mutants to the same pH and substrate gradients when the same density of transporter construct is present, and care is taken not to overstate the importance of the actual quantities of charge that are moving, as they will be highly context dependent. This is clearly seen in Fig 3.2, where the current is not zero and the net transported charge is still changing at the end of 1 second. We have amended SI figure 3.2 and the main text to include this discussion.
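
      To make the "transported charge over time" metric concrete, the sketch below integrates a sampled current trace with the trapezoidal rule to give the running charge at each time point. The function name, argument names and units (nA and s, giving nC) are assumptions for illustration and do not describe the actual SSME processing used here.

```python
def transported_charge(times_s, currents_nA):
    """Cumulative trapezoidal integral of a current trace.

    Returns the running charge in nC (nA * s) at each time point,
    starting from zero at the first sample.
    """
    if len(times_s) != len(currents_nA):
        raise ValueError("times and currents must have the same length")
    charge = [0.0]
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        # area of the trapezoid between consecutive samples
        charge.append(charge[-1] + 0.5 * (currents_nA[k] + currents_nA[k - 1]) * dt)
    return charge
```

      Because the integral accumulates the whole trace, a fast pre-steady state peak and a slower sustained current both contribute, without any normalization to the peak current.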

      (3) Given that H110 and E14 would deprotonate when the external pH is elevated above 7 and that these protons would be released to external bulk, the external bulk pH would decrease twice as much for WT compared to delta107. This would decrease the pH gradient for WT relative to the mutant. Can these effects be quantified and accounted for? Would this ostensibly decrease the amount of charge that transfers into the liposomes for WT? How would this impact the current interpretation that the two systems are driven by the same gradient?

      The reviewer is correct that there will be differences in deprotonation of WT and ∆107, and the amount of proton release will also change with pH. We have amended Figure 3.2 to clarify this difference and its significance. For the proton gradient only conditions in Figure 3, each set of liposomes was equilibrated to the starting pH by repeated washings and incubation before measurement. For example, for the pH 6.5 inside, pH 7 outside condition, both the inside and outside pH were equilibrated at 6.5, so both E14 residues will be predominantly protonated in WT and ∆107, and H110 will be predominantly protonated in WT-EmrE. Upon application of the external pH 7 solution, protons will be released from the E14 of either construct, with an additional proton being released from H110 for WT-EmrE, causing a large pre-steady state negative contribution to the signal (Fig. 3.2A). Under this pH condition, the peak current correlates with the LPR, as this release of protons will depend on the density of the transporter. However, we also see that the longer-time decay of the signal correlates with the construct (WT or ∆107) and is relatively independent of LPR, consistent with a transport process rather than a rapid pre-steady state release of protons. Therefore, when we look at the actual transported charge over time, despite the higher contribution of proton release to the WT-EmrE signal, the significant increase in uncoupled proton transport for the C-terminal deletion mutant dominates the signal.

      As a contrast, we apply this same analysis to the pH 8 inside, pH 8.5 outside condition, where both sets of transporters will be deprotonated from the start (Fig. 3.2B). Now the peak currents, decay rates, and transported charge over time are all consistent for a given construct (WT or ∆107). The two LPRs for an individual construct match within error, as the differences in overall charge movement and transported charge over time are independent of pre-steady-state proton release from the transporter at high pH.

      (4) A related question, how does the protonation of H110 influence the potential rate of proton transport between the two systems? Does the proton on H110 transfer to E14?

      The protonation of H110 will only influence the rate of transport of WT-EmrE as its protonation is required for formation of the hydrogen bonding network that coordinates gating. However, protonation of both E14s will influence the rate of proton transport of both systems as protonation state affects the rate of alternating access which is necessary for proton turnover. This is another reason we use the transported charge over time metric to compare mutants as it allows for a common metric for mutants with altered rates which are present in the same density and under the same gradient conditions. We do not have any evidence to support transfer of proton from H110 to E14, but there is also no evidence to exclude this possibility. We do not discuss this in the manuscript because it would be entirely speculative.

      (5) Is the pKa in the simulations (Figure 6B) consistent with the experiment?

      We calculated the pKa from this WT PMF and obtained a value of 7.1, which is close to the experimental value of 6.8.
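
      To put the 0.3-unit difference between these two pKa values on an energy scale, a pKa offset can be converted to a free energy with ΔG = ln(10)·RT·ΔpKa. The sketch below is illustrative only: it assumes a temperature of 298 K (the simulation temperature is not stated in this response), and the function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def pka_shift_to_free_energy(pka_a, pka_b, temperature_k=298.0):
    """Return ln(10) * R * T * (pka_a - pka_b) in kcal/mol.

    temperature_k defaults to 298 K, which is an assumption made
    for illustration rather than a value taken from the manuscript.
    """
    return math.log(10.0) * R_KCAL_PER_MOL_K * temperature_k * (pka_a - pka_b)


# Computed pKa (7.1) versus experimental pKa (6.8), as quoted in the response.
delta_g = pka_shift_to_free_energy(7.1, 6.8)
print(f"{delta_g:.2f} kcal/mol")
```

      Under these assumptions the offset corresponds to roughly 0.4 kcal/mol, which is below the thermal energy RT of about 0.6 kcal/mol at the same temperature.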

      (6) Why isn't the PMF for delta_107 compared to WT to corroborate the prediction that hydration sufficiently alters both the rate and pKa of E14?

      We appreciate the reviewer’s suggestion and agree that a direct comparison would be valuable. However, several factors limit the interpretability of such an analysis in this context:

      (a) Our data indicate that the primary difference in free energy barriers between WT and Δ107 lies in the hydration step rather than in proton transport itself. Fully resolving this would require a 2D PMF calculated by 2D umbrella sampling, which is computationally very expensive. The proton-transport dimension of this PMF alone is not expected to differ much between the two constructs.

      (b) Given this, our aim in calculating this PMF was to support our conjecture that the bottleneck for such transport is the hydrophobic gate.

      (7) The authors suggest that A61 rotation 'controls the water wire formation' by measuring the distribution of water connectivity (water-water distances via logS) and average distances between A61 and I68/I67. Delta_107 has a larger inter-residue distance (Figure 6A) and a more probable small log S, i.e., closer waters connecting E14 and two residues near the top of the protein (Figure 5A). However, it strikes me that looking at average distances and the distribution of log S is not the best way to do this. Why not quantify the correlation between log S and A61 orientation and/or A61-I68/I71 distances, as well as their correlation to the proposed tail interactions (D84-R106 interactions), to directly verify the correlation (and suggest causation) of these interactions on the hydration in this region? Additionally, plotting the RMSD or probability of waters below I68 and I71 as a function of A61-I68 distances and/or numbers over time would support the log S analysis.

      The reviewer requested that we provide direct correlation analyses between A61 orientation, residue distances (A61-I68/I71), and water connectivity (logS) to better support the claim about water wire formation, rather than relying solely on average distances and distributions.

      We appreciate the reviewer’s suggestion to strengthen our analysis with direct correlations. However, due to the slow kinetics of hydration/dehydration events, unbiased simulation timescales do not permit sufficient sampling of multiple transitions to perform statistically robust dynamic correlation analyses. Instead, our approach focuses on equilibrium statistics, which reveal the dominant conformational states of WT- and Δ107-EmrE and provide meaningful insights into shifts in hydration patterns.

      (8) It looks like the D84-R106 salt bridge controls this A61-I68 opening. Could this also be quantifiably correlated?

      As discussed in response to the previous question, the unbiased simulation timescales do not permit sufficient sampling of multiple transitions to perform statistically robust dynamic correlation analyses.

      (9) The NMR results show that alternating access increases in frequency from ~4/s for WT at low and high pH to ~17/s for delta_107 only at high pH. They then go on to analyze potential titration changes in the delta_107 mutant, finding two residues with approximate pKa values of 5.6 and 7.1. The former is assigned to E14, consistent with WT. But the latter is suggested to be either D84, which salt bridges to R106, or the C-terminal carboxylate. If it is D84, why would deprotonation, which would be essential to form the salt bridge, increase the rate of alternating access relative to WT?

      We note that the faster alternating access rate was observed for TPP+-bound ∆107-EmrE, not for the transporter in the absence of substrate. In the absence of substrate, the relatively broad lines preclude quantitative determination of the alternating access rate by NMR, making it difficult to judge the validity of the reviewer’s reasoning. Identification of which residue (D84 or H110) corresponds to the shifted pKa is ultimately of little consequence, as this mutant does not reflect the native conditions of the transporter. It is far more important to acknowledge that both R106 and D84 are sensitive to this deprotonation, as it indicates these residues are close in space and provides experimental support for the existence of the salt bridge identified in the MD simulations, as discussed in the manuscript.

      (10) In a more general sense, can the authors speculate why an efflux pump would evolve this type of secondary gate that can be thrown off by tight binding in the allosteric site such as that demonstrated by Harmane? What potential advantage is there to having a tail-regulated gate?

      This was likely a necessity to allow for better coupling as these transporters evolved to be more promiscuous. The C-terminal tail is absent in tightly coupled family members such as Gdx, which are specific for a single substrate and have a better-defined transport stoichiometry. We have included this discussion in the main text and are currently investigating this phenomenon further. Those experiments are beyond the scope of the current manuscript.

      (11) It is hard to visualize the PT reaction coordinate. Is the e_PT unit vector defined for each window separately based on the initial steered MD pathway? If so, how reliant is the PT pathway on this initial approximate path? Also, how does this position for each window change if/when E14 rotates? This could be checked by plotting the x,y,z distributions for each window and quantifying the overlap between windows in cartesian space. These clouds of distributions could also be plotted in the protein following alignment so the reader can visualize the reaction coordinate. Does the CEC localization ever stray to different, disconnected regions of cartesian phase space that are hidden by the reaction coordinate definition?

      The unit vector e_PT is the same across all windows and is based on unbiased MD. The reaction coordinate (a scalar) is therefore the vector from the starting point to the CEC, projected onto this unit vector. E14 rotation does not significantly change the window definition unless the CEC is very close to E14, where we found this to be a better CV. For a detailed discussion of this CV, especially a comparison with a curvilinear CV, please see J. Am. Chem. Soc. 2018, 140, 48, 16535–16543 “Simulations of the Proton Transport” and its SI Figure S1. In the Supplementary Information, we added Figure 6.1 to show the average X, Y, Z coordinates of each umbrella window.
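
      The projection described in this response can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical and the coordinates are invented purely to show the arithmetic.

```python
import math


def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("direction vector must be non-zero")
    return tuple(x / norm for x in v)


def pt_coordinate(cec, start, direction):
    """Scalar reaction coordinate: (cec - start) projected onto the unit vector."""
    e_pt = normalize(direction)
    displacement = tuple(c - s for c, s in zip(cec, start))
    return dot(displacement, e_pt)


# Invented example: the fixed direction is the z axis, and the CEC sits
# 3 units along it with some lateral offset in x and y.
xi = pt_coordinate(cec=(1.0, -0.5, 3.0), start=(0.0, 0.0, 0.0), direction=(0.0, 0.0, 2.0))
print(xi)
```

      Because the projection discards displacement perpendicular to e_PT, two CEC positions with the same scalar value can differ laterally, which is why the per-window X, Y, Z averages are a useful complement to the scalar coordinate.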

      (12) Lastly, perhaps I missed it, but it's unclear if the rate of substrate efflux is also increased in the delta_107 mutant. If this is also increased, then the overall rate of exchange is faster, including proton leak. This would be important to distinguish since the focus now is entirely on proton leaks. I.e., is it only leak or is it overall efflux and leak?

      We have amended SI Figure 3.2 to include a gradient condition in which an infinite drug gradient is created across the liposome. The infinite gradient allows rapid transport of drug into the liposomes until charge build-up opposes further transport. This peak occurs at the same time for both LPRs of WT- and ∆107-EmrE, suggesting that the rate of substrate transport is similar. Differences in the peak heights across LPRs can be attributed to competition between drug and proton for the primary binding site, such that more protons will be released for the higher-density constructs, as described above. This process also creates a proton gradient, because drug moving in is coupled to two protons moving out. As charge build-up inhibits further drug movement, the building proton gradient will begin to drive protons back in, which is another example of uncoupled leak. Here again, we see that this back-flow of protons, or leak, is of greater magnitude for ∆107-EmrE proteoliposomes than for those with WT-EmrE. We have included this discussion in the SI and main text.

      Minor

      (1) Introduction - the authors describe EmrE as a model system for studying the molecular mechanism of proton-coupled transport. This is a rather broad categorization that could include a wide range of phenomena distal from drug transport across membranes or through efflux pumps. I suggest further specifying to not overgeneralize.

      We revised to note the context of multidrug efflux.

      Reviewer #2 (Recommendations for the authors):

      Simulations. The initial water wire analysis is based on 4 different 1 ms simulations presented in Figure 5. The 3 WT replicates show similar results for the tail-blocking water wire formation, but the details of the system build and loop/C-terminal tail placement are not clear. It does appear that a single C-terminal tail model was created for all WT replicates. Was there also modeling for any parts of the truncation mutant? Regardless, since these initial placements and uncertainties in the structures may impact the results and subsequent water wire formation, I would like a discussion of how these starting structures impacted the formation or not of wires. I think that another WT replicate should be run starting from a completely new build that places the tail in a different (but hopefully reasonable location). This could be built with any number of tools to generate reasonable starting structures. It's critical to ensure that multiple independent simulations across different initial builds show the same water wire behavior so that we know the results are robust and insensitive to the starting structure and stochastic variation.

      We thank Reviewer 2 for their suggestion regarding the discussion of the initial structure. In our simulations, the C-terminal tail was initially modeled in an extended conformation (solvent-exposed) to mimic its disordered state prior to folding. This approach resembles an annealing process, where the system evolves from a higher free-energy state toward equilibrium. Notably, across all three replicas, we observed consistent folding of the tail onto the protein surface, supporting the robustness of this conformational preference.

      For the Δ107 truncation mutant, minimal modeling was required, as most experimental structures resolve residues up to S105 or R106. To rigorously assess the influence of the starting configuration, we analyzed the tail’s dynamics using backbone dihedral angle auto- and cross-correlation functions (new Supplementary Figures 10.1 and 10.2). These analyses reveal rapid decay of correlations—consistent with the tail’s short length (5 residues) and high flexibility—indicating that the system "forgets" its initial configuration well within the simulation timescale. Thus, we conclude that our sampling is sufficient to capture equilibrium behavior, independent of the starting structure.
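
      As an illustration of this kind of analysis, the sketch below computes one common form of a dihedral autocorrelation function, C(τ) = ⟨cos(φ(t+τ) − φ(t))⟩, for a single angle time series. The exact definition used for Supplementary Figures 10.1 and 10.2 is not given in this response, so this form, the function name, and the toy input are all assumptions.

```python
import math


def dihedral_autocorrelation(angles_rad, max_lag):
    """Return C(lag) = mean of cos(phi[t + lag] - phi[t]) for lag = 0..max_lag.

    Using the cosine of the angle difference respects the periodicity of
    dihedral angles, so angles near -pi and +pi are treated as close.
    """
    n = len(angles_rad)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    acf = []
    for lag in range(max_lag + 1):
        pairs = n - lag
        total = sum(
            math.cos(angles_rad[t + lag] - angles_rad[t]) for t in range(pairs)
        )
        acf.append(total / pairs)
    return acf


# Toy series that flips between 0 and pi every frame (not simulation data).
toy_series = [0.0, math.pi, 0.0, math.pi]
acf = dihedral_autocorrelation(toy_series, max_lag=2)
print(acf)
```

      With this definition, a fast decay of C(τ) from its initial value of 1 indicates that the angle loses memory of its starting value on a short timescale relative to the trajectory length.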

      What does the size of the barrier in the PMF (Figure 6B) imply about the rate of proton transfer/leak and can the pKa shift of the acidic residue be estimated with this energy value compared to bulk?

      We noticed this point aligns with a related concern raised by Reviewer 1. For a detailed discussion please refer to Point 5 in our response to Reviewer 1.

      Experimental validation. The hypotheses generated by this work would be better buttressed if there were some mutation work at the hydrophobic gate (61, 68, 71) to support it. I realize that this may be hard, but it would significantly improve the quality.

      Due to the small size of the transporter, any mutagenesis of EmrE should necessarily be accompanied by functional characterization to fully assess the effects of the mutation on rate-limiting steps. We have revised the manuscript to add a discussion of the challenges of analyzing simple point mutants and to cite what is known from prior scanning mutagenesis studies of EmrE.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      The addition of the discussion about the two isomers of 18:1 didn't quite work in the place where the authors added it. What the authors wrote on line 126 is true about 18:1 isomers in wild type worms. However, they are reporting their lipidomics results of the fat-2(wa17) mutant worms. In this case, a substantial amount of the 18:1 is the oleic acid (18:1n-9) isomer. The authors can check Table 2 in their reference [10] and see that, while wild type and other fat mutants indeed contain approximately 10-fold more cis-vaccenic than oleic acid, the fat-2(wa17) mutants do accumulate oleic acid, because the wild-type activity of FAT-2 is to convert oleic acid to linoleic acid, which can then be converted to downstream PUFAs. I suggest editing their sentence on line 126 to say that the high 18:1 they observed agrees with [10], and then comment about reference 10 showing the majority of 18:1 being the cis-vaccenic isomer in most strains, but the oleic acid isomer being more abundant in the fat-2(wa17) mutant strain.

      We thank the reviewer for spotting that and sparing us a bit of embarrassment. We have now modified the text and hope we got it right this time:

      "Even though the lipid analysis methods used here are not able to distinguish between different 18:1 species, a previous study showed that the majority of the 18:1 fatty acids in the fat-2(wa17) mutant is actually 18:1n9 (OA) [10] and not 18:1n7 (vaccenic acid) as in most other strains [10,23]; this is because OA is the substrate of FAT-2 and thus accumulates in the mutant."

      Reviewer #2:

      I still do not agree with the answer to my previous comment 6 regarding Figure S2E. The authors claim that hif-1(et69) suppresses fat-2(wa17) in a ftn-2 null background (in Figure S2 legend for example). To claim so, they would need to compare the triple mutant with fat-2(wa17);ftn-2(ok404) and show some rescue. However, we see in Figure 5H that ftn-2(ok404) alone rescues fat-2(wa17). Thus, by comparing both figures, I see no additional effect of hif-1(et69) in an ftn-2(ok404) background. I actually think that this makes more sense, since the authors claim that hif-1(et69) is a gain-of-function mutation that acts through suppression of ftn-2 expression. Thus, I would expect that without ftn-2 from the beginning, hif-1(et69) does not have an additional effect, and this seems to be what we see from the data. Thus, I would suggest that the authors reformulate their claims regarding the effect of hif-1(et69) in the ftn-2(ok404) background, which seems to be absent (consistently with what one would expect).

      We completely agree with the reviewer and indeed this is the meaning that we tried to convey all along. The text has now been modified as follows:

      "Lastly, ftn-2(et68) is still a potent fat-2(wa17) suppressor when hif-1 is knocked out (S2D Fig), suggesting that no other HIF-1-dependent functions are required as long as ftn-2 is downregulated; this conclusion is supported by the observation that the potency of the ftn-2(ok404) null allele to act as a fat-2(wa17) suppressor is not increased by including the hif-1(et69) allele (compare Fig 5H and S2E Fig)."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors present a novel CRISPR/Cas9-based genetic tool for the dopamine receptor dop1R2. Based on the known function of the receptor in learning and memory, they tested the efficacy of the genetic tool by knocking out the receptor specifically in mushroom body neurons. The data suggest that dop1R2 is necessary for longer-lasting memories through its action on ⍺/ß and ⍺'/ß' neurons but is dispensable for short-term memory and thus in ɣ neurons. The experiments impressively demonstrate the value of such a genetic tool and illustrate the specific function of the receptor in subpopulations of KCs for longer-term memories. The data presented in this manuscript are significant.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript examines the role of the dopamine receptor, Dop1R2, in memory formation. This receptor has complex roles in supporting different stages of memory, and the neural mechanisms for these functions are poorly understood. The authors are able to localize Dop1R2 function to the vertical lobes of the mushroom body, revealing a role in later (presumably middle-term) aversive and appetitive memory. In general, the experimental design is rigorous, and statistics are appropriately applied. While the manuscript provides a useful tool, it would be strengthened further by additional mechanistic studies that build on the rich literature examining the roles of dopamine signaling in memory formation. The claim that Dop1R2 is involved in memory formation is strongly supported by the data presented, and this manuscript adds to a growing literature revealing that dopamine is a critical regulator of olfactory memory. However, the manuscript does not necessarily extend much beyond our understanding of Dop1R2 in memory formation, and future work will be needed to fully characterize this reagent and define the role of Dop1R2 in memory.

      Strengths:

      (1) The FRT lines generated provide a novel tool for temporal and spatially precise manipulation of Dop1R2 function. This tool will be valuable to study the role of Dop1R2 in memory and other behaviors potentially regulated by this gene.

      (2) Given the highly conserved role of Dop1R2 in memory and other processes, these findings have a high potential to translate to vertebrate species.

      Weaknesses:

      (1) The authors state Dop1R2 associates with two different G-proteins. It would be useful to know which one is mediating the loss of aversive and appetitive memory in Dop1R2 knockout flies.

      We thank you for the insightful comment. We agree that it would be very useful to know which G-proteins transmit Dop1R2 signaling. To that end, we examined single-cell transcriptomics data to check the level of co-expression of Dop1R2 with the G-proteins of interest to us (Figure 1 S1).

      Lines 312-325

      “Some RNA binding proteins and immediate early genes help maintain identities of Mushroom body cells and are regulators of local transcription and translation (de Queiroz et al., 2025; Raun et al., 2025). So, the availability of different G-proteins may change in different lobes and during different phases of memory. The G-protein via which GPCRs signal may depend on the pool of available G-proteins in the cell/sub-cellular region (Hermans, 2003). Therefore, Dop1R2 may signal via different G-proteins in different compartments of the Mushroom body and also different compartments of the neuron. We looked at Gαo and Gαq as they are known to have roles in learning and forgetting (Ferris et al., 2006; Himmelreich et al., 2017). We found that Dop1R2 co-expresses more frequently with Gαo than with Gαq (Figure 1 S1). While there is evidence for Dop1R2 to act via Gαq (Himmelreich et al., 2017), it is difficult to determine whether this interaction is exclusive, or if Dop1R2 can also be coupled to other G-proteins. It will be interesting to determine the breadth of G-proteins that are involved in Dop1R2 signaling.”

      (2) It would be interesting to examine 24hr aversive memory, in addition to 24hr appetitive memory.

      This is indeed an important point and we agree that it will complete the assessment of temporally distinct memory traces. We therefore performed the Aversive LTM experiments and include them in the results.

      Lines 208-228

      “24h memory is impaired by loss of Dop1R2

      Next, we wanted to see if later memory forms are also affected. One cycle of reward training is sufficient to create LTM (Krashes & Waddell, 2008), while for aversive memory, 5-6 cycles of electroshock training are required to obtain robust long-term memory scores (Tully et al., 1994). So, we looked at both 24h aversive and 24h appetitive memory. For aversive LTM, the flies were tested on the Y-Maze apparatus as described in Mohandasan et al. (2022).

      Flipping out Dop1R2 in the whole MB causes a reduced 24h memory performance (Figure 4A, E). No phenotype was observed when Dop1R2 was flipped out in the γ-lobe (Figure 4B, F). However, similar to 2h memory, loss of Dop1R2 in the α/β-lobes (Figure 4C, G) or the α’/β’-lobes (Figure 4D, H) causes a reduction in memory performance. Thus, Dop1R2 seems to be involved in aversive and appetitive LTM in the α/β-lobes and the α’/β’-lobes.

      Previous studies have shown that mutation of the Dop1R2 receptor leads to improvement in LTM when a single shock training paradigm is used (Berry et al., 2012). As we found that loss of Dop1R2 disrupts LTM, we wanted to verify whether the absence of Dop1R2 outside the MB is what leads to an improvement in memory. To that end, we tested flies with panneuronal flip-out of Dop1R2, using the elav-Gal4 driver, for 6hr and 24hr memory upon single shock. We found that it did not improve memory at either time point (Figure 4 S1). This confirms that flipping out Dop1R2 panneuronally does not improve LTM (Figure 4 S1C) and highlights its irrelevance in memory outside the MB.”

      (3) The manuscript would be strengthened by added functional analysis. What are the DANs that signal through Dop1R. How do these knockouts impact MBONs?

      We thank you for this question. We agree that how distinct DANs signal via distinct dopamine receptors is a highly relevant and open question. Our work here focusses exclusively on Dop1R2 within the MB. We aim to investigate other DopRs and the connection between DANs in the future using similar approaches.

      (4) Also in Figure 2, the lobe-specific knockouts might be moved to supplemental since there is no effect. Instead, consider moving the control sensory tests into the main figure.

      We thank you for this suggestion and understand that in Figure 2 no significant difference is seen. However, we have emphasized in the text that the results from the supplementary figures are just to confirm that the modifications made at the Dop1R2 locus did not alter its normal function.

      Lines 156-162

      “We wanted to see if flipping out Dop1R2 in the MB affects memory acquisition and STM by using classical olfactory conditioning. In short, a group of flies is presented with an odor coupled to an electric shock (aversive) or sugar (appetitive) followed by a second odor without stimulus. For assessing their memory, flies can freely choose between the odors either directly after training (STM) or at a later timepoint.

      To ensure that the introduced genetic changes to the Dop1R2 locus do not interfere with behavior we first checked the sensory responses of that line”

      (5) Can the single-cell atlas data be used to narrow down the cell types in the vertical lobes that express Dop1R2? Is it all or just a subset?

      This is indeed an interesting question, and we thank you for mentioning it. To address this as best as we could, we analyzed the single cell transcriptomic data from (Davie et al., 2018) and presented it in Figure 1 S1.

      Reviewer #3 (Public Review):

      Summary:

      Kaldun et al. investigated the role of Dopamine Receptor Dop1R2 in different types and stages of olfactory associative memory in Drosophila melanogaster. Dop1R2 is a type 1 Dopamine receptor that can act both through Gs-cAMP and Gq-ERCa2+ pathways. The authors first developed a very useful tool, where tissue-specific knock-out mutants can be generated, using Crispr/Cas9 technology in combination with the powerful Gal4/UAS gene-expression toolkit, very common in fruit flies.

      They direct the K.O. mutation to intrinsic neurons of the main associative memory centre of the fly brain, the mushroom body (MB). There are three main types of MB-neurons, or Kenyon cells, according to their axonal projections: a/b; a'/b', and g neurons.

      Kaldun et al. found that flies lacking dop1R2 all over the MB displayed impaired appetitive middle-term (2h) and long-term (24h) memory, whereas appetitive short-term memory remained intact. Knocking-out dop1R2 in the three MB neuron subtypes also impaired middle-term, but not short-term, aversive memory.

      These memory defects were recapitulated when the loss of the dop1R2 gene was restricted to either a/b or a'/b', but not when the loss of the gene was restricted to g neurons, showcasing a compartmentalized role of Dop1R2 in specific neuronal subtypes of the main memory centre of the fly brain for the expression of middle and long-term memories.

      Strengths:

      (1) The conclusions of this paper are very well supported by the data, and the authors systematically addressed the requirement of a very interesting type of dopamine receptor in both appetitive and aversive memories. These findings are important for the fields of learning and memory and dopaminergic neuromodulation among others. The evidence in the literature so far was generated in different labs, each using different tools (mutants, RNAi knockdowns driven in different developmental stages...), different time points (short, middle, and long-term memory), different types of memories (Anesthesia resistant, which is a type of protein synthesis independent consolidated memory; anesthesia sensitive, which is a type of protein synthesis-dependent consolidated memory; aversive memory; appetitive memory...) and different behavioral paradigms. A study like this one allows for direct comparison of the results, and generalized observations.

      (2) Additionally, Kaldun and collaborators addressed the requirement of different types of Kenyon cells, that have been classically involved in different memory stages: g KCs for memory acquisition and a/b or a'/b' for later memory phases. This systematical approach has not been performed before.

      (3) Importantly, the authors of this paper produced a tool to generate tissue-specific knock-out mutants of dop1R2. Although this is not the first time that the requirement of this gene in different memory phases has been studied, the tools used here represent the most sophisticated genetic approach to induce a loss of function phenotypes exclusively in MB neurons.

      Weaknesses:

      (1) Although the paper does have important strengths, the main weakness of this work is that the advancement in the field could be considered incremental: the main findings of the manuscript had been reported before by several groups, using tissue-specific conditional knockdowns through interference RNAi. The requirement of Dop1R2 in MB for middle-term and long-term memories has been shown both for appetitive (Musso et al 2015, Sun et al 2020) and aversive associations (Plaçais et al 2017).

      Thank you for this comment. We believe that the main takeaway from the paper is the elegant tool we developed to study the role of Dop1R2 in fruit flies by effectively flipping it out in a spatio-temporally controlled manner. Additionally, we studied its role in all types of olfactory associative memory to establish it as a robust tool that can be used for further research in place of RNAi knockouts, which are shown to be less efficient in insects, as mentioned in the text at lines 394-398.

      “The genetic tool we generated here to study the role of the Dop1R2 dopamine receptor in cells of interest, is not only a good substitute for RNAi knockouts, which are known to be less efficient in insects (Joga et al., 2016), but also provides versatile possibilities as it can be used in combination with the powerful genetic tools of Drosophila.”

      (2) The approach used here to genetically modify memory neurons is not temporally restricted. Considering the role of dopamine in the correct development of the nervous system, one must consider the possible effects that this manipulation can have on the establishment of memory circuits. However, previous studies addressing this question restricted the manipulation of Dop1R2 expression to adulthood, leading to the same findings as the ones reported in this paper for both aversive and appetitive memories, which solidifies the findings of this paper.

      We thank you for this comment and we agree that it would be important to show a temporally restricted effect of Dop1R2 knockout. To assess this and rule out potential developmental defects we decided to restrict the knockout to the post-eclosion stage and to include these results.

      Lines 230-250

      “Developmental defects are ruled out in a temporally restricted Dop1R2 conditional knockout.

      To exclude developmental defects in the MB caused by flip-out of Dop1R2, we stained fly brains with a FasII antibody. Compared to genetic controls, flies lacking Dop1R2 in the mushroom body had unaltered lobes (Figure 4 S2C).

      Regardless, we wanted to control for developmental defects leading to memory loss in flip-out flies. So, we generated a Gal80ts-containing line, enabling the temporal control of Dop1R2 knockout in the entire mushroom body (MB). Given that the half-life of the receptor remains unknown, we assessed both aversive short-term memory (STM) and long-term memory (LTM) to determine whether post-eclosion ablation of Dop1R2 in the MB produced differences compared to our previously tested line, in which Dop1R2 was constitutively knocked out from fertilization. To achieve this, flies were maintained at 18°C until eclosion and subsequently shifted to 30°C for five to seven days. On the fifth day, training was conducted, followed by memory testing. Our results indicate that aversive STM was not significantly impaired in Dop1R2-deficient MBs compared to control flies (Figure 4 S3), consistent with our previous findings (Figure 2). However, aversive LTM was significantly impaired relative to control lines (Figure 4 S3), which also aligned with prior observations. These findings strongly indicate that memory loss caused by Dop1R2 flip-out is not due to developmental defects.”

      (3) The authors state that they aim to resolve disparities of findings in the field regarding the specific role of Dop1R2 in memory, offering a potent tool to generate mutants and systematically addressing their effects on different types of memory. Their results support the role of this receptor in the expression of long-term memories; however, the experiments performed here do not address the temporal resolution of the genetic manipulations that could shed light on the mechanisms of action of Dop1R2 in memory. Several hypotheses have been proposed, from stabilization of memory, effects on forgetting, or integration of sequences of events (sensory experiences and dopamine release).

      We thank you for this comment. We agree that it would be interesting to dissect the memory stages by knocking out the receptor selectively in some of them (encoding, consolidation, retrieval). However, our tool irreversibly flips out Dop1R2 preventing us from investigating the receptor’s role in retrieval. Our results show that the receptor is dispensable for STM formation (Figure 2, Figure 4 Supplement 3), suggesting that it is not involved in encoding new information. On the other hand, it is instead involved in consolidation and/or retrieval of long-term and middle-term memories (Figure 3, Figure 4, Figure 5B).

      Overall, the authors generated a very useful tool to study dopamine neuromodulation in any given circuit when used in combination with the powerful genetic toolkit available in Drosophila. The reports in this paper confirmed a previously described role of Dop1R2 in the expression of aversive and appetitive LTM and mapped these effects to two specific types of memory neurons in the fly brain, previously implicated in the expression and consolidation of long-term associative memories.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) At first view, the results shown here are different from studies published earlier, while in the same line with others (e.g. Sun et al, for appetitive 24h memories). For example, Berry et al showed that the loss of dop1R2 impairs immediate memory, while memory scores are enhanced 3h, 6h, and 24h after training. Further, they showed data that shock avoidance, at least for higher shock intensities, is reduced in mutant (damb) flies. All in all, this underlines how important it is to improve the genetic tools for tissue-specific manipulation. Despite the authors nicely discussing their data with respect to the previous studies, I wondered whether it would be suitable to use the new tool and knock out dop1R2 panneuronally to see whether the obtained data match the results published by Berry et al. Further, as stated in line 105ff: "As these studies used different learning assays - aversive and appetitive respectively as well as different methods, it is unclear if Dop1R2 has different functions for the different reinforcement stimulus" I wondered why the authors tested aversive and appetitive learning for STM and 2h memory, but only appetitive memory for 24h.

      Thank you for this comment. To that extent, as mentioned above in response to reviewer #2, we included in the results the aversive LTM experiment (Figure 4). Moreover, we performed experiments along the line of Berry et al. using our tool as shown in Figure 4 S1. Our results support that Dop1R2 is required for LTM, rather than to promote forgetting.

      (2) Line 165ff: I can't find any of the supplementary data mentioned here. Please add the corresponding figures.

      Thank you for pointing this out. In that line we do not refer to any supplementary data, but to Figure 1F, which shows the absence of the HA-tag in our MB knock-out line. We have clarified this in the text (lines 151-153).

      (3) I can't imagine that the scale bar in Figure 1D-F is correct. I would also like to suggest to show a more detailed analysis of the expression pattern. For example, both anterior and posterior views would be appropriate, perhaps including the VNC. This would allow the expression pattern obtained with this novel tool to be better compared with previously published results. Also, in relation to my comment above (1), it may help to understand the functional differences with previous studies, especially as the authors themselves state that the receptor is "mainly" expressed in the mushroom body (line 99). It would be interesting to see where else it is expressed (if so). This would also be interesting for the panneuronal knockdown experiment suggested under (1). If the receptor is indeed expressed outside the mushroom body, this may explain the differences to Berry et al.

      Thank you for noting this; there was indeed a mistake in the scale bar, which we have now fixed. Since our HA-tag immunostaining did not detect any noticeable signal outside of the MB, we decided to analyze previously existing single-cell transcriptomics data, which showed expression of the receptor in 7.99% of cells in the VNC and in 13.8% of cells outside the MB (lines 98-100), confirming its sparse expression in the nervous system. The lack of detection of these cells is likely due to the sparse and low expression of the protein. The HA-tag reports the endogenous expression level of the locus (it is possible that a Gal4/UAS amplification of the signal would allow these cells to be detected).

      Regarding the panneuronal knockout, we decided to try to replicate the experiment shown in Berry et al. in Figure 4 S1 and found that Dop1R2 is required for LTM.

      (4) Related to learning data shown in Figures 2-4, the authors should show statistical differences between all groups obtained in the ANOVA + PostHoc tests. Currently, only an asterisk is placed above the experimental group, which does not adequately reflect the statistical differences between the groups. In addition, I would like to suggest adding statistical tests to the chance level as it may be interesting to know whether, for example, scores of knockout flies in 3C and 3D are different from the chance level.

      Many thanks for this correction; we agree that the way significance scores were shown was not informative enough. We now show the significance between all the control groups and the experimental ones. We also added the chance-level results to the figure legends.

      (5) Unfortunately, the manuscript has some typing errors, so I would like to ask the authors to check the manuscript again carefully.

      Some Examples:

      Line 31: the the

      Line 56: G-Protein

      Line 64: c-AMP

      Line 68: Dopamine

      Line 70: G-Protein (It alternates between G-protein and G-Protein)

      Line 76: References are formatted incorrectly

      Line 126: Ha-Tag (It alternates between Ha and HA)

      Line 248: missing space before the bracket...is often found

      Thank you for noticing these errors, we have now corrected the spelling throughout the manuscript.

      (6) In the figures the axes are labelled Preference Index (Pref"I"). In the methods, however, the calculation formula is defined as "PREF".

      We thank you for drawing attention to this. To avoid confusion, we changed the definition in the methods section so that it could be clear and coherent (“Memory tests” paragraph in the methods section).

      “PREF = ((N<sub>arm1</sub> - N<sub>arm2</sub>) × 100) / N<sub>total</sub>. The two preference indices were calculated from the two reciprocal experiments. The average of these two PREFs gives a learning index (LI): LI = (PREF<sub>1</sub> + PREF<sub>2</sub>) / 2.

      For all long-term aversive memory experiments, the Y-maze protocol was adapted to test flies 24 hours post training. Testing using the Y-maze was done following the protocol described in (Mohandasan et al., 2022), where flies were loaded at the bottom of 20-minute odorized 3D-printed Y-mazes from where they would climb up to a choice point and choose between the two odors. The learning index was then calculated after counting the flies in each odorized vial as follows: LI = ((N<sub>CS-</sub> - N<sub>CS+</sub>) × 100) / N<sub>total</sub>, where N<sub>CS-</sub> and N<sub>CS+</sub> are the numbers of flies that were found trapped in the untrained and trained odor tube, respectively.”
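
      As a minimal sketch of the index calculations quoted above, the formulas could be written as follows. The function and argument names are hypothetical, and the total fly count is passed in explicitly because the quoted text does not state how it is counted.

```python
def preference_index(n_arm1: int, n_arm2: int, n_total: int) -> float:
    """PREF = ((N_arm1 - N_arm2) * 100) / N_total for one reciprocal experiment."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_arm1 - n_arm2) * 100 / n_total


def learning_index(pref_1: float, pref_2: float) -> float:
    """LI = (PREF_1 + PREF_2) / 2, the mean of the two reciprocal PREFs."""
    return (pref_1 + pref_2) / 2


def ymaze_learning_index(n_cs_minus: int, n_cs_plus: int, n_total: int) -> float:
    """LI = ((N_CS- - N_CS+) * 100) / N_total for the 24-hour Y-maze test."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_cs_minus - n_cs_plus) * 100 / n_total
```

      With these definitions, a positive Y-maze LI means that more flies were trapped in the untrained (CS-) odor tube than in the trained (CS+) one.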

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figures 2 and 3, the legends running two different subfigures is confusing. Would be helpful to find a different way to present.

      Thank you for your suggestion. We modified how we present legends, placing them vertically so that it is clearer.

      (2) Use additional drivers to verify middle and long-term memory phenotypes.

      We agree that it would be interesting to see the role of Dop1R2 in other neurons. To that extent, we looked at long-term aversive memory in flies where the receptor was panneuronally flipped out, and did not find evidence that suggested involvement of Dop1R2 in memory processes outside the MB (Figure 4 S1).

      (3) Additional discussion of genetic background for fly lines would be helpful.

      Thank you for your advice. We have mentioned the genetic background of flies in the key resources table of the methods sections. Additionally, we also included further explanation on how the lines were created and their genetic background (see “Fly Husbandry” paragraph in the methods section).

      “UAS-flp;;Dop1R2<sup>cko</sup> flies and Gal4;Dop1R2<sup>cko</sup> flies were crossed back with ;;Dop<sup>cko</sup> flies to obtain appropriate genetic controls which were heterozygous for UAS and Gal4 but not Dop1R2<sup>cko</sup>.”

      Reviewer #3 (Recommendations For The Authors):

      Line 109 states that to resolve the problem a tool is developed to knock down Dop1R2 in a spatially and temporally specific manner. While I agree that this is within the potential of the tool, there is no temporal control of the flippase action in this study; at least I cannot find references to the use of target/gene switch to control stages of development or different memory phases. However, the version available for download is missing supplementary information, so I did not have access to supplementary figures and tables.

      Thank you for the comment. As mentioned before, it would be great to be able to dissect the memory phases. We show in lines 232-250 and Figure 4 S3 that restricting the flip-out to the post-eclosion life stage gave results consistent with our previous findings, ruling out potential developmental defects.

      In relation to my comment on the possible developmental effects of the loss of the gene, Figure 1F could showcase an underdeveloped g lobe when looking at the lobe profiles. I understand this is not within the scope of the figure, but maybe a different z projection can be provided to confirm there are no obvious anatomical alterations due to the loss of the receptor.

      We understand the doubt about the correct development of the MB and we thank you for your insightful comment. To that extent, we decided to perform a FasII immunostaining to visualize the MB in the different lines (Figure 4 S2), and it appears that there are no notable differences in lobe development in our knockout line.

      It seems that the obvious missing piece of the puzzle would be to address the effects of knocking out Dop1R2 in aversive LTM. The idea of systematically addressing different types of memory at different time points and in different KCs is the most attractive aspect of this study beyond the technical sophistication, and it feels that the aim of the study is not delivered without that component.

      We agree and we thank you for the clarification. As mentioned above in response to Reviewer #2, we decided to test aversive LTM as described in lines 208-228, Figure 4, Figure 4 S1.

      Some statements of the discussion seem too vague, and I think could benefit from editing:

      Line 284 "however other receptors could use Gq and mediate forgetting"- does this refer to other dopamine receptors? Other neuromodulators? Examples?

      Thank you for pointing this out. We agree and therefore decided to omit this line.

      Line 289 "using a space training protocol and a Dop1R2 line" - this refers to RNAi lines, but it should be stated clearly.

      That is correct, we thank you for bringing attention to this and clarified it in the manuscript.

      Lines 329-330

      “Interestingly, using a spaced training protocol and a Dop1R2 RNAi knockout line another study showed impaired LTM (Placais et al., 2017).”

      The paragraph starting in line 305 could be re-written to improve clarity and flow. Some statements seem disconnected and require specific citations. For example "In aversive memory formation, loss of Dop1R2 could lead to enhanced or impaired memory, depending on the activated signaling pathways and the internal state of the animal...". This is not accurate. Berry et al 2012 report enhanced LTM performance in dop1R2 mutants whereas Plaçais et al 2017 report LTM defects in Dop1R2 knock-downs, but these different findings do not seem to rely on different internal states or signaling pathways. Maybe further elaboration can help the reader understand this speculation.

      We agree and we thank you for this advice. We decided to add additional details and citations to support our speculation.

      Lines 350-353

      “In aversive memory formation, loss of Dop1R2 could lead to enhanced or impaired memory, depending on the activated signaling pathways. The signaling pathway that is activated further depends on the available pool of secondary messengers in the cell (Hermans, 2003) which may be regulated by the internal state of the animal.”

      "...for reward memory formation, loss of Dop1R2 seems to impair memory", this seems redundant at this point, as it has been discussed in detail, however, citations should be provided in any case (Musso 2015, Sun 2020)

      Thank you for noting this. We recognize the redundancy and decided to exclude the line.

      Finally, it would be useful to additionally refer to the anatomical terminology when introducing neuron names; for example MBON MVP2 (MBON-g1pedc>a/b), etc.

      Thank you for this suggestion. We understand the importance of anatomical terminologies for the neurons. Therefore, we included them when we introduce neurons in the paper.

      We thank you for your observations. We recognize their value, so we have made appropriate changes in the discussion to sound less vague and more comprehensive.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Using highly specific antibody reagents for biological research is of prime importance. In the past few years, novel approaches have been proposed to gain easier access to such reagents. This manuscript describes an important step forward toward the rapid and widespread isolation of antibody reagents. Via the refinement and improvement of previous approaches, the Perrimon lab describes a novel phage-displayed synthetic library for nanobody isolation. They used the library to isolate nanobodies targeting Drosophila secreted proteins. They used these nanobodies in immunostainings and immunoblottings, as well as in tissue immunostainings and live cell assays (by tethering the antigens on the cell surface).

      Since the library is made freely available, it will contribute to gaining access to better research reagents for non-profit use, an important step towards the democratisation of science.

      Strengths:

      (1) New design for a phage-displayed library of high content.

      (2) Isolation of valuable novel tools.

      (3) Detailed description of the methods such that they can be used by many other labs.

      We are grateful for these supportive comments.

      Weaknesses:

      My comments largely concentrate on the representation of the data in the different Figures.

      We have made adjustments according to the reviewer’s recommendations.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors propose an alternative platform for nanobody discovery using a phage-displayed synthetic library. The authors relied on DNA templates originally created by McMahon et al. (2018) to build the yeast-displayed synthetic library. To validate their platform, the authors screened for nanobodies against 8 Drosophila secreted proteins. Nanobody screening has been performed with phage-displayed nanobody libraries followed by an enzyme-linked immunosorbent assay (ELISA) to validate positive hits. Nanobodies with higher affinity have been tested for immunostaining and immunoblotting applications using Drosophila adult guts and hemolymph, respectively.

      Strengths:

      The authors presented a detailed protocol with various and complementary approaches to select nanobodies and test their application for immunostaining and immunoblotting experiments. Data are convincing and the manuscript is well-written, clear, and easy to read.

      We thank the reviewer for these supportive comments.

      Weaknesses:

      On the eight Drosophila secreted proteins selected to screen for nanobodies, the authors failed to identify nanobodies for three of them. While the authors mentioned potential improvements of the protocol in the discussion, none of them have been tested in this manuscript.

      We prepared all eight antigens by single-step IgG purification (see Materials and Methods) without additional biophysical quality control (e.g., size-exclusion chromatography). Consequently, we cannot definitively determine whether the three “no-binder” cases resulted from the aggregation or misfolding of the antigens, versus gaps in our naive library’s sequence space. While approaches such as additional purification steps or affinity maturation of weak binders would likely rescue these difficult targets, comprehensive pipeline optimization is beyond the scope of establishing and validating the phage-displayed nanobody platform. We have clarified this limitation and suggested these strategies in the third paragraph of the Discussion.

      The same comment applies to the experiments using membrane-tethered forms of the antigens to test the affinity of nanobodies identified by ELISA. Many nanobodies fail to recognize the antigens. While authors suggested a low affinity of these nanobodies for their antigens, this hypothesis has not been tested in the manuscript.

      We observed that several nanobodies with strong ELISA signals showed reduced binding to membrane-displayed antigens. This discrepancy may result from low affinity of the nanobodies or differences in post-translational modifications (e.g., glycosylation) and antigen context between secreted IgG-fusion proteins (used for panning/ELISA) and GPI- or mCD8-anchored proteins. In an ongoing work, we have performed affinity maturation of the nanobodies and successfully increased the affinity toward the target antigen. These results will be reported separately.

      Improving the protocol at each step for nanobody selection would greatly increase the success rate for the discovery of nanobodies with high affinity.

      We fully agree that systematic optimization—from antigen preparation (e.g., additional purification steps) through screening conditions (e.g., buffer composition, additional affinity-maturation steps)—could substantially increase the success rate and nanobody affinity. These represent important directions for future work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 3. The merge of two GFP channels does not make much sense. Can the authors not use artificial colours? And show the panels at higher resolution, such that a viewer can really see and judge what they are seeing? The same comments apply to all Supplementary Figures.

      We appreciate the reviewer’s comment. In the revised Figure 3, we have replaced the cyan/green overlay with red/green overlay and used enlarged pictures so that GFP-positive cells and corresponding nanobody staining are clearly visible. We applied the same layout to all relevant Supplementary Figures.

      (2) Figure 4. Also, in this Figure, it is not really possible to see what the authors say one should see. The resolution should be higher, and arrows or arrowheads should point to important structures.

      We appreciate the reviewer’s comment. In the revised Figure 4A, we have added arrows to point to the immunostaining signal in cells with smaller nuclei and added inset panels to show a closer view of representative NbMip-4G staining.

      Reviewer #2 (Recommendations for the authors):

      (1) Images are sometimes quite small and difficult to interpret. For example, Figures S2C-D.

      We thank the reviewer for this suggestion. In the revised figures, we have replaced the cyan/green overlay with red/green overlay and used enlarged pictures that clearly show GFP-positive cells alongside their corresponding nanobody staining.

      (2) Supplemental figures are not always cited in the text.

      Thank you for the comment. To eliminate this misunderstanding, we have updated the Nesfatin1 nanobody screen data as Supplementary Figure 1 and Mip nanobody screen data as Supplementary Figure 2. We have made the corresponding changes in the Results section.

    1. Author response:

      We were delighted by the reviewers' general comments. We thank the reviewers for their thoughtful reviews, constructive criticism, and analysis suggestions. We have carefully addressed each of their points during the revision of the manuscript.

      Unfortunately, after the paper was submitted to eLife, the first author, who ran all the analyses, left academia. We have now realized that we do not currently have sufficient resources to perform all additional analyses requested by the reviewers.

      The following is the authors’ response to the original reviews:

      Public Reviews:

      Reviewer #1 (Public Review):

      This study uses MEG to test for a neural signature of the trial history effect known as 'serial dependence.' This is a behavioral phenomenon whereby stimuli are judged to be more similar than they really are, in feature space, to stimuli that were relevant in the recent past (i.e., the preceding trials). This attractive bias is prevalent across stimulus classes and modalities, but a neural source has been elusive. This topic has generated great interest in recent years, and I believe this study makes a unique contribution to the field. The paper is overall clear and compelling, and makes effective use of data visualizations to illustrate the findings. Below, I list several points where I believe further detail would be important to interpreting the results. I also make suggestions for additional analyses that I believe would enrich understanding but are inessential to the main conclusions.

      (1) In the introduction, I think the study motivation could be strengthened, to clarify the importance of identifying a neural signature here. It is clear that previous studies have focused mainly on behavior, and that the handful of neuroscience investigations have found only indirect signatures. But what would the type of signature being sought here tell us? How would it advance understanding of the underlying processes, the function of serial dependence, or the theoretical debates around the phenomenon?

      Thank you for pointing this out. Our MEG study was designed to address two questions: 1) we asked whether we could observe a direct neural signature of serial dependence, and 2) if so, whether this signature occurs at the encoding or post-encoding stage of stimulus processing in working memory. This second question directly concerns the current theoretical debate on serial dependence.

      Previous studies have found only indirect signatures of serial dependence such as reactivations of information from the previous trial or signatures of a repulsive bias, which were in contrast to the attractive bias in behavior. Thus, it remained unclear whether an attractive neural bias can be observed as a direct reflection of the behavioral bias. Moreover, previous studies observed the neuronal repulsion during early visual processes, leading to the proposal that neural signals become attracted only during later, post-encoding processes. However, these later processing stages were not directly accessible in previous studies. To address these two questions, we combined MEG recordings with an experimental paradigm with two items and a retro-cue. This design allowed us to record neural signals during separable encoding and post-encoding task phases and so to pinpoint the task phase at which a direct neural signature of serial dependence occurred that mirrored the behavioral effect.

      We have slightly modified the Introduction to strengthen the study motivation.

      (1a) As one specific point of clarification, on p. 5, lines 91-92, a previous study (St. John-Saaltink et al.) is described as part of the current study motivation, stating that "as the current and previous orientations were either identical or orthogonal to each other, it remained unclear whether this neural bias reflected an attraction or repulsion in relation to the past." I think this statement could be more explicit as to why/how these previous findings are ambiguous. The St. John-Saaltink study stands as one of very few that may be considered to show evidence of an early attractive effect in neural activity, so it would help to clarify what sort of advance the current study represents beyond that.

      Thank you for this comment. In the study by St. John-Saaltink et al. (2016), two gratings oriented at 45° and 135° were always presented to either the left or right side of a central fixation point in a trial (90° orientation difference). As only the left/right position of the 45° and 135° gratings varied across trials, the target stimulus in the current trial was either the same or differed by exactly 90° from the previous trial. In consequence, this study could not distinguish whether the observed bias was attractive or repulsive, which concerned both the behavioral effect and the V1 signal. Furthermore, the bias in the V1 signal was partially explained by the orientation that was presented at the same position in the previous trial, which could reflect a reactivation of the previous orientation rather than an actual altered orientation.

      We have changed the Introduction accordingly.

      References:

      St. John-Saaltink E, Kok P, Lau HC, de Lange FP (2016) Serial Dependence in Perceptual Decisions Is Reflected in Activity Patterns in Primary Visual Cortex. Journal of Neuroscience 36: 6186–6192.

      (1b) The study motivation might also consider the findings of Ranieri et al (2022, J. Neurosci), Fornaciai, Togoli, & Bueti (2023, J. Neurosci), and Lou & Collins (2023, J. Neurosci), who all test various neural signatures of serial dependence.

      Thank you. As all listed findings showed neural signatures revealing a reactivation of the previous stimulus or a response during the current trial, we have added them to the paragraph in the Introduction referring to this class of evidence for the neural basis for serial dependence.

      (2) Regarding the methods and results, it would help if the initial description of the reconstruction approach, in the main text, gave more context about what data is going into reconstruction (e.g., which sensors), a more conceptual overview of what the 'reconstruction' entails, and what the fidelity metric indexes. To me, all of that is important to interpreting the figures and results. For instance, when I first read, it was unclear to me what it meant to "reconstruct the direction of S1 during the S2 epoch" (p. 10, line 199)? As in, I couldn't tell how the data/model knows which item it is reconstructing, as opposed to just reporting whatever directional information is present in the signal.

      (2a) Relatedly, what does "reconstruction strength" reflect in Figure 2a? Is this different than the fidelity metric? Does fidelity reflect the strength of the particular relevant direction, or does it just mean that there is a high level of any direction information in the signal? Please explain in the main text what reconstruction strength and fidelity are.

      Thank you for pointing this out. We applied the inverted encoding model method to MEG data from all active sensors (271) within defined time-windows of 100 ms length. MEG data was recorded in two sessions on different days. Specifically, we constructed an encoding model with 18 motion direction-selective channels. Each channel was designed to show peak sensitivity to a specific motion direction, with gradually decreasing sensitivity to less similar directions. In a training step, the encoding model was fitted to the MEG data of one session to obtain a weight matrix that indicates how well the sensor activity can be explained by the modeled direction. In the testing step, the weight matrix was inverted and applied to the MEG data of the other session, resulting in a response profile of ‘reconstruction strengths’, i.e., how strongly each motion direction was present in a trial. When a specific motion direction was present in the MEG signal, the reconstruction strengths peaked at that specific direction and decreased with increasing direction difference. If no information was present, reconstruction strengths were comparable across all modeled directions, i.e., the response profile was flat. To integrate response profiles across trials, single trial profiles were aligned to a common center direction (i.e., 180°) and then averaged.
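
      The training and testing steps described above can be illustrated with a minimal sketch. This is not the analysis code used in the study: the von Mises tuning shape, its width (`kappa`), the least-squares estimators, and all function names are assumptions made for illustration; only the 18 channels come from the text.

```python
import numpy as np

N_CHANNELS = 18
CHANNEL_CENTERS = np.arange(N_CHANNELS) * (360.0 / N_CHANNELS)  # 0, 20, ..., 340 deg


def channel_responses(directions_deg, kappa=10.0):
    """Idealized channel responses, shape (n_trials, N_CHANNELS).

    Assumption: von Mises-shaped tuning. The text only states that each
    channel peaks at its preferred direction and falls off gradually.
    """
    directions = np.atleast_1d(np.asarray(directions_deg, dtype=float))
    diff = np.deg2rad(directions[:, None] - CHANNEL_CENTERS[None, :])
    return np.exp(kappa * (np.cos(diff) - 1.0))


def train_iem(train_data, train_directions_deg):
    """Training step: least-squares weights, shape (N_CHANNELS, n_sensors).

    train_data has shape (n_trials, n_sensors), e.g. one 100-ms window.
    """
    design = channel_responses(train_directions_deg)
    weights, *_ = np.linalg.lstsq(design, train_data, rcond=None)
    return weights


def reconstruct(test_data, weights):
    """Testing step: invert the weights to obtain per-trial response
    profiles ('reconstruction strengths'), shape (n_trials, N_CHANNELS)."""
    return test_data @ np.linalg.pinv(weights)
```

      In this sketch, a trial that carries direction information yields a profile that peaks at the channel closest to that direction, whereas pure noise yields a roughly flat profile.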

      To quantify the accuracy of each IEM reconstruction, i.e., how well the response profile represents a specific motion direction relative to all other directions, we computed the ‘reconstruction fidelity’. Fidelity was obtained by projecting the polar vector of the reconstruction at every direction angle (in steps of 1°) onto the common center (180°) and averaging across all direction angles (Rademaker et al., 2019; Sprague, Ester & Serences, 2016). As such, ‘reconstruction fidelity’ is a summary metric with fidelity greater than zero indicating an accurate reconstruction.
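
      One way to read this definition is sketched below. The sketch assumes that the aligned reconstruction is sampled on a 1° grid from 0° to 359° and that "projecting onto the common center" means weighting each reconstruction strength by the cosine of its angular distance from 180°; the function name is hypothetical.

```python
import numpy as np


def reconstruction_fidelity(profile, angles_deg=None, center_deg=180.0):
    """Mean projection of an aligned reconstruction onto the common center.

    profile: reconstruction strength at each direction angle.
    angles_deg: matching direction angles; defaults to 0..359 in 1-degree steps.
    """
    profile = np.asarray(profile, dtype=float)
    if angles_deg is None:
        angles_deg = np.arange(profile.size) * (360.0 / profile.size)
    angles = np.asarray(angles_deg, dtype=float)
    projection = profile * np.cos(np.deg2rad(angles - center_deg))
    return float(projection.mean())
```

      Under this reading, a flat profile gives a fidelity of zero, a profile peaking at 180° gives a positive value, and a profile peaking opposite to the center gives a negative value.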

      How does the model know which direction to reconstruct? Our modelling procedure was informed about the stimulus in question during both the training and the testing step. Specifically, we informed our model during the training step about e.g., the current S2. Then, we fit the model to training data from the S2 epoch and applied it to testing data from the S2 epoch. Crucially, during the testing step the motion direction in question, i.e., current S2, becomes relevant again. For example, when S2 was 120°, the reconstructions were shifted by 60° in order to align with the common center, i.e., 180°. In addition, we also tested whether we could reconstruct the motion direction of S1 during the S2 epoch. Here, we again used the MEG data from the S2 epoch but now for S1 training, i.e., the model was informed about the S1 direction. Accordingly, the recentering step during testing was done with regard to the S1 direction. Similarly, we also reconstructed the motion direction of the previous target (i.e., the previous S1 or S2), e.g., during the S2 epoch.

      Together, the multi-variate pattern of MEG activity across all sensors during the S2 epoch could contain information about the currently presented direction of S2, the direction of the preceding S1 and the direction of the target stimulus from the previous trial (i.e., either previous S1 or previous S2) at the same time. An important exception from this regime was the cross-reconstruction analysis (Appendix 1—figure 2). Here we trained the encoding model on the currently relevant item (S1 during the S1 epoch, S2 during the S2 epoch and the cued item during the retro-cue epoch) of one MEG session and reconstructed the previous target on the other MEG session.

      Finally, to examine shifts of the neural representation, single-trial reconstructions were assigned to two groups, those with a previous target that was oriented clockwise (CW) in relation to the currently relevant item and those with a previous target that was oriented counter-clockwise (CCW). The CCW reconstructions were flipped along the direction space, hence, a negative deviation of the maximum of the reconstruction from 180° indicated an attraction toward the previous target, whereas a positive deviation indicated a repulsion. Those reconstructions were then first averaged within each possible motion direction and then across them to account for different presentation numbers of the directions, resulting in one reconstruction per participant, epoch and time point. To examine systematic shifts, we then tested if the maximum of the reconstruction was systematically different from the common center (180°). For display purposes, we subtracted the reconstructed maximum from 180° to compute the direction shifts. A positive shift thus reflected attraction and a negative shift reflected repulsion.
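
      The flipping and shift computation described above could be sketched as follows. The 1° direction grid, the array layout, and the function name are assumptions for illustration; the sign convention (peak below 180° after flipping means attraction, reported as a positive shift) follows the description in the text.

```python
import numpy as np

ANGLES = np.arange(360)  # direction grid in degrees (assumed 1-degree steps)
CENTER = 180             # common center used for alignment


def direction_shift(profiles, previous_is_ccw, current_directions):
    """Shift (deg) of the averaged reconstruction; positive = attraction.

    profiles: (n_trials, 360), each aligned so the current item sits at 180.
    previous_is_ccw: per trial, True if the previous target was CCW of the
        currently relevant item.
    current_directions: direction label of the current item per trial.
    """
    profiles = np.asarray(profiles, dtype=float)
    ccw = np.asarray(previous_is_ccw, dtype=bool)
    # Mirror CCW trials about the center so that 180 + x maps to 180 - x.
    mirrored = profiles[:, (-ANGLES) % 360]
    aligned = np.where(ccw[:, None], mirrored, profiles)
    # Average within each motion direction first, then across directions.
    labels = np.asarray(current_directions)
    per_direction = [aligned[labels == d].mean(axis=0) for d in np.unique(labels)]
    mean_profile = np.mean(per_direction, axis=0)
    peak = ANGLES[np.argmax(mean_profile)]
    return float(CENTER - peak)
```

      Averaging within each direction before averaging across directions keeps frequently presented directions from dominating the result, as noted in the text.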

      We have updated the Results accordingly.

      References:

      Rademaker RL, Chunharas C, Serences JT (2019) Coexisting representations of sensory and mnemonic information in human visual cortex. Nature Neuroscience. 22: 1336-1344.

      Sprague TC, Ester EF, Serences JT (2016) Restoring Latent Visual Working Memory Representations in Human Cortex. Neuron. 91: 694-707

      (3) Then in the Methods, it would help to provide further detail still about the IEM training/testing procedure. For instance, it's not entirely clear to me whether all the analyses use the same model (i.e., all trained on stimulus encoding) or whether each epoch and timepoint is trained on the corresponding epoch and timepoint from the other session. This speaks to whether the reconstructions reflect a shared stimulus code across different conditions vs. that stimulus information about various previous and current trial items can be extracted if the model is tailored accordingly.

      As reported above, our modeling procedure was informed about the same stimulus during both the training and the testing step, except for the cross-reconstruction analysis.

      Regarding the training and testing data, the model was always trained on data from one session and tested on data from the other session, so that each MEG session once served as the training data set and once as the test data set, hence, training and test data were independent. Importantly, training and testing was always performed in an epoch- and time point-specific way: For example, the model that was trained on the first 100-ms time bin from the S1 epoch of the first MEG session was tested on the first 100-ms time bin from the S1 epoch of the second MEG session.
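      The cross-session scheme can be summarized with a short Python sketch that only enumerates which data train and test each model. The session labels, epoch names, number of time bins, and the function name are placeholders for illustration; fitting the encoding model itself is not shown.

```python
from itertools import permutations


def cross_session_folds(sessions, epochs, n_time_bins):
    """Yield (train_session, test_session, epoch, time_bin) tuples.

    Each session serves once as training set and once as test set, and a
    model trained on one epoch/time bin is tested only on the same
    epoch/time bin of the other session.
    """
    for train, test in permutations(sessions, 2):
        for epoch in epochs:
            for t in range(n_time_bins):
                yield (train, test, epoch, t)
```

      For two sessions this yields two folds per epoch and time bin, so training and test data never come from the same session.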

      Specifically, when you say "aim of the reconstruction" (p. 31, line 699), does that simply mean the reconstruction was centered in that direction (that the same data would go into reconstructing S1 or S2 in a given epoch, and what would differentiate between them is whether the reconstruction was centered to the S1 or S2 direction value)?

      As reported above, during testing the reconstruction was centered at the currently relevant direction. The encoding model was trained with the direction labels of S1, S2 or the target item, corresponding to the currently relevant direction, i.e., S1 in S1 epochs, S2 in S2 epochs and target item (S1 or S2) in the retro-cue epoch. The only exception was the reconstruction of S1 during the S2 epoch. Here the encoding model was trained on the S1 direction, but with data from the S2 epoch and then applied to the S2 epoch data and recentered to the S1 direction. So here, S1 and S2 were indeed trained and tested separately for the same epoch.

      (4) I think training and testing were done separately for each epoch and timepoint, but this could have important implications for interpreting the results. Namely if the models are trained and tested on different time points, and reference directions, then some will be inherently noisier than others (e.g., delay period more so than encoding), and potentially more (or differently) susceptible to bias. For instance, the S1 and S2 epochs show no attractive bias, but they may also be based on more high-fidelity training sets (i.e., encoding), and therefore less susceptible to the bias that is evident in the retrocue epoch.

      Thanks for pointing this out. Training and testing were performed in an epoch- and time point-specific way. Thus, potential differences in the signal-to-noise ratio between different task phases could cause quality differences between the corresponding reconstructed MEG signals. However, we did not observe such differences. Instead, we found comparable time courses of the reconstruction fidelities and the averaged reconstruction strengths between epochs (Figure 2b and 2c, respectively). Fig. 2b, e.g., shows that reconstruction fidelity for motion direction stimuli built up slowly during the stimulus presentation, reaching its maximum only after stimulus offset. This observation may contrast with other stimulus materials that show faster build-ups, such as the orientation of a Gabor.

      We agree with the reviewer that, regardless of the comparable but not perfectly equal reconstruction fidelities, there are good arguments to assume that the neural representation of the stimulus during its encoding is typically less noisy than during its post-encoding processing and that this difference could be one of the reasons why serial dependence emerged in our study only during the retro-cue epoch. However, the argument could also be reversed: a biased representation, which represents a small and hard-to-detect neural effect, might be easier to observe for less noisy data. So, the fact that we found a significant bias only during the potentially “noisier” retro-cue epoch makes the effect even more noteworthy.

      We mentioned the limitation related to our stimulus material already at the end of the Discussion. We have now added a new paragraph to the Discussion to address the two opposing lines of reasoning.  

      (4) I believe the work would benefit from a further effort to reconcile these results with previous findings (i.e., those that showed repulsion, like Sheehan & Serences), potentially through additional analyses. The discussion attributes the difference in findings to the "combination of a retro-cue paradigm with the high temporal resolution of MEG," but it's unclear how that explains why various others observed repulsion (thought to happen quite early) that is not seen at any stage here. In my view, the temporal (as well as spatial) resolution of MEG could be further exploited here to better capture the early vs. late stages of processing. For instance, by separately examining earlier vs. later time points (instead of averaging across all of them), or by identifying and analyzing data in the sensors that might capture early vs. late stages of processing. Indeed, the S1 and S2 reconstructions show subtle repulsion, which might be magnified at earlier time points but then shift (toward attraction) at later time points, thereby counteracting any effect. Likewise, the S1 reconstruction becomes biased during the S2 epoch, consistent with previous observations that the SD effects grow across a WM delay. Maybe both S1 and S2 would show an attractive bias emerging during the later (delay) portion of their corresponding epoch? As is, the data nicely show that an attractive bias can be detected in the retrocue period activity, but they could still yield further specificity about when and where that bias emerges.

      We are grateful for this suggestion. Before going into detail, we would like to explain our motivation for choosing the present analysis approach that included averaging time points within an epoch of interest.

      Our aim was to detect a neuronal signature of serial dependence, which is manifested as an attractive shift of about 3.5° within the 360° direction space. To be able to detect such a small effect in the neural data, and given the limited resolution of the reconstruction method and the noisy MEG signals, we needed to maximize the signal-to-noise ratio. A common way to achieve this is to average data points. In our study we asked subjects to perform 1022 trials, down-sampled the MEG data from the recorded sampling rate of 1200 Hz to 10 Hz (one data point per 100 ms) for the estimation of reconstruction fidelity, and calculated the final neural shift estimates by averaging across time points that showed a robust reconstruction fidelity and thus represented interpretable data points.

      Our procedure to maximize the signal-to-noise ratio was successful as we were able to reliably reconstruct the presented and remembered motion direction in all epochs (Figure 1a and 1b in the manuscript). However, the reconstruction did not work equally well for all time points within each epoch. In particular, there were time points with a non-significant reconstruction fidelity. In consequence, for the much smaller neural shift effect we did not expect to observe reliable time-resolved results, i.e., when considering each time point separately. Instead, we used the reconstruction results to define the time window in order to calculate the neural shift, i.e., we averaged across all time points with a significant reconstruction fidelity.

      Author response image 1 depicts the neural shift separately for each time point during the retro-cue epoch. Importantly, the gray parts of the time courses indicate time points where the reconstruction of the presented or cued stimulus was not significant. This means that the reconstructed maxima at those time points were very variable/unreliable and therefore the neural shifts were hardly interpretable.

      Author response image 1.

      Time courses of the reconstruction shift reveal a tendency for an attractive bias during the retrocue phase. Time courses of the neural shift separately for each time point during the S1 (left panel), S2 (middle panel) and retro-cue epochs (right panel). Gray lines indicate time points with non-significant reconstruction fidelities and therefore very variable and non-interpretable neural reconstruction shifts. The colored parts of the lines correspond to the time periods of significant reconstruction fidelities with interpretable reconstruction shifts. Error bars indicate the middle 95% of the resampling distribution. Time points with less than 5% (equaling p < .05) of the resampling distribution below 0° are indicated by a colored circle. N = 10.

      First, the time courses in Author response image 1 show that the neural bias varied considerably between subjects at given time points, as revealed by the resampling distributions. In this resampling procedure, we drew 10 participants with replacement in each of 10,000 iterations and calculated the reconstruction shift based on the mean reconstruction of the resampled participants. The observed variability stresses the necessity to average the values across all time points that showed a significant reconstruction fidelity to increase the signal-to-noise ratio.
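      The resampling procedure can be sketched in Python as follows. This is an illustration under stated assumptions rather than the analysis code: each participant contributes one aligned reconstruction (a list of values over equally spaced direction bins starting at 0°, centered on 180°), and the function names, bin layout, and seed are hypothetical.

```python
import random


def peak_shift(mean_recon, bin_width_deg):
    """180 deg minus the peak location; positive values mean attraction."""
    peak = max(range(len(mean_recon)), key=mean_recon.__getitem__)
    return 180.0 - peak * bin_width_deg


def bootstrap_shift(recons, bin_width_deg, n_iter=10000, seed=0):
    """Resample participants with replacement and summarize the shifts.

    Returns the bounds of the middle 95% of the resampling distribution
    and the fraction of resampled shifts below 0 deg.
    """
    rng = random.Random(seed)
    n = len(recons)
    shifts = []
    for _ in range(n_iter):
        # Draw n participants with replacement, then average their reconstructions.
        sample = [recons[rng.randrange(n)] for _ in range(n)]
        mean = [sum(col) / n for col in zip(*sample)]
        shifts.append(peak_shift(mean, bin_width_deg))
    shifts.sort()
    lower = shifts[int(0.025 * n_iter)]
    upper = shifts[int(0.975 * n_iter) - 1]
    frac_below_zero = sum(s < 0 for s in shifts) / n_iter
    return lower, upper, frac_below_zero
```

      Under this sketch, a time point would be marked as showing an attractive shift when `frac_below_zero` is below 0.05, matching the criterion given in the caption of Author response image 1.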

      Second, despite this high variability/low signal-to-noise ratio, Author response image 1 (right panel) shows that our choice of this procedure was sensible, as it revealed a clear tendency toward an attractive shift at almost all time points from 300 through 1500 ms after retro-cue onset, with only a few individual time points showing a significant effect (uncorrected for multiple comparisons). It is worth mentioning that this time course did not overlap with the time course of previous target cross-reconstruction (Appendix 1—figure 2, right panel), as there was no significant target cross-reconstruction during the retro-cue epoch with an almost flat profile around zero. Also, there was no overlap with previous target decoding in the retro-cue epoch (Figure 5 in the manuscript). Here, the previous target was reactivated significantly only at early time points of 200 and 300 ms post cue onset (i.e., at time points with a non-significant reconstruction fidelity and therefore no interpretable neural shift), while the nominally highest values of the attractive neural shift were visible at later time points that also showed a significant reconstruction fidelity (Figure 2b in the manuscript).

      Third, Author response image 1 (left and middle panel) shows the time courses of the neural shift during the S1 and S2 epochs. While no neural shift could be observed for S1, during the S2 epoch the time-resolved analysis indicated an initial attractive shift followed by a (nonsignificant) tendency for a repulsive shift. After averaging neural shifts across time points with a significant reconstruction fidelity, there was no significant effect with an overall tendency for repulsion, as reported in the paper. The attractive part of the neural shift during the S2 epoch was nominally strongest at very early time points (at 100-300 ms after S2 onset) and overlapped perfectly with the reactivation of the previous target as shown by the cross-reconstruction analysis (Appendix 1—figure 2, middle panel). This overlap suggests that the neural attractive shift did not reflect an actual bias of the early S2 representation, but rather a consequence of the concurrent reactivation of the previous target in the same neural code as the current representation. Finally, this neural attractive shift during S2 presentation did not correlate with the behavioral error (single trial-wise correlation: no significant time points during S2 epoch) or the behavioral bias (subject-wise correlation). In contrast, for the retro-cue epoch, we observed a significant correlation between the neural attractive shift and behavior.

      Together, the time-resolved results show a clear tendency for an attractive neural bias during the retro-cue phase, thus supporting our interpretation that the attractive shift during the retro-cue phase reflects a direct neuronal signature of serial dependence. However, these additional analyses also demonstrated a large variability between participants and across time points, warranting a cautious interpretation. We conclude that our initial approach of averaging across time points was an appropriate way of reducing the high level of noise in the data and revealed the reported significant and robust attractive neural shift in the retro-cue phase.

      (5) A few other potentially interesting (but inessential) considerations: A benchmark property of serial dependence is its feature-specificity, in that the attractive bias occurs only between current and previous stimuli that are within a certain range of similarity to each other in feature space. I would be very curious to see if the neural reconstructions manifest this principle - for instance, if one were to plot the trialwise reconstruction deviation from 0, across the full space of current-previous trial distances, as in the behavioral data. Likewise, something that is not captured by the DoG fitting approach, but which this dataset may be in a position to inform, is the commonly observed (but little understood) repulsive effect that appears when current and previous stimuli are quite distinct from each other. As in, Figure 1b shows an attractive bias for direction differences around 30 degrees, but a repulsive one for differences around 170 degrees - is there a corresponding neural signature for this component of the behavior?

      We appreciate the reviewer's idea to split the data. However, given that our results strongly relied on the inclusion of all data points, i.e., including all distances in motion direction between the current S1, S2 or target and the previous target and requiring data averaging, we are concerned that our study was vastly underpowered to be able to inform whether the attractive bias occurs only within a certain range of inter-stimulus similarity. To address this important question, future studies would require neural measurements with a much higher signal-to-noise ratio than the present MEG recordings with two sessions per participant and 1022 trials in total.

      Reviewer #2 (Public Review):

      Summary:

      The study aims to probe the neural correlates of visual serial dependence - the phenomenon that estimates of a visual feature (here motion direction) are attracted towards the recent history of encoded and reported stimuli. The authors utilize an established retro-cue working memory task together with magnetoencephalography, which makes it possible to probe neural representations of motion direction during the encoding and retrieval (retro-cue) periods of each trial. The main finding is that neural representations of motion direction are not systematically biased during the encoding of motion stimuli, but are attracted towards the motion direction of the previous trial's target during the retrieval (retro-cue) period, just prior to the behavioral response. By demonstrating a neural signature of attractive biases in working memory representations, which align with attractive behavioral biases, this study highlights the importance of post-encoding memory processes in visual serial dependence.

      Strengths:

      The main strength of the study is its elegant use of a retro-cue working memory task together with high temporal resolution MEG, making it possible to probe neural representations related to stimulus encoding and working memory. The behavioral task elicits robust behavioral serial dependence and replicates previous behavioral findings by the same research group. The careful neural decoding analysis benefits from a large number of trials per participant, considering the slow-paced nature of the working memory paradigm. This is crucial in a paradigm with considerable trial-by-trial behavioral variability (serial dependence biases are typically small, relative to the overall variability in response errors). While the current study is broadly consistent with previous studies showing that attractive biases in neural responses are absent during stimulus encoding (previous studies reported repulsive biases), to my knowledge it is the first study showing attractive biases in current stimulus representations during working memory. The study also connects to previous literature showing reactivations of previous stimulus representations, although the link between reactivations and biases remains somewhat vague in the current manuscript. Together, the study reveals an interesting avenue for future studies investigating the neural basis of visual serial dependence.

      Weaknesses:

      (1) The main weakness of the current manuscript is that the authors could have done more analyses to address the concern that their neural decoding results are driven by signals related to eye movements. The authors show that participants' gaze position systematically depended on the current stimuli's motion directions, which together with previous studies on eye movement-related confounds in neural decoding justifies such a concern. The authors seek to rule out this confound by showing that the consistency of stimulus-dependent gaze position does not correlate with (a) the neural reconstruction fidelity and (b) the repulsive shift in reconstructed motion direction. However, both of these controls do not directly address the concern. If I understand correctly the metric quantifying the consistency of stimulus-dependent gaze position (Figure S3a) only considers gaze angle and not gaze amplitude. Furthermore, it does not consider gaze position as a function of continuous motion direction, but instead treats motion directions as categorical variables. Therefore, assuming an eye movement confound, it is unclear whether the gaze consistency metric should strongly correlate with neural reconstruction fidelity, or whether there are other features of eye movements (e.g., amplitude differences across participants, and tuning of gaze in the continuous space of motion directions) which would impact the relationship with neural decoding. Moreover, it is unclear whether the consistency metric, which does not consider history dependencies in eye movements, should correlate with attractive history biases in neural decoding. 
      It would be more straightforward if the authors would attempt to (a) directly decode stimulus motion direction from x-y gaze coordinates and relate this decoding performance to neural reconstruction fidelity, and (b) investigate whether gaze coordinates themselves are history-dependent and are attracted to the average gaze position associated with the previous trials' target stimulus. If the authors could show that (b) is not the case, I would be much more convinced that their main finding is not driven by eye movement confounds.

      The reviewer is correct that our eye-movement analysis approach considered gaze angle (direction) and not gaze amplitude. We considered gaze direction to be the more important feature to control for when investigating the neural basis of serial dependence that manifests, given the stimulus material used in our study, as a shift/deviation of angle/direction of a representation towards the previous target motion direction. To directly relate gaze direction and MEG data to each other, we matched the temporal resolution of the eye-tracking data to that of the MEG data. Specifically, our analysis procedure of gaze direction provided a measure indicating the extent to which the variance of the gaze directions was reduced compared with random gaze direction patterns, in relation to the specific stimulus direction within each 100 ms time bin. Importantly, this procedure revealed not only systematic gaze directions that were in accordance with the stimulus direction or the opposite direction, but also picked up all stimulus-related gaze directions, even if the relation differed across participants or time.

      Our analysis approach was highly sensitive to detect stimulus-related gaze directions during all task phases (Appendix 1—figure 3). As expected, we found systematic gaze directions when S1 and S2 were presented on the screen, and they were reduced thereafter, indicating a clear relationship between stimulus presentation and eye movement. Systematic gaze directions were also present in the retro-cue phase where no motion direction was presented. Here they showed a clearly different temporal dynamic as compared to the S1 and S2 phases. They appeared at later time points and with a higher variability between participants, indicating that they coincided with retrieving the target motion direction from working memory.

      To relate gaze directions to MEG results, we calculated Spearman rank correlations. We found that there was no systematic relationship at any time point between the stimulus-related reconstruction fidelity and the amount of stimulus-related gaze direction. Moreover, the correlation varied strongly from time point to time point, revealing its random nature. In addition to the lack of significant correlations, we observed clearly distinct temporal profiles for gaze direction (Appendix 1—figure 3a and Appendix 1—figure 3b) and the reconstruction fidelities (Figure 2b in the manuscript, Appendix 1—figure 3c), in particular in the critical retro-cue phase.
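      For readers unfamiliar with the statistic, a Spearman rank correlation at a single time point can be sketched in plain Python as below. The inputs would be one value per participant for each measure (e.g., reconstruction fidelity and the gaze-direction measure); the function names are hypothetical, no study data are reproduced, and significance testing is not shown.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)
```

      Applying `spearman` separately to each 100 ms time bin would give a time course of correlation coefficients of the kind described above.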

      We favored this analysis approach over one that directly decoded stimulus motion direction from x-y gaze coordinates, as we considered it hardly feasible to compute an inverted encoding model with only two eye-tracker channels as an input (in comparison to 271 MEG sensors), and to our knowledge, this has not been done before. Other decoding methods have previously been applied to x-y gaze coordinates. However, in contrast to the inverted encoding model, they did not provide a measure of the representation shift which would be crucial for our investigation of serial dependence.

      We appreciate the suggestion to conduct additional analyses on eye tracking data (including different temporal and spatial resolution and different features) and their relation to MEG data. However, the first author, who ran all the analyses, has in the meantime left academia. Unfortunately, we currently do not have sufficient resources to perform additional analyses.

      While the presented eye movement control analysis makes us confident that our MEG finding was not crucially driven by stimulus-related gaze directions, we agree with the reviewer that we cannot completely exclude that other eye movement-related features could have contributed to our MEG findings. However, we would like to stress that whatever the main source of the observed MEG effect was (shift of the neuronal stimulus representation, (other) features of gaze movement, or shift of the neuronal stimulus representation that leads to systematic gaze movement), our study still provided clear evidence that serial dependence emerged at a later post-encoding stage of object processing in working memory. This central finding of our study is hard to observe with behavioral measures alone and is not affected by the possible effects of eye movements.

      We have slightly modified our conclusion in the Results and Appendix 1. Please see also our response to comment 1 from reviewer 3.

      (2) I am not convinced by the across-participant correlation between attractive biases in neural representations and attractive behavioral biases in estimation reports. One would expect a correlation with the behavioral bias amplitude, which is not borne out. Instead, there is a correlation with behavioral bias width, but no explanation of how bias width should relate to the bias in neural representations. The authors could be more explicit in their arguments about how these metrics would be functionally related, and why there is no correlation with behavioral bias amplitude.

      We are grateful for this suggestion. We correlated the individual neuronal shift with the two individual parameter fits of the behavior shift, i.e., amplitude (a) and tuning width (w). We found a significant correlation between the individual neural bias and the w parameter (r = .70, p = .0246) but not with the a parameter (r = -.35, p = .3258) during the retro-cue period (Appendix 1—figure 1). This indicates that a broader tuning width of the individual bias (as reflected by a smaller w parameter) was associated with a stronger individual neural attraction.

      It is important to note that for the calculation of the neural shift, all trials entered the analysis to increase the signal-to-noise ratio, i.e., it included many trials where current and previous targets were separated by, e.g., 100° or more. These trials were unlikely to produce serial dependence. Subjects with a more broadly tuned serial dependence had more interitem differences that showed a behavioral attraction and therefore more trials affected by serial dependence that entered the calculation of the neural shift. In contrast, individual differences in the amplitude (a) parameter were most likely too small, and higher individual amplitude did not involve more trials as compared to smaller amplitude to affect the neural bias in a way to be observed in a significant correlation.

      We have added this explanation to Appendix 1.  

      (3) The sample size (n = 10) is definitely at the lower end of sample sizes in this field. The authors collected two sessions per participant, which partly alleviates the concern. However, given that serial dependencies can be very variable across participants, I believe that future studies should aim for larger sample sizes.

      We want to express our appreciation for raising this issue. We apologize that we did not explicitly explain and justify the choice of the sample size used in our paper, in particular as we had in fact performed a formal a-priori power analysis.

      At the time of the sample size calculation, there were no comparable EEG or MEG studies to inform our power calculation. Thus, we based our calculation merely on the behavioral effect reported in the literature and, in particular, observed in a behavioral study from our lab that included four different experiments with overall more than 100 participants with 1632 trials each (see Fischer et al., 2020), in which the behavioral serial dependence effect (target vs. nontarget) was very robust. Based on the contrast between target and non-target with an effect size of 1.359 in Experiment 1, a power analysis with 80% desired power led to a small, estimated sample size of 6 subjects.

      However, we expected that the detection of the neural signature of this effect would require more participants. Therefore, we based our power calculation on a much smaller behavioral effect, i.e. the modulation of serial dependence by the context-feature congruency that we observed in our previous study (Fischer et al., 2020). In particular, we focused on Experiment 1 of the previous study that used color as the feature for retro-cueing, as we planned to use exactly the same paradigm for the MEG study. In contrast to the serial dependence effect, its modulation by color resulted in a more conservative power estimate: Based on an effect size of 0.856 in that experiment, a sample size of n = 10 should yield a power of 80% with two MEG sessions per subject.

      At the time when we conducted our study, two other studies were published that investigated serial dependence on the neural level. Both studies included a smaller number of data points than our study: Sheehan & Serences (2022) recorded about 840 trials in each of 6 participants, resulting in fewer data points both on the participant and on the trial level. Hajonides et al. (2023) measured 20 participants with 400 trials each, again resulting in fewer data points than our study (10 participants with 1022 trials each). Taken together, our a-priori sample size estimation resulted in power comparable to, if not higher than, that of similar studies, making us confident that the estimated sample was sufficient to yield reliable results.

      We have now included this description and the results of this power analysis in the Materials and Methods section.

      Despite this, we fully agree with the reviewer that our study would profit from higher power. With the knowledge of the results from this study, future projects should attempt to substantially increase the signal-to-noise ratio, in particular by increasing the number of trials, in order to observe, e.g., robust time-resolved effects (see our responses to Reviewer #1).

      References:

      Fischer C, Czoschke S, Peters B, Rahm B, Kaiser J, Bledowski C (2020) Context information supports serial dependence of multiple visual objects across memory episodes. Nature Communications 11: 1932.

      Sheehan TC, Serences JT (2022) Attractive serial dependence overcomes repulsive neuronal adaptation. PLOS Biology 20: e3001711.

      Hajonides JE, Van Ede F, Stokes MG, Nobre AC, Myers NE (2023) Multiple and Dissociable Effects of Sensory History on Working-Memory Performance. Journal of Neuroscience 43: 2730–2740.

      (4) It would have been great to see an analysis in source space. As the authors mention in their introduction, different brain areas, such as PPC, mPFC, and dlPFC have been implicated in serial biases. This begs the question of which brain areas contribute to the serial dependencies observed in the current study. For instance, it would be interesting to see whether attractive shifts in current representations and pre-stimulus reactivations of previous stimuli are evident in the same or different brain areas.

      We appreciate this suggestion. As mentioned above, we currently do not have sufficient resources to perform an MEG source analysis.

      Reviewer #3 (Public Review):

      Summary:

      This study identifies the neural source of serial dependence in visual working memory, i.e., the phenomenon that recall from visual working memory is biased towards recently remembered but currently irrelevant stimuli. Whether this bias has a perceptual or post-perceptual origin has been debated for years - the distinction is important because of its implications for the neural mechanism and ecological purpose of serial dependence. However, this is the first study to provide solid evidence based on human neuroimaging that identifies a post-perceptual memory maintenance stage as the source of the bias. The authors used multivariate pattern analysis of magnetoencephalography (MEG) data while observers remembered the direction of two moving dot stimuli. After one of the two stimuli was cued for recall, decoding of the cued motion direction re-emerged, but with a bias towards the motion direction cued on the previous trial. By contrast, decoding of the stimuli during the perceptual stage was not biased.

      Strengths:

      The strengths of the paper are its design, which uses a retrospective cue to clearly distinguish the perceptual/encoding stage from the post-perceptual/maintenance stage, and the rigour of the careful and well-powered analysis. The study benefits from high within-participant power through the use of sensitive MEG recordings (compared to the more common EEG), and the decoding and neural bias analysis are done with care and sophistication, with appropriate controls to rule out confounds.

      Weaknesses:

      A minor weakness of the study is the remaining (but slight) possibility of an eye movement confound. A control analysis shows that participants make systematic eye movements that are aligned with the remembered motion direction during both the encoding and maintenance phases of the task. The authors go some way to show that this eye gaze bias seems unrelated to the decoding of MEG data, but in my opinion do not rule it out conclusively. They merely show that the strength of the gaze bias and the strength of MEG-based decoding/neural bias are uncorrelated across the 10 participants. Therefore, this argument seems to rest on a null result from an underpowered analysis.

      Our MEG as well as eye-movement analyses were sensitive enough to robustly pick up stimulus-related effects, both for presented and remembered motion directions. When relating both signals to each other by correlating MEG reconstruction strength with gaze direction, we found a null effect, as pointed out by the reviewer. Importantly, there was also a null effect when the shift of the reconstruction (representing our main finding) was correlated with gaze direction. Furthermore, an examination of the individual time courses of gaze direction and individual MEG reconstruction strength revealed that the lack of a relationship between MEG and gaze data did not rest on a singular observation but was present across all time points. Moreover, the temporal profile of the correlation varied strongly from time point to time point, revealing its random nature and indicating that there was no hint of a pattern that just failed to reach significance. Taking these observations together, our MEG findings were unlikely to be explained by eye position.

      Nevertheless, we agree with the reviewer that there is a general problem of interpreting a null effect with a limited number of observations (and an analysis approach that focused on one out of many possible features of the gaze movement). Thus, we admit that there is a (slight) possibility that eye movements contributed to the observed MEG effects. This possibility, however, did not affect our novel finding that serial dependence occurred during the post-encoding stage of object processing in working memory.

      Please see also our response to point 1 from reviewer 2.

      Impact:

      This important study contributes to the debate on serial dependence with solid evidence that biased neural representations emerge only at a relatively late post-perceptual stage, in contrast to previous behavioural studies. This finding is of broad relevance to the study of working memory, perception, and decision-making by providing key experimental evidence favouring one class of computational models of how stimulus history affects the processing of the current environment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor concerns:

      The significance statement opens "Our perception is biased towards sensory input from the recent past." This is a semantic point, but it seems a somewhat odd statement, given there is so much debate about whether serial dependence is perceptual vs. decisional, and that the current work indeed claims that it emerges at a late, post-encoding stage.

      Thank you for this point. We agree. “Visual cognition is biased towards sensory input from the recent past.” would be a more appropriate statement. According to the Journal's guidelines, however, the paragraph with the Significance Statement will not be included in the final manuscript.

      It would be preferable for data and code to be available at review so that reviewers might verify some procedural points for clarity.

      Code and preprocessed data used for the presented analyses are now available on OSF via http://osf.io/yjc93/. Due to storage limitations, only the preprocessed MEG data for the main IEM analyses focusing on the current direction are uploaded. For access to additional data, please contact the authors.

      For instance, I could use some clarification on the trial sequence. The methods first say the direction was selected randomly, but then later say each direction occurred equally often, and there were restrictions on the relationships between current and previous trial items. So it seems it couldn't have truly been random direction selection - was the order selected randomly from a predetermined set of possibilities?

      For the S1/S2 stimuli in a trial, the dots moved fully coherently in a direction randomly drawn from a pool of directions between 5° and 355° spaced 10° from one another, therefore avoiding cardinal directions. Across trials, there was a predetermined set of possible differences in motion direction between the current and the previous target. This set included 18 motion direction differences, ranging from -170° to 180°, in steps of 10°. Trial sequences were balanced in a way that each of these differences occurred equally often during an MEG session.
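
      The direction pool and the signed difference between the current and previous target can be sketched as follows. This is a minimal illustration assuming only what is stated above (directions from 5° to 355° in 10° steps, differences expressed between -170° and 180°); the names `DIRECTION_POOL`, `signed_direction_difference`, and `draw_direction` are hypothetical, and the sketch does not implement the balancing of trial sequences.

```python
import random

# Pool described above: 5, 15, ..., 355 degrees (10-degree spacing),
# which by construction contains no cardinal direction (0, 90, 180, 270).
DIRECTION_POOL = list(range(5, 360, 10))


def signed_direction_difference(current_deg, previous_deg):
    """Return current minus previous, wrapped to the interval (-180, 180] degrees."""
    diff = (current_deg - previous_deg) % 360  # value in [0, 360)
    return diff - 360 if diff > 180 else diff


def draw_direction(rng):
    """Draw one motion direction uniformly from the pool (hypothetical helper)."""
    return rng.choice(DIRECTION_POOL)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the illustration is repeatable
    previous, current = draw_direction(rng), draw_direction(rng)
    print(previous, current, signed_direction_difference(current, previous))
```

      Because every direction in the pool is offset by 5° from a multiple of 10°, every pairwise difference is itself a multiple of 10°, which is consistent with the step size described above.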

      I could also use some additional assurance the sample size (participants or data points) is sufficient for the analysis approach deployed here.

      We performed a formal a priori power analysis to justify our choice of sample size. Please see our response to reviewer 2, point 3, where we explained the procedure of the a priori power analysis in detail. We have now included this description and the results of this power analysis in the Materials and Methods.

      Did you consider a decoding approach, instead of reconstruction, to test what information predominates the signal, in an unbiased way?

      Thank you for this argument. We believe that our analysis approach based on the inverted encoding model is unbiased, since we first reconstructed whether the MEG signal contained information about the presented and remembered motion direction. Only in the next step did we test whether this reconstructed signal showed an offset and, if so, whether this offset was biased towards or away from the previous target. A decoding approach aims to answer classification questions and is not suitable to reveal the actual shifts of the neural information. In our study, we could decode, e.g., the current direction or the previous target, but this would not answer the question of whether and at which stage of object processing the current representation was biased towards the past. Moreover, in a decoding approach to reveal which information predominates in the signal, we would have to classify different options (e.g. current information vs previous), thereby biasing the possible set of results more than in our chosen analysis.

      I think the claim of a "direct" neural signature may come off as an overstatement when the spatial and temporal aspects of the attractive bias are still so coarsely specified here.

      Thank you for pointing this out. We agree that the term “direct neural signature” can be seen as an overstatement when it is interpreted to indicate a narrowly defined activity of a brain region (ideally via “direct” invasive recordings) that reflects serial dependence. Our definition of the term “direct” referred to the observation of an attractive shift in a neural representation of the current target motion direction item towards the previous target. This was in contrast to previous “indirect” evidence for the neural basis of serial dependence based on either repulsive shifts of neural representations that were opposite to the attractive bias in behavior or on a reactivation of previous information in the current trial without presenting evidence for the actual neural shift. With this definition in mind, we consider the title of our study a valid description of our findings.

      Reviewer #2 (Recommendations For The Authors):

      I was wondering why the authors chose a bootstrap test for their neural bias analysis instead of a permutation test, similar to the one they used for their behavioral analysis. As far as I know, bootstrap tests do not provide guaranteed type-1 error rate control. The procedure for the permutation test would be quite straightforward here, randomly permuting the sign of each participant's neural shift and recording the group-average shift in a permutation distribution. This test seems more adequate and more consistent with the behavioral analysis.
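
      The sign-flip procedure described in this comment could look roughly like the sketch below. It is an illustration only, written with the Python standard library; the function name `sign_flip_permutation_test`, the number of permutations, the one-sided direction of the test, and the example values are assumptions and are not taken from the manuscript.

```python
import random
from statistics import fmean


def sign_flip_permutation_test(shifts, n_permutations=10000, seed=0):
    """One-sided sign-flip permutation test of the group-average shift against zero.

    shifts: one neural shift per participant (e.g. in degrees).
    Returns the observed group mean and the permutation p-value.
    """
    rng = random.Random(seed)
    observed = fmean(shifts)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the sign of each participant's shift is arbitrary.
        permuted = [s if rng.random() < 0.5 else -s for s in shifts]
        if fmean(permuted) >= observed:
            at_least_as_large += 1
    # The +1 terms count the observed labelling as one member of the distribution.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical per-participant shifts, for illustration only.
    example_shifts = [12.0, 8.5, -3.0, 15.0, 9.0, 4.5, 11.0, -1.5, 7.0, 10.0]
    print(sign_flip_permutation_test(example_shifts))
```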

      Thank you for this comment. We adapted a resampling approach (bootstrapping) that was similar to that by Ester et al. (2020), who also investigated categorical biases and also applied a reconstruction method (Inverted Encoding Model) to assess the significance of a bias of the reconstructed orientation against zero in a certain direction. The bootstrapping method relied on a) detecting an offset against zero and b) evaluating the robustness of the observed effect across participants. In contrast, a permutation approach, as suggested by the reviewer, assesses whether an empirical neural shift is more extreme than the permutation distribution. The permutation approach seems more suited to assess the magnitude of the shift, which in our study was not a priority. Therefore, we reasoned that bootstrapping was better suited for our inferential statistics, which aimed to assess the direction of the neural shift and its robustness across participants.
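
      For readers unfamiliar with this kind of resampling, one common way to test the direction of a shift against zero is sketched below. This is not the analysis code of the study, and the exact procedure following Ester et al. (2020) is not specified in this response; resampling participants with replacement, the function name `bootstrap_direction_test`, the number of resamples, and the example values are all assumptions made for illustration.

```python
import random
from statistics import fmean


def bootstrap_direction_test(shifts, n_resamples=10000, seed=0):
    """Proportion of bootstrap group means that are not in the positive direction.

    shifts: one neural shift per participant; positive values are taken to mean
    a shift towards the previous target (an assumed sign convention).
    """
    rng = random.Random(seed)
    n = len(shifts)
    not_positive = 0
    for _ in range(n_resamples):
        # Resample participants with replacement and average their shifts.
        resample = rng.choices(shifts, k=n)
        if fmean(resample) <= 0:
            not_positive += 1
    return not_positive / n_resamples


if __name__ == "__main__":
    # Hypothetical per-participant shifts, for illustration only.
    example_shifts = [12.0, 8.5, -3.0, 15.0, 9.0, 4.5, 11.0, -1.5, 7.0, 10.0]
    print(bootstrap_direction_test(example_shifts))
```

      A small returned proportion indicates that the group-average shift is positive in nearly all resamples, i.e. that its direction is robust across participants.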

      We have added this additional information to the Materials and Methods:

      References:

      Ester EF, Sprague TC, Serences JT (2020) Categorical biases in human occipitoparietal cortex. Journal of Neuroscience 40:917–931.

      The manuscript could be improved by more clearly spelling how the training and testing data were labelled, particularly for the reactivation analyses. If I understood correctly, in the first reactivation analysis the authors train and test on current trial data, but label both training and testing data according to the previous trial's motion direction. In the second analysis, they label the training data according to the current motion direction, but label the testing data according to the previous motion direction. Is that correct?

      Yes, this is correct. Please see also our response to reviewer 1, points 2 and 3, for a detailed description.

      I was surprised to see that the shift in the reconstructed direction is about three times larger than the behavioral attraction bias. Would one not expect these to be comparable in magnitude? It would be helpful to address and discuss this in the discussion section.

      Thank you for pointing this out. We agree with the reviewer that, as both measures are expressed in the same unit (degrees of angle), one would expect their magnitudes to be directly comparable. However, we speculate that these magnitudes inform only about the direction of the bias and their significant difference from zero; thus, they operate on different scales and are not directly comparable. For example, Hallenbeck et al. (2022) showed that fMRI-based reconstructed orientation bias and behavioral bias correlated on both individual and group level, despite strong magnitude differences. This is in line with our observation and supports the speculation that the magnitudes of neural and behavioral biases operate on different scales and, thus, are not directly comparable.

      We have updated the Discussion accordingly.

      References:

      Hallenbeck GE, Sprague TC, Rahmati M, Sreenivasan KK, Curtis CE (2022) Working memory representations in visual cortex mediate distraction effects. Nature Communications 12:471.

      Reviewer #3 (Recommendations For The Authors):

      (1) It may be worth showing that the gaze bias towards the current/cued stimulus is not biased towards the previous target. One option might be to run the same analysis pipeline used for the MEG decoding but on the eye-tracking data. Another could be to remove all participants with significant gaze bias, but given the small sample size, this might not be feasible.

      We appreciate this suggestion. However, as mentioned above, we currently do not have sufficient resources to conduct additional analyses on the eye tracking data.

      (2) Minor typo: Figure 3c - bias should be 11.7º, not -11.7º.

      Corrected. Thank you!

      Note on data/code availability: The authors state that preprocessed data and analysis code will be made available on publication, but are not available yet.

      Code and preprocessed data used for the present analyses are now available on OSF via http://osf.io/yjc93/. Due to storage limitations, only the preprocessed MEG data for the main IEM analyses focusing on the current direction are uploaded. For access to additional data, please contact the authors.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Participants in this study completed three visits. In the first, participants received experimental thermal stimulations which were calibrated to elicit three specific pain responses (30, 50, 70) on a 0-100 visual analogue scale (VAS). Experimental pressure stimulations were also calibrated to elicit the same three pain intensity responses. In the subsequent two visits, participants completed another pre-calibration check (Visit 2 of 3 only). Then, prior to the exercise NALOXONE or a SALINE placebo-control was administered intravenously. Participants then completed 1 of 4 blocks of HIGH (100%) or LOW (55%) intensity cycling which was tailored according to a functional threshold power (FTP) test completed in Visit 1. After each block of cycling lasting 10 minutes, participants entered an MRI scanner and were stimulated with the same thermal and pressure stimulations that corresponded to 30, 50, and 70 pain intensity ratings from the calibration stage. Therefore, this study ultimately sought to investigate whether aerobic exercise does indeed incur a hypoalgesia effect. More specifically, researchers tested the validity of the proposed endogenous pain modulation mechanism. Further investigation into whether the intensity of exercise had an effect on pain and the neurological activation of pain-related brain centres was also explored.

      Results show that in the experimental visits (Visits 2 and 3), participants exercised at two distinct intensities as intended. Power output, heart rate, and perceived effort ratings were higher during the HIGH versus LOW-intensity cycling. In particular, HIGH intensity exercise was perceived as "hard" / ~15 on the Borg (1974, 1998) scale, whereas LOW intensity exercise was perceived as "very light" / ~9 on the same scale.

      The fMRI data from Figure 1 indicates that the anterior insula, dorsal posterior insula, and middle cingulate cortex show pronounced activation as stimulation intensity and subsequent pain responses increased, thus linking these brain regions with pain intensity and corroborating what many studies have shown before.

      Results also showed that participants rated a higher pain intensity in the NALOXONE condition at all three stimulation intensities compared to the SALINE condition. Therefore, the expected effect of NALOXONE in this study seemed to occur whereby opioid receptors were "blocked" and thus resulted in higher pain ratings compared to a SALINE condition where opioid receptors were "not blocked". When accounting for participant sex, NALOXONE had negligible effects at lower experimental nociceptive stimulations for females compared to males who showed a hyperalgesia effect to NALOXONE at all stimulation intensities (peak effect at 50 VAS). Females did show a hyperalgesia effect at stimulation intensities corresponding to 50 and 70 VAS pain ratings. The fMRI data showed that the periaqueductal gray (PAG) showed increased activation in the NALOXONE versus SALINE condition at higher thermal stimulation intensities. The PAG is well-linked to endogenous pain modulation.

      When assessing the effects of NALOXONE and SALINE after exercise, results showed no significant differences in subsequent pain intensity ratings.

      When assessing the effect of aerobic exercise intensity on subsequent pain intensity ratings, authors suggested that aerobic exercise in the form of a continuous cycling exercise tailored to an individual's FTP is not effective at eliciting an exercise-induced hypoalgesia response irrespective of exercise intensity. This is because results showed that pain responses did not differ significantly between HIGH and LOW intensity exercise with (NALOXONE) and without (SALINE) an opioid antagonist. Therefore, authors have also questioned the mechanisms (endogenous opioids) behind this effect.

      Strengths:

      Altogether, the paper is a great piece of work that has provided some truly useful insight into the neurological and perceptual mechanisms associated with pain and exercise-induced hypoalgesia. The authors have gone to great lengths to delve into their research question(s) and their methodological approach is relatively sound. The study has incorporated effective pseudo-randomisation and conducted a rigorous set of statistical analyses to account for as many confounds as possible. I will particularly credit the authors on their analysis which explores the impact of sex and female participants' stage of menses on the study outcomes. It would be particularly interesting for future work to pursue some of these lines of research which investigate the differences in the endogenous opioid mechanism between sexes and the added interaction of stage of menses or training status.

      There are certainly many other areas that this article contributes to the literature due to the depth of methods the research team has used. For example, the authors provide much insight into: the impact of exercise intensity on the exercise-induced hypoalgesia effect; the impact of sex on the endogenous opioid modulation mechanism; and the impact of exercise intensity on the neurological indices associated with endogenous pain modulation and pain processing. All of which, the researchers should be credited for due to the time and effort they have spent completing this study. Indeed, their in-depth analysis of many of these areas provides ample support for the claims they make in relation to these specific questions. As such, I consider their evidence concerning the fMRI data to be very convincing (and interesting).

      Weaknesses:

      Although the authors have their own view of their results, I do however, have a slightly different take on what the post-exercise pain ratings seem to show and its implications for judging whether an exercise-induced hypoalgesia effect is present or not. From what I have read, I cannot seem to find whether the authors have compared the post-exercise pain ratings against any data that was collected pre-exercise/at rest or as part of the calibration. Instead, I believe the authors have only compared post-exercise pain ratings against one another (i.e., HIGH versus LOW, NALOXONE versus SALINE). In doing so, I think the authors cannot fully assume that there is no exercise-induced hypoalgesia effect as there is no true control comparison (a no-exercise condition).

      In more detail, Figure 6A appears to show an average of all pain ratings combined per participant (is this correct?). As participants were exposed to stimulations expected to elicit a 30, 50, or 70 VAS rating based on pre-calibration values, the average rating would be expected to be around 50. What Figure 6A shows is that in the SALINE condition, average pain ratings are in fact ~10-15 units lower (~35) and then in the NALOXONE condition, average pain ratings are ~5 units lower (~45) for both exercise intensities. From this, I would surmise the following:

      It appears there is an exercise-induced hypoalgesia effect as average pain ratings are ~30% lower than pre-calibrated/resting pain ratings within the SALINE condition at the same temperature of stimulation (it would also be interesting to see if this effect occurred for the pressure pain).

      It appears there is evidence for the endogenous opioid mechanism as the NALOXONE condition demonstrates a minimal hypoalgesia effect after exercise. I.e., NALOXONE indeed blocked the opioid receptors, and such inhibition prevented the endogenous opioid system from taking effect.

      It appears there is no effect of exercise intensity on the exercise-induced hypoalgesia effect.

      That is, participants can cycle at a moderate intensity (55% FTP) and incur the same hypoalgesia benefits as cycling at an intensity that demarcates the boundary between heavy and severe intensity exercise (100%FTP). This is a great finding in my mind as anyone wishing to reduce pain can do so without having to engage in exercise that is too effortful/intense and therefore aversive - great news! This likely has many applications within the field of public health.
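
      The arithmetic behind the first two points above can be checked with a short calculation. The values 35 and 45 are the approximate averages read from Figure 6A as described in this review, not exact data, and the helper name `percent_reduction` is hypothetical.

```python
from statistics import fmean

# Calibration targets on the VAS, as described in the summary.
CALIBRATED_TARGETS = (30, 50, 70)


def percent_reduction(expected, observed):
    """Return how far the observed rating lies below the expected rating, in percent."""
    return 100.0 * (expected - observed) / expected


# Average rating expected if every stimulus were rated at its calibrated target.
expected_mean = fmean(CALIBRATED_TARGETS)

# Approximate post-exercise averages read from Figure 6A (reviewer's estimates).
saline_reduction = percent_reduction(expected_mean, 35)
naloxone_reduction = percent_reduction(expected_mean, 45)

if __name__ == "__main__":
    print(expected_mean, saline_reduction, naloxone_reduction)
```

      Under these approximate readings, the SALINE average sits about 30% below the calibrated mean and the NALOXONE average about 10% below it. The comparison is against calibration values rather than a no-exercise control, which is the caveat raised below.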

      I will very slightly caveat my summaries with the fact that a more ideal comparison here would be a control condition whereby participants did the same experimental visit but without any exercise prior to entering the MRI scanner. I consider the overall strength of the evidence to be solid, with the answer to the primary research question still a little ambiguous.

      Reviewer #2 (Public review):

      Summary:

      This interesting study compared two different intensities of aerobic exercise (low-intensity, high-intensity) and their efficacy in inducing a hypoalgesic reaction (i.e. exercise-induced hypoalgesia; EIH). fMRI was used to identify signal changes in the brain, with the infusion of naloxone used to identify hypoalgesia mechanisms. No differences were found in post-exercise pain perception between the high-intensity and low-intensity conditions, with naloxone infusion causing increased pain perception across both conditions which was mirrored by activation in the medial frontal cortex (identified by fMRI). However, the primary conclusion made in this manuscript (i.e. that aerobic exercise has no overall effect on pain in a mixed population sample) cannot be supported by this study design, because the methodology did not include a baseline (i.e. pain perception following no exercise) to compare high/low-intensity exercise against. Therefore, some of the statements/implications of the findings made in this manuscript need to be very carefully assessed.

      Strengths:

      (1) The use of fMRI and naloxone provides a strong approach by which to identify possible mechanisms of EIH.

      (2) The infusion of naloxone to maintain a stable concentration helps to ensure a consistent effect and that the time course of the protocol won't affect the consistency of changes in pain perception.

      (3) The manipulation checks (differences in intensity of exercise, appropriate pain induction) are approached in a systematic way.

      (4) Whilst the exploratory analyses relating to the interactions for fitness level and sex were not reported in the study pre-registration, they do provide some interesting findings which should be explored further.

      Weaknesses:

      (1) Given that there is no baseline/control condition, it cannot be concluded that aerobic exercise has no effect on pain modulation because that comparison has not been made (i.e. pain perception at 'baseline' has not been compared with pain perception after high/low-intensity exercise). Some of the primary findings/conclusions throughout the manuscript state that there is 'No overall effect of aerobic exercise on pain modulation', but this cannot be concluded.

      (2) Across the manuscript, a number of terms are used interchangeably (and applied, it seems, incorrectly) which makes the interpretation of the manuscript difficult (e.g. how the authors use the term 'exercise-induced pain').

      (3) There is a lack of clarity on the interventions used in the methods, for example, it is not exactly clear the time and order in which the exercise tasks were implemented.

      (4) The exercise test (functional threshold power) used to set the intensity of the low/high exercise bouts is not an accurate means of demarcating steady state and non-steady state exercise. As a result, at the intensity selected for the high-intensity exercise in this study, it is likely that the challenge presented for the high-intensity exercise would have been very different between participants (e.g. some would have been in the 'heavy' domain, whereas others would be in the 'severe' domain).

      (5) It is likely that participants did not properly understand how to use the 6-20 Borg scale to rate their perceived effort, and so caution must be taken in how this RPE data is used/interpreted.

      (6) Although interesting, the secondary analyses (relating to the interaction effects of fitness level and sex) were not included in the study pre-registration, and so the study was not designed to undertake this analysis. These findings should be taken with caution.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Participants in this study completed three visits. In the first one, participants received experimental thermal stimulations which were calibrated to elicit three specific pain responses (30, 50, 70) on a visual analogue scale (VAS). Experimental pressure stimulations were also calibrated at an intensity to the same three pain intensity responses. In the subsequent two visits, participants completed another pre-calibration check (Visit 2 of 3 only). Then, prior to the exercise NALOXONE or a SALINE placebo-control was administered intravenously. Participants then completed 1 of 4 blocks of HIGH (100%) or LOW (55%) intensity cycling which was tailored according to a functional threshold power (FTP) test completed in Visit 1. After each block of cycling lasting 10 minutes, participants entered an MRI scanner and were stimulated with the same thermal and pressure stimulations that corresponded to 30, 50, and 70 pain intensity ratings from the calibration stage. Therefore, this study ultimately sought to investigate whether aerobic exercise does indeed incur a hypoalgesia effect. More specifically, researchers tested the validity of the proposed endogenous pain modulation mechanism.

      Further investigation into whether the intensity of exercise had an effect on pain and the neurological activation of pain-related brain centres was also explored.

      Results show that in the experimental visits (Visits 2 and 3), participants exercised at two distinct intensities as intended. Power output, heart rate, and perceived effort ratings were higher during the HIGH versus LOW-intensity cycling. In particular, HIGH intensity exercise was perceived as "hard" / ~15 on the Borg (1974) scale, whereas LOW intensity exercise was perceived as "very light" / ~9 on the Borg (1974) scale.

      The fMRI data from Figure 1 indicates that the anterior insula, dorsal posterior insula, and middle cingulate cortex show pronounced activation as stimulation intensity and subsequent pain responses increase, thus linking these brain regions with the percept of pain intensity and corroborating what many studies have shown before.

      Results also showed that participants rated a higher pain intensity in the NALOXONE condition at all three stimulation intensities compared to the SALINE condition. Therefore, the expected effect of NALOXONE in this study seemed to occur whereby opioid receptors were "blocked" and thus resulted in higher pain ratings compared to a SALINE condition where opioid receptors were "not blocked". When accounting for participant sex, NALOXONE had negligible effects at lower experimental nociceptive stimulations for females compared to males who showed a hyperalgesia effect to NALOXONE at all stimulation intensities (peak effect at 50 VAS). Females did show a hyperalgesia effect at stimulation intensities corresponding to 50 and 70 VAS pain ratings. The fMRI data showed that the periaqueductal gray (PAG) showed increased activation in the NALOXONE versus SALINE condition at higher thermal stimulation intensities. The PAG is well-linked to endogenous pain modulation.

      When assessing the effects of NALOXONE and SALINE after exercise, results showed no significant differences in subsequent pain intensity ratings.

      When assessing the effect of aerobic exercise intensity on subsequent pain intensity ratings, authors suggested that aerobic exercise in the form of a continuous cycling exercise tailored to an individual's FTP is not effective at eliciting an exercise-induced hypoalgesia response irrespective of exercise intensity. This is because results showed that pain responses did not differ significantly between HIGH and LOW-intensity exercise with (NALOXONE) and without (SALINE) an opioid antagonist. Therefore, authors have also questioned the mechanisms (endogenous opioids) behind this effect.

      Altogether, the paper is a great piece of work that has provided some truly useful insight into the neurological and perceptual mechanisms associated with pain and exercise-induced hypoalgesia. The authors have gone to great lengths to delve into their research question(s) and their methodological approach is relatively sound. Although the authors have their own view of their results, I do however, have a slightly different take on what the post-exercise pain rating seems to show and its implications for judging whether an exercise-induced hypoalgesia effect is present or not. From what I have read, I cannot seem to find whether the authors have compared the post-exercise pain ratings against any data that was collected pre-exercise/at rest or as part of the calibration. Instead, I believe the authors have only compared post-exercise pain ratings against one another (i.e., HIGH versus LOW, NALOXONE versus SALINE). In doing so, I think the authors cannot fully question whether there is an exercise-induced hypoalgesia effect as there is no true control comparison (a no-exercise condition). Nevertheless, there are certainly many other areas that this article contributes to the literature due to the depth of methods the research team has used. For example, the authors provide much insight into: the impact of exercise intensity on the exercise-induced hypoalgesia effect; the impact of sex on the endogenous opioid modulation mechanism; and the impact of exercise intensity on the neurological indices associated with endogenous pain modulation and pain processing. All of which, the researchers should be credited for due to the time and effort they have spent completing this study.

      I have provided some specific comments for the authors to consider. They are organised to correspond to each section as it is presented, and I have denoted the line I am referring to each time.

      To conclude, thank you to the authors for their work, and thank you to the editor for the opportunity to contribute to the review of this paper. I hope my comments are seen as useful and I look forward to seeing the authors' responses.

      We sincerely appreciate the reviewer's insightful comments, which highlight the strengths of our study. In response to the concerns raised, we have made several key revisions to the original manuscript to address the reviewers’ comments. As for the lack of a resting control condition, we acknowledge that our study was not designed to test the overall effect of exercise versus no exercise. However, our primary objective was to compare different exercise intensities, hypothesising that low-intensity (LI) exercise would induce less pain modulation as compared to high-intensity (HI) exercise. By exploring this, we aimed to enhance understanding of the dose-response relationship between exercise and pain modulation. To better reflect this focus, we have revised the misleading phrasing regarding the ‘overall’ effect of exercise to clearly emphasize our primary aim: comparing HI and LI exercise.

      The reviewer suggests an interesting interpretation of the data, namely that exercise-induced hypoalgesia might have occurred for both exercise intensities, since the pain ratings provided were lower than the anticipated intensities as determined by the calibration. The fact that this difference is smaller in the naloxone (NLX) condition could provide evidence of opioidergic mechanisms underlying this effect. Unfortunately, the current study is not designed to comprehensively answer this question since there was no resting control condition. In particular, the lower pain ratings under SAL (Figure 6) could be due to exercise triggering the descending pain modulatory system (DPMS), but equally due to the default activation of the DPMS. Only an additional “no exercise” condition could disentangle this. Furthermore, habituation to noxious stimuli can influence pain ratings, resulting in lower pain ratings during the experiment as compared to the calibration. We have now provided a more detailed overview of the pain ratings at different stimulus intensities after HI and LI exercise in both drug treatment conditions for heat and pressure pain ratings. We elaborated on the specific comments raised in more detail in the following sections.

      Specific Comments

      (1) Abstract

      Line 25 - "we were unable to"... personal preference but this wording is a little 'weighted' in my view. I personally do not think researchers search to prove hypotheses correct, rather we search to prove hypotheses wrong, and therefore only through repeated attempts of falsification can we surmise that something holds true.

      We agree with the reviewer that the chosen wording can be perceived as weighted and have rephrased the sentence.

      Line 33 to 35 - the "...but individual factors... might play a role" is a crucial caveat to this sentence for me. Whilst I can understand that the results of the authors' study indicate that prior assumptions about exercise-induced hypoalgesia and its opioidergic mechanisms may be questioned, I think a little more evidence is needed to finally decide whether aerobic exercise has no overall effect on experimental pain responses. (see more in the Results comments below).

      We thank the reviewer for their comment. We agree that no claims can be made regarding the effect of aerobic exercise per se on pain modulation compared to no exercise based on the current data. Furthermore, we agree that more research is needed to further advance our understanding of (non-)opioidergic mechanisms in exercise-induced pain modulation. However, based on the data presented in this study we propose that the involvement of endogenous opioids in exercise-induced hypoalgesia could be influenced by sex and fitness levels since we could show differences in opioidergic involvement between males and females of different fitness levels. Future studies should account for the fitness levels and sex of the sample investigated.

      (2) Introduction

      Line 48 - please predefine anterior cingulate cortex here.

      We thank the reviewer for detecting this and have introduced the abbreviation for the anterior cingulate cortex in the referenced line.

      Line 49 - please predefine periaqueductal gray here instead of line 52.

      We have introduced the abbreviation for periaqueductal grey in the referenced line.

      Line 47 to 54 - when discussing the descending pain modulatory systems, authors seem to be relating specifically to the intensity/magnitude of pain experiences. However, the different brain regions that are mentioned may have varying "roles" according to which dimension of pain is of focus.

      Hofbauer et al. (2001) - https://doi.org/10.1152/jn.2001.86.1.402

      Rainville et al. (1997) - https://doi.org/10.1126/science.277.5328.968

      The two above studies provide some nice earlier findings on the brain regions - some of which are mentioned by the authors in this section - associated with the processing of pain quality in addition to the intensity of pain... simply attach here if they are of interest to the authors.

      The studies by Hofbauer et al. (2001) and Rainville et al. (1997) provide interesting findings on the effect of hypnotic suggestions on pain affect and the perceived intensity of a painful stimulus. However, these studies did not investigate exercise-induced changes in brain regions of the DPMS. The studies referenced in the relevant section of the manuscript are among the few imaging studies that have investigated brain structures of the DPMS in the context of exercise and pain modulation. They were therefore included in this paragraph, both to focus on their findings and to emphasise the scarcity of imaging studies investigating exercise-induced pain modulation. Given the divergent research topics of the proposed studies, we suggest not including them in this paragraph to maintain a clearer line of argument and focus on exercise-induced pain modulation in brain regions of the DPMS.

      L59 to 61 - a minor comment about the phrasing within this sentence and a recommended change is provided below for the flow of the sentence/paragraph.

      "...there are instances where administration of µ-opioid antagonists has decreased exercise-induced pain modulation (Droste et al. 1988; etc.) whereas in others there has been little effect (Droste et al. 1988; etc.)."

      We have altered the sentence based on the reviewers' suggestions to improve the flow and coherence of the sentence.

      L56 to 72 - Whilst the current version of this paragraph scans well enough, I find that the narrative flits between the mechanisms being discussed and the rationale/shortcomings of current research. I think that the original content of this paragraph can be structured into:

      A- The endogenous opioid system is a likely candidate to explain how exercise elicits a hypoalgesia response.

      B- Citation(s) of the imaging studies (Boecker et al., 2008, etc.) and earlier literature which support A (e.g., Janal et al. 1984).

      C- Further support of this theory as µ-opioid antagonists like naloxone seem to counteract the endogenous opioid effect (Haier et al., 1981).

      D- Introduction of the caveats of previous research such as the studies that observed that µ-opioids did not impact the endogenous pain modulation system during exercise (e.g., Droste et al., 1991, etc.) and the range of different interventions and exercise modalities which make it difficult to draw clear conclusions of the pain modulation effect.

      To me, this structure would set out the details you have already put together in a more orderly and systematic way and also will lead nicely into your ensuing paragraph (Line 74 onwards).

      We appreciate the reviewers' constructive comments on structuring this paragraph. We agree that the proposed version eases the readability and comprehension of the paragraph and have, thus, adapted the restructured paragraph according to the reviewer’s suggestion.

      L75 - Why are single-arm pre-post measures and designs an issue? If you can elaborate a little more this would be very insightful for a reader.

      Single-arm pre-post measurement studies involve participants being assigned to a single experimental condition, with pain assessments conducted only once before and once following an intervention. This study design presents some limitations, particularly in the context of examining exercise-induced modulation of pain (Vaegter and Jones, 2020). Such designs are potentially confounded by the effects of habituation to noxious stimuli, as highlighted by Vaegter and Jones (2020). Incorporating randomised controlled trials with multiple measurement blocks not only mitigates these limitations but also provides a clearer understanding of how individual bouts of exercise influence pain perception. We have now added this to the paper.

      L80 - The reference for the functional threshold power assessment is provided as a number. Please could the authors change to reflect which study/studies they are referring to here (I presume it is the Borszcz and/or the McGrath studies?).

      We apologise for this oversight and have now updated the reference to be displayed correctly. The reviewer is correct in assuming that Borszcz et al. (2018) is the referenced study here.

      L88 - Did participants also receive pressure pain stimulations in addition to the thermal stimuli, as the figure suggests?

      Note: I have since read on to L102-104 and understand why pressure pain was included but not mentioned, owing to the results. However, I would still recommend including pressure pain stimulations in this line, if possible, to be consistent with what Figure 1 shows and later text in the Methods section also shows.

      We thank the reviewer for their suggestion to mention pressure pain at the referenced line to increase the clarity and consistency of the experimental paradigm. Pressure and heat pain were applied in an alternating fashion during scanning. Whilst the results for pressure pain are not included in this study, we agree with the reviewer that pressure pain should be mentioned again as part of the methods and have added this.

      L94 - I really like Figure 1. Great job.

      Could the authors please define the inter-trial interval (ITI) in the legend? And please could the authors clarify what unit the 30, 50, and 70 figures in the "18 trials per block" section refer to.

      We thank the reviewer for their positive feedback. We have now included a definition of inter-trial-interval (ITI) in the figure legend. Furthermore, we adapted Figure 1 so that the units of the stimulus intensities (30, 50, 70) on the Visual Analog Scale (VAS) are included in the figure allowing for a clearer identification.

      (3) Results

      General comment for figures ... is there a specific reason the authors chose for error bars to be represented by an SE value as opposed to an SD value?

      The reason I ask is that participant responses seem to vary (See Figure 2A and 2E-G as an example). Error bars showing SD values would perhaps do justice to the variability in participant response(s), whereas the SE may be a better representation of the variability in responses due to the assessor's methods of collection. Whilst the SE error bars are narrow (great job on that!), the individual responses are clearly varied which I speculate could be because of the interventions that have been implemented (i.e., exercise intensity).

      The use of Standard Error (SE) is more common in the cognitive neuroscience literature.

      However, as this reviewer noted, we have also included individual data points alongside the SE, thereby providing a comprehensive view that allows for a thorough interpretation of the data distribution.
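
      The distinction between the two error-bar conventions discussed above can be illustrated with a minimal sketch. The ratings below are hypothetical values chosen for illustration only and are not data from the study; the function name `sd_and_se` is likewise an assumption. The sketch shows that the SE of the mean is the sample SD divided by the square root of the sample size, so SE bars narrow as the sample grows while SD bars continue to reflect between-subject spread.

```python
import math
import statistics


def sd_and_se(values):
    """Return (sample SD, standard error of the mean) for a list of numbers."""
    sd = statistics.stdev(values)      # sample SD, uses the n - 1 denominator
    se = sd / math.sqrt(len(values))   # SE of the mean = SD / sqrt(n)
    return sd, se


# Hypothetical subject-level mean pain ratings (VAS units), illustration only.
ratings = [22.0, 35.0, 41.0, 48.0, 30.0, 55.0, 38.0, 27.0, 44.0]
sd, se = sd_and_se(ratings)
print(f"SD = {sd:.2f}, SE = {se:.2f}")
```

      Plotting individual data points alongside SE bars, as done in the figures, conveys both quantities at once.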

      L102 to 104 - In fact, it is interesting that exercise did not impact the pressure pain ratings whereas the same cannot be said for thermal pain. In line with some of my comments below about the impact of exercise on pain intensity responses, I would be intrigued to see the results of the pressure pain ratings in more detail.

      Another note on this... Whilst the results for the pressure pain may be beyond the scope of this paper and will be reported separately, knowing of this data is tantalising for a reader. I would suggest to: A) either mention the pressure pain and include the analysis of the data; or B) not mention the pressure pain altogether and save it for the subsequent paper. Either way, I look forward to seeing further discussion on this in future work.

      We have now summarised the behavioural results of exercise on pressure pain ratings below in Supplemental Figure S1.

      There was no hypoalgesic effect evident in the behavioural pain ratings comparing HI to LI exercise in the saline (SAL) condition (β = 0.57, CI [-1.73, 2.86], SE = 1.17, t(1354) = 0.48, P = 0.63; Supplemental Figure S1A, blue bars) as well as no interaction of drug treatment and exercise intensity on pressure pain ratings (β = -1.43, CI [-4.87, 2.01], SE = 1.75, t(2756.02) = -0.82, P = 0.42; Supplemental Figure S1). Post-hoc paired t-tests (Bonferroni-corrected) confirmed there to be no significant differences between the drug treatment conditions at LI (P = 0.18) or HI (P = 0.85) and no significant difference between the exercise intensities in the SAL (P = 0.65) and NLX (P = 0.48) conditions, confirming no significant differences in drug treatment between the exercise intensities.

      Furthermore, there was no significant effect of fitness level on differences in pain ratings (LI – HI exercise) in the SAL condition (β = 3.16, CI [-1.64, 7.97], SE = 2.37, t(38) = 1.34, P = 0.19; Supplemental Figure S1B) and no significant correlation between fitness level and difference pain ratings (r = 0.25, P = 0.13). Finally, there was no significant interaction of drug treatment, exercise intensity, and sex on difference pain ratings (β =-7.97, CI [-18.67, 2.73], SE = 5.51, t(190) = -1.45, P = 0.15; Supplemental Figure S1C-D).

      Exercise did not appear to affect pressure pain ratings and we have now added this to the discussion and in the methods section. However, we think that the figure should be part of the supplements.

      L112 to 113 - Fantastic work for including this analysis in your study. Great job.

      We appreciate the reviewers’ positive feedback on conducting these crucial analyses when investigating sex and gender differences in pain.

      L186 to 189 - It is fascinating that there appears to be no effect of NALOXONE on pain ratings within female participants at a VAS rating of 30 for thermal pain as well as a much diminished hyperalgesia effect at a VAS rating of 50 compared to males. Meanwhile, at higher intensity stimulations corresponding to a VAS rating of 70, females in fact demonstrate a more pronounced hyperalgesia effect compared to males. In addition, the hyperalgesia effect of NALOXONE for males seems to "peak" at a VAS rating of 50. The mechanisms behind these findings alone would be incredibly exciting to explore... but maybe in another study.

      We agree with the reviewer that the differences in males and females are fascinating results and concur that this may hint at varying degrees of opioidergic involvement at different stimulus intensities. This finding is intriguing and potentially clinically relevant, warranting further investigation in future research, although it lies beyond the scope of the current paper.

      L189 - To double check... Figures 4A and 4B refer to the entire cohort (male and female responses combined) whereas C-E are separated by sex?

      In addition, as there are no annotations to the top of Figures 4C-E were no significant differences observed between saline and naloxone conditions per each stimulus intensity? i.e., similar tests to what are shown in Table S6 but separated for each sex.

      Without getting too carried away, there may be something here that indicates a difference between sexes concerning the opioid-driven pain modulation response on a neurological level (i.e., brain region activation).

      The reviewer is correct in assuming that Figures 4A and 4B refer to the entire cohort whilst Fig. 4C – 4E are split for males and females. The full output of the analyses for Fig. 4A and 4B are reported in Supplemental Tables S5 – S7. Furthermore, the full output of the LMER analyses for Fig. 4E is reported in Supplemental Table S10. We agree with the reviewer that additional annotations in Fig. 4C – Fig. 4E ease interpretation and have, thus, added them to the respective figures, denoting the significance of the interaction term stimulus intensity and drug treatment for females (Fig. 4C) and males (Fig. 4D), respectively. For completeness, we now report the post-hoc paired samples t-tests for females and males in the Supplemental Tables S8 and S9, respectively.

      L254 to 258 - "we could not establish an overall hypoalgesia effect of exercise...". Do the results of the exercise intensity x drug treatment provide an answer for this exact hypothesis? After checking the methods section, I cannot seem to find whether the statistical analysis has involved a comparison of the pain ratings after the high (alone), low (alone), or high and low (combined) exercise compared to ratings during control or pre-calibration as part of precalibration (i.e., pain ratings in a rested state without any exercise yet completed).

      We concur with the reviewer's assessment that the study design and statistical analyses cannot address the ‘overall’ effect of exercise compared to no exercise. Please refer back to our general response before comment 1, where we have addressed this point.

      As it seems that the analysis assesses the differences between high and low-intensity exercise, to me, the results of the exercise intensity x drug treatment analysis do not assess whether there is an exercise-induced hypoalgesia effect or not. Instead, it seems to assess whether the intensity of exercise is a differentiating factor in the expected exercise-induced hypoalgesia effect to subsequent pain intensity ratings to experimental pain stimulation. For the authors to judge whether aerobic exercise does or does not have a hypoalgesia effect, then the exercise conditions (either combined or standalone) would have to be compared to a control condition or a data set that involved pain ratings from a pre-exercise timepoint.

      We thank the reviewer for their comment. We would like to point out that we concluded there to be no hypoalgesic effect between the LI and HI exercise based on the LMER model comparing the behavioural pain ratings between the exercise conditions in the SAL condition (β = 1.19, CI [-1.85, 4.22], SE = 1.55, t(1354) = 0.77, P = 0.44; Figure 6A, blue bars and Table S9). The statistical model investigating the interaction of exercise intensity and drug treatment served to show that NLX did not modulate pain differently between the LI and HI exercise conditions.

      Given that our experiment involved different exercise levels in a randomized order, a simple pre vs post analysis is not straightforward. Nevertheless, we have set up a model where we take into account the rating time point (pain ratings provided before each exercise block (pre-pain ratings) and following each exercise block (post-pain ratings)) at each stimulus intensity (VAS 30, 50, 70) and exercise intensity (LI and HI). The model also takes into account the exercise intensity performed in the previous block, the overall block number as well as the varying subject intercepts. The analysis was completed for heat (Author response image 1A) and pressure (Author response image 1B) pain ratings in the SAL condition to establish whether there was a significant effect of exercise intensity on the changes from pre to post-pain ratings. The model for heat pain yielded a significant main effect for stimulus intensity (β = 1.43, CI [1.34, 1.52], SE = 0.05, t(2054.95) = 31.61, P < 0.001) but no significant interaction of exercise intensity, rating time point, and stimulus intensity (P = 0.14). The model for pressure pain in the SAL condition yielded a significant main effect of stimulus intensity (β = 1.00, CI [0.92, 1.08], SE = 0.04, t(2054.99) = 24.68, P < 0.001) and block number (β = 1.14, CI [0.35, 1.94], SE = 0.41, t(2055.98) = 2.80, P = 0.005) but no interaction of exercise intensity, rating time point, and stimulus intensity (P = 0.38).

      Author response image 1.

      Heat (A) and Pressure (B) pain ratings in the saline (SAL) condition for pre (purple) and post (turquoise) exercise pain ratings at LI and HI exercise and all stimulus intensities (VAS 30, 50, 70). The bars depict the mean pain rating pre and post-exercise and the dots depict the subject-specific mean ratings. The error bars depict the SEM.
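
      For readers who prefer a formal statement, the pre/post model described above could be written roughly as follows. This is a sketch under stated assumptions, not the exact specification that was fitted: the symbol names are introduced here for illustration, the lower-order two-way interactions are omitted for brevity, and only the terms named in the text (rating time point, stimulus intensity, exercise intensity, previous-block exercise intensity, block number, and a subject-specific intercept) are shown.

```latex
\begin{aligned}
\text{rating}_{ij} ={}& \beta_0
  + \beta_1\,\text{Time}_{ij}
  + \beta_2\,\text{Stim}_{ij}
  + \beta_3\,\text{Ex}_{ij} \\
 &+ \beta_4\,(\text{Time}\times\text{Stim}\times\text{Ex})_{ij}
  + \beta_5\,\text{PrevEx}_{ij}
  + \beta_6\,\text{Block}_{ij} \\
 &+ u_j + \varepsilon_{ij},
 \qquad u_j \sim \mathcal{N}(0,\sigma_u^2),
 \quad \varepsilon_{ij} \sim \mathcal{N}(0,\sigma^2)
\end{aligned}
```

      Here i indexes trials and j indexes subjects, u_j is the varying subject intercept, and the coefficient on the three-way term corresponds to the interaction of exercise intensity, rating time point, and stimulus intensity reported above.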

      Another point of consideration is that Figure 6A appears to show an average of all pain ratings combined per participant (is this correct?). As participants were exposed to stimulations expected to elicit a 30, 50, or 70 VAS rating based on pre-calibration values, therefore the average rating would be expected to be around 50. What Figure 6A shows is that in the SALINE condition, average pain ratings are in fact ~10-15 units lower (~35) and then in the NALOXONE condition, average pain ratings are ~5 units lower (~45) for both exercise intensities. From this, I would surmise the following:

      • It appears there is an exercise-induced hypoalgesia effect as average pain ratings are ~30% lower than pre-calibrated/resting pain ratings within the SALINE condition at the same temperature of stimulation (it would also be interesting to see if this effect occurred for the pressure pain).

      • It appears there is evidence for the endogenous opioid mechanism as the NALOXONE condition demonstrates a minimal hypoalgesia effect after exercise. I.e., NALOXONE indeed blocked the opioid receptors, and such inhibition prevented the endogenous opioid system from taking effect.

      • It appears there is no effect of exercise intensity on the exercise-induced hypoalgesia effect. That is, participants can cycle at a moderate intensity (55% FTP) and incur the same hypoalgesia benefits as cycling at an intensity that demarcates the boundary between heavy and severe intensity exercise (100%FTP). This is a winner in my mind. Anyone wishing to reduce pain can do so without having to engage in exercise that is too effortful and therefore aversive - great news!

      I will very slightly caveat my summaries with the fact that a more ideal comparison here would be a control condition whereby participants did the same experimental visit but without any exercise prior to entering the MRI scanner.

      As a result of this interpretation of your findings, I do not think that aerobic exercise as a means to cause subsequent hypoalgesia to experimental thermal nociception can be fully discounted. On the contrary, I think your results shown in Figure 6A are evidence for it.

      The reviewer is correct in assuming that Figure 6A shows the averaged pain ratings across all stimulus intensities (VAS 30, 50, and 70) for each subject. To provide more details, we have split Figure 6A by stimulus intensity, now depicting the pain ratings for LI and HI exercise and treatment condition (SAL and NLX) at VAS 30, 50, and 70 (Supplemental Fig. S8). The LMER was extended to include the stimulus intensity and yielded a significant main effect of stimulus intensity (β = 1.39, CI [1.31, 1.47], SE = 0.04, t(2753.12) = -34.082, P < 0.001) and a significant interaction of stimulus intensity and drug treatment (β = 0.12, CI [0.01, 0.24], SE = 0.06, t(2751) = 2.13, P = 0.03) but no significant interaction of exercise intensity, drug treatment, and stimulus intensity (β = -0.05, CI [-0.20, 0.11], SE = 0.08, t(2751) = -0.56, P = 0.58).

      The reviewer further suggests that the average pain ratings in the SAL condition are lower than the anticipated stimulus intensity, thus indicating exercise-induced hypoalgesia. While this interpretation is one possibility, there is an alternative explanation: the lower pain ratings may stem from habituation to heat pain (Greffrath et al., 2007; Jepma et al., 2014; May et al., 2012). To support this perspective, we have visualised data from other studies in our lab that have been conducted with the same thermode head and device (TSA-2), using the same calibration procedure and aiming for the same stimulus intensities (VAS 30, 50, and 70). In both studies (Author response image 2A: Study 1: Behavioural sample; Author response image 2B: Study 2: fMRI sample; Author response image 2C: Original Exercise Study), participants did not engage in an exercise task and the pain ratings at VAS 30 and VAS 50 were lower than the anticipated intensities (VAS 30: 11.1/13.4; VAS 50: 35.0/35.9). Furthermore, in a previous study, Wittkamp et al. (2024) showed that, despite calibrating the heat stimuli at VAS 60, participants rated the pain stimuli with M = 48.58 (SD = 13.79).

      The discrepancy observed between calibrated intensities and the ratings provided could be attributable to habituation effects, especially at low stimulus intensities. Moreover, we would like to point the reviewer to the highest stimulus intensity at VAS 70 (Author response image 2C), where no habituation has taken place in any of the three data sets (including the current study). This consistency suggests that exercise-induced hypoalgesia may not be present in our findings or may be confounded by habituation effects.

      Author response image 2.

      Heat pain ratings at different intensities (30, 50, and 70 VAS) in different study samples. Bars depict the mean ratings in the saline (SAL) condition. Individual data points depict subject-specific mean pain ratings. Error bars depict the SEM.

      The reviewer further suggests that there is evidence for endogenous opioidergic modulation since the pain ratings in the NLX condition are lower than the anticipated intensities. We fully agree but, again, would argue that the DPMS can exert its effects on painful stimuli in a default manner, i.e. irrespective of any exercise effect.

      We concur with the reviewer’s interpretation that there is no effect of exercise intensity on exercise-induced hypoalgesia since the ratings between both exercise intensities are not significantly different.

      Finally, we agree that our data does not allow for the interpretation of an ‘overall’ effect of exercise-induced hypoalgesia and would like to point out that we did not aim to claim this. Rather, the data suggests there to be no effect of LI vs. HI aerobic exercise on pain modulation. We acknowledge, however, that the phrasing involving ‘overall’ can be misleading and have revised this to focus on the comparison between LI and HI exercise, thereby enhancing precision and clarity.

      Note: This is also where it would be really interesting to see the pain pressure data if it were to be included. Mainly to see whether it coheres with what the thermal stimulation stuff shows.

      We have provided the pressure pain ratings in the SAL condition below (Author response image 3).

      Author response image 3.

      Pressure pain ratings in the saline (SAL) condition at each stimulus intensity (VAS 30, 50, and 70). Bars depict the mean ratings. Individual data points depict subject-specific mean pain ratings. Error bars depict the SEM.

      L259 - As mentioned in the comment above. Could the authors distinguish what is being shown in Figure 6A? Are the data presented as the pooled mean for all stimulation intensities? If not, what data is displayed per bar/column?

      We thank the reviewer for their comment. The reviewer is correct in assuming that the bars in Figure 6A depict the pooled means across all stimulus intensities (VAS 30, 50, 70) for each drug treatment condition and exercise intensity. To allow for a more detailed comprehension of the data, we have split Figure 6A by stimulus intensity, now depicting the pain ratings for LI and HI exercise and treatment condition (SAL and NLX) at VAS 30, 50, and 70 (Supplemental Figure S8). The LMER was extended to include the stimulus intensity and yielded a significant main effect of stimulus intensity (β = 1.39, CI [1.31, 1.47], SE = 0.04, t(2753.12) = -34.082, P < 0.001) and a significant interaction of stimulus intensity and drug treatment (β = 0.12, CI [0.01, 0.24], SE = 0.06, t(2751) = 2.13, P = 0.03) but no significant interaction of exercise intensity, drug treatment, and stimulus intensity (β = -0.05, CI [-0.20, 0.11], SE = 0.08, t(2751) = -0.56, P = 0.58).

      L278 - Can the authors please provide a reference that explains how W.kg-1 at FTP is a measure of fitness level?

      We thank the reviewer for their comment. The obtained FTP value was corrected for the weight of each participant (Watt/kg), yielding a weight-corrected fitness measure that allows for better comparison between subjects. We denoted this in the figures as W*kg-1, which is an equivalent notation.
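
      The weight correction described above amounts to a single division, sketched below. The function name `ftp_per_kg` and the example values are hypothetical and serve only to show why the normalisation eases comparison between subjects: two participants with the same absolute FTP can differ markedly in relative fitness.

```python
def ftp_per_kg(ftp_watts, body_mass_kg):
    """Return weight-corrected FTP in W/kg (equivalently W*kg^-1)."""
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    return ftp_watts / body_mass_kg


# Hypothetical participants with identical absolute FTP but different body mass.
lighter = ftp_per_kg(200.0, 60.0)
heavier = ftp_per_kg(200.0, 80.0)
print(f"lighter: {lighter:.2f} W/kg, heavier: {heavier:.2f} W/kg")
```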

      L296 - Take the line away from Figure 7A... Does the individual data show a positive relation between pain rating changes and W.kg-1? Besides the three data points (1 on the far right of the figure and the two on the far left), I find it hard to see any real trend.

      We acknowledge the reviewers’ concern regarding the regression line and the visual clarity of the individual data points. However, it is important to note that the significant main effect of fitness level on differences in pain ratings in the SAL condition (β = 6.45, CI [1.25, 11.65], SE = 2.56, t(38) = 2.52, P = 0.02) supports the assertion that higher fitness levels are associated with greater hypoalgesia following HI exercise compared to LI exercise. While the trend may not be visible for all data points, the statistical analysis provides a robust basis for the observed relationship (r = 0.33, P = 0.038).

      We have conducted an additional LMER model where we have excluded the subjects with the highest and lowest FTP values (sub-28 with 3.19 W/kg and sub-06 with 0.76 W/kg, respectively). The LMER still yields a significant main effect of fitness level (β = 6.82, CI [1.25, 11.65], SE = 3.18, t(34) = 2.14, P = 0.039; Author response image 4) and a positive correlation between the difference ratings and fitness level approaching significance (r = 0.32, P = 0.057).

      Author response image 4.

      Fitness level on difference pain ratings (LI-HI exercise) without subjects with highest and lowest FTP (N = 37). (A) Subject-specific differences in heat pain ratings (dots) between LI and HI exercise conditions (LI – HI exercise pain ratings) and corresponding regression line pooled across all stimulus intensities in the SAL condition. Fitness level (FTP) showed a significant positive relation to heat pain ratings with a significant main effect of FTP (P = 0.039) on difference ratings.
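
      The logic of the sensitivity check described above (dropping the subjects with the highest and lowest FTP, then re-examining the relation between fitness and difference ratings) can be sketched as follows. This is an illustration under stated assumptions: the helper names and all data values are hypothetical, and a plain Pearson correlation stands in for the LMER used in the actual analysis.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_extremes(fitness, diff):
    """Remove the subjects with the lowest and the highest fitness value."""
    idx = range(len(fitness))
    lowest = min(idx, key=fitness.__getitem__)
    highest = max(idx, key=fitness.__getitem__)
    keep = [i for i in idx if i not in (lowest, highest)]
    return [fitness[i] for i in keep], [diff[i] for i in keep]


# Hypothetical fitness levels (W/kg) and LI - HI difference ratings (VAS units).
fitness = [0.8, 1.4, 1.9, 2.1, 2.6, 3.2]
diff = [-6.0, -1.0, 0.5, 3.0, 2.0, 9.0]

r_all = pearson_r(fitness, diff)
r_trimmed = pearson_r(*drop_extremes(fitness, diff))
print(f"r (all) = {r_all:.2f}, r (extremes removed) = {r_trimmed:.2f}")
```

      If the association persists after trimming, as reported for the LMER above, it is less likely to be driven solely by the two extreme subjects.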

      (4) Discussion

      L356 to 358 - Exactly. What you write here, I agree with. Your testing allowed you to judge whether there is an effect of aerobic exercise intensity on pain modulation. However, I think this has been a little conflated with the idea that there is "no overall effect of aerobic exercise on pain modulation" in other areas of the article (L358-361, Results, and Abstract). As per my previous comment, I am not sure this (no overall effect) is true.

      We agree with the reviewer and have adapted the manuscript so that the misleading phrase including ‘overall’ is removed.

      L358 to 365 - One addition to this debate about whether this is a hypoalgesia effect of aerobic exercise. In 358 - 361 (particularly the end of 361) there is a strong conclusion that there is no direct involvement of the endogenous opioid system. Then glance onto L364 to 365 and there is then an almost conflicting summary that a hypoalgesia effect driven by opioidergic regions of the brain (and ergo endogenous opioids) is in effect. If there were no direct endogenous opioid involvement, then differences between NALOXONE (blockade of the opioid mechanism) and SALINE conditions would not exist.

      We thank the reviewer for their comment. The structure of this paragraph aimed to guide the reader towards a more nuanced understanding of the possible mechanisms and caveats in exercise-induced pain modulation. Whilst our data suggest an effect of NLX on pain ratings, with significantly higher pain ratings in the NLX condition compared to the SAL condition, we could not identify an interaction between treatment and exercise intensity. This suggests that there is no significant difference in opioidergic involvement between HI and LI exercise. Our exploratory analyses, however, indicate that the involvement of endogenous opioids as an underlying mechanism depends on sex and fitness level.

      My perspective is that an exercise-induced hypoalgesia effect has occurred (based on the data in Figure 6A) but that this effect is certainly caveated by the sex and fitness levels that this study has observed (and kudos for it).

      As mentioned above, based on the current data we cannot untangle whether the reduced pain ratings in the SAL condition are due to habituation to noxious stimuli or an actual hypoalgesic effect of exercise (or potentially a mix of both). However, we fully agree with the reviewer that exercise-induced pain modulation is influenced by fitness level and sex.

      L390 - "endogenous pain modulation through μ-opioid receptors increases with increasing pain intensity". Aside from the general discussion about whether aerobic exercise causes a post-exercise hypoalgesia effect. This finding is also interesting for the pain incurred during exercise in the form of naturally occurring muscle pain and may also be clinically relevant as it could be that the endogenous pain modulation "system" could be primed through repeated exercise as your results show that the fitness level (i.e., a close correlate of how much someone has engaged in exercise and therefore 'activated' the endogenous pain modulation system) is associated with a more pronounced post-exercise hypoalgesia effect.

      This is an interesting aspect. With regard to the pain induced by exercise itself (i.e. muscle pain), we did not gather any data on this type of pain and interpreting this would be mere speculation. However, it is an interesting hypothesis to investigate in future studies whether the pain induced by exercise is potentially influenced by the endogenous opioid system. We agree with the reviewer’s interpretation that repeated exercise might prime the endogenous opioid system, especially in fitter individuals who engage more frequently in exercise and, thus, ‘train’ the endogenous opioid system. We have included this line of interpretation in the original manuscript, where we suggest that the mFC, a brain region with high µ-opioid receptor density, might be ‘trained’ by repeated exercise and, therefore, shows increased activation in fitter individuals after short bouts of exercise.

      L404 to 405 - "a resting baseline does not control for unspecific factors such as attentional load or distraction (Brooks et al., 2017; Sprenger et al., 2012) through exercise." I am not sure I agree. A control condition allows one to truly deduce whether exercise causes a hypoalgesia effect or not. The attentional load may be a factor, but I would argue this is distinct from endogenous pain modulation - unless there is a study that shows cognitive load alone can elicit endogenous opioids like exercise. About distraction, this would be the case if the pain measures were taken during the exercise. However, as the pain measures taken in the MRI were post-exercise and there was no longer any added distraction related to the exercise, I do not think any effect of exercise-related distraction on the post-exercise pain measures is still relevant.

      We agree with the reviewer that a resting baseline condition in the context of exercise-induced pain modulation would allow for the investigation of a potential hypoalgesic effect of exercise compared to no exercise. It is important to note that both studies (Brooks et al., 2017; Sprenger et al., 2012) have indeed shown that the effect of cognitive pain modulation is mediated by endogenous opioids.

      L406 - I do not think a low-intensity exercise is a true "control" condition. It certainly does allow the study to compare the dose-response relationship but as the individual is exercising (even at a moderate physiological intensity) then comparison of HIGH vs LOW does not tell us whether exercise does or does not cause hypoalgesia. In contrast, the results from Figure 6A seem to show that even LOW intensity exercise has a hypoalgesia effect and this is a good thing for those who cannot exercise at high intensities (e.g., chronic populations).

      Please refer back to our general response before comment 1, where we have addressed this point.

      L410 - A small digression in relation to the exercise intensities:

      The intensity domains (moderate - heavy - severe) are not truly controlled within this study (mainly for the LOW condition), and therefore some participants could have exercised within different exercise intensity domains than others. To explain, the exercise intensity domains are distinguishable by the physiological responses associated with the boundaries of each of these domains. The FTP is believed to be a demarcation point between heavy and severe intensity domains (though kinesiologists debate the validity of this). Other concepts similar to FTP are Critical Power or the Respiratory Compensation Point. Ultimately, the boundary between heavy and severe intensity domains is characterised by the highest possible intensity by which a steady-state in oxygen kinetics (V̇ O2) occurs (Burnley & Jones, 2018). If this is expressed as a power output (Watts) and then a percentage of this power output is used to prescribe exercise intensity, then the physiological response is not always as expected. The reason is that for some people the gaseous exchange threshold (the demarcation point between the moderate and heavy intensity domains) is not always the same percentage between resting and FTP/Critical Power/Respiratory Compensation Point for each person. As a result, some individuals who are prescribed an intensity of 55% FTP/Critical Power/Respiratory Compensation Point may subsequently exercise within the moderate intensity domain (most people did based on the heart rate and RPE responses) whilst some others might actually exercise more within the heavy intensity domain. A quick check of Figures 3B-C could indicate that this might have been the case for two or three participants, but that is inference and speculation as we cannot truly know unless gas parameters were taken (which is perfectly understandable that they have not been taken because this study has done so much else). 
However, the importance of this for this study is that if some participants did indeed exercise at a slightly higher physiological intensity, this undermines the LOW condition as a "control" as the physiological stimulus between conditions (Brownstein et al., 2023). It means that the proposed differences in endogenous opioids (Vaegter et al., 2015; 2019) between exercise intensities may not have been present and therefore summarising a lack of an exercise induced hypoalgesia effect is slightly confounded. This is one factor contributing to my scepticism about the conclusion that there is a lack of an exercise-induced hypoalgesia response.

      We thank the reviewer for their comment as it touches upon the challenges of estimating exercise intensities precisely. It is, indeed, crucial to consider the boundaries between moderate, heavy, and severe intensity domains, as delineated by physiological markers such as the Functional Threshold Power (FTP), Critical Power, and the Respiratory Compensation Point (VO2max) (Burnley & Jones, 2018). Previous research has shown that the FTP and FTP20 tests are reliable and convenient methods to estimate approximate measures of VO2max (Denham et al., 2020) and that the FTP test is a useful test for performance prediction in moderately trained cyclists (Sørensen et al., 2019).

      We acknowledge that without direct measurements of VO2max, it is challenging to determine the precise intensity domain in which each participant was operating. While the RPE and HR might suggest that some participants performed in the moderate intensity domain in the LI exercise condition, we could still ascertain there to be a significant difference in the relative power (%FTP), heart rate (HR), and rating of perceived exertion (RPE) between the LI and HI exercise conditions. In the overall sample, the consistency in relative power, heart rate, and RPE responses among participants suggests that the exercise doses were effectively communicated and adhered to; therefore, the validity of the LI exercise condition remains robust.

      While we did not include metabolic assessments in our protocol, our study focused on providing a comprehensive analysis of the exercise-induced hypoalgesia phenomenon across two distinct exercise intensities. Additionally, the rationale for selecting specific exercise intensities was grounded in the existing literature, which indicates significant differences in the hypoalgesic response between exercise intensity levels (Jones et al., 2019; Vaegter et al., 2014).

      According to the reviewer, the potential lack of difference between the exercise conditions might contribute to the fact that there was no difference in endogenous opioid release and, thus, no difference in pain ratings between the exercise conditions. However, our data still suggest that there is an influence of endogenous opioids in the HI exercise condition in males with higher fitness levels. Together with recent findings on the association of µ-opioid receptor activation and fitness levels in men (Saanijoki et al., 2022), as well as the difference in µ-opioid receptor availability between high and moderate aerobic exercise (Saanijoki et al., 2018), we would hypothesise that the release of endogenous opioids after short HI bouts of exercise depends on fitness levels (and potentially sex).

      Finally, we propose that discussing exercise intensity domains within the context of our study enriches the understanding of exercise-induced hypoalgesia without undermining the integrity of our findings. We have, therefore, included this in the discussion of the manuscript.

      L417 - For some reason I am doubting this value (r = 0.61). Could this be checked? I think it is higher in their study. r = 0.88?

      Also, as someone with a kinesiology background, I would argue this is a given anyway. The maximum power one can cycle for 20 minutes is related to the maximum power one can cycle for 60 minutes, this is expected. (That is no slight on the authors of this study, more a remark that readers could look and figure that for themselves if they needed to know).

      We thank the reviewer for their comment. We have carefully re-checked the correlation coefficient between the FTP20 and FTP60 tests in the study by Borszcz et al. (2018) and have corrected the correlation coefficient to r = 0.88. We thank the reviewer for detecting this. Whilst we agree that it seems somewhat intuitive that the FTP20 and FTP60 should correlate highly, we wanted to provide the reader with a better understanding of where the FTP20 test originated and how it is suitable to assess aerobic fitness levels without having to maintain a steady power output for 60 minutes.

      L428 - Kudos to the authors for taking a standardised approach to this. Hopefully, my comment earlier might provide some extra food for thought about exercise intensity. I think there are several other ways future research could prescribe exercise without the need for expensive and cumbersome bits of equipment to know how hard people are exercising.

      We strongly agree with the reviewer and hope that our study can inspire future research to implement more convenient and inexpensive ways to establish aerobic (and anaerobic) fitness levels.

      L456 to 458 - Would it be possible to revisit this and check whether the pooled mean of all stimulation intensities for pain intensity ratings after pressure pain is lower than 50? If so, I think it can also be assumed that there is a slight hypoalgesia effect occurring for pressure pain too.

      We have revisited the pressure pain ratings pooled across all stimulus intensities (VAS 30, 50, and 70). Indeed, the ratings are below 50 VAS (Supplemental Figure S1A) in the SAL and NLX conditions. As mentioned before, lower pain ratings after LI exercise cannot be taken as evidence for exercise-induced analgesia.
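
      As a minimal sketch of the pooling step discussed above, the following Python snippet averages ratings across the three calibrated stimulus intensities and compares the result with the scale midpoint of 50. The ratings are made-up illustrative values, not data from the study, and the function and variable names are hypothetical. As noted in the response, a pooled mean below 50 would not by itself distinguish hypoalgesia from habituation.

```python
from statistics import mean

# Hypothetical VAS ratings (0-100) for one condition, keyed by the
# calibrated stimulus intensity (VAS 30, 50, 70). Illustrative values only.
ratings_by_intensity = {
    30: [22, 28, 25],
    50: [41, 47, 44],
    70: [63, 66, 60],
}

def pooled_mean(ratings):
    """Mean over every rating, ignoring which stimulus intensity it came from."""
    all_ratings = [r for values in ratings.values() for r in values]
    return mean(all_ratings)

pooled = pooled_mean(ratings_by_intensity)
# True when the pooled rating lies below the midpoint of the calibrated range.
below_midpoint = pooled < 50
```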

      L495 to L499 - I find this fascinating. Great finding.

      We thank the reviewer for their positive feedback.

      (5) Methods

      L650 - "Watts"

      We have changed the sentence accordingly.

      L651 - beats per minute can also be represented as b.min-1 and cadence as revolutions.min-1.

      To allow for easier interpretation of the results by a broader readership, we propose to maintain the original abbreviations.

      L678 - Just to check what the authors mean by "on the second experimental day", they are actually referring to Visit 2 of 3 (first experimental visit of 2) as it is shown in Figure 1?

      We apologise for the lack of clarity. Indeed, the second experimental day refers to the third visit in the study. We have added this to the sentence to increase clarity.

      L708 - would change the end of the sentence to "and remained blinded throughout the study"

      We have changed the sentence accordingly.

      L742 - comma after "in one participant".

      We have added the missing comma.

      L746 - slight mistype... RPE in brackets instead of PRE

      We have changed the abbreviation to RPE.

      L747 - In case the authors are interested in affective measures in future studies... Hardy and Rejeski (1989) have a 9-point Likert scale rating affective valence which might be useful to check out.

      Thank you. The scale by Hardy and Rejeski (1989) is a very relevant measure of affective valence during exercise, and we will consider this in future studies.

      L755 - Four squares for the thermode to be applied were drawn on the arm but through the methods I can only seem to see that the thermode was applied to the second square during calibration. During the MRI scan, did someone move the thermode to different squares for different stimulations?

      We appreciate the reviewer's question. Indeed, the heat calibration and recalibration on the first and second day, respectively, were always completed on the same skin patch (patch 2) to allow for comparability of calibration across days. During the experimental sessions, the thermode head was repositioned in a randomised order across participants (e.g., skin patches 1-4-3-2) before each block. This was done manually before the MRI block commenced. The order of thermode head position was kept constant within participants across experimental days (day 2 and day 3).
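
      The randomisation scheme described above (a patch order that varies between participants but is reused on both experimental days) can be sketched as follows. This is an illustrative assumption about how such a schedule could be generated, not the procedure used in the study; the participant identifier, the seed, and the function name are hypothetical.

```python
import random

PATCHES = [1, 2, 3, 4]  # the four skin patches marked on the arm

def patch_order(participant_id, seed=0):
    """Return a participant-specific order of skin patches.

    Seeding the generator with the participant ID makes the order
    reproducible, so the same order can be reused on every experimental day.
    """
    rng = random.Random(f"{seed}-{participant_id}")
    order = PATCHES[:]
    rng.shuffle(order)
    return order

# Same participant, same order on both experimental days.
schedule = {day: patch_order("sub-01") for day in ("day 2", "day 3")}
```

      Deriving the order from the participant identifier, rather than drawing it anew each day, is one simple way to keep it constant within participants.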

      L764 - ITI predefined?

      We thank the reviewer for their comment and would like to point to line 130 in the revised manuscript where the abbreviation for inter-trial-interval (ITI) was first introduced.

      (6) Other Sections + Supplementary Materials

      L891 - I apologise in advance for this comment as it is the most trivial comment you will ever receive, but there is an extra "." On this line after J.N. initials for methodology.

      We have changed the punctuation accordingly.

      Table S1 - Strictly speaking, some of the intensity denominations in this table are not exactly an "intensity".

      Iannetta et al. (2020) - https://doi.org/10.1249/mss.0000000000002147 provides a commentary on intensity domains as well as Burnley and Jones (2018) - https://doi.org/10.1080/17461391.2016.1249524

      Likewise in this table - the term "without fatigue" in the description column is not strictly true as participants will naturally fatigue but authors are referring more to a "steady state".

      We have changed the name of the column to ‘Description’ to describe the test phase as proposed by Allen and Coggan (2012) and previously implemented by McGrath et al. (2019) and not the ‘intensity domains’ (as specified by Iannetta et al. (2020)). Further, we have refined the wording in Table S1 and replaced the term ‘without fatigue’ with ‘steady state’.

      Once again, thank you to the authors for their great work on this project and to the editor for the chance to review this paper.

      We would like to thank this reviewer for their very insightful and important comments and for pointing out the strengths of the manuscript. We believe the suggestions will help to improve the quality of the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Summary:

      This interesting study compared two different intensities of aerobic exercise (low-intensity, high-intensity) and their efficacy in inducing a hypoalgesic reaction (i.e. exercise-induced hypoalgesia; EIH). fMRI was used to identify signal changes in the brain, with the infusion of naloxone used to identify hypoalgesia mechanisms. No differences were found in post-exercise pain perception between the high-intensity and low-intensity conditions, with naloxone infusion causing increased pain perception across both conditions, which was mirrored by activation in the medial frontal cortex (identified by fMRI). However, the primary conclusion made in this manuscript (i.e. that aerobic exercise has no overall effect on pain in a mixed population sample) cannot be supported by this study design, because the methodology did not include a baseline (i.e. pain perception following no exercise) to compare high/low-intensity exercise against. Therefore, some of the statements/implications of the findings made in this manuscript need to be very carefully assessed.

      Strengths:

      (1) The use of fMRI and naloxone provides a strong approach by which to identify possible mechanisms of EIH.

      (2) The infusion of naloxone to maintain a stable concentration helps to ensure a consistent effect and that the time course of the protocol won't affect the consistency of changes in pain perception.

      (3) The manipulation checks (differences in intensity of exercise, appropriate pain induction) are approached in a systematic way.

      (4) Whilst the exploratory analyses relating to the interactions for fitness level and sex were not reported in the study pre-registration, they do provide some interesting findings which should be explored further.

      Weaknesses:

      (1) Given that there is no baseline/control condition, it cannot be concluded that aerobic exercise has no effect on pain modulation because that comparison has not been made (i.e. pain perception at 'baseline' has not been compared with pain perception after high/low intensity exercise). Some of the primary findings/conclusions throughout the manuscript state that there is 'No overall effect of aerobic exercise on pain modulation', but this cannot be concluded.

      (2) Across the manuscript, a number of terms are used interchangeably (and applied, it seems, incorrectly) which makes the interpretation of the manuscript difficult (e.g. how the authors use the term 'exercise-induced pain').

      (3) There is a lack of clarity on the interventions used in the methods, for example, it is not exactly clear the time and order in which the exercise tasks were implemented.

      (4) The exercise test (functional threshold power) used to set the intensity of the low/high exercise bouts is not an accurate means of demarcating steady state and non-steady state exercise. As a result, at the intensity selected for the high-intensity exercise in this study, it is likely that the challenge presented for the high-intensity exercise would have been very different between participants (e.g. some would have been in the 'heavy' domain, whereas others would be in the 'severe' domain).

      (5) It is likely that participants did not properly understand how to use the 6-20 Borg scale to rate their perceived effort, and so caution must be taken in how this RPE data is used/interpreted.

      (6) Although interesting, the secondary analyses (relating to the interaction effects of fitness level and sex) were not included in the study pre-registration, and so the study was not designed to undertake this analysis. These findings should be taken with caution.

      We thank the reviewer for their insightful comments that contribute to improving the quality of the manuscript. In response to the identified weaknesses, we have made key revisions to enhance clarity and rigor. Regarding the lack of a resting control condition, we acknowledge that our study does not assess the overall effect of exercise versus no exercise. Our primary objective was to compare high- (HI) and low-intensity (LI) exercise on pain modulation, hypothesizing that lower intensities would have minimal effects. We revised the manuscript to eliminate misleading phrases about an "overall" effect, clearly emphasizing our aim to investigate the comparative effects of different exercise intensities. To address terminology inconsistencies, we have adopted "exercise-induced pain modulation," reflecting existing literature that recognizes both hypoalgesia and hyperalgesia associated with exercise (Vaegter and Jones, 2020). We clarified this terminology in the introduction and specified the pain modalities used in our study. We also improved methodological transparency by better describing the timing and order of exercise and drug treatment interventions. Concerning exercise intensity estimation, we acknowledge the complexities in classifying moderate, heavy, and severe domains. We added the study by Wong et al. (2023) to discuss the potential limitations of the FTP estimation protocol. Although direct measures of VO2max or blood lactate are absent in our study, our findings, including perceived exertion (RPE) scores and relative power data, support that participants were primarily in the heavy-intensity domain during HI exercise. To clarify RPE ratings, we adjusted the presentation to align with the Borg scale's intended anchor points, ensuring greater accuracy in reported exertion levels. Statistical analyses confirm significant differences in RPE between exercise intensities. 
These revisions aim to clarify our intent and methodologies, ultimately strengthening the contribution of our research to understanding exercise-induced pain modulation.

      (1) Lines 27-33 - please present some data and accompanying statistical output in the results section of the abstract.

      We thank the reviewer for their comment. In the results section of the abstract, we report whether the findings are (not) significant using the general threshold of P < 0.05. However, we prefer not to include more detailed data and statistical outputs here, as these are thoroughly presented in the results section and do not contribute to the abstract’s primary purpose of providing a concise summary.

      (2) Line 29 - please indicate how fitness level was quantified.

      The functional threshold power (FTP) adjusted for weight served as an indication of cardiovascular fitness level. We have now included this in the abstract.
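
      To make the weight adjustment concrete, the sketch below computes a weight-adjusted FTP in W/kg and a target power as a fraction of FTP. The function names and the example values (240 W, 80 kg, and the 55% figure borrowed from the reviewer's comment on exercise prescription) are illustrative assumptions, not parameters reported for the study.

```python
def relative_ftp(ftp_watts, body_mass_kg):
    """Weight-adjusted FTP in W/kg, used here as a fitness indicator."""
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    return ftp_watts / body_mass_kg

def target_power(ftp_watts, fraction_of_ftp):
    """Power output (W) corresponding to a given fraction of FTP."""
    if fraction_of_ftp < 0:
        raise ValueError("fraction of FTP must be non-negative")
    return ftp_watts * fraction_of_ftp

# Hypothetical participant: FTP of 240 W at a body mass of 80 kg.
fitness_w_per_kg = relative_ftp(240.0, 80.0)
# Illustrative prescription at 55% of FTP.
low_intensity_watts = target_power(240.0, 0.55)
```

      Expressing FTP per kilogram allows participants of different body mass to be compared on the same scale.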

      (3) Line 35 - please include a sentence detailing the implications of your findings.

      We have now included a sentence on the implications of our findings in the abstract.

      (4) Introduction general - I appreciate that it was an exploratory analysis, however, the introduction does not particularly lay the groundwork for this (e.g., the influence of fitness level, sex, etc) - please include some background within the introduction to establish the role level of fitness/exercise/training/physical activity on pain modulation.

      A paragraph detailing the role of fitness level and sex in the context of exercise-induced pain modulation and endogenous opioid release was part of the introduction of our manuscript but has been removed as per the reviewing editor’s request (as the inclusion of sex and fitness level was not part of the preregistration). We have now re-included a shortened version of this paragraph to provide some background on these potentially crucial factors in exercise-induced pain modulation.

      (5) Lines 40-41 - reference needed.

      We thank the reviewer for detecting this and have now included references concerning the release of endogenous opioids and the term exercise-induced hypoalgesia.

      (6) Lines 48-49 - please provide the full terms for ACC and PAG (PAG has been provided on line 52, but should be presented earlier).

      We thank the reviewer for detecting this. We now introduce the abbreviations for the periaqueductal grey (PAG) and anterior cingulate cortex (ACC) in the correct lines.

      (7) Line 49 - the term exercise-induced pain is often used interchangeably (incorrectly) with many different types of pain experienced during/after exercise (e.g. muscle burn/ache, DOMS, injury etc.). Please see O'Malley et al 2024 (doi: 10.1113/EP091687).

      We thank the reviewer for their comment. Despite the distinction between different types of pain induced by exercise being important, this is less relevant for the current study. We would like to point out that the full term used is exercise-induced pain modulation, referring to the modulation of (experimental) pain through exercise. We have deliberately chosen this term as it summarises exercise-induced hypoalgesia as well as hyperalgesia. Therefore, we did not refer to pain induced by exercise and would disagree that this term has been used interchangeably with different types of pain in the current manuscript.

      (8) Line 57 - neither of these studies looked at exercise-induced pain, rather they examined experimentally induced pain (e.g. cold pressor test) or chronic pain and how exercise might exacerbate it. This leads back to the previous comment - it is important to define what is meant by exercise-induced pain (EIP) from the offset, and then remain consistent in the reference to this.

      We agree with the reviewer and have cited the studies accordingly. We would like to point out that the current study does not investigate exercise-induced pain but the modulation of experimental pain through exercise and have used the term exercise-induced pain modulation consistently in the manuscript to describe this.

      (9) Line 61 - Droste et al and Olausson et al are missing from the reference list.

      We apologise for this oversight and have now updated the reference list to include the studies by Droste et al. (1991) and Olausson et al. (1986).

      (10) Line 61 - Do you mean exercise-induced hypoalgesia, or modulation of exercise-induced pain - it is not clear? EIH is introduced in Line 40 and is consistent with what the Koltyn study explored. Conversely, Koltyn induced pain using heat and pressure, rather than exercise.

      In this manuscript, we have opted for the term ‘exercise-induced pain modulation’ since previous research has shown that exercise can elicit hypoalgesia as well as hyperalgesia (for review see Vaegter and Jones (2020)). Thus, the term refers to the modulation of pain through exercise. We have now included a sentence detailing the use of the term ‘exercise-induced pain modulation’ in the first passage of the introduction. Corresponding to Koltyn et al. (2014), we have used heat and pressure stimuli to induce pain and investigate the modulating effect of different exercise intensities on these pain modalities.

      (11) Line 62 and 64 - Both the Janal study and Haier study are missing from the reference list.

      We apologise for this oversight and have now updated the reference list to include the studies by Janal et al. (1984) and Haier et al. (1981).

      (12) Line 62 and 64 - define long/short distance/duration.

      We have revised the terminology from "short-duration" to "short-distance" to facilitate a more precise comparison of the exercise protocols employed in the studies by Janal et al. (1984) and Haier et al. (1981). Specifically, the long-distance run conducted by Janal et al. (1984) spanned 6.3 miles (10.3 km), while the short-distance run executed by Haier et al. (1981) covered 1 mile (1.6 km).

      (13) Line 62 - what type of pain?

      Janal et al. (1984) implemented thermal, ischemic, and cold pressor pain in their study and observed a hypoalgesic effect in response to thermal and ischemic pain that was reversed under NLX administration. We have now specified this in the text.

      (14) Line 67 - please place "i.e., the insula, ACC and prefrontal regions" in parentheses.

      Done.

      (15) Lines 67-69 - please provide clarity on the nature of the interventions being employed. For example, are you referring to interventions to reduce/overcome pain? Or are you referring to approaches to experimentally induce or increase pain during exercise? In either case, please be specific on the interventions employed, and why this variation in approach may make it challenging to draw a conclusion

      The interventions employed by several studies aimed to investigate the pharmacological underpinnings of the pain modulatory effect of exercise and were, thus, pharmacological interventions. The primary objective of these interventions is usually not to reduce or increase pain but to block a specific receptor type to infer the involvement of this receptor type in pain modulation through exercise. In the context of exercise and pain specifically, the most frequently used pharmacological intervention consists of administering a µ-opioid receptor antagonist (naltrexone/naloxone (NLX)). Depending on which type of µ-opioid receptor antagonist is used, different administration protocols are employed (i.e., oral or intravenous administration, different doses, only bolus without constant injection). This variability in the administration protocols of these pharmacological interventions can account for different findings on the extent of opioidergic involvement in exercise-induced pain modulation. We have now refined the corresponding section to increase the precision and clarity of the interventions used.

      (16) Line 69 - administration of what?

      This passage refers to the variability of administration of µ-opioid receptor antagonists such as naloxone (NLX) or naltrexone. We have now specified this in the corresponding line.

      (17) Line 74 - EIH?

      As described above, we have chosen the term 'exercise-induced pain modulation' as an umbrella term for both exercise-induced hypoalgesia and hyperalgesia. However, the reviewer is correct that specifically studies investigating exercise-induced hypoalgesia have been criticised. Still, the proposed criticism also applies to studies detecting hyperalgesia and we would, thus, argue to retain the term ‘exercise-induced pain modulation’ here for the sake of consistency.

      (18) Line 75 - please define "single-arm pre-post measurements"

      We appreciate the reviewer's comment. Single-arm pre-post measurement studies involve participants being assigned to a single experimental condition, with pain assessments conducted only once prior to and once following the intervention. This study design presents several limitations, particularly in the context of examining exercise-induced modulation of pain. Such designs do not consider the effects of habituation to noxious stimuli, as highlighted by Vaegter and Jones (2020). Consequently, when measuring pain levels with only one pre- and one post-intervention assessment, there is a risk of misinterpreting the outcomes: a reduction in post-intervention pain ratings might erroneously be credited to the exercise intervention itself, rather than to habituation to the noxious stimuli experienced. Incorporating randomised controlled trials with multiple measurement blocks not only mitigates these limitations but also provides a clearer understanding of how individual bouts of exercise influence pain perception.
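
      The habituation confound described above can be illustrated with a small arithmetic sketch. It assumes, purely for illustration, that a control arm without the intervention is available and that habituation affects both arms equally; the ratings are invented and the function names are hypothetical.

```python
def pre_post_change(pre, post):
    """Change in pain rating from pre- to post-intervention (negative = less pain)."""
    return post - pre

def controlled_effect(exercise_pre, exercise_post, control_pre, control_post):
    """Change in the exercise arm minus the change in the control arm.

    Assumes habituation contributes equally to both arms, so subtracting
    the control change removes it from the exercise change.
    """
    return (pre_post_change(exercise_pre, exercise_post)
            - pre_post_change(control_pre, control_post))

# Invented ratings on a 0-100 scale.
single_arm = pre_post_change(50, 42)           # looks like an 8-point reduction
adjusted = controlled_effect(50, 42, 50, 45)   # part of that reduction is habituation
```

      In a single-arm design only the first quantity is observable, which is why a post-intervention reduction cannot be attributed to exercise alone.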

      (19) Line 84 - is (40) a reference?

      We apologise for this oversight and have now updated the reference by Borszcz et al. (2018) to be displayed correctly.

      (20) Line 86 - is that 10 min per block (i.e. 40 min exercise time), or 10 min in total? If the former please include "per block" at the end of the sentence (Line 87).

      The reviewer is correct in assuming that we employed 10 min of cycling per block, resulting in a total of 40 minutes of cycling. We have updated the sentence now including ‘per block’ as suggested by the reviewer.

      (21) Line 89 - when you refer to "painfulness" are you referring to the intensity of pain experienced? If so, I think "pain intensity" would be more appropriate.

      In the current study, participants were asked about the ‘painfulness’ of each stimulus based on previous studies (Horing et al., 2019; Horing & Büchel, 2022; Tinnermann et al., 2022). The term ‘painfulness’ is a composite measure of ‘pain intensity’ (sensory dimension) and ‘pain unpleasantness’ (affective dimension) (Talbot et al., 2019). Since unpleasantness is also a definitional criterion of pain (‘Terminology | International Association for the Study of Pain’, n.d.) and previous research shows a high correlation between ‘pain unpleasantness’ and ‘pain intensity’ (Granot et al., 2008; Talbot et al., 2019) we have opted for the term ‘painfulness’ as a more comprehensive measure. Inherently, these two measures are highly correlated.

      (22) Line 91-93 - the way this is written could be suggestive of this being separate to the cycling blocks. Please rephrase to confirm that this was administered prior to the commencement of the cycling blocks.

      We have refined the sentence to make it clearer that the drug treatment was administered before the cycling block commenced on each of the experimental days. We would like to further specify that, whilst the bolus dose of the treatment was administered prior to the experiment, a constant intravenous supply of SAL/NLX was maintained throughout the experiment using an infusion pump.

      (23) Methods general - why only 10 min of exercise? It is likely that there is a 'dose effect' of exercise on EIH, whereby the intensity of exercise and the duration of the exercise are important. Short-duration but high-intensity exercise can induce EIH, as can moderate duration low-intensity exercise. But, for this protocol, was the intensity high enough or long enough to meet the 'dose' needed?

      We thank the reviewer for their question. Our decision to employ 10-minute exercise blocks was rooted in both scientific evidence on exercise-induced hypoalgesia and the (clinical) applicability of the findings. Research has shown that exercise durations ranging from 8 minutes to 2 hours of aerobic exercise can induce hypoalgesia (for review see Koltyn (2002)). Specifically, several studies induce hypoalgesia at 10-15 minutes of aerobic exercise (Gomolka et al., 2019; Gurevich et al., 1994; Haier et al., 1981; Jones et al., 2019; Sternberg et al., 2001; Vaegter et al., 2015). Furthermore, many prior studies have employed exercise durations that are tailored to professional or amateur athletes which may not be practical for healthy individuals with lower fitness levels who may find it challenging to engage in longer sessions, such as an hour of running. When considering applying these findings to the clinical chronic pain population it is crucial to assess the manageability of proposed exercise protocols. We believe that 10 minutes of exercise, whilst being a relatively brief exercise duration, may still be sufficient to elicit exercise-induced hypoalgesia.

      (24) Methods general - what was the time gap between each round (i.e. after the fMRI, how long before the participant started the next cycling block?).

      After each fMRI run the participants were taken out of the MR scanner. The HR and SpO2 were measured and participants were given the chance to go to the restroom before being positioned on the bike to start the next block. All in all, the time between the end of the fMRI scan and the start of the next block ranged from 5 to 10 minutes. We have now included this specification in the methods section.

      (25) Methods general - there is some evidence to show that the EIH effect is less consistently shown when heat is used to induce pain - was there a reason heat was used as the pain induction method here?

      We thank the reviewer for their comment. Indeed, previous meta-analyses by Naugle et al. (2012) report larger effect sizes for pressure pain (Cohen’s d = 0.69) closely followed by heat pain (d = 0.59). In light of this evidence, we included both pain modalities in the current study. Notably, we found no significant differences in pressure pain responses between LI and HI exercise. It is important to emphasise that the term "pressure pain" predominantly encompasses studies employing handheld pressure algometry, whereas our investigation utilised a pressure cuff. This methodological variation raises the possibility that our findings—and corresponding effect sizes—may not be directly comparable to prior pressure pain studies.

      (26) Methods general - please be consistent in the use of terminology. In some areas, you use the phrase "cycling block" whereas in other areas it is referred to as a "cycling run".

      We have revised the methods section to be more precise with the terms ‘run’ and ‘block’.

      (27) Line 571-573 - Please detail how participants were excluded based on scores from STAI and BDI-II.

      We apologise for the misspelling, as it should be that one participant was excluded based on a BMI (body mass index) below 18. No participant had to be excluded based on the STAI or BDI-II score in the current study. We have corrected this in the manuscript.

      (28) Line 636-651 - the FTP20 test has been shown not to be a valid marker of the separation between the heavy and severe exercise intensity domains (see Wong et al 2023 - https://doi.org/10.1080/02640414.2023.2176045). Given that participants completed the high intensity cycle in 'zone 4' (91-106% of FTP), it is probable that participants could have completed this 10 min in either the heavy or the severe exercise intensity domains, with significant implications for the relative challenge this 10 min of exercise. Why was zone 4 used? What are the implications of this? Please discuss and include this as a limitation.

      We thank the reviewer for their comment as it touches upon the challenges of accurately estimating exercise intensities. It is indeed crucial to consider the boundaries between moderate, heavy, and severe intensity domains, as delineated by physiological markers.

      The study by Wong et al. (2023) is interesting; it assesses blood lactate and VO2 levels at FTP and FTP+15 Watts. Despite being highly relevant for the field, some of the findings should be interpreted with caution due to the low sample size of 13 participants, consisting of 11 male and only 2 female cyclists, which may limit generalisability. Additionally, the testing protocol implemented in the study to determine participants' FTP consisted of 5 minutes of self-paced pedalling at 100 Watts followed by a 20-minute maximal, self-paced time trial. This differs from the FTP20 test as implemented in the current study (see Supplemental Table S1) or by other studies (McGrath et al., 2019). The finding in Wong et al. (2023) that participants were only able to sustain cycling at FTP for an average of 33 minutes suggests that the deviating protocol overestimates FTP. Mackey and Horner (2021) propose that the validity of the FTP20 test might rely on the warm-up used before FTP20 testing and the training status of athletes.

      However, we acknowledge that without direct measurements of VO2max or blood lactate levels, it is challenging to determine the precise intensity domain in which each participant was operating in the current study. Still, the RPE (low: M = 8.59, SD = 1.32; high: M = 14.92, SD = 1.98) suggests that participants operated in the heavy-intensity domain in the HI exercise condition. This is further supported by the relative power (%FTP) maintained in the HI (M = 105; SD = 0.05; Author response image 5, purple) and LI (M = 58; SD = 0.06; Author response image 5, green) exercise conditions (difference: t(37) = 44.58, P < 2.2e-16, d = 6.46), confirming the accuracy of the implemented FTP test as well as the maintained power throughout the cycling blocks. Thus, we would argue that participants in the current study predominantly exercised in the heavy domain during the HI exercise condition. We have included the relative power in Figure 3A, replacing the absolute power.

      Finally, we propose that discussing exercise intensity domains within the context of our study enriches the understanding of exercise-induced hypoalgesia without undermining the integrity of our findings. We have now included a discussion of the validity of the FTP20 test as a demarcation point concerning the intensity domains.

      Author response image 5.

      Raincloud plot of relative power (%FTP) during low (green) and high (purple) intensity exercise. Individual data points depict subject-specific averages across blocks.
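
      As a minimal sketch of how relative power relates to the intensity zone mentioned by the reviewer, the following example expresses a block's mean power as a percentage of FTP and checks it against the 91-106% range quoted for 'zone 4'. The function names and the wattage values are hypothetical and chosen only for illustration; they are not study data.

```python
def relative_power(mean_watts, ftp_watts):
    """Express mean power output as a percentage of FTP."""
    return 100.0 * mean_watts / ftp_watts


def in_zone_4(pct_ftp):
    """True if %FTP falls in the 91-106% range quoted for 'zone 4'."""
    return 91.0 <= pct_ftp <= 106.0


# Hypothetical participant: FTP of 200 W, 210 W held in an HI block,
# 116 W held in an LI block (made-up numbers, not study data).
hi_pct = relative_power(210, 200)
li_pct = relative_power(116, 200)
hi_in_zone = in_zone_4(hi_pct)
li_in_zone = in_zone_4(li_pct)
```

      Expressing power relative to each participant's own FTP is what makes the HI and LI conditions comparable across individuals with different absolute power outputs.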

      (29) Line 676 - please provide further information on each cycling run/block. Did each participant complete a total of 4 runs (i.e., a total of 40 minutes of exercise), with 2 runs completed at a high intensity and 2 runs completed at a low intensity in a randomised order (e.g., for one participant this could be 10 minutes at low, followed by 10 minutes at high, followed by 10 minutes a low, followed by 10 minutes at high)? Figure 1 details this nicely, however, it would be helpful to read in-text.

      The reviewer is correct in assuming that there were a total of 4 blocks on each experimental day. Participants completed cycling in 2 blocks at HI and in 2 blocks at LI in a pseudorandomised order. This order was kept constant across experimental days (i.e. completing the same block order on Day 2 and Day 3). We have detailed this further in the Methods section.
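
      The block structure described above can be sketched as follows. This is an illustration only: the helper name, the seed handling, and the use of a plain shuffle are assumptions, and any additional constraints applied in the actual pseudorandomisation are not specified here.

```python
import random


def draw_block_order(seed):
    """Return one shuffled order of 2 HI and 2 LI cycling blocks (hypothetical helper)."""
    rng = random.Random(seed)
    order = ["HI", "HI", "LI", "LI"]
    rng.shuffle(order)
    return order


# The same order is reused on both experimental days (Day 2 and Day 3).
participant_order = draw_block_order(seed=1)
schedule = {"Day 2": list(participant_order), "Day 3": list(participant_order)}

# 10 minutes of cycling per block gives the total cycling time per day.
total_minutes = 10 * len(participant_order)
```

      Drawing the order once per participant and copying it to both days keeps the block sequence constant across drug conditions.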

      (30) Discussion general - it is possible that EIH could be induced via different mechanisms and that these mechanisms are at least in part due to exercise intensity. For example, EIH from higher-intensity exercise might have some contribution from CPM.

      We thank the reviewer for their comment. Previous research aimed to disentangle the two seemingly similar mechanisms of exercise-induced hypoalgesia (EIH) and conditioned pain modulation (CPM) (Ellingson et al., 2014; Rice et al., 2019; Samuelly-Leichtag et al., 2018; Vaegter et al., 2014). CPM is typically induced by applying a tonic noxious stimulus that decreases pain sensitivity to another noxious stimulus applied simultaneously or shortly after at a distant body part (Graven-Nielsen & Arendt-Nielsen, 2010). Despite EIH and CPM showing distinct mechanisms, it cannot be completely ruled out that there are at least partially overlapping mechanisms driving the two phenomena (Rice et al., 2019). Due to our study design, where the time difference between cycling blocks and the applied pain was on average five minutes, it is unlikely that CPM is the driving pain modulatory mechanism in our study setup.

      (31) Line 101 - as this was preregistered, should the study design be followed and then reported?

      We have conducted the study adhering to the preregistered study design and now report the results for pressure pain (Supplemental Figure S1). Some of the preregistered analyses (i.e. directly comparing heat and pressure pain) were beyond the scope of the current study and will be reported separately.

      (32) Line 110 - please provide some data on the fitness levels and how this is classified as high/low.

      The FTP (relative to body weight) was used as an estimate of cardiovascular and endurance fitness (Valenzuela et al., 2018). We refrained from classifying the fitness levels dichotomously as low or high since this is a subjective measure in a sample of healthy individuals of diverse fitness levels. Instead, we utilised the FTP as a more nuanced metric for comparison.

      (33) Lines 159-160 - in the context of the difference in intensity between the sessions. But, it is likely that the high-intensity exercise would have posed quite different relative challenge between participants.

      We thank the reviewer for their comment. As described above, we did not obtain direct measurements of VO2max or blood lactate levels, making it challenging to determine the precise intensity domain in which each participant was operating in the current study. However, all participants received the same instructions for the BORG rating scale, ensuring the comparability of RPE across participants to a certain extent.

      (34) Figure 3C - what instructions and familiarisation were given to participants regarding the 6-20 Borg scale? In Figure 3C it looks as though several participants rated the low exercise intensity at 6. This would/should be equivalent to sitting quietly, so it looks as though at least several participants did not understand how to use the RPE - please discuss.

      Indeed, three participants rated the LI exercise condition at 6 due to an error in the translation of the scale instruction. Participants were instructed that the lower anchor point of the scale (6) referred to ‘extremely light’ instead of ‘no exertion’. Thus, we have rescaled the RPE ratings where a rating of 6 now corresponds to a 7 (‘extremely light’) on the BORG scale and again calculated the paired t-test. There is still a significant difference in the RPE between exercise intensities (t(38) = 19.65, P < 2.2e-16, d = 3.69; Author response image 6). We have corrected this in the manuscript accordingly and updated Figure 3C.

      Author response image 6.

      Raincloud plot of rating of perceived exertion (RPE) on the BORG scale during low (green) and high (purple) intensity exercise. Individual data points depict subject-specific averages across blocks. A rating of 6 reflects ‘no exertion’ and 20 reflects ‘maximal exertion’.
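
      The rescaling and the paired comparison described above can be sketched as follows. The ratings are made up for five hypothetical participants, the function names are illustrative, and the effect size shown (mean difference divided by the SD of the differences) is an assumption; the exact effect size definition used in the analysis is not stated here.

```python
import math
import statistics


def rescale_rpe(rating):
    """Map a rating of 6 to 7 ('extremely light'); leave other ratings unchanged."""
    return 7 if rating == 6 else rating


def paired_t_and_d(low, high):
    """Paired t statistic, effect size d_z, and degrees of freedom."""
    diffs = [h - l for l, h in zip(low, high)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    d_z = mean_diff / sd_diff
    return t, d_z, len(diffs) - 1


# Made-up ratings for five hypothetical participants (not study data).
low_raw = [6, 8, 9, 6, 10]
high_raw = [14, 15, 17, 13, 16]

low = [rescale_rpe(r) for r in low_raw]
t_stat, d_z, df = paired_t_and_d(low, high_raw)
```

      Only ratings of exactly 6 are shifted, so the rescaling can reduce the LI versus HI difference slightly but cannot reverse its direction.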

      (35) Line 171 - is (37, 38) a reference?

      We apologise for this oversight and have now updated the references to be displayed correctly.

      (36) Line 176-18 - is this interaction sufficiently powered? Differences between sexes are not mentioned in the pre-registered study

      We have conducted an additional post-hoc power analysis for the interaction of drug, fitness level, and sex on differential heat pain ratings. We employed the power analysis for mixed models implemented in R (powerCurve) with 1000 simulations. This revealed that, at a power of 0.8, a sample size of n = 27 would have been sufficient to detect this effect (Author response image 7). Despite not having preregistered the factor ‘sex’, we believe that the observed results provide valuable insights that contribute to a deeper understanding of the data. We have marked these analyses as exploratory, emphasising the need for caution in their interpretation. However, we feel it is essential to report these findings to inform future studies, ensuring that such factors are adequately considered.

      Author response image 7.

      Post-hoc power analysis for behavioural effects from the linear mixed effects (LMER) model with interaction drug, fitness level, and sex using the powerCurve function in R with a target power of 0.8 and 1000 simulations.
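
      The logic of a simulation-based power curve can be illustrated with a deliberately simplified sketch. It is not the mixed-model analysis run in R: it simulates a single standardised difference score per participant, uses a normal critical value in place of the t distribution (which slightly overstates power for small n), and assumes a hypothetical effect size of 0.5. All function names are illustrative.

```python
import math
import random
import statistics


def simulated_power(n, effect_size, n_sims=1000, alpha=0.05, seed=0):
    """Share of simulated samples of size n in which the effect is detected."""
    rng = random.Random(seed)
    # Assumption: normal approximation to the two-sided critical value.
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        t = statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))
        if abs(t) > z_crit:
            hits += 1
    return hits / n_sims


def smallest_n_for_power(effect_size, target_power=0.8, candidates=range(5, 101, 5)):
    """First candidate n whose simulated power reaches the target, else None."""
    for n in candidates:
        if simulated_power(n, effect_size) >= target_power:
            return n
    return None


# Power at three sample sizes for a hypothetical effect size of 0.5.
power_curve = {n: simulated_power(n, effect_size=0.5) for n in (10, 20, 40)}
required_n = smallest_n_for_power(0.5)
```

      Reading off the smallest sample size at which the curve crosses the target power of 0.8 corresponds to the kind of statement made above about the sufficient sample size.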

      (37) Line 227 - this is not what this analysis shows. The comparison is low vs high-intensity exercise on pain modulation, not exercise vs. no exercise. You cannot conclude that aerobic exercise has no effect on pain modulation because you did not do that comparison (i.e. no baseline (without exercise) for pain).

      We agree with the reviewer and have rephrased the sub-headline accordingly to reflect that there is no difference in exercise-induced hypoalgesia between HI and LI aerobic exercise.

      (38) Methods General - why was a control condition not used, or at least a baseline pain response, so that low/high-intensity exercise could be compared to a baseline? Given this, I'm not sure I agree with the study conclusions (abstract: 'These results indicate that aerobic exercise has no overall effect on pain in a mixed population sample') because you have compared high vs low-intensity exercise, not exercise vs. no exercise.

      As for the lack of a resting control condition, we acknowledge that our study was not designed to test the overall effect of exercise versus no exercise. However, our primary objective was to compare different exercise intensities, hypothesising that low-intensity (LI) exercise would induce less pain modulation as compared to high-intensity (HI) exercise. By exploring this, we aimed to enhance understanding of the dose-response relationship between exercise and pain modulation. To better reflect this focus, we have revised the misleading phrasing regarding the ‘overall’ effect of exercise to clearly emphasize our primary aim: comparing HI and LI exercise. The reviewer offers an interesting interpretation of the data, namely that exercise-induced hypoalgesia might have occurred for both exercise intensities since the pain ratings provided were lower than the anticipated intensities as determined by the calibration. That this difference is smaller in the naloxone (NLX) condition could provide evidence of opioidergic mechanisms underlying this effect.

      Unfortunately, the current study is not designed to comprehensively answer this question since there was no resting control condition. In particular, the lower pain ratings under SAL (Figure 6) could be due to exercise triggering the descending pain modulatory system (DPMS), but equally due to the default activation of the DPMS. Only an additional “no exercise” condition could disentangle this. Furthermore, habituation to noxious stimuli can influence pain ratings, resulting in lower pain ratings during the experiment as compared to the calibration.

      (39) Line 285 - or that better-trained individuals have a greater EIH response to higher intensity exercise, but both those of low and high fitness have established EIH after low intensity exercise. Given there isn't a 'no exercise' baseline, it is hard to make conclusions about EIH effect generally, only comparisons between high/low exercise intensity.

      We thank the reviewer for their comment. We agree that we cannot establish whether all participants showed a hypoalgesic response to the LI exercise with the current study design. However, our results show that participants with higher fitness levels showed increased hypoalgesia after HI exercise compared to those with lower fitness levels. We have refined the sentence accordingly.

      (40) Figure 7A - the regression line here is not that convincing.

      We acknowledge the reviewers’ concern regarding the regression line. However, it is important to note that the significant main effect of fitness level on differences in pain ratings in the SAL condition (β = 6.45, CI [1.25, 11.65], SE = 2.56, t(38) = 2.52, P = 0.02) supports the assertion that higher fitness levels are associated with greater hypoalgesia following HI exercise compared to LI exercise. While the trend may not be visible for all data points, the statistical analysis provides a robust basis for the observed relationship (r = 0.33, P = 0.038).

      (41) Line 354 - the NLX infusion was double-blind, but what are the implications of participants knowing that they completed high/low-intensity exercise - this cannot be blinded.

      The reviewer is correct that the exercise intensities cannot be blinded. To account for potential expectation effects of exercise on several psychological and physiological domains (including pain), participants completed a questionnaire on the calibration day where they had to indicate their expectations of the extent to which acute exercise affects several domains (Lindheimer et al., 2019). They could rate each domain on a Likert scale ranging from ‘large decrease’ (-3) to ‘large increase’ (3) with 0 denoting ‘no effect’. This format was chosen to allow measuring the direction and magnitude of expectation effects and to avoid being directive or suggestive (Lindheimer et al., 2019). Despite including other psychological and physiological domains in the questionnaire (i.e., stress, anxiety, energy, memory) we focused on the specific pain domains (muscle pain, joint pain, and whole-body pain) to establish participants’ expectations regarding the effect of acute exercise on pain. We tested whether the expectation ratings for each pain type were significantly different from 0 (no effect) using a one-sample t-test.

      There was no significant effect for muscle pain (t(38) = 1.78, P = 0.08, M = 0.39, SE = 0.12), joint pain (t(38) = -0.12, P = 0.90, M = -0.03, SE = 0.11), or whole-body pain (t(38) = -1.05, P = 0.30, M = -0.21, SE = 0.12), suggesting no expectation effect on these pain domains in the overall sample (Supplemental Figure S10A). Since there is variation in the data, we calculated the correlation of the expectation ratings in the different pain domains with the difference score between the pain ratings in the SAL condition (LI – HI rating; Supplemental Figure S10B). This analysis yielded no significant correlation in any of the pain domains (joint pain: r = 0.11, P = 0.49; muscle pain: r = -0.07, P = 0.68; whole-body pain: r = 0.07, P = 0.68).
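
      For readers less familiar with the test, a one-sample t-test of expectation ratings against 0 ('no effect') can be sketched as follows. The ratings are made up for eight hypothetical participants on the -3 to 3 scale, and the function name is illustrative; the p-value lookup is omitted to keep the sketch dependency-free.

```python
import math
import statistics


def one_sample_t(ratings, mu0=0.0):
    """t statistic, degrees of freedom, mean, and standard error for H0: mean == mu0."""
    n = len(ratings)
    mean = statistics.mean(ratings)
    se = statistics.stdev(ratings) / math.sqrt(n)
    return (mean - mu0) / se, n - 1, mean, se


# Made-up expectation ratings on the -3..3 scale (not study data).
muscle_pain = [1, 0, -1, 2, 0, 1, 0, -1]
t_stat, df, mean, se = one_sample_t(muscle_pain)
```

      A t statistic close to 0 indicates that, on average, participants expected neither an increase nor a decrease in that pain domain after acute exercise.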

      Moreover, given that we have not been able to show a difference between the exercise intensities on pain modulation, expectation effects are likely not to contribute to this null effect.

      (42) Line 356-358 - and this comparison (and primary hypothesis) is not blinded.

      While we agree with the reviewer that this comparison is not – and potentially cannot be – blinded, we would like to reiterate our results from the previous paragraph, which indicate that such expectation effects of exercise on pain were not present in the sample and, thus, did not seem to have influenced the results. It is noteworthy that the double-blind design of our study specifically pertains to the pharmacological intervention employed.

      (43) Line 358-360 - this could be explained by both types of exercise inducing EIH via the same mechanism (which is disrupted by NLX).

      We thank the reviewer for their comment and would like to refer back to the reviewer's comment number 38 for a response to this.

      (44) Line 360-361 - this conclusion cannot be drawn, because you have only compared high vs low intensity exercise. So, the conclusion should be 'These results suggest that there is no difference between high and low aerobic exercise intensity on heat-induced pain'.

      We agree with the reviewer and have rephrased the sentence to reflect the claim accurately.

      (45) Line 396 - as previously discussed, this conclusion cannot be drawn through this study design.

      We agree with the reviewer and have rephrased the sub-headline accordingly to reflect that there is no difference in exercise-induced hypoalgesia between HI and LI aerobic exercise.

      (46) Line 399 - please expand on this point - it is critical to the hypothesis and should also be included in the introduction. What intensities/duration/dose of aerobic exercise is generally established to cause EIH?

      We thank the reviewer and agree that this is a crucial aspect that requires further specification. Below we have expanded on the duration/intensities shown to elicit exercise-induced hypoalgesia and included a concise version of this detailed paragraph in the manuscript introduction.

      For aerobic exercise, different methods have been employed to determine exercise intensity levels i.e., through the VO2max, age-predicted HRmax, or incremental intensities (Koltyn, 2002). Most studies using VO2max as a measure of exercise intensity (Koltyn et al., 1996; Micalos & Arendt-Nielsen, 2016; Vaegter et al., 2014) were able to induce hypoalgesia with HI levels ranging between 65%-75% VO2max. When using the HRmax as a measure of determining exercise intensities, HI exercise at 70%-75% of the HRmax has been shown to produce greater hypoalgesia compared to moderate intensity at 50% HRmax (Naugle et al., 2014; Vaegter et al., 2014). Furthermore, previous research has suggested that HI exercise produces greater hypoalgesia compared to LI exercise (60-70% HRmax vs. light activity: M. D. Jones et al., 2019; 70% vs. 50% HRmax: Naugle et al., 2014; 75% vs. 50% VO2max: Vaegter et al., 2014).

      Furthermore, different durations can be regarded as suitable, with durations from 8 minutes to 2 hours of aerobic exercise having been shown to induce hypoalgesia (for review see Koltyn (2002)). Hoffman et al. (2004) showed a hypoalgesic response after 30 minutes but not after 10 minutes at 75% VO2max of cycling. In contrast, other studies were able to induce hypoalgesia at 10-15 minutes of HI aerobic exercise (75% VO2max: Gomolka et al., 2019; 63% VO2max: Gurevich et al., 1994; self-paced: Haier et al., 1981; 60-70% HRmax: Jones et al., 2019; 85% HRmax: Sternberg et al., 2001; 75% VO2max: Vaegter et al., 2015).

      (47) Line 400-401 - please define high intensity.

      We thank the reviewer for their comment. The referenced studies by Vaegter et al. (2014) and Jones et al. (2019) based the estimation of HI and LI exercise on an age-related target heart rate corresponding to VO2max and HRmax, respectively. In Vaegter et al. (2014), the HI condition corresponded to 75% VO2max, while the LI to 50% VO2max. In Jones et al. (2019), the HI exercise condition corresponded to 60% and 70% of HRmax, while the LI condition was defined as pedalling slowly against a light resistance of 0.5 kg of force to maintain a rating of perceived exertion (RPE) not above resting. We have included this clarification in the relevant section to elucidate the intensities of the chosen exercise conditions.

      (48) Line 403-405 - I'm not sure I follow (perhaps I have misunderstood) - pain induction was completed after exercise in the MRI scanner, so there was no distraction effect of exercise in either condition. A baseline could have been established in the same way and there would be exactly the same conditions, just without prior exercise.

      We agree with the reviewer that a resting baseline condition in the context of exercise-induced pain modulation allows for the investigation of a potential hypoalgesic effect of exercise compared to no exercise. Nevertheless, it is important to note that previous studies (Brooks et al., 2017; Sprenger et al., 2012) have shown that cognitive pain modulation is mediated by endogenous opioids. Therefore, tasks with different attentional loads potentially influence post-task pain ratings. Although we agree with the reviewer that the effect of distraction or attentional load would be minimal in the MR scanner, there still could be an effect of different cognitive loads from exercise vs. no exercise. Nevertheless, we focus the discussion on investigating the dose-response relationship between different exercise intensities where an ‘active’ control condition might contribute to a more nuanced understanding of exercise-induced pain modulation.

      (49) Line 403-411 - this is fine (although I do not agree that this was the best methodological decision), however, it does limit the conclusions that can be drawn (as previously mentioned). That is, you cannot conclude that no EIH occurred, only that there was no difference between low and high-intensity exercise in post-exercise pain response.

      We agree with the reviewer that the comparison of HI vs. LI exercise does not allow for an interpretation of the overall effect of exercise as opposed to no exercise on pain modulation. The comparison of HI and LI exercise allows the investigation of a dose-response relationship of these distinct exercise intensities. While LI exercise might not be a 'pure' control condition in the traditional sense, it is valuable for exploring the complexities of exercise and pain interaction.

      (50) Line 419-422 - sorry I do not follow - you say that moderate intensity exercise most reliably induces EIH but then select exercise intensities that are likely to be in the heavy or severe intensity domain? Please also include in this discussion the limitations of FTP20 as a threshold marker (see Wong et al) and the implications on the results/conclusions.

      We thank the reviewer for their comment. In the referenced sentence, we have defined the HI exercise as described in the reviews. Specifically, Wewege and Jones (2020) reported hypoalgesia to be greater after higher-intensity exercise, although the intensity was not further specified. Naugle et al. (2012) noted that HI exercise (i.e., 75% of VO2max) produced greater hypoalgesia, while Koltyn (2002) indicated that hypoalgesia occurs at intensities ranging from 60% to 75% of VO2max but more reliably at 75% VO2max or higher. Consequently, we have removed the term ‘moderate’, as it does not accurately reflect what has been reported in the reviews and could be misleading. Moreover, we have clarified the specific criteria for what is considered high (or higher) intensity exercise in the referenced reviews.

      We kindly ask the reviewers to refer back to the previous comment (reviewer comment number 28) regarding the discussion of the intensity domains and the FTP20 test as demarcation point for these intensity domains.

      (51) Line 422-425 - indeed, pacing is an important element of this test, which inexperienced cyclists have difficulty with when they are not provided with proper familiarisation.

      We agree with the reviewer that the FTP20 test has mainly been validated and employed in experienced cyclists and requires further validation in non-athletes of both sexes. However, since we have used an extensive warm-up period and several paced steps (intervals, 5-minute time-trial) as well as recovery periods (Supplemental Table S1) based on McGrath et al. (2019), we propose that participants were thoroughly familiarised with the elements of pacing before the estimation of the FTP in the 20-minute test took place. On average, participants showed a variation of M = 21.80 Watts (SE = 1.44 Watts) during the 20-minute paced FTP20 test (Supplemental Figure S11A). Interestingly, our data suggest that participants with a higher FTP showed higher variation of power output (Watts) during the 20-minute FTP test compared to individuals with lower fitness levels (Supplemental Figure S11B).

      (52) Line 425-427 - please remove this, the RPE difference between exercise bouts is not evidence that participants cycled at FTP.

      We thank the reviewer for their comment. However, we would propose to include the rating of perceived exertion (RPE) since it shows that the exercise intensities have been perceived as significantly different by the participants. This behavioural measure of exertion is potentially important for a broader audience to understand the exercise implementation beyond physiological markers.

      (53) Line 432 - high vs. low-intensity aerobic exercise

      We have changed the sentence accordingly to support the claim of the study that there was no difference in exercise-induced pain modulation between HI and LI aerobic exercise.

      (54) Line 447-449 - this seems contradictory to the first line of this paragraph (430-432) - i.e. that the heterogenous sample may have caused the null finding. Why deliberately select a participant sample that is likely to lead to a null effect?

      In the current study, we aimed to include participants of diverse fitness levels and both sexes to verify the findings on exercise-induced pain modulation in a broader population. We consider this important concerning translational aspects of EIH. Indeed, our heterogeneous sample may have ‘caused’ the observed null effect, but at the same time, it suggests that more homogenous (sometimes composed solely of male athletes) samples employed in many earlier studies might have skewed the understanding of exercise-induced pain modulation and thus unintentionally suggested a (non-existing) generalisation of this effect to the general population.

      (55) Line 532-456 - although Koltyn found electrical pain to have the greatest effect?

      The review by Naugle et al. (2012) reported effect sizes for heat (Cohen’s d = 0.59) and pressure pain intensity (d = 0.69) following aerobic exercise but did not provide effect sizes for electrical pain intensity. They noted that the effect size for electrical pain intensity after isometric exercise was d = 0.40, which is lower than that for heat and pressure pain. While Koltyn (2002) stated that electrical and pressure stimuli induce exercise-induced hypoalgesia more consistently than thermal pain, the review did not clarify whether this applies to pain threshold, intensity, or tolerance, nor did it provide effect sizes. Given that electrical, pressure, and heat pain are the most commonly used methods to induce quantifiable pain in the context of exercise studies (Vaegter and Jones, 2020), we based our decision to use heat and pressure pain primarily on Naugle et al.'s findings.

      (56) Line 468-469 - why leave out content that was pre-registered (i.e. difference between pressure and heat pain) but includes analysis that wasn't (i.e. sex differences)? If a study is going to be pre-registered, then isn't it important to follow that design?

      We thank the reviewer for this comment. We have conducted the study adhering to the preregistered study design and now report the results for pressure pain (Supplemental Figure S1). Some of the preregistered analyses (i.e. directly comparing heat and pressure pain) were beyond the scope of the current study and will be reported separately.

      (57) Line 532-525 - and how could this have been accounted for?

      We apologise for any confusion, as we are unsure about the specific reference the reviewer is making based on the provided line numbers. We believe the question relates to how the potential effects of endocannabinoids were considered in the current study design, and we've addressed that in our response. In human studies, it is not possible to centrally block endocannabinoids, which makes it difficult to directly estimate their role in exercise-induced pain modulation in humans. Measuring endocannabinoids in the blood might not adequately capture changes in endocannabinoid levels in the brain throughout the different exercise intensity conditions. Despite these limitations, exploring the role of endocannabinoids in exercise-induced pain modulation presents a promising avenue for future research that could enhance our understanding of pain mechanisms and improve pain management strategies.

      (58) Limitations General - please include the other limitations discussed in this review.

      Done.

      (59) Line 530 - please amend this conclusion, in line with previous comments.

      Done.

      We would like to thank the reviewer for critically evaluating the manuscript and providing insightful comments. We appreciate the reviewer recognising the strengths of our work and believe that their suggestions will contribute to improving the quality of the manuscript.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Rho-ROCK liberates sequestered claudin for rapid de novo tight junction formation" by Cho and colleagues investigates de novo tight junction formation during the differentiation of immortalized human HaCaT keratinocytes to granular-like cells, as well as during epithelial remodeling that occurs upon the apoptosis of individual cells in confluent monolayers of the representative epithelial cell line EpH4. The authors demonstrate the involvement of Rho-ROCK with well-conducted experiments and convincing images. Moreover, they unravel the underlying molecular mechanism, with Rho-ROCK activity activating the transmembrane serine protease Matriptase, which in turn leads to the cleavage of EpCAM and TROP2, respectively, releasing Claudins from EpCAM/TROP2/Claudin complexes at the cell membrane to become available for polymerization and de novo tight junction formation. These functional studies in the two different cell culture systems are complemented by localization studies of the corresponding proteins in the stratified mouse epidermis in vivo.

      In total, these are new and very intriguing and interesting findings that add important new insights into the molecular mechanisms of tight junction formation, identifying Matriptase as the "missing link" in the cascade of formerly described regulators. The involvement of TROP2/EpCAM/Claudin has been reported recently (Szabo et al., Biol. Open 2022; Bugge lab), and Matriptase had been formerly described to be required for tight junction formation as well, again from the Bugge lab. Yet, the functional correlation/epistasis between them, and their relation to Rho signaling, had not been known thus far.

      However, experiments addressing the role of Matriptase require a little more work.

      Strengths:

      Convincing functional studies in two different cell culture systems, complemented by supporting protein localization studies in vivo. The manuscript is clearly written and most data are convincingly demonstrated, with beautiful images and movies.

      Weaknesses:

      The central finding that Rho signaling leads to increased Matriptase activity needs to be more rigorously demonstrated (e.g. western blot specifically detecting the activated version or distinguishing between the full-length/inactive and processed/active version).

      We plan to provide more direct evidence that matriptase activation is regulated by the Rho-ROCK pathway, utilizing antibodies that specifically recognize the activated form of matriptase.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate how epithelia maintain intercellular barrier function despite and during cellular rearrangements upon e.g. apoptotic extrusion in simple epithelia or regenerative turnover in stratified epithelia like the epidermis, a fundamental question in epithelial biology. Previous literature has shown that Rho-mediated local regulation of actomyosin is essential not only for cellular rearrangement itself but also for directly controlling tight junction barrier function. The molecular mechanics, however, remained unclear. Here the authors use extensive fluorescent imaging of fixed and live cells together with genetic and drug-mediated interference to show that Rho activation is required and sufficient to form de novo tight junctional strands at intercellular contacts in epidermal keratinocytes (HaCaT) and mammary epithelial cells. After having confirmed previous literature, they then show that Rho activation activates the transmembrane protease Matriptase, which cleaves EpCAM and TROP2, two claudin-binding transmembrane proteins, to release claudins and enable claudin strand formation and therefore tight junction barrier function.

      Strengths:

      The presented mechanism is shown to be relevant for epithelial barriers being conserved in simple and stratifying epithelial cells and mainly differs due to tissue-specific expression of EpCAM and TROP2. The authors present careful state-of-the-art imaging and logical experiments that convincingly support the statements and conclusion. The manuscript is well-written and easy to follow.

      Weaknesses:

      Whereas the in vitro evidence of the presented mechanism is strongly supported by the data, the in vivo confirmation is mostly based on the predicted distribution of TROP2. Whereas the causality of Rho-mediated Matriptase activation has been nicely demonstrated it remains unclear how Rho activates Matriptase.

      As noted, while we have demonstrated that Rho activation is both necessary and sufficient to induce matriptase activation, the precise mechanism by which Rho mediates this activation remains unclear. As discussed in the manuscript, several potential molecular mechanisms could underlie the contribution of Rho to matriptase activation. As part of our future work, we intend to systematically investigate each of these mechanisms.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The resubmitted version of the manuscript adequately addressed several initial comments made by reviewing editors, including a more detailed analysis of the results (such as those of bilayer thickness). This version was seen by 2 reviewers. Both reviewers recognize this work as being an important contribution to the field of BK and voltage-dependent ion channels in general. The long trajectories and the rigorous/novel analyses have revealed important insights into the mechanisms of voltage-sensing and electromechanical coupling in the context of a truncated variant of the BK channel. Many of these observations are consistent with structural and functional measurements of the channel, available thus far. The authors also identify a novel partially expanded state of the channel pore that is accessed after gating-charge displacement, which informs the sequence of structural events accompanying voltage-dependent opening of BK.

      However, there are key concerns regarding the use of the truncated channel in the simulations. While many gating features of BK are preserved in the truncated variant, studies have suggested that the coupling of channel pore opening to voltage-sensing domain rearrangement is impaired upon gating-ring deletion. The inferences made here might therefore represent only a partial view of the mechanism of electromechanical coupling.

      It is also not entirely clear whether the partially expanded pore represents a functionally open, sub-conductance, or another closed state. Although the authors provide evidence that the inner pore is hydrated in this partially open state, in the absence of additional structural/functional restraints, a confident assignment of a functional state to this structural state is difficult. Functional measurements of the truncated channel seem to suggest that not only is their single channel conductance lower than full-length channels, but they also appear to have a voltage-independent step that causes the gates to open. It is unclear whether it is this voltage-independent step that remains to be captured in these MD trajectories. A clear-cut resolution of this conundrum might not be feasible at this time, but it could help to present the various possibilities to the readers.

      We appreciate the positive comments and agree that there will likely be important differences between the mechanistic details of voltage activation between the Core-MT and full-length constructs of BK channels. We also agree that the dilated pore observed in the simulation may not be the fully open state of Core-MT.

      Nonetheless, the notion that the simulation may not have captured the full pore opening transition or the contribution of the CTD should not render the current work “incomplete”, because a complete understanding of BK activation would be an unrealistic goal beyond the scope of this work. We respectfully emphasize that the main insights of the current simulations are the mechanisms of voltage sensing (e.g., the nature of VSD movements, contributions of various charged residues, how small charge movements allow voltage sensing, etc.) as well as the role of the S4-S5-S6 interface in VSD-pore coupling. As noted by the Editor and reviewers, these insights represent important steps towards establishing a more complete understanding of BK activation.

      Below are the specific comments of the two experts who have assessed the work and made specific suggestions to improve the manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) Although the successful simulation of V-dependent K+ conduction through the BK channel pore and analysis of associated state dependent VSD/pore interactions and coupling analysis is significant, there are two related questions that are relevant to the conclusions and of interest to the BK channel community which I think should be addressed or discussed.

      One key feature of BK channels is their extraordinarily large conductance compared to other K+ selective channels. Do the simulations of K+ conductance provide any insight into this difference? Is the predicted conductance of BK larger than that of other K+ channels studied by similar methods? Is there any difference in the conductance mechanism (e.g., the hard and soft knock-on effects mentioned for BK)?

      The molecular basis of the large conductance of BK channels is indeed an interesting and fundamental question. Unfortunately, this is beyond the scope of this work and the current simulation does not appear to provide any insight into the basis of large conductance. It is interesting to note, though, that the conductance is apparently related to the level of pore dilation and the pore hydration level, as increasing the hydration level from ~30 to ~40 waters in the pore increases the simulated conductance from ~1.5 to 6 pS (page 8). This is consistent with previous atomistic simulations (Gu and de Groot, Nature Communications 2023; ref. 33) showing that the pore hydration level is strongly correlated with observed conductance. As noted in the manuscript, the conductance mechanism through the filter appears highly similar to previous simulations of other K+ channels (Page 8). Given the limited number of conductance events observed in the current simulations, we will refrain from discussing the possible basis of the large conductance in BK channels except commenting on the role of pore hydration (page 8; also see below in response to #5).
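
      To make the scale of these numbers concrete, the sketch below converts a count of K+ permeation events into a conductance using G = N·e / (t·V). This is only an illustration under stated assumptions: the formula is the generic estimate for MD permeation counts, not necessarily the procedure used in the manuscript, and the event count, the 10-μs window, and the 750 mV voltage in the usage line are hypothetical inputs chosen for illustration.

```python
# Minimal sketch: estimate single-channel conductance from permeation counts.
# Assumption: conductance G = I / V, with current I = N * e / t.
# The function name and the example inputs are hypothetical.

ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of one K+ ion, in coulombs


def conductance_pS(n_events: int, duration_s: float, voltage_V: float) -> float:
    """Return conductance in picosiemens for n_events ions crossing the pore."""
    if duration_s <= 0 or voltage_V == 0:
        raise ValueError("duration must be positive and voltage non-zero")
    current_A = n_events * ELEMENTARY_CHARGE_C / duration_s  # amperes
    return current_A / voltage_V * 1e12  # siemens -> picosiemens


if __name__ == "__main__":
    # Hypothetical example: 70 permeation events in 10 microseconds at 750 mV.
    print(f"{conductance_pS(70, 10e-6, 0.75):.2f} pS")
```

      Under these assumptions, a conductance of a few pS corresponds to only tens to a few hundred permeation events per 10 μs, which illustrates why conclusions drawn from so few events are kept cautious.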

      The pore in the MD simulations does not open as wide as the Ca-bound open structure, which (as the authors note) may mean that full opening requires longer than 10 us. I think that is highly likely given that the two 750 mV simulations yielded different degrees of opening and that in BK channels opening is generally much slower than charge movement. Therefore, a question is - do any of the conclusions illustrated in Figures 6, S5, S6 differ if the Ca-bound structure is used as the open state? For example, I expect the interactions between S5 and S6 might at least change to some extent as S6 moves to its final position. In this case, would conclusions about which residues interact, and get stronger or weaker, be the same as in Figures S6 b,c? Providing a comparison may help indicate to what extent the conclusions are dependent on achieving a fully open conformation.

      We appreciate the reviewer’s suggestion and have further analyzed the information flow and coupling pathways using the simulation trajectory initiated from the Ca2+-bound cryo-EM structure (sim 7, Table S1). The new results are shown in two new SI figures (Figures S7 and S8), and new discussion has been added to pages 14-15. Comparing Figures 5 and S7, we find that the dynamic communities, coupling pathways, and information flow are highly similar between simulations of the open and closed states, even though there are significant differences in S5 contacts in the simulated open state vs the Ca2+-bound open state (Figure S8). Interestingly, there are significant differences in S4-S5 packing between the simulated and Ca2+-bound open states (Figure S8 top panel), which likely reflect important differences in VSD/pore interactions during voltage vs Ca2+ activation.

      (2) P4 Significance -"first, successful direct simulation of voltage-activation"

      This statement may need rewording. As noted above Carrasquel-Ursulaez et al.,2022 (reference 39) simulated voltage sensor activation under comparable conditions to the current manuscript (3.9 us simulation at +400 mV), and made some similar conclusions regarding R210, R213 movement, and electric field focusing within the VSD. However, they did not report what happens to the pore or simulate K+ movement. So do the authors here mean something like "first, successful direct simulation of voltage-dependent channel opening"?

      We agree with the reviewer and have revised the statement to “ … the first successful direct simulation of voltage-dependent activation of the big potassium (BK) channel, ..”

      (3) P5 "We compare the membrane thickness at 300 and 750 mV and the results reveal no significant difference in the membrane thickness (Figure S2)" The figure also shows membrane thickness at 0 mV and indicates it is 1.4 Angstroms less than that at 300 or 750 mV. Whether or not this difference is significant should be stated, as the question being addressed is whether the structure is perturbed owing to the use of non-physiological voltages (which would include both 300 and 750 mV).

      We have revised the Figure S2 caption to clarify that a one-way ANOVA suggests the difference is not significant.

      (4) P7 "It should be noted that the full-length BK channel in the Ca2+ bound state has an even larger intracellular opening (Figure 2f, green trace), suggesting that additional dilation of the pore may occur at longer timescales."

      As noted above, I agree it is likely that additional pore dilation may occur at longer timescales. However, for completeness, I suppose an alternative hypothesis should be noted, e.g. "...suggesting that additional dilation of the pore may occur at longer timescales, or in response to Ca-binding to the full length channel."

      This is a great suggestion. Revised as suggested.

      (5) Since the authors raise the possibility that they are simulating a subconductance state, some more discussion on this point would be helpful, especially in relation to the hydrophobic gate concept. Although the Magleby group concluded that the cytoplasmic mouth of the (fully open) pore has little impact on single channel conductance, that doesn't rule out that it becomes limiting in a partially open conformation. The simulation in Figure 3A shows an initial hydration of the pore with ~15 waters with few conductance events, suggesting that hydration per se may not suffice to define a fully open state. Indeed, the authors indicate that the simulated open state (w/ ~30-40 waters) has 1/4th the simulated conductance of the open structure (w/ ~60 waters). So is it the degree of hydration that limits conductance? Or is there a threshold of hydration that permits conductance and then other factors that limit conductance until the pore widens further? Addressing these issues might also be relevant to understanding the extraordinarily large conductance of fully open BK compared to other K channels.

      We agree with the reviewer’s proposal that pore hydration seems to be a major factor that can affect conductance. This is also well in line with the previous computational study by Gu and de Groot (2023). We have now added a brief discussion on page 8, stating “Besides the limitation of the current fixed charge force fields in quantitatively predicting channel conductance, we note that the molecular basis for the large conductance of BK channels is actually poorly understood (78). It is noteworthy that the pore hydration level appears to be an important factor in determining the apparent conductance in the simulation, which has also been proposed in a previous atomistic simulation study of the Aplysia BK channel (33).”

      Minor points

      (1) P5 "the fully relaxed pore profile (red trace in Figure S1d, top row) shows substantial differences compared to that of the Ca2+-free Cryo-EM structure of the full-length channel." For clarity, I suggest indicating which is the Ca-free profile - "... Ca2+-free Cryo-EM structure of the full-length channel (black trace)."

      We greatly appreciate the thoughtful suggestion. Revised as suggested.

      (2) P8 "Consistent with previous simulations (78-80), the conductance follows a multi-ion mechanism, where there are at least two K+ ions inside the filter" For clarity, I suggest indicating these are not previous simulations of BK channels (e.g., "previous simulations of other K+ channels ...").

      Revised as suggested. Thank you.

      (3) Figure 2, S1 - grey traces representing individual subunits are very difficult to see (especially if printed). I wonder if they should be made slightly darker. Similar traces in Figure 3 are easier to see.

      The traces in Figure S1 are actually the same thickness as those in Figure 3; they appear lighter because of the size of the figure. Figure 2 panels a-c have been updated to improve the resolution.

      (4) Figure 2 - suggest labeling S6 as "S6 313-324" (similar to S4 notation) to indicate it is not the entire segment.

      Figure 2 panel d) has been updated as suggested.

      (5) Figure 2 legend - "Voltage activation of Core-MT BK channels. a-d)..."

      It would be easier to find details corresponding to individual panels if they were referenced individually. For example:

      "a-d) results from a 10-μs simulation under 750 mV (sim2b in Table S1). Each data point represents the average of four subunits for a given snapshot (thin grey lines), and the colored thick lines plot the running average. a) z-displacement of key side chain charged groups from initial positions. The locations of charged groups were taken as those of guanidinium CZ atoms (for Arg) and sidechain carboxyl carbons (for Asp/Glu). b) z-displacement of centers-of-mass of VSD helices from initial positions, c) backbone RMSD of the pore-lining S6 (F307-L325) to the open state, and d) tilt angles of all TM helices. Only residues 313-324 of S6 were included in the tilt angle calculation, and the values in the open and closed Cryo-EM structures are marked using purple dashed lines."

      We appreciate the thoughtful suggestion and have revised the caption as suggested.

      (6) Figure S1 - column labels a,b,c, and d should be referenced in the legend.

      The references to column labels have been added to Figure S1 caption.

      (7) References need to be double-checked for duplicates and formatting.

      a) I noticed several duplicate references, but did not do a complete search: Budelli et al 2013 (#68, 100), Horrigan Aldrich 2002 (#22,97), Sun Horrigan 2022 (#40, 86), Jensen et al 2012 (#56,81).

      b) Reference #38 is incorrectly cited with the first name spelled out and the last name abbreviated.

      We appreciate the careful proofreading of the reviewer. The duplicated references were introduced by mistake due to the use of multiple reference libraries. We have gone through the manuscript and removed a total of 5 duplicated references.

      Reviewer #2 (Recommendations for the authors):

      This manuscript has been through a previous level of review. The authors have provided their responses to the previous reviewers, which appear to be satisfactory, and I have no additional comments, beyond the caveats concerning interpretations based on the truncated channel, which are noted above.

      We greatly appreciate the constructive comments and insightful advice. Please see our response above to the Reviewing Editor’s comments for the changes regarding the caveats concerning interpretations of the current simulations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We deeply appreciate the reviewer comments on our manuscript. We have proceeded with all the minor changes mentioned. We also want to emphasize three major points:

      (1) Reversine has been shown to have several off-target effects, including the induction of apoptosis (Chen et al. J Bone Oncol. 2024).

      (2) Hypoxia ranges from 2% to 6% oxygen. We define hypoxia as 5% oxygen with 5% CO2, in line with the standard oxygen levels used in IVF clinics. Physiological oxygen in the mouse ranges from ~1.5% to 8%.

      (3) Natale et al. 2004 (Dev Bio) and Sozen et al. 2015 (Mech of Dev) described that inhibition of p38 severely affects the development of pre-implantation embryos after the 8-cell stage. For this reason, comprehensively dissecting the interaction between p53, HIF1A and p38 during aneuploid stress is challenging. We do not rule out a dual function of p38 in lineage specification and in the response to DNA damage.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 69: Please add the species used in your cited publications (murine).

      Fixed

      (2) Line 72: Consider changing "Because" to "As".

      Fixed

      (3) Line 88: "from the nuclei" - please refer to where the reader may find the example provided (Figure S1A).

      Fixed

      (4) Line 89: This should be Figure S1B as no quantification is presented in S1A. S1A only contains examples of micronuclei.

      Fixed

      (5) Line 91: Refer to Figure S1A.

      Fixed

      (6) Line 91-93: Are these numbers correct? The query arises from the numbers presented in Figure S1B. Please define how the median was calculated; is it micronuclei CREST+ plus micronuclei CREST-?

      Fixed. These percentages do not distinguish between CREST+ and CREST- micronuclei.

      (7) Line 95: extra/missing bracket?

      Fixed

      (8) Line 88-91:

      [a] Regarding the number of cells with micronuclei in this text, please clarify your sample size and how the percentages were calculated as they currently do not align (e.g., are these the total number of embryos from a single experimental replicate?).

      [b] Also, different numbers are found here and in the figure legend: (DMSO-22/256 cells from 32 embryos; Rev-82/144 cells from 18 embryos; AZ-182/304 cells from 38 embryos) vs. Fig S1 legend (DMSO-n=128 cells; Rev-72 cells; AZ-152 cells).

      [c] Is the median calculated using the numbers presented above? If yes, then the numbers do not tally, please check (DMSO-22/256 cells=8.6%; Rev-82/144 cells=56.9%; AZ-182/304 cells =59.9%) vs. Line 91-93: DMSO=12.5%, Rev=75%; AZ=62.5% blastomeres had micronuclei.

      The percentage represents the average of aneuploidy per embryo after normalization.

      See the table for DMSO (Author response image 1). This number represents the average percentage of aneuploid cells per aneuploid embryo. Note that some embryos are fully diploid and some have more than 12.5% aneuploid cells (e.g., 25%), but most aneuploid embryos have 12.5%. The value is therefore not a simple count of aneuploid cells in the sample; it describes how aneuploid the aneuploid embryos in each sample are.

      Author response image 1.
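
      The difference between the two ways of summarising these data can be sketched as follows. The pooled counts are the ones quoted in the reviewer's comment; the per-embryo distribution is hypothetical (the actual table is in Author response image 1 and is not reproduced here), and the assumption of 8 cells per embryo is inferred from 256 cells in 32 embryos.

```python
from statistics import mean, median

CELLS_PER_EMBRYO = 8  # assumption: 8-cell embryos (256 cells / 32 embryos)


def pooled_percent(affected_cells: int, total_cells: int) -> float:
    """Percentage of affected cells across the whole pooled sample."""
    return 100.0 * affected_cells / total_cells


def per_embryo_percent(counts, cells_per_embryo=CELLS_PER_EMBRYO):
    """Mean and median percentage of affected cells among affected embryos only."""
    fractions = [100.0 * c / cells_per_embryo for c in counts if c > 0]
    return mean(fractions), median(fractions)


# Pooled counts quoted in the reviewer's comment.
pooled = {"DMSO": (22, 256), "Rev": (82, 144), "AZ": (182, 304)}

# Hypothetical DMSO distribution: 32 embryos carrying 22 affected cells in total
# (12 unaffected embryos, 18 with one affected cell, 2 with two affected cells).
hypothetical_dmso = [0] * 12 + [1] * 18 + [2] * 2

if __name__ == "__main__":
    for group, (affected, total) in pooled.items():
        print(group, round(pooled_percent(affected, total), 1))
    print(per_embryo_percent(hypothetical_dmso))
```

      With this hypothetical distribution, the pooled value stays at 8.6% while the median among affected embryos is 12.5%, which shows how the two summaries can legitimately differ for the same data.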

      (9) Line 108:

      [a] "n=28 per treatment" please clarify whether this refers to the number of embryos or cells and also add how many independent replicate experiments this data is representative of. As the text only refers to Figure 1C, you can remove the P-values for ** and *.

      Number of embryos. Fixed

      (10) Line 111: Suggest citing Figure 1C at the end of the sentence.

      Fixed

      (11) Line 118-119: the references to figures require updating to ensure they refer to the appropriate figure; ...decidua (Figure S1C)...viable E9.5 embryos (Figure S1D).

      Fixed

      (12) Line 126: A description of the data in Figures 1D and 1E is missing. Also, consider describing the DNA damage observed in the DMSO control group. Visually, it appears that DNA damage reduces from the 8-cell to the morula stage (Figure 1E) but increases at the blastocyst stage (Figure S2A)? Point for discussion for a normal rate of DNA damage?

      Agreed; there is some DNA damage in the TE at the blastocyst stage.

      (13) Line 134: 8 EPI and 4 PE cells in what group?

      Fixed: DMSO-treated embryos

      (14) Line 137: Could this also suggest that AZ and reversine induce DNA damage through a different mechanism/pathway, resulting in the differential impact observed? Despite both being inhibitors of Mps1.

      This is a possibility.

      (15) Line 153: the legend for Figure 2A says the Welch t-test was performed, but the Mann-Whitney U-test was stated here. Which is correct?

      Welch’s t-test

      (16) Line 155: ...at the blastocyst stage. Compared to what?

      DMSO-treated embryos

      (17) Line 160: Data in Figure 2B requires the definition of P-values for , , . Please add one for and remove the one for **.

      Fixed

      (18) Line 173-174: Data in Fig. 4 requires the definition of the P-values for ****. Please remove the others.

      Fixed

      (19) Line 180: Instead of jumping across figures, this section would benefit from stating the numbers directly to allow for an accurate comparison, e.g. 64 and 7 in Figure 2D vs. X and Y in Figure 1C.

      (20) Line 187: Hif1a should be italicised.

      Fixed

      (21) Line 197: Based on the description here, I believe you are missing a reference to Figure 1A.

      Fixed

      (22) Line 203: Instead of jumping across figures, this section would benefit from stating the numbers directly to allow for accurate comparison, "particularly in the TE and PE" (67 vs 54; and 11 vs 6, respectively).

      (23) Line 209-210:

      [a] "...lowered the number of yH2AX foci..." is this a visual observation as quantification was performed for yH2AX intensity, not quantification of foci?

      A description for PARP1 levels in morula stage embryos was presented ("...relatively low in morula), but not for yH2AX levels at this stage of development. Missing description?

      Fixed

      (24) Line 235: This sentence would benefit from being specific about the environmental conditions...eg "Under normoxia, DMSO/AZ3146-treated...",

      (25) Line 238: The sentence should reference Figure 4F not 4G.

      Fixed

      (26) Line 242-243:

      [a] "slightly increased... in the TE (49.06%) and PE (50%) but, strikingly, reduced... EPI (33.3%)" compared to what and in which figure?

      Assuming you are comparing normoxia (4F) to hypoxia (4G), the numbers change for the TE (46.75% to 49.06%, respectively), EPI (42.88% to 33.3%, respectively), and PE (28.57% to 50%, respectively); yet these data were described as "strikingly different" for EPI (9.58 decrease) but only "slightly increased" for PE (21.42 increase). Suggest using appropriate adjectives to describe the results.

      Fixed
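
      The reviewer's point about matching adjectives to effect sizes can be checked with a short calculation of the percentage-point change per lineage. This is a sketch using only the percentages quoted in the comment above; the dictionary and function names are hypothetical, and the comparison assumes, as the reviewer does, that Figure 4F is normoxia and Figure 4G is hypoxia.

```python
# Percentages quoted in the reviewer's comment (normoxia -> hypoxia).
normoxia = {"TE": 46.75, "EPI": 42.88, "PE": 28.57}
hypoxia = {"TE": 49.06, "EPI": 33.3, "PE": 50.0}


def percentage_point_change(before: dict, after: dict) -> dict:
    """Return the change in percentage points for each lineage, rounded to 2 dp."""
    return {k: round(after[k] - before[k], 2) for k in before}


if __name__ == "__main__":
    changes = percentage_point_change(normoxia, hypoxia)
    # Rank lineages by the absolute size of the change, largest first.
    for lineage in sorted(changes, key=lambda k: abs(changes[k]), reverse=True):
        print(lineage, changes[lineage])
```

      Ranked by absolute change, PE shifts the most, followed by EPI and then TE, so any qualitative wording should follow that order.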

      (27) Line 256: It is stated in line 255 that treatment was performed at the zygote stage, yet this sentence says reversine treatment occurred at the 2-cell stage? Which is correct? Please amend appropriately. Refer to the comment below regarding adding a schematic to aid readers

      Fixed

      (28) Line 259: "n>27 per treatment" please clarify whether this refers to the number of embryos or cells and also add how many independent replicate experiments this data is representative of. Data in Figures S5A-B requires a definition of P-values for , . Please remove for *, *.

      Fixed

      (29) Line 261: AZ3146/reversine stated here, the figure shows Reversine/AZ3146. Please consider being consistent.

      Fixed

      (30) Line 263: "... normal morphology and cavitation" (Figure S5D); however, the images presented for Rev/DMSO and Rev/AZ3146 chimeras appear smaller, with a distorted/weird shape, when compared to DMSO/AZ. I believe the description does not match the images presented.

      Fixed

      (31) Line 267: "...similar results as 8-cell stage derived chimeras"; however, there is only a reference to Fig S5E which depicts 2-cell/zygote stage (see comment above for line 256 regarding required clarification of stage of treatment) derived chimeras. There is also a missing reference to Figure 4B, D, and/or F?

      Fixed

      (32) Line 271: add a reference to Figure S5E.

      Fixed

      (33) Line 283: "AZ3146/reversine" should be "Reversine/AZ3146" to match the figure.

      Fixed

      (34) Line 284: Figures 5E-F show both morphology and cavitation; the text should reflect this.

      Fixed

      (35) Line 281-285: I think this text requires editing to improve clarity. It is difficult for this reader to understand the authors' interpretation of the results....inhibiting HIF1A reduces morphology and cavitation. That's correct. However, this also diminished the contribution of AZ3146-treated cells to all 3 cell lineages; this is not quite accurate. AZ3146-treated cells were significantly reduced in total cell numbers because TE was significantly reduced. It is not appropriate to generalise this result to all 3 lineages, as EPI and TE appear to increase AZ's contribution following IDF treatment, albeit non-statistically significant.

      Fixed

      (36) Line 320: citation? ....reversine-treated embryos. Is this referring to your previous publication...Bolton 2016?

      Fixed

      (37) Line 344: missing space between 7.5 and IU.

      Fixed

      (38) Line 358: animal ethics approval number/code missing.

      Fixed

      (39) Line 397: missing space between "...previously" and "(Bermejo...".

      Fixed

      (40) Line 417: missing space between "...control" and "(Gu et...".

      Fixed

      (41) Line 421: missing space between "protocol" and "(Eakin...".

      Fixed

      (42) Line 427-429: Medium-grade mosaic chimeras were referred to as DMSO:AZ:Rev (3:3:2) here; but Figure 4 and associated legend says otherwise. Please amend appropriately. Were all medium mosaics generated in this manner? As I could only find Rev/AZ chimeras; my understanding of the Rev/AZ chimeras is 1:1 Rev:AZ instead of 3:2:3 DMSO:Rev:AZ.

      Fixed

      (43) Line 428: "reversine-treaded": please correct spelling.

      Fixed

      (44) Line 593: "n=28 per treatment" Please clarify whether this refers to the number of embryos or cells and also add how many independent replicate experiments this data is representative of.

      Fixed

      (45) Line 597: "through morula stage" when compared to what group?

      DMSO-treated embryos

      (46) Line 598: Data in Figure S5A-B requires the definition of P-values for , , **. Please remove for . Please define the error bars. SEM/95% confidence interval?

      Fixed

      (47) Line 604-607: Regarding 2B, no statistical test is stated yet Mann-Whitney was stated in Line 160 of the results section. Please confirm which test was used and include it in both sections for consistency.

      Fixed

      (48) Line 608: "Chemical downregulation of HIF1A"... this is not described in the results/methods section or shown in the figure. Please amend all sections for accuracy.

      Fixed

      (49) Line 613: please change "effect in" to "effect on".

      Fixed

      (50) Line 614: Please clarify the number of embryos or cells and also add how many independent replicate experiments this data is representative of. Data in Figure 2 also requires a definition of P-value for ****.

      Fixed

      (51) Line 625: Please clarify the number of embryos or cells and also add how many independent replicate experiments this data is representative of. Data in Figure 3 also requires a definition of P-value for ****.

      Fixed

      (52) Line 627: description requires editing to improve accuracy "...is only slightly increased at the 8-cell stage after exposure to reversine and AZ3146". However, the results show significantly higher DNA damage with Reversine treatment, but not with AZ when compared to DMSO. Please amend accordingly.

      Fixed

      (53) Line 629: Please define the error bars. SEM/95% confidence interval?

      Fixed

      (54) Line 634-635: it is written here that chimeras were made from 1:1 DMSO/AZ3146 and Reversine/DMSO; but Figure 4A shows 1:1 DMSO(grey):AZ3146(blue), and Reversine(red):AZ3146(blue), which contradicts the legend + method section; see comments for Line 427-429. Please amend these sections accordingly.

      Fixed

      (55) Line 648: reversine/AZ3146 chimeras? Refer to comments above.

      Fixed

      (56) Line 649-650: ...AZ-treated blastomeres contribute similarly to reversine-blastomeres to the TE and EPI but significantly increase contribution to the EPI? Please add the appropriate comparison group.

      Fixed

      (57) Line 652: Please clarify the number of embryos or cells and also add how many independent replicate experiments this data is representative of.

      Fixed

      (58) Line 664: Please clarify the number of embryos or cells and also add how many independent replicate experiments this data is representative of.

      Fixed

      (59) Line 675-677: FigS1B legend requires a definition of P-value for * and ****, can omit **

      Fixed

      (60) Line 678-680: FigS1C and S1D legend: sample size and replicates? Only mentioned in Lines 117-120, which requires back calculation.

      Fixed

      (61) Line 682-694: (1) Fig. S2B legend: missing P-value description for *** and ***; statistical test not stated, please add. Also, Figure S2E, only requires the definition for , and can omit others.

      Fixed

      (62) Line 702: FigS3B: missing description for ****, omit others.

      Fixed

      (63) Line 704-705: missing description for Rev/AZ group and hypoxia vs. normoxia conditions.

      Fixed

      (64) Line 712-713: "n>27 per treatment" Please clarify whether this refers to the number of embryos or cells and also add how many independent replicate experiments this data is representative of. Data in Figure S5 requires the definition of P-values for , . Please remove for *, *.

      Fixed

      (65) Line 713-715: could benefit from a description of which were marked from mTmG; e.g. why is DMSO, Rev, Rev in Green for [D]; does this mean 2-cell stage chimeras were only made with embryos treated with DMSO and Reversine? Has it been tested if you did this with AZ3146, do the proportions remain the same? This would be interesting to know.

      DMSO and reversine are shown in green because those are the cells marked in green in the chimeras. We also generated chimeras with AZ3146. We hope this clarifies the point.

      (66) Line 719-721: why is there a difference between the proportion of aneuploid cells for the different chimeras? AZ in D/AZ, and R/AZ groups; while only R in D/R group? Is this because you only count those that were marked with mTmG (e.g. based on [Fig S5D])?

      (67) Line 724: low- and medium-grade chimeras would indicate quality, recommend adding low/medium grade aneuploid/mosaic chimeras.

      Fixed

      (68) Line 725-729: it may be my mistake, but I think the results description is not found within the Results section, but only here in the legend? Please include this detail also in the Results section.

      Fixed

      (69) Line 729: which is AZ or Rev cells?

      (70) References - Page number missing for some references; abbreviated version vs. non abbreviated version of journal titles used. Please be consistent/meet journal requirements.

      Fixed

      (71) Figures

      Figure 1: [C] both AZ-NANOG and DMSO-SOX17 have mean/median(?) of 11 cells (described in results), yet in this figure (on the same axis) these groups are not level. Are the numbers correct? This is also the case for Rev-SOX17 which is described in the results as having 8 cells yet appears to be above the 8 mark in the graphs; AZ-CDX2, which has 64 cells yet appears to be below the 60 mark; AZ-total, which has 82 cells yet appears to be below the 80 mark. In [E] the label orientation, "ns" has both horizontal and vertical orientation. Please make appropriate changes throughout to reflect accuracy.

      Figure 3: [C] As for Figure 1, DMSO-NANOG, which is described in results as having 14 cells, yet appears to be below the 13 mark in the graph; DMSO-SOX17, which has 6 cells yet appears to be above the 7 mark.

      These differences are due to averaging.

      Figure 4: [D and E] random numerals appear in the bars on the graph. 9,10 and 7, 14? Are these sample size numbers? If they are, they should appear in all bars/groups or in the legend.

      Yes, these are sample sizes

      Figure 5: [D and G] same comment as for Fig 4 above, random numbers in the graph.

      Yes, these are sample sizes

      (72) Supplementary figures. Figure S2 [A] No quantification? This is important to add as representative images are only a 2D plane, which can be easily misinterpreted. [E] Should the y-axis label be written as "Number of cells normalised to DMSO group", or similar? Or is there a figure missing to depict the ratio of cells in each cell lineage normalised to the DMSO group, which is the description written in the legend? But I don't see a figure showing the ratio, just the absolute number of cells. Is this a missing figure or a mislabelled axis?

      Quantification at the blastocyst stage is misleading due to high cellular heterogeneity.

      Reviewer #3 (Recommendations for the authors):

      (1) The statement in the abstract: "embryos with a low proportion of aneuploid cells have a similar likelihood of developing to term as fully euploid embryos" Line 48-50 Capalbo does not really answer as the biopsy may not be reflective of ICM.

      This is a great point. Trophectoderm biopsies may not reflect the real proportion of aneuploidy in the ICM. We emphasize this in the Discussion and in Fig. S4.

      (2) Line 69/70, at least 50% Singla et al/Bolton. It would be helpful to elaborate a bit more on this study. How can this be assessed when analysis results in destruction?

      (3) Differences in the developmental potential of reversine versus AZ-treated embryos. It is not entirely clear why. The differences in non-dividing cells if any are small, and the -crest cells are rather minor also. Could these drugs have other effects that are not evaluated in the study?

      Yes. Specifically, reversine has been shown to have several off-target effects, including the induction of apoptosis (Chen et al., 2024).

      (4) Lines 45-46 understanding of reduction of aneuploidy should mention/discuss the paper of attrition/selection, of the kind by the Brivanlou lab for instance, or others. As well as allocation to specific lineages, including the authors' work.

      Dr. Brivanlou's experiments in gastruloids do not represent the same developmental stage as pre-implantation embryos. Comparison between the models is debatable.

      (5) Line 53: human experiments are more limited due to access to samples. What does 'not allowed' mean? By who, where?

      The NIH does not allow experimentation with human embryos for ethical reasons.

      (6) The figure callouts to S1A in lines 93,97. What is a non-dividing nucleus? For how long is it observed?

      A non-dividing nucleus is a round accumulation of DNA without defined separation of the chromosomes and their kinetochores (CREST antibody). The presence of non-dividing nuclei during the 4-to-8-cell stage can indicate activation of the spindle assembly checkpoint during prometaphase. An example of a non-dividing nucleus can be seen in Fig. S1B.

      (7) Line 108 A relatively minor effect on cell number and quality of blastocysts is observed. It is not surprising that thereafter, developmental potential is also high. At that stage, what are the individual cell karyotypes?

      Due to technical limitations, we can’t determine the specific karyotypes of these cells.

      (8) Line 153. The p53 increase of 1.3 fold is not dramatic.

      At the morula stage, p53 levels differ by 7-fold. In contrast, at the blastocyst stage, a 1.3-fold change is indeed less dramatic. This may result from the elimination of aneuploid cells or from a mechanism that counters activation of the p53 pathway, such as overexpression of the Hif1a pathway.

      (9) Line 155. Is there a more direct way to test for p38 activation?

      Natale et al. 2004 (Dev Biol) and Sozen et al. 2015 (Mech of Dev) reported that inhibition of p38 strongly affects the development of pre-implantation embryos after the 8-cell stage. For this reason, comprehensively dissecting the interaction between p53, HIF1A and p38 during aneuploid stress is challenging. We do not rule out a dual function of p38 in lineage specification and in the response to DNA damage.

      (10) Line 191/192 Low oxygen conditions, is this equal to hypoxia? What is the definition of hypoxia here? The next sentence says physiological. Is that the same or different?

      Low oxygen can be defined as hypoxia, which ranges from 2% to 6% oxygen. We define hypoxia as 5% oxygen with 5% CO₂, taking into account the standard oxygen levels used in IVF clinics. Physiological oxygen in the mouse ranges from ~1.5% to 8%.

      (11) The question is whether there is something specific about HIF1 and aneuploidy, or whether another added stress would have similar effects on the competitiveness of treated cells.

      That would be a great follow-up to our work.

      (12) Line 300. Is p21 unregulated at the protein level or mRNA level? Please indicate.

      mRNA level.

      (13) Figure 1D/E H2Ax intensity is cell cycle phase-dependent. It might be meaningful to count foci by the nucleus and show both ways of analysis.

      (14) Check the spelling of phalloidin.

      Fixed in text and figures!

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Deng et al reports single cell expression analysis of developing mouse hearts and examines the requirements for cardiac fibroblasts in heart maturation. The work includes extensive gene expression profiling and bioinformatic analysis. The prenatal fibroblast ablation studies show new information on the requirement of these cells on heart maturation before birth.

      The strengths of the manuscript are the new single cell datasets and comprehensive approach to ablating cardiac fibroblasts in pre and postnatal development in mice. Extensive data are presented on mouse embryo fibroblast diversity and morphology in response to fibroblast ablation. Histological data support localization of major cardiac cell types and effects of fibroblast ablation on cardiac gene expression at different times of development.

      A weakness of the study is that the major conclusions regarding collagen signaling and heart maturation are based on gene expression patterns and are not functionally validated.

      Reviewer #2 (Public review):

      This study aims to elucidate the role of fibroblasts in regulating myocardium and vascular development through signaling to cardiomyocytes and endothelial cells. This focus is significant, given that fibroblasts, cardiomyocytes, and vascular endothelial cells are the three primary cell types in the heart. The authors employed a Pdgfra-CreER-controlled diphtheria toxin A (DTA) system to ablate fibroblasts at various embryonic and postnatal stages, characterizing the resulting cardiac defects, particularly in myocardium and vasculature development. Single-cell RNA sequencing (scRNA-seq) analysis of the ablated hearts identified collagen as a crucial signaling molecule from fibroblasts that influences the development of cardiomyocytes and vascular endothelial cells.

      This is an interesting manuscript; however, there are several major issues, including an over-reliance on the scRNA-seq data, which shows inconsistencies between replicates.

      We thank the reviewer for carefully reading our revised manuscript. All of the questions listed below were raised in the previous round and have been addressed in the current revision. As noted in the “Recommendations for the Authors” section, the reviewer has no additional comments at this time.

      Some of the major issues are described below.

      (1) The CD31 immunostaining data (Figure 3B-G) indicate a reduction in endothelial cell numbers following fibroblast deletion using PdgfraCreER+/-; RosaDTA+/- mice. However, the scRNA-seq data show no percentage change in the endothelial cell population (Figure 4D). Furthermore, while the percentage of Vas_ECs decreased in ablated samples at E16.5, the results at E18.5 were inconsistent, showing an increase in one replicate and a decrease in another, raising concerns about the reliability of the RNA-seq findings.

      (2) Similarly, while the percentage of Ven_CMs increased at E18.5, it exhibited differing trends at E16.5 (Fig. 4E), further highlighting the inconsistency of the scRNA-seq analysis with the other data.

      (3) Furthermore, the authors noted that the ablated samples had slightly higher percentages of cardiomyocytes in the G1 phase compared to controls (Fig. 4H, S11D), which aligns with the enrichment of pathways related to heart development, sarcomere organization, heart tube morphogenesis, and cell proliferation. However, it is unclear how this correlates with heart development, given that the hearts of ablated mice are significantly smaller than those of controls (Figure 3E). Additionally, the heart sections from ablated samples used for CD31/DAPI staining in Figure 3F appear much larger than those of the controls, raising further inconsistencies in the manuscript.

      (4) The manuscript relies heavily on the scRNA-seq dataset, which shows inconsistencies between the two replicates. Furthermore, the morphological and histological analyses do not align with the scRNA-seq findings.

      (5) There is a lack of mechanistic insight into how collagen, as a key signaling molecule from fibroblasts, affects the development of cardiomyocytes and vascular endothelial cells.

      (6) In Figure 1B, Col1a1 expression is observed in the epicardial cells (Figure 1A, E11.5), but this is not represented in the accompanying cartoon.

      (7) Do the PdgfraCreER+/-; RosaDTA+/- mice survive after birth when induced at E15.5, and do they exhibit any cardiac defects?

      Reviewer #3 (Public review):

      Summary:

      The authors investigated fibroblasts' communication with key cell types in developing and neonatal hearts, with a focus on the critical roles of the fibroblast-cardiomyocyte and fibroblast-endothelial cell networks in cardiac morphogenesis. They tried to map the spatial distribution of these cell types and reported the major pathways and signaling molecules driving the communication. They also used a Cre-DTA system to ablate Pdgfra-labeled cells and observed myocardial and endothelial cell defects during development. They screened the pathways and genes using sequencing data from ablated hearts. Lastly, they reported compensatory collagen expression in long-term ablated neonatal hearts. Overall, this study provides important insight into fibroblasts' roles in cardiac development and will be a powerful resource for collagen- and ECM-focused research.

      Strengths:

      The authors used good analytical tools to investigate multiple single-cell sequencing and Multi-seq datasets. They identified significant pathways and cellular and molecular interactions of fibroblasts. Additionally, they compared some of their analytic findings with a human database and identified several groups of ECM genes with varying roles in mice.

      Weaknesses:

      This study is largely based on sequencing data analysis. At the bench, the authors used a very strident technique to study fibroblast function, ablating one of the major cell populations of the heart. In addition, experimental validation of the downstream pathways they identified will eventually be required.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Most of my comments have been adequately addressed. Additional comments on new data in the revised manuscript are below.

      (1) In the new figure S11, it is not really possible to draw major conclusions on mitral valve morphology and maturation since the planes of section do not seem comparable. Observations regarding attachment to the papillary muscle might be dependent on the particular section being evaluated. However, it is useful to see that the valves are not severely affected in the ablated animals.

      We appreciate the reviewer’s comment and agree with the reviewer’s observation. Accordingly, we have updated the manuscript by removing the original conclusion-related statement and instead highlighting that the valves were not severely affected in the ablated animals (page 6).

      (2) In the last supplemental figure S19, it is not possible to determine if results are or are not statistically significant for n=2 as shown for FS and EF for the ablated animals and controls. The text says that there is a trend of improved heart function, but evaluation of additional animals is needed to support this conclusion.

      We thank the reviewer for the comment and agree that a sample size of n = 2 is too small to draw meaningful conclusions. As previously suggested by the reviewer, we have removed this result from the manuscript (page 10).

      Reviewer #2 (Recommendations for the authors):

      The manuscript has greatly improved following the revision, and I have no additional comments to offer.

      Thanks!

      Reviewer #3 (Recommendations for the authors):

      The authors did a good job addressing the questions raised in the first review. However, I have some minor concerns.

      (1) The paper notes that collagen signaling is observed in FB-VasEC in humans, but not in FB-VenCM, unlike in mice. Did the authors analyze predicted ligand-receptor interactions as they did with control and ablated mouse hearts? This could add valuable new insights into how FBs regulate ventricular CMs in the human heart.

      Thank you. We have analyzed the predicted ligand-receptor interactions between Fb and Ven_CM, as well as between Fb and Vas_EC, using human scRNA-seq data. The results are provided as a supplemental figure (Fig. S8C).

      (2) The authors provided data on defects in CD31 expression in several models. Did they observe any other phenotypes associated with a defective endothelial or vascular system, such as blood accumulation in the pericardium or larger/smaller capillaries? Did they also examine the percentage of Cdh5+ cells?

      We thank the reviewer for the questions. We did not observe clear evidence of blood accumulation in the pericardium of the ablated hearts, as shown in Figures 3B, 3E, 6B, and 6F. Additionally, we did not perform Cdh5 staining in either the control or ablated hearts.

      (3) Please mention the sample age of Figure 2A-C.

      These are single-cell mRNA sequencing data from CD1 mice across 18 developmental stages, ranging from E9.5 to P9. We have added this information to the manuscript (page 4).

      (4) Please follow the same style to describe X axis in graphs in Figure 3D (and all similar graphs in manuscript) as followed in 3G.

      Thank you. We assume the reviewer was referring to the descriptions in the relevant figure legends. We have updated the legend for Figure 3D to ensure consistency with the description provided for Figure 3G (page 15).

      (5) It is important to provide echocardiographic M mode images with a comparable number of cardiac cycles in control and ablated (Fig. 6H).

      We thank the reviewer for the comment. As explained in our previous response, the echocardiographic data for both control and mutant mice were collected in conscious animals. The differences in their cardiac cycles reflect variations in heart rate, which represent a disease phenotype and cannot be altered. Therefore, we are unable to provide M-mode images with a similar number of cardiac cycles for control and ablated mice.

      (6) In the long-term neonatal ablation experiments, collagen expression returns to normal. The manuscript attributes this to possible "compensatory expression." Do the authors have any thoughts on how this is regulated? Are other cell types stepping in, or are surviving FBs proliferating?

      We thank the reviewer for the question. As suggested, the compensatory collagen expression could be driven by surviving fibroblasts or other cell types. Since we currently lack evidence to exclude either possibility, we believe both could be contributing factors.

      (7) While collagen is shown to be a dominant signaling molecule, its centrality is inferred primarily from scRNA-seq and ligand-receptor predictions. Did the authors try any functional rescue experiment (e.g., exogenous collagen supplementation or receptor blockade) to directly validate this pathway's role in vivo?

      We thank the reviewer for the comment. As noted in our previous revision in response to similar questions from the other two reviewers, we agree that these rescue experiments are of interest but are beyond the scope of the current study. We plan to pursue these investigations in future work and share our findings when available.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Rho-ROCK liberates sequestered claudin for rapid de novo tight junction formation" by Cho and colleagues investigates de novo tight junction formation during the differentiation of immortalized human HaCaT keratinocytes to granular-like cells, as well as during epithelial remodeling that occurs upon the apoptosis of individual cells in confluent monolayers of the representative epithelial cell line EpH4. The authors demonstrate the involvement of Rho-ROCK with well-conducted experiments and convincing images. Moreover, they unravel the underlying molecular mechanism, with Rho-ROCK activity activating the transmembrane serine protease Matriptase, which in turn leads to the cleavage of EpCAM and TROP2, respectively, releasing Claudins from EpCAM/TROP2/Claudin complexes at the cell membrane to become available for polymerization and de novo tight junction formation. These functional studies in the two different cell culture systems are complemented by localization studies of the corresponding proteins in the stratified mouse epidermis in vivo.

      In total, these are new and very intriguing and interesting findings that add important new insights into the molecular mechanisms of tight junction formation, identifying Matriptase as the "missing link" in the cascade of formerly described regulators. The involvement of TROP2/EpCAM/Claudin has been reported recently (Szabo et al., Biol. Open 2022; Bugge lab), and Matriptase had been formerly described to be required for tight junction formation as well, again from the Bugge lab. Yet, the functional correlation/epistasis between them, and their relation to Rho signaling, had not been known thus far.

      However, experiments addressing the role of Matriptase require a little more work.

      Strengths:

      Convincing functional studies in two different cell culture systems, complemented by supporting protein localization studies in vivo. The manuscript is clearly written and most data are convincingly demonstrated, with beautiful images and movies.

      Weaknesses:

      The central finding that Rho signaling leads to increased Matriptase activity needs to be more rigorously demonstrated (e.g. western blot specifically detecting the activated version or distinguishing between the full-length/inactive and processed/active version).

      First, we thank the reviewer for their fair evaluation of our manuscript and for providing constructive feedback. Regarding the detection of matriptase activation—which Reviewer 1 identified as a weakness—we fully agree that direct validation is crucial. Therefore, in this revision we have carried out additional experiments using the M69 antibody, which specifically recognizes the activated form of matriptase. Details of these new experiments are provided in our point-by-point responses below.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate how epithelia maintain intercellular barrier function despite and during cellular rearrangements upon, e.g., apoptotic extrusion in simple epithelia or regenerative turnover in stratified epithelia like the epidermis, a fundamental question in epithelial biology. Previous literature has shown that Rho-mediated local regulation of actomyosin is essential not only for cellular rearrangement itself but also for directly controlling tight junction barrier function. The molecular mechanics, however, remained unclear. Here the authors use extensive fluorescent imaging of fixed and live cells together with genetic and drug-mediated interference to show that Rho activation is required and sufficient to form de novo tight junctional strands at intercellular contacts in epidermal keratinocytes (HaCaT) and mammary epithelial cells. After having confirmed previous literature, they then show that Rho activation activates the transmembrane protease Matriptase, which cleaves EpCAM and TROP2, two claudin-binding transmembrane proteins, to release claudins and enable claudin strand formation and therefore tight junction barrier function.

      Strengths:

      The presented mechanism is shown to be relevant for epithelial barriers being conserved in simple and stratifying epithelial cells and mainly differs due to tissue-specific expression of EpCAM and TROP2. The authors present careful state-of-the-art imaging and logical experiments that convincingly support the statements and conclusion. The manuscript is well-written and easy to follow.

      Weaknesses:

      Whereas the in vitro evidence of the presented mechanism is strongly supported by the data, the in vivo confirmation is mostly based on the predicted distribution of TROP2. Whereas the causality of Rho-mediated Matriptase activation has been nicely demonstrated it remains unclear how Rho activates Matriptase.

      Thank you for your valuable feedback on our manuscript. As Reviewer 2 points out, the precise mechanism by which the Rho/ROCK pathway activates matriptase remains unclear. We have discussed the possible molecular mechanisms in the Discussion section. Elucidating the detailed mechanism of matriptase activation will be the focus of our future work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comment 1-1 - Matriptase activation by Rho: The authors show activation of Matriptase in western blots by the simple reduction of (full-length?) protein level in Figures 5 and 7. Most publications however show activated Matriptase either by antibodies detecting specifically the active form (including the publication referenced in this manuscript), or the appearance of the activated form next to the inactive form (based on different molecular weights). Therefore, it is not completely clear whether the treatment with Rho activators (Figure 5) results in an overall decrease of Matriptase, or really in an increase in the activated form. Therefore, the authors should show the actual increase of the active form. As a control, the impact of camostat treatment and overexpression of Hai1 on the active form of Matriptase could be included. It also should be indicated in the figure legend how long cells had been treated with the drugs before being subjected to lysis. Moreover, the western blots need to be quantified.

      We performed a more rigorous analysis using the M69 antibody, which specifically recognizes the activated form of matriptase and has been widely used in previous studies (e.g., Benaud et al., 2001; Hung et al., 2004; Wang et al., 2009). We likewise confirmed a significant increase in M69 signals by both western blotting and immunostaining in samples in which matriptase was activated by acid medium treatment (Figure 5A). Crucially, we also observed matriptase activation with the M69 antibody both in Rho/ROCK activator-treated cells (Figure 5A) and in differentiated granular-layer-like cells (Figures 7A and 7D). These findings strongly support the conclusion that matriptase is activated downstream of the Rho/ROCK pathway.

      Comment 1-2 - Based on their results, the authors conclude that Matriptase cleaves TROP2 in the SG2 layer of the epidermis, which is a little contradictory to former studies, which have shown Matriptase to be most prominently expressed and active in the basal layer and only little in the spinous layer (e.g., Chen et al., Matriptase regulates proliferation and early, but not terminal, differentiation of human keratinocytes. J Invest Dermatol. 2013). In this light, one could also argue that inhibiting Matriptase "simply" reduces epidermal differentiation. Can other differentiation markers be tested to rule out that the effects on tight junctions are secondary consequences of interference with earlier / more global steps of keratinocyte differentiation?

      As the reviewer noted, previous studies have demonstrated that matriptase is essential for keratinocyte differentiation, and that it cleaves substrates beyond EpCAM and TROP2, any of which could potentially influence the differentiation process. To test this possibility, we chose to monitor maturation of adherens junctions (AJs) as an indicator of keratinocyte differentiation into granular-layer cells. Prior work has shown that during differentiation into granular-layer cells, AJs develop and experience increased intercellular mechanical tension, and that this rise in mechanical tension at AJs is critical for subsequent TJ formation (Rübsam et al., 2017). To assess AJ tension, we stained with the α-18 monoclonal antibody, which specifically recognizes the tension-dependent conformational change of α-catenin, a core AJ component. In control cells, differentiation into granular-layer-like cells led to a marked increase in α-18 signal at cell-cell adhesion sites. Importantly, when HaCaT cells were treated with Camostat to inhibit matriptase and then induced to differentiate, we observed an equivalent increase in α-18 signal at AJs (Figure 7F). However, we did not detect claudin enrichment at cell-cell contacts under these conditions (Figures 7F and 7H). These results suggest that matriptase inhibition does not impair AJ maturation during granular-layer differentiation, but does profoundly disrupt TJ formation. While we cannot rule out from these results the possibility that matriptase acts more broadly, we judged that a comprehensive substrate survey lies outside the scope of the present manuscript.

      Comment 1-3 - In addition, as in Figure 5, full-length levels of Matriptase in Figure 7A need to be complemented by the active version to demonstrate more convincingly that TROP2 processing coincides with (and is most likely caused by) increased Matriptase activation. In the quantification in 7B, levels actually go up again after 2 and 4 hours. How is that explained, and what would this mean with respect to tight junction formation seen at 24 h of differentiation? The TROP2 cleavage shown in Figure 7A should be quantified.

      This comment is related to Comment 1-1. Using the M69 antibody, which specifically recognizes activated matriptase, we directly demonstrated that matriptase activation occurs during differentiation into granular-layer-like cells (Figures 7A and 7D). Furthermore, we performed quantitative analysis of TROP2 cleavage and found that, compared with undifferentiated cells, differentiation into granular-layer-like cells was accompanied by an increase in cleaved TROP2 fragments (Figures 7A and 7B).

      Minor points:

      Comment 1-4 - Figure 1B and C: Including orthogonal views would be a nice add-on to appreciate the findings.

      In the revised version, we have added the corresponding orthogonal views to Figure 1B and Figure 1C.

      Comment 1-5 - Figure 2D: last row: indication of orthogonal view.

      We stated that the bottom panels are orthogonal views in the figure legend of Figure 2D.

      Comment 1-6 - Figure 3A: quantification is missing. GST-Rhotekin assay is missing in methods.

      In the revised manuscript, we have added quantitative analysis for Figure 3A. We have also supplemented the Materials and Methods section with detailed information on the GST–Rhotekin assay used to quantify levels of active RhoA.

      Comment 1-7 - Figure 4H: quantification of the Western blot is missing.

      In the revised manuscript, we have added quantitative analysis for Figure 4H as Figure 4I.

      Comment 1-8 - Figure 5 and 6: Quantifications of Western blots are missing.

      In the revised manuscript, we have added quantitative analyses for Figure 5D as Figure 5F and for Figure 6A as Figure 6B.

      Comment 1-9 - Figure 7C: quantification of the Western blot is missing.

      Figure 7C does not present western blotting data. For the other western blotting results, we have added quantitative analyses as suggested by Reviewer 1.

      Comment 1-10 - Figure 8I: Including Hai1 overexpression would be good for a complete picture.

      Following Reviewer 1’s suggestion, we have added staining data for Hai1-overexpressing cells to Figure 8J.

      Comment 1-11 - Line 377: The authors say they found Matriptase always present in lateral membranes. I did not find evidence for this in the manuscript.

      Previous studies have demonstrated that in polarized epithelial cells, matriptase is localized to the basolateral membrane below TJs (Buzza et al., 2010; Wang et al., 2009). We also found that matriptase consistently localizes to the basolateral membrane but more crucially that it becomes activated there during differentiation into granular layer cells. We added these new data as Figures 7C-7E in the revised manuscript. These findings suggest that matriptase activation occurs without a change in its subcellular localization.

      Comment 1-12 - Line 381: should most likely say: and ADAM17 but it is not known whether...

      We corrected the sentence in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      The authors have added a significant number of quantifications verifying their observations, which was a major comment in a previous version of the manuscript and thus I have only a few minor comments which should be addressed.

      Comment 2-1 - It is not required to have scale bars in every image of a panel if the same scale is used.

      Unnecessary scale bars were removed. Specifically, scale bars were removed from Figure 1B, 1C, 1F, 8F, 8G, and 8H.

      Comment 2-2 - Throughout all figures: Please state for non-quantified images whether this is a representative example and for how many technical or biological repeats this is representative. Also for "N" number, state what the N stands for and if this is what the dots in the graph represent. Are these the number of junctions or technical, experimental or biological repeats?

      In the revised manuscript, we have added the number of independent experiments and corresponding “N” values to the Quantification and Statistical Analysis subsection of the Materials and Methods.

      Comment 2-3 - Some Zooms have a scale bar (6d), and some do not (e.g. 5b).

      The scale bar was removed from the magnified image in Figure 6D.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Wu et al presents interesting data on bacterial cell organization, a field that is progressing now, mainly due to the advances in microscopy. Based mainly on fluorescence microscopy images, the authors aim to demonstrate that the two structures that account for bacterial motility, the chemotaxis complex and the flagella, colocalize to the same pole in Pseudomonas aeruginosa cells and to expose the regulation underlying their spatial organization and functioning.

      Strengths:

      The subject is of importance.

      Weaknesses:

      The conclusions are too strong for the presented data. The lack of statistical analysis makes this paper incomplete. The novelty of the findings is not clear.

      We have strengthened the data analysis by including appropriate statistical tests to support our conclusions more convincingly. Additionally, we have refined the description of the research background to better emphasize the novelty and significance of our findings. Please see the detailed responses below for further information.

      Major issues:

      (1) The novelty is in question since in the Abstract the authors highlight their main finding, which is that both the chemotaxis complex and the flagella localize to the same pole, as surprising. However, in the Introduction they state that "pathway-related receptors that mediate chemotaxis, as well as the flagellum are localized at the same cell pole17,18". I am not a pseudomonas researcher and from my short glance at these references, I could not tell whether they report colocalization of the two structures to the same pole. However, I trust the authors that they know the literature on the localization of the chemotaxis complex and flagella in their organism. See also major issue number 5 on the novelty regarding the involvement of c-di-GMP.

      We thank the reviewer for this valuable comment and appreciate the opportunity to clarify our statements.

      Kazunobu et al. (ref. 18) used scanning electron microscopy to preliminarily characterize the flagellation pattern of Pseudomonas aeruginosa during cell division, showing that existing flagella are located at the old pole. Zehra et al. (ref. 17), through fluorescence microscopy, observed that CheA and CheY proteins in dividing cells are typically also present at the old pole. Based on these observations, we inferred in the Introduction that the chemotaxis complex and flagellum may localize to the same cell pole.

      However, this inference is indirect and lacks direct live-cell evidence of colocalization, leaving its validity to be confirmed. This uncertainty was indeed the starting point and motivation for our study.

      In our work, we simultaneously visualized flagellar filaments and core chemoreceptor proteins at the single-cell level in P. aeruginosa. We characterized the assembly and spatial coordination of the chemotaxis network and flagellar motor throughout the cell cycle, providing direct evidence of their colocalization and coordinated assembly. This represents a significant advance beyond prior indirect observations and supports the novelty of our study.

      Accordingly, we have revised the relevant statements in lines 71-75 of the manuscript to better reflect the current state of the literature and emphasize the novelty of our direct observations.

      (2) Statistics for the microscopy images, on which most conclusions in this manuscript are based, are completely missing. Given that most micrographs present one or very few cells, together with the fact that almost all conclusions depend on whether certain macromolecules are at one or two poles and whether different complexes are in the same pole, proper statistics, based on hundreds of cells in several fields, are absolutely required. Without this information, the results are anecdotal and do not support the conclusions. Due to the importance of statistics for this manuscript, strict statistical tests should be used and reported. Moreover, representative large fields with many cells should be added as supportive information.

      We thank the reviewer for this important comment, which significantly improves the rigor and persuasiveness of our manuscript.

      For the colocalization analyses presented in Fig. 1D and Fig. 2B, we quantified 145 and 101 cells with fluorescently labeled flagella, respectively, and observed consistent colocalization of the chemoreceptor complexes and flagella in all examined cells (now added in the figure legends). Regarding the distribution patterns of chemoreceptors shown in Fig. 3A, we have now included comprehensive statistical analyses for both wild-type and mutant strains. For each strain, more than 300 cells were analyzed across at least three independent microscopic fields, providing robust statistical power (detailed data are presented in Fig. 3C).

      To further strengthen the evidence, statistical tests were applied to confirm the significance and reproducibility of our findings (Fig. 3C). In addition, representative large-field fluorescence images containing numerous cells have been added to the supplementary materials (Fig. S1 and Fig. S3).

      The problem is more pronounced when the authors make strong statements, as in lines 157-158: "The results revealed that the chemoreceptor arrays no longer grow robustly at the cell pole (Figure 2A)". Looking at the seven cells shown in Figure 2A, five of them show polar localization of the chemoreceptors. The question is then: what is the percentage of cells that show precise polar, near-polar, or mid cell localization (the three patterns shown here) in the mutant and in the wild type? Since I know that these three patterns can also be observed in WT cells, what counts is the difference, and whether it is statistically significant.

      We thank the reviewer for raising this important point. Following the reviewer's suggestion, we have now analyzed and categorized the distribution of the chemotaxis complex in both wild-type and flhF mutant strains into three patterns: precise-polar, near-polar, and mid-cell localization. For each strain, more than 200 cells across three independent fields of view were quantified.

      Our statistical analysis shows that in the wild-type strain, approximately 98% of cells exhibit precise polar localization of the chemotaxis complex. In contrast, the ΔflhF mutant displays a clear shift in distribution, with about 5% of cells showing mid-cell localization and 9.5% showing near-polar localization. These differences demonstrate a significant alteration in the spatial pattern upon flhF deletion.

      We have revised the relevant text in lines 166-170 accordingly and included the detailed statistical data in the newly added Fig. S4.
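
      As an illustration only (this is not the analysis reported in the response, which used t-tests), the difference in the fraction of precisely polar cells between two strains could be checked with a pooled two-proportion z-test. The sketch below uses only the Python standard library. The function name and the cell counts are hypothetical: the response states only that more than 200 cells were counted per strain, so 200 per strain is assumed here and the counts are chosen to match the quoted 98% and 85.5% (100% minus 5% minus 9.5%).

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled proportion under the null hypothesis of equal proportions.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# HYPOTHETICAL counts, assumed for illustration (not taken from the manuscript):
# 196 of 200 wild-type cells and 171 of 200 flhF-deletion cells scored as precise polar.
z, p = two_proportion_z_test(196, 200, 171, 200)
print(f"z = {z:.2f}, two-sided p = {p:.2e}")
```

      Because the wild-type strain has very few non-polar cells, the normal approximation is rough at these counts; an exact test on the full three-category table would be a more careful choice for a real analysis.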

      Even for the graphs shown in Figures 3C and 3D, where the proportion of cells with obvious chemoreceptor arrays and absolute fluorescence brightness of the chemosensory array are shown, respectively, the questions that arise are: for how many individual cells these values hold and what is the significance of the difference between each two strains?

      The number of cells analyzed for each strain is indicated in the original manuscript: 372 wild-type cells (line 123), 221 ΔflhF cells (line 172), 234 ΔfliG cells (line 197), 323 ΔfliF cells (line 200), 672 ΔflhFΔfliF cells (line 202), and 242 ΔmotAΔmotCD cells (line 207). For each strain, data were collected from three independent fields of view. We have now also provided the number of cells in Fig. 3 legend.

      We have now performed statistical comparisons using t-tests between strains. Notably, the measured values in Fig. 3C exhibit a clear, monotonic decrease with successive gene knockouts, supporting the robustness of the observed trend.

      Regarding the absolute fluorescence intensity shown in the original Fig. 3D, the mutants did not display consistent directional changes compared to the wild type. Reliable comparison of absolute fluorescence intensity requires consistent fluorescent protein maturation levels across strains. Given the likely variability in maturation levels between strains, we concluded that this data may not accurately reflect true differences in protein concentrations. Therefore, we have removed the fluorescence intensity graph from the revised manuscript to avoid potential misinterpretation.

      (3) The authors conclude that "Motor structural integrity is a prerequisite for chemoreceptor self-assembly" based on the reduction in cells with chemoreceptor clusters in mutants deleted for flagellar genes, despite the proper polar localization of the chemotaxis protein CheY. They show that the level of CheY in the WT and the mutant strains is similar, based on Western blot, which in my opinion is over-exposed. "To ascertain whether it is motor integrity rather than functionality that influences the efficiency of chemosensory array assembly", they constructed a mutant deleted for the flagella stator and found that the motor is stalled while CheY behaves like in WT cells. The authors further "quantified the proportion of cells with receptor clusters and the absolute fluorescence intensity of individual clusters (Figures 3C-D)". While Figure 3C suggests that, indeed, the flagella mutants show fewer cells with a chemotaxis complex, Figure 3D suggests that the differences in fluorescence intensity are not statistically significant. Since it is obvious that the regulation of both structures' production and localization is codependent, I think that it takes more than a Western blot to make such a decision.

      We thank the reviewer for the suggestions. To further clarify that the assembly of flagellar motors and chemoreceptor clusters occurs in an orderly manner rather than being merely codependent, we performed additional experiments. Specifically, we constructed a ΔcheA mutant strain, in which chemoreceptor clusters fail to assemble. Using in vivo fluorescent labeling of flagellar filaments, we observed that the proportion of cells with flagellar filaments in the ΔcheA strain was comparable to that of the wild type (Fig. S5).

      In contrast, mutants lacking complete motor structures, such as ΔfliF and ΔfliG, showed a significant reduction in the proportion of cells with obvious receptor clusters (Fig. 3C). Based on these results, we conclude that the structural integrity of the flagellar motor is, to a certain extent, a prerequisite for the self-assembly of chemoreceptor clusters.

      Accordingly, we have revised the relevant statement in lines 213-217 of the manuscript to reflect this clarification.

      (4) I wonder why the authors chose to label CheY, which is the only component of the chemotaxis complex that shuttles back and forth to the base of the flagella. In any case, I think that they should strengthen their results by repeating some key experiments with labeled CheW or CheA.

      We thank the reviewer for this valuable suggestion. In our study, we initially focused on the positional relationship between chemoreceptor clusters and flagella, then investigated factors influencing cluster distribution and assembly efficiency. The physiological significance of motor and cluster co-localization was ultimately proposed with CheY as the starting point.

      Previous work by Harwood's group demonstrated that both CheY-YFP and CheA-GFP localize to the old poles of dividing Pseudomonas aeruginosa cells. Since our physiological hypothesis centers on CheY, we chose to label CheY-EYFP in our experiments.

      To further strengthen our conclusions, we constructed a plasmid expressing CheA-CFP and introduced it into the cheY-eyfp strain via electroporation. Fluorescence imaging revealed a high degree of spatial overlap between CheA-CFP and CheY-EYFP (Fig. S2), confirming that CheY-EYFP accurately marks the location of the chemoreceptor complex.

      We have revised the manuscript accordingly (lines 119-123) and added these data as Fig. S2.

      (5) The last section of the results is very problematic, regarding the rationale, the conclusions, and the novelty. As far as the rationale is concerned, I do not understand why the authors assume that "a spatial separation between the chemoreceptors and flagellar motors should not significantly impact the temporal comparison in bacterial chemotaxis". Is there any proof for that?

      We apologize for the lack of clarity in our original explanation. The rationale behind the statement was initially supported by comparing the timescales of CheY-P diffusion and temporal comparison in chemotaxis. Specifically, the diffusion time for CheY-P to traverse the entire length of a bacterial cell is approximately 100 ms (refs 39&40), whereas the timescale for bacterial chemotaxis temporal comparison is on the order of seconds (ref 41).

      To clarify and strengthen this argument, we have expanded the discussion as follows:

      The diffusion coefficient of CheY in bacterial cells is about 10 µm²/s, which corresponds to an estimated end-to-end diffusion time on the order of 100 ms (refs 40&41). If the chemotaxis complexes were randomly distributed rather than localized, diffusion times would be even shorter. In contrast, the timescale for the chemotaxis temporal comparison is on the order of seconds (ref. 42). Additionally, a study by Fukuoka and colleagues reported that intracellular chemotaxis signal transduction requires approximately 240 ms beyond CheY or CheY-P diffusion time (ref. 41). Moreover, the intervals of counterclockwise (CCW) and clockwise (CW) rotation of the P. aeruginosa flagellar motor under normal conditions are 1-2 seconds, as determined by tethered cell or bead assays (refs. 30&43).

      Taken together, these estimates indicate that for P. aeruginosa, which moves via a run-reverse mode, the potential 100 ms reduction in response time due to co-localization of the chemotaxis complex and motor has a limited effect on overall chemotaxis timing.
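
      The order-of-magnitude estimate above can be reproduced with the one-dimensional diffusion relation t = L²/(2D). The following is a minimal sketch, not the authors' calculation: the function name is hypothetical, the cell length of 1.5 µm is an assumed illustrative value that does not appear in the response, and only the diffusion coefficient (10 µm²/s) and the 1-2 s motor switching interval are taken from the text.

```python
def diffusion_time_s(length_um, diff_coeff_um2_per_s):
    """Characteristic 1D diffusion time t = L^2 / (2 D), in seconds."""
    if length_um < 0 or diff_coeff_um2_per_s <= 0:
        raise ValueError("length must be >= 0 and diffusion coefficient > 0")
    return length_um ** 2 / (2.0 * diff_coeff_um2_per_s)


D_CHEY_UM2_PER_S = 10.0   # CheY diffusion coefficient quoted in the response
CELL_LENGTH_UM = 1.5      # ASSUMED cell length, for illustration only
SWITCH_INTERVAL_S = 1.0   # lower end of the 1-2 s CCW/CW interval quoted above

t_diff = diffusion_time_s(CELL_LENGTH_UM, D_CHEY_UM2_PER_S)
print(f"end-to-end diffusion time: {t_diff * 1000:.1f} ms")
print(f"fraction of a 1 s switching interval: {t_diff / SWITCH_INTERVAL_S:.2f}")
```

      Under these assumptions the diffusion time is roughly a tenth of the shortest switching interval, which is consistent with the argument that co-localization saves little time relative to the temporal comparison.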

      We have revised the corresponding text accordingly (lines 238-245) to better explain this rationale.

      More surprising for me was to read that "The signal transduction pathways in E. coli are relatively simple, and the chemotaxis response regulator CheY-P affects only the regulation of motor switching". There are degrees of complexity among signal transduction pathways in E. coli, but the chemotaxis seems to be ranked at the top. CheY is part of the adaptation. Perfect adaptation, as many other issues related to the chemotaxis pathway, which include the wide dynamic range, the robustness, the sensitivity, and the signal amplification (gain), are still largely unexplained. Hence, such assumptions are not justified.

      We apologize for the confusion and imprecision in our original statements. Our intention was to convey that the chemotaxis pathway in E. coli is relatively simple compared to the more complex chemosensory systems in P. aeruginosa. We did not mean to generalize this simplicity to all signal transduction pathways in E. coli.

      We acknowledge that E. coli chemotaxis is a highly sophisticated system, involving processes such as perfect adaptation, wide dynamic range, robustness, sensitivity, and signal amplification, many aspects of which remain incompletely understood. CheY indeed plays a crucial role in adaptation and motor switching regulation.

      Accordingly, we have revised the original text (lines 249-255) to avoid any misunderstanding.

      More perplexing is the novelty of the authors' documentation of the effect of the chemotaxis proteins on the c-di-GMP level. In 2013, Kulasekara et al. published a paper in eLife entitled "c-di-GMP heterogeneity is generated by the chemotaxis machinery to regulate flagellar motility". In the same year, Kulasekara published a paper entitled "Insight into a Mechanism Generating Cyclic di-GMP Heterogeneity in Pseudomonas aeruginosa". The authors did not cite these works and I wonder why.

      We apologize for having been unaware of these important references and thank the reviewer for bringing them to our attention. We have now cited the eLife paper and the PhD thesis titled "Insight into a Mechanism Generating Cyclic di-GMP Heterogeneity in Pseudomonas aeruginosa" by Kulasekara et al.

      Regarding novelty, there are key differences between our findings and those reported by Kulasekara et al. While they proposed that CheA influences c-di-GMP heterogeneity through interaction with a specific phosphodiesterase (PDE), our results demonstrate that overexpression of CheY leads to an increase in intracellular c-di-GMP levels.

      We have revised the original text accordingly (lines 358-362) to clarify these distinctions.

      (6) Throughout the manuscript, the authors refer to foci of fluorescent CheY as "chemoreceptor arrays". If anything, these foci signify the chemotaxis complex, not the membrane-traversing chemoreceptors.

      We thank the reviewer for this clarification. We have revised the manuscript accordingly to refer to the fluorescent CheY foci as representing the chemotaxis complex rather than the chemoreceptor arrays.

      Conclusions:

      The manuscript addresses an interesting subject and contains interesting, but incomplete, data.

      Reviewer #2 (Public Review):

      Summary:

      Here, the authors studied the molecular mechanisms by which the chemoreceptor cluster and flagella motor of Pseudomonas aeruginosa (PA) are spatially organized in the cell. They argue that FlhF is involved in localizing the receptors-motor to the cell pole, and even without FlhF, the two are colocalized. FlhF is known to cause the motor to localize to the pole in a different bacterial species, Vibrio cholerae, but it is not involved in receptor localization in that bacterium. Finally, the authors argue that the functional reason for this colocalization is to insulate chemotactic signaling from other signaling pathways, such as cyclic-di-GMP signaling.

      Strengths:

      The experiments and data look to be high-quality.

      Weaknesses:

      However, the interpretations and conclusions drawn from the experimental observations are not fully justified in my opinion.

      I see two main issues with the evidence provided for the authors' claims.

      (1) Assumptions about receptor localization:

      The authors rely on YFP-tagged CheY to identify the location of the receptor cluster, but CheY is a diffusible cytoplasmic protein. In E. coli, CheY has been shown to localize at the receptor cluster, but the evidence for this in PA is less strong. The authors refer to a paper by Guvener et al 2006, which showed that CheY localizes to a cell pole, and CheA (a receptor cluster protein) also localizes to a pole, but my understanding is that colocalization of CheY and CheA was not shown. My concern is that CheY could instead localize to the motor in PA, say by binding FliM. This "null model" would explain the authors' observations, without colocalization of the receptors and motor. Verifying that CheY and CheA are colocalized in PA would be a very helpful experiment to address this weakness.

      We thank the reviewer for this valuable suggestion. We agree that verifying the colocalization of CheY and CheA would strengthen our conclusions. To address this, we constructed a plasmid expressing CheA-CFP and introduced it into the CheY-EYFP strain by electroporation. Fluorescence imaging revealed a high degree of spatial overlap between CheA-CFP and CheY-EYFP signals, indicating that CheY-EYFP indeed marks the location of the chemoreceptor complex rather than the flagellar motor.

      We have revised the manuscript accordingly (lines 118-123) and included these results in the new Fig. S2.

      (2) Argument for the functional importance of receptor-motor colocalization at the pole:

      The authors argue that colocalization of the receptors and motors at the pole is important because it could keep phosphorylated CheY, CheY-p, restricted to a small region of the cell, preventing crosstalk with other signaling pathways. Their evidence for this is that overexpressing CheY leads to higher intracellular cdG levels and cell aggregation. Say that the receptors and motors are colocalized at the pole. In E. coli, CheY-p rapidly diffuses through the cell. What would prevent this from occurring in PA, even with colocalization?

      We appreciate the reviewer's insightful question. The colocalization of both the signaling source (the kinase) and sink (the phosphatase) at the chemoreceptor complex at the cell pole results in a rapid decay of CheY-P concentration within approximately 0.2 µm from the cell pole, leading to a nearly uniform distribution elsewhere in the cell, as demonstrated by Vaknin and Berg (ref. 46). This spatial arrangement effectively confines high CheY-P levels to the pole region. When the motor is also localized at the cell pole, this reduces the need for elevated CheY-P concentrations throughout the cytoplasm, thereby minimizing potential crosstalk with other signaling pathways.

      We have revised the manuscript accordingly (lines 280-286) to clarify this point.

      Elevating CheY concentration may increase the concentration of CheY-p in the cell, but might also stress the cells in other unexpected ways. It is not so clear from this experiment that elevated CheY-p throughout the cell is the reason that they aggregate, or that this outcome is avoided by colocalizing the receptors and motor at the same pole. If localization of the receptor array and motor at one pole were important for keeping CheY-p levels low at the opposite pole, then we should expect cells in which the receptors and motor are not at the pole to have higher CheY-p at the opposite pole. According to the authors' argument, it seems like this should cause elevated cdG levels and aggregation in the delta flhF mutants with wild-type levels of CheY. But it does not look like this happened. Instead of varying CheY expression, the authors could test their hypothesis that receptor-motor colocalization at the pole is important for preventing crosstalk by measuring cdG levels in the flhF mutant, in which the motor (and maybe the receptor cluster) are no longer localized in the cell pole.

      We thank the reviewer for raising the important point regarding potential cellular stress caused by elevated CheY concentrations, as well as for the suggestion to test the hypothesis using ΔflhF mutants.

      First, as noted above, CheY-P concentration rapidly decreases away from the receptor complex. While deletion of flhF alters the position of the receptor complex, thereby shifting the region of high CheY-P concentration, it does not increase CheY-P levels elsewhere in the cell. Importantly, in the ΔflhF strain, the receptor complex and the motor still colocalize, so this mutant may not effectively test the role of receptor-motor colocalization in preventing crosstalk as suggested.

      Regarding the possibility that elevated CheY levels stress the cells independently of CheY-P signaling, prior work in E. coli by Cluzel et al. (ref. 11) showed that overexpressing CheY several-fold did not cause phenotypic changes, indicating that simple CheY overexpression alone may not be generally stressful. Furthermore, our data indicate that the increase in c-di-GMP levels and subsequent cell aggregation upon CheY overexpression is not an all-or-none switch but occurs progressively as CheY concentration rises.

      To further confirm that CheY overexpression promotes aggregation through increased c-di-GMP levels, we performed additional experiments co-overexpressing CheY and a phosphodiesterase (PDE) from E. coli to reduce intracellular c-di-GMP. These experiments showed that PDE expression mitigates cell aggregation caused by CheY overexpression (Fig. S8).

      We have revised the manuscript accordingly (lines 290-294) and added these new results in Fig. S8.

      Reviewer #3 (Public Review):

      Summary:

      The authors investigated the assembly and polar localization of the chemosensory cluster in P. aeruginosa. They discovered that a certain protein (FlhF) is required for the polar localization of the chemosensory cluster while a fully assembled motor is necessary for the assembly of the cluster. They found that flagella and chemosensory clusters always co-localize in the cell, either at the cell pole in wild-type cells or randomly located in the cell in FlhF mutant cells. They hypothesize that this co-localization is required to keep the level of another protein (CheY-P), which controls motor switching, at low levels, as the presence of high levels of this protein (if the flagella and chemosensory clusters were not co-localized) is associated with high levels of c-di-GMP and cell aggregation.

      Strengths:

      The manuscript is clearly written and straightforward. The authors applied multiple techniques to study the bacterial motility system including fluorescence light microscopy and gene editing. In general, the work enhances our understanding of the subtlety of interaction between the chemosensory cluster and the flagellar motor to regulate cell motility.

      Weaknesses:

      The major weakness in this paper is that the authors never discussed how the flagellar gene expression is controlled in P. aeruginosa. For example, in E. coli there is a transcriptional hierarchy for the flagellar genes (early, middle, and late genes, see Chilcott and Hughes, 2000). Similarly, Campylobacter and Helicobacter have a different regulatory cascade for their flagellar genes (See Lertsethtakarn, Ottemann, and Hendrixson, 2011). How does the expression of flagellar genes in P. aeruginosa compare to other species? How many classes are there for these genes? Is there a hierarchy in their expression and how does this affect the results of the FliF and FliG mutants? In other words, if FliF and FliG are in class I (as in E. coli) then their absence might affect the expression of other later flagellar genes in subsequent classes (i.e., chemosensory genes). Also, in both FliF and FliG mutants no assembly intermediates of the flagellar motor are present in the cell as FliG is required for the assembly of FliF (see Hiroyuki Terashima et al. 2020, Kaplan et al. 2019, Kaplan et al. 2022). It could be argued that when the motor is not assembled then this will affect the expression of the other genes (e.g., those of the chemosensory cluster) which might play a role in the decreased level of chemosensory clusters the authors find in these mutants.

      We thank the reviewer for the insightful comments. P. aeruginosa possesses a four-tiered transcriptional regulatory hierarchy controlling flagellar biogenesis. Within this system, fliF and fliG belong to class II genes and are regulated by the master regulator FleQ. In contrast, chemotaxis-related genes such as cheA and cheW are regulated by intracellular free FliA, and currently, there is no evidence that FliA activity is influenced by proteins like FliG.

      To verify that the expression of core chemotaxis proteins was not affected by deletion of fliG, we performed Western blot analyses to compare CheY levels in wild-type, ΔfliF, and ΔfliG strains. We observed no significant differences, indicating that the reduced presence of receptor clusters in these mutants is not due to altered expression of chemotaxis proteins.

      Accordingly, we have revised the manuscript (lines 341-348) and updated Fig. 3B to reflect these findings.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      The reviewers comment on several important aspects that should be addressed, namely: the lack of statistical analysis; the need for clarifications regarding assumptions made regarding receptor localization; the functional importance of receptor-motor colocalization; and the need for an elaborate discussion of flagellar gene expression. Also, two reviewers pointed out the need to prove the co-localization of CheY and CheA; This is important since CheY is dynamic, shuttling back and forth from the chemotaxis complex to the base of the flagella, whereas CheA (or cheW or, even better, the receptors) is considered less dynamic and an integral part of the chemotaxis complex.

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      Line 43: "ubiquitous" - I would choose another word.

      We changed "ubiquitous" to "widespread".

      Line 49: "order" - change to organize.

      We changed "order" to "organize".

      Line 52: "To grow and colonize within the host, bacteria have evolved a mechanism for migrating...". Motility "towards more favorable environments" is an important survival strategy of bacteria in various ecological niches, not only within the host.

      We revised it to "grow and colonize in various ecological niches".

      Line 72: Define F6 in "F6 pathway-related receptors".

      The proteins encoded by chemotaxis-related genes collectively constitute the F6 pathway, which we have now explained in the manuscript text.

      Line 72-73: Do references 17 &18 really report colocalization of the chemotaxis receptor and flagella to the same pole? If these or other reports document such colocalization, then the sentence in the Abstract "Surprisingly, we found that both are located at the same cell pole..." is not correct.

      Kazunobu et al. (ref. 18) used scanning electron microscopy to preliminarily characterize the flagellation pattern of Pseudomonas aeruginosa during cell division, showing that existing flagella are located at the old pole. Zehra et al. (ref. 17), through fluorescence microscopy, observed that CheA and CheY proteins in dividing cells are typically also present at the old pole. Based on these observations, we inferred in the Introduction that the chemotaxis complex and flagellum may localize to the same cell pole.

      However, this inference is indirect and lacks direct live-cell evidence of colocalization, leaving its validity to be confirmed. This uncertainty was indeed the starting point and motivation for our study.

      In our work, we simultaneously visualized flagellar filaments and core chemoreceptor proteins at the single-cell level in P. aeruginosa. We characterized the assembly and spatial coordination of the chemotaxis network and flagellar motor throughout the cell cycle, providing direct evidence of their colocalization and coordinated assembly. This represents a significant advance beyond prior indirect observations and supports the novelty of our study.

      Accordingly, we have revised the relevant statements in lines 71-75 of the manuscript to better reflect the current state of the literature and emphasize the novelty of our direct observations.

      Line 108: "CheY has been shown to colocalize with chemoreceptors". The authors rely here (reference 29) and in other places on findings in E. coli. However, in the Introduction, they describe the many differences between the motility systems of P. aeruginosa and E. coli, e.g., the number of chemosensory systems and their spatial distribution (E. coli is a peritrichous bacterium, as opposed to the monotrichous bacterium P. aeruginosa). There seem to be proofs for colocalization of the Che and MCP proteins in P. aeruginosa, which should be cited here.

      Thank you for pointing this out. Harwood's group reported that a cheY-YFP fusion strain exhibited bright fluorescent spots at the cell pole, which disappeared upon knockout of cheA or cheW, genes encoding structural proteins of the chemotaxis complex. This strongly suggests colocalization of CheY with MCP proteins in P. aeruginosa. We have now cited this study as reference 17 in the manuscript.

      Figure 1B: Please replace the order of the schematic presentations, so that the cheY-egfp fusion, which is described first in the text, is at the top.

      We have modified the order of related images in Fig. 1B.

      Line 127: "by introducing cysteine mutations". Replace either by "by introducing cysteines" or by "by substituting several residues with cysteines".

      We changed the relevant statement to "by introducing cysteines".

      Line 144-145: "Given that the physiological and physical environments of both cell poles are nearly identical.". I think that also the physical, but certainly the physiological environment of the two poles is not identical. First, one is an old pole, and the other a new pole. Second, many proteins and RNAs were detected mainly or only in one of the poles of rod-shaped Gram-negative bacteria that are regarded as symmetrically dividing. Although my intuition is that the authors are correct in assuming that "it is unlikely that the unipolar distribution of the chemoreceptor array can be attributed to passive regulatory factors", relating it to the (false) identity between the poles is incorrect.

      We thank the reviewer for this important correction. We agree that the physiological environments of the two poles are not identical, given that one is the old pole and the other the new pole, and that many proteins and RNAs show polar localization in rod-shaped Gram-negative bacteria. Accordingly, we have revised the original text (lines 150-152) to read:

      “Despite potential differences in the physical and especially physiological environments at the two cell poles, it is unlikely that the unipolar distribution of the chemotaxis complex can be attributed to passive regulatory factors.”

      Lines 151-154: "Considering the consistent colocalization pattern between chemosensory arrays and flagellar motors in P. aeruginosa". Does the word consistent relate to different reports on such colocalization or to the results in Figure 1D? In case it is the latter, then what is the word consistent based on? All together only 7 cells are presented in the 5 micrographs that compose Figure 1D (back to statistics...).

      We thank the reviewer for raising this point. To clarify, the word "consistent" refers to the observation of colocalization shown in Figure 1D & Figure S3. As noted in the revised figure legend for Figure 1D, a total of 145 cells with labeled flagella were analyzed, all exhibiting consistent colocalization between flagella and chemosensory arrays. Additionally, we have included a new image showing a large field of co-localization in the wild-type strain as Figure S3 to better illustrate this consistency.
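      One way to put a number on "consistent" is an exact binomial bound. The sketch below is an illustration, not an analysis taken from the manuscript: it assumes the 145 labeled cells are independent observations and computes a one-sided 95% Clopper-Pearson lower bound on the true colocalization fraction when every observed cell shows colocalization. The function name is hypothetical.

```python
def lower_bound_all_successes(n_cells: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) Clopper-Pearson lower bound on a proportion
    when all n_cells observations are successes.

    With zero failures the exact bound has the closed form alpha ** (1 / n).
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return alpha ** (1.0 / n_cells)


# 145 of 145 labeled cells showed colocalization (count taken from the response).
bound = lower_bound_all_successes(145)
print(f"95% lower bound on colocalization fraction: {bound:.4f}")
```

      Under these assumptions the bound is close to 0.98, so even a conservative reading of the data supports colocalization in the large majority of cells.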

      Figure 2A: Omit "Subcellular localization of" from the beginning of the caption.

      We removed the relevant expression from the caption.

      Reviewer #2 (Recommendations For The Authors):

      I strongly recommend checking that CheY localizes to the receptor cluster in PA. This could be done by tagging cheA with a different fluorophore and demonstrating their colocalization. It would also be helpful to check that they are colocalized in the delta flhF mutant.

      We thank the reviewer for this valuable suggestion. We constructed a plasmid expressing CheA-CFP and introduced it into the CheY-EYFP strain by electroporation. Fluorescence imaging revealed a high degree of spatial overlap between CheA-CFP and CheY-EYFP signals, indicating that CheY-EYFP indeed marks the location of the chemoreceptor complex.

      We have revised the manuscript accordingly (lines 118-123) and included these results in the new Fig. S2.

      The experiments under- and over-expressing CheY part seemed too unrelated to receptor-motor colocalization. I think the authors should think about a more direct way of testing whether colocalization of the motor and receptors is important for preventing signaling crosstalk. One way would be to measure cdG levels in WT and in delta flhF mutants and see if there is a significant difference.

      We thank the reviewer for raising the important point regarding potential cellular stress caused by elevated CheY concentrations, as well as for the suggestion to test the hypothesis using flhF mutants.

      First, as noted in the response to your 2nd comment in Public Review, CheY-P concentration rapidly decreases away from the receptor complex. While deletion of flhF alters the position of the receptor complex, thereby shifting the region of high CheY-P concentration, it does not increase CheY-P levels elsewhere in the cell. Importantly, in the ΔflhF strain, the receptor complex and the motor still colocalize, so this mutant may not effectively test the role of receptor-motor colocalization in preventing crosstalk as suggested.

      Regarding the possibility that elevated CheY levels stress the cells independently of CheY-P signaling, prior work in E. coli by Cluzel et al. (ref. 11) showed that overexpressing CheY several-fold did not cause phenotypic changes, indicating that simple CheY overexpression alone may not be generally stressful. Furthermore, our data indicate that the increase in c-di-GMP levels and subsequent cell aggregation upon CheY overexpression is not an all-or-none switch but occurs progressively as CheY concentration rises.

      To further confirm that CheY overexpression promotes aggregation through increased c-di-GMP levels, we performed additional experiments co-overexpressing CheY and a phosphodiesterase (PDE) from E. coli to reduce intracellular c-di-GMP. These experiments showed that PDE expression mitigates cell aggregation caused by CheY overexpression (Fig. S8).

      We have revised the manuscript accordingly (lines 290-294) and added these new results in Fig. S8.

      Reviewer #3 (Recommendations For The Authors):

      (1) Can the authors elaborate more on the hierarchy of flagellar gene expression in P. aeruginosa and how this relates to their work?

      We thank the reviewer for the suggestion. We have now described the hierarchy of flagellar gene expression in P. aeruginosa in lines 341-348.

      (2) I would suggest that the authors check other flagellar mutants (than FliF and FliG) where the motor is partially assembled (e.g., any of the rod proteins or the P-ring protein), together with FlhF mutant, to see how a partially assembled motor affects the assembly of the chemosensory cluster.

      We thank the reviewer for this valuable suggestion. The P ring, primarily composed of FlgI, acts as a bushing for the peptidoglycan layer, and its absence leads to partial motor assembly. We constructed a ΔflgI mutant and observed that the proportion of cells exhibiting distinct chemotactic complexes was similar to that of the wild-type strain, suggesting that the assembly of the receptor complex is likely influenced mainly by the C-ring and MS-ring structures rather than by the P ring. We have revised the original text accordingly (lines 217-220) and added the corresponding data as Figure S6.

      (3) I would suggest that the authors check the levels of CheY in cells induced with different concentrations of arabinose (i.e., using western blotting just like they did in Figure 3B).

      We have assessed the levels of CheY in cells induced with different concentrations of arabinose using western blotting, as suggested. The results have been incorporated into the manuscript (lines 274-275) and are presented in Figure S7.

      (4) To my eyes, most of the foci in FliF-FlhF mutant in Figure 3A are located at the pole (which is unlike the FlhF mutant in Figure 2). Is this correct? I would suggest that the authors also investigate this to see where the chemosensory cluster is located.

      We thank the reviewer for pointing this out. The distribution of the chemotaxis complex in the ΔflhFΔfliF strain was investigated and is shown in Fig. S4. Indeed, most of the chemoreceptor foci in this mutant are located at the pole. This suggests that, in the absence of both FlhF and an assembled motor, the position of the receptor complex may be largely influenced by passive factors such as membrane curvature. This interesting possibility warrants further investigation in future studies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this work, the authors recorded the dynamics of 5-HT with fiber photometry from CA1 in one hemisphere and LFP from CA1 in the other hemisphere. They observed an ultra-slow oscillation in the 5-HT signal during both wakefulness and NREM sleep. The authors have studied different phases of the ultra-slow oscillation to examine the potential difference in the occurrence of some behavioral state-related physiological phenomena (hippocampal ripples, EMG, and inter-area coherence).

      Strengths

      The relation between the falling/rising phase of the ultra-slow oscillation and the ripples is sufficiently shown. There are some minor concerns about the observed relations that should be addressed with some further analysis.

      Systematic observations have started to establish a strong relation between the dynamics of neural activity across the brain and measures of behavioral arousal. Such relations span a wide range of temporal scales that are heavily inter-related. Ultra-slow time-scales are specifically under-studied due to technical limitations and neuromodulatory systems are the strongest mechanistic candidates for controlling/modulating the neural dynamics at these time-scales. The hypothesis of the relation between a specific time-scale and one certain neuromodulator (5-HT in this manuscript) could have a significant impact on the understanding of the hierarchy in the temporal scales of neural activity.

      Weaknesses:

      One major caveat of the study is that different neuromodulators are strongly correlated across all time scales. Related to this, the authors need to discuss this point further and provide evidence from the literature (if any) suggesting that similar ultra-slow oscillations are weaker in, or absent from, similar signals recorded for other neuromodulators such as ACh and NA.

      The reviewer is correct to point out that the levels of different neuromodulators are often correlated. For example, most monoaminergic neurons, including serotonergic neurons of the raphe nuclei, show similar firing rates across behavioral states, firing most during wake behavior, less during NREM, and ceasing firing during ‘paradoxical sleep’ or REM (Eban-Rothschild et al. 2018). Notably, other neuromodulators, such as acetylcholine (ACh), show the opposite pattern across states, with highest levels observed during REM, an intermediate level during wake behavior, and the lowest level during NREM (Vazquez et al. 2001). Despite these differences, ultraslow oscillations of both monoaminergic and non-monoaminergic neuromodulators have been described, albeit only during NREM sleep (Zhang et al. 2021, Zhang et al. 2024, Osorio-Forero et al. 2021, Kjaerby et al. 2022). How ultraslow oscillations of different neuromodulators are related has only recently been explored (Zhang et al. 2024). In this study, dual recording of oxytocin (Oxt) and ACh with GRAB sensors showed that the levels of the two neuromodulators were indeed correlated at ultraslow frequencies with a 2 s temporal shift. Furthermore, this shift could be explained by a hippocampal-to-lateral septum intermediate pathway, in which the level of ACh causally impacts hippocampal activity, which then in turn controls Oxt levels. Given the known temporal relationship between ripples, ACh and Oxt, and now with our work, between ripples and 5-HT, one could infer the relative timing of ultraslow oscillations of ACh, Oxt and 5-HT. While dual recordings of norepinephrine (NE) and 5-HT have not been performed, a similar correlation with temporal shift could be hypothesized given the parallel relationships between NE and spindles (Osorio-Forero et al. 2021), and 5-HT and ripples, with the known temporal delay between ripples and spindles (Staresina et al. 2023). The fact that the locus coeruleus receives particularly dense projections from the dorsal raphe nucleus (Kim et al. 2004) further suggests that 5-HT ultraslow oscillations could drive NE oscillations. How exactly ultraslow oscillations of serotonin are related to ultraslow oscillations of different neuromodulators in different brain regions remains to be studied.
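      A temporal shift between two correlated ultraslow signals, such as the 2 s shift reported for ACh and Oxt, is commonly read off a lagged correlation. The sketch below illustrates that idea on synthetic data only; it is not the analysis used in any of the cited studies. The sampling rate, the 0.02 Hz oscillation frequency, and the function names are assumptions chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_lag(lead, follow, max_lag):
    """Return (lag, r) for the lag in samples that maximizes the correlation.

    A positive lag means `follow` trails `lead` by that many samples.
    """
    best = None
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = lead[: len(lead) - lag], follow[lag:]
        else:
            a, b = lead[-lag:], follow[: len(follow) + lag]
        r = pearson(a, b)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Synthetic signals (hypothetical values): 0.02 Hz oscillations sampled at
# 10 Hz for 200 s, with the second signal delayed by 2 s.
fs_hz, freq_hz, shift_s = 10, 0.02, 2.0
times = [i / fs_hz for i in range(2000)]
lead = [math.sin(2 * math.pi * freq_hz * t) for t in times]
follow = [math.sin(2 * math.pi * freq_hz * (t - shift_s)) for t in times]

lag_samples, r = best_lag(lead, follow, max_lag=50)
print(f"estimated shift: {lag_samples / fs_hz:.1f} s (r = {r:.3f})")
```

      The search window is kept well below one oscillation period so that the correlation peak is unique; real recordings would additionally need filtering and a significance test against shuffled data.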

      We have further addressed this question and how it relates to the issue of causality in the Discussion section of the manuscript (p. 13):

      “In addition to the difficulties involved with typical causal interventions already mentioned, the fact that the levels of different neuromodulators are interrelated and affected by ongoing brain activity makes it very hard to pinpoint ultraslow oscillations of one specific neuromodulator as controlling specific activity patterns, such as ripple timing. While a recent paper purported to show a causative effect of norepinephrine levels on ultraslow oscillations of sigma band power, the fact that optogenetic inhibition of locus coeruleus (LC) cells, but also excitation, only caused a minor reduction of the ultraslow sigma power oscillation suggests that other factors also contribute (Osorio-Forero et al., 2021). Generally, it is thought that many neuromodulators together determine brain states in a combinatorial manner, and it is probable that the 5-HT oscillations we measure, like the similar oscillations in NE, are one factor among many.

      Nevertheless, given the known effects of 5-HT on neurons, it is not unlikely that the 5-HT fluctuations we describe have some impact on the timing of ripples, MAs, hippocampal-cortical coherence, or EMG signals that correlate with either the rising or descending phase. In fact, causal effects of 5-HT on ripple incidence (Wang et al. 2015, ul Haq et al. 2016 and Shiozaki et al. 2023), MA frequency (Thomas et al. 2022), sensory gating (Lee et al. 2020), which is subserved by inter-areal coherence (Fisher et al. 2020), and movement (Takahashi et al. 2000, Alvarez et al. 2022, Jacobs et al. 1991 and Luchetti et al. 2020) have all been shown. Our added findings that serotonin affects ripple incidence in hippocampal slices in a dose-dependent manner (Figure S1) further suggest that the relationship between ultraslow 5-HT oscillations and ripples we report may indeed result, at least in part, from a direct effect of serotonin on the hippocampal network.

      Whether these ‘causal’ relationships between 5-HT and the different activity measures we describe can be used to support a causal link between ultraslow 5-HT oscillations and the correlated activity we report remains an open question. To that point, some studies have described changes in ultraslow oscillations due to manipulation of serotonin signaling. Specifically, reduction of 5-HT1a receptors in the dentate gyrus was recently shown to reduce the power of ultraslow oscillations of calcium activity in the same region (Turi et al. 2024). Furthermore, psilocin, which largely acts on the 5-HT2a receptor, decreased NREM episode length from around 100 s to around 60 s, and increased the frequency of brief awakenings (Thomas et al. 2022). While ultraslow oscillations were not explicitly measured in this study, the change in the rhythmic pattern of NREM sleep episodes and brief awakenings, or microarousals, suggests an effect of psilocin on ultraslow oscillations during NREM. Although these studies do not necessarily point to an exclusive role for 5-HT in controlling ultraslow oscillations of different brain activity patterns, they show that changes in 5-HT can contribute to changes in brain activity at ultraslow frequencies.”

      A major question that has been left out from the study and discussion is how the same level of serotonin before and after the peak could be differentially related to the opposite observed phenomenon. What are the possible parallel mechanisms for distinguishing between the rising and falling phases? Any neurophysiological evidence for sensing the direction of change in serotonin concentration (or any other neuromodulator), and is there any physiological functionality for such mechanisms?

      We have added a paragraph in the discussion to address how this differentiation of the 5-HT signal may be carried out (Discussion, paragraph #3, p. 10):

      “In order for the ultraslow oscillation phase to segregate brain activity, as we have observed, the hippocampal network must somehow be able to sense the direction of change of serotonin levels. While single-cell mechanisms related to membrane potential dynamics are typically too fast to explain this calculation, a theoretical work has suggested that feedback circuits can enable such temporal differentiation, also on the slower timescales we observe (Tripp and Eliasmith, 2010). Beyond the direction of change in serotonin levels, temporal differentiation could also enable the hippocampal network to discern the steeper rising slope versus the flatter descending slope that we observe in the ultraslow 5-HT oscillations (Figure S2), which may also be functionally relevant (Cole and Voytek, 2017). The distinction between the rising and falling phase of ultraslow oscillations is furthermore clearly discernible at the level of unit responses, with many units showing preferences for either half of the ultraslow period (Figure S6). Another factor that could help distinguish the rising from the falling phase is the level of other neuromodulators, as it is likely the combination of many neuromodulators at any given time that defines a behavioral substate. Given the finding that ACh and Oxt exhibit ultraslow oscillations with a temporal shift (Zhang et al. 2024), one could posit that distinct combinations of different levels of neuromodulators could segregate the rising from the falling phase via differential effects of the combination of neuromodulators on the hippocampal network.”

      Functionally, the ability to distinguish between the rising and falling phases of an oscillatory cycle is a form of phase coding. A well-known example of this can be seen in hippocampal place cells, which fire relative to the ongoing theta oscillations. The key advantage of phase coding is that it introduces an additional dimension, i.e. phase of firing, beyond the simple rate of neural firing. This allows for the multiplexing of information (Panzeri et al., 2010), enabling the brain to encode more complex patterns of activity. Moreover, phase coding is metabolically more efficient than traditional spike-rate coding (Fries et al., 2007).
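      The point that one serotonin level can correspond to two different phases can be made concrete with a toy signal. The following is a minimal sketch, not the authors' phase-extraction method: it assumes a noise-free 0.02 Hz sine as a stand-in for the ultraslow 5-HT oscillation and labels each sample from the sign of its first difference, which is the simplest form of temporal differentiation. A measured signal would need low-pass filtering or an analytic-signal phase estimate before such labeling.

```python
import math


def label_phase(signal):
    """Label each sample 'rising' or 'falling' from the sign of the
    first difference (the last sample reuses the previous label)."""
    labels = []
    for i in range(len(signal) - 1):
        labels.append("rising" if signal[i + 1] > signal[i] else "falling")
    labels.append(labels[-1])
    return labels


# Synthetic ultraslow oscillation (hypothetical): one 50 s period at 1 Hz sampling.
period_s = 50
signal = [math.sin(2 * math.pi * t / period_s) for t in range(period_s)]
labels = label_phase(signal)

# t = 5 s and t = 20 s sit at the same level on opposite sides of the peak.
for t in (5, 20):
    print(f"t = {t:2d} s  level = {signal[t]:.3f}  phase = {labels[t]}")
```

      The two printed samples share a level but carry different labels, which is the extra dimension that phase coding exploits beyond the level alone.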

      Reviewer #2 (Public review):

      Summary:

      In their study, Cooper et al. investigated the spontaneous fluctuations in extracellular 5-HT release in the CA1 region of the hippocampus using GRAB5-HT3.0. Their findings revealed the presence of ultralow frequency (less than 0.05 Hz) oscillations in 5-HT levels during both NREM sleep and wakefulness. The phase of these 5-HT oscillations was found to be related to the timing of hippocampal ripples, microarousals, electromyogram (EMG) activity, and hippocampal-cortical coherence. In particular, ripples were observed to occur with greater frequency during the descending phase of 5-HT oscillations, and stronger ripples were noted to occur in proximity to the 5-HT peak during NREM. Microarousal and EMG peaks occurred with greater frequency during the ascending phase of 5-HT oscillations. Additionally, the strongest coherence between the hippocampus and cortex was observed during the ascending phase of 5-HT oscillations. These patterns were observed in both NREM sleep and the awake state, with a greater prevalence in NREM. The authors posit that 5-HT oscillations may temporally segregate internal processing (e.g., memory consolidation) and responsiveness to external stimuli in the brain.

      Strengths:

      The findings of this research are novel and intriguing. Slow brain oscillations lasting tens of seconds have been suggested to exist, but to my knowledge they have never been analyzed in such a clear way. Furthermore, although it is likely that ultra-slow neuromodulator oscillations exist, this is the first report of such oscillations, and the greatest strength of this study is that it has clarified this phenomenon both statistically and phenomenologically.

      Weaknesses:

      As with any paper, this one has some limitations. While there is no particular need to pursue them, I will describe ten of them below, including future directions:

      (1) Contralateral recordings: 5-HT levels and electrophysiological recordings were obtained from opposite hemispheres due to technical limitations. Ipsilateral simultaneous recordings may show more direct relationships.

      Although we argue that bilateral symmetry defines both the serotonin system and many hippocampal activity patterns (Methods: Dual fiber photometry and silicon probe recordings), we agree that ipsilateral recordings would be superior to describe the link between serotonin and electrophysiology in the hippocampus. In addition to noting that a recent study has adopted the same contralateral design (Zhang et al. 2024), we add a reference further supporting bilateral hippocampal synchrony, specifically of dentate spikes (Farrell et al. 2024). However, as functional lateralization has been recently proposed to underlie certain hippocampal functions in the rodent (Jordan 2020), future studies should ideally include both imaging and electrophysiology in a single hemisphere to guarantee local correlations rather than assuming inter-hemispheric synchrony. This could be accomplished using an integrated probe with attached optical fibers, as described in Markowitz et al. 2018, which is however technically more challenging and has, to our knowledge, not yet been implemented with fiber photometry recordings with GRAB sensors. Given the required separation of a few hundred micrometers between the probe shanks and the optical fiber cannula, it is important to consider whether the recordings are capturing the same neuronal populations. For example, there is a risk of recording electrical activity from dorsal hippocampal neurons while simultaneously measuring light signals from neurons in the intermediate hippocampus, which are functionally distinct populations (Fanselow and Dong 2009).

      (2) Sample size: The number of mice used in the experiments is relatively small (n=6). Validation with a larger sample size would be desirable.

      While larger sample sizes generally reduce the influence of random variability and minimize the impact of outliers on conclusions, our use of mixed-effects models mitigates these concerns by accounting for both inter-session and inter-mouse variability. With this approach, we explicitly model random effects, such as the variability between individual mice and sessions, alongside fixed effects (such as treatment), which ensures that our results are not driven by random fluctuations in a few individual mice or sessions. Furthermore, the inclusion of random intercepts and slopes in the models allows for the possibility that different animals and/or sessions have different baseline characteristics and respond to the treatment to different degrees. In summary, while validating these findings with a larger sample size would certainly help detect more subtle effects, we are confident in the robustness of the conclusions presented.
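      For readers unfamiliar with this model class, a generic form with random intercepts and slopes is written out below. The response does not give the exact specification, so this is an assumed illustration: the index names (mouse m, session s, observation k), the single predictor x, and the choice to place the random slope at the mouse level are all hypothetical.

```latex
% Generic linear mixed-effects model (illustrative; not the authors' exact specification)
y_{msk} = \beta_0 + \beta_1 x_{msk}        % fixed intercept and fixed effect of the predictor
        + u_{0m} + u_{1m}\, x_{msk}        % random intercept and slope for mouse m
        + v_{0,ms}                         % random intercept for session s within mouse m
        + \varepsilon_{msk},
\qquad
(u_{0m}, u_{1m})^{\top} \sim \mathcal{N}(\mathbf{0}, \Sigma_u), \quad
v_{0,ms} \sim \mathcal{N}(0, \sigma_v^2), \quad
\varepsilon_{msk} \sim \mathcal{N}(0, \sigma^2).
```

      In this form, inference on the fixed effect \beta_1 accounts for the between-mouse and between-session variance components rather than treating every observation as independent.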

      (3) Lack of causality: The observed associations show correlations, not direct causal relationships, between 5-HT oscillations and neural activity patterns.

      We agree that the data we present in this study are largely correlational, and we generally avoid claims of causality in the manuscript. In the Discussion section, we discuss barriers to interpreting typical causal interventions in vivo, such as optogenetic activation of raphe nuclei: “The two previously mentioned in vivo studies showing reduced ripple incidence…” (paragraph #10, pg. 12), as well as an added section on further causality considerations in the Discussion section of the manuscript (paragraph #12, pg. 13): “In addition to the difficulties involved with…”

      Due to these barriers, as a first step, we wanted to describe how physiological changes in serotonin levels are correlated with changes in hippocampal activity. Equipped with a deeper understanding of physiological serotonin dynamics, future studies could explore interventions that modulate serotonin in keeping with the natural range of serotonin fluctuations for a given state. On that point, another challenge which we have not mentioned in the manuscript is that modulating the levels of serotonin, or of any neuromodulator, has the potential, depending on the degree of modulation, to transition the brain to an entirely different behavioral state. This then complicates interpretation, as one is not sure whether the effects observed are due to the changes in the neuromodulator itself, or secondary to changes in state. At the same time, 5-HT activity drives networks which in turn can change the release of other neurotransmitters, leading to indirect effects.

      The results of our in vitro experiments suggest that a causal relationship between serotonin and ripples is possible (Figure S1). Though the hippocampal slice preparation is clearly an artificial model, it provides a controlled environment to isolate the effects of serotonin manipulation on the hippocampal formation, without the confounding influence of systemic 5-HT fluctuations in other brain regions. Notably, the dose-dependent effects of serotonin (5-HT) wash-in on ripple incidence observed in vitro closely mirror the inverted-U dose-response curve seen in our in vivo experiments across states, where small increases in serotonin lead to the highest ripple incidence, and both lower and higher levels correspond to reduced ripple activity. This parallel suggests that the gradual wash-in of serotonin in our in vitro system may mimic the tonic firing changes in serotonergic neurons that occur during state transitions in vivo. These findings underscore the importance of studying how different dynamics of serotonin modulation can differentially affect hippocampal network activity.

      (4) Limited behavioral states: The study focuses primarily on sleep and quiet wakefulness. Investigation of 5-HT oscillations during a wider range of behavioral states (e.g., exploratory behavior, learning tasks) may provide a more complete understanding.

      We agree that future studies should investigate a broader range of behavioral states. For this study, as we were focused on general sleep and wake patterns, our recordings were done in the home cage, and we limited ourselves to the basic behavioral states described in the paper. Future studies should be designed to investigate ultraslow 5-HT oscillations during different behaviors, such as continuous treadmill running. Specifically, a finer segregation of extended wake behaviors by level of arousal could greatly add to our understanding of the role of ultraslow serotonin oscillations.

      (5) Generalizability to other brain regions: The study focuses on the CA1 region of the hippocampus. It's unclear whether similar 5-HT oscillation patterns exist in other brain regions.

      Given the reported ultraslow oscillations of population activity in serotonergic neurons of the dorsal raphe nucleus (Kato et al. 2022) as well as the widespread projections of the serotonergic nuclei, we would expect a broad expression of ultraslow 5-HT oscillations throughout the brain. So far, ultraslow 5-HT oscillations have been described in the basal forebrain, as well as in the dentate gyrus, in addition to what we have shown in CA1 (Deng et al. 2024 and Turi et al. 2024). Furthermore, our results showing that hippocampal-cortical coherence changes according to the phase of hippocampal ultraslow 5-HT oscillations suggest that 5-HT can affect oscillatory activity either indirectly by modulating hippocampal cells projecting to the cortical network or directly by modulating the cortical postsynaptic targets. Given the heterogeneity in projection strength, as well as in pre- and postsynaptic serotonin receptor densities across brain regions (de Filippo & Schmitz, 2024), it would be interesting to see whether local ultraslow 5-HT oscillations are differentially modulated, e.g. in terms of oscillation power. Future studies investigating different brain regions, via implantation of multiple optic fibers in different brain areas or using the mesoscopic imaging approach adopted in Deng et al. 2024, will be needed to examine the extent of spatial heterogeneity in this ultraslow oscillation.

      (6) Long-term effects not assessed: Long-term effects of ultraslow 5-HT oscillations (e.g., on memory consolidation or learning) were not assessed.

      While beyond the scope of our current study, we agree that an important next step would involve modulating the ultraslow serotonin oscillation after learning, and then examining potential effects on memory consolidation, presumably via changes in ripple dynamics, though many possibilities could explain potential effects. There, our results suggest it would be important to isolate effects due to the change in ultraslow oscillation features, rather than simply overall levels of 5-HT. To that end, it would be important to test different modulation dynamics, specifically modulating the oscillation strength around a constant mean 5-HT level by carefully timed optogenetic stimulation/inhibition. Afterwards, showing a clear correlation between the strength of the 5-HT modulation and memory performance would be important for establishing the relationship, as done in Lecci et al. 2017, where more prominent ultraslow oscillations of sigma power in the cortex during sleep, alongside a higher density of spindles, were correlated with better memory consolidation. Given the tight coupling of spindles and ripples during sleep, it is possible that a similar effect on memory consolidation would be observed following changes in ultraslow 5-HT oscillation power.

      (7) Possible species differences: It's uncertain whether the findings in mice apply to other mammals, including humans.

      We agree that the experiments should ultimately be replicated in humans. In the 2017 study by Lecci et al., the authors highlighted the shared functional requirements for sleep across species, despite apparent differences, such as variations in sleep volume. To explore these commonalities, the researchers conducted parallel experiments in both mice and humans, aiming to identify a universal organizing structure. They discovered that the ultraslow oscillation of sigma power serves this role, enabling both species to balance the competing demands of arousability and sleep imperviousness. Based on this finding, it is plausible that ultraslow oscillations of serotonin, which similarly modulate activity according to arousal levels, would serve a comparable function in humans.

      (8) Technical limitations: The temporal resolution and sensitivity of the GRAB5-HT3.0 sensor may not capture faster 5-HT dynamics.

      The kinetics of the GRAB5-HT3.0 sensor used in this study limit the range of serotonin dynamics we can observe. However, the ultraslow oscillations we measure reflect temporal changes on the scale of 20 s and greater, whereas the GRAB sensor we use has sub-second on kinetics and below 2 s off kinetics (Deng et al. 2024). Therefore, the sensor is capable of reporting much faster activity than the ultraslow oscillations we observe, indicating that the ultraslow 5-HT signal accurately reflects the dynamics on this time scale. Furthermore, the presence of ultraslow oscillations in spiking activity, observed in the hippocampal formation (Gonzalo Cogno et al., 2024; Aghajan et al., 2023; Penttonen et al., 1999) and in the dorsal raphe (Mlinar et al., 2016), which is not affected by the same temporal smoothing, suggests that the oscillations we record are not likely due to signal aliasing, but instead reflect genuine oscillatory activity. Of course, this does not preclude that other, faster serotonin dynamics are also present in our signal, some of which may be too fast to be observed. For instance, rapid serotonin signaling via the ionotropic 5-HT3a receptors could be missed in our recordings. Additionally, with the fiber photometry approach we adopted, we are limited to capturing spatially broad trends in serotonin levels, potentially overlooking more localized dynamics.

      (9) Interactions with other neuromodulators: The study does not explore interactions with other neuromodulators (e.g., norepinephrine, acetylcholine) or their potential ultraslow oscillations.

      We agree that the interaction between neuromodulators in the context of ultraslow oscillations is an important issue, which we have addressed in our response to reviewer #1 under ‘Weaknesses.’

      (10) Limited exploration of functional significance: While the study suggests a potential role for 5-HT oscillations in memory consolidation and arousal, direct tests of these functional implications are not included.

      We agree and refer to our answer to (6) regarding memory consolidation. Regarding arousal, direct tests of arousability to different sensory stimuli during different phases of the ultraslow 5-HT oscillation during sleep would be beneficial, in addition to the indirect measures of arousal we examine in the current study, e.g. degree of movement (icEMG) and long-range coherence. In line with what we have shown, Cazettes et al. (2021) have demonstrated a direct relationship between 5-HT levels and pupil size, an indicator of arousal level, which, like our findings, is consistent across behavioral states.

      Reviewer #3 (Public review):

      Summary:

      The activity of serotonin (5-HT) releasing neurons as well as 5-HT levels in brain structures targeted by serotonergic axons are known to fluctuate substantially across the animal's sleep/wake cycle, with high 5-HT levels during wakefulness (WAKE), intermediate levels during non-REM sleep (NREM) and very low levels during REM sleep. Recent studies have shown that during NREM, the activity of 5HT neurons in raphe nuclei oscillates at very low frequencies (0.01 - 0.05 Hz) and this ultraslow oscillation is negatively coupled to broadband EEG power. However, how exactly this 5-HT oscillation affects neural activity in downstream structures is unclear.

      The present study addresses this gap by replicating the observation of the ultraslow oscillation in the 5-HT system, and further observing that hippocampal sharp wave-ripples (SWRs), biomarkers of offline memory processing, occur preferentially in barrages on the falling phase of the 5-HT oscillation during both wakefulness and NREM sleep. In contrast, the rising phase of the 5-HT oscillation is associated with microarousals during NREM and increased muscular activity during WAKE. Finally, the rising 5-HT phase was also found to be associated with increased synchrony between the hippocampus and neocortex. Overall, the study constitutes a valuable contribution to the field by reporting a close association between rising 5-HT and arousal, as well as between falling 5-HT and offline memory processes.

      Strengths:

      The study makes compelling use of the state-of-the-art methodology to address its aims: the genetically encoded 5-HT sensor used in the study is ideal for capturing the ultraslow 5-HT dynamics and the novel detection method for SWRs outperforms current state-of-the-art algorithms and will be useful to many scientists in the field. Explicit validation of both of these methods is a particular strength of this study.

      The analytical methods used in the article are appropriate and are convincingly applied; the use of a general linear mixed model for statistical analysis is a particularly welcome choice as it guards against pseudoreplication while preserving statistical power.

      Overall, the manuscript makes a strong case for distinct sub-states across WAKE and NREM, associated with different phases of the 5-HT oscillation.

      Weaknesses:

      All of the evidence presented in the study is correlational. While the study mostly avoids claims of causality, it would still benefit from establishing whether the 5-HT oscillation has a direct role in the modulation of SWR rate via e.g. optogenetic activation/inactivation of 5-HT axons. As it stands, the possibility that 5-HT levels and SWRs are modulated by the same upstream mechanism cannot be excluded.

      We agree that causality claims cannot be made with our data, and acknowledge the interest in exploring causal interactions between ultraslow serotonin oscillations and the correlated activity we measure. We address this point in depth in our answer to Reviewer #2, Weaknesses #3.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      One major question in the presented data is the nature of the asymmetrical shape of the targeted slow events. How much does it reflect the 5-HT concentration and how much is this shape affected by the dynamics of the designed 5-HT sensor? This needs to be addressed in more detail referencing the original paper for the used sensor.

      We have added a paragraph in the Results section of the manuscript to address the asymmetric waveform of the ultraslow 5-HT oscillations and whether it could be affected by the asymmetric kinetics of the GRAB sensor we use: “The waveform of these ultraslow 5-HT oscillations…” (Results, paragraph #4, pg. 5). We include an extended answer to the question here:

      Indeed, the GRAB5-HT3.0 sensor we use in the study shows activation response kinetics which are faster than their deactivation time, with time constants at 0.25 s and 1.39 s, respectively (Deng et al. 2024). Likewise, the slope of the rising phase of the ultraslow serotonin oscillation we measure is faster than the slope of the falling phase, and the ratio of time spent in the rising phase versus the falling phase is less than 1, indicating longer falling phases (Figure S2). Although we cannot completely rule out that the asymmetric shape of the ultraslow serotonin oscillations we record is affected by this asymmetry in the 5-HT sensor kinetics, we believe this is unlikely, as the 5-HT signal clearly contains reductions in 5-HT levels that are much faster than the descending phase of the ultraslow oscillation. Although it is difficult to directly compare the different-sized signals, the reported timescales of off kinetics, on the order of a few seconds (Deng et al. 2024), are far below the tens of seconds timescale of the ultraslow oscillation. Furthermore, the finding that some dorsal raphe neurons modulate their firing rate at ultraslow frequencies, and moreover that all examples of such ultraslow oscillations shown display clear asymmetry in rising time versus decay, suggests that the asymmetry we observe in our data could be due to neural activity rather than temporal smoothing by the sensor (Mlinar et al. 2016). In this same direction, another study found similar asymmetry in extracellular 5-HT levels measured with fast scan cyclic voltammetry (FSCV), a technique with greater temporal resolution (sampling rate of 10 Hz) than GRAB sensors, after single pulse stimulation (Bunin and Wightman 1998). In this study, 5-HT was shown to be released extrasynaptically, making the longer clearing time compared to the release time intuitive. 
      Finally, the observation that the onsets and offsets of ripple clusters, recorded with a sampling rate of 20 kHz, are precisely aligned with the peaks and troughs of ultraslow serotonin oscillations (Figure 1, H1-2, columns 2-3) suggests that the duration of the falling phase is not artificially distorted by the temporal smoothing of the sensor dynamics.
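
      The timescale argument above can be made concrete with a rough calculation. The sketch below is illustrative only and is not part of the analysis in the manuscript: it assumes, as a simplification, that the sensor behaves like a first-order low-pass filter, and asks how much such a filter would attenuate a sinusoid at the edges of the ultraslow band (0.01-0.06 Hz) given the reported time constants (0.25 s on, 1.39 s off). The function name `first_order_gain` is hypothetical.

```python
import math

def first_order_gain(freq_hz, tau_s):
    """Amplitude gain of a first-order low-pass filter with time constant tau_s.

    Assumption for illustration: the sensor is approximated as a linear
    first-order filter, which real sensor kinetics need not follow.
    """
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * freq_hz * tau_s) ** 2)

TAU_ON_S = 0.25   # activation time constant reported for the sensor
TAU_OFF_S = 1.39  # deactivation time constant reported for the sensor

# Evaluate the gain at the lower and upper edges of the ultraslow band.
for freq_hz in (0.01, 0.06):
    gain_on = first_order_gain(freq_hz, TAU_ON_S)
    gain_off = first_order_gain(freq_hz, TAU_OFF_S)
    print(f"{freq_hz:.2f} Hz: gain(on)={gain_on:.3f}, gain(off)={gain_off:.3f}")
```

      Under this simplifying assumption, even the slower off time constant would leave most of the amplitude of an ultraslow oscillation intact, in keeping with the argument that the sensor kinetics are fast relative to the oscillation.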

      Regardless of the dynamics of the serotonin concentration, it should be noted that the elicited neuronal effect might have different dynamics compared to the 5-HT concentration, which needs further study. To address this, one can either examine the average of the broadband LFP (not high-pass filtered by the amplifier) or the distribution of simultaneously recorded spiking activity around the peak of the ultraslow oscillations.

      We have added Figure S6, showing unit activity relative to the phase of ultraslow serotonin oscillations.

      From this analysis, we uncover three groups of units which are largely preserved across states (Figure S6, E vs. F), albeit with a slight temporal shift rightward from NREM to WAKE (Figure S6, C vs. D). Namely, some units spike preferentially during the rising phase, some during the falling phase, and a third group have no clear phase preference. Unit activity during the falling phase is unsurprising, as it is where ripples largely occur, which themselves are associated with spike bursts. During the rising phase, the unit activity we observe could correspond to firing of the hippocampal subpopulation known to be active during NREM interruption states (Jarosiewicz et al. 2002, Miyawaki et al. 2017). While the units’ phase preference was tested based on the category of rising vs. falling phase, as this division described most variation in the data, a few units in the ‘No preference’ group showed heightened activity near the oscillation peak. However, given the very small number of units with this preference, more unit data is needed to describe this group, ideally with high-density recordings. Overall, most units showed a falling vs. rising phase preference, indicating a phase coding of hippocampal activity by 5-HT ultraslow oscillations.

      Related to the previous point, it would be helpful to show the average cycle shape of these oscillations (relative to the phase 0 extracted in Figure 3) and to compare the shape across sessions and between WAKE and NREM.

      We agree, and to this end we have added Figure S2. From this waveform analysis, we show that the ultraslow serotonin oscillation is asymmetric, with the rising phase having a greater slope, but shorter length, than the falling phase. While this asymmetry is observed in both NREM and WAKE, the differences in slope and in length ratio between the rising and falling phases are greater in NREM (Figure S2B).

      In Figure 3D, there seem to be oscillatory rhythms with faster cycles on top of the targeted oscillations. That would make the phase estimation less accurate, e.g. in the left panel, in the second cycle, it is not clear whether there are two faster cycles or one slow cycle as targeted, and notably there are no ripples in the rising phase of the second fast cycle. This might suggest that, regardless of the specific oscillation frequency, ripples are suppressed whenever 5-HT starts to be released, and that once 5-HT is no longer synaptically effective, ripples start to be generated while the photometry signal wanes as serotonin is cleared. Still, if there is any rhythmicity between bouts of no ripples, it would suggest an ultraslow regularity in the 5-HT release.

      The reviewer is correct to point out that some faster increases in serotonin, which occur on top of the ultraslow oscillations we measure, seem to be associated with decreased ripple incidence, as in the example referenced. The dominance of ultraslow frequencies in the power spectrum of the 5-HT signal suggests, however, that oscillations faster than the ultraslow oscillations we describe are far less prevalent in the data. While there may be some coupling of ripples and other measures to serotonin oscillations of different frequencies, this may be hard or impossible to detect with phase analysis given their infrequent occurrence and nonstationary nature. In fact, we show in Figure S3 that the strongest phase modulation of ripples by ultraslow serotonin oscillations is observed in the frequencies we use (0.01-0.06 Hz). Methodologically, phase analysis indeed assumes stationary signals, which are rare if not absent in physiological data (Lo et al. 2009); in general, however, the narrower the frequency band, the better the phase estimation. The narrow frequency band we use provides phase estimates that are largely robust and unaffected by the presence of faster oscillations, as can be seen in the example phase traces shown in Figure 4.

      The hypothesis that the rising phase burst of synaptic serotonin is what silences ripples, and that with the clearing of serotonin from the synapses, ripples recover, is a possible explanation of our findings. However, if this were the case, one could expect the ripple rate to increase over the course of the falling phase of ultraslow 5-HT oscillations, as 5-HT decreases, and peak at the trough. This is at odds with what we observe, namely a fairly uniform distribution of ripples along the falling phase (Figure 3F2,F4). Furthermore, the Mlinar et al. 2016 study describes a subpopulation of raphe neurons whose firing rates themselves oscillate at ultraslow frequencies, rather than on-off bursting at ultraslow frequencies, which would argue against this hypothesis. However, as this study looks at a small number of neurons in slices, further in vivo experiments examining firing rates of median raphe neurons are required to understand how the ultraslow oscillation of extracellular serotonin that we measure is generated as well as how it is related to ripple rates.

      In Figure 3B, it is not clear why IRI is z-scored. It would be informative to have the actual value of IRI. What is the z relative to? Is it the mean value of IRI in each recording session? Is this to reduce the variability across sessions?

      We have now included in Figure 3D a box plot displaying the IRI distributions across different states and sessions. To minimize inter-session variability, data were z-scored within each session for visualization purposes. However, all general linear models were based on raw data, and as a result, the raw differences in IRI are shown in Figure 3C.
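
      For readers unfamiliar with within-session z-scoring, the following minimal sketch illustrates the idea. It is not the code used in the study: the function name `zscore_within_session`, the session labels, and the IRI values are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def zscore_within_session(iri_by_session):
    """Z-score inter-ripple intervals (IRIs) separately within each session.

    Each session is centered on its own mean and scaled by its own
    standard deviation, which removes between-session offsets.
    """
    zscored = {}
    for session, values in iri_by_session.items():
        mu = mean(values)
        sigma = pstdev(values)  # population SD (assumed convention)
        zscored[session] = [(v - mu) / sigma for v in values]
    return zscored

# Hypothetical IRI values in seconds, for illustration only.
example = {"session_a": [0.5, 1.0, 1.5], "session_b": [2.0, 4.0, 6.0]}
z = zscore_within_session(example)
print(z)
```

      In this toy example the two sessions differ fourfold in raw IRI but yield identical z-scored values, which is why z-scoring suits visualization across sessions while the raw values remain the appropriate input for the statistical models.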

      Figure 3E, panel labels don't match with the caption

      We are grateful to the reviewer for pointing out this mistake, which we have corrected in the updated version of the manuscript.

      In the text related to Figure 3E, the related analysis can be more clearly described. "phase preference of individual ripples" does not immediately suggest that the occurring phase of each ripple relative to the targeted oscillation is extracted. I suggest performing this analysis individually for each session and summarizing the results across the sessions.

      We have reworded the sentence in Results: 5-HT and ripples to better reflect the analysis performed: “Next, we calculated the ultraslow 5-HT phases at which individual ripples occurred during both NREM and WAKE (3E-F) ...”. Regarding session-level data, we have added Figure S3, which shows session level mean phase vectors, as well as the grand mean across sessions for both NREM and WAKE. Included in this figure are session level means for frequency bands outside of the ultraslow band we used in our study, intended to show that ripples are most strongly timed by the ultraslow band (0.01-0.06 Hz), reflected by the greater amplitude of the mean phase vector for this band.
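
      As a brief illustration of what a mean phase vector summarizes, the sketch below computes the mean direction and resultant length for a set of ripple phases. This is a generic circular-statistics example under our own naming (`mean_phase_vector` is hypothetical) with made-up phases; it is not the analysis code behind Figure S3.

```python
import cmath
import math

def mean_phase_vector(phases_deg):
    """Return (mean direction in degrees, resultant length) for phases in degrees.

    Each phase is mapped to a unit vector on the circle; the average of
    these vectors has an angle (preferred phase) and a length between
    0 (no phase preference) and 1 (all events at the same phase).
    """
    vectors = [cmath.exp(1j * math.radians(p)) for p in phases_deg]
    resultant = sum(vectors) / len(vectors)
    return math.degrees(cmath.phase(resultant)), abs(resultant)

# Hypothetical ripple phases, for illustration only.
clustered = [80.0, 90.0, 100.0]     # tightly grouped around 90 degrees
spread = [0.0, 90.0, 180.0, 270.0]  # evenly spread around the cycle

print(mean_phase_vector(clustered))
print(mean_phase_vector(spread))
```

      A longer mean vector therefore indicates tighter clustering of ripples at a particular phase, which is the sense in which a greater vector amplitude reflects stronger timing by a given frequency band.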

      Figure 3E2, based on the result of ripple-triggered 5-HT in left panels of 2H1-2, one would expect to see a preferred phase closer to 180 (toward the end of the falling phase), it would be helpful to compare and discuss the results of these two analyses.

      The reviewer is correct to point out the apparent discrepancy in where the mean ripple falls with respect to the ongoing serotonin oscillation between the two figures mentioned. We have addressed this point in Results: 5-HT and ripples, paragraph #4: “This result appears to be at odds with…”.

      Regarding the analysis in 3F, please also compare the power distribution of ripples between NREM and wake. This will help to better understand the potential difference behind the observed difference: how much the strong ripples are comparable between wake and NREM. It is also necessary to report the ripple detection failure rate across ripples with different strengths.

      We have added a figure showing analysis done on a subset of the data in which ripples were manually curated in order to evaluate the performance of the ripple detection model (Figure S7) and explanatory text in Methods: Model performance: ‘To ensure that our model …’. In summary, while missed ripples did tend to have lower power than correctly detected ripples, including them did not change the distribution of ripples by the phase of the ultraslow serotonin oscillation (Figure S7C). We would also note that while the phase preference is noisier than what is presented in Figure 3F because this analysis was done with a small subset of all recorded ripples, the fact that ripples occur more clearly on the falling phase is visible for both detected ripples and detected + false negative ripples.

      The mixed-effects model examining the influence of 5-HT ultraslow oscillation phase on ripple power revealed no significant effect of state (p = 0.088). This indicates that whether the data were collected during NREM or wake periods did not significantly impact ripple power and that the lack of a significant effect (in Figure 3G,H) in WAKE is probably not due to a difference in the distribution of ripple power between states.

      4D, y label is z?

      We are grateful to the reviewer for pointing this out. Yes, the y label should be ‘z-score’, as the two traces represent z-scored 5-HT (blue) and z-scored shuffled data (orange). Figure 4D2 and Figure 2H1-2, which show similar data, have been corrected to address this oversight.

      Relating to Figure 4, EMG comparison across phases of the oscillations is insightful. Two related and complementary analyses are to compare the theta and gamma power between the falling and rising phases.

      We have addressed this suggestion in Figure S5 A-C. While low gamma, high gamma and theta power are modulated identically in NREM, with higher power observed during the falling phase than the rising phase, during WAKE, different patterns can be seen. Specifically, low gamma power shows no phase preference, while high gamma shows a peak near the center of the ultraslow 5-HT oscillation. Theta power, as in NREM, is higher during the falling phase of ultraslow 5-HT oscillations. Increased power across many frequency bands was shown to coincide with decreases in DRN population activity during NREM, which matches with what we report here (Kato et al. 2022). In summary, while NREM patterns are consistent in all frequency bands tested, aligning with the pattern of ripple incidence, in WAKE low and high gamma power show different relationships to ultraslow 5-HT phase.

      In the manuscript, we have used the data in both Figure S5 and S6 (unit activity relative to ultraslow 5-HT oscillations) to argue against the idea that our coherence findings result from a lack of activity in the rising phase (see next question), which would have the effect of ‘artificially’ reducing coherence in the falling phase relative to the rising phase. The text can be found in Results: 5-HT and hippocampal-cortical coherence, paragraph #2.

      The results presented in Figure 5 could be puzzling and need to be further discussed: if the ripple band activity is weak during the rising phase, in what circumstances the coherence between cortex and CA1 is specifically very strong in this band?

      As mentioned in the previous answer, we have addressed this concern in Results: 5-HT and hippocampal-cortical coherence, paragraph #2. In summary, it is true that the higher coherence in the rising phase than in the falling phase for the highest frequency band (termed ‘high frequency oscillation’ (HFO), 100-150 Hz) could be unexpected, given that ripples occur largely during the falling phase. A few points could help explain this finding. Firstly, it should be noted that power in the 100-150 Hz band can arise from physiological activity outside of ripples, such as filtered non-rhythmic spike bursts (Liu et al. 2022), whose coherent occurrence in the rising phase could explain the coherence findings. Secondly, coherence is a compound measure which is affected by both phase consistency and amplitude covariation (Srinath and Ray 2014); thus, coherence cannot be predicted from amplitude alone. Furthermore, HFO power in the cortex is highest near the peak of ultraslow 5-HT oscillations (Figure S5D), as opposed to the falling phase peak in the hippocampus. This shows a lack of covariation in amplitude by phase between the hippocampus and cortex at this frequency band. An alternative explanation of our findings regarding coherence could be that in the rising phase, there is simply little to no activity, which is easier to ‘synchronize’ than bouts of high activity. Hippocampal unit activity in the rising phase (Figure S6) suggests, however, that the higher coherence in the rising phase across frequencies is unlikely to be supported by an absence of activity. Additional experiments using high density recordings should be conducted to examine 5-HT ultraslow oscillations and their role in gating activity across brain regions, though these results strongly suggest some role exists.
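
      The point that coherence depends on amplitude covariation as well as phase consistency can be illustrated with a toy calculation. The sketch below is a generic, textbook-style example and not the coherence pipeline used in the manuscript; the helper names (`coherence`, `make_signal`) and all numerical values are hypothetical. Two signals are given a perfectly constant phase lag across two epochs, and only the way their amplitudes co-vary is changed.

```python
import cmath

def coherence(x, y):
    """Magnitude-squared coherence across epochs of complex spectral estimates."""
    cross = sum(a * b.conjugate() for a, b in zip(x, y))
    power_x = sum(abs(a) ** 2 for a in x)
    power_y = sum(abs(b) ** 2 for b in y)
    return abs(cross) ** 2 / (power_x * power_y)

def make_signal(amplitudes, phases, lag=0.0):
    """Build one complex spectral estimate per epoch from amplitude and phase."""
    return [a * cmath.exp(1j * (p - lag)) for a, p in zip(amplitudes, phases)]

phases = [0.3, 2.1]  # arbitrary per-epoch phases in radians
x = make_signal([1.0, 3.0], phases)

# Same constant phase lag in both cases; only amplitude covariation differs.
y_covarying = make_signal([1.0, 3.0], phases, lag=0.5)
y_anticovarying = make_signal([3.0, 1.0], phases, lag=0.5)

print(coherence(x, y_covarying))
print(coherence(x, y_anticovarying))
```

      With identical phase locking, coherence is maximal when the amplitudes rise and fall together and drops substantially when they vary in opposite directions, which is why band power in one region alone is a poor predictor of coherence.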

      Reviewer #2 (Recommendations for the authors):

      I would like to offer two comments. I believe that these are not unusual requests, and thus I would like the authors to respond.

      (1) It would be prudent to investigate the possibility that the observed correlation between ultraslow oscillations and hippocampal ripples/microarousals is merely superficial and that there are unidentified confounding factors at play. For example, it would be beneficial to provide evidence that administering a serotonin receptor inhibitor results in the disappearance of the slow oscillation of ripples and microarousals, or that the correlation with the ultraslow oscillation is no longer present. Please note that the former experiments do not require GRAB5-HT3.0 imaging.

      We agree that causality claims cannot be made with our data and acknowledge the interest in exploring causal interactions between ultraslow serotonin oscillations and the correlated activity we measure. We address this point in depth in our answer to Reviewer #2, Weaknesses #3. We would further like to note that given the large number of serotonin receptors and the lack of selectivity of many serotonin receptor antagonists, a pharmacological approach would be difficult, though the results would certainly be useful. Finally, we highlight the psilocin study, which reported changes in the rhythmic occurrence of microarousals, and therefore likely ultraslow oscillations, after administering a 5-HT2a receptor agonist, suggesting a potential causal effect of 5-HT (via 5-HT2a receptor) on MA occurrence (Thomas et al. 2022).

      (2) The slow frequency appears to be associated with the default mode network as observed in fMRI signals. The neural basis of the default mode network remains unclear; therefore, a more detailed examination of this possibility would be beneficial.

      We agree that it would be interesting to investigate the role of 5-HT in the neural basis of the DMN.

      The DMN as described in humans (Raichle et al. 2001) and rodents (Lu et al. 2012) may indeed include some parts of the hippocampus, and perhaps some of our neocortical recordings could also be considered part of the DMN. The fact that the activity across the inter-connected brain structures of the DMN is correlated at ultraslow time scales (Gutierrez-Barragan et al. 2019, Mantini et al. 2007), as well as serotonin’s ability to modulate the DMN (Helmbold et al. 2016), is intriguing. Further studies simultaneously recording DMN activity via fMRI and electrical activity via silicon probes, as done in Logothetis et al. 2001, could further elucidate a potential link between ultraslow oscillations and the DMN, with serotonergic modulation as a means to understand any potential contribution of serotonin.

      Reviewer #3 (Recommendations for the authors):

      (1) The impact of the study would benefit from an experiment causally testing the effect of hippocampal 5-HT levels on hippocampal physiology, e.g. using optogenetic manipulations.

      We agree that causality claims cannot be made with our data and acknowledge the interest in exploring causal interactions between ultraslow serotonin oscillations and the correlated activity we measure. We address this point in depth in our answer to Reviewer #2, Weaknesses #3.

      (2) Data presentation: the figures are of poor resolution, making some diagram details and, more importantly, some example traces (e.g. Figure 1A, right) impossible to see. This should be corrected by either increasing figure resolution or making important figure elements large enough to be readable.

      We apologize for the poor resolution and have corrected it in the updated version of the manuscript.

      (3) Differences in some figure panels are not statistically assessed: Figure 1H (differences in spectrum peak power), Figure 3E1 & Figure 3E3 (directional bias of the circular distributions), Figure 4C (difference from 0 mean).

      We acknowledge this oversight and have added statistical tests for all three figures, as well as further information regarding the models used in Methods: Statistics.

      (4) Lines 279-280: the claim that the study shows "organization of activity by ultraslow oscillations of 5-HT" implies a causal role of 5-HT in organizing hippocampal activity. I suggest that this statement be toned down to reflect the correlational nature of the presented evidence.

      We have rephrased the sentence in question to the following: “In our study, including both NREM and WAKE periods allowed us to additionally show that the temporal organization of activity relative to ultraslow 5-HT oscillations operates according to the same principles in both states...”, which we believe better reflects the temporal correlation we describe.

      (5) While the study claims to use the EMG (i.e. electromyograph) signal, it does not describe any electrodes placed inside the muscle in the methods section. The SleepScoreMaster toolbox used in the study estimates the EMG using high-frequency activity correlated across recording channels, so I assume this is how this signal was obtained. While such activity may well reflect muscular noise to some degree, it is an indirect measure as the electrodes are not in the muscle. Since the EMG signal is central to the message of the manuscript, the method for calculating it should be described in the methods section and it should be explicitly labelled as an indirect measure in the main text, e.g. by referring to this signal as pseudo-EMG.

      We agree and have added explanatory text to the State Scoring subsection in Methods. Given that the EMG we refer to is derived from intracranial data, and not from traditional EMG probes, we now refer to the EMG as intracranial EMG, or icEMG for short, throughout the main text.

      (6) Is ripple frequency or ripple duration different across the rising and falling phases of the ultraslow oscillation?

      We have now investigated this suggestion in Figure S4, where we show that ripple frequency is higher in the falling phase than in the rising phase, while ripple duration appears to show no phase preference.

      (7) Lines 315-317: I am not sure why the manuscript refers to the coupling between EMG and 5-HT levels as 'puzzling' given that, as stated, the locomotion-inducing effects of 5-HT are well documented. While the fact that even non-locomotory motor activity may be associated with 5-HT rise is certainly interesting (although not sure if 'puzzling'), the manuscript does not directly compare the association of 5-HT levels with locomotory and non-locomotory EMG spikes. Thus, I think this discussion point is not fully warranted.

      We agree and have rephrased the discussion point in question to reflect that the EMG link to serotonin oscillations is not necessarily surprising, given both the literature linking 5-HT and spontaneous movement in the hippocampus, as well as the involvement of 5-HT in repetitive movements, where the role for a regularly-occurring oscillation is perhaps more intuitive.

      (8) Line 441: Reference #67 does not describe the use of fiber photometry.

      The reviewer is correct to point out this typo, which has now been corrected. The reference in question should be 64, where fiber photometry experiments are described. For further clarity, we have changed our referencing scheme to include authors and years in in-text references.

      (9) In Figures 3E1-3, the phase has different bounds than in the other Figures in the manuscript (0:360 vs -180:180), this should be corrected for consistency.

      We agree and have made changes so that all figures have a phase range of -180 to 180°.
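
      As an illustration of the conversion discussed above, the following is a minimal sketch of remapping phases from a 0 to 360° range onto a -180 to 180° range. The function name and the example values are hypothetical and are not taken from the analysis code of the manuscript; the convention assumed here places 180° at -180°.

```python
def wrap_phase_deg(phase_deg: float) -> float:
    """Map a phase in degrees onto the half-open interval [-180, 180)."""
    # Shift by 180, wrap into [0, 360), then shift back.
    return (phase_deg + 180.0) % 360.0 - 180.0


# Hypothetical phases expressed on a 0-360 degree axis.
phases_0_360 = [0.0, 90.0, 180.0, 270.0, 359.0]
wrapped = [wrap_phase_deg(p) for p in phases_0_360]
print(wrapped)  # [0.0, 90.0, -180.0, -90.0, -1.0]
```

      Because the interval is half-open, a phase of exactly 180° is reported as -180°; a different convention would only change how that single boundary value is labelled.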

      References

      (1) Z. M Aghajan, G. Kreiman, I. Fried, Minute-scale periodicity of neuronal firing in the human entorhinal cortex. Cell Rep 42, 113271 (2023).

      (2) M.A. Bunin, R.M. Wightman (1998). Quantitative Evaluation of 5-Hydroxytryptamine (Serotonin) Neuronal Release and Uptake: An Investigation of Extrasynaptic Transmission. J. Neurosci. 18 (13) 4854-4860

      (3) F. Cazettes, D. Reato, J. P. Morais, A. Renart, Z. F. Mainen, Phasic Activation of Dorsal Raphe Serotonergic Neurons Increases Pupil Size. Curr Biol 31, 192-197.e4 (2021).

      (4) Cole SR, Voytek B. Brain Oscillations and the Importance of Waveform Shape. Trends Cogn Sci. 21(2):137-149 (2017).

      (5) F. Deng, et al., Improved green and red GRAB sensors for monitoring spatiotemporal serotonin release in vivo. Nat Methods 21, 692–702 (2024).

      (6) C. Dong, et al., Psychedelic-inspired drug discovery using an engineered biosensor. Cell 184, 2779-2792.e18 (2021).

      (7) A. Eban-Rothschild, L. Appelbaum, L. de Lecea, Neuronal Mechanisms for Sleep/Wake Regulation and Modulatory Drive. Neuropsychopharmacol. 43, 937–952 (2018).

      (8) M. S. Fanselow, H.-W. Dong, Are the dorsal and ventral hippocampus functionally distinct structures? Neuron 65, 7–19 (2010).

      (9) J. S. Farrell, E. Hwaun, B. Dudok, I. Soltesz, Neural and behavioural state switching during hippocampal dentate spikes. Nature 1–6 (2024). https://doi.org/10.1038/s41586-024-07192-8.

      (10) De Filippo, R., & Schmitz, D. (2024). Transcriptomic mapping of the 5-HT receptor landscape. Patterns (New York, N.Y.), 5(10), 101048.

      (11) M. J. Fisher, et al., Neural mechanisms of sensory gating: Insights from human and animal studies. NeuroImage 207, 116374 (2020).

      (12) P. Fries, D. Nikolić, W. Singer, The gamma cycle. Trends in Neurosciences 30, 309–316 (2007).

      (13) S. Gonzalo Cogno, et al., Minute-scale oscillatory sequences in medial entorhinal cortex. Nature 625, 338–344 (2024).

      (14) D. Gutierrez-Barragan, M. A. Basson, S. Panzeri, A. Gozzi, Infraslow State Fluctuations Govern Spontaneous fMRI Network Dynamics. Current Biology 29, 2295-2306.e5 (2019).

      (15) K. Helmbold, et al., Serotonergic modulation of resting state default mode network connectivity in healthy women. Amino Acids 48, 1109–1120 (2016).

      (16) B. Jarosiewicz, B. L. McNaughton, W. E. Skaggs, Hippocampal Population Activity during the Small-Amplitude Irregular Activity State in the Rat. J. Neurosci. 22, 1373–1384 (2002).

      (17) J. T. Jordan, The rodent hippocampus as a bilateral structure: A review of hemispheric lateralization. Hippocampus 30, 278–292 (2020).

      (18) T. Kato, et al., Oscillatory Population-Level Activity of Dorsal Raphe Serotonergic Neurons Is Inscribed in Sleep Structure. J. Neurosci. 42, 7244–7255 (2022).

      (19) M.A. Kim, H. S. Lee, B. Y. Lee, B. D. Waterhouse, Reciprocal connections between subdivisions of the dorsal raphe and the nuclear core of the locus coeruleus in the rat. Brain Research 1026, 56–67 (2004).

      (20) C. Kjaerby, et al., Memory-enhancing properties of sleep depend on the oscillatory amplitude of norepinephrine. Nat Neurosci 25, 1059–1070 (2022).

      (21) S. Lecci, et al., Coordinated infraslow neural and cardiac oscillations mark fragility and offline periods in mammalian sleep. Sci Adv 3, e1602026 (2017).

      (22) A. A. Liu, et al., A consensus statement on detection of hippocampal sharp wave ripples and differentiation from other fast oscillations. Nat Commun 13, 6000 (2022).

      (23) M.-T. Lo, P.-H. Tsai, P.-F. Lin, C. Lin, Y. L. Hsin, The nonlinear and nonstationary properties in eeg signals: probing the complex fluctuations by hilbert–huang transform. Adv. Adapt. Data Anal. 01, 461–482 (2009).

      (24) N. K. Logothetis, J. Pauls, M. Augath, T. Trinath, A. Oeltermann, Neurophysiological investigation of the basis of the fMRI signal. Nature 412, 150–157 (2001).

      (25) H. Lu, et al., Rat brains also have a default mode network. Proc Natl Acad Sci U S A 109, 3979–3984 (2012).

      (26) D. Mantini, M. G. Perrucci, C. Del Gratta, G. L. Romani, M. Corbetta, Electrophysiological signatures of resting state networks in the human brain. Proc Natl Acad Sci U S A 104, 13170–13175 (2007).

      (27) J. E. Markowitz, et al., The striatum organizes 3D behavior via moment-to-moment action selection. Cell 174, 44-58.e17 (2018).

      (28) H. Miyawaki, Y. N. Billeh, K. Diba, Low Activity Microstates During Sleep. Sleep 40, zsx066 (2017).

      (29) B. Mlinar, A. Montalbano, L. Piszczek, C. Gross, R. Corradetti, Firing Properties of Genetically Identified Dorsal Raphe Serotonergic Neurons in Brain Slices. Front Cell Neurosci 10, 195 (2016).

      (30) A. Osorio-Forero, et al., Noradrenergic circuit control of non-REM sleep substates. Current Biology 31, 5009-5023.e7 (2021).

      (31) S. Panzeri, N. Brunel, N. K. Logothetis, C. Kayser, Sensory neural codes using multiplexed temporal scales. Trends in Neurosciences 33, 111–120 (2010).

      (32) M. E. Raichle, et al., A default mode of brain function. Proc Natl Acad Sci U S A 98, 676–682 (2001).

      (33) R. Srinath, S. Ray, Effect of amplitude correlations on coherence in the local field potential. J Neurophysiol 112, 741–751 (2014).

      (34) B. P. Staresina, J. Niediek, V. Borger, R. Surges, F. Mormann, How coupled slow oscillations, spindles and ripples coordinate neuronal processing and communication during human sleep. Nat Neurosci 26, 1429–1437 (2023).

      (35) C. W. Thomas, et al., Psilocin acutely alters sleep-wake architecture and cortical brain activity in laboratory mice. Transl Psychiatry 12, 77 (2022).

      (36) G. F. Turi, et al., Serotonin modulates infraslow oscillation in the dentate gyrus during Non-REM sleep. eLife 13 (2025).

      (37) J. Vazquez, H. A. Baghdoyan, Basal forebrain acetylcholine release during REM sleep is significantly greater than during waking. American Journal of Physiology-Regulatory, Integrative and Comparative Physiology 280, R598–R601 (2001).

      (38) J. Wan, et al., A genetically encoded sensor for measuring serotonin dynamics. Nat Neurosci 24, 746–752 (2021).

      (39) Y. Zhang, et al., Cholinergic suppression of hippocampal sharp-wave ripples impairs working memory. Proc. Natl. Acad. Sci. U.S.A. 118, e2016432118 (2021).

      (40) Y. Zhang, et al., Interaction of acetylcholine and oxytocin neuromodulation in the hippocampus. Neuron (2024).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      We would like to thank Reviewer 1 for recognising the importance of our findings on the heterogeneity in bacterial responses to tachyplesin.

      (1) A double deletion of acrA and tolC (two out of the three components of the major constitutive RND efflux pump) reduces the appearance of the low accumulator phenotype, but interestingly, the single deletions have no effect, and a well-characterised inhibitor of RND efflux pumps also has no effect. The authors identify a two-component system, qseCB, that appears necessary for the appearance of low accumulators, but this system has pleiotropic effects on many cellular systems, with only tenuous connections to efflux. The selected pharmacological agents that could prevent the appearance of low accumulators do not offer clear insight into the mechanism by which low accumulators arise, because they have diverse modes of action.

      We have added that “QseBC, was previously inferred to mediate resistance to a tachyplesin analogue by upregulating efflux genes based on transcriptomic analysis and hyper susceptibility of ΔqseBΔqseC mutants[113]”. However, we have also acknowledged that “it is conceivable that the deletion of QseBC has pleiotropic effects on other cellular mechanisms involved in tachyplesin accumulation.” and that “it is also conceivable that sertraline prevented the formation of the low accumulator phenotype via efflux independent mechanisms”

      These amendments are reported on lines 525-527, 532-534 and 539-541 of our revised manuscript.

      (2) The transcriptomics data collected for low and high accumulator sub-populations are interesting, but in my opinion, the conclusions that can be drawn from these data remain overstated. It is not possible to make any claims about the total amount of "protein synthesis, energy production, and gene expression" on the basis of RNA-Seq data. The reads from each sample are normalised, so there is no information about the total amount of transcript. Many elements of total cellular activity are post-transcriptionally regulated, so it is impossible to assess from transcriptomics alone. Finally, the transcriptomic data are analysed in aggregated clusters of genes that are enriched for biological processes, for example: "Cluster 2 included processes involved in protein synthesis, energy production, and gene expression that were downregulated to a greater extent in low accumulators than high accumulators". However, this obscures the fact that these clusters include genes that are generally inhibitory of the process named, as well as genes that facilitate the process.

      We have now acknowledged “that our data do not take into account post-transcriptional modifications that represent a second control point to survive external stressors.”

      These amendments are reported on lines 534-535 of our revised manuscript.

      The raw transcript counts can be found in Figure 3 – Source Data; we added these data to the previous version of our manuscript, as requested by this reviewer.

      We would also like to clarify that we have analysed our transcriptomic data via both clustering (i.e. Figure 3) and direct comparison of genes of interest (Table S1) and transcription factors (i.e. genes that are generally inhibitory of the process named, as well as genes that facilitate the process, Figure S12).

      Finally, we would like to point out that in our revised manuscript (both this and its previous version) we state “Cluster 2 included processes involved in protein synthesis, energy production, and gene expression that were downregulated to a greater extent in low accumulators than high accumulators”. We do not think this is an overstatement, because we do not use these data to draw conclusions about the total amount of "protein synthesis, energy production, and gene expression".

      (3) The authors have added an experiment to attempt to assess overall metabolic activity in the low accumulator and high accumulator populations, which is a welcome addition. They apply the redox dye resazurin and observe lower resorufin (reduced form) fluorescence in the low accumulator population, which they take to indicate a lower respiration rate. This seems possible, however, an important caveat is that they have shown the low accumulator population to retain substantially lower amounts of multiple different fluorescent molecules (tachyplesin-NBD, propidium iodide, ethidium bromide) intracellularly compared to the high accumulator population. It seems possible that the low accumulator population is also capable of removing resazurin or resorufin from the intracellular space, regardless of metabolic rate. Indeed, it has previously been shown that efflux by RND efflux pumps influences resazurin reduction to resorufin in both P. aeruginosa and E. coli. By measuring only the retained redox dye using flow cytometry, the results may be confounded by the demonstrated ability of the low accumulator population to remove various fluorescent dyes. More work is needed to strongly support broad conclusions about the physiological states of the low and high accumulator populations. The phenomenon of the emergence of low accumulators, which are phenotypically tolerant to the antimicrobial peptide tachyplesin, is interesting and important even if there is still work to be done to understand the mechanism by which it occurs.

      We have now clarified that these assays were performed in the presence of 50 μM CCCP and that “CCCP was included to minimise differences in efflux activity and preserve resorufin retention between low and high accumulators, though some variability in efflux may still persist.” We have now added this information on lines 401-406. This information was only present in the caption of Figure S16 of our previous version of this manuscript.

      We agree with the reviewers that more work needs to be done to fully understand this new phenomenon and we had already acknowledged in our previous version of this manuscript that other mechanisms could play a role in this new phenomenon, see lines 489-517 of the current manuscript.

      Reviewer 2:

      We would like to thank the reviewer for recognising that all their previous comments have now been satisfactorily addressed.

      (1) Some mechanistic questions regarding tachyplesin-accumulation and survival remain. One general shortcoming of the setup of the transcriptomics experiment is that the tachyplesin-NBD probe itself has antibiotic efficacy and induces phenotypes (and eventually cell death) in the ‘high accumulator’ cells. As the authors state themselves, this makes it challenging to interpret whether any differences seen between the two groups are causative for the observed accumulation pattern or if they are a consequence of differential accumulation and downstream phenotypic effects.

      We agree with the reviewer and we had explicitly acknowledged this possibility on lines 281-285 (of the previous and current version of this manuscript).

      (2) The statement ‘Moreover, we found that the fluorescence of low accumulators decreased over time when bacteria were treated with 20 μg mL’ is, in my opinion, not supported by the data shown in Figure S4C. That figure shows that the abundance of ‘low accumulator’ cells decreases over time. Following the rationale that proteinase K treatment may cleave surface-associated/extracellular tachyplesin-NBD, this should lead to a shift of the ‘low accumulator’ population to the left, indicating reduced fluorescence intensity per cell. This is not the case; instead, the population just disappears. However, after 120 min of treatment more cells appear in the ‘high accumulator’ state. This result is somewhat puzzling.

      We agree with the reviewer that our previous discussion of these data could have been misleading. We have now reworded this part of the text as follows: “We found that the fluorescence of high accumulators did not decrease over time when tachyplesin-NBD was removed from the extracellular environment and bacteria were treated with 20 μg mL⁻¹ (0.7 μM) proteinase K, a widely-occurring serine protease that can cleave the peptide bonds of AMPs [43–45] (Figure S4B and C). These data suggest that tachyplesin-NBD primarily accumulates intracellularly in high accumulators.”

      It is conceivable that extended exposure to proteinase K (i.e. we see a decrease in the abundance of low accumulators after 90 min treatment with proteinase K) increased the permeability to tachyplesin-NBD of low accumulators allowing tachyplesin-NBD to move from either the extracellular space or the membrane to the cell interior. However, we do not have data to prove this point.

      Therefore, we have now removed our claim that the data obtained using proteinase K suggest that tachyplesin-NBD accumulates primarily in the membranes of low accumulators. We believe that our two separate microscopy analyses provide more direct, stronger and less ambiguous evidence that tachyplesin-NBD accumulates primarily in the membranes of low accumulators.

      (3) The authors used the metabolic dye resazurin to measure the metabolic activity of low vs. high accumulators. I am not entirely convinced that the lower resorufin fluorescence in low tachyplesin-NBD accumulators really indicates lower metabolic activity, since a cell's fluorescence levels would also be affected by cellular uptake and efflux. It appears plausible that the lower resorufin fluorescence may result from reduced accumulation/increased efflux in the ‘low tachyplesin-NBD’ population.

      We have now clarified that these assays were performed in the presence of 50 μM CCCP and that “CCCP was included to minimise differences in efflux activity and preserve resorufin retention between low and high accumulators, though some variability in efflux may still persist.” We have now added this information on lines 401-406. This information was only present in the caption of Figure S16 of our previous version of this manuscript.

      (4) P8 line 343. The text should refer to Figure 13B instead of 14B.

      We have now changed the text accordingly on line 337.

      Reviewer 3:

      We would like to thank the reviewer for recognising that we have done a very impressive job in taking care of their comments.

      (1) Despite these advances, the contribution of efflux may require more direct evidence to further dissect whether efflux is necessary, sufficient, or contributory. The facts that the key low efflux mutant still retains a small fraction of survivors and that the inhibitors used may cause other physiological changes leading to higher efflux are still unaccounted for. The lipidomic and vesicle findings, while intriguing, remain descriptive, and direct tests of their functional relevance would further solidify the mechanistic models.

      We agree with the reviewers that more work needs to be done to fully understand this new phenomenon and we had already acknowledged in our previous version of this manuscript that other mechanisms could play a role in this new phenomenon, see lines 489-517 of the current manuscript.

    1. Author response:

      Reviewer #1 (Public review):

      (1) Legionella effectors are often activated by binding to eukaryote-specific host factors, including actin. The authors should test the following: a) whether Lfat1 can fatty acylate small G-proteins in vitro; b) whether this activity is dependent on actin binding; and c) whether expression of the Y240A mutant in mammalian cells affects the fatty acylation of Rac3 (Figure 6B), or other small G-proteins.

      We were not able to express and purify full-length recombinant Lfat1 to perform fatty acylation of small GTPases in vitro. However, when overexpressed in cellulo, the Y240A mutant retained the ability to fatty acylate Rac3 and another small GTPase, RheB (see Author response image 1 below). We postulate that under infection conditions, actin binding might be required to fatty acylate certain GTPases because only a small amount of effector protein is secreted into the host cell.

      Author response image 1.

      (2) It should be demonstrated that lysine residues on small G-proteins are indeed targeted by Lfat1. Ideally, the functional consequences of these modifications should also be investigated. For example, does fatty acylation of G-proteins affect GTPase activity or binding to downstream effectors?

      We have mutated K178 on RheB and shown that this mutation abolished its fatty acylation by Lfat1 (see Author response image 2 below). We were not able to test whether fatty acylation by Lfat1 affects downstream effector binding.

      Author response image 2.

      (3) Line 138: Can the authors clarify whether the Lfat1 ABD induces bundling of F-actin filaments or promotes actin oligomerization? Does the Lfat1 ABD form multimers that bring multiple filaments together? If Lfat1 induces actin oligomerization, this effect should be experimentally tested and reported. Additionally, the impact of Lfat1 binding on actin filament stability should be assessed. This is particularly important given the proposed use of the ABD as an actin probe.

      The ABD does not form oligomers, as evidenced by its gel filtration profile. However, we do see F-actin bundling in our in vitro F-actin polymerization experiment when both actin and the ABD are at high concentrations (data not shown). At low concentrations of the ABD, there is no aggregation or bundling of F-actin.

      (4) Line 180: I think it's too premature to refer to the interaction as having "high specificity and affinity." We really don't know what else it's binding to.

      We have revised the text and reworded the sentence by removing "high specificity and affinity."

      (5) The authors should reconsider the color scheme used in the structural figures, particularly in Figures 2D and S4.

      We are not sure what the reviewer's concern is regarding the color scheme of the structural figures.

      (6) In Figure 3E, the WT curve fits the data poorly, possibly because the actin concentration exceeds the Kd of the interaction. It might fit better to a quadratic.

      We have performed quadratic fitting and replaced Figure 3E.
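
      For reference, a quadratic (tight-binding) model of the kind commonly used when the concentration of the fixed component is not negligible relative to the dissociation constant can be written as below. This is a generic sketch under stated assumptions, not necessarily the exact parameterization used for Figure 3E: the symbols are illustrative, with [P]_t the total concentration of the fixed component, [A]_t the total concentration of the titrated component, K_d the dissociation constant, and f_bound the fraction of the fixed component in complex, assuming 1:1 binding.

```latex
\[
  f_{\mathrm{bound}}
  = \frac{\bigl([P]_t + [A]_t + K_d\bigr)
          - \sqrt{\bigl([P]_t + [A]_t + K_d\bigr)^{2} - 4\,[P]_t\,[A]_t}}
         {2\,[P]_t}
\]
```

      When [P]_t is much smaller than K_d, this expression reduces to the familiar hyperbolic form [A]_t / (K_d + [A]_t), which is why the two models diverge mainly when the fixed concentration approaches or exceeds K_d.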

      (7) The authors propose that the individual helices of the Lfat1 ABD could be expressed on separate proteins and used to target multi-component biological complexes to F-actin by genetically fusing each component to a split alpha-helix. This is an intriguing idea, but it should be tested as a proof of concept to support its feasibility and potential utility.

      It is a good suggestion. We plan to thoroughly test the feasibility of this idea as one of our future directions.

      (8) The plot in Figure S2D appears cropped on the X-axis or was generated from a ~2× binned map rather than the deposited one (pixel size ~0.83 Å, plot suggests ~1.6 Å). The reported pixel size is inconsistent between the Methods and Table 1; please clarify whether 0.83 Å refers to super-resolution.

      Yes, 0.83 Å refers to super-resolution. We have updated the cryo-EM table accordingly.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The authors should use biochemical reactions to analyze the KFAT activity of Lfat1 on one or two small GTPases shown to be modified by this effector in cellulo. Such reactions may allow them to determine the role of actin binding in its biochemical activity. This notion is particularly relevant in light of recent studies showing that actin is a co-factor for the activity of LnaB and Ceg14 (PMID: 39009586; PMID: 38776962; PMID: 40394005). In addition, the study should be discussed in the context of these recent findings on the role of actin in the activity of L. pneumophila effectors.

      We have new data showing that actin binding does not affect Lfat1 enzymatic activity (see the figure in our response to Reviewer #1). We have added these new data to the paper as Figure S7. Accordingly, we have also revised the Discussion by adding the following paragraph.

      “The discovery of Lfat1 as an F-actin–binding lysine fatty acyl transferase raised the intriguing question of whether its enzymatic activity depends on F-actin binding. Recent studies have shown that other Legionella effectors, such as LnaB and Ceg14, use actin as a co-factor to regulate their activities. For instance, LnaB binds monomeric G-actin to enhance its phosphoryl-AMPylase activity toward phosphorylated residues, resulting in unique ADPylation modifications in host proteins (Fu et al, 2024; Wang et al, 2024). Similarly, Ceg14 is activated by host actin to convert ATP and dATP into adenosine and deoxyadenosine monophosphate, thereby modulating ATP levels in L. pneumophila–infected cells (He et al, 2025). However, this does not appear to be the case for Lfat1. We found that Lfat1 mutants defective in F-actin binding retained the ability to modify host small GTPases when expressed in cells (Figure S7). These findings suggest that, rather than serving as a co-factor, F-actin may serve to localize Lfat1 via its actin-binding domain (ABD), thereby confining its activity to regions enriched in F-actin and enabling spatial specificity in the modification of host targets.”

      (2) The development of the ABD domain of Lfat1 as an F-actin probe is a nice extension of the biochemical and structural experiments. The authors need to compare the new probe to commonly used ones, such as Lifeact, in labeling of the actin cytoskeleton structure.

      We fully agree with the reviewer’s insightful suggestion. However, a direct comparison of the Lfat1 ABD domain with commonly used actin probes such as Lifeact, as well as evaluation of the split α-helix probe (as suggested by Reviewer #1), would require extensive and technically demanding experiments. These are important directions that we plan to pursue in future studies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study reports the development of a novel organoid system for studying the emergence of autorhythmic gut peristaltic contractions through the interaction between interstitial cells of Cajal and smooth muscle cells. While the utility of the organoids for studying hindgut development is well illustrated by showing, for example, a previously unappreciated potential role for smooth muscle cells in regulating the firing rate of interstitial cells of Cajal, some of the functional analyses are incomplete. There are some concerns about the specificity and penetrance of perturbations and the reproducibility of the phenotypes. With these concerns properly addressed, this paper will be of interest to those studying the development and physiology of the gut.

      We greatly appreciate the constructive comments raised by the Editors and all the Reviewers. We have conducted new pharmacological experiments using nifedipine, an L-type Ca²⁺ channel blocker known to act on smooth muscle (new Fig 7). The treatment abrogated the oscillations not only of SMCs but also of ICCs, further corroborating our model that, in addition to ICC-to-SMC interactions, signals in the reverse direction, namely SMC-to-ICC feedback, operate to achieve the coordinated and stable rhythm of gut contractile organoids.

      Concerning the specificity and penetrance of the pharmacological experiments with gap junction inhibitors, we have carefully re-examined the effects of multiple blockers (CBX and 18b-GA) at different concentrations (new Fig 5D and Fig. S3B). We found that the effect of CBX (100 µM), namely abolition of the latency of Ca²⁺ peaks between ICCs (preceding) and SMCs (following), is not seen with 18b-GA at any concentration tested, including 100 µM, implying that the latency of Ca²⁺ peaks between these cells is governed by connexin(s) that are not inhibited by 18b-GA. Such differences in the inhibitory effects of these two drugs have previously been reported in multiple model systems, including the gut (Daniel et al., 2007; Parsons & Huizinga, 2015; Schultz et al., 2003).

      Regarding the penetrance of the drugs, we administered the gap junction inhibitors, either CBX (100 µM) or 18b-GA (100 µM), earlier in the course of organoid formation (Day 3), when the cells in culture are still in 2D, to exclude a possible penetrance problem (new Fig. S3C). These treatments had little or no effect on the patterns of organoidal contractions, similar to drug administration at Day 7. Because CBX (100 µM) eliminates the latency of Ca²⁺ peaks, as already shown in the first version, we believe that this drug successfully penetrates the organoid and exerts its specific effects.

      Unfortunately, owing to very unstable climate conditions in Japan since last summer, including extreme heat, and to sporadic outbreaks of bird flu, the poultry farm must have faced problems. In the course of the revision experiments, we repeatedly ran into serious trouble with unhealthy eggs/embryos, a problem that has lasted from last summer until the present. These unfortunate incidents did not allow us to carry out the revision experiments as fully as we had originally planned. Nevertheless, we did our very best within the limited time frame, and we believe that the revised version is suitable as the final version of an eLife article.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors developed an organoid system that contains smooth muscle cells (SMCs) and interstitial cells of Cajal (ICCs; pacemaker) but few enteric neurons, and generates rhythmic contractions as seen in the developing gut. The stereotypical arrangements of SMCs and ICCs in the organoid allowed the authors to identify these cell types in the organoid without antibody staining. The authors took advantage of this and used calcium imaging and pharmacology to study how calcium transients develop in this system through the interaction between the two types of cells. The authors first show that calcium transients are synchronized between ICC-ICC, SMC-SMC, and SMC-ICC. They then used gap junction inhibitors to suggest that gap junctions are specifically involved in ICC-to-SMC signaling. Finally, the authors used an inhibitor of myosin II to suggest that feedback from SMC contraction is crucial for the generation of rhythmic activities in ICCs. The authors also show that two organoids become synchronized as they fuse and SMCs mediate this synchronization.

      Strengths:

      The organoid system offers a useful model in which one can study the specific roles of SMCs and ICCs in live samples.

      Thank you very much for the constructive comments.

      Weaknesses:

      Since only one blocker each for gap junction and myosin II was used, the specificities of the effects were unclear.

      We appreciate these comments. We have addressed these weaknesses as described in our response to the eLife assessment (please see above).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Yagasaki et al. describe an organoid system to study the interactions between smooth muscle cells (SMCs) and interstitial cells of Cajal (ICCs). While these interactions are essential for the control of rhythmic intestinal contractility (i.e., peristalsis), they are poorly understood, largely due to the complexity of and access to the in vivo environment and the inability to co-culture these cell types in vitro for long term under physiological conditions. The "gut contractile organoids" described herein are reconstituted from stromal cells of the fetal chicken hindgut that rapidly reorganize into multilayered spheroids containing an outer layer of smooth muscle cells and an inner core of interstitial cells. The authors demonstrate that they contract cyclically and additionally use calcium imaging to show that these contractions occur concomitantly with calcium transients that initiate in the interstitial cell core and are synchronized within the organoid and between ICCs and SMCs. Furthermore, they use several pharmacological inhibitors to show that these contractions are dependent upon non-muscle myosin activity and, surprisingly, independent of gap junction activity. Finally, they develop a 3D hydrogel for the culturing of multiple organoids and found that they synchronize their contractile activities through interconnecting smooth muscle cells, suggesting that this model can be used to study the emergence of pacemaking activities. Overall, this study provides a relatively easy-to-establish organoid system that will be of use in studies examining the emergence of rhythmic peristaltic smooth muscle contractions and how these are regulated by interstitial cell interactions. However, further validation and quantification will be necessary to conclusively determine the cellular composition of the organoids and how reproducible their behaviors are.

      Strengths:

      This work establishes a new self-organizing organoid system that can easily be generated from the muscle layers of the chick fetal hindgut to study the emergence of spontaneous smooth muscle cell contractility. A key strength of this approach is that the organoids seem to contain few cell types (though more validation is needed), namely smooth muscle cells (SMCs) and interstitial cells of Cajal (ICCs). These organoids are amenable to live imaging of calcium dynamics as well as pharmacological perturbations for functional assays, and since they are derived from developing tissues, the emergence of the interactions between cell types can be functionally studied. Thus, the gut contractile organoids represent a reductionist system to study the interactions between SMCs and ICCs in comparison to the more complex in vivo environment, which has made studying these interactions challenging.

      Thank you very much for the constructive comments.

      Weaknesses:

      The study falls short in the sense that it does not provide a rigorous amount of evidence to validate that the gut organoids are made of bona fide smooth muscle cells and ICCs. For example, only two "marker" proteins are used to support the claims of cell identity of SMCs and ICCs. At the same time, certain aspects of the data are not quantified sufficiently to appreciate the variance of organoid rhythmic contractility. For example, most contractility plots show the trace for a single organoid. This leads to a concern for how reproducible certain aspects of the organoid system (e.g. wavelength between contractions/rhythm) might be, or how these evolve uniquely over time in culture. Furthermore, while this study might be able to capture the emergence of ICC-SMC interactions as they relate to muscle contraction and pacemaking, it is unclear how these interactions relate to adult gastrointestinal physiology given that the organoids are derived from fetal cells that might not be fully differentiated or might have distinct functions from the adult. Finally, despite the strength of this system, discoveries made in it will need to be validated in vivo.

      Thank you very much for the comments, which are helpful to improve our MS. In the revised version, we have additionally used an antibody against desmin, known to be a marker for mature SMCs (new Fig 3B). The signal is seen only in the peripheral cells overlapping with the αSMA staining (line 169-170).

      Concerning reproducibility, while contractility changes were shown for a representative organoid in the original version, the experiments had been carried out multiple times and consistent data were reproduced, as already mentioned in the text of the first version of the MS. However, we agree with this reviewer that quantitative assessment would be more convincing. We have therefore conducted quantitative assessments of organoidal contractions and Ca<sup>2+</sup> transients (new Fig. 2B, new Fig. 4D, new Fig 5D, E, new Fig. 6B, new Fig. 7B, new Fig. 8C, new Fig. S2, S3). Details such as repeats of experiments and size of specimens are carefully described in the revised version (Figure legends).

      In particular, in place of contraction numbers/time, we have plotted “contraction intervals” between two successive peaks (Fig. 2B and others). Following your suggestion, we also tried to perform a periodicity analysis of organoid contractions. Unfortunately, no clear value was obtained, probably because the contractions/Ca<sup>2+</sup> transients are not as “regularly periodical” as seen in conventional physics. This led us to perform the peak-interval analysis. Methods to quantify the contraction intervals are carefully explained in the revised version.
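      The peak-interval idea described above can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the function names, the threshold-based peak criterion, the toy trace, and the 0.7 s frame interval are all assumptions chosen for illustration.

```python
# Minimal sketch of a peak-interval analysis (illustrative only).
# Assumptions: a contraction is a local maximum above a fixed threshold,
# and frames are acquired at a constant interval.

def find_peaks(trace, threshold):
    """Return indices of local maxima in `trace` that exceed `threshold`."""
    peaks = []
    for i in range(1, len(trace) - 1):
        rising = trace[i] > trace[i - 1]
        not_lower_than_next = trace[i] >= trace[i + 1]
        if trace[i] > threshold and rising and not_lower_than_next:
            peaks.append(i)
    return peaks


def contraction_intervals(trace, frame_interval_s, threshold):
    """Return the time (in seconds) between each pair of successive peaks."""
    peaks = find_peaks(trace, threshold)
    return [(b - a) * frame_interval_s for a, b in zip(peaks, peaks[1:])]


# Hypothetical intensity trace with three contraction-like peaks.
trace = [0, 1, 5, 1, 0, 0, 2, 6, 2, 0, 0, 0, 1, 5, 1, 0]
intervals = contraction_intervals(trace, frame_interval_s=0.7, threshold=3)
print(intervals)
```

      Plotting the distribution of such intervals per organoid, rather than a single dominant period, does not require the rhythm to be strictly periodic.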

      As already mentioned in “Our provisional responses” following the receipt of the Reviewers’ comments, we agree that our organoids derived from embryonic hindgut (E15) might not necessarily recapitulate the full function of cells in the adult. However, it is well accepted in the field of developmental biology that studies with embryonic tissue/cells make a huge contribution to unveiling complicated physiological cell functions. Nevertheless, we have taken care in the revised version that the MS does not send misleading messages. We agree that in vivo validation of our gut contractile organoid would be valuable, and this is our next step.

      Reviewer #3 (Public Review):

      Summary:

      The paper presents a novel contractile gut organoid system that allows for in vitro studying of rudimentary peristaltic motions in embryonic tissues by facilitating GCaMP live imaging of Ca<sup>2+</sup> dynamics, while highlighting the importance and sufficiency of ICC and SMC interactions in generating consistent contractions reminiscent of peristalsis. It also argues that ENS at later embryonic stages might not be necessary for coordination of peristalsis.

      Strengths:

      The manuscript by Yagasaki, Takahashi, and colleagues represents an exciting new addition to the toolkit available for studying fundamental questions in the development and physiology of the hindgut. The authors carefully lay out the protocol for generating contractile gut organoids from chick embryonic hindgut, and perform a series of experiments that illustrate the broader utility of these organoids for studying the gut. This reviewer is highly supportive of the manuscript, with only minor requests to improve confidence in the findings and broader impact of the work. These are detailed below.

      Thank you very much for the constructive comments.

      Weaknesses:

      (1) Given that the literature is conflicting on the role of GAP junctions in potentiating communication between interstitial cells of Cajal (ICCs) and smooth muscle cells (SMCs), the experiments involving CBX and 18Beta-GA are well-justified. However, because neither treatment altered contractile frequency or synchronization of Ca++ transients, it would be important to demonstrate that the treatments did indeed inhibit GAP junction function as administered. This would strengthen the conclusion that GAP junctions are not required, and eliminate the alternative explanation that the treatments themselves failed to block GAP junction activity.

      Thank you for these comments, and we agree. In the revised version, we have verified the drugs, CBX and 18b-GA, using dissociated embryonic heart cells in culture, a well-established model for gap junction studies (new Fig. S3D, line 237-239). As expected, both inhibitors abrogate the rhythmic beats of heart cells, and importantly, the beating resumes after wash-out of the drug.

      (2) Given that 5uM blebbistatin increases the frequency of contractions but 10uM completely abolishes contractions, confirming that cell viability is not compromised at the higher concentration would build confidence that the phenotype results from inhibition of myosin activity. One could either assay for cell death, or perform washout experiments to test for recovery of cyclic contractions upon removal of blebbistatin. The latter may provide access to other interesting questions as well. For example, do organoids retain memory of their prior setpoint or arrive at a new firing frequency after washout?

      We greatly appreciate these suggestions and the interesting ideas to explore! In the revised version, we have newly conducted washout experiments (new Fig. 6B) (10 µM drug is washed out from the culture medium), and found that contractions resume, showing that cell viability is not compromised at 10 µM concentration (line 257-259). Intriguingly, the resumed rhythm appears more regular than that before drug administration. Thus, the contraction rhythm of the organoid might be determined by cell-cell interactions at any given time rather than by memory of their prior setpoint. This is an interesting issue we would like to explore further in the future. These issues, although potentially interesting, are not mentioned in the text of the revised version, since it is too early to interpret these observations.

      (3) Regulation of contractile activity was attributed to ICCs, with authors reasoning that Tuj1+ enteric neurons were only present in organoids in very small numbers (~1%). However, neuronal function is not strictly dependent on abundance, and some experimental support for the relative importance of ICCs over Tuj1+ cells would strengthen a central assumption of the work that ICCs are the predominant cell type regulating organoid contraction. For example, one could envision forming organoids from embryos in which neural crest cells have been ablated via microdissection or targeted electroporation. Another approach would be ablation of Tuj1+ cells from the formed organoids via tetrodotoxin treatment. The ability of organoids to maintain rhythmic contractile activity in the total absence of Tuj1+ cells would add confidence that the ICCs are indeed the driver of contractility in these organoids.

      We agree. In the revised version, we have conducted TTX administration (new Fig. S2C). No changes in contractility were detected after this treatment, supporting the argument that neural cells/activities are not essential for rhythmic contractions of the organoid (line 178-181).

      (4) Given the implications of a time lag between Ca++ peaks in ICCs and SMCs, it would be important to quantify this, including standard deviations, rather than showing representative plots from a single sample.

      In the revised version, we have elaborated a series of quantitative assessments as mentioned above (please see our responses to the “eLife assessments” at the beginning of this correspondence). The latency between Ca<sup>2+</sup> peaks in ICCs and SMCs is shown in new Fig. 4D, in which the measured values are “terraced” in 700 msec steps because the time-lapse imaging was performed at 700 msec intervals (as already described in the first version). 117 peaks for 14 organoids have been assessed (line 218).
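      Why latencies measured this way come out in 700 msec steps can be seen in a small sketch. This is only an illustration under assumed conditions: the two toy traces and the function names are hypothetical, and the peak is taken simply as the frame with the highest signal.

```python
# Illustrative sketch: peak latency measured from frame indices is always
# an integer multiple of the frame interval (700 ms here, as in the text).
# The traces and helper names below are hypothetical.

def peak_index(trace):
    """Return the index of the frame with the highest signal."""
    return max(range(len(trace)), key=lambda i: trace[i])


def peak_latency_ms(icc_trace, smc_trace, frame_interval_ms=700):
    """Latency of the SMC peak relative to the ICC peak, in milliseconds."""
    frame_lag = peak_index(smc_trace) - peak_index(icc_trace)
    return frame_lag * frame_interval_ms


# Toy GCaMP-like traces: the SMC peak falls one frame after the ICC peak.
icc = [0.0, 0.2, 1.0, 0.6, 0.2, 0.0]
smc = [0.0, 0.1, 0.7, 1.0, 0.4, 0.0]
latency = peak_latency_ms(icc, smc)
print(latency)
```

      Any true delay shorter than one frame interval is reported as either 0 or 700 msec, which is why faster imaging or a faster indicator would be needed to resolve the onset of the signals.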

      (5) To validate the organoid as a faithful recreation of in vivo conditions, it would be helpful for authors to test some of the more exciting findings on explanted hindgut tissue. One could explant hindguts and test whether blebbistatin treatment silences peristaltic contractions as it does in organoids, or following RCAS-GCAMP infection at earlier stages, one could test the effects of GAP junction inhibitors on Ca++ transients in explanted hindguts. These would potentially serve as useful validation for the gut contractile organoid, and further emphasize the utility of studying these simplified systems for understanding more complex phenomena in vivo.

      Thank you very much for the insightful comments. We would love to explore these issues in the near future. We note that Nifedipine was previously reported to silence peristaltic contractions in ex-vivo cultured gut (Chevalier et al., 2024; Der et al., 2000).

      (6) Organoid fusion experiments are very interesting. It appears that immediately after fusion, the contraction frequency is markedly reduced. Authors should comment on this, and how it changes over time following fusion. Further, is there a relationship between aggregate size and contractile frequency? There are many interesting points that could be discussed here, even if experimental investigation of these points is left to future work.

      It would indeed be interesting to explore how cell communications affect/determine the contraction rhythm, and our novel organoids should serve as an excellent model to address these fundamental questions. We have observed multiple times that when two organoids fuse, they undergo a “pause” and then resume coordinated contractions as a whole, and we have briefly noted this in the revised version (line 282). It is tempting to ask what is going on during this pause. In addition, we have an impression that the larger the organoids grow, the slower their rhythm becomes. We would love to explore this in the near future.

      (7) Minor: As seen in Movie 6 and Figure 6A, 5uM blebbistatin causes a remarkable increase in the frequency of contractions. Given the regular periodicity of these contractions, it is a surprising and potentially interesting finding, but authors do not comment on it. It would be helpful to note this disparity between 5 and 10 uM treatments, if not to speculate on what it means, even if it is beyond the scope of the present study to understand this further.

      We assume that the increase in the frequency of contractions at 5 µM might be due to a shorter refractory period caused by a decreasing magnitude (amplitude) of contraction. We have made a short description in the revised text (line 256-257).

      (8) Minor: While ENS cells are limited in the organoid, it would be helpful to quantify the number of SMCs for comparison in Supplemental Figure S2. In several images, the number of SMCs appears quite limited as well, and the comparison would lend context and a point of reference for the data presented in Figure S2B.

      In the revised version, the number of SMCs has been counted and added in Fig. S2B. Although SMCs are more abundant than ICCs in an intact gut, the proportion is reversed in our organoid (line 181-183). This might be due to treatments during cell dissociation/plating.

      (9) Minor: additional details in the Figure 8 legend would improve interpretation of these results. For example, what is indicated in orange signal present in panels C, G and H? Is this GCAMP?

      We apologize for this confusion. In the revised version, we have added labeling directly in the photos of new Fig. 9 (old Fig. 8). For C, G and H, the left photo is mRuby3+GCaMP6s, and the right one is GCaMP6s only.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have a few comments for the authors to consider:

      (1) Figure 4C: The authors propose that calcium signals propagate from ICC to SMC based on the results presented in this figure. While it is observed that the peak of the calcium signal in ICC precedes that in SMC, it's worth noting that the onset of the rise in calcium signals occurs simultaneously in ICC and SMC. Doesn't this suggest that they are activated simultaneously? The latency observed for the peaks of calcium signals could reflect different kinetics of the rise in calcium concentration in the two types of cells rather than the order of calcium signal propagation.

      We greatly appreciate these comments. We have re-examined the kinetics of GCaMP signals in ICC and SMC, but we did not succeed in determining the rise points precisely. We agree that the rise in calcium signals could be occurring simultaneously. To clarify these issues, analyses with higher resolution are required, such as using GCaMP6f or GCaMP7/8. Nevertheless, the disappearance of the latency of the Ca<sup>2+</sup> peak by CBX implies a role of gap junctions in ICC to SMC signaling. In the revised version, we replaced the wording “rise” with “peak” where the latency is discussed.

      (2) Figure 5C: The specific elimination of the latency in the calcium signal peaks between ICC and SMC is interesting. However, I am curious about how gap junction inhibitors specifically eliminate the latency between ICC and SMC without affecting other aspects of calcium transients in these cells, such as amplitude and synchronization among ICCs and/or SMCs. Readers of the manuscript would expect some discussion on possible mechanisms underlying this specificity. Additionally, I wonder if the elimination of the latency was observed consistently across all samples examined. The authors should provide information on the frequency and number of samples examined, and whether the elimination occurs when 18-beta-GA is used.

      In the revised version, we have elaborated the quantitative demonstration. For the effects of CBX on the latency of Ca<sup>2+</sup> peaks, a new graph has been added to new Fig 5, in which 100 µM eliminated the latency. Intriguingly, the latency appears to be attributable to a gap junction that is not inhibited by 18-beta-GA (please see new Fig. S3E). As already mentioned above, the inhibitory activity of both CBX and 18-beta-GA has been verified using dissociated cells of embryonic heart, a popular model for gap junction studies.

      At present, we do not know how gap junction(s) contribute to the latency of Ca<sup>2+</sup> peaks without affecting synchronization among ICCs and/or SMCs (we have not addressed the amplitude of the oscillation in this study). Actually, it was surprising to us to find that the contribution of GJs is very limited. We do not exclude the importance of GJs, and currently speculate that GJs might be important for the initiation of contraction/oscillation signals, whereas the requirement for GJs diminishes once the ICC-SMC interacting rhythm is established. What we observed in this study might be the synchronization signals AFTER these interactions are established (Day 7 of organoidal culture). Once they are established, it is possible that mechanical signaling elicited by smooth muscle contraction becomes prominent as a mediator of the (stable) synchronization, as implicated by experiments with blebbistatin and Nifedipine, the latter being newly added to the revised version (new Fig. 7). We have added this speculation, although briefly, in the Discussion (line 374-377).

      (3) Figure 6: The significant effects of blebbistatin on calcium dynamics in both ICC and SMC are intriguing. However, since only one blocker is utilized, the specificity of the effects is unclear. If other blockers for muscle contraction are available, they should be employed. Considering that a rise in calcium concentration precedes contraction, calcium transients should persist even if muscle contraction is inhibited. One concern is whether blebbistatin inadvertently rendered the cells unhealthy. The authors should demonstrate at least that contraction and calcium transients recover after removal of the drug. The frequency and number of samples examined should be shown, as requested for Figure 5C above.

      Thank you for these critical comments. Possible harmful effects of the drugs were also raised by other reviewers, and we have therefore conducted wash-out experiments in the revised version (new Fig. 6B). Contractions resume after wash-out, showing that cell viability is not compromised at 10 µM concentration. The number of samples examined has been described more explicitly in the revised version. Regarding the blocker of SMC, we have newly carried out pharmacological assays using nifedipine, a blocker of an L-type Ca<sup>2+</sup> channel known to operate in smooth muscle cells (new Fig 7) (Chevalier et al., 2024; Der et al., 2000). As already explained in the “Responses to eLife assessment”, the treatment abrogated the ICCs’ rhythm and synchronous Ca<sup>2+</sup> transients between ICCs and SMCs, further corroborating our model that not only ICC-to-SMC interactions but also SMC-to-ICC feedback signals operate to achieve the coordinated/stable rhythm of gut contractile organoids at Day 7 of culture (please also see our responses shown above for Comment (2)).

      Reviewer #2 (Recommendations For The Authors):

      Major:

      (1) The claim that organoids contain functional SMCs and ICCs is insufficient as it currently relies on only c-Kit and aSMA antibodies. This conclusion could be additionally supported by staining with other markers of contractile smooth muscle (e.g. TAGLN and MYH14) and an additional accepted marker of ICCs (e.g. ANO1/TMEM16). Moreover, it should be demonstrated whether these cells are PDGFRA+, as PDGFRA is a known marker of other mesenchymal fibroblast cell types. These experiments would additionally rule out whether these cells were simply less differentiated myofibroblasts. Given that there might not be available antibodies that react with chicken protein versions, the authors could support their conclusions using alternative approaches, such as fluorescent in situ hybridization. A more thorough approach, such as single-cell RNA sequencing to compare the cell composition of the in vitro organoids to the in vivo colon, would fully justify the use of these organoids as a system for studying in vivo cell physiology.

      Following these suggestions, we have newly stained contractile organoids with anti-desmin antibody, known to be a marker for differentiated SMCs. As shown in new Fig. 3B, desmin-positive cells perfectly overlapped with aSMA-staining, indicating that the peripherally enclosing cells are SMCs. Regarding the interior cells, as this Reviewer noted, there are no antibodies against ANO1/TMEM16 that are available for avian specimens. The anti-c-Kit antibody used in this study was raised in our laboratory over several years (Yagasaki et al., 2021), where it was carefully validated in intact guts of chicken embryos by multiple methods including Western Blot analyses, immunostaining, and in situ hybridization. We have attempted several times to perform organoidal whole-mount in situ hybridization for expression of PDGFRα, but we have not succeeded so far. In addition, as explained to the Editor, the very unhealthy condition of purchased eggs these past 7 months did not allow us to continue any further. We are planning to interrogate the cell types residing in the central area of the organoid, the results of which will be reported in a separate paper in the near future.

      (2) The key ICC-SMC relationship and physiological interaction seems to arise developmentally, but the mechanisms of this transition are not well defined (Chevalier 2020). To further support the claim that ICC-SMC interactions can be interrogated in this system, this study would benefit from establishing organoids at distinct developmental stages to (a) show that they have unique contractile profiles, and (b) demonstrate that they evolve over time in vitro toward an ICC-driven mechanism.

      We agree with these comments. We tried to prepare gut contractile organoids derived from different stages of development, and we had an impression that slightly younger hindguts can also be used for organoid preparations. In addition, not only the hindgut but also the midgut and caecum yield organoids. However, since organoids derived from these “non-E15 hindgut” sources vary substantially in shape, contraction frequency/amplitude, etc., we are currently not ready to report these preliminary observations. Instead, we decided to optimize and elaborate in vitro culture conditions by focusing on the E15 hindgut, which turned out to be the most stable in our hands. Nevertheless, it is tempting to see how organoids evolve over time during gut development.

      (3) This manuscript would be greatly enhanced by a functional examination of the prospective organoid ICCs. For example, the authors could test whether the c-Kit inhibitor Imatinib, which has previously been used to impair ICC differentiation and function in the developing chick gut (Chevalier 2020), has an effect on contractility at different stages.

      Following the paper by Chevalier (2020), we had already conducted similar experiments with Imatinib in cultures of our organoids, but we did not see detectable effects. In that paper, the midgut of younger embryos was used, whereas we used E15 hindgut to prepare organoids. It would be interesting to see what happens if we add Imatinib earlier, during organoid formation, and this is our next step.

      (4) It is claimed that there is a 690 msec delay in SMC spike relative to ICC spike, however, it is unclear where this average is derived from and whether the organoid calcium trace shown in Figure 4C is representative of the data. The latency quantification should be shown across multiple organoids, and again in the case of carbenoxolone treatment, to better understand the variations in treatment.

      We apologize that the first version failed to clearly demonstrate quantitative assessments. In the revised version, we have elaborated quantitative assessments (117 peaks for 14 organoids) (line 216-218). In new Fig. 4D, the measured values are “terraced” in 700 msec steps because, as already mentioned in the first version, the time-lapse imaging was performed at 700 msec intervals.

      (5) As above, a larger issue is that only single traces are shown for each organoid. This makes it challenging to understand the variance in contractile properties across multiple organoids. While contraction frequencies are shown several times, the manuscript would benefit from additional quantifications, such as rhythm (average wavelength between events) in control and perturbed conditions.

      We have substantially elaborated quantitative assessments (please also see our responses to the “Public Review”). In particular, in place of contraction numbers/time, we have plotted “contraction intervals” between two successive peaks (Fig. 2B and others). We also tried to perform a periodicity analysis of organoid contractions. Unfortunately, no clear value was obtained, probably because the contractions/Ca<sup>2+</sup> transients are not as “regularly periodical” as seen in conventional physics. This led us to perform the peak-interval analysis. Methods to quantify the contraction intervals are carefully explained in the revised version.

      (6) The synchronicity observed between ICCs and SMCs within the organoid is interesting, and should be emphasized by making analyses more quantitative so as to understand how consistent and reproducible this phenomenon is across organoids. Moreover, one of the most exciting parts of the study is the synchronicity established between organoids in the hydrogel system, but it is insufficiently quantified. For example, how rapidly is pacemaking synchronization achieved?

      As we replied above to (5), and described in the responses to the “Public Review”, we have substantially elaborated quantitative assessments in the revised version. Concerning the synchronicity between ICCs and SMCs, our data explicitly show that as long as the organoid undergoes healthy contraction, their rhythms match perfectly (Fig. 4), making the synchronicity difficult to display quantitatively. Instead, to demonstrate such synchronicity more convincingly, we have carefully described the number of peaks and the number of independent organoids we analyzed in each of the Figure legends. In the experiments with hydrogels, the time required for two organoids to start/resume synchronous contraction varies greatly. For example, for the experiment shown in new Fig 9F, it takes 1 to 2 days for cells to crawl out of the organoids and cover the surface of the hydrogel. In the experiments shown in new Fig. 8, two organoids undergo a “pause” before resuming contractions. In the revised version, we have briefly mentioned this observation and our speculation that active cell communications take place during this pause (line 282-283 in Results and line 437-439 in Discussion). We agree with this reviewer that the pause is potentially very interesting. However, it is currently difficult to quantify these phenomena. A more elaborate experimental design might be needed.

      (7) Smooth muscle layers in vivo are well organized into circular and longitudinal layers. To establish physiological relevance, the authors should demonstrate if these organoids have multiple layers (though it looks like just a single outer layer) and if they show supracellular organization across the organoid.

      The immunostaining data suggest that peripherally lining cells are of a single layer, and we assume that they might be aligned in register with contracting direction. However, to clarify these issues, observation with higher resolution would be required.

      (8) To further examine whether the organoids contain true functional ICCs, the authors should test whether their calcium transients are impacted by inhibitors of L-type calcium channels, such as nifedipine and nicardipine. These channels have been demonstrated to be important for SMCs but not ICCs, so one might expect to see continued transients in the core ICCs but a loss of them in SMCs (Lee et al., 1999; PMID: 10444456)

      We appreciate these comments. We have accordingly conducted new experiments with Nifedipine. Contrary to the expectation, Nifedipine abolishes not only organoidal contractions but also ICC activity (and the resulting synchronization) (new Fig. 7). These findings actually corroborate our model, already mentioned in the first version, that ICCs receive mechanical feedback from SMC contraction to stably maintain their oscillatory rhythm. We believe that the additional findings with Nifedipine have improved the quality of our paper. Concerning the central cells in the organoid, we have additionally used anti-desmin antibody, known to mark differentiated SMCs. Desmin signals perfectly overlap with those of aSMA in the peripheral single layer, supporting that the peripheral cells are SMCs and the central cells are ICCs. The anti-c-Kit antibody used in this study was raised in our laboratory over several years (Yagasaki et al., 2021), where it was carefully validated in intact guts of chicken embryos by multiple methods including Western Blot analyses, immunostaining, and in situ hybridization.

      ANO1/TMEM16 are known to stain ICCs in mice. Antibodies against ANO1/TMEM16 available for avian specimens are awaited.

      (9) Despite Tuj1+ enteric neurons only making up a small fraction of the organoids, the authors should still functionally test whether they regulate any aspect of contractility by treating organoids with an inhibitor such as tetrodotoxin to rule out a role for them.

      Thank you for this advice, which was also raised by other reviewers. We have conducted TTX administration (new Fig. S2C). No changes in contractility were detected after this treatment, supporting the argument that neural cells/activities are not essential for rhythmic contractions of the organoid (line 178-181).

      (10) Finally, the manuscript is written to suggest that the focus of the study is to establish a system to interrogate ICC-SMC interactions in gut physiology and peristalsis. However, the organoids designed in this study are derived from the fetal precursors to the adult cell types. Thus, they might not accurately portray the adult cell physiology. I don't believe that this is a downfall, but rather a strength of the study that should be emphasized. That is, the focus could be shifted toward stressing the power of this new system as a reductionist, self-organizing model to examine the developmental emergence of contractile synchronization in the intestine - in particular that arising through ICC-SMC interactions.

      We appreciate this advice. In the revised MS, we have taken care to make clear that our findings do not necessarily portray the physiological functions of the adult gut.

      Minor:

      More technical information could be used in the methods:

      (1) What concentration of Matrigel is used for coating, and what size were the wells that cells were deposited into?

      We have added, “14-mm diameter glass-bottom dishes (Matsunami, D11130H)” and “undiluted Matrigel (Corning, 354248) at 38.5°C for 20 min” (line 471-473).

      (2) How were organoids transferred to the hydrogels? And were the hydrogels coated?

      We have added “Organoids were transferred to the hydrogel using a glass capillary” (line 560-561).

      (3) Tests for significance and p values should be added where appropriate (e.g. Figure S3B).

      We have added these in Figure legend of new Fig. S3.

      Reviewer #3 (Recommendations For The Authors):

      This is an exciting study, and while the majority of our comments are minor suggestions to improve the clarity and impact of findings, it would be important to verify the effective disruption of GAP junction function with CBX or 18Beta-GA treatments before concluding they are not required for coordination of contractility and initiation by ICCs. It is possible that sufficient contextual support exists in the literature for the nature of treatments used, but this may need to be conveyed within the manuscript to allay concerns that the results could be explained by ineffective inhibition of GAP junctions.

      Thank you very much for this advice. In the revised version, we have carried out new experiments with dissociated embryonic heart cells cultured in vitro, a model widely used for gap junction studies (Fig. S3D). Both CBX and 18b-GA efficiently inhibited the contractions of heart cells. We have added the following sentence: “The inhibiting activity of the drugs used here was verified using embryonic heart culture” (lines 237-239).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Comments on revisions:

      The authors have done a good job at revising the manuscript to put this work into the context of earlier work on brainstem central pattern generators.

      Thank you.

      I still believe the case for the method is not as convincing as it would have been if the method had been validated first on oscillations produced by a known CPG model. Why would the inference of synaptic types from the model CPG voltage oscillations be predetermined? Such inverse problems are quite complicated and their solution is often not unique or sufficiently constrained. Recovering synaptic weights (or CPG parameters) from limited observations of a highly nonlinear system is not warranted (Gutenkunst et al., Universally sloppy parameter sensitivities in systems biology models, PLoS Comp. Biol. 2007; www.doi.org/10.1371/journal.pcbi.0030189) especially when using surrogate biological models like Hodgkin-Huxley models.

      The model of the CPG is irrelevant for such a test of validity because what we reconstruct are postsynaptic conductances of an individual neuron. The network creates a periodic input to this neuron and thus forms a periodic pattern of excitatory and inhibitory conductances. The nature of this input, whether autonomously generated or created artificially (say by periodic optogenetic stimulation), is generally not important. To illustrate this, we used a one-compartment conductance-based (Hodgkin-Huxley style) model neuron incorporating a certain common set of channels (fast sodium (I<sub>NaF</sub>), potassium delayed rectifier (I<sub>Kdr</sub>), persistent sodium (I<sub>NaP</sub>), calcium-dependent potassium (I<sub>KCa</sub>), and cationic non-specific current (I<sub>CAN</sub>)), as well as excitatory and inhibitory synaptic channels whose conductances were implemented as predefined periodic functions. The test suggested by the reviewer would be to implement a current-step protocol similar to the experiments and apply our technique to see if the reconstructed conductance profiles match those predefined functions. Below we show the reconstruction steps for the following arbitrarily chosen pattern:

      g<sub>EXC</sub>(t)/g<sub>LEAK</sub> = 0.1(1 + sin(πt)) and g<sub>INH</sub>(t)/g<sub>LEAK</sub> = 0.1(1 + cos(πt)).

      Author response image 1 below shows the baseline activity of this model neuron in the absence of the injected current.

      Author response image 1.

      Then we applied a current-step protocol with four steps producing different levels of hyperpolarization and applied our method by calculating the total conductance using linear regression (see the current-voltage plots below) and then decomposing it into the excitatory and inhibitory components.

      Author response image 2.

      As one can see, the reconstructed conductances in Author response image 3 below are nearly identical to their theoretical profiles. This is not surprising because all voltage-dependent currents in the model neuron were inactive in the range of voltages matching our experimental conditions. Therefore, the model could be reduced to just the leak current, synaptic currents and the injected current, which matches precisely the model we used in our manuscript.

      Author response image 3.
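
      The reconstruction steps described above can be illustrated with a minimal sketch. It assumes the reduced model stated in the response (leak, synaptic, and injected currents only, with negligible capacitive current), so that at each time point the injected current is a linear function of voltage whose slope is the total conductance. The reversal potentials, the current-step amplitudes, and all function names below are hypothetical choices made for illustration; they are not taken from the manuscript, and the decomposition shown is a standard two-equation solve rather than necessarily the authors' exact procedure.

```python
import math

# Hypothetical parameters (assumptions, not values from the manuscript).
# Conductances are expressed in units of the leak conductance.
G_LEAK = 1.0
E_LEAK, E_EXC, E_INH = -65.0, 0.0, -75.0    # reversal potentials in mV (assumed)
CURRENT_STEPS = [0.0, -5.0, -10.0, -15.0]   # four steps, units of g_leak * mV (assumed)


def g_exc(t):
    """Predefined excitatory conductance pattern from the response."""
    return 0.1 * (1.0 + math.sin(math.pi * t))


def g_inh(t):
    """Predefined inhibitory conductance pattern from the response."""
    return 0.1 * (1.0 + math.cos(math.pi * t))


def steady_state_voltage(t, i_inj):
    """Voltage of the reduced model when the capacitive current is negligible."""
    ge, gi = g_exc(t), g_inh(t)
    g_total = G_LEAK + ge + gi
    return (i_inj + G_LEAK * E_LEAK + ge * E_EXC + gi * E_INH) / g_total


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def reconstruct(t):
    """Recover (g_exc, g_inh) at time t from the current-voltage relationship."""
    volts = [steady_state_voltage(t, i) for i in CURRENT_STEPS]
    # I = g_total * V - (g_leak*E_leak + g_exc*E_exc + g_inh*E_inh)
    g_total, intercept = linear_fit(volts, CURRENT_STEPS)
    g_syn = g_total - G_LEAK                  # g_exc + g_inh
    drive = -intercept - G_LEAK * E_LEAK      # g_exc*E_exc + g_inh*E_inh
    ge = (drive - g_syn * E_INH) / (E_EXC - E_INH)
    return ge, g_syn - ge
```

      Calling `reconstruct(t)` at any time point returns values that agree with `g_exc(t)` and `g_inh(t)` up to floating-point error, which mirrors the point made above: when voltage-dependent currents are inactive, the regression recovers the predefined profiles.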

      In p.2, the edited section refers to the interspike interval being much smaller than the period of the network. More important is to mention the relationship between the decay time of inhibitory synapses and the period of the network.

      This interpretation misunderstands the focus of our method. The edited sections (including in the theory section of Results) highlight the conditions under which the capacitive current becomes negligible, emphasizing that the membrane time constant must be much smaller than the network oscillation period. This separation of time scales ensures that the membrane potential adjusts quickly to changes in postsynaptic conductance, rendering the capacitive current insignificant over the network’s rhythm. In contrast, the synaptic decay time governs how presynaptic inputs are transduced into postsynaptic conductances—a process relevant to understanding synaptic dynamics but not directly tied to our method’s core objective. Our approach reconstructs postsynaptic conductances from intracellular recordings, not presynaptic spike trains. While interpreting these conductance profiles in terms of specific synaptic connections would indeed involve synaptic decay dynamics, such an analysis exceeds the scope of our paper. Thus, the condition emphasized in the edited sections—concerning the membrane time constant and network period—is the critical one for our method’s applicability, and the synaptic decay time, while relevant to broader synaptic modeling, does not undermine our conclusions.

      We have added the requirement for a much smaller membrane time constant in the Introduction on page 2. The Results theory section already incorporates an extensive discussion of this requirement.

      Comments from the editors:

      We apologize for the delay in coming to this decision, but there was quite a bit of post-review discussion that needed to be resolved. There are two issues that the reviewers agree should be addressed. They remain unconvinced that the simplifying assumptions of the approach are valid. (1) The main issue with the phase argument is that the biological synaptic conductance depends on time and not on the phase of the respiratory cycle as mentioned in the first round of reviews. The approximation g(t)=g(phase) seems to be far too simple to be biologically realistic.

      As we elaborate below, time and phase are fundamentally and mathematically equivalent representations of the same underlying dynamics in a periodic system, and thus, a phase-based representation—where conductances are expressed as functions of the cycle’s phase—is a justified and effective approach for capturing their behavior. We have added this explanation to the theory section of Results. Below are the bases for our assertion.

      In a periodic system, such as the respiratory CPG, the system’s behavior repeats at regular intervals, defined by a period T. For the respiratory cycle in our experimental preparation, this period is approximately 3–4 seconds, encompassing phases like inspiration, post-inspiration, and expiration. In such systems:

      Time (t) is a continuous variable that progresses linearly.

      Phase (φ) represents the position within one cycle, typically normalized between 0 and 1 (or 0 to 2π in some contexts). It can be mathematically related to time via: φ(t) = (t mod T)/T, where (t mod T) is the time elapsed within the current cycle.

      Because the system is periodic, any variable that repeats with period T—such as synaptic conductance in a rhythmically active network—can be expressed as a function of either time or phase. Specifically, if g(t) is periodic with period T, then g(t) = g(t+T). This periodicity allows us to redefine g(t) in terms of phase: g(t) = g(φ(t)), where φ(t) maps time onto a repeating cycle. Thus, in a periodic system, time and phase are fundamentally equivalent representations of the same underlying dynamics. Saying that synaptic conductance depends on phase is mathematically equivalent to saying it depends on time in a periodic manner.
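
      The time-to-phase mapping above can be written out as a short sketch. The period value and the conductance profile below are assumptions chosen for illustration (the response states only that the period is approximately 3–4 seconds), and the function names are hypothetical.

```python
import math

PERIOD = 3.5  # seconds; assumed value within the 3-4 s range given in the response


def phase(t, period=PERIOD):
    """Map time t onto the normalized cycle phase in [0, 1)."""
    return (t % period) / period


def g_of_phase(phi):
    """Hypothetical periodic conductance profile, defined on phase only."""
    return 0.1 * (1.0 + math.sin(2.0 * math.pi * phi))


def g_of_time(t, period=PERIOD):
    """The same conductance expressed as a function of time: g(t) = g(phi(t))."""
    return g_of_phase(phase(t, period))
```

      Because `g_of_time` is defined through `phase`, it satisfies g(t) = g(t + T) by construction, which is the equivalence the paragraph above describes. The equivalence holds only to the extent that the rhythm is strictly periodic.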

      In a rhythmically active network like the respiratory central pattern generator (CPG), the synaptic conductances, regardless of the specific mechanisms by which they are formed, exhibit periodicity that matches the network’s oscillatory cycle. This occurs because the conductances are driven by the repetitive activity of presynaptic neurons, which are synchronized to the network’s overall rhythm. As a result, the synaptic conductances vary with the same period as the network, making a phase-based representation—where conductances are expressed as functions of the cycle’s phase—a justified and effective approach for capturing their behavior. In our study, we utilized the in situ arterially perfused brainstem-spinal cord preparation from mature rats, which is known to produce a highly periodic respiratory rhythm. To ensure the consistency of this periodicity, we carefully selected recordings where the coefficient of variation of the respiratory cycle period was less than 10%, as outlined in our methods. This strict selection criterion confirms the stability and regularity of the rhythm, supporting the validity of using a phase representation to analyze the synaptic conductances.

      (2) Figure S1 is problematic. First, the currents injected appear to be infinitesimally small.

      There was a typo in the current units, which should be nA and not pA, as evident from the injected current–membrane potential plots in Figure 1B. Figure S1 has been corrected.

      Second, the input resistance is completely independent of voltage, as though there was little or no contribution from hyperpolarization activated currents, which would be surprising.

      While hyperpolarization-activated currents are indeed present in many neuronal types and could theoretically affect input resistance, our data consistently show linear I-V relationships across the voltage range tested (-60 to -100 mV) for the neurons analyzed (see Figure S1 and Author response images 4-9 below). This linearity suggests that, under our experimental conditions, the contribution of voltage-dependent currents, such as h-currents, is negligible within this range.

      Additionally, we now indicate in the manuscript in the theory section of Results how the presence of significant hyperpolarization-activated h-currents would impact our synaptic conductance reconstruction method. In current-clamp recordings, non-linearity from h-currents could introduce voltage-dependent changes in total conductance unrelated to synaptic inputs, potentially skewing the reconstruction. However, this concern does not apply to voltage-clamp recordings, where the membrane potential is held constant, eliminating contributions from voltage-dependent intrinsic currents. As strong evidence of the minimal influence of h-currents, we directly compared synaptic conductance reconstructions using both current-clamp and voltage-clamp protocols in a subset of neurons. The results from these two approaches were highly consistent, indicating that h-currents do not significantly affect our findings. This robustness across experimental methods reinforces the reliability of our conclusions.

      Together, the linear I-V relationships and the agreement between current- and voltage-clamp reconstructions provide compelling evidence that our method accurately captures synaptic conductances without interference from h-currents.

      Typical examples of I-V relationships for each respiratory neuron firing phenotype:

      Author response image 4.

      ramp-I

      Author response image 5.

      pre-I/I

      Author response image 6.

      post-I

      Author response image 7.

      aug-E

      Author response image 8.

      early-I

      Author response image 9.

      late-I

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study aims to create a comprehensive repository about the changes in protein abundance and their modification during oocyte maturation in Xenopus laevis.

      Strengths:

      The results contribute meaningfully to the field.

      Weaknesses:

      The manuscript could have benefitted from more comprehensive analyses and clearer writing. Nonetheless, the key findings are robust and offer a valuable resource for the scientific community.

      We would like to thank the reviewer for his/her positive feedback on our article. The public review points out that "The manuscript could have benefitted from more comprehensive analyses and clearer writing." We have rewritten several sections and provided more detailed explanations of the analysis and interpretation of some data (see below for details). We have also followed all of the reviewer's recommendations, some of which specifically highlighted areas lacking clarity. We would also like to thank the reviewer for pointing out some errors, for which we apologize, and which have now been corrected. We sincerely appreciate the reviewer's thorough work, as it has greatly enhanced the clarity and precision of the manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors analyzed Xenopus oocytes at different stages of meiosis using quantitative phosphoproteomics. Their advanced methods and analyses revealed changes in protein abundances and phosphorylation states to an unprecedented depth and quantitative detail. In the manuscript they provide an excellent interpretation of these findings putting them in the context of past literature in Xenopus as well as in other model systems.

      Strengths:

      High quality data, careful and detailed analysis, outstanding interpretation in the context of the large body of the literature.

      Weaknesses:

      Merely a resource, none of the findings are tested in functional experiments.

      I am very impressed by the quality of the data and the careful and detailed interpretation of the findings. In this form the manuscript will be an excellent resource to the cell division community in general, and it presents a very large number of hypotheses that can be tested in future experiments. Xenopus has been and still is a popular and powerful model system that led to critical discoveries around countless cellular processes, including the spindle, nuclear envelope, translational regulation, just to name a few. This also includes a huge body of literature on the cell cycle describing its phosphoregulation. It is indeed somewhat frustrating to see that these earlier studies using phosphomutants and phospho-antibodies were just scratching the surface. The phosphoproteomics analysis presented here reveals much more extensive and much more dynamic changes in phosphorylation states. Thereby, in my opinion, this manuscript opens a completely new chapter in this line of research, setting the stage for more systematic future studies.

      We thank the reviewer for his/her extremely positive comments. The public review points out that "none of the findings are tested in functional experiments." This is entirely accurate. We focused our work on obtaining the highest quality proteomic and phosphoproteomic data possible, and then sought to highlight these data by connecting them with existing functional data from the literature. This approach has opened up research avenues with enormous, previously unforeseen potential, in a wide range of biological fields (cell cycle, meiosis, oogenesis, embryonic development, cell biology, cellular physiology, signaling, evolution, etc.). We chose not to delay publication by experimentally investigating the narrow area in which we are specialists (meiotic maturation), while our data offer a vast array of research opportunities across various fields. Our goal was, therefore, to present this extensive dataset as a resource for different scientific communities, who can explore their specific biological questions using our data. This is why we submitted our article to the "Repository" section of eLife. Nevertheless, in the context of the comparative analysis of the mouse and Xenopus phosphoproteomes performed at the reviewer’s request, we felt it was important to complement this new section with functional experiments that not only validate the proteomic data but also provide new insights into certain proteins and their regulation by Cdk1 (new paragraph lines 824-860 and new Figure 9).

      We are also grateful to the reviewer for the recommendation to improve the manuscript by including more comparisons between our Xenopus data and those from other systems. We have followed this suggestion (see below), which has significantly enriched the article (new paragraph lines 824-860 and new Figure 9).

      Reviewer #3 (Public review):

      Summary:

      The authors performed time-resolved proteomics and phospho-proteomics in Xenopus oocytes from prophase I through the MII arrest of the unfertilized egg. The data contains protein abundance and phosphorylation sites of a large number set of proteins at different stages of oocyte maturation. The large sets of the data are of high quality. In addition, the authors discussed several key pathways critical for the maturation. The data is very useful for the researchers not only researchers in Xenopus oocytes but also those in oocyte biology in other organisms.

      Strengths:

      The data of proteomics and phospho-proteomics in Xenopus oocyte maturation is very useful for future studies to understand molecular networks in oocyte maturation.

      Weaknesses:

      Although the authors offered molecular pathways of the phosphorylation in the translation, protein degradation, cell cycle regulation, and chromosome segregation. The author did not check the validity of the molecular pathways based on their proteomic data by the experimentation.

      We thank the reviewer for his/her positive comments. The public review points out that "The author did not check the validity of the molecular pathways based on their proteomic data by the experimentation." This is entirely accurate. We focused our work on obtaining the highest quality proteomic and phosphoproteomic data possible, and then sought to highlight these data by connecting them with existing functional data from the literature. This approach has opened up research avenues with enormous, previously unforeseen potential, in a wide range of biological fields (cell cycle, meiosis, oogenesis, embryonic development, cell biology, cellular physiology, signaling, evolution, etc.). We chose not to delay publication by experimentally investigating the very narrow area in which we are specialists (meiotic maturation), while our data offer a vast array of research opportunities across various fields. Our goal was, therefore, to present this extensive dataset as a resource for different scientific communities, who can explore their specific biological questions using our data. This is why we submitted our article to the "Repository" section of eLife. Nevertheless, in the context of the comparative analysis of the mouse and Xenopus phosphoproteomes performed at the reviewer’s request, we felt it was important to complement this new section with functional experiments that not only validate the proteomic data but also provide new insights into certain proteins and their regulation by Cdk1 (new paragraph lines 824-860 and new Figure 9).

      We have also followed all of the reviewer's recommendations and thank him/her, as the suggestions have significantly enhanced the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Fig. 1 -> In the Figure legend "mPRβ" is called "mPRb". In the Figure, it is indicated that PKA substrates are always activated by the phosphorylation. As the relevant substrates and the mode-of-action of the Arpp19 phosphorylation are not clear at the moment, this seems to be preliminary. It could for example also be conceivable that PKA phosphorylation inhibits a translation activator. In addition, the PG-dependent translation of RINGO/Speedy should be included in the model.

      We fully agree with the reviewer. PKA substrates can either be activators of the Cdk1 activation pathway, which are inhibited by phosphorylation by PKA, or repressors of the same pathway, which are activated by phosphorylation by PKA. This is now illustrated in the new Fig. 1. In addition, we have also included RINGO/Speedy in the model and in the text (lines 78-79) and corrected "mPRb" in the legend.

      (2) Lane 51-52 -> it is questionable if the meiotic divisions can be called "embryonic processes"

      We agree with the reviewer's comment, and we have removed the word “embryonic”.

      (3) Lane 53 and lane 106-107 -> recent data have indicated that transcription already starts during cell cycle 12 and 13 in most cells (e.g. Blitz and Cho: Control of zygotic genome activation in Xenopus (2021))

      We apologize for this mistake. The text has been corrected and the reference added (lines 53 and 107).

      (4) Lane 61-62 -> "MI" and "MII" are given as abbreviation for "first and second meiotic spindle"

      The text has been clarified to explain that MI refers to metaphase I and MII to metaphase II (lines 61-64).

      (5) Lane 131-132 -> "single-cell" is mentioned redundantly in this sentence.

      The sentence has been corrected (lines 131-132).

      (6) Fig. 2B -> it is not explained what is plotted as "Average levels" on the x-Axis. Is it the average of expression over all samples or at a given time point? Are the values given as a concentration or are the values normalized? If so, how were they normalized?

      We agree with the reviewer's comment that “Average levels” may have been unclear. In the new Fig. 2B, we have re-plotted the graph using the average protein concentration during meiosis, measured as described in the Methods section.

      (7) In Fig. 2-supplement 3E -> from the descriptions it is not entirely clear to me what the difference to the data in Fig. 2B is?

      We thank the reviewer for his/her question regarding the relationship between the data in Fig. 2B and Fig. 2-supplement 3E. We confirm that the raw data visualized in Fig. 2-supplement 3E are the same as those in Fig. 2B. However, in Fig. 2-supplement 3E, the data are color-coded differently to highlight the number of proteins whose concentrations change during meiotic divisions, based on the threshold adopted. The legend of Fig. 2-supplement 3E has been modified to clarify this point.

      (8) Lane 225-226 -> Kifc1 is a minus-end directed motor

      This mistake has been corrected (lines 232-233).

      (9) Lane 271 -> Serbp1, here mentioned to be involved in stabilization of mRNAs, has also been implicated in the regulation of ribosomes (e.g. Leesch et al. 2023). Regarding the overall topic of this manuscript, this could be mentioned as well.

      We agree with the referee that the important role of Serbp1 in the control of ribosome hibernation needs to be mentioned. We have included this point in the revised manuscript together with the reference (lines 277-279).

      (10) Lane 360-363 -> it is mentioned that APPL1 and Akt2 act "to induce meiosis". Furthermore, in the Nader et al. 2020 paper, Akt2 phosphorylation is reported to happen within 30min after PG treatment. In the present work, they only seem to get phosphorylated when Cdk1 is activated. Is there an explanation for this discrepancy?

      Indeed, Nader et al. (2020) indicate that Akt2 is phosphorylated on Ser473 (actually, they should have mentioned Ser474, which is the phosphorylated residue on Akt2; Ser473 corresponds to the numbering of Akt1) between 5 and 30 minutes post-Pg, which supports their hypothesis of an early role for this kinase. However, these conclusions should be taken with caution, considering that their functional experiment using antisense against Akt2 depletes only 25% of the protein, the antibody used to visualize Akt2 phosphorylation also recognizes phosphorylated Akt1 and Akt3, and they did not analyze phosphorylation of the protein after 30 minutes. Therefore, we cannot determine whether the level observed at 30 minutes represents a maximum or if it is just the onset of the phosphorylation that peaks later, possibly after activation of Cdk1, for example.

      Regarding our measurements: we clearly observe phosphorylation of Akt2 following Cdk1 activation on Ser131. We did not detect Akt2 phosphorylation on Ser474, but since our measurements started 1 hour post-Pg, this protein may have returned to a dephosphorylated state on Ser474.

      Therefore, the observations of Nader et al. and ours involve different residues and different phosphorylation kinetics, Nader et al. limiting their analysis to the first 30 minutes, whereas we started at 1 hour.

      We have revised the manuscript text to make these aspects clearer (lines 387-392).

      (11) Fig. 3B -> it could be made clearer in the Figure that all these sites belong to class I

      A title “Class I proteins” has been added in Fig. 3B to clarify it.

      (12) Lane 433-434 -> the authors write that the proteomic data of this study confirm that PATL1 is accumulating during meiotic maturation. However, in Fig. 2B PATL1 is not among the significantly enriched proteins.

      We apologize for this error. Indeed, PATL1 protein is not significantly enriched. The text has been corrected (lines 461-465).

      (13) Fig. 4B -> Zar2 is color-coded to increase in abundance. This is clearly different to published results and what is shown in Fig. 2B of this manuscript.

      Indeed, our dataset shows that the quantity of Zar2 decreases. Zar2 no longer appears in Figure 2B because its average concentration cannot be estimated. We made an error in the color coding, which has now been corrected in Figure 4B.

      (14) Lane 442-444 -> it might be worth mentioning that the interaction between CPEB1 and Maskin, and thus probably its role in regulation of translation, could not be reproduced in other studies (Minshall et al.: CPEB interacts with an ovary-specific eIF4E and 4E-T in early Xenopus oocytes (2007) or Duran-Arque et al.: Comparative analyses of vertebrate CPEB proteins define two subfamilies with coordinated yet distinct functions in post-transcriptional gene regulation (2022)).

      This clarification is now mentioned in the text, supported by the two references that have been added (lines 471-477).

      (15) Lane 483-485 -> The meaning of these sentences is not entirely clear to me. What exactly is the similarity with the function of Emi1? What does "...binding of Cyclin B1..." mean (binding to which other protein?). What is the similarity between Emi1 and CPEB1/BTG4, both of which are regulators of mRNA stability/polyadenylation?

      We apologize if these sentences were unclear. Our intention was to emphasize the central role of ubiquitin ligases in regulating multiple events during meiotic divisions. We used SCF<sup>βTrCP</sup>, a well-studied ubiquitin ligase in Xenopus and mouse oocytes during meiosis, as an example. SCF<sup>βTrCP</sup> regulates the degradation of several substrates, including Emi1, Emi2, CPEB1, and Btg4, whose degradation or stabilization is essential for the proper progression of meiosis. Lastly, we highlighted that these regulatory processes, mediated by protein degradation, may be conserved in mitosis, as exemplified by the destruction of Emi1. We have rewritten this paragraph for clarity (lines 513-518).

      (16) Lane 521-522 and 572-573 -> the authors write that Myt1 was not detected in their proteome. However, in Fig. 6A they list "pkmyt1" as a class II protein. On Xenbase, "pkmyt1" is the Cdk1 kinase, "Myt1" is a transcription factor, so the authors might have been looking for the wrong protein.

      We thank the reviewer for this accurate observation. We have modified the text to correct this error (lines 554 and 607).

      (17) Lane 564-565 -> The authors state that Cdk1 activity can be measured by analyzing Cdc27 S428 phosphorylation. However, in vivo the net phosphorylation of a site is always depending on the relevant kinase and phosphatase activities. As S428 is a Cdk1 site, it is not unlikely that it is dephosphorylated by PP2A-B55, which by itself is under the control of Cdk1. Do the authors have direct evidence that the change in phosphorylation of S428 can only be attributed to the changes in Cdk1 activity?

      There is evidence in the literature that Cdc27 is dephosphorylated by PP2A (Torres et al., 2010). In Xenopus oocytes, PP2A activity is high during prophase (Lemonnier et al., 2021) and decreases at the time of Cdk1 activation, mediated by the Greatwall-ENSA/Arpp19 system, remaining low until MII (Labbé et al., 2021). Therefore, the period where fluctuations in Cdk1 activity are difficult to assess, from NEBD to MII, corresponds to a phase of inhibited PP2A activity. As a result, the phosphorylation level of Cdc27 reflects primarily the activity of Cdk1. We have added this clarification in the text (lines 597-600).

      (18) Fig. 7C and 7D -> in 7C, for Nup35/Nup53 there is a phospho-peptide GIMEVRS(60)PPLHSGG. In Fig. 7D phosphorylation of GVMEMRS(59)PLFSGG is analyzed. Is this the same phosphosite/region of Nup35/Nup53? How can there be a slightly different version of the same peptide in one protein? Are these the L- and S-version of Nup35/Nup53? It is also very surprising that the two phosphosites belong to different classes, class III and class II, respectively.

      We thank the reviewer for this observation. The peptides GIMEVRS(60)PPLHSGG and GVMEMRS(59)PLFSGG correspond to the same phosphorylation site in the L and S versions of Xenopus laevis Nup35, respectively. The L version peptide was classified as Class III, while the S version was not assigned to any class due to its high phosphorylation level in prophase, which prevented it from meeting the log<sub>2</sub> fold-change threshold of 1 required by our analysis to detect significant differences.

      (19) Table 1 -> second last column is headed "Whur, 2014"

      The typo has been corrected.

      (20) Fig. 8 -> Why are all the traces starting at t=1h after PG?

      The labeling of the graphs in Fig. 8 has been corrected, and the traces now begin at t0.

      (21) Lane 754 -> Although a minority, there are also some minus-end directed kinesins, e.g. Kifc1

      We agree with the reviewer. We should have mentioned that, in addition to dyneins, some kinesins are minus-end directed motors, especially since one of them, Kifc1, is regulated at the level of its accumulation. We have rephrased the relevant sentences to incorporate this observation (lines 790-793).

      (22) Section "Assembly of microtubule spindles and microtubule dynamics" -> Although this section clearly has a strong focus on phosphorylation, it might be worth mentioning again that many regulators of the microtubule spindle, e.g. TPX2, are among the upregulated proteins in Fig. 2B/C

      We have already discussed that the protein levels of certain key regulators of the mitotic spindle (Tpx2, PRC1, SSX2IP, Kif11/Eg5 among others) are subject to control during meiotic maturation in a previous chapter “Protein accumulation: the machinery of cell division and DNA replication” (lines 230-239). We agree with the reviewer that this important observation can be mentioned again at the beginning of this chapter on phosphorylation control. We have added a sentence regarding this at the start of the paragraph (lines 774-775).

      Reviewer #2 (Recommendations for the authors):

      While I find the manuscript excellent and detailed already in its current form, I would appreciate including even more comparisons to other systems. In particular, a similar phosphoproteomics experiment has been performed in starfish oocytes undergoing meiosis (Swartz et al, eLife, 2021), and there are several studies on mitosis of diverse mammalian cells. It would be very exciting to see to what extent changes are conserved.

      We thank the reviewer for this recommendation, which we have attempted to follow. We matched our mass spectrometry dataset using the phosphor-occupancy_matlab package, available as part of our code repository (https://github.com/elizabeth-van-itallie) and previously described in (Van Itallie et al, 2025). Unfortunately, we were unable to match our dataset with the data from Swartz et al. (2021) on starfish oocytes due to the low sequence conservation. However, we have compared our dataset with the dataset from Sun et al. (2024) on mouse oocyte maturation. We identified a total of 408 conserved phosphorylation sites, which mapped to 320 proteins in Xenopus and 277 in mice (refer to a new paragraph: lines 824-860, new Figure 9, Methods: lines 1011-1032 and 1060-1065, and Appendix 7). The phosphorylation patterns during meiosis showed a significant cross-species correlation (Pearson r = 0.39, p < 0.0001; see new Figure 9A), demonstrating the evolutionary conservation of phosphoproteomic regulation. Important phosphorylation events, including Plk1 at T201, Gwl at S467, and Erk2 at T188, were upregulated in both species, in line with the activation of the Cdk1 and MAPK signaling cascades (Figure 6B, new Figure 9A-B). We validated several of these phosphorylation sites by western blotting and demonstrated their dependency on Cdk1 activation (new Figure 9C). Together, these findings reinforce the notion that fundamental phospho-regulatory pathways are conserved during oocyte maturation in vertebrates.

      Reviewer #3 (Recommendations for the authors):

      (1) Page 6, the first paragraph of Results section: Please describe the method on how the authors measured and quantified the proteomes in different stages of Xenopus oocyte maturation briefly. Without the experimental design, it is very hard to evaluate the results in the following paragraphs.

      As requested by the reviewer, we added a few sentences describing the method of proteomics and phosphoproteomics measurements in oocytes resuming meiosis (lines 151-158).

      (2) In the phospho-proteome, it is better to classify the amino acids for the phosphorylation such as Ser, Thr, and Tyr. Particularly how many tyrosine phosphorylations are in the list.

      Our phosphosites dataset contains 80% Ser, 19.9% Thr, and 0.01% Tyr. Phospho-Tyr are slightly less abundant than what has been described in the literature (in most cells, “roughly 85-90% of protein phosphorylation happens on Ser, ~10% on Thr, and less than 0.05% on Tyr”, after Sharma et al., 2014). The same observation was made regarding the distribution of phosphorylated amino acids in mouse oocytes, where phospho-Tyr abundance is relatively diminished in oocytes compared to mouse organs (Sun et al., 2024). These observations are now reported in the manuscript (lines 309-313).
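
      As an illustration of the residue breakdown described above, the following minimal sketch tallies phosphosites by acceptor residue. The function name and the site labels in the usage example are hypothetical placeholders, not the authors' code or data.

```python
from collections import Counter

def residue_composition(sites):
    """Return the percentage of Ser, Thr and Tyr among phosphosite labels.

    `sites` is an iterable of labels such as "S467" or "T201", where the
    first character is the phospho-acceptor residue (an assumed format).
    """
    counts = Counter(site[0].upper() for site in sites)
    total = sum(counts[residue] for residue in "STY")
    if total == 0:
        raise ValueError("no Ser/Thr/Tyr sites found")
    return {residue: 100.0 * counts[residue] / total for residue in "STY"}

# Hypothetical toy labels, used only to show the call.
example_sites = ["S10", "S467", "T188", "T201", "Y15"]
print(residue_composition(example_sites))
```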

      (3) In class II (Figure 3), when Cdk1 (line 326) is a major kinase, how many phosphorylation sites are a target of Cdk1 (with the Cdk1-motif)? Moreover, do the authors find any other consensus sequences for the phosphorylation? Those are either known or unknown. This information would be useful for the readers.

      We thank the reviewer for this valuable comment. To address it, we used the kinase prediction server (https://kinase-library.phosphosite.org/kinase-library/score-site) to analyze Class II phosphosites. These new results are mentioned in lines 340-349 and illustrated in a new Figure (Figure 3—figure supplement 1A). We identified 303 sites predicted to be phosphorylated by Cdk1. Of these, 166 were also predicted as Erk1/2 targets, reflecting the similarity between Cdk1 and Erk1/2 consensus motifs.

      Cdk1 substrate phosphorylation is governed by more than just the presence of a consensus sequence. In addition to its preference for the (S/T)P×(K/R) motif, Cdk1/cyclin complexes achieve specificity through docking interactions with short linear motifs (SLiMs) recognized by the cyclin subunit (as LxF motifs)(Loog & Morgan, 2005), and via the Cdk-binding subunits Cks1 or Cks2, which interact with phosphorylated threonine residues in primed substrates (Örd et al, 2019). These mechanisms promote processive multisite phosphorylation and allow Cdk1 to target substrates even at non-canonical sites. Our motif-based analysis captures only part of this complexity and may underestimate the number of true Cdk1 targets.
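
      To make the consensus-motif idea concrete, the sketch below scans a protein sequence for the full (S/T)P×(K/R) motif and for the minimal (S/T)P motif. It is only a plain pattern match under simplifying assumptions, not the scoring method of the kinase prediction server used in the analysis, and the example sequence is invented for illustration.

```python
import re

# Lookaheads are used so that overlapping motif occurrences are all reported.
FULL_CDK_MOTIF = re.compile(r"(?=[ST]P.[KR])")   # (S/T)-P-x-(K/R)
MINIMAL_CDK_MOTIF = re.compile(r"(?=[ST]P)")     # (S/T)-P

def find_motif_sites(sequence, pattern):
    """Return 1-based positions of the Ser/Thr residue for each motif hit."""
    return [match.start() + 1 for match in pattern.finditer(sequence.upper())]

# Invented toy sequence (not a real protein).
toy_sequence = "MASPAKRTPLSPEKTPGG"
print(find_motif_sites(toy_sequence, FULL_CDK_MOTIF))     # full-consensus sites
print(find_motif_sites(toy_sequence, MINIMAL_CDK_MOTIF))  # minimal-motif sites
```

      A match of this kind only flags candidate sites; as noted above, docking motifs and Cks-dependent priming mean that genuine Cdk1 targets can lie outside the consensus.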

      To further explore kinase involvement across phosphosite classes, we extended the analysis to all clusters and identified the most enriched kinase predictions for each (lines 360-365, new Figure 3— figure supplement 1B). In Class II, the most enriched kinases included Cdk1, Erk2, and Plk1, supporting the conclusions derived from the identification of the phosphosites of this Class. But others such as Cdk2, Cdk3, Cdk5, Cdk16, KIS, JNK1, and JNK3 were also identified.

      (4) Figure 3B: Why do the authors show this kind of Table only for Class I, not Classes II-V? It would be informative to show candidate proteins in other classes.

      We chose to present the candidate proteins from Class I in a table format because the number of phosphosites (136) was too small to allow a meaningful Gene Ontology (GO) enrichment analysis. Therefore, we manually curated the data and highlighted proteins whose Class I phosphosites are associated with specific biological processes. For Classes II–V, the higher number of phosphosites allowed us to perform GO enrichment analyses. Since several of the enriched processes were shared across different classes, and some proteins have phosphosites in multiple classes, we opted to organize the results by biological processes rather than by class. We agree with the reviewer that it is indeed valuable to highlight interesting proteins with Class II–V phosphosites. We have done so in Figures 4 through 8, using graphical representations instead of tables, in order to make the data more accessible and avoid long tables. Additionally, the Supplementary Figures provide detailed phosphorylation trends for many of the proteins discussed in the main figures.

      (5) It would be nice if the authors compare this phospho-proteome in Xenopus oocyte maturation with that in mouse oocyte maturation (Sun et al. 2024) in terms of evolutional conservation of the phospho-proteomes.

      We thank the reviewer for this suggestion. As now detailed in the manuscript, we compared our Xenopus phosphoproteome with the dataset from Sun et al. (2024) on mouse oocyte maturation using the phospho_occupancy_matlab package, available as part of our code repository (https://github.com/elizabeth-van-itallie) and previously described in (Van Itallie et al, 2025). We identified 408 conserved phosphorylation sites corresponding to 320 Xenopus and 277 mouse proteins (see new paragraph: lines 824-860, new Figure 9, Methods: lines 1011-1032 and 1060-1065, and Appendix 7). Phosphorylation dynamics across meiosis were significantly correlated between the species (Pearson r = 0.39, p < 0.0001; new Figure 9A), highlighting evolutionary conservation of the phosphoproteomes. Key phosphorylation events such as Plk1 at T201, Gwl at S467, and Erk2 at T188 increased in both species, consistent with activation of the Cdk1 and MAPK pathways (Figure 6B, new Figure 9A–B). We experimentally validated several of these phosphorylation sites by western blot (Erk2, Plk1, Fak1 and Akts1) and demonstrated their dependency on Cdk1 activation (new Figure 9C). Together, these new findings support the conservation of key phospho-regulatory mechanisms across vertebrate oocyte maturation.
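
      The cross-species comparison can be pictured with a minimal sketch: keep only the sites present in both datasets, then compute a Pearson correlation on their changes. This is not the MATLAB package cited above; the function names, the dictionary layout, and the site identifiers and values are all hypothetical placeholders.

```python
import math

def matched_changes(species_a, species_b):
    """Pair up values for site identifiers present in both dictionaries."""
    shared = sorted(species_a.keys() & species_b.keys())
    return [species_a[k] for k in shared], [species_b[k] for k in shared]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Placeholder log fold changes keyed by a conserved-site identifier.
frog = {"site_1": 1.0, "site_2": 2.0, "site_3": 3.0, "site_4": 0.5}
mouse = {"site_1": 2.0, "site_2": 4.0, "site_3": 6.0, "site_5": 1.0}
frog_values, mouse_values = matched_changes(frog, mouse)
print(pearson_r(frog_values, mouse_values))
```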

      Minor points:

      (1) Reference lists: Please add Sun et al (2024) shown in line 115.

      This important reference has been added (lines 115, 134, 313 and 826).

      (2) Figure 1, red arrows for the inhibition: This should be "T" shape for a better understanding of these complicated pathways.

      We agree with the reviewer’s remark, and we have modified Figure 1.

      (3) Line 236-238: The authors referred to the absence of Cdc6 in oocyte maturation in Xenopus. However, Figure 2C shows that Cdc6 belongs to a list of accumulating proteins with Orc1 and Orc2 etc. and the authors did not discuss this discrepancy in the text. Please clarify the claim.

      We apologize for the unclear wording in our text. The section of the manuscript regarding the pre-RC components may have been misleading. The text has been revised to clarify that Cdc6 was not detected in prophase-arrested oocytes by western blot and that it accumulates during meiotic maturation after MI, enabling oocytes to replicate DNA (lines 243-250).

      (4) Line 306: Please add the link to phosphosite.org.

      The link has been added (line 319).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors use the theory of planned behavior to understand whether or not intentions to use sex as a biological variable (SABV), as well as attitude (value), subjective norm (social pressure), and behavioral control (ability to conduct behavior), across scientists at a pharmacological conference. They also used an intervention (workshop) to determine the value of this workshop in changing perceptions and misconceptions. Attempts to understand the knowledge gaps were made.

      Strengths:

      The use of SABV is limited in terms of researchers using sex in the analysis as a variable of interest in the models (and not a variable to control). To understand how we can improve on the number of researchers examining the data with sex in the analyses, it is vital we understand the pressure points that researchers consider in their work. The authors identify likely culprits in their analyses. The authors also test an intervention (workshop) to address the main bias or impediments for researchers' use of sex in their analyses.

      Weaknesses:

      There are a number of assumptions the authors make that could be revisited:

      (1) that all studies should contain across sex analyses or investigations. It is important to acknowledge that part of the impetus for SABV is to gain more scientific knowledge on females. This will require within sex analyses and dedicated research to uncover how unique characteristics for females can influence physiology and health outcomes. This will only be achieved with the use of female-only studies. The overemphasis on investigations of sex influences limits the work done for women's health, for example, as within-sex analyses are equally important.

      The Sex and Gender Equity in Research (SAGER) guidelines (1) provide guidance that “Where the subjects of research comprise organisms capable of differentiation by sex, the research should be designed and conducted in a way that can reveal sex-related differences in the results, even if these were not initially expected.” This is a default position of inclusion wherever sex can be determined, with analysis assessing sex-related variability in the response. This position underpins many funding bodies' new policies on inclusion.

      However, we need to place this in the context of the driver of inclusion. The most common reason for including male and female samples is for those studies that are exploring the effect of a treatment and then the goal of inclusion is to assess the generalisability of the treatment effect (exploratory sex inclusion)(2). The second scenario is where sex is included because sex is one of the variables of interest and this situation will arise because there is a hypothesized sex difference of interest (confirmatory sex inclusion).

      We would argue that the SABV concept was introduced to address the systematic bias of only studying one sex when assessing treatment effect to improve the generalisability of the research. Therefore, it isn’t directly to gain more scientific knowledge on females. However, this strategy will highlight when the effect is very different between male and female subjects which will potentially generate sex specific hypotheses.

      Where research has a hypothesis that is specific to a sex (e.g. it is related to oestrogen levels) it would be appropriate to study only the sex of interest, in this case females. The recently published Sex Inclusive Research Framework gives some guidance here and allows an exemption for such a scenario classifying such proposals “Single sex study justified” (3).

      We plan to add an additional paragraph to the introduction to clarify the objectives behind inclusion and how this assists the research process.

      (2) It should be acknowledged that although the variability within each sex is not different on a number of characteristics (as indicated by meta-analyses in rats and mice), this was not done on all variables, and behavioral variables were not included. In addition, across-sex variability may very well be different, which, in turn, would result in statistical sex significance. In addition, on some measures, there are sex differences in variability, as human males have more variability in grey matter volume than females. PMID: 33044802.

      The manuscript was highlighting the common argument used to exclude the use of females, which is that females are inherently more variable as an absolute truth. We agree there might be situations where the variance is higher in one sex or another depending on the biology. We will extend the discussion here to reflect this, and we will also link to the Sex Inclusive Research Framework (3), which highlights that in these situations researchers can utilise this argument provided it is supported with data for the biology of interest.

      (3) The authors need to acknowledge that it can be important that the sample size is increased when examining more than one sex. If the sample size is too low for biological research, it will not be possible to determine whether or not a difference exists. Using statistical modelling, researchers have found that depending on the effect size, the sample size does need to increase. It is important to bear this in mind as exploratory analyses with small sample size will be extremely limiting and may also discourage further study in this area (or indeed, as seen in the literature, an exploratory first study with the use of males and females with limited sample size, only to show there is no "significance" and to justify this as a reason to only use males for the further studies in the work).

      The reviewer raises a common problem: researchers have frequently argued that if they find no sex differences in a pilot then they can proceed to study only one sex. The SAGER guidelines (1), and now funder guidelines (4, 5), challenge that position. Instead, the expectation is for inclusion as the default in all experiments (exploratory inclusion strategy) to allow generalisable results to be obtained. When the results are very different between the male and female samples, then this can be determined. This perspective shift (2) requires a change in mindset and an understanding that the driver behind inclusion is generalisability, not exploration of sex differences. This will be added to the introduction as an additional paragraph exploring the drivers behind inclusion.

      We agree with the reviewer that if the researcher is interested in sex differences in an effect (confirmatory inclusion strategy, aka sex as a primary variable) then the N will need to be higher. However, in this situation, one, of course, must have male and female samples in the same experiment to allow the simultaneous exploration to assess the dependency on sex.
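
      A rough calculation illustrates why a confirmatory strategy needs a larger N. The sketch below is our own simplification, not an analysis from the manuscript: it assumes a balanced 2x2 design (treatment by sex), equal variances, a two-sided test, and a normal approximation. Under those assumptions, the treatment-by-sex interaction contrast has four times the variance of the pooled treatment effect, so detecting an interaction of the same magnitude needs about four times as many animals per cell. The function name and default values are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_cell(effect, sd=1.0, alpha=0.05, power=0.80, interaction=False):
    """Approximate sample size per cell in a balanced 2x2 design.

    With n animals in each of the four cells, the pooled treatment effect
    has variance sd^2 / n, whereas the treatment-by-sex interaction
    (difference of the two sex-specific treatment effects) has variance
    4 * sd^2 / n.
    """
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance_factor = 4.0 if interaction else 1.0
    return math.ceil(variance_factor * (z_total * sd / effect) ** 2)

# Same standardised effect size, pooled effect versus interaction.
print(n_per_cell(1.0))                    # pooled treatment effect
print(n_per_cell(1.0, interaction=True))  # sex-by-treatment interaction
```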

      Reviewer #2 (Public review):

      Summary:

      The investigators tested a workshop intervention to improve knowledge and decrease misconceptions about sex inclusive research. There were important findings that demonstrate the difficulty in changing opinions and knowledge about the importance of studying both males and females. While interventions can improve knowledge and decrease perceived barriers, the impact was small.

      Strengths:

      The investigators included control groups and replicated the study in a second population of scientists. The results appear to be well substantiated. These are valuable findings that have practical implications for fields where sex is included as a biological variable to improve rigor and reproducibility.

      Thank you for your assessment and for highlighting these strengths. We appreciate your recognition of the value and practical implications of this work.

      Weaknesses:

      I found the figures difficult to understand and would have appreciated more explanation of what is depicted, as well as greater space between the bars representing different categories.

      We plan to review the figures and figure legends to improve clarity of the data.

      Reviewer #3 (Public review):

      Summary:

      This manuscript aims to determine cultural biases and misconceptions in inclusive sex research and evaluate the efficacy of interventions to improve knowledge and shift perceptions to decrease perceived barriers for including both sexes in basic research.

      Overall, this study demonstrates that despite the intention to include both sexes and a general belief in the importance of doing so, relatively few people routinely include both sexes. Further, the perceptions of barriers to doing so are high, including misconceptions surrounding sample size, disaggregation, and variability of females. There was also a substantial number of individuals without the statistical knowledge to appropriately analyze data in studies inclusive of sex. Interventions increased knowledge and decreased perception of barriers.

      Strengths:

      (1) This manuscript provides evidence for the efficacy of interventions for changing attitudes and perceptions of research.

      (2) This manuscript also provides a training manual for expanding this intervention to broader groups of researchers.

      Thank you for highlighting these strengths. We appreciate your recognition that the intervention was effective in changing attitudes and perceptions. We deliberately chose to share the material to provide the resources to allow wider engagement.

      Weaknesses:

      The major weakness here is that the post-workshop assessment is a single time point, soon after the intervention. As this paper shows, intention for these individuals is already high, so does decreasing perception of barriers and increasing knowledge change behavior, and increase the number of studies that include both sexes? Similarly, does the intervention start to shift cultural factors? Do these contribute to a change in behavior?

      Measuring change in behaviour following an intervention is challenging and hence we had implemented an intention score as a proxy for behaviour. We appreciate the benefit of a long-term analysis, but it was beyond the scope of this study and would need a larger dataset size to allow for attrition. We agree that the strategy implemented has weaknesses. We plan to extend the limitation section in the discussion to include these.

      References

      (1) Heidari S, Babor TF, De Castro P, Tort S, Curno M. Sex and Gender Equity in Research: rationale for the SAGER guidelines and recommended use. Res Integr Peer Rev. 2016;1:2.

      (2) Karp NA. Navigating the paradigm shift of sex inclusive preclinical research and lessons learnt. Commun Biol. 2025;8(1):681.

      (3) Karp NA, Berdoy M, Gray K, Hunt L, Jennings M, Kerton A, et al. The Sex Inclusive Research Framework to address sex bias in preclinical research proposals. Nat Commun. 2025;16(1):3763.

      (4) MRC. Sex in experimental design - Guidance on new requirements https://www.ukri.org/councils/mrc/guidance-for-applicants/policies-and-guidance-for-researchers/sex-in-experimental-design/: UK Research and Innovation; 2022

      (5) Clayton JA, Collins FS. Policy: NIH to balance sex in cell and animal studies. Nature. 2014;509(7500):282-3.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a compelling study identifying RBMX2 as a novel host factor upregulated during Mycobacterium bovis infection.

      The study demonstrates that RBMX2 plays a role in:

      (1) Facilitating M. bovis adhesion, invasion, and survival in epithelial cells.

      (2) Disrupting tight junctions and promoting EMT.

      (3) Contributing to inflammatory responses and possibly predisposing infected tissue to lung cancer development.

      By using a combination of CRISPR-Cas9 library screening, multi-omics, coculture models, and bioinformatics, the authors establish a detailed mechanistic link between M. bovis infection and cancer-related EMT through the p65/MMP-9 signaling axis. Identification of RBMX2 as a bridge between TB infection and EMT is novel.

      Strengths:

      This topic and data are both novel and significant, expanding the understanding of transcriptomic diversity beyond RBM2 in M. bovis responsive functions.

      Weaknesses:

      (1) The abstract and introduction sometimes suggest RBMX2 has protective anti-TB functions, yet results show it facilitates pathogen adhesion and survival. The authors need to rephrase claims to avoid contradiction.

      We sincerely appreciate the reviewer's valuable feedback regarding the need to clarify RBMX2's role throughout the manuscript. We have carefully revised the text to ensure consistent messaging about RBMX2's function in promoting M. bovis infection. Below we detail the specific modifications made:

      (1) Introduction Revisions:

      Changed "The objective of this study was to elucidate the correlation between host genes and the susceptibility of M.bovis infection" to "The objective of this study was to identify host factors that promote susceptibility to M.bovis infection"

      Revised "RBMX2 polyclonal and monoclonal cell lines exhibited favorable phenotypes" to "RBMX2 knockout cell lines showed reduced bacterial survival"

      Replaced "The immune regulatory mechanism of RBMX2" with "The role of RBMX2 in facilitating M.bovis immune evasion"

      (2) Results Revisions:

      Modified "RBMX2 fails to affect cell morphology and the ability to proliferate and promotes M.bovis infection" to "RBMX2 does not alter cell viability but significantly enhances M.bovis infection"

      Strengthened conclusion in Figure 4: "RBMX2 actively disrupts tight junctions to facilitate bacterial invasion"

      (3) Discussion Revisions:

      Revised screening description: "We screened host factors affecting M.bovis susceptibility and identified RBMX2 as a key promoter of infection"

      Strengthened concluding statement: "In summary, RBMX2 drives TB pathogenesis by compromising epithelial barriers and inducing EMT"

      These targeted revisions ensure that:

      All sections consistently present RBMX2 as promoting infection; the language aligns with our experimental findings; and potential protective interpretations have been eliminated. We believe these modifications have successfully addressed the reviewer's concern while maintaining the manuscript's original structure and scientific content. We appreciate the opportunity to improve our manuscript and thank the reviewer for this constructive suggestion.

      (2) While p65/MMP-9 is convincingly implicated, the role of MAPK/p38 and JNK is less clearly resolved.

      We sincerely appreciate the reviewer's insightful comment regarding the roles of MAPK/p38 and JNK in our study. Our experimental data clearly demonstrated that RBMX2 knockout significantly reduced phosphorylation levels of p65, p38, and JNK (Fig. 5A), indicating potential involvement of all three pathways in RBMX2-mediated regulation.

      Through systematic functional validation, we obtained several important findings:

      In pathway inhibition experiments, p65 activation (PMA treatment) showed the most dramatic effects on both tight junction disruption (ZO-1, OCLN reduction) and EMT marker regulation (E-cadherin downregulation, N-cadherin upregulation);

      p38 activation (ML141 treatment) exhibited moderate effects on these processes;

      JNK activation (Anisomycin treatment) displayed minimal impact.

      Most conclusively, siRNA-mediated silencing of p65 alone was sufficient to:

      Restore epithelial barrier function

      Reverse EMT marker expression

      Reduce bacterial adhesion and invasion

      These results establish a clear hierarchy in pathway importance: p65 serves as the primary mediator of RBMX2's effects, while p38 plays a secondary role and JNK appears non-essential under our experimental conditions. We have now clarified this relationship in the revised Discussion section to strengthen this conclusion.

      This refined understanding of pathway hierarchy provides important mechanistic insights while maintaining consistency with all our experimental data. We thank the reviewer for this valuable suggestion that helped improve our manuscript.

      (3) Metabolomics results are interesting but not integrated deeply into the main EMT narrative.

      Thank you for this constructive suggestion. In this article, we detected the metabolome of RBMX2 knockout and wild-type cells after Mycobacterium bovis infection, which mainly served as supporting evidence for our EMT model. However, we did not conduct an in-depth discussion of these findings. We have now added a detailed discussion of this section to further support our EMT model.

      ADD: Meanwhile, metabolic pathways enriched after RBMX2 deletion, such as nucleotide metabolism, nucleotide sugar synthesis, and pentose interconversion, primarily support cell proliferation and migration during EMT by providing energy precursors, regulating glycosylation modifications, and maintaining redox balance; cofactor synthesis and amino sugar metabolism participate in EMT regulation through influencing metabolic remodeling and extracellular matrix interactions; chemokine and cGMP-PKG signaling pathways may further mediate inflammatory responses and cytoskeletal rearrangements, collectively promoting the EMT process.

      (4) A key finding and starting point of this study is the upregulation of RBMX2 upon M. bovis infection. However, the authors have only assessed RBMX2 expression at the mRNA level following infection with M. bovis and BCG. To strengthen this conclusion, it is essential to validate RBMX2 expression at the protein level through techniques such as Western blotting or immunofluorescence. This would significantly enhance the credibility and impact of the study's foundational observation.

      Thank you for your comment. We have supplemented the experiments in this part and found that Mycobacterium bovis infection can significantly enhance the expression level of RBMX2 protein.

      (5) The manuscript would benefit from a more in-depth discussion of the relationship between tuberculosis (TB) and lung cancer. While the study provides experimental evidence suggesting a link via EMT induction, integrating current literature on the epidemiological and mechanistic connections between chronic TB infection and lung tumorigenesis would provide important context and reinforce the translational relevance of the findings.

      We sincerely appreciate the valuable comments from the reviewer. We fully agree with your suggestion to further explore the relationship between tuberculosis (TB) and lung cancer. In the revised manuscript, we will add a new paragraph in the Discussion section to systematically integrate the current literature on the epidemiological and mechanistic links between chronic tuberculosis infection and lung cancer development, including the potential bridging roles of chronic inflammation, tissue damage repair, immune microenvironment remodeling, and the epithelial-mesenchymal transition (EMT) pathway. This addition will help more comprehensively interpret the clinical implications of the observed EMT activation in the context of our study, thereby enhancing the biological plausibility and clinical translational value of our findings.

      ADD: There is growing epidemiological evidence suggesting that chronic TB infection represents a potential risk factor for the development of lung cancer. Studies have shown that individuals with a history of TB exhibit a significantly increased risk of lung cancer, particularly in areas of the lung with pre-existing fibrotic scars, indicating that chronic inflammation, tissue repair, and immune microenvironment remodeling may collectively contribute to malignant transformation 74. Moreover, EMT not only endows epithelial cells with mesenchymal features that enhance migratory and invasive capacity but is also associated with the acquisition of cancer stem cell-like properties and therapeutic resistance 75. Therefore, EMT may serve as a crucial molecular link connecting chronic TB infection with the malignant transformation of lung epithelial cells, warranting further investigation in the intersection of infection and tumorigenesis.

      Reviewer #2 (Public review):

      Summary:

      I am not familiar with cancer biology, so my review mainly focuses on the infection part of the manuscript. Wang et al identified an RNA-binding protein RBMX2 that links the Mycobacterium bovis infection to the epithelial-Mesenchymal transition and lung cancer progression. Upon mycobacterium infection, the expression of RBMX2 was moderately increased in multiple bovine and human cell lines, as well as bovine lung and liver tissues. Using global approaches, including RNA-seq and proteomics, the authors identified differential gene expression caused by the RBMX2 knockout during M. bovis infection. Knockout of RBMX2 led to significant upregulations of tight-junction related genes such as CLDN-5, OCLN, ZO-1, whereas M. bovis infection affects the integrity of epithelial cell tight junctions and inflammatory responses. This study establishes that RBMX2 is an important host factor that modulates the infection process of M. bovis.

      Strengths:

      (1) This study tested multiple types of bovine and human cells, including macrophages, epithelial cells, and clinical tissues at multiple timepoints, and firmly confirmed the induced expression of RBMX2 upon M. bovis infection.

      (2) The authors have generated the monoclonal RBMX2 knockout cell lines and comprehensively characterized the RBMX2-dependent gene expression changes using a combination of global omics approaches. The study has validated the impact of RBMX2 knockout on the tight-junction pathway and on the M. bovis infection, establishing RBMX2 as a crucial host factor.

      Weaknesses:

      (1) The RBMX2 was only moderately induced (less than 2-fold) upon M. bovis infection, arguing its contribution may be small. Its value as a therapeutic target is not justified. How RBMX2 was activated by M. bovis infection was unclear.

      Thank you for your valuable and constructive comments. In this study, we primarily utilized the CRISPR whole-genome screening approach to identify key factors involved in bovine tuberculosis infection. Through four rounds of screening using a whole-genome knockout cell line of bovine lung epithelial cells infected with Mycobacterium bovis, we identified RBMX2 as a critical factor.

      Although the transcriptional level change of RBMX2 was less than two-fold, following the suggestion of Reviewer 1, we examined its expression at the protein level, where the change was more pronounced, and we have added these results to the manuscript.

      Regarding the mechanism by which RBMX2 is activated upon M. bovis infection, we previously screened for interacting proteins using a Mycobacterium tuberculosis secreted and membrane protein library, but unfortunately, we did not identify any direct interacting proteins from M. tuberculosis (https://doi.org/10.1093/nar/gkx1173).

      (2) Although multiple time points have been included in the study, most analyses lack temporal resolution. It is difficult to appreciate the impact/consequence of M. bovis infection on the analyzed pathways and processes.

      We appreciate the valuable comments from the reviewers. Although our study included multiple time points post-infection, in our experimental design we focused on different biological processes and phenotypes at distinct time points:

      During the early phase (e.g., 2 hours post-infection), we focused on barrier phenotypes; during the intermediate phase (e.g., 24 hours post-infection), we concentrated on pathway activation and EMT phenotypes; and during the later phase (e.g., 48–72 hours post-infection), we focused on cell death phenotypes, which were validated in a separate Frontiers in Immunology article (https://doi.org/10.3389/fimmu.2024.1431207).

      We also examined the impact of varying infection durations on RBMX2 knockout EBL cell lines via GO analysis. At 0 hpi, genes were primarily related to the pathways of cell junctions, extracellular regions, and cell junction organization. At 24 hpi, genes were mainly associated with pathways of the basement membrane, cell adhesion, integrin binding, and cell migration. By 48 hpi, genes were annotated to epithelial cell differentiation and negative regulation of epithelial cell proliferation. This indicated that RBMX2 can regulate cellular connectivity throughout the stages of M. bovis infection.

      For KEGG analysis, genes linked to the MAPK signaling pathway, chemical carcinogen-DNA adducts, and chemical carcinogen-receptor activation were observed at 0 hpi. At 24 hpi, significant enrichment was found in the ECM-receptor interaction, PI3K-Akt signaling pathway, and focal adhesion. Upon enrichment analysis at 48 hpi, significant enrichment was noted in the TGF-beta signaling pathway, transcriptional misregulation in cancer, microRNAs in cancer, small cell lung cancer, and p53 signaling pathway.

      Reviewer #3 (Public review):

      Summary:

      This study investigates the role of the host protein RBMX2 in regulating the response to Mycobacterium bovis infection and its connection to epithelial-mesenchymal transition (EMT), a key pathway in cancer progression. Using bovine and human cell models, the authors have wisely shown that RBMX2 expression is upregulated following M. bovis infection and promotes bacterial adhesion, invasion, and survival by disrupting epithelial tight junctions via the p65/MMP-9 signaling pathway. They also demonstrate that RBMX2 facilitates EMT and is overexpressed in human lung cancers, suggesting a potential link between chronic infection and tumor progression. The study highlights RBMX2 as a novel host factor that could serve as a therapeutic target for both TB pathogenesis and infection-related cancer risk.

      Strengths:

      The major strengths lie in its multi-omics integration (transcriptomics, proteomics, metabolomics) to map RBMX2's impact on host pathways, combined with rigorous functional assays (knockout/knockdown, adhesion/invasion, barrier tests) that establish causality through the p65/MMP-9 axis. Validation across bovine and human cell models and in clinical tissue samples enhances translational relevance. Finally, identifying RBMX2 as a novel regulator linking mycobacterial infection to EMT and cancer progression opens exciting therapeutic avenues.

      Weaknesses:

      Although it's a solid study, there are a few weaknesses noted below.

      (1) In the transcriptomics analysis, the authors performed GO/KEGG enrichment analysis to explore biological functions. Did they perform the search locally or globally? If the search was performed with a global reference, then I would recommend doing a local search, which would give more relevant results. What is the logic behind highlighting some of the enriched pathways (in red), and how are they relevant to the current study?

      We appreciate the reviewer's thoughtful questions regarding our transcriptomic analysis. In this study, we employed a localized enrichment approach focusing specifically on gene expression profiles from our bovine lung epithelial cell system. This cell-type-specific analysis provides more biologically relevant results than global database searches alone.

      Regarding the highlighted pathways, these represent:

      (1) Temporally significant pathways showing strongest enrichment at each stage:

      • 0h: Cell junction organization (immediate barrier response)

      • 24h: ECM-receptor interaction (early EMT initiation)

      • 48h: TGF-β signaling (chronic remodeling)

      (2) Mechanistically linked to our core findings about RBMX2's role in:

      • Epithelial barrier disruption

      • Mesenchymal transition

      • Chronic infection outcomes

      We selected these particular pathways because they:

      (1) Showed the most statistically significant changes (FDR <0.001)

      (2) Formed a coherent biological narrative across infection stages

      (3) Were independently validated in our functional assays

      This targeted approach allows us to focus on the most infection-relevant pathways while maintaining statistical rigor.
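      The FDR cutoff named in the selection criteria can be illustrated with a minimal sketch. It assumes a Benjamini-Hochberg adjustment, which the response does not specify; the raw p-values and the function names are hypothetical and are not taken from the manuscript.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_pathways(p_by_pathway, fdr_cutoff=0.001):
    """Keep pathways whose adjusted p-value falls below the cutoff."""
    names = list(p_by_pathway)
    fdr = benjamini_hochberg([p_by_pathway[n] for n in names])
    return {n: q for n, q in zip(names, fdr) if q < fdr_cutoff}


# Hypothetical raw p-values, for illustration only.
example = {
    "Cell junction organization": 1e-6,
    "ECM-receptor interaction": 2e-5,
    "TGF-beta signaling": 1e-4,
    "Hypothetical unrelated term": 0.2,
}
print(sorted(select_pathways(example)))
```

      The adjustment depends on how many terms are tested together, so the same raw p-value can pass or fail the cutoff depending on the size of the tested set.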

      (2) While the authors show that RBMX2 expression correlates with EMT-related gene expression and barrier dysfunction, the evidence for direct association remains limited in this study. How does RBMX2 activate p65? Does it bind directly to p65 or modulate any upstream kinases? Could ChIP-seq or CLIP-seq provide further evidence for direct RNA or DNA targets of RBMX2 that drive EMT or NF-κB signaling?

      We sincerely appreciate the reviewer's in-depth questions regarding the mechanisms by which RBMX2 activates p65 and its association with EMT. Although the molecular mechanism remains to be fully elucidated, our study has provided experimental evidence supporting a direct regulatory relationship between RBMX2 and the p65 subunit of the NF-κB pathway. Specifically, we investigated whether the transcription factor p65 could directly bind to the promoter region of RBMX2 using ChIP experiments. The results demonstrated that p65 can physically bind to the RBMX2 promoter region.

      Furthermore, dual-luciferase reporter assays were conducted, showing that p65 significantly enhances the transcriptional activity of the RBMX2 promoter, indicating a direct regulatory effect of p65 on RBMX2 expression.

      These findings support our hypothesis that RBMX2 activates the NF-κB signaling pathway through direct interaction with the p65 protein, thereby participating in the regulation of EMT progression and barrier function.

      In our subsequent work, we will also employ experiments such as CLIP to further investigate the specific mechanisms through which RBMX2 exerts its regulatory functions.

      (3) The manuscript suggests that RBMX2 enhances adhesion/invasion of several bacterial species (e.g., E. coli, Salmonella), not just M. bovis. This raises questions about the specificity of RBMX2's role in Mycobacterium-specific pathogenesis. Is RBMX2 a general epithelial barrier regulator or does it exhibit preferential effects in mycobacterial infection contexts? How does this generality affect its potential as a TB-specific therapeutic target?

      Thank you for your valuable comments. When we initially designed this experiment, we were interested in whether the RBMX2 knockout cell line could confer effective resistance not only against Mycobacterium bovis but also against Gram-negative and Gram-positive bacteria. Surprisingly, we indeed observed resistance to the invasion of these pathogens, albeit weaker compared to that against Mycobacterium bovis.

      Nevertheless, we believe these findings merit publication in eLife. Moreover, RBMX2 knockout does not affect the phenotype of epithelial barrier disruption under normal conditions; its significant regulatory effect on barrier function is only evident upon infection with Mycobacterium bovis.

      Importantly, during our genome-wide knockout library screening, RBMX2 was not identified in the screening models for Salmonella or Escherichia coli, but was consistently detected across multiple rounds of screening in the Mycobacterium bovis model.

      (4) The quality of the figures is very poor. High-resolution images should be provided.

      Thank you for your feedback; we have provided higher-resolution images.

      (5) The methods are not very descriptive, particularly the omics section.

      Thank you for your comments; we have revised the description of the sequencing section.

      (6) The manuscript is too dense, with extensive multi-omics data (transcriptomics, proteomics, metabolomics) but relatively little mechanistic integration. The authors should have focused on the key mechanistic pathways in the figures. Improving the narratives in the Results and Discussion section could help readers follow the logic of the experimental design and conclusions.

      Thank you for your valuable comments. We have streamlined the figures and revised the description of the results section accordingly.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this interesting and original paper, the authors examine the effect that heat stress can have on the ability of bacterial cells to evade infection by lytic bacteriophages. Briefly, the authors show that heat stress increases the tolerance of Klebsiella pneumoniae to infection by the lytic phage Kp11. They also argue that this increased tolerance facilitates the evolution of genetically encoded resistance to the phage. In addition, they show that heat can reduce the efficacy of phage therapy. Moreover, they define a likely mechanistic reason for both tolerance and genetically encoded resistance. Both lead to a reorganization of the bacterial cell envelope, which reduces the likelihood that phage can successfully inject their DNA.

      Strengths:

      I found large parts of this paper well-written and clearly presented. I also found many of the experiments simple yet compelling. For example, the experiments described in Figure 3 clearly show that prior heat exposure can affect the efficacy of phage therapy. In addition, the experiments shown in Figures 4 and 6 clearly demonstrate the likely mechanistic cause of this effect. The conceptual Figure 7 is clear and illustrates the main ideas well. I think this paper would work even without its central claim, namely that tolerance facilitates the evolution of resistance. The reason is that the effect of environmental stressors on stress tolerance has to my knowledge so far only been shown for drug tolerance, not for tolerance to an antagonistic species.

      Weaknesses:

      I did not detect any weaknesses that would require a major reorganization of the paper, or that may require crucial new experiments. However, the paper needs some work in clarifying specific and central conclusions that the authors draw. More specifically, it needs to improve the connection between what is shown in some figures, how these figures are described in the caption, and how they are discussed in the main text. This is especially glaring with respect to the central claim of the paper from the title, namely that tolerance facilitates the evolution of resistance. I am sympathetic to that claim, especially because this has been shown elsewhere, not for phage resistance but for antibiotic resistance. However, in the description of the results, this is perhaps the weakest aspect of the paper, so I'm a bit mystified as to why the authors focus on this claim. As I mentioned above, the paper could stand on its own even without this claim.

      Thank you for your feedback. We understand your concern regarding the central claim that tolerance facilitates the evolution of resistance. Although the paper can stand on its own without this claim, we think it adds an important layer to the interpretation of our findings. In light of your comments, we plan to revise the title to “Heat Stress Induces Phage Tolerance in Bacteria”.

      More specific examples where clarification is needed:

      (1) A key figure of the paper seems to be Figure 2D, yet it was one of the most confusing figures. This results from a mismatch between the accompanying text starting on line 92 and the figure itself. The first thing that the reader notices in the figure itself is the huge discrepancy between the number of viable colonies in the absence of phage infection at the two-hour time point. Yet this observation is not even mentioned in the main text. The exclusive focus of the main text seems to be on the right-hand side of the figure, labeled "+Phage". It is from this right-hand panel that the authors seem to conclude that heat stress facilitates the evolution of resistance. I find this confusing, because there is no difference between the heat-treated and non-treated cells in survivorship, and it is not clear from this data that survivorship is caused by resistance, not by tolerance/persistence. (The difference between tolerance and resistance has only been shown in the independent experiments of Figure 1B.)

      Thank you for your helpful comment. Figure 2d presents colony counts from a plating assay following the phage killing experiment in Figure 2c. Bacteria collected after 0 and 2 hours of phage exposure were plated on both phage-free (−phage) and phage-containing (+phage) plates. The “−phage” condition reflects total survivors, while the “+phage” condition indicates the resistant subset.

      As seen in Figure 2d (left), heat-treated bacteria showed markedly higher survival on phage-free plates than untreated cells, which were largely eliminated by phage. However, resistant colony counts on phage-containing plates were similar between the two groups (Figure 2d, right), suggesting that heat stress increased survival but did not promote resistance.

      To clarify, we have revised the labels in Figure 2d as follows: “Total” now replaces “-phage” to indicate the total survivors from the phage killing assay, and “Resisters” now replaces “+phage” to indicate the resistant survivors, which are detected on phage-containing plates. This adjustment should eliminate any confusion and better reflect the experimental design.
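      The relationship between the “Total” and “Resisters” counts can be sketched as simple arithmetic. The colony counts, dilution factors, plated volume, and function names below are hypothetical illustrations, not data or methods from the manuscript.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Convert a plate count into CFU/mL of the undiluted sample."""
    return colonies * dilution_factor / plated_volume_ml


def resistant_fraction(total_cfu_per_ml, resister_cfu_per_ml):
    """Share of survivors that also grow on phage-containing plates."""
    if total_cfu_per_ml <= 0:
        raise ValueError("total CFU/mL must be positive")
    return resister_cfu_per_ml / total_cfu_per_ml


# Hypothetical counts: similar numbers of resisters, very different totals.
heat_total = cfu_per_ml(colonies=50, dilution_factor=1e4, plated_volume_ml=0.1)
heat_resisters = cfu_per_ml(colonies=20, dilution_factor=1e1, plated_volume_ml=0.1)
untreated_total = cfu_per_ml(colonies=25, dilution_factor=1e1, plated_volume_ml=0.1)
untreated_resisters = cfu_per_ml(colonies=20, dilution_factor=1e1, plated_volume_ml=0.1)

print(resistant_fraction(heat_total, heat_resisters))
print(resistant_fraction(untreated_total, untreated_resisters))
```

      With numbers of this shape, a small resistant fraction alongside a large total is what a mostly tolerant (rather than resistant) surviving population would look like.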

      Figure 2F supports the resistance claim, but it is not one of the strongest experiments of the paper, because the authors used only "turbidity" as an indicator of resistance. In addition, the authors performed the experiments described therein at small population sizes to avoid the presence of resistance mutations. But how do we know that the turbidity they describe does not result from persisters?

      I see three possibilities to address these issues. First, perhaps this is all a matter of explaining and motivating this particular experiment better. Second, the central claim of the paper may require additional experiments. For example, is it possible to block heat induced tolerance through specific mutations, and show that phage resistance does not evolve as rapidly if tolerance is blocked? A third possibility is to tone down the claim of the paper and make it about heat tolerance rather than the evolution of heat resistance.

      Thank you for your thoughtful comment. We appreciate the opportunity to clarify the interpretation of Figure 2f and the rationale behind the experimental design. We agree that turbidity alone cannot fully distinguish resistance from persistence. However, our earlier experiments (Figures 2d and 2e) demonstrated that heat-treated survivors remained largely susceptible to phage, indicating that heat stress does not directly induce resistance. This led us to hypothesize that heat enhances phage tolerance, which in turn increases the likelihood of resistance emergence during subsequent infection.

      To test this, we used a low initial bacterial population (~10³ CFU per well) to minimize the chance of pre-existing resistance. Bacteria were exposed to phages at MOIs of 1, 10, and 100 and incubated for 24 hours in 100 µL volumes. This setup ensured:

      (1) The low initial population minimizes the presence of pre-existing resistant mutants, ensuring that any phage-resistant bacteria observed arise during the infection process.

      (2) The high MOI (≥ 1) ensures that each bacterial cell has a high probability of infection by at least one phage.

      (3) The small volume (100 µL per well) maximizes the interaction between bacteria and phages, ensuring rapid infection of susceptible bacteria, which leads to clear wells. If resistant mutants arise, they will grow and cause turbidity.

      Thus, the turbidity observed in heat-treated samples reflects de novo emergence and outgrowth of resistant mutants from a tolerant population. This assay supports the idea that heat-induced tolerance increases the probability of resistance evolution, rather than directly causing resistance.
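      The quantities behind this design can be sketched numerically. The sketch assumes that phage adsorption follows a Poisson distribution in the first round of infection, which is a common simplification and not something stated in the response; the mutant frequency and the function names are hypothetical.

```python
import math


def phages_per_well(cells_per_well, moi):
    """Number of phage particles needed to reach a given MOI."""
    return cells_per_well * moi


def fraction_uninfected(moi):
    """Poisson probability that a cell adsorbs zero phages in the first round."""
    return math.exp(-moi)


def expected_preexisting_mutants(cells_per_well, mutant_frequency):
    """Expected number of resistant mutants already present at inoculation."""
    return cells_per_well * mutant_frequency


CELLS_PER_WELL = 1e3      # ~10^3 CFU per well, as stated above
MUTANT_FREQUENCY = 1e-6   # hypothetical value, for illustration only

for moi in (1, 10, 100):
    print(
        f"MOI {moi}: {phages_per_well(CELLS_PER_WELL, moi):.0e} PFU per well, "
        f"first-round uninfected fraction {fraction_uninfected(moi):.2e}"
    )
print(expected_preexisting_mutants(CELLS_PER_WELL, MUTANT_FREQUENCY))
```

      Under this assumption, the expected number of pre-existing mutants per well scales linearly with the inoculum, which is the rationale for the small starting population. The sketch covers only the first round of adsorption and does not model phage replication over the 24-hour incubation.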

      We have revised the text to better explain this experimental logic and adjusted the framing of our conclusions accordingly.

      A minor but general point here is that in Figure 2D and in other figures, the labels "-phage" and "+phage" do not facilitate understanding, because they suggest that cells in the "-phage" treatment have not been exposed to phage at all, but that is not the case. They have survived previous phage treatment and are then replated on media lacking phage.

      Thank you for your valuable comment. To clarify, we have revised the labels in Figure 2d as follows: “Total” now replaces “-phage” to indicate the total survivors from the phage killing assay, and “Resisters” now replaces “+phage” to indicate the resistant survivors, which are detected on phage-containing plates.

      (2) Another figure with a mismatch between text and visual materials is Figure 5, specifically Figures 5B-F. The figure is about two different mutants, and it is not even mentioned in the text how these mutants were identified, for example in different or the same replicate populations. What is more, the two mutants are not discussed at all in the main text. That is, the text, starting on line 221 discusses these experiments as if there was only one mutant. This is especially striking as the two mutants behave very differently, as, for example, in Figure 5C. Implicitly, the text talks about the mutant ending in "...C2", and not the one ending in "...C1". To add to the confusion, the text states that the (C2) mutant shows a change in the pspA gene, but in Figure 5f, it is the other (undiscussed) mutant that has a mutation in this gene. Only pspA is discussed further, so what about the other mutants? More generally, it is hard to believe that these were the only mutants that occurred in the genome during experimental evolution. It would be useful to give the reader a 2-3 sentence summary of the genetic diversity that experimental evolution generated.

      Thank you for your thoughtful comment. In our heat treatment evolutionary experiment, we isolated six distinct bacterial clones, of which two are highlighted in the manuscript as representative examples. One clone, BC2G11C1, acquired both heat tolerance and phage resistance, while another clone, BC3G11C2, became heat-tolerant but did not develop resistance to phage infection. This variation highlights the inherent diversity in evolutionary responses when exposed to selective pressures. It demonstrates that not all evolutionary pathways lead to the same outcome, even under similar stress conditions. This variability is a key observation in our study, illustrating that different genetic adaptations may arise depending on the specific mutations or genetic context, and not every strain will evolve phage resistance in parallel with heat tolerance. We have updated the manuscript to better reflect this diversity in the evolutionary trajectories observed.

      Reviewer #2 (Public review):

      Summary:

      An initial screen of K. pneumoniae pretreated with different stresses identified heat stress as a protective factor against infection by the lytic phage Kp11. Experiments then show that this protection is mediated not by an increase in phage-resistant bacteria but by an increase in a transiently phage-tolerant population, which the authors term bacteriophage persistence by analogy with antibiotic persistence. They then show that phage persistence mediated by heat shock enhances the evolution of bacterial resistance to the phage. The same trait was observed with other lytic phages and their combinations, with two clinical strains, and with E. coli and two T phages; hence, the phenomenon may be widespread in enterobacteria.

      Next, the authors elucidate the mechanism of heat-induced phage persistence, determining that phage adsorption was not affected but that phage DNA internalization was impaired by the heat pretreatment, likely due to alterations in the bacterial envelope, including the downregulation of envelope proteins and of LPS; furthermore, heat-treated bacteria were less sensitive to polymyxins due to the decrease in LPS.

      Finally, cyclic exposure to heat stress allowed the isolation of a mutant that was resistant to heat treatment, polymyxins, and lytic phage. That mutant had alterations in the PspA protein that conferred a gain of function and promoted the reduction of capsule production and loss of its structure. Nevertheless, this mutant was severely impaired in immune evasion, as it was easily cleared from mouse blood, evidencing the tradeoffs between phage/heat and antibiotic resistance and the ability to counteract the immune response.

      Strengths:

      The experimental design and the sequence in which the experiments are presented are ideal for understanding the study, and the conclusions are supported by the findings. The discussion also points out the relevance of the work, particularly for the effectiveness of phage therapy, and allows the design of strategies to improve that effectiveness.

      Weaknesses:

      In its present form, it lacks the incorporation of some relevant previous work that explored the role of heat stress in phage susceptibility, antibiotic susceptibility, tradeoffs between phage resistance and resistance against other kinds of stress, virulence, etc., and the fact that exposure to lytic phages induces antibiotic persistence.

      Thank you for your insightful comments. We appreciate your suggestion regarding the inclusion of relevant previous work. We have now incorporated additional citations to discuss these points, including studies on the relationship between heat stress and antibiotic resistance, as well as the tradeoffs between phage resistance and other stress factors.

      Reviewer #3 (Public review):

      PspA, a key regulator in the phage shock protein system, functions as part of the envelope stress response system in bacteria, preventing membrane depolarization and ensuring envelope stability. This protein has been associated with the quorum-sensing network and biofilm formation. (Moscoso M., Garcia E., Lopez R. 2006. Biofilm formation by Streptococcus pneumoniae: role of choline, extracellular DNA, and capsular polysaccharide in microbial accretion. J. Bacteriol. 188:7785-7795; Vidal JE, Ludewick HP, Kunkel RM, Zähner D, Klugman KP. The LuxS-dependent quorum-sensing system regulates early biofilm formation by Streptococcus pneumoniae strain D39. Infect Immun. 2011 Oct;79(10):4050-60.)

      It is interesting and very well-developed.

      (1) Could the authors develop experiments about the relationship between Quorum Sensing and this protein?

      (2) It would be interesting to analyze the link to phage infection and heat stress in relation to Quorum. The authors could study QS regulators or AI2 molecules.

      Thank you for your insightful comments and for bringing up the role of PspA in quorum sensing and biofilm formation. However, we would like to clarify a potential misunderstanding: the PspA discussed in our manuscript refers to phage-shock protein A, a key regulator in the bacterial envelope stress response system. This is distinct from the pneumococcal surface protein A, which has been associated with quorum sensing and biofilm formation in Streptococcus pneumoniae (as referenced in your comment).

      To avoid any confusion for readers, we will ensure that our manuscript explicitly states “phage-shock protein A (PspA)” at its first mention. We appreciate your feedback and hope this clarification addresses your concern.

      (3) Include the proteins or genes in a table or figure from lytic phage Kp11 (GenBank: ON148528.1).

      Thank you for your helpful suggestion. We have now included a figure summarizing the proteins of the lytic phage Kp11 (GenBank: ON148528.1) as supplementary Figure S1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Issues unrelated to those discussed in the public review

      (1) Figure 4a and its caption describe an evolution experiment, but they do not mention how many cycles of high-temperature treatment and growth this experiment lasted. I assume it lasted for more than one cycle, because the methods section mentions "cycles", but the number is not provided.

      Thank you for pointing this out. The evolutionary experiment shown in Figure 5a involved 11 cycles of high-temperature treatment and growth. We have now explicitly stated this in the figure legend and defined the clone labels there: BC, batch culture; G, evolution cycle number; C, colony. For example, BC2G11C1 refers to the first colony from batch culture 2 after 11 rounds of heat treatment.
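      The clone-naming scheme described here can be decoded mechanically. The following is a minimal sketch, assuming every label follows the BC/G/C pattern with integer fields; the parser and its names are hypothetical and not part of the manuscript.

```python
import re
from typing import NamedTuple


class CloneLabel(NamedTuple):
    batch_culture: int  # BC: batch culture number
    cycle: int          # G: evolution cycle number
    colony: int         # C: colony number


_LABEL = re.compile(r"BC(\d+)G(\d+)C(\d+)")


def parse_clone_label(label: str) -> CloneLabel:
    """Split a label such as 'BC2G11C1' into its three numeric fields."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a BC/G/C clone label: {label!r}")
    return CloneLabel(*(int(group) for group in match.groups()))


print(parse_clone_label("BC2G11C1"))
print(parse_clone_label("BC3G11C2"))
```

      Read this way, the two clones discussed in the response come from different batch cultures (2 and 3) at the same evolution cycle (11).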

      (2) It is not clear what Figure 5F is supposed to show. What are the gray boxes? The caption claims that the figure shows non-synonymous mutations, but the only information it contains is about genes that seem to be affected by mutation. Judging from the mismatch between the main text and the figure, the mutants with these mutations may actually be mislabeled.

      Thank you for your careful review. Figure 5f highlights the non-synonymous mutations identified in the evolved strains. The gray boxes represent the ancestral strain’s whole genome without mutations, serving as a control. The corresponding labels indicate the specific mutations found in each evolved strain. We have clarified this in the figure caption to improve clarity. Additionally, we have carefully reviewed the labeling to ensure accuracy and consistency between the figure, main text, and sequencing data.

      (3) I think that the acronym NC, which is used in just about every figure, is explained nowhere in the paper. Spell out all acronyms at first use.

      Thank you for pointing this out. We have ensured that NC is clearly defined at its first mention in the text and figure legends. Additionally, we have reviewed the manuscript to ensure that all acronyms are properly introduced when first used.

      (4) The same holds for the acronym N.D. This is an especially important oversight because N.D. could mean "not determined" or "not detectable", which would lead to very different interpretations of the same figure.

      Thank you for your careful review. We have clarified the meaning of N.D., which stands for non-detectable, at its first use to avoid ambiguity and ensure accurate interpretation in the figure legend. Additionally, we have reviewed the manuscript to ensure that all acronyms are clearly defined.

      (5) The panel labels (a,b, etc.) in all figure captions are very difficult to distinguish from the rest of the text, and should be better highlighted, for example by using a bold font. However, this is a matter of journal style and will probably be fixed during typesetting.

      Thank you for your suggestion. We have adjusted the figure captions to better distinguish panel labels, for example by using bold font, to improve readability. Final formatting will follow the journal’s style during typesetting.

      (6) Line 224: enhanced insusceptibility -> reduced susceptibility.

      Thank you for your suggestion. We have revised “enhanced insusceptibility” to “reduced susceptibility” for clarity and precision.

      (7) Line 259: mice -> mouse.

      Thank you for catching this. We have corrected “mice” to “mouse”.

      Reviewer #2 (Recommendations for the authors):

      I have no concerns about the experimental design and conclusions of your work; however, I strongly recommend incorporating several relevant pieces of the literature related to your work, in the discussion of your manuscript, specifically:

      (1) Previous studies about the role of heat stress in phage infections, see:

      Greenrod STE, Cazares D, Johnson S, Hector TE, Stevens EJ, MacLean RC, King KC. Warming alters life-history traits and competition in a phage community. Appl Environ Microbiol. 2024 May 21;90(5):e0028624. doi: 10.1128/aem.00286-24. Epub 2024 Apr 16. PMID: 38624196; PMCID: PMC11107170.

      Thank you for your thoughtful comment. We have incorporated the study by Greenrod et al. (2024) into the discussion to enrich the context of our findings. As this article pointed out, a temperature of 42°C can indeed limit phage infection in bacteria, acting as a barrier from the phage’s perspective. Our study builds on this by demonstrating that bacteria pre-treated with high temperatures exhibit tolerance to phage infection. These findings, together with the work you referenced, underscore the importance of heat stress or elevated temperature in host-phage interactions, with 42°C being particularly relevant in the context of fever. We will make sure to clarify this connection in our revised manuscript.

      (2) The effect of heat stress and the tolerance/resistance against other antibiotics besides polymyxins, see:

      Lv B, Huang X, Lijia C, Ma Y, Bian M, Li Z, Duan J, Zhou F, Yang B, Qie X, Song Y, Wood TK, Fu X. Heat shock potentiates aminoglycosides against gram-negative bacteria by enhancing antibiotic uptake, protein aggregation, and ROS. Proc Natl Acad Sci U S A. 2023 Mar 21;120(12):e2217254120. doi: 10.1073/pnas.2217254120. Epub 2023 Mar 14. PMID: 36917671; PMCID: PMC10041086.

      Thank you for bringing this study to our attention. We have incorporated the findings from Lv et al. (2023) into the discussion of our manuscript, highlighting how sublethal temperatures may facilitate the killing of bacteria by antibiotics like kanamycin. This is consistent with our data showing enhanced susceptibility of heat-shocked bacteria to kanamycin. The study also provides insights into the potential role of PMF, which is relevant to our work on PspA, and strengthens the broader context of heat stress influencing both antibiotic resistance and tolerance.

      (3) Perhaps the most relevant overlooked fact was that recently it was demonstrated for E. coli, Klebsiella and Pseudomonas that pretreatment with lytic phages induced antibiotic persistence! Please discuss this finding and its implications for your work, see:

      Fernández-García L, Kirigo J, Huelgas-Méndez D, Benedik MJ, Tomás M, García-Contreras R, Wood TK. Phages produce persisters. Microb Biotechnol. 2024 Aug;17(8):e14543. doi: 10.1111/1751-7915.14543. PMID: 39096350; PMCID: PMC11297538.

      Sanchez-Torres V, Kirigo J, Wood TK. Implications of lytic phage infections inducing persistence. Curr Opin Microbiol. 2024 Jun;79:102482. doi: 10.1016/j.mib.2024.102482. Epub 2024 May 6. PMID: 38714140.

      Thank you for suggesting this important reference. We agree that the phenomenon of phage-induced bacterial persistence is highly relevant to our study. While our manuscript focuses on the role of heat stress in bacterial tolerance and resistance, we acknowledge that bacterial persistence against phages is an established concept. We have incorporated this finding into our discussion, emphasizing how persistence and tolerance can overlap in their effects on bacterial survival, especially under stress conditions like heat treatment. This will provide a more comprehensive understanding of how phage interactions with bacteria can lead to both persistence and resistance.

      (4) Finally, you observed a trade-off in the pspA* mutant: increased phage/heat/polymyxin resistance but decreased immune evasion (perhaps because it is unable to counteract phagocytosis). Such trade-offs between gaining phage resistance and losing resistance to the immune system, virulence, and resistance to some antibiotics have been extensively documented; see:

      Majkowska-Skrobek G, Markwitz P, Sosnowska E, Lood C, Lavigne R, Drulis-Kawa Z. The evolutionary trade-offs in phage-resistant Klebsiella pneumoniae entail cross-phage sensitization and loss of multidrug resistance. Environ Microbiol. 2021 Dec;23(12):7723-7740. doi: 10.1111/1462-2920.15476. Epub 2021 Mar 27. PMID: 33754440.

      Gordillo Altamirano F, Forsyth JH, Patwa R, Kostoulias X, Trim M, Subedi D, Archer SK, Morris FC, Oliveira C, Kielty L, Korneev D, O'Bryan MK, Lithgow TJ, Peleg AY, Barr JJ. Bacteriophage-resistant Acinetobacter baumannii are resensitized to antimicrobials. Nat Microbiol. 2021 Feb;6(2):157-161. doi: 10.1038/s41564-020-00830-7. Epub 2021 Jan 11. PMID: 33432151.

      García-Cruz JC, Rebollar-Juarez X, Limones-Martinez A, Santos-Lopez CS, Toya S, Maeda T, Ceapă CD, Blasco L, Tomás M, Díaz-Velásquez CE, Vaca-Paniagua F, Díaz-Guerrero M, Cazares D, Cazares A, Hernández-Durán M, López-Jácome LE, Franco-Cendejas R, Husain FM, Khan A, Arshad M, Morales-Espinosa R, Fernández-Presas AM, Cadet F, Wood TK, García-Contreras R. Resistance against two lytic phage variants attenuates virulence and antibiotic resistance in Pseudomonas aeruginosa. Front Cell Infect Microbiol. 2024 Jan 17;13:1280265. doi: 10.3389/fcimb.2023.1280265. Erratum in: Front Cell Infect Microbiol. 2024 Mar 06;14:1391783. doi: 10.3389/fcimb.2024.1391783. PMID: 38298921; PMCID: PMC10828002.

      Thank you for highlighting these important studies. We have incorporated the work by Majkowska-Skrobek et al. (2021), Gordillo Altamirano et al. (2021), and García-Cruz et al. (2024) into the discussion to provide further context to the evolutionary trade-offs observed in our study. The findings in these studies, which describe the cross-sensitization to antimicrobials and the loss of multidrug resistance in phage-resistant bacteria, align with our observations of trade-offs in the pspA mutant. Specifically, our results show that while the pspA mutant exhibits increased resistance to phage, heat, and polymyxins, it also experiences a decrease in immune evasion and potential virulence. These trade-offs are significant in understanding the broader consequences of developing resistance to phages and other stressors.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Structural colors (SC) are based on nanostructures reflecting and scattering light and producing optical wave interference. All kinds of living organisms exhibit SC. However, understanding the molecular mechanisms and genes involved may be complicated due to the complexity of these organisms. Hence, bacteria that exhibit SC in colonies, such as Flavobacterium IR1, can be good models.

      Based on previous genomic mining and co-occurrence with SC in flavobacterial strains, this article focuses on the role of a specific gene, moeA, in SC of Flavobacterium IR1 strain colonies on an agar plate. moeA is involved in the synthesis of the molybdenum cofactor, which is necessary for the activity of key metabolic enzymes in diverse pathways.

      The authors clearly showed that the absence of moeA shifts SC properties in a way that depends on the nutritional conditions. They further bring evidence that this effect was related to several properties of the colony, all impacted by the moeA mutant: cell-cell organization, cell motility and colony spreading, and metabolism of complex carbohydrates. Hence, by linking SC to a single gene in appearance, this work points to cellular organization (as a result of cell-cell arrangement and motility) and metabolism of polysaccharides as key factors for SC in a gliding bacterium. This may prove useful for designing molecular strategies to control SC in bacterial-based biomaterials.

      Strengths:

      The topic is very interesting from a fundamental viewpoint and has great potential in the field of biomaterials.

      Thank you for this.

      The article is easy to read. It builds on previous studies with already established tools to characterize SC at the level of the flavobacterial colony. Experiments are well described and well executed. In addition, the SIBR-Cas method for chromosome engineering in Flavobacteria is the most recent and is a leap forward for future studies in this model, even beyond SC.

      We appreciate these comments.

      Weaknesses:

      The paper appears a bit too descriptive and could be better organized. Some of the results, in particular the proteomic comparison, are not well exploited (not explored experimentally). In my opinion, the problem originates from the difficulty in explaining the link between the absence of moeA and the alterations observed at the level of colony spreading and polysaccharide utilization, and the variation in proteomic content.

      We have looked at the organisation of the manuscript carefully in this revision, as suggested. In terms of the proteomics, a large number of proteins are affected by the moeA deletion and not all could be followed up. We chose spreading, structural colour formation and starch degradation to follow up phenotypically, as the most likely to be relevant. For example, (L615-617) we discuss the downregulation of GldL (which is known to be involved in flavobacterial gliding motility [Shrivastava et al., 2013]) in the moeA KO as a possible explanation for the reduced colony spreading of this mutant. Changes in polysaccharide (starch) utilization were seen on solid medium, as well as in the proteomic profile, where we observed the upregulation of carbohydrate metabolism proteins linked to PUL (polysaccharide utilisation locus) operons (Terrapon et al., 2015), such as PAM95095-90 (Figure 8), and other carbohydrate metabolism-related proteins, including a pectate lyase (Table S7) which is involved in starch degradation (Aspeborg et al., 2012). As noted in L555-566 and Figure 9, alterations in starch metabolism were investigated experimentally.

      First, the effect of moeA deletion on molybdenum cofactor synthesis should be addressed.

      MoeA is the last enzyme in the MoCo synthesis pathway; thus, if only MoeA is absent, the cell would accumulate MPT-AMP (molybdopterin-adenosine monophosphate) (Iobbi-Nivol & Leimkühler, 2013), and the expressed molybdoenzymes would not be functional. In L582-585, we commented on how the lack of molybdenum cofactor may affect the synthesis of molybdoenzymes. If you meant an analysis of the small molecules themselves, i.e. the cofactors involved in these pathways, that was an assay we were not able to perform. However, in L585-587, we addressed how the deletion of moeA affected the proteins encoded by the rest of the genes in the operon, which is relevant to the question.

      Second, as I was reading the entire manuscript, I kept asking myself if moeA (and by extension molybdenum cofactor) was really involved in SC or it was an indirect effect. For example, what if the absence of moeA alters the cell envelope because the synthesis of its building blocks is perturbed, then subsequently perturbates all related processes, including gliding motility and protein secretion? It would help to know if the effects on colony spreading and polysaccharide metabolism can be uncoupled. I don't think the authors discussed that clearly.

      The message of the paper is that the moeA gene, as predicted from a previous genomics analysis, is important in SC. This is based on the representation of the moeA gene in genomes of bacteria that display SC. This analysis does not predict the mechanism. When knocked out, a significant change in structural colour occurred, supporting this hypothesis. Whether this effect is direct or indirect is difficult to assess, as this referee rightly suggests. In order to follow up this central result, we performed proteomics (both intra- and extracellular). As we observed, the deletion of a single gene generated many changes in the proteomic profile, thus in the biological processes. Based on the known functions of molybdenum cofactor, we could only hypothesize that pterin metabolism is important for SC, not exactly how.

      We have discussed the links between gliding/spreading and polysaccharide metabolism more clearly, with reference to the literature, as quite a bit is known here including possible links to SC.

      “Polysaccharide metabolism in IR1 has been linked to changes in colony color and motility through the study of fucoidan metabolism (van de Kerkhof et al., 2022). Polysaccharide degradation and gliding motility are coupled to the same mechanism: the phylum-specific type IX secretion system, used for the secretion of enzymes and proteins involved in both functions (McKee et al., 2021).” [L622-626]

      Reviewer #2 (Public review):

      Summary:

      The authors constructed an in-frame deletion of moeA gene, which is involved in molybdopterin cofactor (MoCo) biosynthesis, and investigated its role in structural colors in Flavobacterium IR1. The deletion of moeA shifted colony color from green to blue, reduced colony spreading, and increased starch degradation, which was attributed to the upregulation of various proteins in polysaccharide utilization loci. This study lays the ground for developing new colorants by modifying genes involved in structural colors.

      Major strengths and weaknesses:

      The authors conducted well-designed experiments with appropriate controls and the results in the paper are presented in a logical manner, which supports their conclusions.

      We appreciate these comments.

      Using statistical tests to compare the differences between the wild type and moeA mutant, and adding a significance bar in Figure 4B, would strengthen their claims regarding differences in cell motility.

      Thank you. Figure 4B contains bars that represent the standard deviation of the mean value of the three replicates, but we have modified the figure to make them clearer.

      Additionally, in the result section (Figure 6), the authors suggest that the shift in blue color is "caused by cells which are still highly ordered but narrower", which to my knowledge is not backed up by any experimental evidence.

      Thanks. We mentioned that the mutant cells are narrower than the wild type based on the estimated periodicity resulting from the goniometry analysis (L427-430). We will now say “likely to be narrower based on the estimated periodicity from the optical analysis” rather than just “narrower”.

      “This optical analysis aligns with visual observations, confirming the blue shift in ΔmoeA, and suggests that this change in SC is caused by cells which are likely to be narrower based on the estimated periodicity from the optical analysis.” [L409-411]

      Overall, this is a well-written paper in which the authors effectively address their research questions through proper experimentation. This work will help us understand the genetic basis of structural colors in Flavobacterium and open new avenues to study the roles of additional genes and proteins in structural colors.

      Much appreciated.

      Recommendations for the authors:

      Reviewing Editor Comments:

      As you will see, the reviewers were rather positive about the paper but suggested a number of points to improve it, including a discussion of the direct role of moeA as well as specific editorial comments.

      Reviewer #1 (Recommendations for the authors):

      More specific comments to the authors:

      (1) Line 300, paragraph on bioinformatic analysis of the molybdopterin operon: As written, it is not clear whether this operon is crucial for pterin cofactor synthesis or only some genes are involved. And what is the contribution of moeA?

      Based on the bioinformatic analysis done in Zomer et al., 2024, we know the scores indicating which genes of the molybdopterin cofactor synthesis operon may be more relevant to the display of SC, in addition to moeA. We chose moeA to KO as it had the highest score, being careful to delete the coding sequence and not any upstream promoter. The other genes in the predicted operon are moaE, moaC2, and moaA. Then, in the proteomic analysis (L435-442), we analysed which proteins encoded by this operon were upregulated (MoaA, MoaC2, and MobA), indicating also the unaltered proteins (MoeZ and MoaE) and the undetected proteins (MoaD and SumT). Nevertheless, the operon is crucial for pterin cofactor synthesis because it contains all the genes involved in the pathway, and moeA encodes the enzyme for the last reaction of the pathway, so the molecule produced in the mutated pathway is adenylated molybdopterin (MPT-AMP) instead of molybdenum cofactor (MoCo).

      (2) Paragraph line 342 on moeA mutant phenotyping :

      Is the reduction in colony spreading caused by a defect in single-cell gliding motility or is the cause more complex? This can be quantified.

      We believe the cause is more complex. As mentioned above, for example, in (L615-617) we discuss the downregulation of GldL (which is known to be involved in flavobacterial gliding motility [Shrivastava et al., 2013]) in the moeA KO as a possible explanation for the reduced colony spreading of this mutant. This cannot be explained simply by spreading, but must (from the optical analysis) indicate changes in cell organisation/dimensions.

      (3) During the description of the moeA mutant phenotype (associated with Figures 2 and 4) and throughout the article, the optical properties are « functions » of colony spreading and moeA-dependent metabolism. However it is not quite clear if these two effects are independent or if one may be a consequence of the other.

      As noted above, colony spreading alone does not explain the blue-shift in SC observed. Given the function of MoeA (molybdate insertion into MPT-AMP [adenylated molybdopterin], MoMPT [molybdenum-molybdopterin] formation) for the synthesis of MoCo (molybdenum cofactor), the primary effect seems to be on metabolism but as we are dealing with an influential enzymatic cofactor a number of secondary effects are likely, and indeed the proteomics supports this. It is likely that the effect on spreading is secondary as seen with the downregulation of GldL (see above), but we cannot be sure.

      (4) Paragraph starting line 381 and Figure 5 on gliding motility:

      Gliding motility has to be tested at the level of single cells, allowing a more thorough characterization of the spreading defects. In addition, since gliding is entangled with Type IX-dependent secretion in Flavobacteria, the authors should test if Type IX-dependent secretion was perturbed in the absence of moeA.

      Based on the intracellular and extracellular proteomic analyses, the T9SS proteins regulated in the absence of moeA are GldL and SprT (downregulated) and PorU (upregulated). Author response table 1 shows the log2 FC (moeA/WT) of each of these proteins:

      Author response table 1.

      <-1: downregulated in moeA KO, -1<X<1: no significant regulation, >1: upregulated in moeA KO, -: not detected
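
      The thresholds in this legend can be expressed as a small classification rule. The following is a minimal sketch only, not the authors' analysis code: the function name `classify_log2_fc` is hypothetical, a missing value (`None` or NaN) is assumed to stand for "not detected", and values exactly equal to -1 or 1, which the legend does not cover, are assumed here to fall under "no significant regulation".

```python
import math
from typing import Optional


def classify_log2_fc(log2_fc: Optional[float], threshold: float = 1.0) -> str:
    """Classify a log2 fold change (moeA KO / WT) using the legend's cutoffs.

    Hypothetical helper: None or NaN is treated as "not detected", and
    boundary values (exactly +/- threshold) are assumed to be unregulated.
    """
    if log2_fc is None or math.isnan(log2_fc):
        return "not detected"
    if log2_fc < -threshold:
        return "downregulated in moeA KO"
    if log2_fc > threshold:
        return "upregulated in moeA KO"
    return "no significant regulation"


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the table.
    for value in (-2.3, 0.4, 1.7, None):
        print(value, "->", classify_log2_fc(value))
```

      A log2 FC of 1 corresponds to a two-fold change in abundance, so the cutoffs flag proteins whose level at least doubled or halved in the knockout.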

      (5) L401: In my opinion, the section "Quantification of the optical responses of IR1 WT and ΔmoeA colonies" should be moved up, before the characterization of motility.

      We have done this, as suggested. The section was moved from L401-423 to L388-411.

      (6) L475: Proteome comparison: « Of the total known proteins in IR1, 27.5% (1,504 proteins) extracellular proteins were identified » Are some of these proteins also found in the cell fraction? Wouldn't it be more accurate to write that « 1504 proteins were found in the extracellular fraction"?

      We have done this, as suggested.

      “Of the total known proteins in IR1, 27.5% (1,504 proteins) were detected in the extracellular fraction, 60.4% (909) were statistically significant (p<0.01), with 20.5% (186) considered downregulated, and 20% (182) upregulated in ΔmoeA (Figure 7B).” [L484-486]
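
      The percentages in this passage can be checked with simple arithmetic. The sketch below assumes, since the text does not state the denominators explicitly, that 60.4% is taken relative to the 1,504 extracellular proteins and that 20.5% and 20% are taken relative to the 909 significant proteins; the helper `percent` is a hypothetical name.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts quoted in the passage.
extracellular = 1504
significant = 909
downregulated = 186
upregulated = 182

# Assumed denominators: extracellular proteins, then significant proteins.
significant_pct = percent(significant, extracellular)
down_pct = percent(downregulated, significant)
up_pct = percent(upregulated, significant)

if __name__ == "__main__":
    print(significant_pct, down_pct, up_pct)
```

      Under these assumed denominators the computed values agree with the quoted 60.4%, 20.5% and 20%.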

      How can the authors exclude contamination of the extracellular fraction? This could easily explain the number of proteins lacking secretion signals: "29.6% (55) were likely secreted through a non-classical way, lacking typical secretion sequence motifs in their N-terminus."

      Based on the results from SecretomeP and SignalP, we excluded contamination, reducing the significant downregulated proteins from 186 (L476) to 69 (L486), and the upregulated ones from 182 (L477) to 111 (L500).

      (7) L490: if the protein misannotated flagellin is highly downregulated, why not push the analysis a bit further and ask what true function may be perturbed? In addition, it should not be classified as a motility protein in Table S6 and considered as a motility protein in the article.

      We reconsidered the information given by this and decided to remove it because, after checking the homology of the polypeptide by BLAST searching, we feel it is probably due to a misannotation.

      As is, the whole proteomic section is not that useful. Too many functions are evoked and the reader is not directed toward any particular conclusion. The most convincing hits from the proteomic analysis should be confirmed using another method. Transcriptional regulation could be easily probed by RT-qPCR. Or, since genetics is possible, proteins could be tagged and levels compared by western blot maybe? Do knock-out of the encoding genes generate any phenotype on SC? This would bring weight to the proteomic analysis.

      We have revised the proteomics section and removed functions that are not directly relevant to our conclusion.

      We feel the most important observation suggested by proteomics was the possible link between moeA and starch metabolism, because the metabolism of complex polysaccharides is important in the Flavobacteriia and known to be linked to SC (van de Kerkhof et al., 2022). It was not possible to follow up every pathway suggested by the proteomics, but the study is appropriately performed with the correct statistics.

      (8) Figure 9 : Does the absence of moeA affect the spreading of ASWS? Were colony sizes similar during the starch degradation assay? How can the authors rule out the idea that starch degradation is impacted by the difference in spreading rather than an independent function of moeA in starch metabolism? Slower spreading could lead to the accumulation of amylases, hence stronger activity. Why does starch degradation only accumulate at the center of the colony in the WT case?

      The WT and ΔmoeA colonies were of similar size during the starch degradation assay (2 days). However, after day 3, only WT colonies kept expanding in diameter.

      Starch degradation is logically in the centre of the colony as it is where the greatest concentration of cells exists, secreting degradative enzymes, for the longest time. Presumably starch degradation at the colony edge is not yet seen as the action of extracellular enzymes is low and has not had time to degrade the starch to the point that there is no iodine staining.

      “In contrast to other media where ΔmoeA colony expansion was less than WT, the ΔmoeA showed similar colony spreading and stronger starch degradation, supporting a role of moeA in complex polysaccharides metabolism.” [L562-565]

      (9) Finally, I am not quite sure what the authors mean by « a role of moeA in complex polysaccharides metabolism ». Are they referring to enzymes secreted in the medium to degrade starch? or to the incorporation and use of starch degradation products?

      We meant that the deletion of moeA showed an increase of extracellular starch degradation as seen in the iodine assay (Figure 9), as well as the upregulation of three different PUL operons (Figure 8).

      Reviewer #2 (Recommendations for the authors):

      The paper in general is well written with proper experimentation. However, here are a few recommendations for improving the writing and presentation, including minor corrections to the text and figures.

      Thank you.

      (1) It would be helpful for the readers if you could expand on "some metabolic pathways" in line 71. Please provide examples of metabolic pathways that are linked to SC.

      We have done this.

      “A recent bioinformatic study has shown the possible link of some metabolic pathways, such as carbohydrate, pterin, and acetolactate metabolism, to bacterial SC (Zomer et al., 2024).”[L70-72]

      (2) Line 79, "a bioinformatics analysis": please mention what kind of bioinformatics analysis was done and by whom, to provide clarity for the readers.

      We have clarified this, as suggested.

      “A large-scale, genomic-based analysis of 117 bacteria strains (87 with SC and 30 without) identified genes potentially involved in SC by comparing gene presence/absence, providing a SC-score (Zomer et al., 2024). By this method, pterin pathway genes were strongly predicted to be involved in SC.” [L80-83]

      (3) Please correct "Bacteria strains used in this study" to "bacterial" strains in Line 122.

      We have done so.

      (4) Please indicate in "Lines 394-396" that there were no vortex patterns observed in the moeA mutant.

      We have done so.

      “In contrast, ΔmoeA exhibited limited motility, with a more tightly packed cell organization and a fine, slow-moving layer at the edge (Figure 6, blue arrows), and did not show a ‘vortex’ pattern. This suggests that moeA deletion significantly impairs cell motility and colony expansion.” [L428-431]

      (5) In Figure 4 it looks like, with a different carbon source (ASWB with agar and fucoidan (ASWBF)), the moeA mutant and wild type exchange their phenotypes compared to ASWBKC. Could you explain why this happens in the discussion by highlighting the differences between fucose and kappa-carrageenan, or confirm if there are any differences in carbohydrate utilization between the wild type and moeA mutant using Biolog assays?

      We have explained the differences. Biolog would not be appropriate, as we are looking at metabolic processes of bacteria on surfaces (agar), and Biolog, as we understand it, uses liquid cultivation in microplates.

      “On different polysaccharide media, the ΔmoeA strain showed varied SC and colony expansion patterns: green/blue SC and low colony expansion on agar, intense blue SC and low colony expansion on kappa-carrageenan, dull green SC and low colony expansion on fucoidan, and blue/green SC with higher colony expansion on starch. Interestingly, the color phenotypes of the WT and ΔmoeA were exchanged between kappa-carrageenan (a simple linear sulfated polysaccharide of D-galactopyranose) and fucoidan (a complex sulfated polysaccharide of fucose and other sugars such as galactose, xylose, arabinose and rhamnose), showing the importance of polysaccharide metabolism in SC. While reduced motility has been associated with dull or absent SC and with reduced polysaccharide metabolism (Kientz et al., 2012a; Johansen et al., 2018), ΔmoeA showed reduced motility but an intense blue SC and high polysaccharide metabolism. Based on these results, we established a link among polysaccharide metabolism, MoCo biosynthesis, and SC, showing that intense SC is not strictly dependent on motility.” [L636-648]

      (6) In the discussion "Line 632" it is unclear what loss is being limited, and it would help strengthen your discussion if you could add references for lines: 633-636. There are a lot of hypotheses in lines 637-642, it would help the readers if you could clearly mention that these are hypotheses and will need experimental evidence or provide appropriate evidence to support these claims.

      We have done this.

      “Ecologically, we hypothesize that dense, highly structured bacterial colonies, such as those necessary for the SC phenotype, can enhance the uptake of metabolic degradation products from complex polysaccharides. These large macromolecules are often partially hydrolyzed extracellularly because they are too large to pass through bacterial cell membranes. For example, marine Vibrionaceae strains that produce lower levels of extracellular alginate lyases tend to aggregate more strongly, potentially facilitating localized degradation and uptake of polysaccharides (D’Souza et al., 2023). Additionally, certain marine bacteria employ a "selfish" mechanism to internalize large polysaccharide fragments into their periplasmic space, minimizing loss to the environment and enhancing substrate utilization (Reintjes et al., 2017). Bacteria secrete enzymes into the surrounding environment to break these polysaccharides down into more easily absorbable monosaccharides or oligosaccharides. This mechanism suggests that the colony structure could create a physical barrier that keeps these products concentrated and near the cells, allowing the colony to efficiently access and utilize these products and preventing leakage into the surrounding environment. While SC may also yield other ecological benefits associated with growth in biofilms, the highly structured colonies that characterize SC may be more resistant than an unstructured biofilm to invasion by competitor species scavenging for degradation products. This model is consistent with the observation that SC is associated with polysaccharide metabolism genes, and with the recent observation that SC is mainly localized on surface and interface environments such as air-water interfaces, tidal flats, and marine particles (Zomer et al., 2024).” [L650-670]

      (7) It would help the readers if you could expand on how polysaccharide metabolism is linked to motility in Line 610.

      As indicated previously, this is known and we will clarify.

      “Polysaccharide metabolism in IR1 has been linked to changes in colony color and motility through the study of fucoidan metabolism (van de Kerkhof et al., 2022).” [L622-623]

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      In the article titled "Polyphosphate discriminates protein conformational ensembles more efficiently than DNA promoting diverse assembly and maturation behaviors," Goyal and colleagues investigate the role that negatively charged biopolymers, i.e., polyphosphate (polyP) and DNA, play in phase separation of cytidine repressor (CytR) and fructose repressor (FruR). The authors find that both negative polymers drive the formation of metastable protein/polymer condensates. However, polyP-driven condensates form more gel- or solid-like structures over time while DNA-driven condensates tend to dissipate over time. The authors link this disparate condensate behavior to polyP-induced structures within the enzymes. Specifically, they observe the formation of polyproline II-like structures within two tested enzyme variants in the presence of polyP. Together their results provide a unique insight into the physical and structural mechanism by which two unique negatively charged polymers can induce distinct phase transitions with the same protein. This study will be a welcome addition to the condensate field and provide new molecular insights into how binding partner-induced structural changes within a given protein can affect the mesoscale behavior of condensates. The concerns outlined below are meant to strengthen the manuscript.

      Recommendation:

      We value the reviewer’s positive comments and appreciate the time taken to provide detailed feedback, which has certainly helped improve our manuscript.

      Major Concerns:

      (1) The biggest concern in this manuscript lies with experiments comparing polyP45, which has a net negative charge of -47, and double-stranded DNA of 45 base pairs (as stated in the methods), which will have a net negative charge of -90. Given the dependence of phase separation and phase transitions on not only net charge but charge density, this is an important factor to consider when comparing the effect of these molecules. It is unclear how or if the authors considered these factors in the design of their experiments. Because of the factor of 2 difference in net charge over the same number of polymer chain components, i.e. a chain of 45 pi vs. a chain of 45 double-stranded base pairs, it is unclear if the results from polyP vs. DNA are directly comparable. One solution would be to repeat all DNA experiments using single-stranded DNA so that the net charge is similar to polyP over the same chain length. Another possibility would be to repeat DNA experiments using a double-stranded DNA of 23 base pairs. This would allow for a nearly equal net charge (-46 vs. -47 for polyP), but the charge density would still be 2X polyP. As it stands now, the perceived differences in DNA vs. polyP behavior may be an artifact arising from the difference in net charge and charge density between DNA and polyP.

      To address the reviewer’s concerns regarding charge density differences between polyP and DNA, we conducted an experiment using a higher DNA concentration (11.24 µM) to obtain charge equivalence between the two experiments (i.e. the same total concentration of charges). As shown in Figure S5, even at the higher DNA concentration, the condensates undergo progressive dissolution over time. This observation indicates that the differential maturation of condensates arising from distinct initial protein ensembles is governed by the intrinsic properties of polyP. Charge density (i.e. the number of charges per unit volume of the polymer), on the other hand, is an intrinsic feature of the polymer which is naturally different between DNA and polyP. In fact, the primary result of our work is our observation that polyP can discern the starting ensembles more efficiently, likely through actively engaging and interacting with the ensemble, while DNA appears to be a passive player. The differences are not an artifact, as they arise from fundamental features of two natural anionic polymers found within cells. That said, the outcomes could be very different if the concentration of one polymer dominates over the other (see the response below).
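
      The charge-equivalence idea in this exchange can be written as a short calculation: the total charge concentration is the polymer concentration multiplied by the magnitude of its net charge. The sketch below is illustrative only. The function names are hypothetical, the net charges (-47 for polyP45 and -90 for the 45 bp duplex) are the reviewer's figures, and the polyP concentration actually used is not stated here, so none is assumed.

```python
def total_charge_concentration(polymer_conc_um: float, net_charge: int) -> float:
    """Total concentration of charges (in µM) carried by a polymer in solution."""
    return polymer_conc_um * abs(net_charge)


def charge_matched_concentration(
    ref_conc_um: float, ref_net_charge: int, other_net_charge: int
) -> float:
    """Concentration (µM) of a second polymer that carries the same total
    charge as the reference polymer at ref_conc_um."""
    if other_net_charge == 0:
        raise ValueError("the second polymer must carry a net charge")
    return total_charge_concentration(ref_conc_um, ref_net_charge) / abs(other_net_charge)


if __name__ == "__main__":
    # DNA concentration from the response, with the reviewer's net charge of -90.
    dna_charge_um = total_charge_concentration(11.24, -90)
    print(f"charge concentration from DNA: {dna_charge_um:.1f} µM")
```

      Matching total charge in this way equalizes the number of charges in solution but leaves the per-polymer charge density unchanged, which is the distinction drawn in the response.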

      (2) One outstanding question the authors do not address relates to how mixtures of CytR or FruR, DNA, and polyP behave. In the bacterial cytoplasm, these molecules are all in the same compartment (admittedly that compartment is not well mixed due to unique condensate-driven organization). Would the authors expect to see similar effects of polyP and DNA if they were in the same solution? Perhaps the authors could run a set of experiments where they vary the ratios of DNA and polyP to probe how increased levels of "stress", i.e. increased levels of polyP vs. DNA, alter the formation and behavior of enzymatic condensates.

      Following this comment, we investigated the phase separation behavior of CytR WT in the presence of different charge ratios of polyP-DNA mixtures. As seen in Author response image 1, panel A below, the outcomes are highly sensitive to the starting concentrations: at higher charge concentration of polyP (left panel), the OD and ThT fluorescence intensity are high at early time points, then decrease and increase again. Fluorescence microscopy images (panel B) reveal similar trends, but the more fascinating outcome is the FRAP profiles, which recover extremely fast and fully at the zero time point (panel C) despite the aggregation-like tendencies observed in ThT fluorescence assays. However, at longer time points (20 and 40 mins) the FRAP recovery is significantly weaker but recovers to ~65% at 1 hour (panel C). At high relative polyP concentrations with respect to DNA, droplets are formed first, which then transition into aggregates (liquid-to-solid transition; middle image in panel A). At relatively high DNA concentrations it appears that both droplets and aggregates co-exist, as both OD and ThT fluorescence are moderately high. Given these complex behaviors, we have not included these data in the current manuscript, as we still do not fully understand the origins of these differences. In fact, we are planning to extend this study by exploring the combinations in detail to understand the relative roles played by the two polymers in ternary mixtures.

      Author response image 1.

      (3) In Figure 1H, the recovery trace shows the fractional recovery of DM to near WT levels. It is clear from the images that recovery of the bleached region occurs, but the overall fluorescence intensity of DM is much lower than WT, even when accounting for the difference in starting condensate sizes in the Pre-Bleach images. Shouldn't this qualitative difference in total fluorescence be reflected in the quantitative trace?

      In Figure 2H, as the reviewer rightly points out, there is a clear difference in the absolute fluorescence intensity between WT and DM condensates. We would like to clarify that the recovery traces shown in Figure 2I were normalized to the pre-bleach intensity of each individual condensate to reflect fractional recovery. This normalization is intended to highlight the relative mobility of the protein within each condensate, but it does not capture the difference in total fluorescence intensity between WT and DM.
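
      The normalization described above can be illustrated with a minimal sketch. The intensity values and the names `bright` and `dim` are hypothetical and are not data from the manuscript; the sketch only assumes that each trace is divided by the mean pre-bleach intensity of its own condensate. Two condensates that differ four-fold in absolute brightness then yield the same fractional recovery trace, which is why the difference in total fluorescence does not appear in the normalized plot.

```python
def normalize_to_prebleach(trace, n_prebleach):
    """Divide every frame by the mean intensity of the pre-bleach frames."""
    pre = sum(trace[:n_prebleach]) / n_prebleach
    return [value / pre for value in trace]

# Hypothetical raw intensities (arbitrary units): two pre-bleach frames,
# then the bleach frame followed by three recovery frames.
bright = [1000.0, 1000.0, 200.0, 600.0, 800.0, 900.0]
dim = [250.0, 250.0, 50.0, 150.0, 200.0, 225.0]  # same kinetics, 4x dimmer

bright_norm = normalize_to_prebleach(bright, 2)
dim_norm = normalize_to_prebleach(dim, 2)

for b, d in zip(bright_norm, dim_norm):
    print(f"{b:.2f}  {d:.2f}")
```

      Plotting the raw traces alongside the normalized ones would be one way to show both the relative mobility and the absolute intensity difference.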

      (4) A description of the molten-globular variant Y19A FruR should be included in the main text where the variant is introduced. There is currently no additional description of the molten-globular variant in the Supplement as suggested by the manuscript.

      Figure 6A depicts the three-dimensional structure of FruR WT, with tyrosine residues Y19 and Y28, shown in red, forming stacking interactions. In the Y19A mutant, the loss of these interactions leaves the secondary structure largely unchanged (as shown in Figure 6E) but disrupts the protein’s tertiary structure, resulting in a molten globular state. The FruR work is now published in JPCB and can be found at https://doi.org/10.1021/acs.jpcb.4c03895, and is also appropriately cited in the revised version (reference 53).

      (5) Throughout the manuscript, the authors discuss polyP and DNA being able (or unable) to "distinguish" between different variants of CytR and FruR. This is confusing and suggests that DNA or polyP can choose to bind one form over another. The authors should re-work the language in this section to better reflect their direct observations for the behavior of protein in CD experiments and condensate behavior in imaging and turbidity experiments.

      We have now modified the text where necessary. The experiments were not done in the presence of both polyP and DNA, but in isolation (protein + polyP or protein + DNA). Hence, our aim is to convey that polyP is the polymer that leads to variable outcomes because of its ability to ‘interact’ differently with the different starting ensembles.

      Minor Concerns:

      (1) For all Figures, please include the number of measurements, i.e., N = ...

      We have updated all figure legends to include the number of measurements, indicated as N = ..., as suggested.

      (2) For all Figures, please place panel labels, i.e., A, B, C, etc., in the same respective location for each panel. As currently mapped out, it is difficult to easily determine which data are associated with each panel because the IDs are in various locations.

      Due to variations in data presentation and spacing within individual plots, it was challenging to place all labels in exactly the same position without obscuring important details. We have therefore maintained the labels as they were before.

      (3) In the introduction, it would be helpful for the authors to specify exactly what is meant by chaperone. Given the context, it seems that the authors refer to the chaperone activity as one that prevents aggregation. Is this correct?

      We refer to chaperone activity specifically as the ability to prevent aggregation of proteins. We have now clarified this definition in the Introduction section of the revised manuscript.

      (4) The results for experiments shown in Figure 3 need additional setup in the text. Were these measurements taken immediately after mixing WT, DM, or P33A with polyP? If so, why do condensates immediately appear and then dissipate before ThT-detected aggregates begin forming? Or were condensates allowed to form and then transferred to a different buffer, after which measurements were taken? Without a brief description of the experimental setup, interpreting the results is difficult.

      The condensates appear immediately after adding polyP to protein solutions, indicating that the condensate phase is kinetically accessible on mixing polyP with DM or the WT. As illustrated in Figures 3A and 3B, for the WT protein, the condensates undergo a liquid-to-solid transition over time, as the solid phase is likely the most thermodynamically stable one. Effectively, this work conveys that it is important to examine the time-dependence of droplets even after they form, as they may not be the most stable phase.

      (5) Please include images of P33A over the time course of the experiment in Figure 3B.

      We have included representative images of P33A in the presence of polyP over time in Figure 3B in the revised manuscript.

      (6) In Figures 3D, E, G, and H, please plot each measurement separately with mean and standard deviation to enable the reader to see each data point.

      We have now revised Figures 3D, E, G, and H to show individual data points along with the mean and standard deviation.

      (7) In the top paragraph on page 12, "fast-moving molecules" can be replaced with "dynamic molecules", as this offers a better description of the FRAP data.

      We have incorporated the suggested changes.

      (8) In the "Structural changes within the condensates spans over three hours" results section on page 15, the conclusion reads "In summary, we find that both the WT and the DM 'unfold' on forming condensates with polyP..." The way this is written suggests that WT and DM behave in a similar manner. Given the CD data, however, it seems that by 4 hours, DM forms alpha helices while the WT does not. This suggests that while each unfolds, the conformation at 4 hours is different. The summary should reflect these differences.

      We fully agree with the reviewer on this. The summary is now modified to include the fact that the DM forms alpha helices at 4 hours while the WT does not.

      (9) At the end of the first paragraph of the results section "DNA does not discriminate the conformational ensembles" the authors should refer to Figure 2G, where they show the altered morphology of polyP-P33A condensates.

      We have now included the reference to Figure 2G.

      (10) The authors refer to droplets "solubilizing" throughout the manuscript. It seems that dissolve is a better term to use. Solubilize is better associated with individual biomolecules while dissolve is better associated with condensate behavior.

      We thank the reviewer for pointing this out. We have revised the manuscript to replace “solubilize” with “dissolve”.

      (11) In Figures 5L and 5N, please change the Y-axis scale so that each curve is visible on the plot.

      We have adjusted the Y-axis scale in Figures 5L, 5M, and 5N to ensure that each curve is clearly visible and for easier comparison among the variants.

      (12) The authors should show an image of FruR WT and Y19A with DNA for a direct comparison with experiments in which FruR and polyP were used. The addition of turbidity measurements of samples shown in Figure 6D will offer another direct comparison. As written, there is no way for the author to directly compare the effects of polyP and DNA on FruR phase transitions.

      As suggested, we have now included representative images of FruR WT and Y19A with DNA (Figure 6K and 6L) to enable a direct comparison with the FruR–polyP experiments. Also, we have already shown turbidity measurements in Figure 6B and 6C corresponding to the samples shown in Figure 6D.

      Reviewer 2:

      In this study, Goyal et al demonstrate that the assembly of proteins with polyphosphate into either condensates or aggregates can reveal information on the initial protein ensemble. They show that, unlike DNA, polyphosphate is able to effectively discriminate against initial protein ensembles with different conformational heterogeneity, structure, and compactness. The authors further show that the protein native ensemble is vital on whether polyphosphate induces phase separation or aggregation, whereas DNA induces a similar outcome regardless of the initial protein ensemble. This work provides a way to improve our mechanistic understanding of how conformational transitions of proteins may regulate or drive LLPS condensate and aggregate assemblies within biological systems.

      We thank the reviewer for the favorable comments on the manuscript.

      Major Concerns:

      (1) The authors are using bacterial proteins (CytR and FruR) and solely represent polyphosphates as polyP45 (a polyphosphate with 45 Pi units). However, in bacterial systems, polyphosphates can be significantly longer (in the order of 100s to 1000 Pi units). Additionally, the experiments were run at neutral pH (7.0), and though this is fairly appropriate for the cytoplasm, volutin granules (where polyphosphates often accumulate) are typically considered slightly acidic (pH 5.5-6.5). From a physiological perspective, understanding how pH and the length of polyphosphate influence the ability to induce condensates or aggregates could be of importance.

      We appreciate the reviewer’s insightful comments regarding the physiological relevance of polyphosphate length and pH. In our current study, we used polyP45 as it is readily available commercially, and we conducted our experiments at pH 7 to mimic general cytoplasmic conditions. We agree that polyphosphates in bacterial cells can be significantly longer (hundreds to thousands of Pi units) and that conducting experiments in a slightly more acidic environment would be physiologically relevant. We plan to use longer polyP from Regene Tiss Inc. and acidic pH to explore how polyphosphate-induced phase separation of CytR varies with pH as a part of a future study. One could imagine doing all the experiments listed in the manuscript at different pH conditions for the different variants, but this could not be a part of the current work, which has a specific focus on the differences in maturation properties depending on the nature of the starting ensemble. However, the pKa value of the internal hydroxyl groups is ~2.2 (DOI:10.2147/IJN.S389819), indicating that polyP carries near-identical charges in the pH range between 4 and 7, and hence we expect little change in the charged status of polyP. On the other hand, the protonation states of charged amino acids within CytR could vary with pH, thus influencing its assembly properties.
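
      The charge argument above can be checked with a short, hedged calculation. The sketch assumes that each internal hydroxyl group behaves as an independent monoprotic acid obeying the Henderson-Hasselbalch relation with the quoted pKa of ~2.2; this ignores polyelectrolyte effects (neighbouring charges and counterion binding) and the terminal groups, so it is only a first-order estimate. The function name is illustrative.

```python
def fraction_deprotonated(ph, pka):
    """Henderson-Hasselbalch: fraction of an acid group carrying its negative charge."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

PKA_INTERNAL = 2.2  # value quoted in the response for internal hydroxyl groups

# Compare the cytoplasmic pH used in the study with the more acidic range
# mentioned by the reviewer.
for ph in (4.0, 5.5, 6.5, 7.0):
    frac = fraction_deprotonated(ph, PKA_INTERNAL)
    print(f"pH {ph:.1f}: {frac:.4f} of internal groups deprotonated")
```

      Under these assumptions more than 98% of the internal groups are already deprotonated at pH 4, so the estimated charge changes by less than 2% between pH 4 and 7.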

      (2) In the study, the longest metastable condensate induced by polyphosphate lasted approximately 3 hours before resolubilizing. It would be nice if the authors were able to generate a longer-lived condensate phase that would enable further mechanistic studies (e.g., NMR).

      We agree that generating longer-lived condensates would be highly valuable for mechanistic studies. However, the formation and stability of condensates are intrinsic properties of the protein, and optimizing different conditions for a longer-lived condensate phase is beyond the scope of the current study. It is possible that the condensates are long-lived with longer polyP, but it is not clear if this would indeed be the case. We would also like to state here that while it is common to report on the liquid-to-solid transition in condensates, the intrinsic metastability of droplets (when there is no aggregation) is rarely reported. One possibility is to mutationally introduce cysteine residues and induce the formation of disulphide bridges (as done in a recent work, doi: 10.1021/jacs.4c09557) that make the condensate highly stable kinetically; however, this would also complicate the interpretation as the mechanism of condensate formation might be very different. We have therefore reported our results as an observation arising from differences in the nature of the poly-anionic polymers.

      (3) The authors showed that CytR DM (fully folded), CytR WT (minor state folded), and CytR P33A (highly disordered) with polyphosphates lead to longer-lived condensates that resolubilize, shorter-lived condensates that aggregate, and immediate aggregation, respectively. Whereas FruR (folded) and FruR Y19A (molten globular) with polyphosphate induce spontaneous aggregation and short-lived condensates, respectively. I would expect FruR to be more similar to CytR DM and FruR Y19A more similar to CytR WT in terms of structure and conformational dynamics and plasticity, yet they have opposing results. This raises a bit of concern. Meaning, that though polyphosphate discriminates between the different ensembles, is it actually possible to obtain information on the initial ensemble composition?

      In the current study, we show that CytR WT (less structured) and FruR Y19A (molten globule) form short-lived condensates that aggregate. We agree with the reviewer that while CytR DM (fully folded) forms condensates that dissolve over time, FruR WT (fully folded) variant forms aggregates immediately upon polyP addition. The observations show that polyP can discriminate between different protein conformations, in contrast to DNA, which does not show such selectivity. However, we acknowledge that while polyP-induced behavior reflects aspects of protein ensemble properties, it does not provide direct insight into the nature of the initial conformational ensemble.

      (4) In the case of FruR with polyphosphate, no CD for the secondary structure analysis was provided as it was for CytR. It would be useful to see if the polyphosphate-induced structural changes observed for CytR hold true for FruR as well.

      We thank the reviewer for the suggestion. In response, we have performed far-UV CD experiments on FruR variants in the presence of polyP. Similar to the CytR WT, FruR WT shows unfolding upon polyP addition. A similar outcome is noted for the Y19A variant though there is significant residual helix content in the condensate unlike the WT. The CD spectra of FruR variants have been added to Figure 6.

      Minor Concerns/Suggestions:

      Under conclusion, third paragraph, first sentence. This sentence reads, "Our observations thus establish that polyP efficiently discriminates the conformational features of proteins than DNA, contributing to the diverse outcomes."

      We thank the reviewer for pointing this out. The sentence has been revised for clarity. It now reads “Our observations establish that polyP is more sensitive to the conformational features of proteins than DNA, thereby contributing to the diverse outcomes.”

      One experimental suggestion. Seeing that protein dynamics and plasticity seem to play a role. For either CytR WT or DM, it would be interesting to see the influence of temperature. Altering the temperature is a good way to perturb the population distribution of conformation sub-states and to alter kinetics. It may be that at a lower temperature (maybe 5C) for the WT you reduce conformational dynamics and you obtain results more similar to that of the DM. Alternatively, heating the DM would be another option. Obviously, there are additional challenges that may arise with changing the temperature, but if it were to work I think it could add some value.

      We thank the reviewer for the thoughtful suggestion. Due to limitations in our current experimental setup (which the reviewer notes as ‘challenges’), namely that the confocal setup does not have a temperature controller, we will not be able to perform temperature-controlled assays. However, the ‘structure’ of the CytR variants does not vary much between 280 and 298 K, and this is one of the reasons for choosing three variants without altering any other thermodynamic property. If temperature were varied, the dynamics of polyP would also change, and hence the true molecular origins of any differences we might observe would be confounded by the dynamic effects on polyP as well. In this work, we have eliminated any dynamic differences in polyP by performing the experiments at a fixed temperature.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      One enduring mystery involving the evolution of genomes is the remarkable variation they exhibit with respect to size. Much of that variation is due to differences in the number of transposable elements, which often (but not always) correlates with the overall quantity of DNA. Amplification of TEs is nearly always either selectively neutral or negative with respect to host fitness. Given that larger effective population sizes are more efficient at removing these mutations, it has been hypothesized that TE content, and thus overall genome size, may be a function of effective population size. The authors of this manuscript test this hypothesis by using a uniform approach to analysis of several hundred animal genomes, using the ratio of synonymous to nonsynonymous mutations in coding sequence as a measure of the overall strength of purifying selection, which serves as a proxy for effective population size over time. The data convincingly demonstrates that it is unlikely that effective population size has a strong effect on TE content and, by extension, overall genome size (except for birds).

      Strengths:

      Although this ground has been covered before in many other papers, the strength of this analysis is that it is comprehensive and treats all the genomes with the same pipeline, making comparisons more convincing. Although this is a negative result, it is important because it is relatively comprehensive and indicates that there will be no simple, global hypothesis that can explain the observed variation.

      Weaknesses:

      In several places, I think the authors slip between assertions of correlation and assertions of cause-effect relationships not established in the results.

      Several times in the previous version of the manuscript we used the expression “effect of dN/dS on…” which might suggest a causal relationship. We have rephrased these expressions and highlighted the changes in the main text, so that correlation is not mistaken for causation (see also responses to detailed comments below).

      In other places, the arguments end up feeling circular, based, I think, on those inferred causal relationships. It was also puzzling why plants (which show vast differences in DNA content) were ignored altogether.

      The analysis focuses on metazoans for two reasons: one practical and one fundamental.

      The practical reason is computational. Our analysis included TE annotation, phylogenetic estimation and dN/dS estimation, which would have been very difficult with the hundreds, if not thousands, of plant genomes available. If we had included plants, it would have been natural to include fungi as well, to have a complete set of multicellular eukaryotic genomes, adding to the computational burden. The second, fundamental reason is that plants show important genome size differences due to more frequent whole genome duplications (polyploidization) than in animals. It is therefore possible that the effect of selection on genome size is different in these two groups, which would have led us to treat them separately, decreasing the interest of this comparison. For these reasons we chose to focus on animals, which still provide very wide ranges of genome size and population size, well suited to test the impact of genetic drift on the genomic TE content.

      Reviewer #2 (Public review):

      Summary:

      The Mutational Hazard Hypothesis (MHH) is a very influential hypothesis in explaining the origins of genomic and other complexity that seem to entail the fixation of costly elements. Despite its influence, very few tests of the hypothesis have been offered, and most of these come with important caveats. This lack of empirical tests largely reflects the challenges of estimating crucial parameters.

      The authors test the central contention of the MHH, namely that genome size follows effective population size (Ne). They marshal a lot of genomic and comparative data, test the viability of their surrogates for Ne and genome size, and use correct methods (phylogenetically corrected correlation) to test the hypothesis. Strikingly, they not only find that Ne is not THE major determinant of genome size, as is argued by MHH, but that there is not even a marginally significant effect. This is remarkable, making this an important paper.

      Strengths:

      The hypothesis tested is of great importance.

      The negative finding is of great importance for reevaluating the predictive power of the tested hypothesis.

      The test is straightforward and clear.

      The analysis is a technical tour-de-force, convincingly circumventing a number of challenges of mounting a true test of the hypothesis.

      Weaknesses:

      I note no particular weaknesses, but I believe the paper could be further strengthened in three major ways.

      (1) The authors should note that the hypothesis that they are testing is larger than the MHH.

      The MHH hypothesis says that (i) low-Ne species have more junk in their genomes and

      (ii) this is because junk tends to be costly because of increased mutation rate to nulls, relative to competing non/less-junky alleles.

      The current results reject not just the compound (i+ii) MHH hypothesis, but in fact any hypothesis that relies on i. This is notably a (much) more important rejection. Indeed, whereas MHH relies on particular constructions of increased mutation rates of varying plausibility, the more general hypothesis i includes any imaginable or proposed cost to the extra sequence (replication costs, background transcription, costs of transposition, ectopic expression of neighboring genes, recombination between homologous elements, misaligning during meiosis, reduced organismal function from nuclear expansion, the list goes on and on). For those who find the MHH dubious on its merits, focusing this paper on the MHH reduces its impact - the larger hypothesis that the small costs of extra sequence dictate the fates of different organisms' genomes is, in my opinion, a much more important and plausible hypothesis, and thus the current rejection is more important than the authors let on.

      The MHH is arguably the most structured and influential theoretical framework proposed to date based on the null assumption (i), therefore setting the paper up with the MHH is somewhat inevitable. Because of this, we mostly discuss the assumption (ii) (the mutational aspect brought about by junk DNA) and the peculiarities of TE biology that can drive the genome away from the expectations of (i). We however agree that the hazard posed by extra DNA is not limited to the gain of function via the mutation process, but can be linked to many other molecular processes as mentioned above. Moreover, we also agree that our results can be interpreted within the general framework of the nearly-neutral theory. They demonstrate that mutations, whether increasing or decreasing genome size, have a distribution of fitness effects that falls outside the range necessary for selection in larger populations. In the revised manuscript, we made the concept of hazard more comprehensive and further stressed that this applies not only to TEs but any nearly-neutral mutation affecting non-coding DNA (lines 491-496): “Notably, these results not only reject the theory of extra non-coding DNA being costly for its point mutational risk, but also challenges the more general idea of its accumulation depending on other kinds of detrimental effects, such as increased replication, pervasive transcription, or ectopic recombination. Therefore, our results can be considered more general than a mere rejection of the MHH hypothesis, as they do not support any theory predicting that species with low Ne would accumulate more non-coding DNA.”

      (2) In addition to the authors' careful logical and mathematical description of their work, they should take more time to show the intuition that arises from their data. In particular, just by looking at Figure 1b one can see what is wrong with the non-phylogenetically-corrected correlations that MHH's supporters use. That figure shows that mammals, many of which have small Ne, have large genomes regardless of their Ne, which suggests that the coincidence of large genomes and frequently small Ne in this lineage is just that, a coincidence, not a causal relationship. Similarly, insects by and large have large Ne, regardless of their genome size. Insects, many of which have large genomes, have large Ne regardless of their genome size, again suggesting that the coincidence of this lineage of generally large Ne and smaller genomes is not causal. Given that these two lineages are abundant on earth in addition to being overrepresented among available genomes (and were even more overrepresented when the foundational MHH papers collected available genomes), it begins to emerge how one can easily end up with a spurious non-phylogenetically corrected correlation: grab a few insects, grab a few mammals, and you get a correlation. Notably, the same holds for lineages not included here but that are highly represented in our databases (and all the more so 20 years ago): yeasts related to S. cerevisiae (generally small genomes and large median Ne despite variation) and angiosperms (generally large genomes (compared to most eukaryotes) and small median Ne despite variation). Pointing these clear points out will help non-specialists to understand why the current analysis is not merely a they-said-them-said case, but offers an explanation for why the current authors' conclusions differ from the MHH's supporters and moreover explain what is wrong with the MHH's supporters' arguments.

      We thank the referee for this perspective. We agree that comparing dispersion of the points from the non-phylogenetically corrected correlation with the results of the phylogenetic contrasts intuitively emphasizes the importance of accounting for species relatedness. We expanded the discussion to stress the phylogenetic structure present in the data (lines 408-417): “It is important to note how not treating species traits as non-independent leads to artifactual results (Figure 2B-C). For instance, mammals have on average small population sizes and the largest genomes. Conversely, insects tend to have large Ne and overall small genomes. With a high sampling power and phylogenetic inertia being taken into account, our meta-analysis clearly points at a phylogenetic structure in the data: the main clades are each confined to separate genome size ranges regardless of their dN/dS variation. The other way around, variability in genome size can be observed in insects, irrespective of their dN/dS. Relying on non phylogenetically corrected models based on a limited number of species (such as that available at the time of the MHH proposal) can thus result in a spurious positive scaling between genome size and Ne proxies.”
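
      The intuition behind this point can be reproduced with a toy simulation. This is a sketch under stated assumptions, not the Coevol analysis used in the study: two hypothetical clades are drawn with different mean values of a log Ne proxy and of log genome size, and within each clade the two traits are generated independently. All numerical values and names are illustrative. Pooling the clades without any correction produces a strong correlation, although no relationship exists inside either clade.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_clade(rng, n, mean_ne, mean_size):
    """Draw the two traits independently around clade-specific means."""
    ne = [rng.gauss(mean_ne, 0.5) for _ in range(n)]
    size = [rng.gauss(mean_size, 0.3) for _ in range(n)]
    return ne, size

rng = random.Random(0)
# Hypothetical clade A: small Ne, large genomes (mammal-like).
clade_a = simulate_clade(rng, 200, mean_ne=-2.0, mean_size=1.0)
# Hypothetical clade B: large Ne, small genomes (insect-like).
clade_b = simulate_clade(rng, 200, mean_ne=2.0, mean_size=-1.0)

r_pooled = pearson(clade_a[0] + clade_b[0], clade_a[1] + clade_b[1])
r_a = pearson(*clade_a)
r_b = pearson(*clade_b)

print(f"pooled r = {r_pooled:.2f}")
print(f"within clade A r = {r_a:.2f}, within clade B r = {r_b:.2f}")
```

      Phylogenetically informed methods go further than this within-clade comparison because they model shared ancestry along the whole tree, but the toy case shows how a few species from two divergent clades are enough to generate an apparent scaling.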

      (3) A third way in which the paper is more important than the authors let on is in the striking degree of the failure of MHH here. MHH does not merely claim that Ne is one contributor to genome size among many; it claims that Ne is THE major contributor, which is a much, much stronger claim. That no evidence exists in the current data for even the small claim is a remarkable failure of the actual MHH hypothesis: the possibility is quite remote that Ne is THE major contributor but that one cannot even find a marginally significant correlation in a huge correlation analysis deriving from a lot of challenging bioinformatic work. Thus this is an extremely strong rejection of the MHH. The MHH is extremely influential and yet very challenging to test clearly. Frankly, the authors would be doing the field a disservice if they did not more strongly state the degree of importance of this finding.

      We respectfully disagree with the reviewer that there is currently no evidence for an effect of Ne on genome size evolution. While it is accurate that our large dataset allows us to reject the universality of Ne as the major contributor to genome size variation, this does not exclude the possibility of such an effect in certain contexts. Notably, several pieces of evidence support Ne determining genome size variation and entailing nearly-neutral TE dynamics under certain circumstances, e.g. with particularly strongly contrasted Ne and moderate divergence times (Lefébure et al., 2017 Genome Res 27: 1016-1028; Mérel et al., 2021 Mol Biol Evol 38: 4252-4267; Mérel et al., 2024 bioRxiv: 2024-01; Tollis and Boissinot, 2013 Genome Biol Evol 5: 1754-1768; Ruggiero et al., 2017 Front Genet 8: 44). The strength of such works is to analyze the short-term dynamics of TEs in response to Ne within groups of species/populations, where the cost posed by extra DNA is likely to be similar. Indeed, the MHH predicts genome size to vary according to the combination of drift and mutation under the nearly-neutral theory of molecular evolution. Our work demonstrates that this is not true universally but does not exclude that it could hold locally. Moreover, defence mechanisms against TE proliferation are often complex molecular machineries that might or might not evolve according to different constraints among clades. We have detailed these points in the discussion (lines 503-518).

      Reviewer #3 (Public review):

      Summary

      The Mutational Hazard Hypothesis (MHH) suggests that lineages with smaller effective population sizes should accumulate slightly deleterious transposable elements leading to larger genome sizes. Marino and colleagues tested the MHH using a set of 807 vertebrate, mollusc, and insect species. The authors mined repeats de novo and estimated dN/dS for each genome. Then, they used dN/dS and life history traits as reliable proxies for effective population size and tested for correlations between these proxies and repeat content while accounting for phylogenetic nonindependence. The results suggest that overall, lineages with lower effective population sizes do not exhibit increases in repeat content or genome size. This contrasts with expectations from the MHH. The authors speculate that changes in genome size may be driven by lineage-specific host-TE conflicts rather than effective population size.

      Strengths

      The general conclusions of this paper are supported by a powerful dataset of phylogenetically diverse species. The use of C-values rather than assembly size for many species (when available) helps mitigate the challenges associated with the underrepresentation of repetitive regions in short-read-based genome assemblies. As expected, genome size and repeat content are highly correlated across species. Nonetheless, the authors report divergent relationships between genome size and dN/dS and TE content and dN/dS in multiple clades: Insecta, Actinopteri, Aves, and Mammalia. These discrepancies are interesting but could reflect biases associated with the authors' methodology for repeat detection and quantification rather than the true biology.

      Weaknesses

      The authors used dnaPipeTE for repeat quantification. Although dnaPipeTE is a useful tool for estimating TE content when genome assemblies are not available, it exhibits several biases. One of these is that dnaPipeTE seems to consistently underestimate satellite content (compared to RepeatMasker on assembled genomes; see Goubert et al. 2015). Satellites comprise a significant portion of many animal genomes and are likely significant contributors to differences in genome size. This should have a stronger effect on results in species where satellites comprise a larger proportion of the genome relative to other repeats (e.g. Drosophila virilis, >40% of the genome (Flynn et al. 2020); Triatoma infestans, 25% of the genome (Pita et al. 2017) and many others). For example, the authors report that only 0.46% of the Triatoma infestans genome is "other repeats" (which include simple repeats and satellites). This contrasts with previous reports of ≥25% satellite content in Triatoma infestans (Pita et al. 2017). Similarly, this study's results for "other" repeat content appear to be consistently lower for Drosophila species relative to previous reports (e.g. de Lima & Ruiz-Ruano 2022). The most extreme case of this is for Drosophila albomicans where the authors report 0.06% "other" repeat content when previous reports have suggested that 18% to 38% of the genome is composed of satellites (de Lima & Ruiz-Ruano 2022). It is conceivable that occasional drastic underestimates or overestimates for repeat content in some species could have a large effect on coevol results, but a minimal effect on more general trends (e.g. the overall relationship between repeat content and genome size).

      There are indeed some discrepancies between our estimates of low-complexity repeats and those from the literature, owing to the approach used; occasional underestimates or overestimates of repeat content are therefore possible. As noted, the contribution of “Other” repeats to the overall repeat content is generally very low in our estimates, which is consistent with an underestimation bias. We thank the reviewer for providing this interesting review.

      We emphasized these points in the discussion of our revised manuscript (lines 358-376): “While the remarkable conservation of avian genome sizes has prompted interpretations involving further mechanisms (see discussion below), dnaPipeTE is known to generally underestimate satellite content (Goubert et al. 2015). This bias is more relevant for those species that exhibit large fractions of satellites compared to TEs in their repeatome. For instance, the portions of simple and low complexity repeats estimated with dnaPipeTE are consistently smaller than those reported in previous analyses based on assembly annotation for some species, such as Triatoma infestans (0.46% vs 25%; 7 Mbp vs 400 Mbp), Drosophila eugracilis (1.28% vs 10.89%; 2 Mbp vs 25 Mbp), Drosophila albomicans (0.06% vs 18 to 38%; 0.12 Mbp vs 39 to 85 Mbp) and some other Drosophila species (Pita et al. 2017; de Lima and Ruiz-Ruano 2022; Supplemental Table S2). Although the accuracy of Coevol analyses might occasionally be affected by such underestimations, the effect is likely minimal on the general trends. Inability to detect ancient TE copies is another relevant bias of dnaPipeTE. However, the strong correlation between repeat content and genome size and the consistency of dnaPipeTE and EarlGrey results, even in large genomes such as that of Aedes albopictus, indicate that the dnaPipeTE method is pertinent for our large-scale analysis. Furthermore, such an approach is especially fitting for the examination of recent TEs, as this specific analysis is not biased by very repetitive new TE families that are problematic to assemble.”

      Not being able to correctly estimate the quantity of satellites might pose a problem for quantifying the total content of junk DNA. However, the overall repeat content, which is mostly composed of TEs, correlates very well with genome size, both in the overall dataset and within clades (with the notable exception of birds), so we are confident that this limitation does not explain our negative results. Moreover, while satellite information might be missing, this is not problematic for testing our hypothesis, as we focus on TEs, whose proliferation mechanism differs significantly from that of tandem repeats and which largely account for genome size variation.

      Another bias of dnaPipeTE is that it does not detect ancient TEs as well as more recently active TEs (Goubert et al., 2015 Genome Biol Evol 7: 1192-1205). Thus, the repeat content used for PIC and Coevol analyses here is inherently biased toward more recently inserted TEs. This bias could significantly impact the inference of long-term evolutionary trends.

      Indeed, dnaPipeTE is not good at detecting old TE copies due to the read-based approach, biasing the outcome towards new elements. We agree that TE content can be underestimated, especially in those genomes that tend to accumulate TEs rather than getting rid of them. However, the sum of old TEs and recent TEs is extremely well correlated to genome size (Pearson’s correlation: r = 0.87, p-value < 2.2e-16; PIC: slope = 0.22, adj-R<sup>2</sup> = 0.42, p-value < 2.2e-16). Our main result therefore does not rely on an accurate estimation of old TEs. In contrast, we hypothesized that recent TEs could be interesting because selection could be more likely to act on TE insertion and dynamics rather than on non-coding DNA as a whole. Our results demonstrate that this is not the case. It should be noted that, in spite of its limits with old TEs, dnaPipeTE is well suited for this analysis as it is not biased by highly repetitive new TE families that are challenging to assemble. In the revised manuscript, we now emphasize the limitations of dnaPipeTE and discuss the consequences for our results. See lines 359-374 (reported above) and lines 449-455: “On the other hand, it is conceivable the avian TE diversity to be underappreciated due to the limits of sequencing technologies used so far in resolving complex repeat-rich regions. For instance, employment of long-reads technologies allowed to reveal more extended repeated regions that were previously ignored with short read assemblies (Kapusta and Suh 2017; Benham et al. 2024). Besides, quite large fractions might indeed be satellite sequences constituting relevant fractions of the genome that are challenging to identify with reference- or read-based methods (Edwards et al. 2025).”

      Finally, in preliminary work on dipteran species, we showed that the TE content estimated with dnaPipeTE is generally similar to that estimated from the assembly with EarlGrey (Baril et al., 2024 Mol Biol Evol 38: msae068) across a good range of genome sizes going from drosophilid-like to mosquito-like (TE genomic percentage: Pearson’s r = 0.88, p-value = 1.951e-10; TE base pairs: Pearson’s r = 0.90, p-value = 3.573e-11; see also the corrected Supplementary Figure S2 and new Supplementary Figure S3). While TEs for these species are probably dominated by recent to moderately recent TEs, Ae. albopictus is an outlier for its genome size and the estimations with the two methods are largely consistent. However, the computation time required to estimate TE content using EarlGrey was significantly longer, with a ~300% increase in computation time, making it a very costly option (an issue that also applies to other assembly-based annotation pipelines). Given the rationale presented above, we decided to use dnaPipeTE instead of EarlGrey.
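
      As an illustration only, the kind of agreement check described above (read-based versus assembly-based TE estimates) can be sketched as a Pearson correlation. The species values below are hypothetical placeholders, not data from our study, and the function name is our own choice for this sketch.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical TE percentages for five species (illustrative values only).
read_based = [12.0, 18.5, 25.0, 40.2, 61.0]
assembly_based = [10.5, 20.0, 27.5, 38.0, 65.5]
r = pearson_r(read_based, assembly_based)
print(f"Pearson r = {r:.2f}")
```

      A high r on such a comparison indicates that the two methods rank species similarly; it does not show that either method recovers the absolute repeat content.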

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Since I am not an expert in the field, some of these comments may simply reflect a lack of understanding on my part. However, in those cases, I hope they can help the authors clarify important points. I did have a bunch of comments concerning the complexity of the relationship between TEs and their hosts that would likely affect TE content, but I ended up deleting most of them because they were covered in the discussion. However, I do think that in setting up the paper, particularly given the results, it might have been useful to introduce those issues in the introduction. That is to say, treating TEs as a generic mutagen that will fit into a relatively simple model is unlikely to be correct. What will ultimately be more interesting are the particulars of the ways that the relationships between TEs and their host evolve over time. Finally, given the huge variation in plant genes with respect to genome size and TE content, along with really interesting variation in deletion rates, I'm surprised that they were not included. I get that you have to draw a line somewhere, and this work builds on a bunch of other work in animals, but it seems like a missed opportunity.

      We chose to restrict the introduction to the rationale behind the MHH as it is the starting point and focus of the manuscript. Because the aspects of the complexity of TE-host relationships are only covered in a speculative way, we limited them to the discussion but it is true that introducing them at the very beginning gives a more comprehensive overview. The introduction now includes a few sentences about lineage-specific selective effect of TEs and TE-host evolution (lines 83-86): “On top of that, an alternative TE-host-oriented perspective is that the accumulation of TEs in particular depends on their type of activity and dynamics, as well as on the lineage-specific silencing mechanisms evolved by host genomes (Ågren and Wright 2011).”

      Page 4. "The MHH is highly popular..." Evidence for this? It is fine as is, but it could also be seen as a straw man argument. Perhaps make clear this is an opinion of the authors?

      That MHH is popular and well-known is more a fact than an opinion: the original paper by Lynch and Conery (2003) and “The origins of genome architecture” by Lynch (2007) have respectively 1872 and 1901 citations to the present date (04/03/2025). Besides, the MHH is often invoked in highly cited reviews about TEs, e.g. Bourque et al., 2018 Genome Biol 19:1-12; Wells and Feschotte, 2020 Annu Rev Genet 54: 539-561.

      Page 4. "on phylogenetically very diverse datasets..." Given the fact that even closely related plants can show huge variation in genome size, it's a shame that they weren't included here. There are also numerous examples of closely related plants that are obligate selfers and out-crossers.

      This is true, and some studies have already tested the MHH in specific plant groups (Ågren et al., 2014 BMC Genom 15: 1-9; Hu et al., 2011 Nat Genet 43: 476-481; Wright et al., 2008 Int J Plant Sci 169: 105-118), including selfers vs out-crossers cases (Glémin et al., 2019 Evolutionary genomics: statistical and computational methods: 331-369). Further development in this kingdom would be interesting. However, the boundary was set to metazoans from the very beginning of the analyses to maintain a large phylogenetic span and a manageable computational burden. Furthermore, some of the included animal clades are expected to display good Ne contrasts according to known LHTs or to previous literature: for instance, the very different Ne of mammals and insects, as well as narrower examples like Drosophilidae and solitary vs eusocial hymenopterans.

      Page 6. "species-poor, deep-branching taxa were excluded" I see why this was done, as these taxa would not provide close as well as distant comparisons, but I would have thought they might have provided some interesting outlying data. As the geneticists say, value the exceptions.

      The reason to exclude them was not only that they would solely provide very distant comparisons. The lack of a rich and balanced sampling would imply calculating nucleotide substitution rates over hundreds of millions of years, which typically leads to saturation of synonymous sites. When synonymous sites are saturated, the synonymous divergence is underestimated, and the dN/dS ratio is therefore no longer a reliable proxy of N<sub>e</sub>. Outside vertebrates and insects, the available genomes in a clade would mostly correspond to a few species from an entire phylum, making it challenging to estimate dN/dS and to correlate present-day genome size with Ne estimated over hundreds of millions of years.
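
      The saturation argument can be illustrated with a deliberately simplified sketch. It assumes the Jukes-Cantor (JC69) nucleotide model rather than the codon-based mapping method used in our analyses, and the function names are ours. Under JC69, the observed fraction of differing sites approaches a ceiling of 0.75 as the true number of substitutions per site grows, so very long branches carry almost no information about the true divergence.

```python
import math


def jc69_observed_fraction(d):
    """Expected fraction of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p):
    """Corrected distance from an observed fraction p of differing sites (JC69)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75); beyond that the sites are saturated")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# The observed fraction flattens out while the true distance keeps growing.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"true d = {d:>4}: observed fraction = {jc69_observed_fraction(d):.3f}")
```

      Close to the ceiling, a tiny change in the observed fraction maps to a very large change in the corrected distance, which is why dS (and hence dN/dS) becomes unreliable on saturated branches.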

      Figure 1. What are the scaling units for each of these values? I get that dN/dS is between 0 and 1, but what about genome sizes? Are these relative sizes? Are TE content values a percent of the total? This may be mentioned elsewhere, but I think it is worth putting that information here as well.

      Thanks for pointing this out. Both genome sizes and TE contents are in bp; we added this information to the figure legend.

      Page 8. TE content estimates are invariably wrong given the diversity of TEs and, in many genomes, the presence of large numbers of low copy number "dead" elements. If that varies between taxa, this could cause problems. Given that, I would have liked to see the protocols used here be compared to a set of "gold standard" genomes with exceptionally well-annotated TEs (Humans and D. melanogaster, for instance).

      As already mentioned, dnaPipeTE is indeed biased towards young TEs (elements with more than 25-30% sequence divergence are generally not detected). TE content can therefore be underestimated, especially in those genomes that tend to accumulate TEs rather than getting rid of them. Although most of them do not have “gold-standard” genomes, a comparison of dnaPipeTE with TE annotations from assemblies is already provided for a subset of species. Some variation can be present (see Supplemental Figure S6 and the comments of Reviewer #3 about the detection of satellite sequences). However, the subset covers a good range of genome sizes, and overall dnaPipeTE emerges as an appropriate tool to characterize the general patterns of repeat content variation.

      Page 11. "close to 1 accounts for more..." I would say "closer" rather than "close".

      Agreed and changed.

      Page 11. "We therefore employed this parameter..." I know you made the point earlier, but maybe reiterate the general point here that selection is lower on average with a lower effective population size. Actually, I'm wondering if we don't need a different term for long-term net effective population size, which dN/dS is measuring.

      We reiterated here the relationship among dN/dS, Ne and magnitude of selection (lines 200-204): “a dN/dS closer to 1 accounts for more frequent accumulation of mildly deleterious mutations over time due to increased genetic drift, while a dN/dS close to zero is associated with a stronger effect of purifying selection. We therefore employed this parameter as a genomic indicator of N<sub>e</sub>, as the two are expected to scale negatively between each other.”

      Page 11. "We estimated dN/dS with a mapping method..." I very much appreciate that the authors are using the same pipeline for the analysis of all of these taxa, but I would also be interested in how these dN/dS values compare with previously obtained values for a subset of intensively studied taxa.

      The original publication of the method demonstrated that dN/dS estimations using mapping are highly similar to those obtained with maximum likelihood methods, such as implemented in CODEML (Romiguier et al., 2014 J Evol Biol 27: 593-603). Below is the comparison for 16 vertebrate species from Figuet et al. (2016 Mol Biol Evol 33: 1517-1527), where dN/dS are reasonably correlated (slope = 0.57, adjusted-R<sup>2</sup> = 0.39, p-value=0.006). That being said, some noise can be present as the compared genes and the phylogeny used are different. Although we expect some value between 0 and 1, some range of variation is to be expected depending on both the species used and the markers, as substitution rates and/or selection strength might be different. Differences in dN/dS for the same species would not necessarily imply an issue with one of the methods.

      Author response image 1.

      Page 12. " As expected, Bio++ dN/dS scales positively with..." Should this be explicitly referenced earlier? I do see that references mentioning both body mass and longevity are included earlier, but the terms themselves are not.

      We added a list of the expected correlations for dN/dS and LHTs at the beginning of the paragraph (lines 205-208): “In general, dN/dS is expected to scale positively with body length, age at first birth, maximum longevity, age at sexual maturity and mass, and to scale negatively with metabolic rate, population density and depth range.”

      Page 12. "dN/dS estimation on the trimmed phylogeny deprived of short and long branches results in a stronger correlation with LHTs, suggesting that short branches..." and what about the long branches? Trimming them helps because LHTs change over long periods of time?

      Trimming of long branches should avoid saturation in the signal of synonymous substitutions if present (whereby an increase in dN is not paralleled by a corresponding increase in dS due to depletion of all sites). Excluding very long branches was one of the reasons why we excluded taxonomic groups with few species. See lines 131-133: “For reliable estimation of substitution rates, this dataset was further downsized to 807 representative genomes as species-poor, deep-branching taxa were excluded”. Correlating present-day genome size with Ne estimates over long periods of time could weaken a potential correlation. However, exploratory analyses (not included) did not indicate that excluding long branches improved the relationship between Ne and genome size/TE content. The rationale is explained in Materials and Methods but was wrongly formulated. We rephrased it and added a reference (lines 636-638): “Estimation of dN/dS on either very long or short terminal branches might lead to loss of accuracy due to branch saturation (Weber et al. 2014) or to a higher variance of substitution rates, respectively”.

      Table 2. "Expected significant correlations are marked in bold black; significant correlations opposite to the expected trend are marked in bold red." Expected based on the initial hypothesis? Perhaps frame it as a test of the hypothesis?

      As per the comment above, we added a sentence in the main text to clarify the expected correlations for dN/dS and LHTs (lines 205-208): “In general, dN/dS is expected to scale positively with body length, age at first birth, maximum longevity, age at sexual maturity and mass, and to scale negatively with metabolic rate, population density and depth range.”. The second expected correlation is that between dN/dS and genome size/TE content, which is stated at the beginning of paragraph 2.5 (lines 244-245): “If increased genetic drift leads to TE expansions, a positive relationship between dN/dS and TE content, and more broadly with genome size, should be observed.”.

      Page 14. "Based on the available traits, the two kinds of Ne proxies analyzed here correspond in general..." the two kinds being dN/dS and a selection of LHT?

      We rephrased the sentence as such (lines 233-234): “Based on the available traits, the estimations of dN/dS ratios obtained using two different methods correspond in general to each other”.

      Table 3. Did you explain why there is a distinction between GC3-poor and GC3-rich gene sets?

      No, the explanation is missing, thank you for pointing it out. The choice comes from the observations made by Mérel et al. (2024 bioRxiv: 2024-01), who do find a stronger relationship between dN/dS and genome size in Drosophila using the same tool (Coevol) in GC3-poor genes than in GC3-rich ones or in random sets of genes exhibiting heterogeneity in GC3 content. There are several possible explanations for this. First, mixing genes with various base compositions in the same concatenate can alter the calculation of codon frequency and impair the accuracy of the model estimating substitution rates.

      Moreover, base composition and evolutionary rates may not be two independent molecular traits, at the very least in Drosophila, and more generally in species experiencing selection on codon bias. Because optimal codons are enriched in G/C bases at the third position (Duret and Mouchiroud, 1999 PNAS 96: 4482-4487), GC3-rich genes are likely to be more expressed and therefore evolve under stronger purifying selection than GC3-poor genes in Drosophila.

      Accordingly, Mérel and colleagues observed significantly higher dN/dS estimates for GC3-poor genes than for GC3-rich genes. Additionally, selection on codon usage acting on these highly expressed, GC3-rich genes violates the assumed neutrality of dS. This implies that dN/dS estimates based on genes under selection on codon bias are likely less appropriate proxies of Ne than expected.

      Although some of these observations may be specific to Drosophila, this criterion was taken into consideration as taking restricted gene subsets was required for Coevol runs. We added this explanation in materials and methods (lines 723-738).

      Page 16. "Coevol dN/dS scales negatively with genome size across the whole dataset (Slope = -0.287, adjusted-R<sup>2</sup> = 0.004, p-value = 0.039) and within insects" Should I assume that none of the other groups scale negatively on their own, but cumulatively, all of them do?

      Yes, and this is an “insect effect”: the regression on the whole dataset is negative, but it no longer is when insects are removed (with the model still being far from significant).

      Page 16. "Overall, we find no evidence for a recursive association of dN/dS with genome size and TE content across the analysed animal taxa as an effect of long-term Ne variation." I get the point, but this is starting to feel a bit circular. What you see is a lack of an association between dN/dS and TE content, but what do you mean by "as an effect of..." here? You are using dN/dS as a proxy, so the wording here feels odd.

      See the reply below.

      Page 17. I'm not sure that "effect" here is the word to use. You are looking at associations, not cause-effect relationships. Certainly, dN/dS is not causing anything; it is an effect of variation in purifying selection.

      Agreed, dN/dS is the ratio reflecting the level of purifying selection, not the cause itself. dN/dS is employed here as the independent variable in the correlation with genome size or TE content. dN/dS has an “effect” on the dependent variables in the sense that it can predict their variation, not in the sense that it is causing genome size to vary. We rephrased this and similar sentences to avoid misunderstandings (changes are highlighted in the revised text).

      Page 17. "Instead, mammalian TE content correlates positively with metabolic rate and population density, and negatively with body length, mass, sexual maturity, age at first birth and longevity." I guess I'm getting tripped up by measures of current LHTs and historical LHTs which, I'm assuming, varies considerably over the long periods of time that impact TE content evolution.

      PIC analyses can be considered as correlations on current LHTs as we compare values (or better, contrasts) at the tips of phylogenies. In the case of Coevol, traits are inferred at internal nodes, in such a way that the model should take into account the historical variation of LHTs, too.
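
      For readers less familiar with the approach, the contrasts mentioned above can be sketched with Felsenstein's (1985) algorithm on a toy tree. This is a minimal illustration rather than the code used in our analyses: the tree encoding, the function name and the trait values are hypothetical.

```python
import math


def independent_contrasts(node, trait):
    """Return (node value, extra branch length, list of standardized contrasts).

    A leaf is a species name (str). An internal node is a pair
    ((left_child, left_length), (right_child, right_length)).
    """
    if isinstance(node, str):
        return trait[node], 0.0, []
    (left, v1), (right, v2) = node
    x1, extra1, c1 = independent_contrasts(left, trait)
    x2, extra2, c2 = independent_contrasts(right, trait)
    # Branches leading to internal nodes are lengthened to account for
    # the uncertainty in the reconstructed value at those nodes.
    v1 += extra1
    v2 += extra2
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    # Value at this node: average of the children weighted by 1 / branch length.
    value = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
    extra = v1 * v2 / (v1 + v2)
    return value, extra, c1 + c2 + [contrast]


# Toy tree ((A:1, B:1):1, C:2) with hypothetical trait values at the tips.
inner = (("A", 1.0), ("B", 1.0))
root = ((inner, 1.0), ("C", 2.0))
trait = {"A": 1.0, "B": 3.0, "C": 5.0}
_, _, contrasts = independent_contrasts(root, trait)
print(contrasts)
```

      A tree with n tips yields n - 1 contrasts per trait; the contrasts of two traits are then typically regressed against each other through the origin. Only tip values enter the calculation, which is why we describe PIC as reflecting current LHTs.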

      Page 18. "positive effect of dN/dS on recent TE insertions..." Again, this is not a measure of the effect of dN/dS on TE insertions, it is a measure of correlation. I know it's shorthand, but in this case, I think it really matters that we avoid making cause inferences.

      We have rephrased this as “...very weak positive correlation of dN/dS with recent TE insertions…”.

      Page 18. "are consistent with the scenarios depicted by genome size and overall TE content in the corresponding clades." Maybe be more explicit here at the very end of the results about what those scenarios are.

      Correlating the recent TE content with dN/dS and LHTs basically recapitulates the relationship found using the other genomic traits (genome size and overall TE content). We have rephrased the closing sentence as “Therefore, the coevolution patterns between population size and recent TE content are consistent with the pictures emerging from the comparison of population size proxies with genome size and overall TE content in the corresponding clades” (lines 312-315).

      Page 19. "However, the difficulty in assembling repetitive regions..." I would say the same is true of TE content, which is almost always underestimated for the same reasons.

      “Repetitive regions” is here intended as an umbrella term including all kinds of repeats, from simple ones to transposable elements.

      Page 20. "repeat content has a lower capacity to explain size compared to other clades." Perhaps, but I'm not convinced this is not due to large numbers of low copy number elements, perhaps purged at varying rates. Are we certain that dnaPipeTE would detect these? Have rates of deletion in the various taxa examined been estimated?

      It is possible that low copy number elements are detected differently, according to the rate of decay in different species and depending also on the annotation method (indeed, low-copy families are less likely to be captured during read sampling by dnaPipeTE). A negative correlation between assembly size and deletion rate was observed in birds (Ji et al., 2022 Sci Adv 8: eabo0099). We should therefore expect a rate of TE removal inversely proportional to genome size, a positive correlation between TE content and genome size, and a negative relationship between TE content and deletion rate. However, the relationship of TE content with deletion rate and genome size appears more complex than this, even in that paper, which uses assembly-based TE annotations. Misestimations of repeat content are also potentially due to the limited capacity of dnaPipeTE to detect simple and low-complexity repeats (see comments from Reviewer #3), which might be important genomic components in birds (see a few comments below).

      Page 21. "DNA gain, and their evolutionary dynamics appear of prime importance in driving genome size variation." How about DNA loss over time?

      See response to the comment below.

      Page 22. "in the latter case, the pace of sequence erosion could be in the long run independent of drift and lead to different trends of TE retention and degradation in different lineages." Ah, I see my earlier question is addressed here. How about deletion as a driver as well?

      Deletion was not investigated here. However, deletion processes surely differ widely across animals, and their impact merits study within a comparative framework as well. Small-scale deletion events have even been proposed to counteract the increase in genome size by TE expansion (Petrov et al., 2002 Theor Popul Biol 61: 531-544). In fact, their magnitude would not be high enough to effectively counteract processes of genome expansion in most organisms (Gregory, 2004 Gene 324: 15-34). However, larger-scale deletions might play an important role in determining genome size by counterbalancing DNA gain (Kapusta et al., 2017 PNAS 114: E1460-E1469; Ji et al., 2022 Sci Adv 8: eabo0099). For the sake of space we do not delve into this issue in detail, but we do provide some perspectives on the role of deletion (see lines 518-521 and 535-541).

      Page 22. "however not surprising given the higher variation of TE load compared to the restricted genome size range." I admit, I'm struggling with this. If it isn't genes, and it isn't satellites, and it isn't TEs, what is it?

      Most birds have ~1 Gb genomes and display very low TE contents. Other studies annotated TEs in avian genome assemblies and also found only a relatively weak correlation between the amount of TEs and genome size (Ji et al., 2022 Sci Adv 8: eabo0099, Kapusta and Suh, 2016 Ann N Y Acad Sci 1389: 164-185). It is possible that TE diversity is underappreciated in birds due to the limits of the sequencing technologies used so far in resolving complex repeat-rich regions. For instance, the use of long-read technologies has revealed more extended repeated regions that were previously missed in short-read assemblies (Kapusta and Suh, 2016 Ann N Y Acad Sci 1389: 164-185). Besides, quite large fractions of the genome might indeed be satellite sequences (Edwards et al., 2025 bioRxiv: 2025-02). We added this perspective in the discussion (lines 446-455): “As previous studies find relatively weak correlations between TE content and genome size in birds (Ji et al. 2022; Kapusta and Suh 2017), it is possible for the very narrow variation of the avian genome sizes to impair the detection of consistent signals. On the other hand, it is conceivable the avian TE diversity to be underappreciated due to the limits of sequencing technologies used so far in resolving complex repeat-rich regions. For instance, employment of long-reads technologies allowed to reveal more extended repeated regions that were previously ignored with short read assemblies (Kapusta and Suh 2017; Benham et al. 2024). Besides, quite large fractions might indeed be satellite sequences constituting relevant fractions of the genome that are challenging to identify with reference- or read-based methods (Edwards et al. 2025).” See also responses to Reviewer#3’s concerns about dnaPipeTE.

      Page 24. "Our findings do not support the quantity of non-coding DNA being driven in..." Many TEs carry genes and are "coding".

      Yes. Non-coding DNA is intended here as the portion of genomes not directly involved in organisms’ functions and fitness (in other words, sequences not undergoing purifying selection). TEs do have coding parts but are for the most part molecular parasites hijacking the host’s machinery.

      Page 25. "There is some evidence of selection acting against TEs proliferation." Given that the vast majority of TEs are recognized and epigenetically silenced in most genomes, I'd say the evidence is overwhelming. Here I suspect you mean evidence for success in preventing proliferation. Actually, since we know that systems of TE silencing have a cost, it might be worth considering how the costs and benefits of these systems may have influenced overall TE content.

      We meant selection against TE proliferation in the making, notably visible at the level of genome-wide signatures for relaxed/effective selection. We rephrased it as “Evidence for signatures of negative selection against TE proliferation exist at various degrees.” (line 543).

      Reviewer #3 (Recommendations for the authors):

      Page 14: Please define GC3-rich and GC3-poor gene sets and how they were established, as well as why the analyses were conducted separately on GC3-rich and GC3-poor genes.

      We added a detailed explanation for the choice of GC3-rich and GC3-poor genes (see modified section Methods - Phylogenetic independent contrasts and Coevol reconstruction, lines 723-738).

      “Genes were selected according to their GC content at the third codon position (GC3). Indeed, mixing genes with heterogeneous base composition in the same concatenate might result in an alteration of the calculation of codon frequencies, and consequently impair the accuracy of the model estimating substitution rates (Mérel et al. 2024). Moreover, genes with different GC3 levels can reflect different selective pressures, as highly expressed genes should be enriched in optimal codons as a consequence of selection on codon usage. In Drosophila, where codon usage bias is at play, most optimal codons present G/C bases at the third position (Duret and Mouchiroud, 1999), meaning that genes with high GC3 content should evolve under stronger purifying selection than GC3-poor genes. Accordingly, Mérel et al. (2024) do find a stronger relationship between dN/dS and genome size when using GC3-poor genes, as compared to GC3-rich genes or gene concatenates of random GC3 composition. Finally, dN/dS can be influenced by GC-biased gene conversion (Bolívar et al. 2019; Ratnakumar et al. 2010), and the strength at which such substitution bias acts can be reflected by base composition. For these reasons, two sets of 50 genes with similar GC3 content were defined in order to employ genes undergoing similar evolutionary regimes.”
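
      To make the GC3 criterion concrete, here is a minimal sketch of how GC3 can be computed for a single coding sequence. The function name and the example sequence are hypothetical, and this is not the script used to build our gene sets.

```python
def gc3_content(cds):
    """Fraction of G or C bases at third codon positions of a coding sequence.

    Ambiguous bases such as N are counted as non-GC in this simple sketch.
    """
    seq = cds.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    third_positions = seq[2::3]  # every third base, starting at codon position 3
    return sum(base in "GC" for base in third_positions) / len(third_positions)


# Hypothetical four-codon sequence: ATG GCC AAA TAA -> third bases G, C, A, A.
print(gc3_content("ATGGCCAAATAA"))
```

      Ranking genes by this value is one way in which GC3-poor and GC3-rich subsets could be delimited; the procedure actually applied is the one described in the Methods section quoted above.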

      Please add lines between columns and rows in tables. Table 3 is especially difficult to follow due to its size, and lines separating columns and rows would vastly help with readability.

      We added lines delimiting cells in all the main tables.

      Throughout the text and figures, please be consistent with either scientific names or common names for lineages or clades.

      Out of the five groups, for four of them the common name is the same as the scientific one (except Aves/birds).

      Regarding the title for section 3.1, I don't believe "underrate" is the best word here. I find this title confusing.

      We replaced the term “underrate” with “underestimate” in the title.

      The authors report that read type (short vs. long) does not have a significant effect on assembly size relative to C-value. However, the authors (albeit admittedly in the discussion) removed lower-quality assemblies using a minimum N50 cutoff. Thus, this lack of read-type effect could be quite misleading. I strongly recommend the authors either remove this analysis entirely from the manuscript or report results both with and without their minimum N50 cutoff. I expect that read type should have a strong effect on assembly size relative to C-value, especially in mammals where TEs and satellites comprise ~50% of the genome.

      Yes, it's likely that if we took any short-read assembly, we would have a short-read effect. We do not mean to suggest that in general short reads produce the same assembly quality as long reads, but that in this dataset we do not need to account for the read effect in the model to predict C-values. Adding the same test including all assemblies will be very time-consuming because C-values should be manually checked as already done for the species. If we removed this test, readers might wonder whether our genome size predictions are not distorted by a short-read effect. We now make it clear that this quality filter likely has an outcome on our observations: “This suggests that the assemblies selected for our dataset can mostly provide a reliable measurement of genome size, and thus a quasi-exhaustive view of the genome architecture.” (lines 333-335).

      There seem to be some confusing inconsistencies between Supplementary Table S2 and Supplementary Figure S2. In Supplementary Table S2, the authors report ~24% of the Drosophila pectinifera genome as unknown repeats. This is not consistent with the stacked bar plot for D. pectinifera in Supplementary Figure S2.

      True, the figure is wrong, thank you for spotting the error. The plot of Supplemental Figure S2 was remade with the correct repeat proportions as in Supplementary Tables S2 and S4. Because the reference genome sizes on which TE proportions are calculated are different for the two methods, we added another supplemental figure showing the same comparison in Kbp (now Supplemental Figure S3).

      At the bottom of page 20: "many species with a high duplication score in our dataset correspond to documented duplication" How many?

      Salmoniformes (9), Acipenseriformes (1), Cypriniformes (3) out of 23 species with high duplication score. It’s detailed in the results (lines 193-196): “Of the 24 species with more than 30% of duplicated BUSCO genes, 13 include sturgeon, salmonids and cyprinids, known to have undergone whole genome duplication (Du et al. 2020; Li and Guo 2020; Lien et al. 2016), and five are dipteran species, where gene duplications are common (Ruzzante et al. 2019).”

      Top of page 21: "However, the contribution of duplicated genes to genome size is minimal compared to the one of TEs, and removing species with high duplication scores does not affect our results: this implies that duplication does not impact genome size strongly enough to explain the lack of correlation with dN/dS." This sentence is confusing and needs rewording.

      We reworded the sentence (lines 383-384): “this implies that duplication is unlikely to be the factor causing the relationship between genome size and dN/dS to deviate from the pattern expected from the MHH”.

      Beginning of section 3.3: "Our dN/dS calculation included several filtering steps by branch length and topology: indeed, selecting markers by such criteria appears to be an essential step to reconcile estimations with different methodologies" A personal communication is cited here. Are there really no peer-reviewed sources supporting this claim?

      This mainly comes from a comparison of dN/dS calculations with different methods (notably the ML method of bpp vs. the Coevol Bayesian framework) on a set of Zoonomia species. We observed that estimations from different methods were correlated but with some noise: filtering out genes with deviant topologies (using a combination of PhylteR and an unpublished Bayesian shrinkage model) further reconciled the estimations obtained from different methods. Results are not shown here, but the description of an analogous procedure is presented in Bastian, M. (2024). Génomique des populations intégrative: de la phylogénie à la génétique des populations (Doctoral dissertation, Université Lyon 1), which we added to the references.

      Figure 2 needs to be cropped to remove the vertical gray line on the right of the figure as well as the portion of visible (partly cropped) text at the top. What is the "Tree scale" in Figure 1?

      The quality of Figure 2 in the main text was adjusted. The tree scale is in amino acid substitutions; we added this to the figure legend.

      It is also unclear whether the authors used TE content or overall repeat content for their analyses.

      The overall repeat content includes both TEs and other kinds of repeats (simple repeats, low complexity repeats, satellites). The contribution of such other repeats to the total content is generally quite low for most species compared to that of TEs (only 13 genomes in the whole dataset have more than 3% of “Other” repeats). Conversely, the “Other” repeats were not included in the recent content since the divergence of a copy from its consensus sequence is pertinent only for TEs.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Overall, the data presented in this manuscript is of good quality. Understanding how cells control RPA loading on ssDNA is crucial to understanding DNA damage responses and genome maintenance mechanisms. The authors used genetic approaches to show that disrupting PCNA binding and SUMOylation of Srs2 can rescue the CPT sensitivity of rfa1 mutants with reduced affinity for ssDNA. In addition, the authors find that SUMOylation of Srs2 depends on binding to PCNA and the presence of Mec1.

      Comments on revisions:

      I am satisfied with the revisions made by the authors, which helped clarify some points that were confusing in the initial submission.

      Thank you.

      Reviewer #2 (Public Review):

      This revised manuscript mostly addresses previous concerns by doubling down on the model without providing additional direct evidence of interactions between Srs2 and PCNA, and that "precise sites of Srs2 actions in the genome remain to be determined." One additional Srs2 allele has been examined, showing some effect in combination with rfa1-zm2. Many of the conclusions are based on reasonable assumptions about the consequences of various mutations, but direct evidence of changes in Srs2 association with PCNA or other interactors is still missing. There is an assumption that a deletion of a Rad51-interacting domain or a PCNA-interacting domain has no pleiotropic effects, which may not be the case. How SLX4 might interact with Srs2 is unclear to me, again assuming that the SLX4 defect is "surgical" - removing only one of its many interactions.

      Previous studies have already provided direct evidence for the interaction between Srs2 and PCNA through Srs2’s PIM region (Armstrong et al, 2012; Papouli et al, 2005); we have added these citations in the text. Similarly, Srs2 associations with SUMO and Rad51 have also been demonstrated (Colavito et al, 2009; Kolesar et al, 2016; Kolesar et al., 2012), and these studies were cited in the text.

      We did not state that a deletion of a Rad51-interacting domain or a PCNA-interacting domain has no pleiotropic effects. We only assessed whether these previously characterized mutant alleles could mimic srs2∆ in rescuing rfa1-zm2 defects.

      We assessed the genetic interaction between slx4-RIM and srs2-∆PIM mutants, and not the physical interaction between the two proteins. As we described in the text, our rationale for this genetic test is based on reports that both slx4 and srs2 mutants impair recovery from the Mec1-induced checkpoint; thus, they may affect parallel pathways of checkpoint dampening.

      One point of concern is the use of t-tests without some sort of correction for multiple comparisons - in several figures. I'm quite sceptical about some of the p < 0.05 calls surviving a Bonferroni correction. Also in 4B, which comparison is **? Also, admittedly by eye, the changes in "active" Rad53 seem much greater than 5x. (also in Fig. 3, normalizing to a non-WT sample seems odd).

      Claims made in this work were based only on pairwise comparisons, not multiple comparisons. We have now made this point clearer in the graphs and in the Methods. As the values were compared between a wild-type strain and a specific mutant strain, or between two mutants, we believe that the t-test is suitable for statistical analysis.
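
      For readers unfamiliar with the correction the reviewer refers to, the following is a minimal sketch of a Bonferroni adjustment applied to a family of pairwise tests. The p-values are hypothetical placeholders, not values from the manuscript, and the function name `bonferroni` is introduced here only for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (raw p, adjusted p, significant?) for each test in a family.

    The adjusted p-value is min(1, p * m), where m is the number of tests;
    a test is called significant when its adjusted p-value is <= alpha.
    """
    m = len(p_values)
    results = []
    for p in p_values:
        adjusted = min(1.0, p * m)
        results.append((p, adjusted, adjusted <= alpha))
    return results


# Hypothetical p-values from three pairwise t-tests (illustration only).
for raw, adjusted, significant in bonferroni([0.004, 0.03, 0.045]):
    print(f"raw={raw:.3f} adjusted={adjusted:.3f} significant={significant}")
```

      In this toy family, only the smallest p-value remains below 0.05 after adjustment, which is the kind of outcome the reviewer's concern anticipates for borderline calls.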

      In Figure 4B, ** indicates that the WT value is significantly different from that of the slx4-RIM srs2-∆PIM double mutant and from that of the srs2-∆PIM single mutant. We have modified the graph to indicate the pairwise comparisons. The 5-fold change in active Rad53 levels was derived by comparing the values between the srs2-∆PIM slx4<sup>RIM</sup>-TAP double mutant and wild-type Slx4-TAP. In Figure 3, normalization to the lowest value affords better visualization. This is rather a stylistic issue; we would like to maintain it, as the other reviewers had no issues.

      What is the WT doubling time for this strain? From the FACS it seems as if in 2 h the cells have completed more than 1 complete cell cycle. Also in 5D. Seems fast...

      Wild-type W303 strain has less than 90 min doubling time as shown by many labs, and our data are consistent with this. The FACS profiles for wild-type cells shown in Figures 3C, 4C, and 5C are consistent with each other, showing that after G1 cells entered the cell cycle, they were in G2 phase at the 1-hour time points, and then a percentage of the cells exited the first cell cycle by two hours.

      I have one over-arching confusion. Srs2 was shown initially to remove Rad51 from ssDNA and the suppression of some of srs2's defects by deleting rad51 made a nice, compact story, though exactly how srs2's "suppression of rad6" fit in isn't so clear (since Rad6 ties into Rad18 and into PCNA ubiquitylation and into PCNA SUMOylation). Now Srs2 is invoked to remove RPA. It seems to me that any model needs to explain how Srs2 can be doing both. I assume that if RPA and Rad51 are both removed from the same ssDNA, the ssDNA will be "trashed" as suggested by Symington's RPA depletion experiments. So building a model that accounts for selective Srs2 action at only some ssDNA regions might be enhanced by also explaining how Rad51 fits into this scheme.

      While the anti-recombinase function of Srs2 was better studied, its “anti-RPA” role in checkpoint dampening was recently described by us (Dhingra et al, 2021) following the initial report by the Haber group some time ago (Vaze et al, 2002). A better understanding of this new role is required before we can generate a comprehensive picture of how Srs2 integrates the two functions (and possibly other functions). Our current work addresses this issue by providing a more detailed understanding of this new role of Srs2.

      Single-molecule data showed that Srs2 strips both RPA and Rad51 from ssDNA, but this effect is highly dynamic (i.e. RPA and Rad51 can rebind ssDNA after being displaced) (De Tullio et al, 2017). As such, the generation of “deserted” ssDNA regions lacking both RPA and Rad51 in cells is likely to be a rare event. Rather, Srs2 can foster RPA and Rad51 dynamics on ssDNA. Additional studies will be needed to generate a model that integrates the anti-recombinase and the anti-RPA roles of Srs2.

      As a previous reviewer has pointed out, CPT creates multiple forms of damage. Foiani showed that 4NQO would activate the Mec1/Rad53 checkpoint in G1-arrested cells, presumably because there would be single-strand gaps but no DSBs. Whether this would be a way to look specifically at one type of damage is worth considering; but UV might be a simpler way to look. As also noted, the effects on the checkpoint and on viability are quite modest. Because it isn't clear (at least to me) why rfa1 mutants are so sensitive to CPT, it's hard for me to understand how srs2-zm2 has a modest suppressive effect: is it by changing the checkpoint response or facilitating repair or both? Or how srs2-3KR or srs2-dPIM differ from rfa1-zm2 in this respect. The authors seem to lump all these small suppressions under the rubric of "proper levels of RPA-ssDNA" but there are no assays that directly get at this. This is the biggest limitation.

      CPT treatment is an ideal condition to examine how cells dampen the DNA damage checkpoint, because while most genotoxic conditions (e.g. 4NQO, MMS) induce both the DNA replication checkpoint and the DNA damage checkpoint, CPT was shown to induce only the latter (Menin et al, 2018; Minca & Kowalski, 2011; Redon et al, 2003; Tercero et al, 2003). Future studies examining 4NQO and UV conditions can further expand our understanding of checkpoint dampening in different conditions.

      We have previously provided evidence to support the conclusion that srs2 suppression of rfa1-zm is partly mediated by changing checkpoint levels (Dhingra et al., 2021). We cannot exclude the possibility that the suppression may also be related to changes of DNA repair; we have now added this note in the text.

      Regarding direct testing of RPA levels on DNA, we have previously shown that srs2∆ increased the levels of chromatin-associated Rfa1 and that this is suppressed by rfa1-zm2 (Dhingra et al., 2021). We have now included chromatin fractionation data to show that srs2-∆PIM also led to an increase of Rfa1 on chromatin, and this was suppressed by rfa1-zm2 (new Fig. S2).

      Srs2 has also been implicated as a helicase in dissolving "toxic joint molecules" (Elango et al. 2017). Whether this activity is changed by any of the mutants (or by mutations in Rfa1) is unclear. In their paper, Elango writes: "Rare survivors in the absence of Srs2 rely on structure-specific endonucleases, Mus81 and Yen1, that resolve toxic joint-molecules" Given the involvement of SLX4, perhaps the authors should examine the roles of structure-specific nucleases in CPT survival?

      Srs2 has several roles, and its role in RPA antagonism can be genetically separated from its role in Rad51 regulation, as we have shown in our previous work (Dhingra et al., 2021); this notion is further supported by evidence presented in the current work. Srs2’s role in dissolving “toxic joint molecules” was mainly observed during BIR (Elango et al, 2017). Whether it is related to checkpoint dampening will be interesting to address in the future but is beyond the scope of the current work, which seeks to answer the question of how Srs2 regulates RPA during checkpoint dampening. Similarly, determining the roles of Mus81, Yen1, and other structure-specific nucleases in CPT survival is a worthwhile task, but it is a research topic well separated from the focus of this work.

      Experiments that might clarify some of these ambiguities are proposed to be done in the future. For now, we have a number of very interesting interactions that may be understood in terms of a model that supposes discriminating among gaps and ssDNA extensions by the presence of PCNA, perhaps modified by SUMO. As noted above, it would be useful to think about the relation to Rad6.

      Several studies have shown that Srs2’s functional interaction with Rad6 is based on Srs2-mediated recombination regulation (reviewed by Niu & Klein, 2017). Given that recombinational regulation by Srs2 is genetically separable from the Srs2 and RPA antagonism (Dhingra et al., 2021), we do not see a strong rationale to examine Rad6 in this work, which addresses how Srs2 regulates RPA. With this said, this study has provided a basis for future studies of possible cross-talk among different Srs2-mediated pathways.

      Reviewer #3 (Public Review):

      The superfamily I 3'-5' DNA helicase Srs2 is well known for its role as an anti-recombinase, stripping Rad51 from ssDNA, as well as an anti-crossover factor, dissociating extended D-loops and favoring non-crossover outcome during recombination. In addition, Srs2 plays a key role in ribonucleotide excision repair. Besides DNA repair defects, srs2 mutants also show a reduced recovery after DNA damage that is related to its role in downregulating the DNA damage signaling or checkpoint response. Recent work from the Zhao laboratory (PMID: 33602817) identified a role of Srs2 in downregulating the DNA damage signaling response by removing RPA from ssDNA. This manuscript reports further mechanistic insights into the signaling downregulation function of Srs2.

      Using the genetic interaction with mutations in RPA1, mainly rfa1-zm2, the authors test a panel of mutations in Srs2 that affect CDK sites (srs2-7AV), potential Mec1 sites (srs2-2SA), known sumoylation sites (srs2-3KR), Rad51 binding (delta 875-902), PCNA interaction (delta 1159-1163), and SUMO interaction (srs2SIMmut). All mutants were generated by genomic replacement and the expression level of the mutant proteins was found to be unchanged. This alleviates some concern about the use of deletion mutants compared to point mutations. Double mutant analysis identified that PCNA interaction and SUMO sites were required for the Srs2 checkpoint dampening function, at least in the context of the rfa1-zm2 mutant. There was no effect of these mutants in an RFA1 wild-type background. This latter result is likely explained by the activity of the parallel pathway of checkpoint dampening mediated by Slx4, and genetic data with an Slx4 point mutation affecting Rtt107 interaction and checkpoint downregulation support this notion. Further analysis of Srs2 sumoylation showed that Srs2 sumoylation depended on PCNA interaction, suggesting sequential events of Srs2 recruitment by PCNA and subsequent sumoylation. Kinetic analysis showed that sumoylation peaks after maximal Mec1 induction by DNA damage (using the Top1 poison camptothecin (CPT)) and depended on Mec1. These data are consistent with a model that Mec1 hyperactivation is ultimately leading to signaling downregulation by Srs2 through Srs2 sumoylation. Mec1-S1964 phosphorylation, a marker for Mec1 hyperactivation and a site found to be needed for checkpoint downregulation after DSB induction, did not appear to be involved in checkpoint downregulation after CPT damage. The data are in support of the model that Mec1 hyperactivation when targeted to RPA-covered ssDNA by its Ddc2 (human ATRIP) targeting factor, favors Srs2 sumoylation after Srs2 recruitment to PCNA to disrupt the RPA-Ddc2-Mec1 signaling complex. Presumably, this allows gap filling and disappearance of long-lived ssDNA as the initiator of checkpoint signaling, although the study does not extend to this step.

      Strengths:

      (1) The manuscript focuses on the novel function of Srs2 to downregulate the DNA damage signaling response and provide new mechanistic insights.

      (2) The conclusions that PCNA interaction and ensuing Srs2-sumoylation are involved in checkpoint downregulation are well supported by the data.

      Weaknesses:

      (1) Additional mutants of interest could have been tested, such as the recently reported Pin mutant, srs2-Y775A (PMID: 38065943), and the Rad51 interaction point mutant, srs2-F891A (PMID: 31142613).

      (2) The use of deletion mutants for PCNA and RAD51 interaction is inferior to using specific point mutants, as done for the SUMO interaction and the sites for post-translational modifications.

      (3) Figure 4D and Figure 5A report data with standard deviations, which is unusual for n=2. Maybe the individual data points could be plotted with a color for each independent experiment to allow the reader to evaluate the reproducibility of the results.
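
      To make point (3) concrete: with only two measurements, the sample standard deviation is fully determined by the distance between them (it equals |x1 - x2| / √2), so an error bar conveys no more than the two points themselves. The following is a minimal sketch with hypothetical values, not data from the manuscript; `sd_of_pair` is a name introduced here for illustration.

```python
import math
import statistics


def sd_of_pair(x1: float, x2: float) -> float:
    """Sample standard deviation (n - 1 denominator) of two measurements."""
    return statistics.stdev([x1, x2])


# Hypothetical duplicate measurements (illustration only).
x1, x2 = 1.0, 1.4
print(sd_of_pair(x1, x2))
print(abs(x1 - x2) / math.sqrt(2))  # same value up to floating-point rounding
```

      This is why plotting the two individual data points, as suggested, shows the reader everything the standard deviation would.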

      Comments on revisions:

      In this revision, the authors adequately addressed my concerns. The only issue I see remaining is the site of Srs2 action. The authors argue in favor of gaps and against R-loops and ssDNA resulting from excessive supercoiling. The authors do not discuss ssDNA resulting from processing of one-sided DSBs, which are expected to result from replication run-off after CPT damage but are not expected to provide the 3'-junction for preferred PCNA loading. Can the authors exclude PCNA at the 5'-junction at a resected DSB?

      We have now added a sentence stating that we cannot exclude the possibility that PCNA may be positioned at a 5’-junction, as this can be observed in vitro, although PCNA loading was seen exclusively at a 3’-junction in the presence of RPA (Ellison & Stillman, 2003; Majka et al, 2006).

      Recommendations For the authors:

      Reviewer #2 (Recommendations For the authors):

      A Bonferroni correction should be made for the multiple comparisons in several figures.

      Specific comments:

      l. 41. This is a too long and confusing sentence.

      Sentence shortened: “These data suggest that Srs2 recruitment to PCNA proximal ssDNA-RPA filaments followed by its sumoylation can promote checkpoint recovery, whereas Srs2 action is minimized at regions with no proximal PCNA to permit RPA-mediated ssDNA protection”.

      l. 60. Identify Ddc2 and Mec1 as ATRIP and ATR.

      Done.

      l. 125 "fails to downregulate RPA levels on chromatin and Mec1-mediated DDC..." fails to downregulate RPA and fails to reduce Mec1-mediated DDC?

      Sentence modified: “fails to downregulate both the RPA levels on chromatin and the Mec1-mediated DDC”

      l. 204 "consistent with the notion that Srs2 has roles beyond RPA regulation"... What other roles? It's stripping of Rad51? Removing toxic joint molecules? Something else?

      Sentence modified: “consistent with the notion that Srs2 has roles beyond RPA regulation, such as in Rad51 regulation and removing DNA joint molecules”.

      l. 249 "Significantly, srs2-ΔPIM and -3KR increased the percentage of rfa1-zm2 cells transitioning into the G1 phase" No. Just back to normal. As stated in l. 258: "258 We found that srs2-ΔPIM and srs2-3KR mutants on their own behaved normally in the two DDC assays described above." All of these effects are quite small.

      Sentence modified: “Compared with rfa1-zm2 cells, srs2-∆PIM rfa1-zm2 and srs2-3KR rfa1-zm2 cells showed increased percentages of cells transitioning into the G1 phase”.

      l. 468 "Our previous work has provided several lines of evidence to support that Rad51 removal by Srs2 is separable from the Srs2-RPA antagonism (Dhingra et al., 2021). What evidence? See my comment above about not having both proteins removed at the same time.

      We have addressed this point in our initial rebuttal, and some key points are summarized below. In our previous report (Dhingra et al., 2021), we provided several lines of evidence to support the conclusion that Rad51 is not relevant to the Srs2-RPA antagonism. For example, while rad51∆ rescues the hyper-recombination phenotype of srs2∆ cells, rad51∆ did not affect the hyper-checkpoint phenotype of srs2∆. In contrast, rfa1-zm1/zm2 have the opposite effects, that is, rfa1-zm1/zm2 suppressed the hyper-checkpoint, but not the hyper-recombination, phenotype of srs2∆ cells. The differential effects of rad51∆ and rfa1-zm1/zm2 were also seen for the ATPase-dead allele of Srs2 (srs2-K41A). For example, rfa1-zm2 rescued the hyper-checkpoint and CPT sensitivity of srs2-K41A cells, while rad51∆ had neither effect. These and other data described by Dhingra et al (2021) suggest that Srs2’s effects on checkpoint vs. recombination can be separated genetically. Consistent with our conclusion summarized above, deleting the Rad51 binding domain in Srs2 (srs2-∆Rad51BD) has no effect on the rfa1-zm2 phenotype in CPT (Fig. 2D). These data provide further evidence that Srs2 regulation of Rad51 is separable from the Srs2-RPA antagonism.

      l. 525 "possibility, we tested the separation pin of Srs2 (Y775), which was shown to enables its in vitro helicase activity during the revision of our work..." ?? there was helicase activity during the revision of your work? Please fix the sentence.

      Sentence modified: “we tested the separation pin of Srs2 (Y775). This residue was shown to be key for Srs2’s helicase activity in vitro in a report that was published during the revision of our work (Meir et al, 2023).”

      Fig. 3. "srs2-ΔPIM and -3KR allow better G1 entry of rfa1-zm2 cells." is it better entry or less arrest at G2/M? One implies better turning off of a checkpoint, the other suggests less activation of the checkpoint.

      This is a correct statement. For all strains examined in Figure 3, cells were seen in G2/M phase after 1-hour CPT treatment, suggesting proper arrest.

      References:

      Armstrong AA, Mohideen F, Lima CD (2012) Recognition of SUMO-modified PCNA requires tandem receptor motifs in Srs2. Nature 483: 59-63

      Colavito S, Macris-Kiss M, Seong C, Gleeson O, Greene EC, Klein HL, Krejci L, Sung P (2009) Functional significance of the Rad51-Srs2 complex in Rad51 presynaptic filament disruption. Nucleic Acids Res 37: 6754-6764

      De Tullio L, Kaniecki K, Kwon Y, Crickard JB, Sung P, Greene EC (2017) Yeast Srs2 helicase promotes redistribution of single-stranded DNA-bound RPA and Rad52 in homologous recombination regulation. Cell Rep 21: 570-577

      Dhingra N, Kuppa S, Wei L, Pokhrel N, Baburyan S, Meng X, Antony E, Zhao X (2021) The Srs2 helicase dampens DNA damage checkpoint by recycling RPA from chromatin. Proc Natl Acad Sci U S A 118: e2020185118

      Elango R, Sheng Z, Jackson J, DeCata J, Ibrahim Y, Pham NT, Liang DH, Sakofsky CJ, Vindigni A, Lobachev KS et al (2017) Break-induced replication promotes formation of lethal joint molecules dissolved by Srs2. Nat Commun 8: 1790

      Ellison V, Stillman B (2003) Biochemical characterization of DNA damage checkpoint complexes: clamp loader and clamp complexes with specificity for 5' recessed DNA. PLoS Biol 1: E33

      Kolesar P, Altmannova V, Silva S, Lisby M, Krejci L (2016) Pro-recombination Role of Srs2 Protein Requires SUMO (Small Ubiquitin-like Modifier) but Is Independent of PCNA (Proliferating Cell Nuclear Antigen) Interaction. J Biol Chem 291: 7594-7607

      Kolesar P, Sarangi P, Altmannova V, Zhao X, Krejci L (2012) Dual roles of the SUMO-interacting motif in the regulation of Srs2 sumoylation. Nucleic Acids Res 40: 7831-7843

      Majka J, Binz SK, Wold MS, Burgers PM (2006) Replication protein A directs loading of the DNA damage checkpoint clamp to 5'-DNA junctions. J Biol Chem 281: 27855-27861

      Meir A, Raina VB, Rivera CE, Marie L, Symington LS, Greene EC (2023) The separation pin distinguishes the pro- and anti-recombinogenic functions of Saccharomyces cerevisiae Srs2. Nat Commun 14: 8144

      Menin L, Ursich S, Trovesi C, Zellweger R, Lopes M, Longhese MP, Clerici M (2018) Tel1/ATM prevents degradation of replication forks that reverse after Topoisomerase poisoning. EMBO Rep 19: e45535

      Minca EC, Kowalski D (2011) Replication fork stalling by bulky DNA damage: localization at active origins and checkpoint modulation. Nucleic Acids Res 39: 2610-2623

      Niu H, Klein HL (2017) Multifunctional roles of Saccharomyces cerevisiae Srs2 protein in replication, recombination and repair. FEMS Yeast Res 17: fow111

      Papouli E, Chen S, Davies AA, Huttner D, Krejci L, Sung P, Ulrich HD (2005) Crosstalk between SUMO and ubiquitin on PCNA is mediated by recruitment of the helicase Srs2p. Mol Cell 19: 123-133

      Redon C, Pilch DR, Rogakou EP, Orr AH, Lowndes NF, Bonner WM (2003) Yeast histone 2A serine 129 is essential for the efficient repair of checkpoint-blind DNA damage. EMBO Rep 4: 678-684

      Tercero JA, Longhese MP, Diffley JFX (2003) A central role for DNA replication forks in checkpoint activation and response. Mol Cell 11: 1323-1336

      Vaze MB, Pellicioli A, Lee SE, Ira G, Liberi G, Arbel-Eden A, Foiani M, Haber JE (2002) Recovery from checkpoint-mediated arrest after repair of a double-strand break requires Srs2 helicase. Mol Cell 10: 373-385

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Jiao D et al reported the induction of synthetic lethality by combined inhibition of anti-apoptotic BCL-2 family proteins and WSB2, a substrate receptor in the CRL5 ubiquitin ligase complex. Mechanistically, WSB2 interacts with NOXA to promote its ubiquitylation and degradation. Cancer cells deficient in WSB2, as well as heart and liver tissues from Wsb2-/- mice, exhibit high susceptibility to apoptosis induced by inhibitors of BCL-2 family proteins. The anti-apoptotic activity of WSB2 is partially dependent on NOXA.

      Overall, the finding, that WSB2 disruption triggers synthetic lethality to BCL-2 family protein inhibitors by destabilizing NOXA, is rather novel. The manuscript is largely hypothesis-driven, with experiments that are adequately designed and executed. However, there are quite a few issues for the authors to address, including those listed below.

      Specific comments:

      (1) At the beginning of the Results section, a clear statement is needed as to why the authors are interested in WSB2 and what brought them to analyze "the genetic co-dependency between WSB2 and other proteins".

      We thank the reviewer for raising this important point. We agree that a clear rationale should be provided at the beginning of the Results section. As reported in previous studies [Ref: 1, 2, 3], strong synthetic interactions have been observed between WSB2 and several mitochondrial apoptosis-related factors, including MCL-1, BCL-xL, and MARCH5. We have referenced these findings in the Discussion section. Motivated by these studies, we became interested in the role of WSB2 and aimed to investigate the specific mechanisms underlying its synthetic lethality with anti-apoptotic BCL-2 family members. We will revise the beginning of the Results section to clearly state this rationale.

      (1) McDonald, E.R., 3rd et al. Project DRIVE: A Compendium of Cancer Dependencies and Synthetic Lethal Relationships Uncovered by Large-Scale, Deep RNAi Screening. Cell 170, 577-592 e510 (2017).

      (2) DeWeirdt, P.C. et al. Genetic screens in isogenic mammalian cell lines without single cell cloning. Nat Commun 11, 752 (2020).

      (3) DeWeirdt, P.C. et al. Optimization of AsCas12a for combinatorial genetic screens in human cells. Nat Biotechnol 39, 94-104 (2021).

      (2) In general, the biochemical evidence supporting the role of WSB2 as a SOCS box-containing substrate-binding receptor of CRL5 E3 in promoting NOXA ubiquitylation and degradation is relatively weak. First, since NOXA binds to WSB2 on its SOCS box, which consists of a BC box for Elongin B/C binding and a CUL5 box for CUL5 binding, it is crucial to determine whether the binding of NOXA on the SOCS box affects the formation of CRL5<sup>WSB2</sup> complex. The authors should demonstrate the endogenous binding between NOXA and the CRL5<sup>WSB2</sup> complex. Additionally, the authors may also consider manipulating CUL5, SAG, or ElonginB/C to assess if it would affect NOXA protein turnover in two independent cell lines.

      We thank the reviewer for raising this important point. To determine whether endogenous NOXA binds to the intact CRL5<sup>WSB2</sup> complex, we performed co-immunoprecipitation assays using an antibody against NOXA. Indeed, NOXA co-immunoprecipitated with all subunits of the CRL5<sup>WSB2</sup> complex (Figure 2—figure supplement 1D), suggesting that NOXA binding to WSB2 does not disrupt interactions between WSB2 and the other CRL5 subunits. Moreover, depletion of CRL5 complex components (RBX2/SAG, CUL5, ELOB, or ELOC) through siRNAs in C4-2B or Huh-7 cells also resulted in a marked increase in NOXA protein levels.

      Second, in all the experiments designed to detect NOXA ubiquitylation in cells, the authors utilized immunoprecipitation (IP) with FLAG-NOXA/NOXA, followed by immunoblotting (IB) with HA-Ub. However, it is possible that the observed poly-Ub bands could be partly attributed to the ubiquitylation of other NOXA binding proteins. Therefore, the authors need to consider performing IP with HA-Ub and subsequently IB with NOXA. Alternatively, they could use Ni-beads to pull down all His-Ub-tagged proteins under denaturing conditions, followed by the detection of FLAG-tagged NOXA using anti-FLAG Ab. The authors are encouraged to perform one of these suggested experiments to exclude the possibility of this concern. Furthermore, an in vitro ubiquitylation assay is crucial to conclusively demonstrate that the polyubiquitylation of NOXA is indeed mediated by the CRL5<sup>WSB2</sup> complex.

      We appreciate the reviewer for raising these important considerations regarding our ubiquitylation assays. We fully acknowledge the reviewer's concern that classical ubiquitination assays could potentially detect ubiquitination of proteins interacting with NOXA. However, we would like to clarify that our experimental conditions effectively mitigate this issue. Specifically, cells were lysed using buffer containing 1% SDS followed by boiling at 105°C for 5 minutes. These rigorous denaturing conditions ensure disruption of non-covalent protein interactions, thereby effectively eliminating the possibility of detecting ubiquitination signals from NOXA-associated proteins.

      Regarding the suggestion to perform an in vitro ubiquitination assay, we agree this experiment would indeed provide additional evidence. However, due to significant technical complexities associated with reconstituting CRL5-based E3 ubiquitin ligase activity in vitro—which would require the expression and purification of at least six recombinant proteins—such experiments are rarely performed in this context. Furthermore, NOXA is uniquely localized as a membrane protein on the mitochondrial outer membrane, posing additional significant challenges for protein expression and purification. Given the robustness of our current in vivo ubiquitylation assay under stringent denaturing conditions, we believe our existing data sufficiently and conclusively demonstrate NOXA ubiquitination mediated by the CRL5<sup>WSB2</sup> complex.

      (3) In their attempt to map the binding regions between NOXA and WSB2, the authors utilized exogenous proteins of both WSB2 and NOXA. To strengthen their findings, it would be more convincing to perform IP with exogenous wt/mutant WSB2 or NOXA and subsequently perform IB to detect endogenous NOXA or WSB2, respectively. Additionally, an in vitro binding assay using purified proteins would provide further evidence of a direct binding between NOXA and WSB2.

      We thank the reviewer for raising these important issues. In response to the reviewer’s suggestion to map the binding regions between NOXA and WSB2 more convincingly, we have indeed performed semi-endogenous Co-IP assays, which yielded results consistent with our exogenous protein experiments (Figure 3—figure supplement 1A, B). Concerning the recommendation to further validate direct interaction using purified recombinant proteins, we encountered substantial technical difficulties in obtaining pure and soluble recombinant WSB2 protein. Additionally, given that NOXA is an outer mitochondrial membrane protein and the interaction occurs on mitochondria, we believe that an in vitro binding assay may have limited physiological relevance. We hope the reviewer can appreciate these practical challenges and our current evidence supporting the strong interaction between NOXA and WSB2.

      Reviewer #2 (Public Review):

      Summary:

      Exploring the DEP-MAP database and two drug-screen databases, the authors identify WSB2 as an interactor of several BCL2 proteins. In follow-up experiments, they show that CRL5/WSB2 controls NOXA protein levels via K48 ubiquitination following direct protein-protein interaction, and cell death sensitivity in the context of BH3 mimetic treatment, where WSB2 depletion synergizes with drug treatment.

      Strengths:

      The authors use a set of orthogonal methods across different model cell lines and a new WSB2 KO mouse model to confirm their findings. They also manage to correlate WSB2 expression with poor prognosis in prostate and liver cancer, supporting the idea that targeting WSB2 may sensitize cancers for treatment with BH3 mimetics.

      Weaknesses:

      The conclusions drawn based on the findings in cancer patients are very speculative, as regulation of NOXA cannot be the sole function of CRL5/WSB2 and it is hence unclear what causes correlation with patient survival. Moreover, the authors do not provide a clear mechanistic explanation of how exactly higher levels of NOXA promote apoptosis in the absence of WSB2. This would be important knowledge, as usually high NOXA levels correlate with high MCL1, as they are turned over together, but in situations like this, or loss of other E3 ligases, such as MARCH, the buffering capacity of MCL1 is outrun, allowing excess NOXA to kill (likely by neutralizing other BCL2 proteins it usually does not bind to, such as BCLX). Moreover, a necroptosis-inducing role of NOXA has been postulated. Neither of these options is interrogated here.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 2J. The authors showed that "the mRNA levels of NOXA were even reduced in WSB2-KO cells compared to parental cells". What is the possible mechanism? This point should at least be discussed.

      We thank the reviewer for raising these important issues. The underlying mechanisms for the significantly lower mRNA levels of NOXA following the KO of WSB2 are not fully understood at present. However, we propose that this could represent a form of negative feedback regulation at the level of gene expression. Specifically, when the protein levels of BNIP3/3L rise sharply, it may activate mechanisms that suppress their own mRNA synthesis or stability, serving as a buffering system to prevent further protein accumulation. Such negative feedback loops may be critical for maintaining cellular homeostasis and avoiding excessive protein production. Moreover, this phenomenon is frequently observed in other studies investigating substrates targeted by E3 ubiquitin ligases for degradation. We have elaborated on this point in the Discussion section.

      (2) Figure 2M. A previous study has clearly demonstrated that NOXA is subjected to ubiquitylation and degradation by CRL5 E3 ligase (PMID: 27591266). This paper should be cited. Also, in that publication, NOXA ubiquitylation is via the K11 linkage, not the K48 linkage. The authors should include K11R mutant in their assay.

      We thank the reviewer for raising this important issue and for suggesting the relevant reference (PMID: 27591266), which we have now cited accordingly. Additionally, we would like to clarify that our new in vivo ubiquitination assays included the K11R and K11-only ubiquitin mutants, and our data demonstrate that WSB2-mediated NOXA ubiquitination indeed involves the K11 linkage (Figure 2—figure supplement 1E).

      (3) Figure 3H, J. The authors stated, "By mutating these lysine residues to arginine, we found that WSB2-mediated NOXA ubiquitination was completely abolished". Which one of the three lysine residues is playing the dominant role?

      We thank the reviewer for raising this important issue. To address this, we generated FLAG-NOXA mutants individually substituting lysine residues K35, K41, and K48 with arginine. In vivo ubiquitination assays demonstrated that lysine 48 (K48) is the predominant residue responsible for WSB2-mediated NOXA ubiquitination (Figure 3—figure supplement 1C).

      (4) Figure 3N. The authors need to show that the fusion peptide containing C-terminal NOXA peptide competitively inhibits the interaction between endogenous WSB2 and NOXA and extends the protein half-life of NOXA, leading to NOXA accumulation.

      We sincerely thank the reviewer for raising these important issues. As suggested, we investigated whether the fusion peptide containing the C-terminal NOXA sequence competitively disrupts the interaction between endogenous WSB2 and NOXA, subsequently influencing NOXA stability. Our results demonstrated that treatment with this fusion peptide indeed significantly reduced the endogenous interaction between WSB2 and NOXA (Figure 3—figure supplement 1D). Furthermore, we observed that the peptide dose-dependently increased endogenous NOXA protein levels and prolonged its protein half-life, thereby resulting in the accumulation of NOXA (Figure 3N; Figure 3—figure supplement 1E, F). These findings collectively indicate that the fusion peptide competitively inhibits the WSB2-NOXA interaction, stabilizes NOXA protein, and enhances its accumulation.

      (5) Figure 4. a) It would be better to investigate whether WSB2 knockdown can sensitize cancer cells to the treatment with ABT-737 or AZD5991, evidenced by a decrease in both IC50 values and clonogenic survival rates and whether such sensitization is dependent on NOXA. b) The authors need to show the levels of cleaved caspase-3/7/9 and the percentages of apoptotic cells in shNC cells upon silencing of WSB2 in Figure 4A-F. c) It will be more convincing to repeat the experiment to show synthetic lethality by WSB2 disruption and MCL-1 inhibitor AZD5991 treatment using another cell line, such as WSB2-deficient Huh-7 cells in Figure 4 I&J.

      We sincerely thank the reviewer for these valuable and constructive suggestions. Regarding point (a): We believe that our current Western blot and flow cytometry data (Figure 4G–L) have already provided strong evidence that WSB2 depletion enhances apoptosis in response to ABT-737 and AZD5991. Therefore, we consider that additional IC50 and clonogenic survival assays, while informative, may not be essential for supporting our conclusion. Furthermore, as shown in Figure 5A–F, we found that silencing NOXA largely, though not completely, reversed the enhanced apoptosis triggered by these inhibitors in WSB2-deficient cells, suggesting that the sensitization effect is at least partially dependent on NOXA.

      Regarding point (b): We have shown that WSB2 knockout alone had no impact on the levels of cleaved caspase-3/7/9 or the percentages of apoptotic cells in Huh-7 and C4-2B cells (Figure 4G-L and Figure 4—figure supplement 1A-D), indicating that WSB2 loss does not induce apoptosis on its own under basal conditions.

      Regarding point (c): We appreciate the reviewer’s suggestion and have now repeated the experiment in WSB2 knockout Huh-7 cells. The new results further support the synthetic lethality between WSB2 loss and AZD5991 treatment (Figure 4—figure supplement 1C, D).

      (6) Figure 5A/C/E. The effect of siNOXA is minor, if any, for cleavage of caspases. The same thing for Figure 6F/H.

      We appreciate the reviewer’s insightful observation regarding the relatively modest effect of shNOXA on caspase cleavage in Figures 5A/C/E and Figures 6F/H. Indeed, we acknowledge that the reduction in caspase cleavage following NOXA knockdown is moderate. However, consistent with our discussions in the manuscript, NOXA knockdown significantly—but not completely—rescued the increased apoptosis observed in WSB2-deficient cells treated with BCL-2 family inhibitors. This suggests that while NOXA plays a notable role, additional mechanisms or unidentified targets may also be involved in WSB2-mediated regulation of apoptosis.

      (7) Figure 5 I&J. The authors may consider performing IHC staining, immunofluorescence, or WB analysis to show the levels of NOXA and cleaved caspases or PARP in xenograft tumors. This would provide in vivo evidence of significant apoptosis induction resulting from the co-administration of ABT-737 and R8-C-terminal NOXA peptide.

      We appreciate the reviewer's thoughtful suggestion regarding additional immunohistochemical or immunofluorescence analyses in xenograft tumors. However, due to current limitations in available antibodies suitable for reliable detection of NOXA by IHC and IF, we are unable to perform these experiments. We greatly appreciate the reviewer's understanding of this technical constraint. Nevertheless, our existing data collectively support the conclusion that the combination of ABT-737 and R8-C-terminal NOXA peptide significantly enhances apoptosis in vivo.

      (8) Figure 7. Does an inverse correlation exist between the protein levels of WSB2 and NOXA in PRAD or LIHC tissue microarrays? On page 12, in the first paragraph, Figure 7M-P was cited incorrectly.

      We sincerely thank the reviewer for raising this important issue. As mentioned above, due to current limitations regarding the availability of suitable antibodies that can reliably detect NOXA by IHC, we regret that it is not feasible to experimentally address this question at this time.

      Additionally, we have carefully corrected the citation error involving Figure 7M-P on page 12, as pointed out by the reviewer.

      (9) Figure S1D. BCL-W levels were reduced upon WSB2 overexpression, which should be acknowledged.

      We sincerely thank the reviewer for raising this important issue. We acknowledge that BCL-W protein levels were slightly reduced upon WSB2 overexpression in Figure S1D. However, this effect is distinct from the pronounced reduction observed in NOXA protein levels. We have revised the manuscript to clarify this point. Additionally, we recognize that transient overexpression systems may occasionally lead to non-specific or artifactual changes. Our exogenous expression and co-immunoprecipitation experiments did not support an interaction between BCL-W and WSB2. Therefore, the observed reduction of BCL-W under these conditions may not reflect a physiologically relevant regulation.

      (10) Figure S4. Given that WSB2 KO mice are viable, the authors may consider determining whether these mice are more sensitive to radiation-induced tissue damage but more resistant to radiation-induced tumorigenesis.

      We sincerely thank the reviewer for this insightful and biologically meaningful suggestion. We agree that investigating the potential role of WSB2 in radiation-induced tissue damage and tumorigenesis would be of great interest. However, conducting such experiments requires access to specialized irradiation facilities, which are currently unavailable to us. Nevertheless, we recognize the value of this line of investigation and plan to explore it in our future studies.

      (11) All data were displayed as mean ± SD. However, for data from three independent experiments, it is more appropriate to present the results as mean ± SEM, not mean ± SD.

      We sincerely thank the reviewer for highlighting this important issue. In line with the reviewer's suggestion, we have revised the manuscript accordingly and now present data from three independent experiments as mean ± SEM.

      (12) The figure legends require careful review: i) The low dose of ABT-199 (Figure 6H) and the dose of ABT-199 used in Figure 6I are missing. ii) The legends for Figure S1D-E are incorrect. iii) The name of the antibody in the legend of Figure S3C is incorrect.

      We sincerely thank the reviewer for raising these important issues. We have carefully corrected all the errors mentioned. In addition, we have thoroughly reviewed the manuscript to prevent similar errors.

      Reviewer #2 (Recommendations For The Authors):

      The authors focus on NOXA, after initially identifying WSB2 to interact with several BCL2 proteins. The rationale behind this is that WSB2 depletion or overexpression affects NOXA levels, but none of the other BCL2 proteins tested, as stated in the text. Yet, BCLW is also depleted upon overexpression of WSB2 (Supplementary Figure 1). How does this phenomenon relate to the sensitization noted, is BCL-W higher in WSB2 KO cells? It does not seem so though. This warrants discussion.

      We appreciate the reviewer for raising this important issue. Our results showed that overexpression of WSB2 markedly reduced NOXA levels, while the levels of other BCL-2 family proteins, such as BCL-W, remained unaffected or minimally affected (Figure 2—figure supplement 1A). Furthermore, depletion of WSB2 through shRNA-mediated KD or CRISPR/Cas9-mediated KO in C4-2B cells or Huh-7 cells led to a marked increase in the steady-state levels of endogenous NOXA, without affecting the other BCL-2 family proteins examined, including BCL-W (Figure 2A-C, Figure 2—figure supplement 2A, B).

      If WSB2 depletion does not affect MCL1 levels, how does excess NOXA actually kill? Does it bind to any (other) prosurvival proteins under conditions of WSB2 depletion? Is the MCL1 half-life changed?

      We appreciate the reviewer for raising this important point. NOXA is a BH3-only protein known to promote apoptosis primarily by binding to and neutralizing anti-apoptotic BCL-2 family members, especially MCL-1, via its BH3 domain. It can inhibit MCL-1 either through competitive binding or by facilitating its ubiquitination and subsequent proteasomal degradation. In our system, the total protein levels of MCL-1 remained unchanged in WSB2 knockout cells, suggesting that NOXA may not be promoting apoptosis through enhanced MCL-1 degradation. Instead, we speculate that the accumulation of NOXA in WSB2-deficient cells enhances apoptosis by sequestering MCL-1 through direct binding, thereby freeing pro-apoptotic effectors such as BAK and BAX. In line with our observations, Nakao et al. reported that deletion of the mitochondrial E3 ligase MARCH5 led to a pronounced increase in NOXA expression, while leaving MCL-1 protein levels unchanged in leukemia cell lines (Leukemia. 2023;37:1028-1038; PMID: 36973350).

      Additionally, NOXA has been reported to interact with other anti-apoptotic proteins, including BCL-XL. It is therefore possible that under conditions of WSB2 depletion, excess NOXA may also bind to BCL-XL and relieve its inhibition of BAX/BAK, further contributing to apoptosis. Future experiments assessing NOXA binding partners in WSB2-deficient cells would help clarify this mechanism.

      I think some initial insights into the mechanism underlying the sensitization would add a lot to this study. Is there a role of BFL1/A1 in any of these cell lines, as it can also rather selectively bind to NOXA and is sometimes deregulated in cancer?

      We appreciate the reviewer for raising this important issue. While BFL1/A1 is indeed another anti-apoptotic BCL-2 family member that can selectively bind to NOXA and has been implicated in cancer, our study primarily focuses on the WSB2-NOXA axis. However, given its potential involvement in apoptosis regulation, it would be an interesting direction for future studies to explore whether BFL1/A1 contributes to NOXA-mediated sensitization in specific cellular contexts.

      Otherwise, this is a very nice and convincing study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript focuses on the olfactory system of Pieris brassicae larvae and the importance of olfactory information in their interactions with the host plant Brassica oleracea and the major parasitic wasp Cotesia glomerata. The authors used CRISPR/Cas9 to knockout odorant receptor coreceptors (Orco), and conducted a comparative study on the behavior and olfactory system of the mutant and wild-type larvae. The study found that Orco-expressing olfactory sensory neurons in antennae and maxillary palps of Orco knockout (KO) larvae disappeared, and the number of glomeruli in the brain decreased, which impairs the olfactory detection and primary processing in the brain. Orco KO caterpillars show weight loss and loss of preference for optimal food plants; KO larvae also lost weight when attacked by parasitoids with the ovipositor removed, and mortality increased when attacked by untreated parasitoids. On this basis, the authors further studied the responses of caterpillars to volatiles from plants attacked by the larvae of the same species and volatiles from plants on which the caterpillars were themselves attacked by parasitic wasps. Lack of OR-mediated olfactory inputs prevents caterpillars from finding suitable food sources and from choosing spaces free of enemies.

      Strengths:

      The findings help to understand the important role of olfaction in caterpillar feeding and predator avoidance, highlighting the importance of odorant receptor genes in shaping ecological interactions.

      Weaknesses:

      There are the following major concerns:

      (1) Possible non-targeted effects of Orco knockout using CRISPR/Cas9 should be analyzed and evaluated in Materials and Methods and Results.

      Thank you for your suggestion. In the Materials and Methods, we mention how we selected the target region and evaluated potential off-target sites by Exonerate and CHOPCHOP. Neither of these methods found potential off-target sites with a more-than-17-nt alignment identity. Therefore, we assumed no off-target effect in our Orco knockout. Furthermore, we did not find any developmental differences between wildtype and knockout caterpillars when these were reared on leaf discs in Petri dishes (Fig S4). We will further highlight this information on the off-target evaluation in the Results section.

      (2) Figure 1E: Only one olfactory receptor neuron was marked in WT. There are at least three olfactory sensilla at the top of the maxillary palp. Therefore, to explain the loss of Orco-expressing neurons in the mutant (Figure 1F), a more rigorous explanation of the photo is required.

      Thank you for pointing this out. The figure shows only a qualitative comparison between WT and KO and we did not aim to determine the total number of Orco positive neurons in the maxillary palps or antennae of WT and KO caterpillars, but please see our previous work for the neuron numbers in the caterpillar antennae (Wang et al., 2024). We did indeed find more than one neuron in the maxillary palps, but as these were in very different image planes it was not possible to visualize them together. However, we will add a few sentences in the Results and Discussion section to explain the results of the maxillary palp Orco staining.

      (3) In Figure 1G, H, the four glomeruli are circled by dotted lines: their corresponding relationship between the two figures needs to be further clarified.

      Thank you for pointing this out. The four glomeruli in Figure 1G and 1H are not strictly corresponding. We circled these glomeruli to highlight them, as they are the best visualized and most clearly shown in this view. In this study, we only counted the number of glomeruli in both WT and KO; we did not determine which glomeruli are missing in the KO caterpillar brain. We will further clarify this in the figure legend.

      (4) Line 130: Since the main topic in this study is the olfactory system of larvae, the experimental results of this part are all about antennal electrophysiological responses, mating frequency, and egg production of female and male adults of wild type and Orco KO mutant, it may be considered to include this part in the supplementary files. It is better to include some data about the olfactory responses of larvae.

      Thank you for your suggestion. We do agree with your suggestion, and we will consider moving this part to the supplementary information. Regarding larval olfactory response, we unfortunately failed to record any spikes using single sensillum recordings due to the difficult nature of the preparation; however we do believe that this would be an interesting avenue for further research.

      (5) Line 166: The sentences in the text are about the choice test between "healthy plant vs. infested plant", while in Fig 3C, it is "infested plant vs. no plant". The content in the text does not match the figure.

      Thank you for pointing this out. The sentence is “We compared the behaviors of both WT and Orco KO caterpillars in response to clean air, a healthy plant and a caterpillar-infested plant”. We tested these three stimuli in two comparisons: healthy plant vs no plant, infested plant vs no plant. The two comparisons are shown in Figure 3C separately. We will aim to describe this more clearly in the revised version of this manuscript.

      (6) Lines 174-178: Figure 3A showed that the body weight of Orco KO larvae in the absence of parasitic wasps also decreased compared with that of WT. Therefore, in the experiments of Figure 3A and E, the difference in the body weight of Orco KO larvae in the presence or absence of parasitic wasps without ovipositors should also be compared. The current data cannot determine the reduced weight of KO mutant is due to the Orco knockout or the presence of parasitic wasps.

      Thank you for pointing this out. We did not make a comparison between the data of Figures 3A and 3E since the two experiments were not conducted at the same time due to the limited space in our BioSafety III greenhouse. We do agree that the weight decrease in Figure 3E is partly due to the reduced caterpillar growth shown in Figure 3A. However, we are confident that the additional decrease in caterpillar weight shown in Figure 3E is mainly driven by the presence of disarmed parasitoids. To be specific, the average weight in Figure 3A is 0.4544 g for WT and 0.4230 g for KO, so KO weight is 93.1% of WT weight. In Figure 3E, the average weight is 0.4273 g for WT and 0.3637 g for KO, so KO weight is 85.1% of WT weight. We will discuss this interaction between caterpillar growth and the effect of the parasitoid attacks more extensively in the revised version of the manuscript.
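
      The relative-weight figures quoted above can be reproduced with a few lines of arithmetic. The following is only an illustrative sketch: the four mean weights are taken from the response, while the dictionary layout and the helper name `ko_percent_of_wt` are assumptions introduced here. Because the two experiments were not run at the same time, the difference between the two percentages is indicative rather than a formal test.

```python
# Mean caterpillar weights (g) as quoted in the author response.
# The labels and structure are illustrative, not taken from the manuscript.
weights = {
    "Figure 3A (no parasitoids)": {"WT": 0.4544, "KO": 0.4230},
    "Figure 3E (disarmed parasitoids)": {"WT": 0.4273, "KO": 0.3637},
}


def ko_percent_of_wt(wt: float, ko: float) -> float:
    """Return the KO mean weight as a percentage of the WT mean weight."""
    return 100.0 * ko / wt


# Relative KO weight for each experiment.
ratios = {
    label: ko_percent_of_wt(w["WT"], w["KO"]) for label, w in weights.items()
}

for label, pct in ratios.items():
    print(f"{label}: KO = {pct:.1f}% of WT")

# Gap between the two experiments, in percentage points.
extra_deficit = (
    ratios["Figure 3A (no parasitoids)"]
    - ratios["Figure 3E (disarmed parasitoids)"]
)
print(f"Additional KO deficit with disarmed parasitoids: {extra_deficit:.1f} points")
```

      The gap of roughly eight percentage points is the quantity the response attributes mainly to the presence of disarmed parasitoids.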

      (7) Lines 179-181: Figure 3F shows that the survival rate of larvae of Orco KO mutant decreased in the presence of parasitic wasps, and the difference in survival rate of larvae of WT and Orco KO mutant in the absence of parasitic wasps should also be compared. The current data cannot determine whether the reduced survival of the KO mutant is due to the Orco knockout or the presence of parasitic wasps.

      We are happy that you highlight this point. When conducting these experiments, we selected groups of caterpillars and carefully placed them on a leaf with minimal disturbance of the caterpillars, which minimized hurting and mortality. We did test the survival of caterpillars in the absence of parasitoid wasps from the experiment presented in Figure 3A, although this was missing from the manuscript. There is no significant difference in the survival rate of caterpillars between the two genotypes in the absence of wasps (average mortality WT = 8.8 %, average mortality KO = 2.9 %; P = 0.088, Wilcoxon test), so the decreased survival rate is most likely due to the attack of the wasps. We will add this information to the revised version of the manuscript.

      (8) In Figure 4B, why do the compounds tested have no volatiles derived from plants? Cruciferous plants have the well-known mustard bomb. In the behavioral experiments, the larvae responses to ITC compounds were not included, which is suggested to be explained in the discussion section.

      Thank you for the suggestion. We assume you mean Figure 4D/4E instead of Figure 4B. In Figure 4B, many of the identified chemical compounds are essentially plant volatiles, especially those from caterpillar frass and caterpillar spit. In Figure 4D/4E, most of the tested chemicals are derived from plants. But indeed, we did not include ITCs in the larval behavioural tests, based on the EAG results in Figures 2A & 2B: butterfly antennae did not respond strongly to ITCs. Instead, the tested chemicals in Figure 4D/4E either elicit high EAG responses in butterflies or have been identified as “important” by VIP scores in the chemical analyses. In the EAG results for Plutella xylostella (Liu et al., 2020), moths responded well to a few ITCs; the ITCs tested in our study were adopted from that study, except for those that were not available to us. Because butterflies did not show a strong response to the tested ITCs, we expected that Pieris brassicae caterpillars are unlikely to respond well to ITCs either. We will add this explanation to the revised version of our manuscript.

      (9) The custom-made setup and the relevant behavioral experiments in Figure 4C need to be described in detail (Line 545).

      We will add more detailed descriptions for the setup and method in the Materials and Methods.

      (10) Materials and Methods Line 448: 10 μL paraffin oil should be used for negative control.

      Thank you for pointing this out. We used both clean filter paper and clean filter paper with 10 μL paraffin oil as negative controls, but we did not find a significant difference between the two controls. Therefore, in the EAG results of Figure 2A/2B, we presented paraffin oil as one of the tested chemicals. We will re-run our statistical tests with paraffin oil as negative control, although we do not expect any major differences to the previous tests.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigated the effect of olfactory cues on caterpillar performance and parasitoid avoidance in Pieris brassicae. The authors knocked out Orco to produce caterpillars with significantly reduced olfactory perception. These caterpillars showed reduced performance and increased susceptibility to a parasitoid wasp.

      Strengths:

      This is an impressive piece of work and a well-written manuscript. The authors have used multiple techniques to investigate not only the effect of the loss of olfactory cues on host-parasitoid interactions, but also the mechanisms underlying this.

      Weaknesses:

      (1) I do have one major query regarding this manuscript - I agree that the results of the caterpillar choice tests in a y-maze give weight to the idea that olfactory cues may help them avoid areas with higher numbers of parasitoids. However, the experiments with parasitoids were carried out on a single plant. Given that caterpillars in these experiments were very limited in their potential movement and source of food - how likely is it that avoidance played a role in the results seen from these experiments, as opposed to simply the slower growth of the KO caterpillars extending their period of susceptibility? While the two mechanisms may well both take place in nature - only one suggests a direct role of olfaction in enemy avoidance at this life stage, while the other is an indirect effect, hence the distinction is important.

      We do agree with your comment that both mechanisms may be at work in nature and we do address this in the Discussion section. In our study, we did find that wildtype caterpillars were more efficient in locating their food source and did grow faster on full plants than knockout caterpillars. This faster growth will enable wildtype caterpillars to more quickly outgrow the life-stages most vulnerable to the parasitoids (L1 and L2). The olfactory system therefore supports the escape from parasitoids indirectly by enhancing feeding efficiency directly.

      Figure 3D shows that WT caterpillars prefer infested plants without parasitoids to infested plants with parasitoids. In addition, we observed that caterpillars move frequently between different leaves. Therefore, we speculate that WT caterpillars make use of volatiles from the plant or from (parasitoid-exposed) conspecifics via their spit or faeces to avoid parts of the plant potentially attracting natural enemies. Knockout caterpillars are unable to use these volatile danger cues and therefore do not avoid plant parts that are most attractive to their natural enemies, making KO caterpillars more susceptible and leading to more natural enemy harassment. Through this, olfaction also directly impacts the ability of a caterpillar to find an enemy-free feeding site.

      We think that olfaction supports the enemy avoidance of caterpillars via both these mechanisms, although at different time scales. Unfortunately, our analysis was not detailed enough to discern the relative importance of the two mechanisms we found. However, we feel that this would be an interesting avenue for further research. Moreover, we will sharpen our discussion on the potential importance of the two different mechanisms in the revised version of this manuscript.

      (2) My other issue was determining sample sizes used from the text was sometimes a bit confusing. (This was much clearer from the figures).

      We will revise the text to state the sample sizes more clearly.

      (3) I also couldn't find the test statistics for any of the statistical methods in the main text, or in the supplementary materials.

      Thank you for pointing this out. We will provide more detailed test statistics in the main text and in the supplementary materials of the revised version of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Abstract

      Line 24: "optimal food plant" should be changed to "optimal food plants"

      Thank you for the suggestion, we will revise it.

      (2) Introduction

      Lines 44-46: The sentence should be rephrased.

      Thank you for the suggestion, we will revise it.

      Line 50: "are" should be changed to "is".

      Thank you for the suggestion, we will revise it.

      Lines 57 and 58: Please provide the Latin names of "brown planthoppers" and "striped stem borer".

      Thank you for the suggestion, we will revise it.

      Line 85: "investigate the influence of odor-guided behavior by this primary herbivore on the next trophic levels"; similarly, Line 160: "investigate if caterpillars could locate the optimal host-plant when supplied with differently treated plants". These sentences are not very accurate in describing the relevant experiments.

      Thank you for the suggestion, we will revise them.

      Reviewer #2 (Recommendations for the authors):

      (1) L53 Remove the "the" from "Under the strong selection pressure"

      Thank you for the suggestion, we will revise it.

      (2) L80 I suggest adding a reference for the spitting behaviour, e.g. Muller et al 2003.

      Thank you for the suggestion, we will add it.

      (3) L89 establishing a homozygous KO insect colony.

      Thank you for the suggestion, we will revise it.

      (4) L107 perhaps this goes against the journal style but I always like to see acronyms explained the first time they are used.

      Thank you for the suggestion, we will try to make it more understandable.

      (5) L146-148 sentence difficult to read - consider rephrasing.

      Thank you for the suggestion, we will revise it.

      (6) L230 do you mean still produce? Rather than still reproduce?

      Thank you for the suggestion, we will revise it.

      (7) L233 missing an and before "a greater vulnerability to the parasitoid wasp".

      Thank you for pointing this out, we will revise it.

      (8) L238 malfunctional is a strange word choice.

      Thank you for pointing this out, we will revise it.

      (9) L181 - can the authors confirm that this lower survival was due to parasitism by the wasps?

      This question is similar to Q(7) of Reviewer 1, so we quote our answer for Q(7) here:

      When conducting these experiments, we selected groups of caterpillars and carefully placed them on a leaf with minimal disturbance of the caterpillars, which minimized hurting and mortality. We did test the survival of caterpillars in the absence of parasitoid wasps from the experiment presented in Figure 3A, although this was missing from the manuscript. There is no significant difference in the survival rate of caterpillars between the two genotypes in the absence of wasp (average mortality WT = 8.8 %, average mortality KO = 2.9 %; P = 0.088, Wilcoxon test), so the decreased survival rate is most likely due to the attack of the wasps. We will add this information to the revised version of the manuscript.

      (10) L474 - has it been tested if wasps still behave similarly after their ovipositor has been removed?

      Thank you for pointing out this issue. We did not strictly compare if disarmed and untreated wasps have similar behaviors. However, we did observe if disarmed wasps can actively move or fly after recovering from anesthesia before releasing into a cage, otherwise we would replace with another active one.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aims to identify the proteins that compose the electrical synapse, which are much less understood than those of the chemical synapse. Identifying these proteins is important to understand how synaptogenesis and conductance are regulated in these synapses. The authors identified more than 50 new proteins and used immunoprecipitation and immunostaining to validate their interaction or localization. One new protein, a scaffolding protein, shows particularly strong evidence of being an integral component of the electrical synapse. However, many key experimental details are missing (e.g. mass spectrometry), making it difficult to assess the strength of the evidence.

      Strengths:

      One newly identified protein, SIPA1L3, has been validated both by immunoprecipitation and immunohistochemistry. The localization at the electrical synapse is very striking.

      A large number of candidate interacting proteins were validated with immunostaining in vivo or in vitro.

      Weaknesses:

      There is no systematic comparison between the zebrafish and mouse proteome. The claim that there is "a high degree of evolutionary conservation" was not substantiated.

      We have added a table as supplementary figure 3 that shows a comparison of all candidates. While there are differences in both proteomes, components such as ZO proteins and the endocytosis machinery are clearly conserved.

      No description of how mass spectrometry was done and what type of validation was done.

      We have contacted the mass spec facility we worked with and added a paragraph explaining the mass spec. procedure in the material and methods section.

      The threshold for enrichment seems arbitrary.

      Yes, the thresholds are somewhat arbitrary. This is due to the fact that experiments that captured larger total amounts of protein (mouse retina samples) had higher signal-to-noise ratio than those that captured smaller total amounts of protein (zebrafish retina). This allowed us to use a more stringent threshold in the mouse dataset to focus on high probability captured proteins.
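
      The species-specific cutoffs discussed here can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the protein names, abundance values, pseudocount, and function names below are hypothetical, and the cutoffs (>3 log2 fold for mouse, >1 log2 fold for zebrafish) are the ones quoted in the Reviewer #2 public review.

```python
import math

# Cutoffs quoted in the review: >3 log2 fold (mouse), >1 log2 fold (zebrafish).
LOG2_CUTOFF = {"mouse": 3.0, "zebrafish": 1.0}


def log2_enrichment(turboid, wildtype, pseudocount=1.0):
    """log2 ratio of TurboID-sample abundance over wild-type control abundance.

    The pseudocount is an assumption; it avoids division by zero when a
    protein is absent from the control sample.
    """
    return math.log2((turboid + pseudocount) / (wildtype + pseudocount))


def select_candidates(abundances, species):
    """Return sorted names of proteins whose enrichment exceeds the species cutoff."""
    cutoff = LOG2_CUTOFF[species]
    return sorted(
        name
        for name, (turboid, wildtype) in abundances.items()
        if log2_enrichment(turboid, wildtype) > cutoff
    )


# Hypothetical abundances as (TurboID, wild type); not data from the study.
example = {
    "protein_A": (159.0, 9.0),  # (159 + 1) / (9 + 1) = 16, log2 = 4
    "protein_B": (39.0, 9.0),   # (39 + 1) / (9 + 1) = 4, log2 = 2
    "protein_C": (10.0, 9.0),   # (10 + 1) / (9 + 1) = 1.1, log2 is about 0.14
}

mouse_hits = select_candidates(example, "mouse")
zebrafish_hits = select_candidates(example, "zebrafish")
```

      With these made-up numbers, the stricter mouse cutoff keeps only the most strongly enriched protein, while the looser zebrafish cutoff also admits the moderately enriched one. This is the trade-off between stringency and sensitivity described in the response.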

      Inconsistent nomenclature and punctuation usage.

      We have scanned through the manuscript and updated terms that were used inconsistently in the interim revision of the manuscript.

      The description of figures is very sparse and error-prone (e.g. Figure 6).

      In Figure 1B, there is very broad non-specific labeling by avidin in zebrafish (In contrast to the more specific avidin binding in mice, Figure 2B). How are the authors certain that the enrichment is specific at the electrical synapse?

      The enrichment of the proteins we identified is specific for electrical synapses because we compared the abundance of all candidates between Cx35b-V5-TurboID and wildtype retinas. Proteins that are components of electrical synapses, will only show up in the Cx35b-V5-TurboID condition. The western blot (Strep-HRP) in figure 1C shows the differences in the streptavidin labeling and hence the enrichment of proteins that are part of electrical synapses. Moreover, while the background appears to be quite abundant in sections, biotinylation is a rare posttranslational modification and mainly occurs in carboxylases: The two intense bands that show up above 50 and 75 kDa. The background mainly originates from these two proteins. Therefore, it is easy to distinguish specific hits from non-specific background.

      In Figure 1E, there is very little colocalization between Cx35 and Cx34.7. More quantification is needed to show that it is indeed "frequently associated."

      We agree that “frequently associated” is too strong as a statement. We corrected this and instead wrote “that Cx34.7 was only expressed in the outer plexiform layer (OPL) where it was associated with Cx35b at some gap junctions” in line 151. There are many gap junctions at which Cx35b is not colocalized with Cx34.7.

      Expression of GFP in HCs would potentially be an issue, since GFP is fused to Cx36 (regardless of whether HC expresses Cx36 endogenously) and V5-TurboID-dGBP can bind to GFP and biotinylate any adjacent protein.

      Thank you for this suggestion! There should be no Cx36-GFP expression in horizontal cells, which means that the nanobody cannot bind to anything in these cells. Moreover, to recognize specific signals from non-specific background, we included wild type retinas throughout the entire experiments. This condition controls for non-specific biotinylation.

      Figure 7: the description does not match up with the figure regarding ZO-1 and ZO-2.

      It appears that a portion of the figure legend was left out of the submitted version of the manuscript. We have put the legend for panels A through C back into the manuscript in the interim revision.

      Reviewer #2 (Public review):

      Summary:

      This study aimed to uncover the protein composition and evolutionary conservation of electrical synapses in retinal neurons. The authors employed two complementary BioID approaches: expressing a Cx35b-TurboID fusion protein in zebrafish photoreceptors and using GFP-directed TurboID in Cx36-EGFP-labeled mouse AII amacrine cells. They identified conserved ZO proteins and endocytosis components in both species, along with over 50 novel proteins related to adhesion, cytoskeleton remodeling, membrane trafficking, and chemical synapses. Through a series of validation studies, including immunohistochemistry, in vitro interaction assays, and immunoprecipitation, they demonstrate that the novel scaffold protein SIPA1L3 interacts with both Cx36 and ZO proteins at electrical synapses. Furthermore, they identify and localize proteins ZO-1, ZO-2, CGN, SIPA1L3, Syt4, SJ2BP, and BAI1 at AII/cone bipolar cell gap junctions.

      Strengths:

      The study demonstrates several significant strengths in both experimental design and validation approaches. First, the dual-species approach provides valuable insights into the evolutionary conservation of electrical synapse components across vertebrates. Second, the authors compare two different TurboID strategies in mice and demonstrate that the HKamac promoter and GFP-directed approach can successfully target the electrical synapse proteome of mouse AII amacrine cells. Third, they employed multiple complementary validation approaches - including retinal section immunohistochemistry, in vitro interaction assays, and immunoprecipitation-providing evidence supporting the presence and interaction of these proteins at electrical synapses.

      Weaknesses:

      The conclusions of this paper are supported by data; however, some aspects of the quantitative proteomics analysis require clarification and more detailed documentation. The differential threshold criteria (>3 log2 fold for mouse vs >1 log2 fold for zebrafish) would benefit from biological justification, particularly given the cross-species comparison. Additionally, providing details on the number of biological or technical replicates used in this study, along with analyses of how these replicates compare to each other, would strengthen the confidence in the identification of candidate proteins. Furthermore, including negative controls for the histological validation of proteins interacting with Cx36 could increase the reliability of the staining results.

      While the study successfully characterized the presence of candidate proteins at the electrical synapses between AII amacrine cells and cone bipolar cells, it did not compare protein compositions between the different types of electrical synapses within the circuit. Given that AII amacrine cells form both homologous (AII-AII) and heterologous (AII-cone bipolar cell) electrical synapses-connections that serve distinct functional roles in retinal signaling processing-a comparative analysis of their molecular compositions could have provided important insights into synapse specificity.

      Reviewer #3 (Public review):

      Summary:

      This study by Tetenborg S et al. identifies proteins that are physically closely associated with gap junctions in retinal neurons of mice and zebrafish using BioID, a technique that labels and isolates proteins proximal to a protein of interest. These proteins include scaffold proteins, adhesion molecules, chemical synapse proteins, components of the endocytic machinery, and cytoskeleton-associated proteins. Using a combination of genetic tools and meticulously executed immunostaining, the authors further verified the colocalizations of some of the identified proteins with connexin-positive gap junctions. The findings in this study highlight the complexity of gap junctions. Electrical synapses are abundant in the nervous system, yet their regulatory mechanisms are far less understood than those of chemical synapses. This work will provide valuable information for future studies aiming to elucidate the regulatory mechanisms essential for the function of neural circuits.

      Strengths:

      A key strength of this work is the identification of novel gap junction-associated proteins in AII amacrine cells and photoreceptors using BioID in combination with various genetic tools. The well-studied functions of gap junctions in these neurons will facilitate future research into the functions of the identified proteins in regulating electrical synapses.

      Thank you for these comments.

      Weaknesses:

      I do not see major weaknesses in this paper. A minor point is that, although the immunostaining in this study is beautifully executed, the quantification to verify the colocalization of the identified proteins with gap junctions is missing. In particular, endocytosis component proteins are abundant in the IPL, making it unclear whether their colocalization with gap junction is above chance level (e.g. EPS15l1, HIP1R, SNAP91, ITSN in Figure 3B).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) It would be helpful to include a comprehensive summary of the results from the quantitative proteomics analyses, such as the number of proteins detected in each species and the number of proteins associated with each GO term. Additionally, a clear figure or table highlighting the specific proteins conserved between zebrafish and mice would strengthen the evidence for evolutionary conservation of proteins at electrical synapses.

      We have added the raw data we received from our mass spec facility, including a comparison of all the candidates for the different species (Supplementary figure 3).

      (2) A more detailed description of the number of experimental and/or technical replicates would improve the technical rigor of the study. For example, what was the rationale for using different log2 fold-change cutoffs in mice versus zebrafish? Are the replicates consistent in terms of protein enrichment?

      We have added raw data from individual experiments as a supplement (Excel spreadsheet). We have two replicates from zebrafish and two from mice. The first experiment in mice was conducted with fewer retinas and a different promoter (human synapsin promoter) and didn’t yield nearly as many candidates. We are currently running a third experiment with 35 mouse retinas, which will most likely detect more candidates than we have identified so far. We can update the proteome in this paper once the analysis is complete. It is not feasible to conduct these experiments with multiple replicates at the same time, since the number of animals that have to be used is simply too high, especially since very specific genotypes are required that are difficult to obtain.

      (3) It would be interesting to determine whether there are differences in the presence of candidate proteins between AII-AII gap junctions and AII-cone bipolar cell gap junctions. Given that the subcellular localization of AII-AII gap junctions differs from that of AII-cone bipolar cell gap junctions (with most AII-AII gap junctions located below AII-cone ones), histological validations of the proteins shown in Figure 6 can be repeated for AII-AII gap junctions. This would help reveal similarities or differences in the protein compositions of these two types of gap junctions.

      Thank you for this suggestion. We had similar plans. However, we realized that homologous gap junctions are difficult to recognize with GFP. The dense GFP labeling in the proximal IPL, where AII-AII gap junctions are formed, does not allow us to clearly trace the location of individual dendrites from different cells. Detecting AII-AII gap junctions would require intracellular dye injections of neighboring AII cells. Unfortunately, we don’t have a setup that would allow this. Bipolar cell terminals, on the contrary, are a lot easier to detect with markers such as SCGN, which is why we decided to focus on AII/ONCB gap junctions.

      (4) In Figures 1 and 2, it would be helpful to clarify in the figure legends whether the proteins in the interaction networks represent all detected proteins or only those selected based on log2 fold-change or other criteria.

      Thank you for this suggestion! We have added a description in lines 643 and 662.

      (5) In Figure 1A (bottom panel), please include a negative control for the Neutravidin staining result from the non-labeling group.

      We only tested the biotinylation for wild type retinas in cell lysates and western blots as shown in figure 1C, which shows an entirely different biotinylation pattern.

      (6) In Figure 2B, please include the results of Neutravidin staining for both the labeling and non-labeling groups.

      Same comment: We see the differences in the biotinylation pattern on western blots, which is distinct for Cx36-EGFP and wild type retinas, although both genotypes were injected with the same AAV construct and the same dose of biotin. We hope that this provides sufficient evidence for the specificity of our approach.

      (7) In Figure 5B, the sizes of multiple proteins detected by Western blotting are inconsistent and confusing. For example, the size of Cx36 in the "FLAG-SJ2BP" panel differs from that in the other three panels. Additionally, in the "Myc-SIPA1L3+" panel, the size of SIPA1l3 appears different between the input and IP conditions.

      Thank you for pointing this out! The differences in the molecular weight can be explained by dimerization. We have indicated the position of the dimer and the monomer bands with arrows. Especially, when larger amounts of Cx36 are coprecipitated Cx36 preferentially occurs as a dimer. This can also be seen in our previous publication:

      S. Tetenborg et al., Regulation of Cx36 trafficking through the early secretory pathway by COPII cargo receptors and Grasp55. Cellular and Molecular Life Sciences 81, 1-17 (2024). Figure 1D

      The band that occurs above 150 kDa in the SIPA1L3 input is most likely a non-specific product. The specific band for SIPA1L3 can be seen in the IP sample, which has the appropriate molecular weight. We often see much better immunoreactivity for the protein of interest in IP samples, because the protein is concentrated in these experiments, which facilitates its detection.

      (8) How specific are the antibodies used for validating the proteins in this study? Given that many proteins, such as EPS15l1, HIP1R, SNAP91, GPrin1, SJ2BP, Syt4, show broad distribution in the IPL (Figure 3B, 4A, 6D), it is important to validate the specificity of these antibodies. Additionally, including negative controls in the histological validation would strengthen the reliability of the results.

      We carefully selected the antibodies based on western blot data, which confirmed that each antibody detected an antigen of the appropriate size. Moreover, the distribution of the proteins mentioned is consistent with the function of each protein described in the literature. EPS15L1 and GPrin1, for instance, are both membrane-associated, which is evident in HEK cells (Figure 5C).

      A true negative control would require KO tissue and we don’t think that this is feasible at this point.

      (9) In Figure 7F, the model could be improved by highlighting which components may be conserved between zebrafish and mice, as well as which components are conserved between the AII-AII junction and AII-cone bipolar cell junction?

      Thank you for this suggestion. However, we don’t think that this is necessary as our study primarily focuses on the AII amacrine cell.

      Currently we are unable to distinguish differences in the composition of AII-AII and AII-ONCB junctions as described above.

      (10) Are there any functional measurements that could support the conclusion that "loss of Cx36 resulted in a quantitative defect in the formation of electrical synapse density complex"?

      The loss of electrical synapse density proteins is shown by these immunostaining comparisons. Functional measurements necessarily depend on the function of the electrical synapse itself, which is gone in the case of the Cx36 KO. It is not clear that a different functional measurement can be devised.

      Reviewer #3 (Recommendations for the authors):

      (1) It would be very helpful if there were page and line numbers on the manuscript.

      Line and page numbers have been added.

      (2) Typos in the 3rd paragraph, the sentence 'which is triggered by the influx of Calcium though non-synaptic NMDA...'

      Should it read '... Calcium THROUGH non-synaptic NMDA'?

      We have corrected this typo.

      (3) Figure 1B: please add a description of the top panels, 'Cx36 S293'.

      A description of the top panels has been added to the figure legend in line 639.

      (4) Figure 1C: what do the arrows indicate?

      We apologize for the confusion. The arrows in the western blot indicate the position of the Cx35-V5-TurboID construct, which can be detected with streptavidin-HRP and the V5 antibody. We have added a description for these arrows to the figure legend. See line 641.

      (5) Related to the point in the 'Weakness', there are some descriptions of how well some of the gap junction-associated proteins colocalize with Cx36 in immunostaining. For example, 'In comparison to the scaffold proteins, however, the colocalization of Cx36 with each of these endocytic components, was clearly less frequent and more heterogenous, which appears to reflect different stages in the life cycle of Cx36' and 'All of these proteins showed considerable colocalization with Cx36 in AII amacrine cell dendrites'. It would be nice to see quantification data to support these claims.

      Thank you for this suggestion. We have added a colocalization analysis to figure 3 (C & D). We quantified the colocalization for the endocytosis proteins Eps15l1 and Hip1r. This quantification included a flipped control to rule out random overlap. For both proteins we confirmed true colocalization (Figure 3D).
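
      The logic of a flipped control can be sketched as follows. This is only an illustration under stated assumptions, not the authors' quantification: the response does not specify the overlap metric or the flip axis, so the fraction-of-overlapping-pixels measure, the horizontal flip, the function names, and the small binary masks below are all hypothetical.

```python
def colocalized_fraction(channel_a, channel_b):
    """Fraction of positive pixels in channel_a that are also positive in channel_b."""
    positives = 0
    overlaps = 0
    for row_a, row_b in zip(channel_a, channel_b):
        for a, b in zip(row_a, row_b):
            if a:
                positives += 1
                if b:
                    overlaps += 1
    return overlaps / positives if positives else 0.0


def flip_horizontal(channel):
    """Mirror a binary mask left to right."""
    return [list(reversed(row)) for row in channel]


# Hypothetical thresholded masks (1 = signal above threshold); not study data.
cx36 = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]
endocytic = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]

# Overlap in the original orientation versus overlap after mirroring one channel.
observed = colocalized_fraction(cx36, endocytic)
flipped = colocalized_fraction(cx36, flip_horizontal(endocytic))
```

      Flipping one channel preserves its overall signal density while destroying any true spatial relationship to the other channel, so the flipped value estimates the overlap expected by chance. Colocalization is supported when the observed value clearly exceeds it.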

      (6) In Figure 5B, it would be helpful if there were arrows or some kind in western blottings to indicate which bands are supposed to be the targeted proteins.

      We have added arrows in IP samples to indicate bands representing the corresponding protein.

      (7) In the sentence including 'for the PBM of Cx36, as it is the case for ZO-1', what is PBM?

      The PBM means PDZ binding motif. We have added an explanation for this abbreviation in line 244.

      (8) Please add a description of the Cx35b promoter construct in the Method section.

      The Cx35b Promoter is a 6.5kb fragment. We will make the clone available via Addgene to ensure that all details of the clone can be accessed via snapgene or alternative software.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Formins are complex proteins with multiple effects on actin filament assembly, including nucleation, capping with processive elongation, and bundling. Determining which of these activities is important for a given biological process and normal cellular function is a major challenge.

      Here, the authors study the formin FHOD3L, which is essential for normal sarcomere assembly in muscle cells. They identify point mutants of FHOD3L in which formin nucleation and elongation/bundling activities are functionally separated. Expression of these mutants in neonatal rat ventricular myocytes shows that the control of actin filament elongation by formin is the major activity required for the normal assembly of functional sarcomeres.

      Strengths:

      The strength of this work is to combine sensitive biochemical assays with excellent work in neonatal rat ventricular myocytes. This combination of approaches is highly effective for analyzing the function of proteins with multiple activities in vitro.

      Weaknesses:

      FHOD3L does not seem to be the easiest formin to study because of its relatively weak nucleation activity and the short duration of capping events. This difficulty imposes rigorous biochemical analysis and careful interpretation of the data, which should be improved in this work.

      We thank the reviewer for their praise and appreciation of our work. Indeed, FHOD3L is a challenging formin to work with.

      Important points are raised here and below regarding the brief elongation events we reported. As suggested, we performed more rigorous analysis of the data and present it in the revised manuscript. We now report that from 45 dim regions analyzed, in three independent experiments with wild type FHOD3L, we detected 40 bursts. (The remaining five could be formin falling off too quickly to detect or the dim spots could be regions of inhomogeneity in intensity, not due to formin.) For comparison to the presented data with FHOD3L-CT, we analyzed the filaments in TIRF assays with no formin present. As the reviewers point out, inhomogeneities in filament intensity are normal. Thus, we examined any dim spots for pauses and/or bursts. As is now reported in Figure 2G,H, the velocity of growth of these dim spots is indistinguishable from the velocity of the rest of the filament. We acknowledge that our numbers may not be perfectly accurate due to the noise in our system, but we believe that the difference between a 3-4 fold increase and no change in rate is substantial and convincing.

      We also determined the number of dim spots per length of filament. We found a higher frequency when FHOD3L-CT or FHOD3S-CT was present vs no formin, as now shown in Figure 2 – supplements 1G and 2E.

      We were asked about the pauses we observe before bursts of elongation and how we know they are functionally relevant. The short answer is that we do not know. We reported them because they were so common: of the 40 bursts, pauses preceded the burst in 38 cases. We cannot rule out that this pause reflects an interaction with the surface, but we might expect the frequency to be lower if it did. We have revised the text to make our conclusions about pauses more circumspect.

      We are convinced that the brief dim events we observed in the presence of FHOD3L-CT, in fact, reflect formin-mediated elongation and worked hard to improve their presentation, in addition to the added analysis. We include new kymographs, including examples from FHOD3L, FHOD3S, K1193L, and actin alone. We hope that the reviewers are also convinced.

      This does not preclude our interest in the microfluidics and two-color assays, which will be pursued in the future. We have reached out to a colleague who is set up to repeat these measurements with microfluidics-assisted TIRF. The noise should be greatly reduced and the system is also optimal for directly visualizing labeled FHOD3, as suggested. We expect these experimental approaches will provide additional insights.

      Reviewer #2 (Public review):

      This article elucidates the biochemical and cellular mechanisms by which the FHOD-family of formins, particularly FHOD3, contributes to sarcomere formation and contractility in cardiomyocytes. Formins are mainly known to nucleate and elongate actin filaments, with certain family members also exhibiting capping, severing, and bundling activities. Although FHOD3 has been well-established as essential for sarcomere assembly in cardiomyocytes, its precise biochemical functions and contributions to actin dynamics remain poorly understood.

      In this study, the authors combine in vitro biochemical assays with cellular experiments to dissect FHOD3's roles in actin assembly and sarcomere formation. They demonstrate that FHOD3 nucleates actin filaments and acts as a transient elongator, pausing elongation after an initial burst of filament growth. Using separation-of-function mutants, they show that FHOD3's elongation activity - rather than its nucleation, capping, or bundling capabilities - is key for its sarcomeric function.

      The experiments have been conducted rigorously and well-analyzed, and the paper is clearly written. The data presented support the authors' conclusions. I appreciate the detailed description and rationale behind the FHOD3 constructs used in this study.

      We are happy to hear that others find the paper to be clearly written and well described.

      However, I was somewhat surprised and a bit disappointed that while the authors conducted single-color TIRF experiments to observe the effects of FHOD3 on single filaments, they did not use fluorescently labeled FHOD3 to directly visualize its behavior. Incorporating such experiments would significantly strengthen their conclusions regarding FHOD3's bursts of elongation interspersed with capping activity. While I understand this might require a few additional weeks of experiments, these data would add considerable value by directly testing the proposed mechanism.

      We appreciate the suggestion and hope to incorporate a two-color approach soon. As noted, FHOD3L is not always easy to work with and we do not have a functional labeled copy of the protein at this time.

      There is a typo in the word "required" in line number 30. The authors also use fit data to extract parameters in several panels (e.g., Figures 2b, 2d, 3a, and 3b). While these fit functions may be intuitive to actin experts, explicitly describing the fit functions in the figure legends or methods would greatly benefit the broader readership.

      Thank you for these comments. We updated the indicated figures and described the analysis in greater detail.

      Reviewer #3 (Public review):

      Valencia et al. aim to elucidate the biochemical and cellular mechanisms through which the human formin FHOD3 drives sarcomere assembly in cardiomyocytes. To do so, they combined rigorous in vitro biochemical assays with comprehensive in vivo characterizations, evaluating two wild-type FHOD3 isoforms and two function-separating mutants. Surprisingly, they found that both wild-type FHOD3 isoforms can nucleate new actin filaments, as well as elongate existing actin filaments in conjunction with profilin following barbed-end capping. This is in addition to FHOD3's proposed role as an actin bundler. Next, the authors asked whether FHOD3L promotes sarcomere assembly in cardiomyocytes through its activity in actin nucleation or rather elongation. With two function-separating mutants, the authors evaluated the numbers and morphology of sarcomeres, as well as their ability to beat and generate cardiac rhythm. The authors found that while the wild-type FHOD3L and the K1193L mutant can rescue sarcomere morphology and physiology, the GS-FH1 mutant fails to do so. Given that in GS-FH1 mainly elongation activity is compromised, the authors concluded that the elongation activity of FHOD3 is essential for its role in sarcomere assembly in cardiomyocytes, while its nucleator activity is dispensable. Overall, this important study provided a broadened view on the biochemical activities of FHOD3, and a pioneering view on a possible cellular mechanism of how FHOD3L drives sarcomere assembly. If further validated, this can lead to new mechanistic models of sarcomere assembly and potentially new therapeutic targets of cardiomyopathy.

      The conclusions of this paper are mostly well supported by the comprehensive biochemical analyses performed by the authors. However, the sarcomere assembly defect phenotype in the GS-FH1 rescue condition requires further investigation, as the extremely low level of GS-FH1 signal in transfected cells in Figure 6A may reflect a failure of actin-binding by this construct in vivo, rather than its inability to drive elongation. Though the authors do show in Figure 6 that GS-FH1 can bind to normal-looking sarcomeres when they are present, this may be due to a lack of siRNA activity in these cells, such that endogenous FHOD3L is still present. In this possible scenario, GS-FH1 may dimerize with endogenous FHOD3L. The authors should demonstrate that GS-FH1 alone can indeed interact with existing actin filaments in vivo. While this has been clearly demonstrated in vitro, given the more complex biochemical environment in vivo where additional unknown binding partners may present, cautions should be made when extrapolating findings from the former to the latter.

      The reviewer is concerned about the low protein levels in the GS-FH1 rescue experiments, as reflected in the HA fluorescence intensity distributions shown in Fig. 5 Supplement 2A. While the proposed scenario could explain our observations with the GS-FH1 rescues, it is quite complex, and it does not preclude the conclusion that the FH1 domain is critical. We agree that the observed sarcomeres are likely to be residual in cells with incomplete RNAi. We now include the image of a cell that is still full of sarcomeres and note that GS-FH1 is expressed at a relatively high level and is striated throughout the cell. We interpret this as evidence that GS-FH1 is stable when suitable binding sites are available. We cannot exclude that there is more GS-FH1 because there was more endogenous FHOD3L with which to heterodimerize. If the GS-FH1 heterodimer were simply poisoning the wild type protein, we would not expect it to be bound correctly to sarcomeres. If, instead, heterodimers have some activity, it seems far from sufficient to rescue sarcomere formation, suggesting that two functional FH1 domains are critical.

      Furthermore, we do not see evidence of correlation between protein levels and rescue at the level present in these cells (addressed below). Unfortunately, the proposed IP to test whether FHOD3L binds actin in vivo would only potentially report on filament side binding (both direct and indirect). It would not address whether the GS-FH1 mutant functions as a nucleator, elongator, bundler and/or capping protein in vivo.

      The critical question that we can address is whether the phenotype is due to low protein levels, assuming the protein present is functional, or due to loss of elongation activity by FHOD3L. To address this question, we returned to our data.

      First, we plotted the distributions of the intensities of the cells we analyzed further, in addition to the automated readout of all of the cells in the dish (Fig. 4 supplement 1). These cells were selected randomly and, as should be the case, the distributions of their intensities agree well with the original distributions for the three different rescue constructs: FHOD3L, K1193L, and GS-FH1 (Fig. 6 supplement 1). We then asked whether there was any correlation in HA intensities with the sarcomere metrics. As seen in our pilot data, no correlation is evident in any of the three cases across the range of intensities we collected (400 – 2700 a.u.) (old Fig. 6 supplement C,D,E). We now replace the data from pilot experiments with analysis of HA intensities and sarcomere metrics from the data sets included in the paper (new Fig 6. Supplement 1). Again, little to no correlation was observed (the single highest r-squared value is 0.2 and the remaining eight values are less than or equal to 0.08).
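
      The r-squared values quoted above summarize how much of the variation in a sarcomere metric is explained by HA intensity. A minimal sketch of that calculation follows, assuming a simple Pearson correlation; the response does not state which software was used, and the function name and the numbers below are hypothetical illustrations, not data from the study.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r-squared is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-cell values, for illustration only (not study data).
ha_intensity = [420.0, 610.0, 880.0, 1300.0, 1900.0, 2650.0]  # a.u.
sarcomere_number = [31, 44, 29, 40, 33, 38]                   # per cell

print(round(r_squared(ha_intensity, sarcomere_number), 3))
```

      A value near 0 means HA intensity explains little of the variation in the metric; a value near 1 would indicate a strong linear dependence.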

      To address more specifically whether low HA fluorescence intensity is likely to reflect protein levels sufficient to build sarcomeres, we re-examined two data sets from the FHOD3L WT rescue data. We found that, by chance, the first replicate of data from the wild type rescue has a comparable intensity distribution to that of the GS-FH1 rescues (580 +/- 261 / cell vs. 548 +/- 105 / cell). In addition, we collected all of the data from cells with intensity levels <720, designed to mimic the distribution of the GS-FH1 cells (Fig. 6 supplement 3). We then compared the sarcomere metrics (sarcomere number, sarcomere length, sarcomere width) between the full data set and the two low intensity subsets:

      • Sarcomere number is the only non-normal metric. We therefore used the Mann-Whitney U test, which shows no difference among the three WT distributions.

      • We compared Z-line lengths by one-way ANOVA and Tukey's post hoc tests, again finding no significant difference for all distributions.

      • Sarcomere length shows a weakly significant difference (p=0.038) between the whole WT data set and bio rep 1, but no difference between the whole WT data set and the HA<720 group.

      Thus, cells expressing wild type FHOD3L at levels comparable to those detected in GS-FH1 mutant rescues are fully rescued. Based on these findings, we conclude that the expression levels in the GS-FH1 rescues are high enough to rescue the FHOD3 knock down, supporting our conclusion that the defect is due to loss of elongation activity. We have added this analysis and discussion to the revised manuscript.
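
      The subset comparison described above can be sketched as follows. This is a minimal illustration, assuming a two-sided Mann-Whitney U test with a normal approximation (exact tests are preferable for small samples); the response does not specify the statistics package used, and the function name, cell values, and variable names are hypothetical. As in the comparison described above, the low-intensity subset overlaps the full data set.

```python
import math


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U, p), where U is the smaller of the two U statistics.
    Tied values receive average ranks, and the variance includes the
    standard tie correction.
    """
    n1, n2 = len(a), len(b)
    pooled = sorted((v, g) for g, grp in enumerate((a, b)) for v in grp)
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    n = n1 + n2
    mu = n1 * n2 / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    u = min(u1, u2)
    if var == 0:
        return u, 1.0
    z = (u - mu) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u, p


# Hypothetical (HA intensity in a.u., sarcomere number) pairs, not study data.
wt_cells = [(450, 36), (520, 41), (610, 33), (700, 39), (950, 35),
            (1400, 42), (1800, 34), (2300, 38), (2600, 40)]

HA_CUTOFF = 720  # threshold quoted in the response above
all_counts = [n for _, n in wt_cells]
low_counts = [n for ha, n in wt_cells if ha < HA_CUTOFF]

u, p = mann_whitney_u(all_counts, low_counts)
print(f"U = {u}, two-sided p (normal approx.) = {p:.3f}")
```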

      Recommendations for the authors:

      Reviewing Editor Comments:

      You will see that the 3 reviewers are very positive about your work and appreciate the elegant combination of biochemical assays and functional tests in cardiomyocytes. We've had a long discussion with them and we all agree that two experiments deserve further effort to make the conclusions of your paper more convincing.

      Thank you.

      The first experiment is the TIRF elongation assay, where the two biochemist Reviewers remain doubtful that these short events are really due to the presence of a formin at the end of the filament. One of them suggests that two-color imaging with a labeled formin should clearly prove this point.

      We agree that the elongation assays can be improved. Given the similarity of processivity of Fhod3L, Fhod3S and Drosophila FhodA (measured by a distinct method), we are inclined to believe them. However, the reviewer raises an excellent point about the accuracy of the measurements given the resolution (and noise) of the data. We are interested in the two-color imaging assay but do not believe it will necessarily simplify the analysis. We suspect that Fhod spends more time at/near the barbed end than is apparent based on elongation rates. The fact that we see repeated events on individual filaments at such low concentrations of FHOD3L (0.1 nM) supports this idea. Otherwise, the likelihood of FHOD3L finding barbed ends so often is really quite low.

      We will return to these experiments, using alternate methods, curious to see what else we learn. In the meantime, we conducted more thorough analysis, including controls, and improved visualization of example traces. Data for elongation analysis and kymographs were acquired with JFilament. We stretched the x-axis (time) in kymographs for FHOD3L-CT (Fig. 2F), FHOD3S-CT (Fig. 2, supplement 2C), FHOD3L-CT K1193L (Fig. 3, supplement 1A), and actin alone (Fig 2G), and highlighted regions of analysis. The slopes for these regions, separated based on intensity, were fit to the data in KaleidaGraph. The fits are offset from the data such that they do not obscure the filaments, and the corresponding rates are given. Two observations give us confidence that the events are real: we never see fast dim regions when FHOD3 is not present (Fig. 2H), and the frequency of dim events is markedly increased (Fig. 2-supplements 1G and 2E). We acknowledge in the text that the precise values of the short events may be inaccurate due to the resolution of our experiments. We hope the reviewers are convinced by the improved analysis.
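
      The slope fitting described above (performed by the authors in KaleidaGraph) amounts to a straight-line fit of filament length against time within each highlighted region. A minimal sketch of an ordinary least-squares version follows; the function name and the trace values are hypothetical and serve only to show how a slow region and a faster region would be compared.

```python
def fit_slope(times_s, lengths_um):
    """Ordinary least-squares line through (time, length) points.

    Returns (slope, intercept); the slope is the elongation rate in um/s.
    """
    n = len(times_s)
    if n != len(lengths_um) or n < 2:
        raise ValueError("need at least two (time, length) pairs")
    mean_t = sum(times_s) / n
    mean_l = sum(lengths_um) / n
    stt = sum((t - mean_t) ** 2 for t in times_s)
    if stt == 0:
        raise ValueError("all time points are identical")
    stl = sum((t - mean_t) * (l - mean_l)
              for t, l in zip(times_s, lengths_um))
    slope = stl / stt
    intercept = mean_l - slope * mean_t
    return slope, intercept


# Hypothetical kymograph segments (time in s, filament length in um).
bright_region = ([0, 10, 20, 30, 40], [2.00, 2.21, 2.39, 2.61, 2.80])
dim_region = ([40, 45, 50, 55], [2.80, 3.11, 3.40, 3.69])

slow_rate, _ = fit_slope(*bright_region)
fast_rate, _ = fit_slope(*dim_region)
print(f"bright: {slow_rate:.3f} um/s, dim: {fast_rate:.3f} um/s, "
      f"ratio: {fast_rate / slow_rate:.1f}")
```

      Because short events contain few time points, the fitted slope for such a region is sensitive to localization noise, which is why the ratio between regions is more robust than either absolute rate.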

      The second experiment is the sarcomere assembly defect phenotype in the GS-FH1 rescue condition. This requires further investigation, as the extremely low level of GS-FH1 signal in transfected cells in Figure 6A may reflect a failure of actin-binding/nucleation in vivo, rather than its inability to elongate F-actin. Although you show that GS-FH1 can bind to sarcomeres when they are present, this may be due to a lack of siRNA activity in these cells, such that endogenous FHOD3L is still present. In this possible scenario, GS-FH1 could dimerize with endogenous FHOD3L.

      We agree that the sarcomeres we see are likely to be residual and could reflect some remaining endogenous FHOD3. The reviewers are concerned about the low protein levels in the GS-FH1 rescues. First, we do not agree that the levels are “extremely” low. Through careful analysis, we established that 3xHA-FHOD3L intensities between 300 and 3000 a.u./µm² were sufficient for full rescue. The mean for the GS-FH1 experiments is 533 +/- 93, which is well within this range. Furthermore, we did not observe correlation between sarcomere number, length, or width and HA intensity over the full range collected for wild type FHOD3L or within the GS-FH1 data. We previously showed pilot data but now show correlation analysis for every analyzed cell (Fig. 4 – figure supplement 1 D-F). We conducted this analysis on all of the mutant rescue experiments (Fig. 6-supplement 1). Finally, we identified two subpopulations of the wild type rescue data. One is all of the cells with HA intensity < 720, which gives a distribution of mean 545 +/- 85. The second set is the first biological replicate of wild type rescue, which has a distribution of mean 560 +/- 160. Again, the correlation analysis shows little relationship between HA levels and sarcomere metrics. Nevertheless, we show intensity-level-matched images in Fig 6, as opposed to images reflecting average intensities.

      The critical question remains whether the phenotype is due to low protein levels or due to loss of elongation by FHOD3L. Notably, we now show a cell that is full of sarcomeres and has relatively high FHOD3L levels as well, consistent with available binding sites stabilizing mutant protein but not ruling out heterodimerization (Fig. 6 – figure supplement 2C). Others have expressed mutant FHOD3L in a wild type background in mice. They observed poisoning, consistent with heterodimerization. Thus, it is possible that, as suggested, the GS-FH1 mutant detected in sarcomeres is in fact heterodimerized with residual endogenous FHOD3L. In this case, we would still conclude that the protein is not functional enough to rescue, supporting a role for the FH1 domain.

      In the future, we plan to perform experiments with compromised, but not inactive, FH1 domains, as we discuss in the paper.

      We hope that you will find these comments useful.

      Yes, the comments were thoughtful and helped us write a better paper. Thank you.

      Reviewer #1 (Recommendations for the authors):

      Some experiments should be described and analyzed more carefully. This lack of clarity calls into question the interpretation of some experiments. Overall, this study is not yet as convincing as it should be.

      Main recommendations:

      (1) Formin elongation phases in the TIRF experiment are not convincing. They are rare and it is difficult to see any significant difference between the control movie without FHOD3L-CT and the movie with FHOD3L-CT. Filaments assembled in the absence of FHOD3L-CT also show some fluorescence inhomogeneity (which is normal), and measurements of formin elongation rates and capping times are not convincing (for example, the kymograph of the control profilin-actin situation in Figure 2F also shows a fast elongation phase on the right).

      Please see response above. We conducted more thorough analysis and created improved visualizations. We hope the data are more convincing now.

      It is also difficult to understand how an accurate measurement can be made from these noisy kymographs, and the method section should explain that precisely.

      This is a valid point. We added details of the analysis to the methods section, and we discuss in the paper the fact that the measurements are at the limit of our resolution. For our interpretation, we rely on the large (~3-fold) difference in elongation rather than on specific elongation rates.

      One of the problems is that these events are too transient to quantify well with noisy data. I noticed that the formin concentration used in these movies is quite low (0.1 nM FHOD3L-CT). Is there a reason for this? Is it possible to increase the formin concentration to increase the number of formin capping/elongation events and provide more convincing movies?

      We acknowledge that the data are noisy. We felt that it was necessary to perform experiments with filaments only tethered at one end, leaving the growing end free. We did so, in part, because when we did experiments with biotinylated actin to anchor the filaments down, we observed pauses in the absence of formin. Ultimately, we compromised, using anchored seeds and a relatively low concentration of NEM-myosin to decrease motion of the actin filaments.

      The experiments were performed with such low FHOD3L-CT because it was a potent nucleator in TIRF assays, making data analysis nearly impossible with more formin present. FHOD3S-CT and FHOD3L-CT K1193L behaved somewhat differently between these experiments and we were able to perform them with 1 nM formin.

      Not seeing formin at the tip of the filaments is an additional difficulty because we do not know if these pauses occur because formin is stuck to the coverslips (which could very well happen with these sticky proteins) or freely bound at the end of a filament as the text suggests. Is there any argument in favor of one scenario over the other?

      This will be an important experiment. As described above, we suspect that Fhod spends more time at/near the barbed end than is apparent based on elongation data. The fact that we see repeated events on individual filaments at such low concentrations of FHOD3L (0.1 nM) supports this idea. Otherwise, the likelihood of FHOD3L finding barbed ends so often is really quite low. In order to address the question about the cause of pauses, we reviewed our data, finding that 38 of 40 bursts were preceded by pauses. We do, however, discuss that we cannot rule out non-specific interactions with the surface.

      (2) Pyrene elongation assays in the presence of profilin are actually more convincing to test the elongation ability of formins. However, such an assay is not presented for all mutants. It should be.

      While we agree to some extent with this comment, we did not include the pyrene data for all of the mutants because the shapes of the curves were even more complicated than those seen with wild type FHOD3L-CT, rendering them uninterpretable.

      (3) Some experiments (e.g. in Figure 2E) are performed with yeast profilin, while others (e.g. in Figure 2F) are performed with human profilin. Obviously, both profilins could modulate formin activity differently and the side-by-side interpretation of both experiments is difficult. Could the authors stick to human profilin for all experiments?

      We used to always perform pyrene assays with yeast profilin because it was known to be insensitive to pyrene. These data were collected before we realized that the affinity of human profilin for actin is so high that we could probably do everything with this profilin. We have compared the two profilins for other formins, e.g. Delphilin, Capu, and did not observe detectable differences.

      Minor recommendations:

      (1) The pyrene assays with the light blue colored curve choice are not ideal. I have difficulties seeing some of the curves.

      Thank you. We added symbols to a subset of the traces to make them more visible.

      (2) In the same curves, I can't understand what the +3.75 and 0.078 numbers mean. Could these results be plotted in a clearer way?

      These values are the lowest concentrations in the range tests. They were shown in matching light blue with a black outline for visibility. We added symbols and changed the color of the numbering for improved visibility and understanding.

      (3) In Figure 2D, is the Kd of I1163A really determined only from 2 experimental data points?

      Of course not. We now show the figure with extended axes in Fig. 2 - figure supplement 1C.

      (4) In Figure 2C, the shape of the curves suggests that this is not a pure capping assay, but a mix of capping and nucleation. It's not dramatic but could lead to an under-estimation of the capping efficiency.

      We agree with the reviewer that the complicated shapes confound interpretation. Our analysis is based on the earliest slopes, in part, for this reason. We added discussion of this complication to the text.

      Reviewer #3 (Recommendations for the authors):

      Suggestions for additional experiments:

      (1) To evaluate whether GS-FH1 alone can indeed interact with existing actin filaments in vivo, the authors may consider performing immunoprecipitation assays with GS-FH1 extracted from rescued NRVMs.

      An IP of GS-FH1 from cells could show actin filament side binding but, unfortunately, will not provide any information about filament end binding, which is of much greater interest.

      It will be helpful to show phalloidin staining in GS-FH1 rescues in a similar manner as in Figure 6-supplement 1, panel B, and compare that with mock rescue in Figure 4 panel D. It will be essential to prove this prior to concluding that actin elongation activity is essential for sarcomere assembly.

      This is an excellent suggestion. We now include images of phalloidin stained cells from both K1193L and GS-FH1 rescues (Fig. 6A’ – supplement 2A,B). We were intrigued to see small actin punctae that were sometimes aligned. We speculate that these could be pre-premyofibrils and suggest that this is further evidence that the GS-FH1 protein is not completely unstable.

      (2) Prior to sarcomere assembly, a-actinin is known to form short bundles with actin filaments (I-Z-I complex) without clearly defined periodicity. This semi-ordered state then transforms into the more ordered sarcomeres with periodic spacing. It will be valuable to show the phalloidin staining in addition to the a-actinin IF consistently across all conditions. This may lead to further insights into the defects of sarcomere assembly. Along the same vein, higher magnification images showcasing several sarcomeres will help the readers evaluate these defects.

      We agree that there are additional valuable measurements to be made. In order to favor synchronized contraction, we plated the cells at too high a density to reliably identify IZI complexes. We have included some zoomed in images of the phalloidin staining.

      Recommendations for improving the writing:

      The authors mentioned the interaction between cardiac MyBP-C and FHOD3L as essential for the localization of FHOD3L to the C-line of the sarcomere. Can they discuss whether this interaction is important for the role of FHOD3L in sarcomere assembly? If so, how?

      This is a very interesting question that we cannot answer at this time.

      Minor corrections to the text and figures:

      In the legend of Figure 2-Figure Supplement 1, the labels of (F) and (E) are swapped.

      Thank you for catching this.

    1. Author response:

      eLife Assessment

      This useful study presents Altair-LSFM, a solid and well-documented implementation of a light-sheet fluorescence microscope (LSFM) designed for accessibility and cost reduction. While the approach offers strengths such as the use of custom-machined baseplates and detailed assembly instructions, its overall impact is limited by the lack of live-cell imaging capabilities and the absence of a clear, quantitative comparison to existing LSFM platforms. As such, although technically competent, the broader utility and uptake of this system by the community may be limited.

      We thank the reviewers and editors for their thoughtful evaluation of our work and for recognizing the technical strengths of the Altair-LSFM platform, including the custom-machined baseplates and detailed documentation provided to support accessibility and reproducibility. We respectfully disagree, however, with the assessment that the system lacks live-cell imaging capabilities. We are fully confident in the system’s suitability for live-cell applications and will demonstrate this by including representative live-cell imaging data in the revised manuscript, along with detailed instructions for implementing environment control. Moreover, we will expand our discussion to include a broader, more quantitative comparison to existing LSFM platforms—highlighting trade-offs in cost, performance, and accessibility—to better contextualize Altair’s utility and adaptability across diverse research settings.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The article presents the details of the high-resolution light-sheet microscopy system developed by the group. In addition to presenting the technical details of the system, its resolution has been characterized and its functionality demonstrated by visualizing subcellular structures in a biological sample.

      Strengths:

      (1) The article includes extensive supplementary material that complements the information in the main article.

      (2) However, in some sections, the information provided is somewhat superficial.

      Our goal was to make the supplemental content as comprehensive and useful as possible. In addition to the materials provided with the manuscript, our intention is for the online documentation (available at thedeanlab.github.io/altair) to serve as a living resource that evolves in response to user feedback. For this reason, we are especially interested in identifying and expanding any sections that are perceived as superficial, and we would greatly appreciate the reviewer’s guidance on which areas would benefit from further elaboration.

      Weaknesses:

      (1) Although a comparison is made with other light-sheet microscopy systems, the presented system does not represent a significant advance over existing systems. It uses high numerical aperture objectives and Gaussian beams, achieving resolution close to theoretical after deconvolution. The main advantage of the presented system is its ease of construction, thanks to the design of a perforated base plate.

      We appreciate the reviewer’s assessment and the opportunity to clarify our intent. Our primary goal was not to introduce new optical functionality beyond that of existing high-performance light-sheet systems, but rather to reduce the barrier to entry for non-specialist labs.

      (2) Using similar objectives (Nikon 25x and Thorlabs 20x), the results obtained are similar to those of the LLSM system (using a Gaussian beam without laser modulation). However, the article does not mention the difficulties of mounting the sample in the implemented configuration.

      We agree that there are practical challenges associated with handling 5 mm diameter coverslips. However, the Nikon 25x can readily be replaced by a Zeiss W Plan-Apochromat 20x/1.0 objective, which eliminates the need for the 5 mm coverslip[1]. In the revised manuscript, we will more explicitly detail the practical challenges in handling a 5 mm coverslip and mention the alternative detection objective.

      (3) The authors present a low-cost, open-source system. Although they provide open source code for the software (navigate), the use of proprietary electronics (ASI, NI, etc.) makes the system relatively expensive. Its low cost is not justified.

      We understand the reviewer’s concern regarding the use of proprietary control hardware such as the ASI Tiger Controller and NI data acquisition cards. While lower-cost alternatives for analog and digital control (e.g., microcontroller-based systems) do exist, our choice was intentional. By relying on a unified and professionally supported platform, we minimize the complexity of sourcing, configuring, and integrating components from disparate vendors—each of which would otherwise demand specialized technical expertise. Moreover, in future releases, we aim to further streamline the system by eliminating the need for the NI card, consolidating all optoelectronic control through the ASI Tiger Controller. This approach allows users to purchase a fully assembled and pre-configured system that can be operational with minimal effort.

      It is worth noting that the ASI components are not the primary cost driver. The full set—including XYZ and focusing stages, a filter wheel, a tube lens, the Tiger Controller, and basic optomechanical adapters—costs approximately $27,000, or ~18% of the total system cost. Additional cost reductions are possible. For example, replacing the motorized sample positioning and focusing stages with manual alternatives could reduce the cost by ~$12,000. However, this would eliminate key functionality such as autofocusing, 3D tiling, and multi-position acquisition. Open-source mechanical platforms such as OpenFlexure could in principle be adapted, but they would require custom assembly and would need to be integrated into our control software. Similarly, the filter wheel could be omitted in favor of a multi-band emission filter, reducing the cost by ~$5,000. However, this comes at the expense of increased spectral crosstalk, often necessitating spectral unmixing. An industrial CMOS camera—such as the Ximea MU196CR-ON, recently demonstrated in a Direct View Oblique Plane Microscopy configuration[2]—could substitute for the sCMOS cameras typically used in high-end imaging. However, these industrial sensors often exhibit higher noise floors and lower dynamic range, limiting sensitivity for low-signal imaging applications.

      While a $150,000 system represents a significant investment, we consider it relatively cost-effective in the context of advanced light-sheet microscopy. For comparison, commercially available systems with similar optical performance—such as LLSM systems from 3i or Zeiss—are several-fold more expensive.
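
      The cost figures quoted above can be checked with a few lines of arithmetic. This sketch uses only the dollar amounts stated in the response; treating the two substitutions as independent and additive is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Figures quoted in the response (USD).
TOTAL_USD = 150_000
ASI_USD = 27_000

# Savings quoted for two optional substitutions; assumed additive here.
savings_usd = {
    "manual instead of motorized stages": 12_000,
    "multi-band emission filter instead of filter wheel": 5_000,
}

asi_share = ASI_USD / TOTAL_USD
reduced_total = TOTAL_USD - sum(savings_usd.values())

print(f"ASI share of total cost: {asi_share:.0%}")
print(f"Total with both substitutions: ${reduced_total:,}")
```

      Under that assumption, both substitutions together lower the total by roughly 11%, at the cost of the functionality described above.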

      (4) The fibroblast images provided are of exceptional quality. However, these are fixed samples. The system lacks the necessary elements for monitoring cells in vivo, such as temperature or pH control.

      We thank the reviewer for their positive comment regarding the quality of our fibroblast images. As noted, the current manuscript focuses on the optical design and performance characterization of the system, using fixed specimens to validate resolution and imaging stability. We acknowledge the importance of environmental control for live-cell imaging. Temperature regulation is routinely implemented in our lab using flexible adhesive heating elements paired with a power supply and PID controller. For pH stabilization in systems that lack a 5% CO₂ atmosphere, we typically supplement the imaging medium with 10–25 mM HEPES buffer. In the revised manuscript, we will introduce a modified sample chamber capable of maintaining user-specified temperatures, along with detailed assembly instructions. We will also include representative live-cell imaging data to demonstrate the feasibility of in vitro imaging using this system.

      Reviewer #2 (Public review):

      Summary:

      The authors present Altair-LSFM (Light Sheet Fluorescence Microscope), a high-resolution, open-source microscope, that is relatively easy to align and construct and achieves sub-cellular resolution. The authors developed this microscope to fill a perceived need that current open-source systems are primarily designed for large specimens and lack sub-cellular resolution or are difficult to construct and align, and are not stable. While commercial alternatives exist that offer sub-cellular resolution, they are expensive. The authors' manuscript centers around comparisons to the highly successful lattice light-sheet microscope, including the choice of detection and excitation objectives. The authors thus claim that there remains a critical need for high-resolution, economical, and easy-to-implement LSFM systems.

      Strengths:

      The authors succeed in their goals of implementing a relatively low-cost (~ USD 150K) open-source microscope that is easy to align. The ease of alignment rests on using custom-designed baseplates with dowel pins for precise positioning of optics based on computer analysis of opto-mechanical tolerances, as well as the optical path design. They simplify the excitation optics over Lattice light-sheet microscopes by using a Gaussian beam for illumination while maintaining lateral and axial resolutions of 235 and 350 nm across a 260-µm field of view after deconvolution. In doing so they rest on foundational principles of optical microscopy: what matters for lateral resolution is the numerical aperture of the detection objective and proper sampling of the image field onto the detector, and the axial resolution depends on the thickness of the light-sheet when it is thinner than the depth of field of the detection objective. This concept has unfortunately not been completely clear to users of high-resolution light-sheet microscopes and is thus a valuable demonstration. The microscope is controlled by an open-source software, Navigate, developed by the authors, and it is thus foreseeable that different versions of this system could be implemented depending on experimental needs while maintaining easy alignment and low cost. They demonstrate system performance successfully by characterizing their sheet, point-spread function, and visualization of sub-cellular structures in mammalian cells, including microtubules, actin filaments, nuclei, and the Golgi apparatus.

      We thank the reviewer for their thoughtful summary of our work. We are pleased that the foundational optical principles, design rationale, and emphasis on accessibility came through clearly. We agree that the approach used to construct the microscope is highly modular, and we anticipate that these design principles will serve as the basis for additional system variants tailored to specific biological samples and experimental contexts. To support this, we provide all Zemax simulations and CAD files openly on our GitHub repository, enabling advanced users to build upon our design and create new functional variants of the Altair system.
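
      The optical principles summarized in the review above can be illustrated numerically. The sketch below assumes the Rayleigh criterion (0.61 λ/NA) for lateral resolution, the Nyquist requirement of at least two pixels per resolvable distance, and a simplified rule that axial resolution is set by whichever is smaller, the sheet thickness or the detection depth of field. The wavelength, numerical aperture, sheet thickness, and depth of field used here are hypothetical inputs, not specifications taken from the manuscript, and the function names are illustrative.

```python
def rayleigh_lateral_resolution_nm(wavelength_nm, detection_na):
    """Rayleigh-criterion lateral resolution for a detection objective."""
    return 0.61 * wavelength_nm / detection_na


def max_nyquist_pixel_nm(resolution_nm):
    """Largest sample-plane pixel size that still satisfies Nyquist sampling."""
    return resolution_nm / 2.0


def axial_resolution_nm(sheet_thickness_nm, depth_of_field_nm):
    """Simplified rule: the thinner of light sheet and depth of field wins."""
    return min(sheet_thickness_nm, depth_of_field_nm)


# Hypothetical inputs for illustration only.
wavelength_nm = 520.0
detection_na = 1.1

lateral = rayleigh_lateral_resolution_nm(wavelength_nm, detection_na)
print(f"lateral resolution: {lateral:.0f} nm")
print(f"maximum pixel size: {max_nyquist_pixel_nm(lateral):.0f} nm")
print(f"axial resolution:   {axial_resolution_nm(400.0, 1000.0):.0f} nm")
```

      These are raw, pre-deconvolution estimates; values reported after deconvolution, such as those quoted in the review, are expected to be smaller.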

      Weaknesses:

      There is a fixation on comparison to the first-generation lattice light-sheet microscope, which has evolved significantly since then:

      (1) The authors claim that commercial lattice light-sheet microscopes (LLSM) are "complex, expensive, and alignment intensive". I believe this sentence applies to the open-source version of LLSM, which was made available for wide dissemination. Since then, a commercial solution has been provided by 3i, which is now being used in multiple cores and labs but does require routine alignments. However, Zeiss has also released a commercial turn-key system, which, while expensive, is stable, and the complexity does not interfere with the experience of the user. In general, though, statements on ease of use and stability might be considered anecdotal and, if unreferenced or without data, may not belong in a scientific article.

      The referee is correct that our comparisons reference the original LLSM design, which was simultaneously disseminated as an open-source platform and commercialized by 3i. While we acknowledge that newer variants of LLSM have been developed—including systems incorporating adaptive optics[3] and the MOSAIC platform (which remains unpublished)—the original implementation remains the most widely described and cited in the literature. It is therefore the most appropriate point of comparison for contextualizing Altair’s performance, complexity, and accessibility. Importantly, this version of LLSM is far from obsolete; it continues to be one of the most commonly used imaging systems at Janelia Research Campus’s Advanced Imaging Center.

      We acknowledge that the more recent commercial implementation by Zeiss has addressed several of the practical limitations associated with the original design. In particular, we agree that the Zeiss Lattice Lightsheet 7 system, which integrates a meniscus lens to facilitate oblique imaging through a coverslip, offers a user-friendly experience, albeit with a modest tradeoff in resolution (reported deskewed resolution: 330 nm × 330 nm × 500–1000 nm).

      While we recognize that statements on usability and stability can be subjective, one objective proxy for system complexity is the number of optical elements that require precise alignment during assembly. The original LLSM setup includes approximately 29 optical components that must each be carefully positioned laterally, angularly, and coaxially along the optical path. In contrast, the first-generation Altair system contains only 9 such elements. By this metric, Altair is considerably simpler to assemble and align, supporting our overarching goal of making high-resolution light-sheet imaging more accessible to non-specialist laboratories. In the revised manuscript, we will clarify the scope of our comparison and provide more precise language about what we mean by complexity (e.g., number of optical elements needed to align).

      (2) One of the major limitations of the first generation LLSM was the use of a 5 mm coverslip, which was a hindrance for many users. However, the Zeiss system elegantly solves this problem, and so does Oblique Plane Microscopy (OPM), while the Altair-LSFM retains this feature, which may dissuade widespread adoption. This limitation and how it may be overcome in future iterations is not discussed.

      We agree that the use of 5 mm diameter coverslips, while enabling high-NA imaging in the current Altair-LSFM configuration, may serve as an inconvenience for many users. We will discuss this more explicitly in the revised manuscript. Specifically, we note that changing the detection objective is sufficient to eliminate the need for a 5 mm coverslip. For example, as demonstrated in Moore et al., Lab Chip 2021, pairing the Zeiss W Plan-Apochromat 20x/1.0 objective with the Thorlabs TL20X-MPL allows imaging beyond the physical surfaces of both objectives, removing the constraint imposed by small-format coverslips[1]. In the revised manuscript, we will propose this modification as a straightforward path for increasing compatibility with more conventional sample mounting formats.

      (3) Further, on the point of sample flexibility, all generations of the LLSM, and by the nature of its design, the OPM, can accommodate live-cell imaging with temperature, gas, and humidity control. It is unclear how this would be implemented with the current sample chamber. This limitation would severely limit use cases for cell biologists, for which this microscope is designed. There is no discussion on this limitation or how it may be overcome in future iterations.

      We appreciate the reviewer’s emphasis on the importance of environmental control for live-cell imaging applications. It is worth noting that the original LLSM design, including the system commercialized by 3i, provided temperature control only, without integrated gas or humidity regulation. Despite this, it has been successfully used by a wide range of scientists to generate important biological insights.

      We agree that both OPM and the Zeiss implementation of LLSM offer clear advantages in terms of environmental control, as we previously discussed in detail in Sapoznik et al., eLife, 2020[4]. However, assembly of high numerical aperture OPM systems is highly technical, and no open-source variant of OPM delivers sub-cellular scale resolution yet.

      (4) The authors' comparison to LLSM is constrained to the "square" lattice, which, as they point out, is the most used optical lattice (though this also might be considered anecdotal). The LLSM original design, however, goes far beyond the square lattice, including hexagonal lattices, the ability to do structured illumination, and greater flexibility in general in terms of light-sheet tuning for different experimental needs, as well as not being limited to just sample scanning. Thus, the Altair-LSFM cannot compare to the original LLSM in terms of versatility, even if comparisons to the resolution provided by the square lattice are fair.

      We thank the reviewer for this comment. It is true that our discussion focused primarily on the square lattice implementation of LLSM. While this could be viewed as a subset of the system’s broader capabilities, we chose this focus intentionally, as the square lattice remains by far the most commonly used variant in practice. Even in the original LLSM publication, 16 out of 20 figure subpanels utilized the square lattice, with only one panel each representing the hexagonal lattice in SIM mode, a standard Bessel beam in incoherent SIM mode, a hex lattice in dithered mode, and a single Bessel in dithered mode. This usage pattern largely reflects the operational simplicity of the square lattice: it minimizes sidelobe growth and enables more straightforward alignment and data processing compared to hexagonal or structured illumination modes.

      In 2019, we performed an exhaustive accounting of published illumination modes in LLSM and found that the SIM mode had only been used in two additional peer-reviewed publications at that time. We will consider updating this table in the revised manuscript and will expand our discussion to acknowledge the broader flexibility of the LLSM platform—including its capacity for structured illumination and alternative light-sheet geometries. However, we will also emphasize that, despite these advanced capabilities, the square lattice remains the dominant mode used by the community and therefore serves as a fair and practical benchmark for comparison.

      (5) There is no demonstration of the system's live-imaging capabilities or temporal resolution, which is the main advantage of existing light-sheet systems.

      In the revised manuscript, we will include a demonstration of live-cell imaging to directly validate the system’s suitability for dynamic biological applications. We will also characterize the temporal resolution of the system. As a sample-scanning microscope, the imaging speed is primarily limited by the performance of the Z-piezo stage. For simplicity and reduced optoelectronic complexity, we currently power the piezo through the ASI Tiger Controller. We will expand the supplementary material to describe the design criteria behind this choice, including potential trade-offs, and provide data quantifying the achievable volume rates under typical operating conditions.
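
      As a rough illustration of how the volume rate of a sample-scanning system depends on stage performance, the sketch below assumes a simple step-and-settle acquisition model in which each plane costs one camera exposure plus one piezo settling time, with an optional flyback at the end of each volume. The function name and all numerical values are hypothetical placeholders, not measured parameters of Altair-LSFM or of the ASI Tiger Controller.

```python
def volume_rate_hz(n_planes, exposure_s, step_settle_s, flyback_s=0.0):
    """Estimate volumes per second for a step-and-settle sample scan.

    Assumed model (not taken from the manuscript): each plane takes one
    exposure plus one stage settling time, and the stage needs an extra
    flyback time to return to the start position after each volume.
    """
    if n_planes <= 0:
        raise ValueError("n_planes must be positive")
    volume_time_s = n_planes * (exposure_s + step_settle_s) + flyback_s
    return 1.0 / volume_time_s


# Hypothetical example: 100 planes, 10 ms exposure, 5 ms settling, 50 ms flyback
rate = volume_rate_hz(100, 0.010, 0.005, 0.050)
print(f"{rate:.3f} volumes/s")
```

      Under this model, the settling time contributes to every plane, which is why the stage, rather than the camera, can become the limiting factor at short exposure times.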

      While the microscope is well designed and completely open source, it will require experience with optics, electronics, and microscopy to implement and align properly. Experience with custom machining or soliciting a machine shop is also necessary. Thus, in my opinion, it is unlikely to be implemented by a lab that has zero prior experience with custom optics or can hire someone who does. Altair-LSFM may not be as easily adaptable or implementable as the authors describe or perceive in any lab that is interested, even if they can afford it. The authors indicate they will offer "workshops," but this does not necessarily remove the barrier to entry or lower it, perhaps as significantly as the authors describe.

      We appreciate the reviewer’s perspective and agree that building any high-performance custom microscope—Altair-LSFM included—requires a baseline familiarity with optics and instrumentation. Our goal is not to eliminate this requirement entirely, but to significantly reduce the technical and logistical barriers that typically accompany custom light-sheet microscope construction.

      Importantly, no machining experience or in-house fabrication capabilities are required—users can simply submit provided design files and specifications directly to the vendor. We will make this process as straightforward as possible by supplying detailed instructions, recommended materials, and vendor-ready files. Additionally, we draw encouragement from the success of related efforts such as mesoSPIM, which has seen over 30 successful implementations worldwide using a similar model of exhaustive online documentation, open-source control software, and community support through user meetings and workshops.

      We recognize that documentation alone is not always sufficient, and we are committed to further lowering barriers to adoption. To this end, we are actively working with commercial vendors to streamline procurement and reduce the logistical burden on end users. Additionally, Altair-LSFM is supported by a Biomedical Technology Development and Dissemination (BTDD) grant, which provides dedicated resources for hosting workshops, offering real-time community support, and generating supplementary materials such as narrated video tutorials. We will expand our discussion in the revised manuscript to better acknowledge these implementation challenges and outline our ongoing strategies for supporting a broad and diverse user base.

      There is a claim that this design is easily adaptable. However, the requirement of custom-machined baseplates and in silico optimization of the optical path basically means that each new instrument is a new design, even if the Navigate software can be used. It is unclear how Altair-LSFM demonstrates a modular design that reduces times from conception to optimization compared to previous implementations.

      We appreciate the reviewer’s comment and agree that our language regarding adaptability may have been too strong. It was not our intention to suggest that the system can be easily modified without prior experience. Meaningful adaptations of the optical or mechanical design would require users to have expertise in optical layout, optomechanical design, and alignment.

      That said, for labs with sufficient expertise, we aim to facilitate such modifications by providing comprehensive resources—including detailed Zemax simulations, CAD models, and alignment documentation. These materials are intended to reduce the development burden for those seeking to customize the platform for specific experimental needs.

      In the revised manuscript, we will clarify this point and explicitly state in the discussion what technical expertise is required to modify the system. We will also revise our language around adaptability to better reflect the intended audience and realistic scope of customization.

      Reviewer #3 (Public review):

      Summary:

      This manuscript introduces a high-resolution, open-source light-sheet fluorescence microscope optimized for sub-cellular imaging.

      The system is designed for ease of assembly and use, incorporating a custom-machined baseplate and in silico optimized optical paths to ensure robust alignment and performance. The authors demonstrate lateral and axial resolutions of ~235 nm and ~350 nm after deconvolution, enabling imaging of sub-diffraction structures in mammalian cells.

      The important feature of the microscope is the clever and elegant adaptation of simple gaussian beams, smart beam shaping, galvo pivoting and high NA objectives to ensure a uniform thin light-sheet of around 400 nm in thickness, over a 266 micron wide Field of view, pushing the axial resolution of the system beyond the regular diffraction limited-based tradeoffs of light-sheet fluorescence microscopy.

      Compelling validation using fluorescent beads and multicolor cellular imaging highlights the system's performance and accessibility. Moreover, a very extensive and comprehensive manual of operation is provided in the form of supplementary materials. This provides a DIY blueprint for researchers who want to implement such a system.

      Strengths:

      (1) Strong and accessible technical innovation: With an elegant combination of beam shaping and optical modelling, the authors provide a high-resolution light-sheet system that overcomes the classical light-sheet tradeoff limit of a thin light-sheet and a small field of view. In addition, the integration of in silico modelling with a custom-machined baseplate is very practical and allows for ease of alignment procedures. Combining these features with the solid and super-extensive guide provided in the supplementary information, this provides a protocol for replicating the microscope in any other lab.

      (2) Impeccable optical performance and ease of mounting of samples: The system takes advantage of the same sample-holding method seen already in other implementations, but reduces the optical complexity. At the same time, the authors claim to achieve similar lateral and axial resolution to Lattice-light-sheet microscopy (although without a direct comparison; see below in the "weaknesses" section). The optical characterization of the system is comprehensive and well-detailed. Additionally, the authors validate the system imaging sub-cellular structures in mammalian cells.

      (3) Transparency and comprehensiveness of documentation and resources: A very detailed protocol provides detailed documentation about the setup, the optical modeling, and the total cost.

      Weaknesses:

      (1) Limited quantitative comparisons: Although some qualitative comparison with previously published systems (diSPIM, lattice light-sheet) is provided throughout the manuscript, some side-by-side comparison would be of great benefit for the manuscript, even in the form of a theoretical simulation. While having a direct imaging comparison would be ideal, it's understandable that this goes beyond the interest of the paper; however, a table referencing image quality parameters (taken from the literature), such as signal-to-noise ratio, light-sheet thickness, and resolutions, would really enhance the features of the setup presented. Moreover, based also on the necessity for optical simplification, an additional comment on the importance/difference of dual objective/single objective light-sheet systems could really benefit the discussion.

      In the revised manuscript, we will expand our discussion to include a broader range of light-sheet microscope designs and imaging modes, including both single- and dual-objective configurations. We agree that highlighting the trade-offs between these approaches—such as working distance, sample geometry constraints, and alignment complexity—will enhance the overall context and utility of the manuscript.

      To further aid comparison, we will include a summary table referencing key image quality parameters such as lateral and axial resolution, and illumination beam NA for Altair-LSFM. Where available, we will reference values from published work—such as the axial resolution reported in Valm et al. (Nature, 2017)—to provide a clearer benchmark. Because such comparisons can be technically nuanced, especially when comparing across systems with different geometries and sample mounting constraints, we will also include a supplementary note outlining the assumptions and limitations of these comparisons.

      (2) Limitation to a fixed sample: In the manuscript, there is no mention of incubation temperature, CO₂ regulation, humidity control, or possible integration of commercial environmental control systems. This is a major limitation for an imaging technique that owes its popularity to fast, volumetric, live-cell imaging of biological samples.

      We thank the reviewer for highlighting this important consideration. In the revised manuscript, we will provide a detailed description of how temperature control can be implemented using flexible adhesive heating elements, a power supply, and a PID controller. Step-by-step assembly instructions and recommended components will be included to facilitate adoption by users interested in live-cell imaging. We also note that most light-sheet microscopy systems capable of sub-cellular resolution (including the original LLSM design, diSPIM, and ASLM) typically do not incorporate integrated CO₂ or humidity control. These systems often rely on HEPES-buffered media to maintain pH stability, which is generally sufficient for short- to intermediate-term imaging. While full environmental control may be necessary for extended time-lapse studies, it is not a prerequisite for high-resolution volumetric imaging in many applications. Nonetheless, we will include a discussion of the challenges associated with adding CO₂ and humidity control to open or semi-enclosed architectures like Altair-LSFM, and outline potential future paths for integration with commercial incubation systems.
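
      To make the role of the PID controller concrete, the following is a minimal sketch of a discrete PID loop that converts a temperature reading into a heater duty cycle between 0 and 1. It is a generic textbook form, not the controller planned for Altair-LSFM; the class name, gains, and setpoint are hypothetical, and a practical implementation would likely add integral anti-windup and sensor filtering.

```python
class PID:
    """Minimal discrete PID controller with output clamping (illustrative only)."""

    def __init__(self, kp, ki, kd, setpoint, out_min=0.0, out_max=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint          # target temperature, e.g. in deg C
        self.out_min, self.out_max = out_min, out_max
        self._integral = 0.0              # accumulated error * time
        self._prev_error = None           # no derivative on the first update

    def update(self, measurement, dt):
        """Return the clamped heater duty cycle for one time step of length dt."""
        error = self.setpoint - measurement
        self._integral += error * dt
        if self._prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._prev_error) / dt
        self._prev_error = error
        output = self.kp * error + self.ki * self._integral + self.kd * derivative
        # Clamp to the range the heater driver accepts (assumed 0..1 duty cycle)
        return min(self.out_max, max(self.out_min, output))


# Hypothetical usage: proportional-only control toward 37 deg C
controller = PID(kp=0.5, ki=0.0, kd=0.0, setpoint=37.0)
duty = controller.update(measurement=36.0, dt=1.0)
print(duty)
```

      Clamping the output keeps the commanded duty cycle within the physical limits of the heating element, so a chamber that is already above the setpoint simply receives no heating.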

      (3) System cost and data storage cost: While the system presented has the advantage of being open-source, it remains relatively expensive (considering the 150k without laser source and optical table, for example). The manuscript could benefit from a more direct comparison of the performance/cost ratio of existing systems, considering academic settings with budgets that most of the time would not allow for expensive architectures. Moreover, it would also be beneficial to discuss the adaptability of the system, in case a 30k objective could not be feasible. Will this system work with different optics (with the obvious limitations coming with the lower NA objective)? This could be an interesting point of discussion. Adaptability of the system in case of lower budgets or more cost-effective choices, depending on the needs.

      We thank the reviewer for raising this important point. First, we would like to clarify that the quoted $150k cost estimate includes the optical table and laser source. We apologize for any confusion and will communicate this more effectively in the revised manuscript.

      We agree that adaptability is a key concern, especially in academic settings with limited budgets. The detection path can be readily altered depending on experimental needs and cost constraints. For example, in our discussion of alternatives to the 5 mm coverslip geometry, we will describe how switching to a Zeiss W Plan-Apochromat 20x/1.0 in combination with a compatible excitation objective allows high-resolution imaging while accommodating more conventional sample formats. We will expand this to include cost-effective alternatives as well.

      We will also expand our discussion on cost-reduction strategies and the associated trade-offs. These include replacing motorized stages with manual ones, omitting the filter wheel in favor of a multi-band emission filter, or using industrial-grade cameras in place of scientific CMOS detectors. While each change entails some loss in functionality or sensitivity, such modifications allow users to tailor the system to their specific budget and application.

      Finally, we recognize the challenge in communicating exact costs of commercial systems due to variability in configuration and pricing. Nonetheless, we will include approximate figures where possible and note that comparable commercial systems—such as LLSM platforms from 3i and Zeiss—are several-fold more expensive than the system presented here.

      Last, not much is said about the need for data storage. Light-sheet microscopy's bottleneck is the creation of increasingly large datasets, and it could be beneficial to discuss more about the storage needs and the quantity of data generated.

      Data storage is indeed a critical consideration in light-sheet microscopy. In the revised manuscript, we will provide a note outlining typical volume dimensions for live-cell imaging experiments along with the associated data overhead. This will include estimates for voxel counts, bit depth, time-lapse acquisitions, and multi-channel datasets to help users anticipate storage needs. We will also briefly discuss strategies for managing large datasets, file types and compression formats.
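
      As a rough illustration of the kind of estimate described above, the sketch below computes the uncompressed size of a time-lapse acquisition from voxel counts, bit depth, number of channels, and number of timepoints. The function name and all numerical values are hypothetical placeholders chosen for illustration, not acquisition parameters of Altair-LSFM.

```python
def dataset_size_bytes(nx, ny, nz, bit_depth, n_channels, n_timepoints):
    """Uncompressed size in bytes of a multi-channel volumetric time-lapse."""
    bytes_per_voxel = (bit_depth + 7) // 8   # round up to whole bytes per voxel
    voxels_per_volume = nx * ny * nz
    return voxels_per_volume * bytes_per_voxel * n_channels * n_timepoints


# Hypothetical example: 2048 x 2048 pixel frames, 100 planes per volume,
# 16-bit data, 2 channels, 100 timepoints
size = dataset_size_bytes(2048, 2048, 100, 16, 2, 100)
print(f"{size / 1e9:.1f} GB")  # prints "167.8 GB"
```

      Because the size scales linearly with every factor, cropping the camera region of interest or reducing the number of planes per volume lowers storage needs in direct proportion.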

      Conclusion:

      Altair-LSFM represents a well-engineered and accessible light-sheet system that addresses a longstanding need for high-resolution, reproducible, and affordable sub-cellular light-sheet imaging. While some aspects (comparative benchmarking and validation, and the limitation to fixed samples) would benefit from further development, the manuscript makes a compelling case for Altair-LSFM as a valuable contribution to the open microscopy scientific community.

      References

      (1) Moore, R. P. et al. A multi-functional microfluidic device compatible with widefield and light sheet microscopy. Lab Chip 22, 136-147 (2021). https://doi.org/10.1039/d1lc00600b

      (2) Lamb, J. R., Mestre, M. C., Lancaster, M. & Manton, J. D. Direct-view oblique plane microscopy. Optica 12, 469-472 (2025). https://doi.org/10.1364/OPTICA.558420

      (3) Liu, T. L. et al. Observing the cell in its native state: Imaging subcellular dynamics in multicellular organisms. Science 360 (2018). https://doi.org/10.1126/science.aaq1392

      (4) Sapoznik, E. et al. A versatile oblique plane microscope for large-scale and high-resolution imaging of subcellular dynamics. eLife 9 (2020). https://doi.org/10.7554/eLife.57681

      (5) Huisken, J. & Stainier, D. Y. Even fluorescence excitation by multidirectional selective plane illumination microscopy (mSPIM). Opt Lett 32, 2608-2610 (2007). https://doi.org/10.1364/ol.32.002608

      (6) Ricci, P. et al. Removing striping artifacts in light-sheet fluorescence microscopy: a review. Prog Biophys Mol Biol 168, 52-65 (2022). https://doi.org/10.1016/j.pbiomolbio.2021.07.003

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mollá-Albaladejo et al. investigate the neurons downstream of Gr64f and Gr66a, called G2Ns. They identify downstream neurons using trans-Tango labeling with RFP and then perform bulk RNA-seq on the RFP-sorted cells. Gene expression is up- or downregulated between the cell populations and between fed and starved states. They specifically identify Leucokinin as a neuropeptide that is upregulated in starved Gr66a cells. Leucokinin cells, identified by a GAL4 line, indeed show higher expression when starved, especially in the SEZ. Furthermore, Leucokinin cells colocalize with the trans-Tango signal from downstream neurons of both GRs. This connection is confirmed with GRASP. According to EM data, Leucokinin cells in the SEZ receive a lot of input and connect to many downstream neurons. In behavior experiments performed with flies lacking Leucokinin neurons, flies show reduced responsiveness to sugar and bitter mixtures when starved. The authors suggest that Leucokinin neurons integrate bitter and sugar tastes and that their output is modified by a hunger state.

      Strengths:

      The authors use a multitude of tools to identify SELK neurons downstream of taste sensory neurons and as starvation-sensitive cells. This study provides an example of how genetic labeling, RNA-seq, and EM analysis can be combined to investigate neural circuits.

      Weaknesses:

      The authors do not show a functional connection between sensory neurons and SELK neurons. Additionally, data from RNA seq, anatomical studies, and EM analysis are sometimes contradictory in terms of connectivity. GRASP signal is not foolproof evidence that cells are synaptically connected.

      We appreciate the reviewer’s comments. Unfortunately, we have not successfully demonstrated a functional response of SELK neurons using in vivo calcium imaging with UAS-GCaMP7 (we tried the f, m, and s versions), primarily due to challenges in obtaining stable signals. We stimulated GRNs using sucrose, caffeine, or a mixture of both, and it is possible that the concentrations, although high, were not sufficient to induce a response.

      Regarding GRASP, we acknowledge its limitations as a standalone technique for establishing genuine synaptic connections between neurons, as some signals may reflect false positives resulting from the mere proximity of the candidate neurons. To strengthen our findings, we complemented these results by demonstrating the positive colocalization of the Leucokinin antibody signal over the Gr66a-Gal4>trans-TANGO and Gr64f-Gal4>trans-TANGO (Figure 4), confirming that Leucokinin neurons are indeed postsynaptic to both sweet and bitter GRNs. Moreover, we incorporated BacTrace data to highlight the direct connectivity between sweet and bitter GRNs (now Figure 5E).

      In the revised manuscript, we have introduced the active-GRASP technique (Macpherson et al., 2015). In this version of GRASP, the presynaptic half of GFP (GFP 1-10) is fused to synaptobrevin, which becomes accessible in the membrane of the presynaptic neuron within the synaptic cleft upon presynaptic stimulation (in our case, by stimulating the sweet Gr64f<sup>GRNs</sup> with sucrose and the bitter Gr66a<sup>GRNs</sup> with caffeine). Utilizing this technique, we successfully demonstrated (see new Figure 5B and 5D) that when presented with water, no signal was detected in the Gr66a-LexA, Lk-Gal4 > active-GRASP, or Gr64f-LexA, Lk-Gal4 > active-GRASP transgenic flies. However, in the presence of caffeine, Gr66a-LexA, Lk-Gal4 > active-GRASP transgenic flies exhibited a clear signal in the SEZ, and similarly, sucrose presentation to Gr64f-LexA, Lk-Gal4 > active-GRASP transgenic flies yielded a detectable signal. The results obtained from active-GRASP provide additional evidence supporting the connectivity between SELK neurons and both Gr64f<sup>GRNs</sup> and Gr66a<sup>GRNs</sup>, further indicating the functional connectivity of the GRNs and SELK neurons.

      The authors describe a behavioral phenotype when flies are starved, however, they do not use a specific driver for the described cell type, thus they should also tone down their claims.

      We agree with the reviewer that the Lk-Gal4 driver line used labels SELK, LHLK, and ABLK neurons. The behavior examined in this paper, the Proboscis Extension Response (PER), measures the initiation of feeding. Although the neural circuit involved in this behavior is primarily confined to the SEZ where SELK neurons are located, we cannot rule out the possibility that other Lk neurons may also play a role in the process. We have utilized the tsh-Gal80 (Clyne et al., 2008) transgene in combination with the Lk-Gal4>UAS-TNT and Lk-Gal4>UAS-TNT<sup>imp</sup> constructs to prevent the expression of the Tetanus Toxin in ABLK neurons, thereby restricting its expression to the SELK and LHLK neurons in the central brain. The new results (Sup Figure 7A) indicate that ABLK neurons do not play a role in integrating sweet and bitter information. However, we acknowledge the reviewer's point that we are still silencing LHLK neurons, so we have adjusted our claims to align more closely with our data.

      Generally, the authors do not provide a big advancement to the field and some of the results are contradictory with previous publications.

      We believe our work does not contradict previous findings, nor does it invalidate the role of ABLK neurons in water homeostasis or the role of LHLK neurons in regulating sleep via starvation. We provide additional information on the possible role of SELK neurons in integrating gustatory information. The location of SELK neurons in the SEZ suggests that they may play a role in feeding behavior, and we have demonstrated that these neurons are indeed involved in integrating gustatory information to influence feeding decisions. We believe we have contributed by highlighting a new role for the Leucokinin neuropeptide in feeding behavior.

      Reviewer #2 (Public review):

      Summary:

      A core task of the brain is processing sensory cues from the environment. The neural mechanisms of how sensory information is transmitted from peripheral sense organs to subsequent processing in defined brain centers remain an important topic in neuroscience. The taste system hereby assesses the palatability of food by evaluating the chemical composition and nutrient content while integrating the current need for energy by assessing the satiation level of the organism. The current manuscript provides insights into the early circuits of gustatory coding using the fruit fly as a model. By combining trans-tango and FACS-based bulk RNAseq to assess the target neurons of sweet sensing (using Gr64f-Gal4) and bitter sensing (using Gr66a-Gal4) in a first set of experiments, the authors investigate genes that are differentially expressed or co-expressed in normal and starved conditions. With a focus on neuropeptides and neurotransmitters, different expressions in the different conditions were assessed, resulting in the identification of Leucokinin as a potentially interesting gene. The notion is further supported by RNAseq of Lk-Gal4>mCD8:GFP sorted cells and immunostainings. GRASP and BacTrace experiments further support that the two Lk-expressing cells in the SEZ should indeed be postsynaptic to both types of sensory neurons. Using EM-based connectomics data (based on a previous publication by Engert et al.), the authors also look for downstream targets of the bitter versus sweet gustatory neurons to identify the Lk-neurons. Based on the morphology they identify candidates and further depict the potential downstream neurons in the connectome, which appears largely in agreement with GRASP experiments.

      Finally, silencing the Lk-neurons shows an increased PER response in starved flies (when combined with bitter compounds) as well as increased feeding in a FlyPad assay.

      Strengths:

      Overall this is an intriguing manuscript, which provides insight into the organization of 2nd order gustatory neurons. It specifically provides strong evidence for the Lk-neurons as a target of sweet and bitter GRNs and provides evidence for their role in regulating sweet vs bitter-based behavioral responses. Particularly the integration of different techniques and datasets in an elegant fashion is a strong side of the manuscript. Moreover, putting the known LK-neurons into the context of 2nd order gustatory signalling strengthens the knowledge about this pathway.

      Weaknesses:

      I do not see any major weakness in the current manuscript. Novelty is to some degree lessened by the fact, that the RNAseq approach did not identify new neurons but rather put the known LK-neurons as major findings. Similarly, the final behavioral section is not very deep and to some degree corroborates the previous publication by the Keene and Nässel labs - that said, the model they propose is indeed novel (but lacks depth in analyses; e.g. there is no physiology that would support the modulation of Lk neurons by either type of GRN). The connectomic section appears a bit out of place and after reading it it's not really clear what one should make of the potential downstream neurons (particularly since the Lk-receptor expression has been previously analyzed); here it might have been interesting to address if/how Lk-neurons may signal directly via a classical neurotransmitter (an information that might be found easily in the adult brain single-cell data).

      We thank the reviewer for the comment. Indeed, we attempted in vivo calcium imaging but were unsuccessful. We have rewritten the connectomic section to better integrate it with the rest of the text and have reanalyzed the data obtained. We considered gathering data from the single-cell adult dataset, but this dataset includes the entire adult fly brain, encompassing SELK and LHLK neurons, making it impossible to differentiate between the two types of Lk neurons. Any further analysis will require transcriptomic analysis of SELK via scRNAseq under the different metabolic conditions tested in this study.

      Reviewer #3 (Public review):

      Summary:

      To make feeding decisions, animals need to process three types of information: positive cues like sweetness, negative cues like bitterness, and internal states such as hunger or satiety. This study aims to identify where the information is integrated into the fruit fly brain. The authors applied RNA sequencing on second-order gustatory neurons responsible for sweet and bitter processing, under fed and starved conditions. The sequencing data reveal significant changes in gene expression across sweet vs. bitter pathways and fed vs. starved states. The authors focus on the neuropeptide Leucokinin (Lk), whose expression is dependent on the starvation state. They identify a pair of neurons, named SELK neurons, which express Lk and receive direct input from both sweet and bitter gustatory neurons. These SELK neurons are ideal candidates to integrate gustatory and internal state information. Behavioral experiments show that blocking these neurons in starved flies alters their tolerance to bitter substances during feeding.

      Strengths:

      (1) The study employs a well-designed approach, targeting specific neuronal populations, which is more efficient and precise compared to traditional large-scale genetic screening methods.

      (2) The RNAseq results provide valuable data that can be utilized in future studies to explore other molecules beyond Lk.

      (3) The identification of SELK neurons offers a promising avenue for future research into how these neurons integrate conflicting gustatory signals and internal state information.

      Weaknesses:

      (1) Unfortunately, due to technical challenges, the authors were unable to directly image the functional activity of SELK neurons.

      (2) In the behavioral experiments, tetanus toxin was used to block SELK neurons. Since these neurons may release multiple neurotransmitters or neuropeptides, the results do not specifically demonstrate that Leucokinin (Lk) is the critical factor, as suggested in Figure 8. To address this, I recommend using RNAi to inhibit Lk expression in SELK neurons and comparing the outcomes to wild-type controls via the PER assay.

      We appreciate the reviewer's comments and suggestions. As noted, Tetanus Toxin silences the neuron’s activity, affecting the release of the various neurotransmitters and neuropeptides produced by the targeted neuron. In response to the reviewer's recommendation, we employed an RNAi line specifically designed to silence Leucokinin production in Lk-expressing neurons.

      The results presented in Supplementary Figure 7B demonstrate that knocking down Leucokinin in Lk neurons significantly reduces the flies' tolerance to caffeine in sweet food.

      It is crucial to highlight that the sucrose concentration used in Figure 7C was 50mM, whereas in Supplementary Figure 7B, it was increased to 100mM. This adjustment was necessary because the Lk-Gal4, UAS-RNAi, and Lk-Gal4>UAS-RNAi transgenic lines exhibited reduced sensitivity to sucrose compared to the Lk-Gal4>UAS-TNT or Lk-Gal4>UAS-TNT<sup>imp</sup> lines. We aimed to establish a sucrose concentration that would elicit a 50% Proboscis Extension Response (PER) without adding any other compound, thereby allowing us to evaluate the additional effect of caffeine in the food.

      However, according to the data derived from the connectome, SELK neurons might be cholinergic, and this neurotransmitter might also be involved in controlling the behavior of the flies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      To get more evidence for connections between sensory cells and SELK neurons, could the authors also analyze a second available EM data set? Would setting a different threshold (>5 synapses) reveal connections to both sensories? Comparisons between SELK in- and outputs from EM data and Tango labeling also seem to differ quite a lot based on provided images - can the authors count cell bodies in the stainings? Further proof would be to provide functional imaging data that shows that SELK neurons respond to sugar and bitter compounds.

      In this study, we utilized the recently published EM dataset for the Drosophila central brain connectome (Dorkenwald et al., 2024; Flywire.ai). Changing the number of synapses affects the counts of pre- and postsynaptic neurons. We set a threshold of more than five synapses, as recommended by Flywire, to avoid false positives (Dorkenwald et al., 2024). This threshold has been widely used in recent papers (Engert et al., 2022; Shiu et al., 2022; Walker et al., 2025).

      The neuron counts in the connectomic data differ from those in the trans- and retro-TANGO experiments. In our initial trans-TANGO experiment, which labeled postsynaptic neurons in the Gr64f-Gal4 and Gr66a-Gal4 transgenic lines, we counted the labeled neurons (see Supplementary Figure 1C) and observed considerable variability between different brains. Because we anticipated similar variability, we did not count the neurons labeled by the trans-TANGO and retro-TANGO techniques in the Leucokinin neurons. Furthermore, neither technique labels all postsynaptic or presynaptic neurons, respectively. A recent study on the retro-TANGO technique (Sorkac et al., 2023) found a minimum threshold: the presynaptic neuron must form a certain number of synapses with the neuron of interest to be adequately labeled. According to this paper, the established threshold is 17 synapses. It is likely that the trans-TANGO technique has a similar threshold, so that whether a neuron is labeled depends on its synapse count. This would explain the discrepancy between the two results.

      Unfortunately, we have not been able to provide functional data pointing to the activation of SELK neurons by sucrose or caffeine. However, our active-GRASP data indicates that the connectivity between Gr64f<sup>GRNs</sup> and Gr66a<sup>GRNs</sup> with SELK neurons is present and functional.

      How many Leucokinin-positive cells are in the SEZ? Does the RNA-seq data provide further information about the SELK neurons? Potential receptor candidates for how they integrate hunger signals? AMPKa was described to be required in LHLK neurons.

      There are two SELK neurons in the SEZ. Due to the nature of our bulk RNA sequencing (RNAseq), we cannot attribute any additional gene expression detected in our transcriptomic analysis specifically to the SELK neurons. Furthermore, the single-cell RNA sequencing (scRNAseq) data available from the Drosophila brain, as reported by Li et al. (2022), does not allow accurate differentiation between SELK and LHLK neurons. To understand how these neurons integrate both metabolic and sensory information, it is crucial to conduct a focused RNAseq study specifically on the SELK neurons. This targeted analysis would provide the insights needed to better elucidate their functional roles. However, according to the data derived from the connectome, SELK neurons might be cholinergic, and this neurotransmitter might also be involved in controlling the behavior of the flies.

      According to previous studies (Yurgel et al., 2019), the Lk-GAL4 line is also expressed in the VNC, thus the authors could make use of the tsh-GAL80 tool to clean up the line. This study also performed GCaMP imaging in fed and 24h starved animals in SELK and couldn't find a difference, can the authors explain this discrepancy?

      We thank the reviewer for this suggestion. We have now added a new piece of data using the tsh-Gal80 transgene in our PER experiments (Supplementary Figure 7A). Blocking the expression of TNT in the ABLK neurons does not affect the main conclusion of the behavioral results. As stated previously, we were unable to obtain in vivo Ca imaging responses in SELK neurons upon exposure to sucrose, caffeine, or mixtures of sucrose and caffeine. We do not believe this is a discrepancy with previous work such as Yurgel et al., 2019. It is likely that we faced technical issues regarding expression stability and that the stimulation was too weak to produce detectable changes in GFP levels.

      Reviewer #2 (Recommendations for the authors):

      As mentioned above I do not have any major comments on the manuscript, but there are a few points that I feel should be considered:

      (1) The identification of the Lk-candidate neurons in the connectome remains a bit mysterious. In the method sections, this reads as follows "manual and visual criteria were applied to identify the neurons of interest". a) What precisely was done to get to the candidates? b) Are there alternative candidates that may be Lk-neurons? c) How would another neuron affect the conclusion of the downstream analysis?

      We thank the reviewer for this comment. We have now modified and added new information in the connectomic section, reinforcing our conclusions and correcting the results obtained.

      Our GRASP, BAcTrace, and immunohistochemistry experiments pointed to SELK neurons as postsynaptic to both Gr64f<sup>GRNs</sup> (sweet) and Gr66a<sup>GRNs</sup> (bitter). To identify which neurons in the connectome could be the SELK neurons, we utilized a previously described set of GRNs already identified in the connectome (Shiu et al., 2022). We extracted all neurons postsynaptic to the identified sweet and bitter GRNs and intersected both datasets, retaining only those candidate hits receiving simultaneous input from sweet and bitter GRNs. This process yielded a total of 333 hits. Through visual inspection, we discarded all hits that were merely neuronal fragments or neurons that clearly were not our candidates. We narrowed the list down to a set of 17 candidate neurons whose arborization was located in the SEZ. From this list, we reduced the candidates to two final entries: ID 720575940623529610 (GNG.276) and ID 720575940630808827 (GNG.685). The GNG.276 neuron had a counterpart in the SEZ identified as GNG.246. Both of these neurons were annotated as DNg70 in the Flywire database. GNG.685 had a counterpart identified as GNG.595, and these two neurons were classified as DNg68. In both cases, the neuronal candidates, DNg70 and DNg68, were classified as descending neurons, a characteristic of previously described SELK neurons (Nässel et al., 2021). In our initial analysis, published in bioRxiv and sent for revision, we identified DNg70 as potentially the SELK neurons based solely on visual inspection of neuronal morphology. We have since employed a better method to determine which candidate is more likely to be the SELK neurons, and concluded that DNg68, rather than DNg70, represents the SELK neurons. Briefly, we performed immunohistochemistry for GFP in Lk-Gal4>UAS-CD8:GFP flies. We aligned the resulting image to a Drosophila reference brain (JRC2018U) using the CMTK Registration plugin in ImageJ. The aligned image was skeletonized using the Simple Neurite Tracer plugin in ImageJ and later uploaded to the Flywire Gateway platform to compare the structure of the aligned and skeletonized SELK neurons to our candidates. This comparison clearly indicated that the DNg68 neurons, rather than DNg70, are the best candidates for representing the SELK neurons. We have updated the text, Figure 6, and Supplementary Figure 6 to reflect the new results. These new results do not alter the conclusions of the paper.
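
      The first filtering steps described above (keeping only connections with more than five synapses, then intersecting the partners of sweet and bitter GRNs) can be sketched as follows. This is a minimal illustration, not the code used in the study: the edge-list layout, the function names, and the toy neuron IDs are all assumptions, and the later visual-inspection steps are not modeled.

```python
# Hypothetical edge list: (presynaptic_id, postsynaptic_id, synapse_count).
# IDs and counts below are toy values, not connectome data.

def strong_partners(edges, sources, min_synapses=5):
    """Postsynaptic IDs that receive MORE than min_synapses from a source neuron."""
    sources = set(sources)
    return {post for pre, post, n in edges if pre in sources and n > min_synapses}

def shared_targets(edges, sweet_grns, bitter_grns, min_synapses=5):
    """Neurons that pass the threshold for both a sweet and a bitter GRN input."""
    sweet = strong_partners(edges, sweet_grns, min_synapses)
    bitter = strong_partners(edges, bitter_grns, min_synapses)
    return sweet & bitter

toy_edges = [
    ("sweet_1", "A", 12),   # passes the threshold
    ("sweet_2", "B", 3),    # too few synapses
    ("sweet_1", "C", 5),    # exactly 5 is not "more than five"
    ("bitter_1", "A", 8),
    ("bitter_1", "C", 20),
]
candidates = shared_targets(toy_edges, ["sweet_1", "sweet_2"], ["bitter_1"])
print(sorted(candidates))
```

      Applying the threshold per neuron pair, rather than on summed input, is one possible reading of the criterion; lowering it would enlarge both partner sets and therefore the intersection.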

      (2) In the transcriptomic experiments It seems that the raw transcripts are reporters, rather than normalised data. Why?

      All transcriptomic data are normalized. In Figure 1, the differential expression was calculated using DESeq2 normalized counts. In Figure 2, Transcripts Per Million (TPM) were calculated using the Salmon package and normalized for gene length.
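
      For readers unfamiliar with the TPM normalization mentioned above, the calculation can be sketched as below. This is an illustrative sketch only: Salmon computes TPM internally from estimated read counts and effective transcript lengths, whereas the function name, the plain-length input, and the toy gene names here are assumptions.

```python
def tpm(counts, lengths):
    """Transcripts Per Million: length-normalize counts, then scale to sum to 1e6."""
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}

# Two hypothetical genes with equal read counts but different lengths:
# the shorter gene receives the higher TPM.
values = tpm({"geneA": 100, "geneB": 100}, {"geneA": 1000, "geneB": 2000})
print(values)
```

      Because TPM values always sum to one million within a sample, they describe relative abundance within that sample, which is why differential expression in Figure 1 relies on DESeq2-normalized counts instead.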

      (3) The expression of nAChRbeta1 in the transcriptomic data is rather striking. However, this remains currently not addressed: is this expression real?

      We have not confirmed the upregulation or downregulation of any gene other than Leucokinin, which is our main interest. We found the presence of nAChRbeta1 interesting, as GRNs are cholinergic (Jaeger et al., 2018), so it would make sense to find cholinergic receptors in G2Ns. However, it is possible that these receptors are expressed in all G2Ns and serve as a common means of communication.

      (4) The description of the behavioural experiments in the results section is rather brief. I had a hard time following it since the genotypes are not repeated nor is it stated what is different in the experimental group vs control (but instead simply what changes in the experimental group, in a rather discussion-like fashion).

      We thank the reviewer for the comment. We have rewritten this section to improve its clarity.

      (5) If I understand the genetics for the behavioural experiments correctly it addresses the entire Lk-Gal4 expressing population, thus it is not possible to describe the role of the two SEZ neurons, but rather LkGal4 neurons. This should be clarified.

      We thank the reviewer for this comment. Indeed, the Lk-Gal4 driver we used drives expression in all Leucokinin neurons, making it impossible to distinguish between the SELK, LHLK, or ABLK neurons. We have added a new piece of behavioral data by using the tsh-Gal80 transgene to prevent the expression of TNT in the ABLK neurons (Supplementary Figure 7A), but still we cannot distinguish between SELK and LHLK. We have rewritten the text to clarify this fact.

      Reviewer #3 (Recommendations for the authors):

      Overall, the manuscript is well-written, I only have one minor suggestion for improvement. In Figure 8C, please clarify the use of TNT to block Lk release.

      We thank the reviewer for the comment. We have clarified the use of TNT in the text.

      References

      Clyne, J. D. & Miesenböck, G. Sex-Specific Control and Tuning of the Pattern Generator for Courtship Song in Drosophila. Cell 133, 354–363 (2008).

      Dorkenwald, S. et al. Neuronal wiring diagram of an adult brain. Nature 634, 124–138 (2024).

      Engert, S., Sterne, G. R., Bock, D. D. & Scott, K. Drosophila gustatory projections are segregated by taste modality and connectivity. eLife 11, e78110 (2022).

      Jaeger, A. H. et al. A complex peripheral code for salt taste in Drosophila. eLife 7, e37167 (2018).

      Macpherson, L. J. et al. Dynamic labelling of neural connections in multiple colours by trans-synaptic fluorescence complementation. Nat Commun 6, 10024 (2015).

      Nässel, D. R. Leucokinin and Associated Neuropeptides Regulate Multiple Aspects of Physiology and Behavior in Drosophila. Int J Mol Sci 22, 1940 (2021).

      Shiu, P. K., Sterne, G. R., Engert, S., Dickson, B. J. & Scott, K. Taste quality and hunger interactions in a feeding sensorimotor circuit. eLife 11, e79887 (2022).

      Walker, S. R., Peña-Garcia, M. & Devineni, A. V. Connectomic analysis of taste circuits in Drosophila. Sci. Rep. 15, 5278 (2025).

    1. Author response:

      Reviewer #1:

      As this code was developed for use with a 4096 electrode array, it is important to be aware of double-counting neurons across the many electrodes. I understand that there are ways within the code to ensure that this does not happen, but care must be taken in two key areas. Firstly, action potentials traveling down axons will exhibit a triphasic waveform that is different from the biphasic waveform that appears near the cell body, but these two signals will still be from the same neuron (for example, see Litke et al., 2004 "What does the eye tell the brain: Development of a System for the Large-Scale Recording of Retinal Output Activity"; figure 14). I did not see anything that would directly address this situation, so it might be something for you to consider in updated versions of the code.

      We thank the reviewer for this insightful comment. We agree that signals from the same neuron may be collected by adjacent channels. To address this concern in our software, we plan to add a routine to SpikeMAP that allows users to discard nearby channels where spike count correlations exceed a pre-determined threshold. Because there is no ground truth to map individual cells to specific channels on the hd-MEA, a statistical approach is warranted.
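
      One way such a routine could work is sketched below. This is a hypothetical illustration, not SpikeMAP code: the function names, the input layout (binned spike counts and electrode coordinates per channel), and the rule of keeping the channel with more spikes are all assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def drop_redundant_channels(counts, positions, max_dist, max_corr):
    """Keep a channel unless a nearby, already-kept channel is highly correlated with it.

    counts:    dict channel -> list of binned spike counts
    positions: dict channel -> (x, y) electrode coordinates
    """
    # Visit channels from most to fewest spikes so the strongest copy is kept.
    order = sorted(counts, key=lambda ch: sum(counts[ch]), reverse=True)
    kept = []
    for ch in order:
        duplicate = any(
            math.dist(positions[ch], positions[k]) <= max_dist
            and pearson(counts[ch], counts[k]) > max_corr
            for k in kept
        )
        if not duplicate:
            kept.append(ch)
    return kept

toy_counts = {"c1": [5, 0, 3, 0, 4, 1], "c2": [4, 0, 2, 0, 3, 1], "c3": [0, 2, 0, 3, 1, 0]}
toy_positions = {"c1": (0, 0), "c2": (1, 0), "c3": (10, 10)}
print(drop_redundant_channels(toy_counts, toy_positions, max_dist=2, max_corr=0.9))
```

      Both the distance limit and the correlation threshold would need to be chosen per preparation, since, as noted above, there is no ground truth mapping cells to channels.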

      Secondly, spike shapes are known to change when firing rates are high, like in bursting neurons (Harris, K.D., Hirase, H., Leinekugel, X., Henze, D.A. & Buzsáki, G. Temporal interaction between single spikes and complex spike bursts in hippocampal pyramidal cells. Neuron 32, 141-149 (2001)). I did not see this addressed in the present version of the manuscript.

      This is a valid concern. To ensure that firing rates are relatively constant over the duration of a recording, we will plot average spike rates using rolling windows of a fixed duration. We expect that population firing rates will remain relatively stable across the duration of recordings.
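
      The rolling-window check described above could look like the following sketch. The function name, the window parameters, and the toy spike times are assumptions chosen for illustration, not part of SpikeMAP.

```python
from bisect import bisect_left

def rolling_rates(spike_times, duration, window, step):
    """Firing rate (spikes per second) in windows [t, t + window) advanced by step."""
    times = sorted(spike_times)
    rates = []
    k = 0
    while k * step + window <= duration:
        start = k * step
        # Count spikes with start <= time < start + window.
        n = bisect_left(times, start + window) - bisect_left(times, start)
        rates.append(n / window)
        k += 1
    return rates

# A steady train gives a flat profile; a burst shows up as an elevated window.
print(rolling_rates([0.5, 1.5, 2.5, 3.5], duration=4, window=2, step=1))
print(rolling_rates([0.1, 0.2, 0.3, 0.4, 3.5], duration=4, window=2, step=2))
```

      A flat population average can still hide bursts in individual units, so applying the same function per unit may be the more direct test of the reviewer's concern.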

      Another area for possible improvement would be to build on the excellent validation experiments you have already conducted with parvalbumin interneurons. Although it would take more work, similar experiments could be conducted for somatostatin and vasoactive intestinal peptide neurons against a background of excitatory neurons. These may have different spike profiles, but your success in distinguishing them can only be known if you validate against ground truth, like you did for the PV interneurons.

      We agree that further cycles of experiments could be performed with SOM, VIP, and other neuronal subtypes, and we hope that researchers will take advantage of SpikeMAP to do so. We will clarify this possibility in the Discussion section of the manuscript.

      Reviewer #2:

      Summary:

      While I find that the paper is nicely written and easy to follow, I find that the algorithmic part of the paper is not really new and should have been more carefully compared to existing solutions. While the GT recordings to assess the possibilities of a spike sorting tool to distinguish properly between excitatory and inhibitory neurons are interesting, spikeMAP does not seem to bring anything new to state-of-the-art solutions, and/or, at least, it would deserve to be properly benchmarked. I would suggest that the authors perform a more intensive comparison with existing spike sorters.

      We thank the reviewer for this comment. As detailed in Table 1, SpikeMAP is the only method that performs E/I sorting on large-scale multielectrode arrays, hence a comparison to competing methods is not currently possible. That being said, many of the pre-processing steps of SpikeMAP (Figure 1) involve methods that are already well-established in the literature and available under different packages. To highlight the contribution of our work more clearly and facilitate the adoption of SpikeMAP, we plan to provide a “modular” portion of SpikeMAP that is specialized in performing E/I sorting and can be added to the pipeline of other packages such as KiloSort. This modularized version of the code will be shared freely along with the more complete version already available.

      Weaknesses:

      (1) The global workflow of spikeMAP, described in Figure 1, seems to be very similar to that of Hilgen et al. 2020 (10.1016/j.celrep.2017.02.038). Therefore, the first question is what is the rationale of reinventing the wheel, and not using tools that are doing something very similar (as mentioned by the authors themselves). I have a hard time, in general, believing that spikeMAP has something particularly special, given its Methods, compared to state-of-the-art spike sorters.

      We agree with the reviewer that there are indeed similarities between our work and the Hilgen et al. paper. However, while the latter employs optogenetics to stimulate neurons on a large-scale array, their technique does not specifically target inhibitory (e.g., PV) neurons as described in our work. We will clarify our paper accordingly.

      This is why, at the very least, the title of the paper is misleading, because it lets the reader think that the core of the paper will be about a new spike sorting pipeline. If this is the main message the authors want to convey, then I think that numerous validations/benchmarks are missing to assess first how good spikeMAP is, with reference to spike sorting in general, before deciding if this is indeed the right tool to discriminate excitatory vs inhibitory cells. The GT validation, while interesting, is not enough to entirely validate the paper. The details are a bit too scarce for me, or would deserve to be better explained (see other comments after).

      The title of our work will be edited to make it clear that while elements of the pipeline are well-established and available from other packages, we are the first to extend this pipeline to E/I sorting on large-scale arrays.

      (2) Regarding the putative location of the spikes, it has been shown that the center of mass, while easy to compute, is not the most accurate solution [Scopin et al, 2024, 10.1016/j.jneumeth.2024.110297]. For example, it has an intrinsic bias for finding positions within the boundaries of the electrodes, while some other methods, such as monopolar triangulation or grid-based convolution, might have better performances. Can the authors comment on the choice of the Center of Mass as a unique way to triangulate the sources?

      We agree with the reviewer and will point out limits of the center-of-mass algorithm based on the article of Scopin et al (2024). Further, we will augment the existing code library to include monopolar triangulation or grid-based convolution as options available to end-users.
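
      The center-of-mass estimate under discussion can be written in a few lines, which also makes its bias visible. This is a generic sketch rather than the SpikeMAP implementation; the function name, the use of absolute peak amplitudes as weights, and the electrode coordinates are assumptions.

```python
def center_of_mass(amplitudes, positions):
    """Amplitude-weighted mean of electrode positions (weights are |amplitude|)."""
    weights = [abs(a) for a in amplitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("all amplitudes are zero")
    x = sum(w * px for w, (px, _) in zip(weights, positions)) / total
    y = sum(w * py for w, (_, py) in zip(weights, positions)) / total
    return x, y

# Peak amplitudes (in uV) on three hypothetical electrodes; coordinates in um.
print(center_of_mass([-100, -50, -50], [(0, 0), (42, 0), (0, 42)]))
```

      Because the weights are non-negative, the estimate is a convex combination of the electrode coordinates and can never fall outside the area spanned by the contributing electrodes, which is the intrinsic bias the reviewer refers to.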

      (3) Still in Figure 1, I am not sure I really see the point of Spline Interpolation. I see the point of such a smoothing, but the authors should demonstrate that it has a key impact on the distinction of Excitatory vs. Inhibitory cells. What is special about the value of 90kHz for a signal recorded at 18kHz? What is the gain with spline enhancement compared to without? Does such a value depend on the sampling rate, or is it a global optimum found by the authors?

      We will clarify these points. Specifically, the value of 90kHz was chosen because it provided a reasonable temporal characterization of spikes; this value, however, can be adjusted within the software based on user preference.
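
      Going from 18 kHz to 90 kHz corresponds to a five-fold upsampling (90 / 18 = 5). A minimal sketch of cubic spline upsampling is given below; the natural boundary conditions, the function names, and the toy waveform are assumptions, since the exact spline variant used in SpikeMAP is not stated here.

```python
def natural_spline_second_derivs(y):
    """Second derivatives of a natural cubic spline through y at unit-spaced knots."""
    n = len(y)
    m = [0.0] * n            # natural spline: zero curvature at both ends
    if n < 3:
        return m
    c = [0.0] * n            # modified upper diagonal (Thomas algorithm)
    d = [0.0] * n            # modified right-hand side
    for i in range(1, n - 1):
        rhs = 6.0 * (y[i - 1] - 2.0 * y[i] + y[i + 1])
        denom = 4.0 - c[i - 1]
        c[i] = 1.0 / denom
        d[i] = (rhs - d[i - 1]) / denom
    for i in range(n - 2, 0, -1):
        m[i] = d[i] - c[i] * m[i + 1]
    return m

def upsample_cubic(y, factor):
    """Evaluate the spline at `factor` points per original sample interval."""
    n = len(y)
    if n < 2:
        return [float(v) for v in y]
    m = natural_spline_second_derivs(y)
    out = []
    for i in range(n - 1):
        for j in range(factor):
            t = j / factor
            u = 1.0 - t
            out.append(m[i] * u ** 3 / 6 + m[i + 1] * t ** 3 / 6
                       + (y[i] - m[i] / 6) * u + (y[i + 1] - m[i + 1] / 6) * t)
    out.append(float(y[-1]))
    return out

waveform_18k = [0.0, -3.0, -10.0, -2.0, 1.0, 0.0]   # toy spike samples
waveform_90k = upsample_cubic(waveform_18k, 90_000 // 18_000)
print(len(waveform_18k), len(waveform_90k))
```

      The interpolated curve passes through every original sample, so the upsampling adds no new information; it only gives a finer grid on which features such as trough-to-peak width can be measured.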

      (4) Figure 2 is not really clear, especially panel B. The choice of the time scale for the B panel might not be the most appropriate, and the legend filtered/unfiltered with a dot is not clear to me in Bii.

      We will re-check Fig. 2B, which seems to have a rendering error, likely due to conversion from its original format.

      In panel E, the authors are making two clusters with PCA projections on single waveforms. Does this mean that the PCA is only applied to the main waveforms, i.e. the ones obtained where the amplitudes are peaking the most? This is not really clear from the methods, but if this is the case, then this approach is a bit simplistic and does not really match state-of-the-art solutions. Spike waveforms are quite often, especially with such high-density arrays, covering multiple channels at once, and thus the extracellular patterns triggered by the single units on the MEA are spatio-temporal motifs occurring on several channels. This is why, in modern spike sorters, the information in a local neighbourhood is often kept to be projected, via PCA, on the lower-dimensional space before clustering. Information on a single channel only might not be informative enough to disambiguate sources. Can the authors comment on that, and what is the exact spatial resolution of the 3Brain device? The way the authors are performing the SVD should be clarified in the methods section. Is it on a single channel, and/or on multiple channels in a local neighbourhood?

      Here, the reviewer is suggesting that it may be better to perform PCA on several channels at once, since spikes can occur at several channels at the same time. To address this concern, a small routine will be written that allows users to choose how many nearby channels are included in the PCA.

      (5) About the isolation of the single units, here again, I think the manuscript lacks some technical details. The authors are saying that they are using a k-means cluster analysis with k=2. This means that the authors are explicitly looking for 2 clusters per electrode? If so, this is a really strong assumption that should not be held in the context of spike sorting, because, since it is a blind source separation technique, one cannot pre-determine in advance how many sources are present in the vicinity of a given electrode. While the illustration in Figure 2E is ok, there is no guarantee that one cannot find more clusters, so why this choice of k=2? Again, this is why most modern spike sorting pipelines do not rely on k-means, to avoid any hard-coded number of clusters. Can the authors comment on that?

      It is true that k=2 is a pre-determined choice in our software. In practice, we found that k>2 leads to poorly defined clusters. However, we will ensure that this parameter can be adjusted in the software. Furthermore, if the user chooses not to pre-define this value, we will provide the option to use a Calinski-Harabasz criterion to select k.
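
      Selecting k with the Calinski-Harabasz criterion can be sketched as follows. This is a self-contained illustration, not SpikeMAP code: the simple k-means with farthest-first seeding, the function names, and the toy 2-D points (standing in for PCA projections) are assumptions.

```python
def _mean(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-first seeding; returns labels."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_sqdist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: _sqdist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(_mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def calinski_harabasz(points, labels):
    """Between-cluster dispersion over within-cluster dispersion, each per degree of freedom."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    overall = _mean(points)
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        center = _mean(members)
        between += len(members) * _sqdist(center, overall)
        within += sum(_sqdist(p, center) for p in members)
    return (between / (k - 1)) / (within / (n - k))

def choose_k(points, candidates=(2, 3, 4)):
    """Return the candidate k with the highest Calinski-Harabasz score, plus all scores."""
    scores = {k: calinski_harabasz(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores

# Three well-separated toy clusters of four points each.
blobs = [(x + dx, y + dy) for x, y in [(0, 0), (10, 0), (0, 10)]
         for dx, dy in [(0, 0), (1, 0), (0, 1), (1, 1)]]
best_k, all_scores = choose_k(blobs)
print(best_k, all_scores)
```

      The criterion is undefined for k = 1, so it cannot by itself indicate that an electrode records a single unit; a separate check would be needed for that case.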

      (6) I'm surprised by the linear decay of the maximal amplitude as a function of the distance from the soma, as shown in Figure 2H. Is it really what should be expected? Based on the properties of the extracellular media, shouldn't we expect a power law for the decay of the amplitude? This is strange that up to 100um away from the soma, the max amplitude only dropped from 260 to 240 uV. Can the authors comment on that? It would be interesting to plot that for all neurons recorded, in a normed manner V/max(V) as function of distances, to see what the curve looks like.

      We share the reviewer’s concern and will add results that include a population of neurons to assess the robustness of this phenomenon.

      (7) In Figure 3A, it seems that the total number of cells is rather low for such a large number of electrodes. What are the quality criteria that are used to keep these cells? Did the authors exclude some cells from the analysis, and if yes, what are the quality criteria that are used to keep cells? If no criteria are used (because none are mentioned in the Methods), then how come so few cells are detected, and can the authors convince us that these neurons are indeed "clean" units (RPVs, SNRs, ...)?

      We applied stringent criteria to exclude cells, and we will revise the main text to be clear about these criteria, which include a minimum spike rate and the use of LDA to separate out PCA clusters. For the cells that were retained, we will include SNR estimates.

      (8) Still in Figure 3A, it looks like there is a bias to find inhibitory cells at the borders, since they do not appear to be uniformly distributed over the MEA. Can the authors comment on that? What would be the explanation for such a behaviour? It would be interesting to see some macroscopic quantities on Excitatory/Inhibitory cells, such as mean firing rates, averaged SNRs... Because again, in Figure 3C, it is not clear to me that the firing rates of inhibitory cells are higher than Excitatory ones, whilst they should be in theory.       

      We will include a comparison of firing rates for E and I neurons. It is possible that I cells are located at the border of the MEA due to the site of injections of the viral vector, and not because of an anatomical clustering of I cells per se. We will clarify the text accordingly.

      (9) For Figure 3 in general, I would have performed an exhaustive comparison of putative cells found by spikeMAP and other sorters. More precisely, I think that to prove the point that spikeMAP is indeed bringing something new to the field of spike sorting, the authors should have compared the performances of various spike sorters to discriminate Exc vs Inh cells based on their ground truth recordings. For example, either using Kilosort [Pachitariu et al, 2024, 10.1038/s41592-024-02232-7], or some other sorters that might be working with such large high-density data [Yger et al, 2018, 10.7554/eLife.34518].

      As mentioned previously, Kilosort and related approaches do not address the problem of E/I identification (see Table 1). However, they do have pre-processing steps in common with SpikeMAP. We will add some specific comparison points – for instance, the use of k-means and PCA (which is more common across packages) and the use of cubic spline interpolation (which is less common). Further, we will provide a stand-alone E/I sorting module that can be added to the pipeline of other packages, so that users can use this functionality without having to migrate their entire analysis.

      (10) Figure 4 has a big issue, and I guess the panels A and B should be redrawn. I don't understand what the red rectangle is displaying.

      We apologize for this issue. It seems there was a rendering problem when converting the figure from its original format. We will address this issue in the revised version of the manuscript.

      (11) I understand that Figure 4 is only one example, but I have a hard time understanding from the manuscript how many slices/mice were used to obtain the GT data? I guess the manuscript could be enhanced by turning the data into an open-access dataset, but then some clarification is needed. How many flashes/animals/slices are we talking about? Maybe this should be illustrated in Figure 4, if this figure is devoted to the introduction of the GT data.

      We will mention how many flashes/animals/slices were employed in the GT data and provide open access to these data.

      (12) While there is no doubt that GT data as the ones recorded here by the authors are the most interesting data from a validation point of view, the pretty low yield of such experiments should not discourage the use of artificially generated recordings such as the ones made in [Buccino et al, 2020, 10.1007/s12021-020-09467-7] or even recently in [Laquitaine et al, 2024, 10.1101/2024.12.04.626805v1]. In these papers, the authors have putative waveforms/firing rate patterns for excitatory and inhibitory cells, and thus, the authors could test how good they are in discriminating the two subtypes.

      We thank the reviewer for the suggestion that SpikeMAP could be tested on artificially generated spike trains and will add the citation of the two papers mentioned. We hope future efforts will employ SpikeMAP on both synthetic and experimental data to explore the neural dynamics of E and I neurons in healthy and pathological circuits of the brain.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The Authors investigated the anatomical features of the excitatory synaptic boutons in layer 1 of the human temporal neocortex. They examined the size of the synapse, the macular or the perforated appearance and the size of the synaptic active zone, the number and volume of the mitochondria, the number of the synaptic and the dense core vesicles, also differentiating between the readily releasable, the recycling and the resting pool of synaptic vesicles. The coverage of the synapse by astrocytic processes was also assessed, and all the above parameters were compared to other layers of the human temporal neocortex. The Authors conclude that the subcellular morphology of the layer 1 synapses is suitable for the functions of the neocortical layer, i.e. the synaptic integration within the cortical column. The low glial coverage of the synapses might allow the glutamate spillover from the synapses enhancing synaptic crosstalk within this cortical layer.

      Strengths:

      The strengths of this paper are the abundant and very precious data about the fine structure of the human neocortical layer 1. Quantitative electron microscopy data (especially those derived from the human brain) are very valuable, since this is highly time- and energy-consuming work. The techniques used to obtain the data, as well as the analyses and the statistics performed by the Authors, are all solid, strengthen this manuscript, and support the conclusions drawn in the discussion.

      Comments on latest version:

      The third version of this paper has been substantially improved. The English is significantly better; there are only a few paragraphs and sentences which are hard to understand (see my comments and suggestions below). Almost all of my suggestions were incorporated.

      We would like to thank the reviewer for the comments and have incorporated the suggestions within the latest version of the manuscript.

      Remaining minor concerns:

      About epileptic and non-epileptic (non-affected) tissue. I am aware that temporal lobe neocortical tissue derived from epileptic patients is regarded as non-affected by many groups, and that it is quite similar to the cortex of non-epileptic (tumour) patients in its electrophysiological properties and synaptic physiology. But please note that one paper you cited did not use samples from epileptic patients, but only tissue from non-epileptic tumor patients (Molnár et al. PLOS 2008).

      When you look deeper, and make a thorough comparison of tissues derived from epileptic and non-epileptic patients, there are differences in the fine structure, as well as in several electrophysiological features. See for example Tóth et al., J Physiol, 2018, where a higher density of excitatory synapses was found in L2 of neocortical samples derived from epileptic patients compared to non-epileptic (tumor) patients. Furthermore, the appearance of population bursts is similar, but their occurrence is more frequent and their amplitude is higher in tissue from epileptic compared to non-epileptic patients. So, I still cannot agree that temporal neocortex of epileptic patients with the seizure focus in the hippocampus would be non-affected. I therefore suggested using the term biopsy tissue.

      We are thankful for this comment on using non-epileptic tissue also by others. We are also aware that Molnár et al. 2008 worked with tumor tissue.

      It is still not emphasized in the first paragraph of the Discussion, that only excitatory axon terminals were investigated.

      We now mention in the first paragraph of the discussion that only excitatory synaptic boutons were investigated.

      The text in the Results and the Discussion are somewhat inconsistent.

      The last two paragraphs of the Results section end with several sentences which should be part of the discussion, such as line 328: This finding strongly supports multivesicular release... or line 344: --- pointing towards a layer-specific regulation of the putative RRP. Moreover, the results suggest that... and line 370: ... it is most likely... Please, correct this.

      We disagree with the reviewer on these points because these sentences summarize the findings.

      The first paragraph of the Discussion summarizes the work of the quantitative EM work and gives one conclusion about the astrocytic coverage. This last sentence is inconsistent with the other parts of the paragraph. I would either write that "astrocytic coverage was also investigated" (or something similar), or move this sentence to the paragraph which discusses the astrocytic coverage.

      Results line 180-183. "Special connections" between astrocytic processes and synaptic boutons are mentioned, but not shown. Either show these (but then prove with staining!), or leave out this paragraph.

      We deleted this paragraph as suggested.

      Reviewer #2 (Public review):

      Summary:

      The study of Rollenhagen et al examines the ultrastructural features of Layer 1 of human temporal cortex. The tissue was derived from drug-resistant epileptic patients undergoing surgery, and was selected as being further from the epilepsy focus, and as such considered to be non-epileptic. The analysis included 4 patients of different age, sex, medication and onset of epilepsy. The manuscript is a follow-on study to 3 previous publications from the same authors on different layers of the temporal cortex:

      Layer 4 - Yakoubi et al 2019 eLife

      Layer 5 - Yakoubi et al 2019 Cerebral Cortex,

      Layer 6 - Schmuhl-Giesen et al 2022 Cerebral Cortex

      They find that the L1 synaptic boutons mainly have a single active zone and a very large pool of synaptic vesicles, and are mostly devoid of astrocytic coverage.

      Strengths:

      The MS is well written and easy to read. The Results section gives a detailed set of figures showing many morphological parameters of synaptic boutons and surrounding glial elements. The authors provide comparative data of all the layers examined by them so far in the Discussion. Given that anatomical data in human brain are still very limited, the current MS has substantial relevance. The work appears to be generally well done, and the EM and EM tomography images are of very good quality. The analysis is clear and precise.

      Weaknesses:

      The authors made all the corrections required and answered all of my concerns, included additional data sets, and clarified statements where needed.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor suggestions:

      Synaptic density, lines 189-193. If you say "comparatively" high, then compare to something (cite your own work for the other layers, and give the approximate values for the other layers). Same in line 194: comparably high to what? Other option: say "relatively high".

      We corrected the sentences as suggested by the reviewer.

      Line 206: When present, mitochondria (comma missing)

      Corrected as suggested by the reviewer.

      Line 265: Dot is missing at the end of the sentence (after Shapira et al. 2003)

      Corrected as suggested by the reviewer.

      Lines 300-301: Check the English for this sentence: significant difference BETWEEN TWO sublaminae and not significant difference for both sublaminae.

      Corrected as suggested by the reviewer.

      Lines 304-305: Check the sentence, please, it is not understandable without the text in parenthesis.

      Corrected as suggested by the reviewer.

      Line 354 Dot missing at the end of the sentence (after Figure 6A, B)

      Corrected as suggested by the reviewer.

      Line 354-358: Please rephrase this sentence (too complicated, not understandable). I do not understand why results of the L4, L5, L6 are described here. What does it mean "Astrocytes and their fine processes formed a relatively dense, but a comparably loose network within the neuropil in L1"? Dense or loose?

      In the experiment measuring the volume fraction of astrocytic processes (Figure 6C), all six cortical layers were analyzed, thus we compared the values obtained for L1 with the results for L4, L5 and L6. For more clarity, we rephrased the sentence: “Astrocytes and their fine processes formed a relatively dense network in L4 and L5, but a comparably loose one within the neuropil in L1…” We also rephrased other sentences in this paragraph (as also suggested below).

      Lines 359-369: Please rephrase this paragraph. The sentences are too complicated, have too many parentheses, and are not understandable. I suggest to write first how many synapses were examined in L1 and L4, then how many of them were on spine and on dendrites (either n or %). Then give the values how many (n or %) of them were "tripartite synapses", out of spine synapses and of dendritic synapses in both layers. How many of them were partially covered in both layers. Please, write the data in a systematic way. The best would be to give the values in a table as well. This way it will be more understandable (now, it is chaotic, hard to follow).

      We rephrased the paragraph and added a new table (3).

      Line 383: Dot missing from the end of the sentence.

      Corrected as suggested by the reviewer.

      Line 436: Reconsider "comparably low compared to". The comparably means what in this case? The whole paragraph is hard to understand, please, check and review for improvements to the use of English or use chatGPT to check it.

      We corrected the sentence according to the reviewer’s suggestion.

      Line 487: Same thing again: "The comparably largest size of the RP in L1 when compared..." What would you like to say with "comparably"? Check the meaning of this word in a dictionary, please. I have the feeling that you are using this word instead of "relatively".

      Corrected as suggested by the reviewer.

      Line 488 "and TO that found for L4 and L5 in rodents..."

      Corrected as suggested by the reviewer.

      Line 493-495: Same again, comparably when compared, correct, please.

      Corrected as suggested by the reviewer.

      Supplemental figures: Now I do understand why Hu-01 and Hu-02 are twice, and I think, 3 patients were examined for L1a and three for L1b. But which side is which on the subfigures? Left side (Hu-01, 02 03) was used for L1a, or L1b? Could you write this in the legend, or mark on the figure (at least at one subfigure), please?

      We implemented a comment for clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Concerning the grounding in experimental phenomenology, it would be beneficial to identify specific experiments to strengthen the model. In particular, what evidence supports reversible beta cell inactivation? This could potentially be tested in mice, for instance, by using an inducible beta cell reporter, treating the animals with high glucose levels, and then measuring the phenotype of the marked cells. Such experiments, if they exist, would make the motivation for the model more compelling.

      There is some direct evidence of reversible beta cell inactivation in rodent / in vitro models. We had already mentioned this in the discussion, but we have added some text emphasizing / clarifying the role of this evidence (lines 359–362).

      Others have also argued that some analyses of insulin treatment in conventional T2D, which has a stronger effect in patients with higher glucose before treatment, provides indirect evidence of reversal of glucotoxicity. We have also mentioned this in the revised paper (lines 284–285).

      For quantitative experiments, the authors should be more specific about the features of beta cell dysfunction in KPD. Does the dysfunction manifest in fasting glucose, glycemic responses, or both? Is there a ”pre-KPD” condition? What is known about the disease’s timescale?

      The answers to some of these questions are not entirely clear—patients present with very high glucose, and thus must be treated immediately. Due to a lack of antecedent data it is not entirely clear what the pre-KPD condition is, but there is some evidence that KPD is at least not preceded by diabetes symptoms. This point is already noted in the introduction of the paper and Table 1. However, we have added a small note clarifying that this does not rule out mild hyperglycemia, as in prediabetes (and indeed, as our model might predict) (lines 76–77). Similarly, due to the necessity of immediate insulin treatment, it is not clear from existing data whether the disorder manifests more strongly in fasting glucose or glucose response, although it is likely in both. (We might infer this since continuous insulin treatment does not produce fasting hypoglycemia, and the complete lack of insulin response to glucose shortly after presentation should produce a strong effect in glycemic response.) We believe our existing description of KPD lists all of the relevant timescales; however, we have also slightly clarified this description in response to the first referee’s comments (lines 66–73, 83).

      The authors should also consider whether their model could apply to other conditions besides KPD. For example, the phenomenology seems similar to the ”honeymoon” phase of T1D. Making a strong case for the model in this scenario would be fascinating.

      This is an excellent idea, which had not occurred to us. We have briefly discussed this possibility in the revision (lines 281–291), but plan to analyze it in more detail in a future manuscript.

      Reviewer #1 (Recommendations for the author):

      Whenever simulation results are presented, parameter values should be specified right there in the figure captions.

      We have added the values of glucotoxicity parameters to the caption of Figure 2. In other figures, we have explicitly mentioned which panel of Figure 2 the parameters are taken from. Description of the non-glucotoxicity parameters is a bit cumbersome (there are a lot of them, but our model of fast dynamics is slightly different from Topp et al. so it does not suffice to simply say we took their parameters) so we have referred the reader to the Materials and Methods for those.

      I was confused by the language in Figure 4. Could the authors clarify whether they argue that: (1) the observed KPD behaviour is the result of the system switching from one stable state to another when perturbed with high glucose intake? (2) the observed KPD behaviour is the result of one of the steady states disappearing with high glucose intake?

      What we mean to say is that during a period of high sugar intake or exogenous insulin treatment, one of the fixed points is temporarily removed—it is still a fixed point of the “normal” dynamics, but not a fixed point of the dynamics with the external condition added. Since when glucose (insulin) intake is high enough, only the low (high)-β fixed point is present, under one of these conditions the dynamics flow toward that fixed point. When the external influx of glucose/insulin is turned off, both fixed points are present again—but if the dynamics have moved sufficiently far during the external forcing, the fixed point they end up in will have switched from one fixed point to the other. We have edited the text to make this clearer (lines 153–185). Do note, however, that in response to both referees’ comments (see below), Figures 3 and 4 have been replaced with more illuminating ones. This specific point is now addressed by the new Figure 3.
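
      The switching argument above can be illustrated with a minimal sketch. It uses a generic one-variable bistable system, dx/dt = x - x^3 + u(t), and is not the model from the manuscript: x is only a stand-in for the slow variable, u is a stand-in for the external glucose or insulin input, and the function name `simulate` and all numerical values are assumptions chosen for illustration. With u = 0 the toy system has two stable fixed points (x = -1 and x = +1); a sufficiently strong input removes the lower one while it is applied.

```python
def simulate(x0, forcing, t_on, t_off, t_end, dt=0.01):
    """Forward-Euler integration of the toy system dx/dt = x - x**3 + u(t).

    u(t) equals `forcing` for t_on <= t < t_off and 0 otherwise.
    Hypothetical illustration only; not the authors' model or parameters.
    """
    x = x0
    n_steps = int(round(t_end / dt))
    for i in range(n_steps):
        t = i * dt
        u = forcing if t_on <= t < t_off else 0.0
        x += dt * (x - x**3 + u)
    return x


if __name__ == "__main__":
    # Strong, long input: the lower fixed point disappears while u is on,
    # so the state is carried across and stays in the upper state afterwards.
    print("strong, long pulse :", round(simulate(-1.0, 1.0, 5.0, 15.0, 40.0), 3))
    # Weak input: both fixed points persist, so the state returns to x = -1.
    print("weak, long pulse   :", round(simulate(-1.0, 0.2, 5.0, 15.0, 40.0), 3))
    # Strong but brief input: the state has not moved far enough to switch.
    print("strong, brief pulse:", round(simulate(-1.0, 1.0, 5.0, 5.5, 40.0), 3))
```

      The third case corresponds to the condition stated above: the final state changes only if the dynamics have moved sufficiently far while the external input was applied.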

      The adaptation of the prefactor ’c’ was confusing to me. I think I understood it in the end, but it sounded like, ”here’s a complication, but we don’t explain it because it doesn’t really matter”. I think the authors can explain this better (or potentially leave out the complication with ’c’ altogether?).

      Indeed, the existence of an adaptation mechanism is important for our overall picture of diabetes pathogenesis, but not for many of our analyses, which assume prediabetes. Nonetheless, we agree that the current explanation of its role is confusing because of its vagueness. We have elaborated the explanation of the type of dynamics we assume for c, adding an equation for its dynamics to the “Model” section of the Materials and methods, explained in lines 456–465. We have also amended Figure 1 to note this compensation.

      I expect the main impact of this work will be to get clinical practitioners and biomedical researchers interested in the intermediate timescale dynamics of β-cells and take seriously the possibility that reversible inactive states might exist. But this impact will only be achieved when the results are clearly and easily understandable by an audience that is not familiar with mathematical modelling. I personally found it difficult to understand what I was supposed to see in the figures at first glance. Yes, the subtle points are indeed explained in the figure captions, but it might be advantageous to make the points visually so clear that a caption is barely needed. For example, when claiming that a change in parameters leads to bistability, why not plot the steady state values as a function of that parameter instead of showing curves from which one has to infer a steady state?

      I would advise the authors to reconsider their visual presentation by, e.g., presenting the figures to clinical practitioners or biomedical researchers with just a caption title to test whether such an audience can decipher the point of the figure! This is of course merely a personal suggestion that the authors may decide to ignore. I am making this suggestion only because I believe in the quality of this work and that improving the clarity of the figures and the ease with which one can understand the main points would potentially lead to a much larger impact on the presented results.

      This is a very good point. We have made several changes. Firstly, we have added smaller panels showing the dynamics of β to Figure 2; previously, the reader had to infer what was happening to β from G(t). Secondly, we have completely replaced the two figures showing dβ/dt, and requiring the reader to infer the fixed points of β, with bifurcation diagrams that simply show the fixed points of G and β. The new figures show through bifurcation diagrams how there are multiple fixed points in KPD, how glucose or insulin treatment force the switching of fixed points, and how the presence of bistability depends on the rate of glucotoxicity. (These new figures are Fig. 3–5 in the revised manuscript.)
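
      As a rough illustration of what a bifurcation diagram of this kind tabulates, the sketch below lists the fixed points of a generic toy bistable equation, x - x^3 + u = 0, as the parameter u is varied. The equation, the function names `rhs` and `fixed_points`, and the parameter range are assumptions made for this example; they are not the equations or parameters of the manuscript. Roots are located by scanning a grid for sign changes and refining each bracket by bisection.

```python
def rhs(x, u):
    """Right-hand side of the toy dynamics dx/dt = x - x**3 + u."""
    return x - x**3 + u


def fixed_points(u, lo=-2.0, hi=2.0, n=400, tol=1e-10):
    """Return the fixed points of the toy system in [lo, hi) for a given u."""
    roots = []
    h = (hi - lo) / n
    for i in range(n):
        a, b = lo + i * h, lo + (i + 1) * h
        fa, fb = rhs(a, u), rhs(b, u)
        if fa == 0.0:
            roots.append(a)            # root sits exactly on a grid point
        elif fa * fb < 0.0:
            while b - a > tol:         # bisection inside the bracket [a, b]
                mid = 0.5 * (a + b)
                fm = rhs(mid, u)
                if fa * fm <= 0.0:
                    b = mid
                else:
                    a, fa = mid, fm
            roots.append(0.5 * (a + b))
    return roots


if __name__ == "__main__":
    # One row per parameter value: three fixed points inside the bistable
    # window, a single fixed point outside it.
    for k in range(-6, 7):
        u = k / 10.0
        print(u, [round(r, 3) for r in fixed_points(u)])
```

      Plotting the returned values against u would give the familiar S-shaped curve, with the middle branch corresponding to the unstable fixed point.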

      Could the authors explicitly point out what could be learned from their work for the clinic? At the moment treatment consists of giving insulin to patients. If I understand correctly, nothing about the current treatment would change if the model is correct. Is there maybe something more subtle that could be relevant to devising an optimal treatment for KPD patients?

      This is another very good point. We have added a new figure (Fig. 7) in our results section showing how this model, or one like it, can be analyzed to suggest an insulin treatment schedule (once parameters for an individual patient can be measured), and added some discussion of this point (lines 224–240) as well as lifestyle changes our model might suggest for KPD patients to the discussion (lines 413–425).

      Similarly, could the authors explicitly point out how their model could be experimentally tested? For example, are the functions f(G) and g(G) experimentally accessible? Related to that, presumably the shape of those functions matters to reproduce the observed behaviour. Could the authors comment on that / analyze how reproducing the observed behaviour puts constraints on the shape of the used functions and chosen parameter values?

      g(G) has not been carefully measured in cellular data; however, it could be in more quantitative versions of existing experiments. Further, our model indeed requires some general features for the forms of f(G) and g(G) to produce KPD-like phenomena. We have added some comment on this to the discussion section of the revised manuscript (lines 367–372).

      Could the authors explicitly spell out which parameters they think differ between individual KPD patients, and which parameters differ between KPD patients and ’regular’ type 2 diabetics?

      In general we expect all parameters should vary both among KPD patients and between KPD / “conventional” T2D. The primary parameter determining whether KPD or conventional T2D is seen, however, is the ratio kIN/kRE. We have elaborated on both these points in the revised manuscript. (Lines 186–192, 250–257.)

      I was confused about the timescale of remission. At one point the authors write “KPD patients can often achieve partial remission: after a few weeks or months of treatment with insulin” but later the authors state that “the duration of the remission varies from 6 months to 10 years”.

      The former timescale is the typical timescale to achieve remission. After remission is reached, however, it may or may not last—patients may experience a relapse, where their condition worsens and they again require insulin. We have edited the text to clarify this distinction (lines 66–73).

      When the authors talk about intermediate timescales in the main text could they specify an actual unit of time, such as days, weeks, or months as it would relate to the rate constants in their model for those transitions?

      We have done so (lines 86–87, figure 1 caption, figure 2 caption). Getting KPD-like behavior requires (at high glucose) the deactivation process to be somewhat faster than the reactivation process, so the relevant scales are between weeks (reactivation) and days (deactivation at high G).

      The authors state ”Our simple model of β-cell adaptation also neglects the known hyperglycemia-induced leftward shift in the insulin secretion curve (f(G) in Eq. (2))”. This seems an important consideration. Could the authors comment on why they did not model this shift, and/or explicitly discuss how including it is expected to change the model dynamics?

      We agree that this process seems potentially relevant, as it seems to happen on a relatively fast timescale compared to glucose-induced β-cell death. It is, however, not so well characterized quantitatively that including it is a simple matter of putting in known values—we would be making assumptions that would complicate the interpretation of our results.

      It is clear that this effect will need to be considered when quantitatively modelling real patient data. However, it is also straightforward to argue that this effect by itself cannot produce KPD-like symptoms, and will only tend to reduce the rate of glucotoxicity necessary to produce bistability. We have added a discussion of this in the revisions (lines 307–315). We have also, in general, expanded the discussion of the effects that each neglected detail we have mentioned is expected to have (lines 292–315).

      The authors end with a statement that their results may “contribute to explanation of other observations that involve rapid onset or remission of diabetes-like phenomena, such as during pregnancy or for patients on very low calorie diets.” Could the authors spell out exactly how their model potentially relates to these phenomena?

      Our thinking is that, even when another direct cause, such as loss of insulin resistance, is implicated in reversal of diabetes, some portion of the effect may be explained by reversal of glucotoxicity. This is indeed at this point just a hypothesis, but we have expanded on it briefly in the revision. (Lines 281–291.)

      Minor typos:

      In Figure 2.D the last zero of 200 on the axis was cut off.

      Line 359 - there is a missing word ”in the analysis”.

      We have fixed these typos, thanks.

      Reviewer #2 (Recommendations for the author):

      The manuscript could be significantly improved in two key areas: the presentation of the analysis, and the relation with experimental phenomenology.

      Regarding the analysis presentation, the figures could be substantially enhanced with minimal effort from the authors. At present, they are sparse, lack legends, and offer only basic analysis. The authors should consider presenting, for example, a bifurcation diagram for beta cell mass and fasting glucose levels as a function of kIN, and how insulin sensitivity and average meal intake modulate this relationship. The goal should be to present clear, testable predictions in an intuitive manner. Currently, the specific testable predictions of the model are unclear.

      The response to this question is copied from the responses to related questions from the first referee.

      This is a very good point. We have made several changes. Firstly, we have added smaller panels showing the dynamics of β to Figure 2; previously, the reader had to infer what was happening to β from G(t). Secondly, we have completely replaced the two figures showing dβ/dt, and requiring the reader to infer the fixed points of β, with bifurcation diagrams that simply show the fixed points of G and β. The new figures show through bifurcation diagrams how there are multiple fixed points in KPD, how glucose or insulin treatment force the switching of fixed points, and how the presence of bistability depends on the rate of glucotoxicity. We have also supplemented our phase diagram that shows the effects of SI and the total beta cell population with bifurcation diagrams showing β as SI and βTOT are varied. (These new figures are Fig. 3–5 in the present manuscript.) Finally, we have added another figure analyzing the model’s predictions for the optimal insulin treatment and the resulting time needed to achieve remission (Fig. 7).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      The authors frequently refer to their predictions and theory as being causal, both in the manuscript and in their response to reviewers. However, causal inference requires careful experimental design, not just statistical prediction. For example, the claim that "algorithmic differences between those with BPD and matched healthy controls" are "causal" in my opinion is not warranted by the data, as the study does not employ experimental manipulations or interventions which might predictably affect parameter values. Even if model parameters can be seen as valid proxies to latent mechanisms, this does not automatically mean that such mechanisms cause the clinical distinction between BPD and CON, they could plausibly also refer to the effects of therapy or medication. I recommend that such causal language, also implicit to expressions like "parameter influences on explicit intentional attributions", is toned down throughout the manuscript.

      Thank you for this chance to be clearer in the language. Our models and paradigm introduce a form of temporal causality, given that latent parameter distributions are directly influenced by latent parameter estimates at a previous point in time (self-uncertainty and other uncertainty directly govern social contagion). Nevertheless, we appreciate the reviewer's perspective and have now toned down the language to reflect this.

      Abstract:

      ‘Our model makes clear predictions about the mechanisms of social information generalisation concerning both joint and individual reward.’

      Discussion:

      ‘We can simulate this by modelling a framework that incorporates priors based on both self and a strong memory impression of a notional other (Figure S3).’

      ‘We note a strength of this work is the use of model comparison to understand algorithmic differences between those with BPD and matched healthy controls.’

      Although the authors have now outlined the study's aims much more clearly, there still is a lack of clarity with respect to the authors' specific hypotheses. I understand that their primary predictions about disruptions to self-other generalisation processes underlying BPD are embedded in the four main models that are tested, but it is still unclear what specific hypotheses the authors had about group differences with respect to the tested models. I recommend the authors specify this in the introduction rather than referring to prior work where the same hypotheses may have been mentioned.

      Thank you for this further critique, which has enabled us to more clearly refine our introduction. We have now edited our introduction to be more direct about our hypotheses, that these hypotheses are instantiated into formal models, and what our predictions were. We have also included a small section on how previous predictions from other computational assessments of BPD link to our exploratory work, and highlighted this throughout the manuscript.

      ‘This paper seeks to address this gap by testing explicitly how disruptions in self-other generalization processes may underpin interpersonal disruptions observed in BPD. Specifically, our hypotheses were: (i) healthy controls will demonstrate evidence for both self-insertion and social contagion, integrating self and other information during interpersonal learning; and (ii) individuals with BPD will exhibit diminished self-other integration, reflected in stronger evidence for observations that assume distinct self-other representations.

      We tested these hypotheses by designing a dynamic, sequential, three-phase Social Value Orientation (Murphy & Ackerman, 2014) paradigm—the Intentions Game—that would provide behavioural signatures assessing whether BPD differed from healthy controls in these generalization processes (Figure 1A). We coupled this paradigm with a lattice of models (M1-M4) that distinguish between self-insertion and social contagion (Figure 1B), and performed model comparison:

      M1. Both self-to-other (self-insertion) and other-to-self (social contagion) occur before and after learning

      M2. Self-to-other transfer only occurs

      M3. Other-to-self transfer only occurs

      M4. Neither transfer process, suggesting distinct self-other representations

      We additionally ran exploratory analysis of parameter differences and model predictions between groups following from prior work demonstrating changes in prosociality (Hula et al., 2018), social concern (Henco et al., 2020), belief stability (Story et al., 2024a), and belief updating (Story, 2024b) in BPD to understand whether discrepancies in self-other generalisation influence observational learning. By clearly articulating our hypotheses, we aim to clarify the theoretical contribution of our findings to existing literature on social learning, BPD, and computational psychiatry.’

      Caveats should also be added about the exploratory nature of the many parameter group comparisons. If there are any predictions about group differences that can be made based on prior literature, the authors should make such links clear.

      Thank you for this. We have now included caveats in the text to highlight the exploratory nature of these group comparisons, and added direct links to relevant literature where able:

      Introduction

      ‘We additionally ran exploratory analysis of parameter differences and model predictions between groups following from prior work demonstrating changes in prosociality (Hula et al., 2018), social concern (Henco et al., 2020), belief stability (Story et al., 2024a), and belief updating (Story, 2024b) in BPD to understand whether discrepancies in self-other generalisation influence observational learning. By clearly articulating our hypotheses, we aim to clarify the theoretical contribution of our findings to existing literature on social learning, BPD, and computational psychiatry.’

      Model Comparison

      ‘We found that CON participants were best fit at the group level by M1 (Frequency = 0.59, Exceedance Probability = 0.98), whereas BPD participants were best fit by M4 (Frequency = 0.54, Exceedance Probability = 0.86; Figure 2A). This suggests CON participants are best fit by a model that fully integrates self and other when learning, whereas those with BPD are best explained as holding disintegrated and separate representations of self and other that do not transfer information back and forth.

      We first explore parameters between separate fits (see Methods). Later, in order to assuage concerns about drawing inferences from different models, we examined the relationships between the relevant parameters when we forced all participants to be fit to each of the models (in a hierarchical manner, separated by group). In sum, our model comparison is supported by convergence in parameter values when comparisons are meaningful (see Supplementary Materials). We refer to both types of analysis below.’

      Phase 2 analysis

      ‘Prior work predicts those with BPD should focus more intently on public social information, rather than private information that only concerns one party (Henco et al., 2020). In BPD participants, only new beliefs about the relative reward preferences (mutual outcomes for both players) of partners differed (see Fig 2E): new median priors were larger than median preferences in phase 1 (mean = -0.47; = -6.10, 95%HDI: -7.60, -4.60).’

      ‘Models of moral preference learning (Story et al., 2024) predict that BPD vs non-BPD participants have more rigid beliefs about their partners. We found that BPD participants were equally flexible around their prior beliefs about a partner’s relative reward preferences (= -1.60, 95%HDI: -3.42, 0.23), and were less flexible around their beliefs about a partner’s absolute reward preferences (=-4.09, 95%HDI: -5.37, -2.80), versus CON (Figure 2B).’

      Phase 3 analysis

      ‘Prior work predicts that human economic preferences are shaped by observation (Panizza, et al., 2021; Suzuki et al. 2016; Yu et al, 2021), although little-to-no work has examined whether contagion differs for relative vs. absolute preferences. Associative models predict that social contagion may be exaggerated in BPD (Ereira et al., 2018).… As a whole, humans are more susceptible to changing relative preferences more than selfish, absolute reward preferences, and this is disrupted in BPD.’

      Psychometric and Intentional Attribution analysis

      ‘Childhood trauma, persecution, and poor mentalising in BPD are all predicted to disrupt one’s ability to change (Fonagy & Luyten, 2009).’

      ‘Prior work has also predicted that partner-participant preference disparity influences mental state attributions (Barnby et al., 2022; Panizza et al., 2021).’

      I'm not sure I understand why the authors, after adding multiple comparison correction, now list two kinds of p-values. To me, this is misleading and precludes the point of multiple comparison corrections, I therefore recommend they report the FDR-adjusted p-values only. Likewise, if a corrected p-value is greater than 0.05 this should not be interpreted as a result.

      We have now adjusted the exploratory results to include only the FDR corrected values in the text.

      ‘We assessed conditional psychometric associations with social contagion under the assumption of M3 for all participants. We conducted partial correlation analyses to estimate relationships conditional on all other associations and retained all that survived bootstrapping (5000 reps), permutation testing (5000 reps), and subsequent FDR correction. When not controlled for group status, RGPTSB and CTQ scores were both moderately associated with MZQ scores (RGPTSB r = 0.41, 95%CI: 0.23, 0.60, p[fdr]=0.043; CTQ r = 0.354 95%CI: 0.13, 0.56, p[fdr]=0.02). This was not affected by group correction. CTQ scores were moderately and negatively associated with shifts in individualistic reward preferences (; r = -0.25, 95%CI: -0.46, -0.04, p[fdr]=0.03). This was not affected by group correction. MZQ scores were in turn moderately and negatively associated with shifts in prosocial-competitive preferences () between phase 1 and 3 (r = -0.26, 95%CI: -0.46, -0.06, p[fdr]=0.03). This was diminished when controlled for group status (r = 0.13, 95%CI: -0.34, 0.08, p[fdr]=0.20). Together this provides some evidence that self-reported trauma and self-reported mentalising influence social contagion (Fig S11). Social contagion under M3 was highly correlated with contagion under M1 demonstrating parsimony of outcomes across models (Fig S12).

      Prior work has predicted that partner-participant preference disparity influences mental state attributions (Barnby et al., 2022; Panizza et al., 2021). We tested parameter influences on explicit intentional attributions in Phase 2 while controlling for group status. Attributions included the degree to which they believed their partner was motivated by harmful intent (HI) and self-interest (SI). In accordance with prior work (Barnby et al., 2022), greater disparity of absolute preferences before learning was associated on a trend level with reduced attributions of SI (<= -0.23, p[fdr]=0.08), and greater disparity of relative preferences before learning exaggerated attributions of HI (= 0.21, p[fdr]=0.08), but neither association survived correction (Figure S4B). This is likely due to partners being significantly less individualistic and prosocial on average compared to participants (= -5.50, 95%HDI: -7.60, -3.60; = 12, 95%HDI: 9.70, 14.00); partners are recognised as less selfish and more competitive.’
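      The FDR adjustment referred to above can be illustrated with a minimal sketch of the Benjamini-Hochberg procedure. The response does not state which FDR method was used, so Benjamini-Hochberg is an assumption here, and the raw p-values are hypothetical rather than taken from the manuscript.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values (not from the manuscript).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p = benjamini_hochberg(raw_p)
n_significant = sum(p < 0.05 for p in adj_p)
```

      In this hypothetical set, the two raw p-values just below 0.05 no longer pass the threshold once adjusted, which is why only the adjusted values are interpreted.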

      Can the authors please elaborate why the algorithm proposed to be employed by BPD is more 'entropic', especially given both their self-priors and posteriors about partners' preferences tended to be more precise than the ones used by CON? As far as I understand, there's nothing in the data to suggest BPD predictions should be more uncertain. In fact, this leads me to wonder, similarly to what another reviewer has already suggested, whether BPD participants generate self-referential priors over others in the same way CON participants do, they are just less favourable (i.e., in relation to oneself, but always less prosocial) - I think there is currently no model that would incorporate this possibility? It should at least be possible to explore this by checking if there is any statistical relationship between the estimated θ_ppt^m and 〖p(θ〗_par |D^0).

      Thank you for this opportunity to be clearer in our wording. We believe the reviewer is referring to this line in the discussion: ‘In either case, the algorithm underlying the computational goal for BPD participants is far higher in entropy and emphasises a less stable or reliable process of inference.’

      We note in the revised Figure 2 panel E and in the results that those with BPD under M4 show insertion along absolute reward (they still expect diminished selfishness in others), but neutral priors over relative reward (around 0, suggesting expectations of neither prosocial nor competitive tendencies of others). Thus, θ_ppt^m (self preference) and θ_par^m (other preference) are tightly associated for absolute, but not relative reward.

      In our wording, we meant that whether under model M4 or M1, those with BPD either show a neutral prior over relative reward (M4) or a prior with large variance over relative reward (M1), showing expectations of difference between themselves and their partner. In both cases, expectation about a partner’s absolute reward preferences is diminished vs. CON participants. We have strengthened our language in the discussion to clarify this:

      ‘In either case, the algorithm underlying the computational goal for BPD participants is far higher in uncertainty, whether through a neutral central tendency (M4) or large variance (M1) prior over relative reward in phase 2, and emphasises a less certain and reliable expectation about others.’

      "To note, social contagion under M3 was highly correlated with contagion under M1 (see Fig S11). This provides some preliminary evidence that trauma impacts beliefs about individualism directly, whereas trauma and persecutory beliefs impact beliefs about prosociality through impaired trait mentalising" - I don't understand what the authors mean by this, can they please elaborate and add some explanation to the main text?

      We have now clarified this in the text:

      ‘Together this provides some evidence that self-reported trauma and self-reported mentalising influence social contagion (Fig S11). Social contagion under M3 was highly correlated with contagion under M1 demonstrating parsimony of outcomes across models (Fig S12).’

      I noted that at least some of the newly added references have not been added to the bibliography (e.g., Hitchcock et al. 2022).

      Thank you for noticing this omission. We have now ensured all cited works are in the reference list.

      Reviewer 2:

      The paper is not based on specific empirical hypotheses formulated at the outset, but, rather, it uses an exploratory approach. Indeed, the task is not chosen in order to tackle specific empirical hypotheses. This, in my view, is a limitation since the introduction reads a bit vague and it is not always clear which gaps in the literature the paper aims to fill. As a further consequence, it is not always clear how the findings speak to previous theories on the topic.

      As I wrote in the public review, however, I believe that an important limitation of this work is that it was not based on testing specific empirical hypotheses formulated at the outset, and on selecting the experimental paradigm accordingly. This is a limitation because it is not always clear which gaps in the literature the paper aims to fill. As a consequence, although it has improved substantially compared to the previous version, the introduction remains a bit vague. As a further consequence, it is not always clear how the findings speak to previous theories on the topic. Still, despite this limitation, the paper has many strengths, and I believe it is now ready for publication.

      Thank you for this further critique. We appreciate your appraisal that the work has improved substantially and is ready for publication. We have nevertheless opted to clarify our introduction and a priori predictions throughout the manuscript (please see response to Reviewer 1).

      Reviewer 3:

      Although the authors note that their approach makes "clear and transparent a priori predictions," the paper could be improved by providing a clear and consolidated statement of these predictions so that the results could be interpreted vis-a-vis any a priori hypotheses.

      In line with comments from both Reviewer 1 and 2, we have clarified our introduction to make clear what our a priori predictions and hypotheses are for our core aims and exploratory analyses (see response to Reviewer 1).

      The approach of using a partial correlation network with bootstrapping (and permutation) was interesting, but the logic of the analysis was not clearly stated. In particular, there are large group (Table 1: CON vs. BPD) differences in the measures introduced into this network. As a result, it is hard to understand whether any partial correlations are driven primarily by mean differences in severity (correlations tend to be inflated in extreme groups designs due to the absence of observation in middle of scales forming each bivariate distribution). I would have found these exploratory analyses more revealing if group membership was controlled for.

      Thank you for this chance to be clearer in our methods. We have now written a more direct exposition of this exploratory method:

      ‘Exploratory Network Analysis

      To understand the individual differences of trait attributes (MZQ, RGPTSB, CTQ) with other-to-self information transfer () across the entire sample we performed a network analysis (Borsboom, 2021). Network analysis allows for conditional associations between variables to be estimated; each association is controlled for by all other associations in the network. It also allows for visual inspection of the conditional relationships to get an intuition for how variables are interrelated as a whole (see Fig S11). We implemented network analysis with the bootnet package in R using the ‘estimateNetwork’ function with partial correlations (Epskamp, Borsboom & Fried, 2018). To assess the stability of the partial correlations we further implemented bootstrap resampling with 5000 repetitions using the ‘bootnet’ function. We then additionally shuffled the data and refitted the network 5000 times to determine a p<sub>permuted</sub> value; this indicates the probability that a conditional relationship in the original network was within the null distribution of each conditional relationship. We then performed False Discovery Rate correction on the resulting p-values. We additionally controlled for group status for all variables in a supplementary analysis (Table S4).’
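      The logic of a partial correlation with a permutation test can be sketched as follows. This is a simplified three-variable illustration in plain Python, not the bootnet pipeline described above: the data are simulated, the function names are hypothetical, only one variable is shuffled per permutation, and fewer permutations are used than the 5000 reported.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def permutation_p(x, y, z, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the partial correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(partial_corr(x, y, z))
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between x and (y, z)
        if abs(partial_corr(shuffled, y, z)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Simulated data: x and y are related only through the shared variable z.
rng = random.Random(1)
n = 300
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

raw_r = pearson(x, y)             # marginal association, inflated by z
partial_r = partial_corr(x, y, z) # association conditional on z
p_perm = permutation_p(x, y, z)
```

      In this simulation the marginal correlation is sizeable while the partial correlation is close to zero, which mirrors the concern that associations can be driven by a shared third variable such as group membership.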

      We have also further corrected for group status and reported these results as a supplementary table, and also within the main text alongside the main results. We have opted to relegate Figure 4 into a supplementary figure to make the text clearer.

      ‘We explored conditional psychometric associations with social contagion under the assumption of M3 for all participants (where everyone is able to be influenced by their partner). We conducted partial correlation analyses to estimate relationships conditional on all other associations and retained all that survived bootstrapping (5000 reps), permutation testing (5000 reps), and subsequent FDR correction. When not controlled for group status, RGPTSB and CTQ scores were both moderately associated with MZQ scores (RGPTSB r = 0.41, 95%CI: 0.23, 0.60, p[fdr]=0.043; CTQ r = 0.354 95%CI: 0.13, 0.56, p[fdr]=0.02). This was not affected by group correction. CTQ scores were moderately and negatively associated with shifts in individualistic reward preferences (; r = -0.25, 95%CI: -0.46, -0.04, p[fdr]=0.03). This was not affected by group correction. MZQ scores were in turn moderately and negatively associated with shifts in prosocial-competitive preferences () between phase 1 and 3 (r = -0.26, 95%CI: -0.46, -0.06, p[fdr]=0.03). This was diminished when controlled for group status (r = 0.13, 95%CI: -0.34, 0.08, p[fdr]=0.20). Together this provides some evidence that self-reported trauma and self-reported mentalising influence social contagion (Fig S11). Social contagion under M3 was highly correlated with contagion under M1 demonstrating parsimony of outcomes across models (Fig S12).’

      Discussion first para: "effected -> affected"

      Thanks for spotting this. We have now changed it.

      Add "s" to "participant": "Notably, despite differing strategies, those with BPD achieved similar accuracy to CON participant."

      We have now changed this.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Argunşah et al. describe and investigate the mechanisms underlying the differential response dynamics of barrel vs septa domains of the whisker-related primary somatosensory cortex (S1). Upon repeated stimulation, the authors report that the response ratio between multi- and single-whisker stimulation increases in layer (L) 4 neurons of the septal domain, while remaining constant in barrel L4 neurons. This difference is attributed to the short-term plasticity properties of interneurons, particularly somatostatin-expressing (SST+) neurons. This claim is supported by the increased density of SST+ neurons found in L4 of the septa compared to barrels, along with a stronger response of (L2/3) SST+ neurons to repeated multi- vs single-whisker stimulation. The role of the synaptic protein Elfn1 is then examined. Elfn1 KO mice exhibited little to no functional domain separation between barrel and septa, with no significant difference in single- versus multi-whisker response ratios across barrel and septal domains. Consistently, a decoder trained on WT data fails to generalize to Elfn1 KO responses. Finally, the authors report a relative enrichment of S2- and M1-projecting cell densities in L4 of the septal domain compared to the barrel domain.

      Strengths:

      This paper describes and aims to study a circuit underlying differential response between barrel columns and septal domains of the primary somatosensory cortex. This work supports the view that barrel and septal domains contribute differently to processing single versus multi-whisker inputs, suggesting that the barrel cortex multiplexes sensory information coming from the whiskers in different domains.

      We thank the reviewer for the very neat summary of our findings that barrel cortex multiplexes converging information in separate domains.

      Weaknesses:

      While the observed divergence in responses to repeated SWS vs MWS between the barrel and septal domains is intriguing, the presented evidence falls short of demonstrating that short-term plasticity in SST+ neurons critically underpins this difference. The absence of a mechanistic explanation for this observation limits the work's significance. The measurement of SST neurons' response is not specific to a particular domain, and the Elfn1 manipulation does not seem to be specific to either stimulus type or a particular domain.

      We appreciate the reviewer’s perspective. Although further research is needed to understand the circuit mechanisms underlying the observed phenomenon, we believe our data suggest that altering the short-term dynamics of excitatory inputs onto SST neurons reduces the divergent spiking dynamics in barrels versus septa during repetitive single- and multi-whisker stimulation. Future work could examine how SST neurons, whose somata reside in barrels and septa, respond to different whisker stimuli and the circuits in which they are embedded. At this time, however, the authors believe there is no alternative way to test how the short-term dynamics of excitatory inputs onto SST neurons, as a whole, contribute to the temporal aspects of barrel versus septa spiking.

      The study's reach is further constrained by the fact that results were obtained in anesthetized animals, which may not generalize to awake states.

      We appreciate the reviewer’s concern regarding the generalizability of our findings from anesthetized animals to awake states. Anesthesia was employed to ensure precise individual whisker stimulation (and multi-whisker stimulation in the same animal), which is challenging in awake rodents due to active whisking. While anesthesia may alter higher-order processing, core mechanisms, such as short- and long-term plasticity in the barrel cortex, are preserved under anesthesia (Martin-Cortecero et al., 2014; Mégevand et al., 2009).

      The statistical analysis appears inappropriate, with the use of repeated independent tests, dramatically boosting the false positive error rate.

      Thank you for your feedback on our analysis using independent rank-based tests for each time point in wild-type (WT) animals. To address concerns regarding multiple comparisons and temporal dependencies (for Figure 1F and 4D for now but we will add more in our revision), we performed a repeated measures ANOVA for WT animals (13 Barrel, 8 Septa, 20 time points), which revealed a significant main effect of Condition (F(1,19) = 16.33, p < 0.001) and a significant Condition-Time interaction (F(19,361) = 2.37, p = 0.001). Post-hoc tests confirmed significant differences between Barrel and Septa at multiple time points (e.g., p < 0.0025 at times 3, 4, 6, 7, 8, 10, 11, 12, 16, 19 after Bonferroni posthoc correction), supporting a differential multi-whisker vs. single-whisker ratio response in WT animals. In contrast, a repeated measures ANOVA for knock-out (KO) animals (11 Barrel, 7 Septa, 20 time points) showed no significant main effect of Condition (F(1,14) = 0.17, p = 0.684) or Condition-Time interaction (F(19,266) = 0.73, p = 0.791), indicating that the Barrel-Septa difference observed in WT animals is absent in KO animals.
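      The post-hoc threshold quoted above follows from dividing the family-wise alpha by the number of time points. A minimal sketch of that arithmetic, assuming a family-wise alpha of 0.05 (not stated in the response) and using hypothetical per-time-point p-values:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# 20 time points, assumed family-wise alpha of 0.05.
threshold = bonferroni_threshold(0.05, 20)

# Hypothetical p-values keyed by time point (not from the manuscript).
hypothetical_p = {3: 0.0004, 5: 0.0100, 16: 0.0021}
significant_times = sorted(t for t, p in hypothetical_p.items() if p < threshold)
```

      With 20 comparisons the per-test threshold is 0.0025, matching the value reported for the post-hoc tests.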

      Furthermore, the manuscript suffers from imprecision; its conclusions are occasionally vague or overstated. The authors suggest a role for SST+ neurons in the observed divergence in SWS/MWS responses between barrel and septal domains. However, this remains speculative, and some findings appear inconsistent. For instance, the increased response of SST+ neurons to MWS versus SWS is not confined to a specific domain. Why, then, would preferential recruitment of SST+ neurons lead to divergent dynamics between barrel and septal regions? The higher density of SST+ neurons in septal versus barrel L4 is not a sufficient explanation, particularly since the SWS/MWS response divergence is also observed in layers 2/3, where no difference in SST+ neuron density is found.

      Moreover, SST+ neuron-mediated inhibition is not necessarily restricted to the layer in which the cell body resides. It remains unclear through which differential microcircuits (barrel vs septum) the enhanced recruitment of SST+ neurons could account for the divergent responses to repeated SWS versus MWS stimulation.

      We fully appreciate the reviewer’s comment. We currently do not provide any evidence on the contribution of SST neurons in the barrels versus septa in layer 4 to the response divergence of spiking observed in SWS versus MWS. We only show that these neurons differentially distribute in the two domains in this layer. It is certainly known that there is molecular and circuit-based diversity of SST-positive neurons in different layers of the cortex, so it is plausible that this includes cells located in the two domains of vS1, something which has not been examined so far. Our data on their distribution are one piece of information suggesting that SST neurons may have a differential role in inhibiting barrel stellate cells versus septa ones. Morphological reconstructions of SST neurons in L4 of the somatosensory barrel cortex have shown that their dendrites and axons project locally and may be confined to individual domains, although this was not specifically examined (Fig. 3 of Scala F et al., 2019). The same study also showed that L4 SST cells receive excitatory input from local stellate cells, and it is known that they are also directly excited by thalamocortical fibers (Beierlein et al., 2003; Tan et al., 2008), both of which facilitate.

      As shown in our supplementary figure, the divergence is also observed in L2/3 where, as the reviewer also points out, we do not have a differential distribution of SST cells, at least based on a columnar analysis extending from L4. There are multiple scenarios that could explain this “discrepancy” that one would need to examine further in future studies. One straightforward scenario is that the divergence in spiking in L2/3 domains may be inherited from L4 domains, on which L4 SST neurons act. Another is that even though L2/3 SST neurons are not biased in their distribution, their input-output function is, something which one would need to examine by detailed in vitro electrophysiological and perhaps optogenetic approaches in S1. Despite the distinctive differences that have been found between the L4 circuitry in S1 and V1 (Scala F et al., 2019), recent observations indicate that small but regular patches of V1 marked by the absence of muscarinic receptor 2 (M2) have high temporal acuity (Ji et al., 2015), and selectively receive input from SST interneurons (Meier et al., 2025). Regions lacking M2 have distinct input and output connectivity patterns from those that express M2 (Meier et al., 2021; Burkhalter et al., 2023). These findings, together with ours, suggest that SST cells preferentially innervate and regulate specific domains (columns) in sensory cortices.

      Regardless of the mechanism, the Elfn1 knock-out mouse line almost exclusively affects the incoming excitability onto SST neurons (see also reply to comment below), hence what can be supported by our data is that changing the incoming short-term synaptic plasticity onto these neurons brings the spiking dynamics between barrels and septa closer together.

      The Elfn1 KO mouse model seems too unspecific to suggest the role of the short-term plasticity in SST+ neurons in the differential response to repeated SWS vs MWS stimulation across domains. Why would Elfn1-dependent short-term plasticity in SST+ neurons be specific to a pathway, or a stimulation type (SWS vs MWS)? Moreover, the authors report that Elfn1 knockout alters synapses onto VIP+ as well as SST+ neurons (Stachniak et al., 2021; previous version of this paper)-so why attribute the phenotype solely to SST+ circuitry? In fact, the functional distinctions between barrel and septal domains appear largely abolished in the Elfn1 KO.

      Previous work by others and us has shown that globally removing Elfn1 selectively removes a synaptic process from the brain without altering brain anatomy or structure. This allows us to study how the temporal dynamics of inhibition shape activity, as opposed to inhibition from particular cell types. We will nevertheless update the text to discuss more global implications for SST interneuron dynamics and include a reference to VIP interneurons that contain Elfn1.

      When comparing SWS to MWS, we find that MWS replaces the neighboring excitation which would normally be preferentially removed by short-term plasticity in SST interneurons, thus providing a stable control comparison across animals and genotypes. On average, VIP interneurons failed to show modulation by MWS. We were unable to measure a substantial contribution of VIP cells to this process and also note that the Elfn1 expressing multipolar neurons comprise only ~5% of VIP neurons (Connor and Peters, 1984; Stachniak et al., 2021), a fraction that may be lost when averaging from 138 VIP cells. Moreover, the effect of Elfn1 loss on VIP neurons is quite different and marginal compared to that of SST cells, suggesting that the primary impact of Elfn1 knockout is mediated through SST+ interneuron circuitry. Therefore, even if we cannot rule out that these 5% of VIP neurons contribute to barrel domain segregation, we are of the opinion that their influence would be very limited if any.

      Reviewer #2 (Public review):

      Summary:

      Argunsah and colleagues demonstrate that SST-expressing interneurons are concentrated in the mouse septa and differentially respond to repetitive multi-whisker inputs. Identifying how a specific neuronal phenotype impacts responses is an advance.

      Strengths:

      (1) Careful physiological and imaging studies.

      (2) Novel result showing the role of SST+ neurons in shaping responses.

      (3) Good use of a knockout animal to further the main hypothesis.

      (4) Clear analytical techniques.

      We thank the reviewer for their appreciation of the study.

      Weaknesses:

      No major weaknesses were identified by this reviewer. Overall, I appreciated the paper but feel it overlooked a few issues and had some recommendations on how additional clarifications could strengthen the paper. These include:

      (1) Significant work from Jerry Chen on how S1 neurons that project to M1 versus S2 respond in a variety of behavioral tasks should be included (e.g. PMID: 26098757). Similarly, work from Barry Connor's lab on intracortical versus thalamocortical inputs to SST neurons, as well as excitatory inputs onto these neurons (e.g. PMID: 12815025) should be included.

      We thank the reviewer for these valuable resources that we overlooked. We will include Chen et al. (2015), Cruikshank et al. (2007) and Gibson et al. (1999) to contextualize S1 projections and SST+ inputs, strengthening the study’s foundation, as well as Beierlein et al. (2003), which nicely shows both local and thalamocortical facilitation of excitatory inputs onto L4 SST neurons, in contrast to PV cells. The paper also shows the gradual recruitment of SST neurons by thalamocortical inputs to provide feed-forward inhibition onto stellate cells (regular spiking) of the barrel cortex L4 in rat.

      (2) Using Layer 2/3 as a proxy to what is happening in layer 4 (~line 234). Given that layer 2/3 cells integrate information from multiple barrels, as well as receiving direct VPm thalamocortical input, and given the time window that is being looked at can receive input from other cortical locations, it is not clear that layer 2/3 is a proxy for what is happening in layer 4.

      We agree with the reviewer that what we observe in L2/3 is not necessarily what is taking place in L4 SST-positive cells. The data on L2/3 was included to show that these cells, as a population, can show divergent responses when it comes to SWS vs MWS, which is not seen in L2/3 VIP neurons. Regardless of the mechanisms underlying it, our overall data support that SST-positive neurons can change their activation based on the type of whisker stimulus and when the excitatory input dynamics onto these neurons change due to the removal of Elfn1 the recruitment of barrels vs septa spiking changes at the temporal domain. Having said that, the data shown in Supplementary Figure 3 on the response properties of L2/3 neurons above the septa vs above the barrels (one would say in the respective columns) do show the same divergence as in L4. This suggests that a circuit motif may exist that is common to both layers, involving SST neurons that sit in L4, L5 or even L2/3. This implies that despite the differences in the distribution of SST neurons in septa vs barrels of L4 there is an unidentified input-output spatial connectivity motif that engages in both L2/3 and L4. Please also see our response to a similar point raised by reviewer 1.

      (3) Line 267, when discussing distinct temporal response, it is not well defined what this is referring to. Are the neurons no longer showing peaks to whisker stimulation, or are the responses lasting a longer time? It is unclear why PV+ interneurons which may not be impacted by the Elfn1 KO and receive strong thalamocortical inputs, are not constraining activity.

      We thank the reviewer for their comment and will clarify the statement.

      This convergence of response profiles was further clear in stimulus-aligned stacked images, where the emergent differences between barrels and septa under SWS were largely abolished in the KO (Figure 4B). A distinction between directly stimulated barrels and neighboring barrels persisted in the KO. In addition, the initial response continued to differ between barrel and septa and also septa and neighbor (Figure 4B). This initial stimulus selectivity potentially represents distinct feedforward thalamocortical activity, which includes PV+ interneuron recruitment that is not directly impacted by the Elfn1 KO (Sun et al., 2006; Tan et al., 2008). PV+ cells are strongly excited by thalamocortical inputs, but these exhibit short-term depression, as does their output, contrasting with the sustained facilitation observed in SST+ neurons. These findings suggest that in WT animals, activity spillover from principal barrels is normally constrained by the progressive engagement of SST+ interneurons in septal regions, driven by Elfn1-dependent facilitation at their excitatory synapses. In the absence of Elfn1, this local inhibitory mechanism is disrupted, leading to longer responses in barrels, delayed but stronger responses in septa, and persistently stronger responses in unstimulated neighbors, resulting in a loss of distinction between the responses of barrel and septa domains that normally diverge over time (see Author response image 1 below).

      Author response image 1.

      A) Barrel responses are longer following whisker stimulation in KO. B) Septal responses are slightly delayed but stronger in KO. C) Unstimulated neighbors show longer persistent responses in KO.

      (4) Line 585 "the earliest CSD sink was identified as layer 4..." were post-hoc measurements made to determine where the different shank leads were based on the post-hoc histology?

      Post hoc histology was performed on plane-aligned brain sections, which allowed us to detect barrels and septa and so confirm the insertion domain of each recorded shank. The layer specificity of each electrode could therefore not be confirmed by histology, as we did not have coronal sections in which to measure electrode depth.

      (5) For the retrograde tracing studies, how were the M1 and S2 injections targeted (stereotaxically or physiologically)? How was it determined that the injections were in the whisker region (or not)?

      During the retrograde virus injection, the location of M1 and S2 injections was determined by stereotaxic coordinates (Yamashita et al., 2018). After acquiring the light-sheet images, we were able to post hoc examine the injection site in 3D and confirm that the injections were successful in targeting the regions intended. Although it would have been informative to do so, we did not functionally determine the whisker-related M1 and whisker-related S2 region in this experiment.

      (6) Were there any baseline differences in spontaneous activity in the septa versus barrel regions, and did this change in the KO animals?

      Thank you for this interesting question. Our previous study found that there was a reduction in baseline activity in L4 barrel cortex of KO animals at postnatal day (P)12, but no differences were found at P21 (Stachniak et al., 2023).

      Reviewer #3 (Public review):

      Summary:

      This study investigates the functional differences between barrel and septal columns in the mouse somatosensory cortex, focusing on how local inhibitory dynamics, particularly involving Elfn1-expressing SST⁺ interneurons, may mediate temporal integration of multi-whisker (MW) stimuli in septa. Using a combination of in vivo multi-unit recordings, calcium imaging, and anatomical tracing, the authors propose that septa integrate MW input in an Elfn1-dependent manner, enabling functional segregation from barrel columns.

      Strengths:

      The core hypothesis is interesting and potentially impactful. While barrels have been extensively characterized, septa remain less understood, especially in mice, and this study's focus on septal integration of MW stimuli offers valuable insights into this underexplored area. If septa indeed act as selective integrators of distributed sensory input, this would add a novel computational role to cortical microcircuits beyond what is currently attributed to barrels alone. The narrative of this paper is intellectually stimulating.

      We thank the reviewer for finding the study intellectually stimulating.

      Weaknesses:

      The methods used in the current study lack the spatial and cellular resolution needed to conclusively support the central claims. The main physiological findings are based on unsorted multi-unit activity (MUA) recorded via low-channel-count silicon probes. MUA inherently pools signals from multiple neurons across different distances and cell types, making it difficult to assign activity to specific columns (barrel vs. septa) or neuron classes (e.g., SST⁺ vs. excitatory).

      The recording radius (~50-100 µm or more) and the narrow width of septa (~50-100 µm or less) make it likely that MUA from "septal" electrodes includes spikes from adjacent barrel neurons.

      The authors do not provide spike sorting, unit isolation, or anatomical validation that would strengthen spatial attribution. Calcium imaging is restricted to SST⁺ and VIP⁺ interneurons in superficial layers (L2/3), while the main MUA recordings are from layer 4, creating a mismatch in laminar relevance.

      We thank the reviewer for pointing out the possibility of contamination in septal electrodes. Although reported in the Methods (line 583), it may not have been sufficiently highlighted that we used a very high threshold (7.5 std) for spike detection precisely to limit such spatial contamination. Since spike amplitude decays rapidly with distance, at high thresholds only nearby neurons, potentially one or two, contribute to our analysis. We believe that this approach provides a close approximation of single-unit activity (SUA) in our reported data. We will include a sentence earlier in the manuscript to make this explicit and prevent further confusion.
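
      For illustration, threshold-based spike detection of the kind described above can be sketched as follows. This is a minimal sketch and not the analysis code used in the study: only the 7.5 std threshold comes from the response, while the function name `detect_spikes`, the negative-going polarity, the whole-trace standard deviation as the noise estimate, and the `refractory` parameter are assumptions made for the example.

```python
import numpy as np

def detect_spikes(trace, n_std=7.5, refractory=30):
    """Return sample indices of negative-going threshold crossings.

    Hypothetical helper: the threshold is n_std times the standard
    deviation of the median-centered trace (an assumed noise estimate).
    """
    trace = np.asarray(trace, dtype=float)
    centered = trace - np.median(trace)
    threshold = -n_std * np.std(centered)

    # A crossing is a sample below threshold whose predecessor is not.
    below = centered < threshold
    crossings = np.flatnonzero(below[1:] & ~below[:-1]) + 1

    # Enforce a minimum spacing (in samples) between accepted events.
    spikes = []
    last = -refractory
    for idx in crossings:
        if idx - last >= refractory:
            spikes.append(int(idx))
            last = idx
    return spikes
```

      Raising `n_std` keeps only the largest deflections, which is the rationale given above for restricting the analysis to units close to the electrode.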

      Regarding the point that calcium imaging was performed on L2/3 SST and VIP cells rather than in L4: Reviewers 1 and 2 raised the same issue, and we responded as follows. As shown in our supplementary figure, the divergence is also observed in L2/3, where we do not have a differential distribution of SST cells, at least based on a columnar analysis extending from L4. There are multiple scenarios that could explain this “discrepancy” and that would need to be examined in future studies. One straightforward scenario is that the divergence in spiking in L2/3 domains is inherited from the L4 domains on which L4 SST cells act. Another is that, even though L2/3 SST neurons are not biased in their distribution, their input-output function is, something that would need to be examined by detailed in vitro electrophysiological and perhaps optogenetic approaches in S1. Despite the distinctive differences that have been found between the L4 circuitry in S1 and V1 (Scala et al., 2019), recent observations indicate that small but regular patches of V1 marked by the absence of muscarinic receptor 2 (M2) have high temporal acuity (Ji et al., 2015) and selectively receive input from SST interneurons (Meier et al., 2025). Regions lacking M2 have distinct input and output connectivity patterns from those that express M2 (Meier et al., 2021; Burkhalter et al., 2023). These findings, together with ours, suggest that SST cells preferentially innervate and regulate specific domains (columns) in sensory cortices.

      Furthermore, while the role of Elfn1 in mediating short-term facilitation is supported by prior studies, no new evidence is presented in this paper to confirm that this synaptic mechanism is indeed disrupted in the knockout mice used here.

      We thank Reviewer #3 for noting the absence of new evidence confirming that short-term facilitation is disrupted in our knockout mice. We acknowledge that our study relies on strong previously published data demonstrating that Elfn1 mediates short-term synaptic facilitation of excitatory inputs onto SST+ interneurons (Sylwestrak and Ghosh, 2012; Tomioka et al., 2014; Stachniak et al., 2019, 2023). These studies consistently show that Elfn1 knockout abolishes facilitation in SST+ synapses, leading to altered temporal dynamics, which we hypothesize underlies the observed loss of barrel-septa response divergence in our Elfn1 KO mice (Figure 4). Nevertheless, to address the point raised, we will clarify in the revised manuscript (around lines 245-247 and 271-272) that our conclusions are based on these established findings, stating: “Building on prior evidence that Elfn1 knockout disrupts short-term facilitation in SST+ interneurons (Sylwestrak and Ghosh, 2012; Tomioka et al., 2014; Stachniak et al., 2019, 2023), we attribute the abolished barrel-septa divergence in Elfn1 KO mice to altered SST+ synaptic dynamics, though direct synaptic measurements were not performed here.”

      Additionally, since Elfn1 is constitutively knocked out from development, the possibility of altered circuit formation, including changes in barrel structure and interneuron distribution, cannot be excluded and is not addressed.

      We thank Reviewer #3 for raising the valid concern that constitutive Elfn1 knockout could potentially alter circuit formation, including barrel structure and interneuron distribution. To address this, we will clarify in the revised manuscript (around line 271 and in the Discussion) that in our previous studies, which included both whole-cell patch-clamp in acute brain slices ranging from postnatal day 11 to 22 (P11 - P21) and in vivo recordings from barrel cortex at P12 and P21, we saw no gross abnormalities in barrel structure, with Layer 4 barrels maintaining their characteristic size and organization, consistent with wild-type (WT) mice (Stachniak et al., 2019, 2023). While we cannot fully exclude subtle developmental changes, prior studies indicate that Elfn1 primarily modulates synaptic function rather than cortical cytoarchitecture (Tomioka et al., 2014). Elfn1 KO mice show no gross morphological or connectivity differences, and the pattern and abundance of Elfn1-expressing cells (assessed by LacZ knock-in) appear normal (Dolan and Mitchell, 2013).

      We will add the following to the Discussion: “Although Elfn1 is constitutively knocked out, we find here and in previous studies that barrel structure is preserved (Stachniak et al., 2019, 2023). Further, the distribution of Elfn1-expressing interneurons is not different in KO mice, suggesting minimal developmental disruption (Dolan and Mitchell, 2013). Nonetheless, we acknowledge that subtle circuit changes cannot be ruled out without the use of a time-dependent conditional knockout of the gene.”

      References

      (1) Beierlein, M., Gibson, J. R. & Connors, B. W. (2003). Two dynamically distinct inhibitory networks in layer 4 of the neocortex. J. Neurophysiol. 90, 2987–3000.

      (2) Burkhalter, A., D’Souza, R. D. & Ji, W. (2023). Integration of feedforward and feedback information streams in the modular architecture of mouse visual cortex. Annu. Rev. Neurosci. 46, 259–280.

      (3) Chen, J. L., Margolis, D. J., Stankov, A., Sumanovski, L. T., Schneider, B. L. & Helmchen, F. (2015). Pathway-specific reorganization of projection neurons in somatosensory cortex during learning. Nat. Neurosci. 18, 1101–1108.

      (4) Connor, J. R. & Peters, A. (1984). Vasoactive intestinal polypeptide-immunoreactive neurons in rat visual cortex. Neuroscience 12, 1027–1044.

      (5) Cruikshank, S. J., Lewis, T. J. & Connors, B. W. (2007). Synaptic basis for intense thalamocortical activation of feedforward inhibitory cells in neocortex. Nat. Neurosci. 10, 462–468.

      (6) Dolan, J. & Mitchell, K. J. (2013). Mutation of Elfn1 in mice causes seizures and hyperactivity. PLoS One 8, e80491.

      (7) Gibson, J. R., Beierlein, M. & Connors, B. W. (1999). Two networks of electrically coupled inhibitory neurons in neocortex. Nature 402, 75–79.

      (8) Ji, W., Gămănuţ, R., Bista, P., D’Souza, R. D., Wang, Q. & Burkhalter, A. (2015). Modularity in the organization of mouse primary visual cortex. Neuron 87, 632–643.

      (9) Martin-Cortecero, J. & Nuñez, A. (2014). Tactile response adaptation to whisker stimulation in the lemniscal somatosensory pathway of rats. Brain Res. 1591, 27–37.

      (10) Mégevand, P., Troncoso, E., Quairiaux, C., Muller, D., Michel, C. M. & Kiss, J. Z. (2009). Long-term plasticity in mouse sensorimotor circuits after rhythmic whisker stimulation. J. Neurosci. 29, 5326–5335.

      (11) Meier, A. M., Wang, Q., Ji, W., Ganachaud, J. & Burkhalter, A. (2021). Modular network between postrhinal visual cortex, amygdala, and entorhinal cortex. J. Neurosci. 41, 4809–4825.

      (12) Meier, A. M., D’Souza, R. D., Ji, W., Han, E. B. & Burkhalter, A. (2025). Interdigitating modules for visual processing during locomotion and rest in mouse V1. bioRxiv 2025.02.21.639505.

      (13) Scala, F., Kobak, D., Shan, S., Bernaerts, Y., Laturnus, S., Cadwell, C. R., Hartmanis, L., Froudarakis, E., Castro, J. R., Tan, Z. H., et al. (2019). Layer 4 of mouse neocortex differs in cell types and circuit organization between sensory areas. Nat. Commun. 10, 4174.

      (14) Stachniak, T. J., Sylwestrak, E. L., Scheiffele, P., Hall, B. J. & Ghosh, A. (2019). Elfn1-induced constitutive activation of mGluR7 determines frequency-dependent recruitment of somatostatin interneurons. J. Neurosci. 39, 4461–4475.

      (15) Stachniak, T. J., Kastli, R., Hanley, O., Argunsah, A. Ö., van der Valk, E. G. T., Kanatouris, G. & Karayannis, T. (2021). Postmitotic Prox1 expression controls the final specification of cortical VIP interneuron subtypes. J. Neurosci. 41, 8150–8166.

      (16) Stachniak, T. J., Argunsah, A. Ö., Yang, J. W., Cai, L. & Karayannis, T. (2023). Presynaptic kainate receptors onto somatostatin interneurons are recruited by activity throughout development and contribute to cortical sensory adaptation. J. Neurosci. 43, 7101–7118.

      (17) Sun, Q.-Q., Huguenard, J. R. & Prince, D. A. (2006). Barrel cortex microcircuits: Thalamocortical feedforward inhibition in spiny stellate cells is mediated by a small number of fast-spiking interneurons. J. Neurosci. 26, 1219–1230.

      (18) Sylwestrak, E. L. & Ghosh, A. (2012). Elfn1 regulates target-specific release probability at CA1-interneuron synapses. Science 338, 536–540.

      (19) Tan, Z., Hu, H., Huang, Z. J. & Agmon, A. (2008). Robust but delayed thalamocortical activation of dendritic-targeting inhibitory interneurons. Proc. Natl. Acad. Sci. USA 105, 2187–2192.

      (20) Tomioka, N. H., Yasuda, H., Miyamoto, H., Hatayama, M., Morimura, N., Matsumoto, Y., Suzuki, T., Odagawa, M., Odaka, Y. S., Iwayama, Y., et al. (2014). Elfn1 recruits presynaptic mGluR7 in trans and its loss results in seizures. Nat. Commun. 5, 4501.

      (21) Yamashita, T., Vavladeli, A., Pala, A., Galan, K., Crochet, S., Petersen, S. S. & Petersen, C. C. (2018). Diverse long-range axonal projections of excitatory layer 2/3 neurons in mouse barrel cortex. Front. Neuroanat. 12, 33.

    1. Author response:

      Reviewer #1 (Public review):

      The manuscript titled "The distinct role of human PIT in attention control" by Huang et al. investigates the role of the human posterior inferotemporal cortex (hPIT) in spatial attention. Using fMRI experiments and resting-state connectivity analyses, the authors present compelling evidence that hPIT is not merely an object-processing area, but also functions as an attentional priority map, integrating both top-down and bottom-up attentional processes. This challenges the traditional view that attentional control is localized primarily in frontoparietal networks.

      The manuscript is strong and of high potential interest to the cognitive neuroscience community. Below, I raise questions and suggestions to help with the reliability, methodology, and interpretation of the findings.

      Thank you for a nice summary of the key points of our study. Below you will find our responses to your questions.

      (1) The authors argue that hPIT satisfies the criteria for a priority map, but a clearer justification would strengthen this claim. For example, how does hPIT meet all four widely recognized criteria, such as spatial selectivity, attentional modulation, feature invariance, and input integration, when compared to classical regions such as LIP or FEF? A more systematic summary of how hPIT meets these benchmarks would be helpful. Additionally, to what extent are the observed attentional modulations in hPIT independent of general task difficulty or behavioral performance?

      Great suggestions! For the first suggestion, we will include a clearer justification in the revised manuscript. For the second, all participants received task practice prior to scanning, and task accuracy exceeded 90% (we will explicitly report the accuracy rate in the revision), suggesting the tasks were not overly demanding. Although ceiling effects limit the interpretability of behavioral-performance correlations, we argue that higher task demands would likely require greater attentional effort, leading to stronger modulation in hPIT, which aligns with our findings when we manipulated the attentional load.

      (2) The authors report that hPIT modulation is invariant to stimulus category, but there appear to be subtle category-related effects in the data. Were the face, scene, and scrambled images matched not only in terms of luminance and spatial frequency, but also in terms of factors such as semantic familiarity and emotional salience? This may influence attentional engagement and bias interpretation.

      The response of hPIT is generally insensitive to stimulus category; however, the reviewer is correct in noticing that attentional modulation in hPIT is slightly stronger for faces than for scenes and scrambled images. Although the faces used in the task had neutral expressions and the scene pictures were also neutral, it is indeed possible that semantic familiarity or emotional salience contributes to the subtle category-related effects in the results of Experiment 3. This point will be noted in the revised manuscript.

      (3) The result that attentional load modulates hPIT is important and adds depth to the main conclusions. However, some clarifications would help with the interpretation. For example, were there observable individual differences in the strength of attentional modulation? How consistent were these effects across participants?

      Yes, individual differences exist. In the revised manuscript, we will include individual subject data points in Figure 6B.

      (4) The resting-state data reveal strong connections between hPIT and both dorsal and ventral attention networks. However, the analysis is correlational. Are there any complementary insights from task-based functional connectivity or latency analyses that support a directional flow of information involving hPIT? In addition, do the authors interpret hPIT primarily as a convergence hub receiving input from both DAN and VAN, or as a potential control node capable of influencing activity in these networks? Also, were there any notable differences between hemispheres in either the connectivity patterns or attentional modulation?

      We agree that, in addition to resting-state connectivity, task-based functional connectivity analyses have the potential to provide further information about whether hPIT serves as a convergence node or a control hub. Although fMRI data are not well suited to inferring the direction of information flow because of their low temporal resolution, we will conduct task-based functional connectivity analyses.

      We also observed modest hemispheric asymmetries in connectivity—for instance, both left and right hPIT showed stronger connectivity with right-hemisphere attention nodes. This will be described in the revised supplement.

      (5) A few additional questions arise regarding the anatomical characteristics of hPIT: How consistent were its location and size across participants? Were there any cases where hPIT could not be reliably defined? Given the proximity of hPIT to FFA and LOp, how was overlap avoided in ROI definition? Were the functional boundaries confirmed using independent contrasts?

      The size and location of hPIT are generally consistent across subjects, as shown in Supplementary Figure 1. This consistency is also supported by Figure 4C. The hPIT was defined by conjunction maps across three tasks and then manually delineated to avoid voxels overlapping with FFA and LOp. The FFA was defined using an independent contrast (Exp3 contrast [face-scene]) and the LOp location was defined by anatomical parcellation (Glasser et al., 2016).

      Reviewer #2 (Public review):

      Summary

      This study investigates the role of the human posterior inferotemporal cortex (hPIT) in attentional control, proposing that hPIT serves as an attentional priority map that integrates both top-down (endogenous) and bottom-up (exogenous) attentional processes. The authors conducted three types of fMRI experiments and collected resting-state data from 15 participants. In Experiment 1, using three different spatial attention tasks, they identified the hPIT region and demonstrated that this area is modulated by attention across tasks. In Experiment 2, by manipulating the presence or absence of visual stimuli, they showed that hPIT exhibits strong attentional modulation in both conditions, suggesting its involvement in both bottom-up and top-down attention. Experiment 3 examined the sensitivity of hPIT to stimulus features and attentional load, revealing that hPIT is insensitive to stimulus category but responsive to task load - further supporting its role as an attentional priority map. Finally, resting-state functional connectivity analyses showed that hPIT is connected to both dorsal and ventral attention networks, suggesting its potential role as a bridge between the two systems. These findings extend prior work on monkey PITd and provide new insights into the integration of endogenous and exogenous attention.

      Strengths

      (1) The study is innovative in its use of specially designed spatial attention tasks to localize and validate hPIT, and in exploring the region's role in integrating both endogenous and exogenous attention, as prior works focus primarily on its involvement in endogenous attention.

      (2) The authors provided very comprehensive experiment designs with clear figures and detailed descriptions.

      (3) A broad range of analyses was conducted to support the hypothesis that hPIT functions as an attentional priority map -- including experiments of attentional modulation under both top-down and bottom-up conditions, sensitivity to stimulus features and task load, and resting-state functional connectivity. These analyses showed consistent results.

      (4) Multiple appropriate statistical analyses - including t-tests, ANOVAs, and post-hoc tests - were conducted, and the results are clearly reported.

      Thank you for a nice summary of the key points and strengths of our study.

      Weaknesses

      (1) The sample size is relatively small (n = 15), and inter-subject variability is big in Figures 5 and 6, as seen in the spread of individual data points and error bars. The analysis of attention-modulated voxel map intersections appears to be influenced by multiple outliers.

      We agree that the sample size (n = 15) is not ideal, and we acknowledge that some data points in Figures 5 and 6 appear to be potential outliers. However, according to conventional outlier detection criteria, all data points are within three standard deviations of the group mean and were therefore retained for analysis. Moreover, the attention-modulated voxel intersection map shown in Figure 4C is insensitive to outliers, because the intersection map plotted is based on the number of subjects.
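
      The conventional criterion mentioned above (retaining data points within three standard deviations of the group mean) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name `flag_outliers` and the use of the sample standard deviation (`ddof=1`) are assumptions of the example.

```python
import numpy as np

def flag_outliers(values, n_sd=3.0):
    """Return a boolean mask marking values more than n_sd sample
    standard deviations away from the group mean (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    sd = values.std(ddof=1)  # sample SD; an assumption of this sketch
    return np.abs(values - mean) > n_sd * sd
```

      Applied to one value per subject, a mask with no `True` entries corresponds to the situation described above, in which all data points are retained.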

      (2) The authors acknowledge important limitations, including the lack of exploration of feature-based attention and the temporal constraints inherent to fMRI.

      Yes, we hope to address these limitations in future studies.

      (3) Prior research has established that regions such as the prefrontal cortex (PFC) and posterior parietal cortex (PPC) are involved in both endogenous and exogenous attention and have been proposed as attentional priority maps. It remains unclear what is uniquely contributed by hPIT, how it functionally interacts with these classical attentional hubs, and whether its role is complementary or redundant. The study would benefit from more direct comparisons with these regions.

      In this study, we define the ROI based on the intersection across three different types of spatial attention tasks, and the hPIT stands out in showing spatial attentional modulation across tasks. This could be due to the weak lateralized responses in PFC/PPC. To evaluate whether a region qualifies as a priority map, we applied four criteria (as mentioned in the Introduction). While dorsal and ventral attention network (DAN and VAN) regions can be considered important components of the priority map system, our findings suggest that among the regions tested, hPIT meets all four criteria. In Experiment 2, we included regions such as VFC (as part of PFC) and IPS (as part of PPC), and our findings suggest these areas are more involved in top-down attention. We agree with the reviewer’s suggestion and will perform additional analyses on PPC and PFC.

      (4) The functional connectivity analysis is only performed on resting-state data, and this approach does not capture context-dependent interactions. Task-based data analysis can provide stronger evidence.

      We acknowledge that resting-state FC is limited in assessing task-specific communication. To further investigate the role of hPIT, we plan to conduct task-based functional connectivity analyses.

      (5) The study does not report whether attentional modulation in hPIT is consistent across the two hemispheres. A comparison of hemispheric effects could provide important insight into lateralization and inter-individual variability, especially given the bilateral localization of hPIT.

      We thank the reviewer for this suggestion. hPIT was localized bilaterally using the same intersection-based method in Experiment 1. We have now performed an additional analysis and found that, in Experiment 3, the difference in attentional modulation between high and low load conditions was significant in the right hPIT but not in the left. This result will be reported in the revised manuscript.

    1. Author response:

      Below, we will address point by point any and all concerns of the reviewers.

      Reviewer #1:

      There are no major concerns, but some material could be added for clarity and to make the work more accessible to a more general scientific audience.

      We will add text for clarity and to make the work more accessible to a general audience per this comment and similar suggestions of the other reviewers.

      (1.1) A figure clearly showing the habituation protocol and the use of the dishabituators would be a good addition, even if the procedure has been done before and is cited. There can always be readers who are seeing this for the first time.

      We agree that this is a good idea, particularly because the time scales of the experiment will also be clearly marked, and we plan to include such a figure in the revised manuscript.

      (1.2) It would also be nice to comment on other ways dishabituation can happen (for example, when the stimulus is removed for a short time and returns) and what their time scales are.

      If the stimulus is withheld, spontaneous recovery occurs, a process distinct from dishabituation and worth exploring on its own. In a previous publication (Semelidou et al. eLife 2018;7:e39569), we have shown that in this habituation paradigm, with 4 min exposure either to the aversive Octanol or to the attractive Ethyl Acetate, spontaneous recovery occurs 6 minutes or more after the habituated stimulus is withheld. This contrasts with the immediate effect of the single dishabituating stimulus, delivered for a few seconds at the end of exposure to the habituator. Given that, per Thompson (Neurobiol Learn Mem. 2009), spontaneous recovery is a characteristic of habituation, we will work this point into the text.

      (1.3) And more generally, the paper could perhaps improve by making a stronger case for why the results are important not just for flies but for neuroscience in general.

      Thank you for the encouragement. We will try to rationally generalize our findings.

      Reviewer #2:

      (2.1) However, the claim that this represents a fundamental difference between homosensory and heterosensory pathways for dishabituation is overstated.

      We had no intention of stating more than the fact that footshock and yeast odor dishabituators relay these stimuli to the mushroom bodies via distinct dopaminergic neurons, hence differentiating distinct dishabituating stimuli via the mechanosensory (footshock) and olfactory (yeast odor) modalities as they engage the mushroom bodies. As the reviewer suggests, we will use more measured and specific language to state the above.

      (2.2) The introductory section does not adequately present current broad models for habituation and dishabituation.

      This was not done intentionally, but rather because we aimed at a less extended introductory section, and this evidently resulted in a brief and possibly inadequate presentation of current habituation models. We will present a much more detailed introduction to habituation and dishabituation models in the revised manuscript (also see the reply to point 3.5 below).

      (2.3) There are many different time scales, even for Drosophila olfactory habituation. These, as well as potential underlying mechanistic differences, need to be acknowledged; any claim should be specifically qualified for the time scales being studied here.

      We understand and appreciate the reviewer's point and its significance, and we will address it both in the revised text and in the paradigm figure we will add as stated above (point 1.1), where the time scales will be explicitly included and emphasized.

      (2.4) Additionally, there are several unclear, vague, and inaccurate sections and statements. A more careful, precise, and considered presentation of current views, as well as more measured claims of the impact of the findings, would substantially enhance my enthusiasm.

      We will of course address these concerns, though having the specific passages pointed out would help us address them thoroughly. As stated above, we will incorporate current views in the introduction and when discussing our results and their impact.

      Reviewer #3:

      (3.1) The key issue is that the main concepts of this manuscript appear to be based on a misunderstanding/misinterpretation of the literature. As the authors set out to settle the debate "whether the novel dishabituating stimulus elicits sensitization of the habituated circuits, or it engages distinct neuronal routes to bypass habituation reinstating the naïve response", it seems that the authors based their investigation on the premise that "sensitization" is mediated by a facilitatory process within the S-R pathway, and "dishabituation" by a facilitatory process outside the S-R pathway. This is not the status quo in the field, particularly with the prevailing theory like the Dual-Process Theory.

      We appreciate the reviewer’s comment and the opportunity to clarify the conceptual framework of our work. Our intention was in fact to test the Groves and Thompson hypothesis (Neurobiol Learn Mem. 2009) in our olfactory habituation system. As such, dishabituation could have been the result of a facilitatory process within the S-R pathway, or of mechanisms outside of it. Our experimental design allowed us to distinguish between these possibilities, and our results clearly show that dishabituation involves circuitry outside the S-R pathway. We do thank the reviewer for pointing out that we have not articulated this intention clearly, and we will take care to communicate it effectively in the revised manuscript.

      (3.2) The original version of Dual-Process Theory (Groves and Thompson 1970, but also see Thompson 2008, Neurobiol Learn Mem) already hypothesized that habituation happens within the specific S-R pathway, and sensitization occurs separately in an "organism-wide" state system that modulates the output of all S-R pathways.

      As mentioned above, we are aware of the Dual-Process hypothesis. In fact, our data demonstrate that activity outside the olfactory S-R pathway, engaging novel neuronal circuits, mediates dishabituation. Unlike habituation, these circuits mediating dishabituation include, at minimum, the mushroom bodies, the dopaminergic system and the APL neurons. In our view this does not support the “organism-wide state” system, but rather particular circuits that, in agreement with the Groves and Thompson hypothesis, are outside the S-R pathway and modulate its behavioral output. We will work these concepts into the Discussion section of the revised manuscript.

      (3.3) Dishabituation is recognized by the Dual-Process Theory as sensitization (organism-wide facilitation) manifested on top of existing habituation (depressed S-R pathway). This notion has been supported by a wide range of studies, including cat spinal cord reflex (e.g. Spencer et al. 1966) and work in Aplysia on heterosynaptic facilitation for both sensitization and dishabituation. Therefore, simply showing that the newly identified facilitatory pathways are outside the S-R habituation pathway is insufficient to demonstrate dishabituation.

      We respectfully disagree with the concluding sentence here. In all of our experiments, we observe a clear recovery of olfactory avoidance after exposure to the footshock, or yeast odor dishabituators. Moreover, the dishabituators are emulated by (photo)activation of particular neuronal circuits and the recovery of olfactory avoidance is blocked when these circuits are silenced. Regardless of whether this recovery is classified as dishabituation via sensitization or another facilitatory process, the key point is that the habituated response is reliably reinstated contingent upon the dishabituating stimulus. We believe this meets the established criteria for dishabituation.

      (3.4) As behavioral facilitation of a habituated response can be achieved by dishabituating (specific recovery of the S-R pathway) and/or superimposed sensitizing (organism-wide) processes, dishabituation and sensitization of this olfactory response must be first dissociated; however, the study provided no evidence for the dissociation. Without this piece of evidence, the claim of this paper that the newly identified pathways mediate dishabituation is not fully supported.

      We agree with the reviewer that we have not provided specific evidence dissociating dishabituation and sensitization of the particular olfactory response beyond the evidence implicating particular circuitry in the outcome of facilitation of the olfactory response.

      It should be noted that upon photoactivation of the implicated circuitries in naïve flies, we do not observe enhanced octanol avoidance, suggesting that activation of these circuits alone does not induce sensitization. Moreover, our results show that neither footshock nor yeast odor drives an organism-wide sensitization, as silencing specific circuits was sufficient to block dishabituation, something that would not be expected if a global sensitization process were responsible for reinstating the olfactory response.

      Nonetheless, we will also attempt to dissociate sensitization from dishabituation using mutants previously reported deficient in sensitization (Duerr and Quinn, PNAS 1982), assuming these mutants retain normal olfactory habituation. We will also try sensitization protocols in the case of within-modal dishabituation to further clarify the underlying mechanisms. In principle, this includes using diluted octanol as the habituating stimulus and attempting dishabituation with concentrated octanol.

      (3.5) The literature review of this manuscript has some discrepancies. In the introduction, the authors wrote "initial studies in Aplysia were consistent with the "dual-process theory" (Groves and Thompson 1979), where response recovery due to dishabituation appeared to result from sensitization superimposed on habituation, thus driving reversal of the attenuated response (Carew, Castellucci et al. 1971, Hochner, Klein et al. 1986, Marcus, Nolen et al. 1988, Ghirardi, Braha et al. 1992, Cohen, Kaplan et al. 1997, Antonov, Kandel et al. 1999, Hawkins, Cohen et al. 2006)." Hochner 1986 and Marcus 1988 in fact indicated otherwise. Hochner 1986 suggests that dishabituation and sensitization involve different molecular processes, while Marcus 1988 showed that dishabituation and sensitization have different behavioral characteristics. Therefore, the authors' statement is not supported by the cited literature.

      We are grateful to the reviewer for pointing out these significant discrepancies, a consequence of multiple rounds of edits followed by our own oversight. These publications, which are important for this manuscript, will be referenced properly in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      Beyond what is stated in the title of this paper, not much needs to be summarized. eIF2A in HeLa cells promotes translation initiation of neither the main ORFs nor short uORFs under any of the conditions tested. 

      Strengths: 

      Very comprehensive, in fact, given the huge amount of purely negative data, an admirably comprehensive and well-executed analysis of the factor of interest. 

      Weaknesses: 

      The study is limited to the HeLa cell line, focusing primarily on KO of eIF2A and neglecting the opposite scenario, higher eIF2A expression which could potentially result in an increase in non-canonical initiation events. 

      We thank the reviewer for the positive evaluation. As suggested by the reviewer in the detailed recommendations, we will clarify in the title, abstract and text that our conclusions are limited to HeLa cells. Furthermore, as suggested we will test the effect of eIF2A overexpression on the luciferase reporter constructs, and will upload a revised manuscript.

      Reviewer #2 (Public review):

      Summary 

      Roiuk et al describe work in which they have investigated the role of eIF2A in translation initiation in mammals without much success. Thus, the manuscript focuses on negative results. Further, the results, while original, are generally not novel but confirmatory, since related claims have been made before independently in different systems, with the Gaikwad et al study recently published in eLife being the most relevant. 

      Despite this, we find this work highly important. This is because of a massive wealth of unreliable information and speculation regarding eIF2A's role in translation, arising from a series of artifacts that began at the moment of eIF2A discovery. This, in combination with its unfortunate naming (eIF2A is often mixed up with the alpha subunit of eIF2, eIF2S1), has generated widespread confusion among researchers who are not experts in eukaryotic translation initiation. Given this, it is not only justifiable but critical to make independent efforts to clear up this confusion and I very much appreciate the authors' efforts in this regard.  

      Strengths 

      The experimental investigation described in this manuscript is thorough, appropriate and convincing. 

      Weaknesses 

      However, we are not entirely satisfied with the presentation of this work which we think should be improved. 

      We thank the reviewer for the positive evaluation. We will revise the manuscript according to the reviewer's suggestions made in the detailed recommendations.

      Reviewer #3 (Public review):

      Summary: 

      This is a valuable study providing solid evidence that the putative non-canonical initiation factor eIF2A has little or no role in the translation of any expressed mRNAs in cultured human (primarily HeLa) cells. Previous studies have implicated eIF2A in GTP-independent recruitment of initiator tRNA to the small (40S) ribosomal subunit, a function analogous to canonical initiation factor eIF2, and in supporting initiation on mRNAs that do not require scanning to select the AUG codon or that contain near-cognate start codons, especially upstream ORFs with non-AUG start codons, and may use the cognate elongator tRNA for initiation. Moreover, the detected functions for eIF2A were limited to, or enhanced by, stress conditions where canonical eIF2 is phosphorylated and inactivated, suggesting that eIF2A provides a back-up function for eIF2 in such stress conditions. CRISPR gene editing was used to construct two different knockout cell lines that were compared to the parental cell line in a large battery of assays for bulk or gene-specific translation in both unstressed conditions and when cells were treated with inhibitors that induce eIF2 phosphorylation. None of these assays identified any effects of eIF2A KO on translation in unstressed or stressed cells, indicating little or no role for eIF2A as a back-up to eIF2 and in translation initiation at near-cognate start codons, in these cultured cells. 

      The study is very thorough and generally well executed, examining bulk translation by puromycin labeling and polysome analysis and translational efficiencies of all expressed mRNAs by ribosome profiling, with extensive utilization of reporters equipped with the 5'UTRs of many different native transcripts to follow up on the limited number of genes whose transcripts showed significant differences in translational efficiencies (TEs) in the profiling experiments. They also looked for differences in translation of uORFs in the profiling data and examined reporters of uORF-containing mRNAs known to be translationally regulated by their uORFs in response to stress, going so far as to monitor peptide production from a uORF itself. The high precision and reproducibility of the replicate measurements instil strong confidence that the myriad of negative results they obtained reflects the lack of eIF2A function in these cells rather than data that would be too noisy to detect small effects of the eIF2A mutations. They also tested and found no evidence for a recent claim that eIF2A localizes to the cytoplasm in stress and exerts a global inhibition of translation. Given the numerous papers that have been published reporting functions of eIF2A in specific and general translational control, this study is important in providing abundant, high-quality data to the contrary, at least in these cultured cells. 

      Strengths: 

      The paper employed two CRISPR knock-out cell lines and subjected them to a combination of high-quality ribosome profiling experiments, interrogating both main coding sequences and uORFs throughout the translatome, which was complemented by extensive reporter analysis, and cell imaging in cells both unstressed and subjected to conditions of eIF2 phosphorylation, all in an effort to test previous conclusions about eIF2A functioning as an alternative to eIF2. 

      Weaknesses: 

      There is some question about whether their induction of eIF2 phosphorylation using tunicamycin was extensive enough to state forcefully that eIF2A has little or no role in the translatome when eIF2 function is strongly impaired. Also, similar conclusions regarding the minimal role of eIF2A were reached previously for a different human cell line from a study that also enlisted ribosome profiling under conditions of extensive eIF2 phosphorylation; although that study lacked the extensive use of reporters to confirm or refute the identification by ribosome profiling of a small group of mRNAs regulated by eIF2A during stress. 

      We thank the reviewer for the positive evaluation. We will revise the manuscript according to the recommendations made in the detailed recommendations. Regarding the two points mentioned here:

      (1) The reason eIF2alpha phosphorylation does not appear to increase appreciably is that the antibody is unfortunately very poor. That the Integrated Stress Response (ISR) is induced by our treatment can be seen, for instance, from the strong increase in ATF4 protein levels (in the very same samples where eIF2alpha phosphorylation does not increase much, in Suppl. Fig. 5E). We will strengthen the conclusion that the ISR is indeed activated with additional experiments/data as suggested by the reviewer.

      (2) We agree that our results are in line with results from the previous study mentioned by the reviewer, so we will revise the manuscript to mention this other study more extensively in the discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I suggest to state (already in the abstract, but perhaps also even in the title, definitely in the rest of the paper) that this analysis is limited to the HeLa cell line. 

      As suggested, we have now specified in both the title and the abstract that the work is done in HeLa cells.

      (2) In my view, it is a pity that the authors - given the tools are available - did not check the impact of high eIF2A levels on expression of individual mRNAs under normal and stress conditions. I am not suggesting to repeat ribo-seq in this setup, it would be too much to ask for, but re-examining some of the many reporters the authors generated with eIF2A overexpressed may point to some function, e.g. increased number of non-canonical initiation events (non-AUG-initiated)? If anything, the use of HeLa and the primary focus on eIF2A KO neglecting the prospective impact of eIF2A overexpression should be mentioned as two main limitations of this study. 

      We thank the reviewer for the good suggestion to test our synthetic reporters with eIF2A overexpression. New Suppl. Fig. 4G now shows that overexpression of eIF2A does not affect translation of synthetic reporters carrying an ATG start codon in different initiation contexts, or carrying near-cognate start codons, in agreement with a lack of effect on translation which we previously observed with loss of eIF2A.

      (3) Ribo-seq with eIF2A. Did the authors focus on ORFs that are known, or whose isoforms are known, to be non-AUG initiated? Would the loss of eIF2A decrease FPs in their CDSes under at least some conditions?

      We have now assessed the read distribution on the eIF4G2 transcript in both the control and tunicamycin conditions (Author response image 1). In our hands, eIF4G2 is one of the best examples of non-AUG initiation in human cells, since the main coding sequence starts with GTG and the CDS is well translated. Nonetheless, we do not observe any significant changes in read distribution (panels A-B) or overall translation efficiency of eIF4G2 upon eIF2A loss (panels C-D).

      Author response image 1.

      (A-B) Average read occupancy on the eIF4G2 (ENST0000339995) transcript in DMSO-treated (panel A, n=3) or tunicamycin-treated samples (panel B, n=2) derived from either control (black) or eIF2A-KO (red) HeLa cells. Read counts were normalized to sequencing depth and averaged between either 3 (DMSO-treated) or 2 (tunicamycin-treated) replicates. Graphs were then smoothed with a sliding window of 3 nt. (C-D) The total number of reads mapping to the eIF4G2 CDS, normalized to library sequencing depth per replicate, was quantified. No significant difference between control and eIF2A-KO cells was observed in either DMSO-treated (panel C) or tunicamycin-treated (panel D) cells. Significance by unpaired, two-sided t-test. ns = not significant.
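      The processing steps named in this legend (depth normalization, averaging of replicates, smoothing over 3 nt) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the reads-per-million scale, the centered window and the toy counts are all assumptions made for illustration.

```python
from statistics import mean

def normalize_to_depth(counts, library_size, scale=1_000_000):
    """Scale per-nucleotide read counts to reads per `scale` mapped reads."""
    return [c * scale / library_size for c in counts]

def average_replicates(profiles):
    """Average position-wise across replicate profiles of equal length."""
    return [mean(values) for values in zip(*profiles)]

def smooth(profile, window=3):
    """Centered sliding-window mean; edge positions use the part of the window that fits."""
    half = window // 2
    smoothed = []
    for i in range(len(profile)):
        start = max(0, i - half)
        end = min(len(profile), i + half + 1)
        smoothed.append(mean(profile[start:end]))
    return smoothed

# Hypothetical toy data: (per-position counts, library size) for two replicates.
replicates = [
    ([0, 3, 6, 3, 0, 0], 2_000_000),
    ([0, 1, 2, 1, 0, 0], 1_000_000),
]
normalized = [normalize_to_depth(counts, size) for counts, size in replicates]
averaged = average_replicates(normalized)
occupancy = smooth(averaged, window=3)
```

      Normalizing before averaging keeps a deeply sequenced replicate from dominating the mean profile, which matters when the replicates differ in library size.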

      Thank you for giving me the opportunity to review this article.

      Reviewer #2 (Recommendations for the authors):

      While some of our suggestions below may be considered subtle, in our opinion they are important and it would be good if the authors considered them for their revision. We also have a couple of technical suggestions. 

      (1) Abstract. 

      The authors failed to identify the role of eIF2A in translation initiation and have provided compelling evidence that eIF2A is involved neither in recognition of non-AUG codons as start codons nor in recruitment of initiator tRNA during stress conditions, which are the two activities most commonly misattributed to eIF2A. However, they have not exhausted all potential functions of eIF2A (see below). It is also possible that eIF2A has a role not yet suggested by anyone, and it may function in translation initiation in special circumstances that have not been tested yet. The authors indeed discuss this possibility in the Discussion section. Given that there is genetic evidence (that is unaffected by biochemical impurities) linking eIF2A to other initiation factors (5B and 4E), we are not yet convinced that eIF2A does not have any role in translation initiation and therefore we find the last sentence of the abstract premature. We suggest softening this statement into something like this: whether eIF2A has any role in translation remains unknown, it may even have a role in a different aspect of RNA Biology. 

      We agree with the reviewer. We changed the last sentence of the abstract to read as follows:

      “It is possible that eIF2A plays a role in translation regulation in specific conditions that we have not tested here, or that it plays a role in a different aspect of RNA biology.”

      (2) Recently eIF2A has been implicated in ribosomal frameshifting, see Wei et al 2023 DOI: 10.1016/j.celrep.2023.112987 

      Could the authors look into the PEG10 mRNA ribosome profile to see if there are detectable statistically significant changes in footprint density downstream of the frameshift site between WT and eIF2A KOs? It is likely that the coverage will be insufficient to give a definitive answer, but it is worth checking, it would be a pity to miss it. 

      We thank the reviewer for this suggestion. We have now looked at the distribution of ribosome footprints on the PEG10 transcript variant that is expressed in HeLa cells (ENST00000482108) and indeed observe coverage downstream of the annotated stop codon, consistent with a frameshifting event that results in an extended protein isoform being translated. Visual assessment of the read distribution between the main ORF and the "ORF extension" does not show a substantial difference between control and eIF2A knock-out cells (Author response image 2A-B). Additionally, we quantified the ratio of reads mapping to the PEG10 ORF upstream of the slippery site versus those mapping downstream, extending into the predicted longer protein. We did not detect significant changes between control and eIF2A-KO cells in either tested condition (Author response image 2C-D).

      Author response image 2.

      (A-B) Average read occupancy on the PEG10 (ENST00000482108) transcript in DMSO-treated (panel A, n=3) or tunicamycin-treated samples (panel B, n=2) derived from either control (black) or eIF2A-KO (red) HeLa cells is shown. Read counts were normalized to sequencing depth and averaged between either 3 (DMSO-treated) or 2 (tunicamycin-treated) replicates. Graphs were then smoothed with a sliding window of 3 nt. (C-D) The ratio of reads mapping to the ORF upstream of the slippery site to reads mapping to the predicted extended protein downstream of the slippery site is shown. Read counts were normalized to the sequencing depth. Neither DMSO-treated samples (panel C) nor tunicamycin-treated samples (panel D) showed a significant difference between control and eIF2A-KO cells. Significance by unpaired, two-sided t-test. ns = not significant.
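      The quantification in panels C-D can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names and toy numbers are hypothetical, and the legend does not say whether equal variances were assumed, so the pooled-variance (Student's) statistic below is one possible reading. Turning the statistic into a p-value requires the t distribution, for which a statistics library would normally be used.

```python
from math import sqrt
from statistics import mean, variance

def frameshift_ratio(counts, slippery_site):
    """Reads upstream of the slippery site divided by reads downstream of it."""
    upstream = sum(counts[:slippery_site])
    downstream = sum(counts[slippery_site:])
    if downstream == 0:
        raise ValueError("no reads downstream of the slippery site")
    return upstream / downstream

def student_t(a, b):
    """Unpaired Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical per-position counts for one sample; the slippery site is at index 3.
example_ratio = frameshift_ratio([10, 10, 20, 5, 5], slippery_site=3)

# Hypothetical per-replicate ratios for control and eIF2A-KO samples.
control_ratios = [4.0, 5.0, 6.0]
ko_ratios = [4.5, 5.0, 5.5]
t_stat, dof = student_t(control_ratios, ko_ratios)
```

      Because numerator and denominator come from the same library, the ratio itself does not depend on sequencing depth; depth normalization matters only for the absolute counts.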

      (3) Introduction 

      Given the volume of unreliable claims regarding eIF2A in the literature and the overall confusion, it is very difficult (may even be impossible) to write a clear, coherent introduction to the topic. Nonetheless, there are a few points that need to be taken into account. 

      The authors state that eIF2A is capable of recruiting initiator tRNA, citing Zoll et al 2002. This activity was later shown to be a biochemical artefact (which was most likely reproduced by Kim et al 2018): the eIF2A fraction was contaminated with eIF2D, which does bind tRNAs in a GTP-independent manner. eIF2A purified from RRL separates from initiator tRNA binding activity, see Dmitriev et al 2010 DOI: 10.1074/jbc.M110.119693. This point is also relevant to the second paragraph of the Discussion; it should be acknowledged that it has been shown previously that eIF2A does not bind the initiator tRNA.

      We appreciate the advice provided by the reviewer. We have modified both the introduction and the 2nd paragraph of the discussion to reflect that the tRNA-binding activity is due to contaminating eIF2D rather than eIF2A.

      In many cases the authors describe certain claims as facts even though they refute them themselves. For example: 

      "Such eIF2A-driven non-AUG initiation events were shown to play a crucial role in different aspects of cell physiology and disease progression: cellular adaptation during the integrated stress response (Chen et al., 2019; Starck et al., 2016)". While non-AUG initiation events do play crucial roles in different aspects of cell physiology (reviewed in Andreev et al 2023 doi: 10.1186/s13059-022-02674-2), eIF2A has nothing to do with it, as the authors show themselves. Therefore different language should be used, e.g. "eIF2A has been suggested (or proposed or reported) to be responsible for non-AUG initiation events that were shown to play ..." 

      The word "shown" is used in many other instances for the claims that the authors refute. "Shown" is only appropriate for strong evidence that leaves little doubt. 

      We agree with the reviewer and made the suggested changes in the text.

      (4) Supplementary Fig. 1. 

      Panel C is used to argue that eIF2A has a higher concentration in the cytoplasm than in the nucleus; perhaps it is worth explaining how this conclusion was drawn. If levels in the cytoplasm are comparable to GAPDH and Tubulin but less than c-Myc in the nucleus, does it really mean that there is less eIF2A in the nucleus than in the cytoplasm? This is not obvious to us. Also, presumably WCL stands for Whole Cell Lysate; it would be nice to introduce this abbreviation somewhere. 

      To compare levels of eIF2A in the nuclear and cytosolic fractions, we lysed the two fractions in equal volumes of buffer (i.e. the cytosolic fraction was extracted in 200 µl of hypotonic buffer, and the nuclear fraction was extracted in 200 µl of cell extraction buffer). This assures that per microliter of lysate we have the same number of "cytosols" or nuclei. Hence, equal intensity bands in the cytosolic and nuclear fractions would mean that half of the protein is in the nucleus and half is in the cytosol. We originally described this in the Methods section, but now also mention it in the Results and in the figure legend.
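      The arithmetic behind this argument can be written out in a short sketch. It assumes that band intensity is proportional to the amount of protein loaded and that equal lysate volumes are loaded per lane; the function name and the example intensities are hypothetical.

```python
def nuclear_fraction(cyto_band, nuc_band, cyto_volume_ul=200.0, nuc_volume_ul=200.0):
    """Fraction of the total protein found in the nucleus.

    Band intensities are treated as protein per microliter of lysate, so the
    total per fraction is the intensity multiplied by the extraction volume.
    """
    cyto_total = cyto_band * cyto_volume_ul
    nuc_total = nuc_band * nuc_volume_ul
    return nuc_total / (cyto_total + nuc_total)

# Equal bands from equal extraction volumes: half of the protein is nuclear.
equal_bands = nuclear_fraction(1.0, 1.0)

# A cytosolic band nine times stronger than the nuclear band: 10% is nuclear.
mostly_cytosolic = nuclear_fraction(9.0, 1.0)
```

      With equal extraction volumes the volumes cancel, and the ratio of band intensities directly gives the ratio of protein amounts in the two compartments.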

      We replaced WCL with "whole cell" in the figure. 

      (5) The differential translation analysis is described very briefly: "To obtain values of translation efficiency, log2 fold changes, and adjusted p values the DESeq2 software package was used". Was TE calculated based on ribosome footprint to RNA-seq ratios? How exactly was DESeq2 used here? TE measured in this way spuriously correlates with RNA-seq values, see Larsson et al 2010 DOI: 10.1073/pnas.1006821107; perhaps it would be worth assessing differential translation with anota2seq (Oertlin et al 2019 doi: 10.1093/nar/gkz223)? Anota2seq avoids calculating the ratios and enables comprehensive analysis of differential translation, including detection of buffered translation, which might be the case here, while avoiding artefacts that may arise from varying RNA levels.  

      We have now specified in more detail in the Methods section how we analyzed the data. Indeed, DESeq2 was used on translation efficiency values, which we calculated as the ratio of ribosome footprints to RNA-seq reads. 
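      For readers unfamiliar with the quantity, a ratio-based translation efficiency can be sketched as below. This only illustrates the ratio itself, under assumptions that are not taken from the manuscript: counts-per-million normalization, a pseudocount of 1, and hypothetical gene names and counts. The DESeq2 step is described in the authors' Methods and is not reproduced here.

```python
from math import log2

def translation_efficiency(rpf_counts, rna_counts, rpf_depth, rna_depth, pseudocount=1.0):
    """log2 TE per gene: depth-normalized footprints over depth-normalized RNA-seq reads."""
    te = {}
    for gene, rpf in rpf_counts.items():
        rpf_cpm = (rpf + pseudocount) * 1e6 / rpf_depth
        rna_cpm = (rna_counts[gene] + pseudocount) * 1e6 / rna_depth
        te[gene] = log2(rpf_cpm / rna_cpm)
    return te

# Hypothetical counts; both libraries have one million mapped reads.
rpf = {"GENE_A": 399, "GENE_B": 99}
rna = {"GENE_A": 99, "GENE_B": 99}
log2_te = translation_efficiency(rpf, rna, rpf_depth=1_000_000, rna_depth=1_000_000)
```

      Since the RNA-seq count sits in the denominator, noise in RNA abundance propagates into TE. This is the concern the reviewer raises about ratio-based analyses and the reason a ratio-free method such as anota2seq is a useful cross-check.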

      As suggested, we have now also performed the analysis using anota2seq (Suppl. Fig. 3C) and this analysis identified zero transcripts that are translationally regulated, in agreement with our analysis.

      (6) Section "eIF2a-inactivating stresses do not redirect tRNA delivery function to eIF2A." 

      The description of the ISR mechanism is a bit inaccurate. Strictly speaking, eIF2alpha phosphorylation does not inactivate eIF2alpha. It results in formation of a very stable eIF2*GDP*eIF2B complex, thus severely depleting eIF2B, which serves as a GEF for eIF2. This in turn reduces the ternary complex (eIF2*GTP*tRNAi) concentration since there is no free eIF2B to exchange GDP for GTP. Without getting into much detail, we think it would be more accurate to say that eIF2alpha phosphorylation leads to ternary complex depletion instead of saying that stress inactivates eIF2alpha. 

      We agree with the reviewer - we were trying to use simple, compact wording. We have now reworded the section title to "No detectable role for eIF2A in translation when eIF2 is inhibited" and rephrased the subsequent text to be correct.

      Also, the subtitle uses eIF2a with a small a that stands for alpha, which could lead to substantial confusion since in this case the difference between eIF2alpha and eIF2A is only in the capitalisation of the last letter; many text-mining engines such as modern LLMs may not be able to pick up the difference. Perhaps it would be better to refer to eIF2alpha by the HGNC approved name of its gene, eIF2S1, to avoid further confusion. For clarity it may be stated at the beginning that eIF2S1 is commonly known as eIF2alpha. 

      We thank the reviewer for this point. We have removed all instances of eIF2a (with lowercase a) from the manuscript to avoid this source of confusion. At the first mention of eIF2alpha we also added the official HGNC gene name. However, we prefer to use eIF2alpha instead of eIF2S1 because people outside the translation field tend to know the subunit as eIF2alpha, and we think it is important that people outside the translation field also read this manuscript, since some of the questionable papers on eIF2A come from labs working at the interface between translation and other fields.

      Minor 

      Introduction 

      (7) "uses the CAT anticodon" change CAT to CAU 

      We corrected CAT to CAU.

      (8) "In the canonical initiation pathway", change "canonical" to "most common", canonical is somewhat a judgemental statement that originates in theology. Same applies to numerous occurrences of "canonical AUG", simply using "AUG" would be simpler and more accurate as you will avoid giving impression that there are "non-canonical AUGs".  

      Done.

      (9) "eIF2A was initially considered to be a functional analogue of prokaryotic IF2 (Merrick and Anderson, 1975), however later this role was reassigned to the above-mentioned heterotrimeric factor eIF2 (α,β,γ) (Levin et al., 1973)." - there is a chronological contradiction within this sentence: the initial consideration is attributed to 1975 while its later reassignment is attributed to 1973. 

      We are grateful to the reviewer for spotting this mistake. There was a citation problem; we fixed it and now cite the correct paper for the initial discovery of eIF2A (Shafritz et al 1970; PMID 5472357).

      (10) "On the other hand, studies on the role of eIF2A on viral IRES translation have arrived at conflicting results." Remove "On the other hand" since conflicting results have been mentioned above. In fact the entire sentence is somewhat redundant given the prior "For example, eIF2A has been studied in the context of internal ribosome entry sites (IRES), where it was found to act both as a suppressor and an activator of IRES-mediated initiation."  

      We have rewritten the paragraph to make it more coherent.

      (11) Fig. 1C-D uses the abbreviation CHX for cycloheximide; this needs to be mentioned in the legend or elsewhere in the text. Otherwise CHX may not be clear to a reader uninitiated in ribosome profiling. 

      We now mention in the figure legend that CHX stands for cycloheximide and indicate that it was used as a negative control to block translation. 

      (12) Page 7, section "Ribosome profiling reveals a few eIF2A-dependent transcripts" 

      In this section you describe ribosome profiling experiments and identify a few transcripts whose translation seems to change based on ribosome profiling data. Then you attempt to verify them using gene expression reporters and reasonably suggest that these are false positives. In essence this section argues that there are no eIF2A-dependent transcripts; the title of this subsection is therefore misleading, and it makes sense to rename it so that it better reflects the content of the section. 

      We agree and have renamed the section to "Ribosome profiling identifies no eIF2A-dependent transcripts".

      (13) Page 8, top. Rephrase "To do this, we performed ribosome profiling on control and eIF2A-KO cells, which sequences the mRNA footprints protected by ribosomes."  

      Fixed.

      (14) Page 10, bottom. "Several studies have reported that eIF2A can delivery alternative initiator tRNAs to uORFs with near-cognate start codons". Change "delivery" to "deliver". 

      Thanks for spotting it. We corrected it to “deliver”.

      (15) Page 13 "This suggests that, as in non-stressed conditions, eIF2A has a minimal effect on global translation also when eIF2a activity is low." - rephrase to avoid impression that eIF2alpha activity is low in normal conditions, also please see comment #6 above. 

      We fixed this sentence to read: “This suggests that, as in non-stressed conditions, eIF2A has a minimal effect on global translation also when the integrated stress response is active.”

      Reviewer #3 (Recommendations for the authors):

      - The experimental data in Fig. S5E do not support the claim of increased eIF2 phosphorylation on TM treatment; although, comparing Fig. S5A with Fig. 1B supports a marked reduction in bulk translation and the reporter data in Fig. 4A show the expected induction of the uORF-containing reporters by TM. Because these are the conditions employed for ribosome profiling in stress conditions shown in Fig. 4B, it would be reassuring to document TM-induced translational efficiencies of ATF4 and the other known mRNAs resistant to eIF2 phosphorylation in the ribosome profiling data, including gene browser images of the replicate experiments. If the induction of TEs by TM for such mRNAs was not robust, it would be valuable to repeat the analysis using arsenite (SA) treatment, which produces a greater inhibition of bulk translation. 

      Unfortunately, the eIF2alpha antibody is not very good and also detects the non-phosphorylated protein, causing high background and poor apparent induction in response to tunicamycin. That the ISR was activated is visible from the induction of ATF4, which was assessed by western blot in Suppl. Fig. 5E. To ensure that our ribosome profiling libraries also recorded the activation of the ISR, we built single-gene plots for ATF4 in both control and eIF2A-KO HeLa cells. As shown in Author response image 3A-B, tunicamycin treatment led to the induction of ATF4 in both cell lines. This can also be seen from the 4-fold induction in ATF4 translation efficiency in response to tunicamycin in both WT and eIF2A-KO cells (Author response image 3C). Additionally, we checked that another marker induced by tunicamycin, HSPA5, is also translationally upregulated in both cell lines, as is the downstream target of ATF4, PPP1R15B (Author response image 3C). 

      Author response image 3.

      (A-B) Average read occupancy on the ATF4 (ENST00000674920) transcript in DMSO-treated (n=3) or tunicamycin-treated samples (n=2) derived from either control (panel A) or eIF2A-KO (panel B) HeLa cells is shown. Read counts were normalized to sequencing depth and averaged between either 3 (DMSO-treated) or 2 (tunicamycin-treated) replicates. Graphs were then smoothed with a sliding window of 3 nt. (C) Scatter plot of log2(fold change) of Translation Efficiency TM/DMSO for control cells on the x-axis versus eIF2A-KO cells on the y-axis. The induction of ATF4 as well as of its downstream target PPP1R15B is shown. The upregulation of HSPA5 translation, another hallmark of ER stress induced by tunicamycin treatment, is also shown.
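      The axes of panel C can be illustrated with a small sketch that converts per-gene translation efficiencies into log2 fold changes (tunicamycin over DMSO) for each genotype. The function name, the placeholder gene GENE_X and all TE values are hypothetical; the ATF4 numbers are chosen only so that they reproduce the 4-fold induction stated in the response.

```python
from math import log2

def te_log2_fold_change(te_treated, te_untreated):
    """log2(TE treated / TE untreated) for genes present in both conditions."""
    return {
        gene: log2(te_treated[gene] / te_untreated[gene])
        for gene in te_treated
        if gene in te_untreated
    }

# Hypothetical linear-scale TE values, not taken from the study's data.
control_dmso = {"ATF4": 0.5, "GENE_X": 1.0}
control_tm = {"ATF4": 2.0, "GENE_X": 1.0}
ko_dmso = {"ATF4": 0.4, "GENE_X": 1.2}
ko_tm = {"ATF4": 1.6, "GENE_X": 1.2}

x = te_log2_fold_change(control_tm, control_dmso)  # x-axis: control cells
y = te_log2_fold_change(ko_tm, ko_dmso)            # y-axis: eIF2A-KO cells
points = {gene: (x[gene], y[gene]) for gene in x if gene in y}
```

      Genes whose response to tunicamycin is independent of eIF2A fall on the diagonal of such a plot, which is where a 4-fold induction in both genotypes places ATF4 (log2 fold change of 2 on both axes).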

      - It should be pointed out in the text that in both published studies being cited here of cells lacking eIF2A, that by Gaikwad et al. on a yeast eIF2A deletion mutant, and that by Ichihara et al. on human HEK293 CRISPR KO cells, the analyses included stress conditions in which eIF2 phosphorylation is induced (amino acid starvation or SA treatment, respectively), as was conducted here.  

      Good point - we added this information to the introduction: 

      "Furthermore, loss of eIF2A in several systems did not recapitulate these effects on non-AUG initiation in either non-stressed or stress conditions (caused either by amino acid depletion or sodium arsenite treatment) (Gaikwad et al., 2024; Ichihara et al., 2021)."

      - The Ichihara et al. (2021) study just mentioned reached some of the same conclusions for HEK cells obtained here by conducting ribosome profiling in untreated and SA-treated cells, finding only 1 mRNA (untreated) or four mRNAs (SA-treated cells) that showed significantly reduced TEs in the eIF2A knockout vs. parental cells. It seems appropriate for the authors to expand their treatment of this prior work by summarizing its findings in some detail and also noting how their study goes beyond this previous one. 

      We have added a paragraph to the discussion pointing out that our data agree fully with Ichihara et al. (2021), and that Ichihara et al. (2021) also found only very few mRNAs that change in TE upon loss of eIF2A in either non-stressed or stressed conditions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This manuscript describes the role of PRDM16 in modulating BMP response during choroid plexus (ChP) development. The authors combine PRDM16 knockout mice and cultured PRDM16 KO primary neural stem cells (NSCs) to determine the interactions between BMP signaling and PRDM16 in ChP differentiation.

      They show PRDM16 KO affects ChP development in vivo and BMP4 response in vitro. They determine genes regulated by BMP and PRDM16 by ChIP-seq or CUT&TAG for PRDM16, pSMAD1/5/8, and SMAD4. They then measure gene activity in primary NSCs through H3K4me3 and find more genes are co-repressed than co-activated by BMP signaling and PRDM16. They focus on the 31 genes found to be co-repressed by BMP and PRDM16. Wnt7b is in this set and the authors then provide evidence that PRDM16 and BMP signaling together repress Wnt activity in the developing choroid plexus.

      Strengths:

      Understanding context-dependent responses to cell signals during development is an important problem. The authors use a powerful combination of in vivo and in vitro systems to dissect how PRDM16 may modulate BMP response in early brain development.

      We thank the reviewer for the thoughtful summary and positive feedback. We appreciate the recognition of our integrative in vivo and in vitro approach. We're glad the reviewer found our findings on context-dependent gene regulation and developmental signaling valuable.

      Main weaknesses of the experimental setup:

      (1) Because the authors state that primary NSCs cultured in vitro lose endogenous Prdm16 expression, they drive expression by a constitutive promoter. However, this means the expression levels are very different from endogenous levels (as explicitly shown in Supplementary Figure 2B) and the effect of many transcription factors is strongly dose-dependent, likely creating differences between the PRDM16-dependent transcriptional response in the in vitro system and in vivo.

      We acknowledge that our in vitro experiments may not fully replicate the in vivo situation, a common limitation of such experiments. Our primary aim was to explore the molecular relationship between PRDM16 and BMP signaling in gene regulation, and such molecular investigations are challenging to conduct in in vivo tissues. NSCs treated with BMP4 in vitro have been used as a model to investigate NSC proliferation and quiescence in previous studies (e.g., Helena Mira, 2010; Marlen Knobloch, 2017). Crucially, to ensure the relevance of our in vitro findings to the in vivo context, we confirmed that cultured cells could indeed be induced into quiescence by BMP4, and that this induction required PRDM16. Furthermore, after identifying target genes co-regulated by PRDM16 and SMADs, we validated PRDM16's regulatory role on a subset of these genes in the developing choroid plexus (ChP) (Fig. 7 and Supplementary Figs. 7-8). Only by combining evidence from both in vitro and in vivo experiments could we confidently conclude that PRDM16 serves as an essential co-factor for BMP signaling in restricting NSC proliferation.

      (2) It seems that the authors compare Prdm16_KO cells to Prdm16 WT cells overexpressing flag_Prdm16. Aside from the possible expression of endogenous Prdm16, other cell differences may have arisen between these cell lines. A properly controlled experiment would compare Prdm16_KO ctrl (possibly infected with a control vector without Prdm16) to Prdm16_KO_E (i.e. the Prdm16_KO cells with and without Prdm16 overexpression.)

      We agree that Prdm16 KO cells carrying the Prdm16-expressing vector would be a good comparison for KO cells carrying the control vector. However, despite more than 10 attempts under various optimization conditions, we were unable to establish a viable cell line after infecting Prdm16 KO cells with the Prdm16-expressing vector. The overall survival rate of primary NSCs after viral infection is low, and we observed that KO cells were particularly sensitive to infection when the viral vector was large (the Prdm16 ORF is more than 3 kb).

      As an alternative way to assess vector effects, we instead included two other control cell lines, wt and KO cells infected with the 3xNLS_Flag-tag viral vector, and presented the results in Supplementary Fig. 2. When we compared the responses of the four lines (wt, KO, wt infected with the Flag vector, and KO infected with the Flag vector) to the addition and removal of BMP4, we confirmed that the viral infection itself has no significant impact on the responses of these cells to these treatments in terms of cell proliferation and Ttr induction.

      Given that wt cells and KO cells, with or without viral backbone infection, behave quite similarly in terms of cell proliferation, we speculate that even if we had succeeded in obtaining KO cells carrying the Prdm16-expressing vector, they might not differ substantially from wt cells infected with the Prdm16-expressing vector.

      Other experimental weaknesses that make the evidence less convincing:

      (1) The authors show in Figure 2E that Ttr is not upregulated by BMP4 in PRDM16_KO NSCs. Does this appear inconsistent with the presence of Ttr expression in the PRDM16_KO brain in Figure1C?

      The reviewer’s point is that there was no significant increase in Ttr expression in Prdm16_KO cells after BMP4 treatment (Fig. 2E), yet residual Ttr mRNA signal remained in the Prdm16 mutant ChP (Fig. 1C). We think the difference lies in the measurable level of Ttr expression induced by BMP4 in NSC culture versus that in the ChP. This is based on our immunostaining experiment in which we tried to detect Ttr using a Ttr antibody. This antibody could not detect the Ttr protein in BMP4-treated Prdm16_expressing NSCs but clearly showed Ttr signal in the wt ChP. This means that although Ttr expression can be significantly increased by BMP4 in vitro to a level measurable by RT-qPCR, its absolute quantity, even in the Prdm16_expressing condition, is much lower than that in vivo. Our results in Fig. 1C and Fig. 2E, as well as Fig. 7B, all consistently showed that Prdm16 depletion significantly reduced Ttr expression both in vitro and in vivo.

      (2) Figure 3: The authors use H3K4me3 to measure gene activity. This is however, very indirect, with bulk RNA-seq providing the most direct readout and polymerase binding (ChIP-seq) another more direct readout. Transcription can be regulated without expected changes in histone methylation, see e.g. papers from Josh Brickman. They verify their H3K4me3 predictions with qPCR for a select number of genes, all related to the kinetochore, but it is not clear why these genes were picked, and one could worry whether these are representative.

      H3K4me3 has been widely used as an indicator of active transcription and is a mark for cell identity genes. It has also been demonstrated that H3K4me3 has a direct function in regulating transcription at the step of RNA Pol II pause release. As stated in the text, there are advantages and disadvantages to using H3K4me3 compared with RNA-seq. RNA-seq profiles all gene products, whose levels are affected by transcription as well as by RNA stability and turnover. In contrast, H3K4me3 levels at gene promoters reflect transcriptional activity. In our case, we aimed to identify differential gene expression between the proliferation and quiescence states. The transition between these two states is fast and dynamic, so RNA-seq may fail to identify functionally relevant genes and is more likely to produce false positive and false negative results. Therefore, we chose H3K4me3 profiling.

      We agree that transcription may change without histone methylation changes. This may cause an under-estimation of the number of changed genes between the conditions. 

      We validated 7 out of 31 genes (Wnt7b, Id3, Mybl2, Spc24, Spc25, Ndc80 and Nuf2). We chose these genes based on two criteria: 1) their function is implicated in cell proliferation and cell-cycle regulation based on gene ontology analysis; 2) their gene products are detectable in the developing ChP based on the scRNA-seq data. Three of these genes (Wnt7b, Id3, Mybl2) are not related to the kinetochore. We now clarify this description in the revised text.

      (3) Line 256: The overlap of 31 genes between 184 BMP-repressed genes and 240 PRDM16-repressed genes seems quite small.

      This result indicates that, in addition to co-repressing cell-cycle genes, BMP and PRDM16 have independent functions. For example, it was reported that BMP regulates neuronal and astrocyte differentiation (Katada, S. 2021), while our previous work demonstrated that Prdm16 controls the temporal identity of NSCs (He, L. 2021).

      (4) The Wnt7b H3K4me3 track in Fig. 3G is not discussed in the text but it shows H3K4me3 high in _KO and low in _E regardless of BMP4. This seems to contradict the heatmap of H3K4me3 in Figure 3E which shows H3K4me3 high in _E no BMP4 and low in _E BMP4 while omitting _KO no BMP4. Meanwhile CDKN1A, the other gene shown in 3G, is missing from 3E.

      The track in Fig. 3G shows the absolute H3K4me3 signal after mapping the sequencing reads to the genome and normalizing them to library size. When the signal in Prdm16_E cells with BMP4 is compared with that in Prdm16_E cells without BMP4, the BMP4 condition has the lower peak. The same trend can be seen for the pair of Prdm16_KO cells with or without BMP4. The heatmap in Fig. 3E shows the relative level of H3K4me3 in three conditions. The Prdm16_E cells with BMP4 have the lowest level, while the other two conditions (Prdm16_KO with BMP4 and Prdm16_E without BMP4) display higher levels. These two graphs show a consistent trend of H3K4me3 changes at the Wnt7b promoter across these conditions. Figure 3E only includes genes that are co-repressed by PRDM16 and BMP. CDKN1A’s H3K4me3 signals are consistent between the conditions, and thus it is not a PRDM16- or BMP-regulated gene; we use it as a negative control.

      (5) The authors use PRDM16 CUT&TAG on dissected dorsal midline tissues to determine if their 31 identified PRDM16-BMP4 co-repressed genes are regulated directly by PRDM16 in vivo. By manual inspection, they find that "most" of these show a PRDM16 peak. How many is most? If using the same parameters for determining peaks, how many genes in an appropriately chosen negative control set of genes would show peaks? Can the authors rigorously establish the statistical significance of this observation? And why wasn't the same experiment performed on the NSCs in which the other experiments are done so one can directly compare the results? Instead, as far as I could tell, there is only ChIP-qPCR for two genes in NSCs in Supplementary Figure 4D.

      In our text, we indicated the genes containing PRDM16 binding peaks in the figures and described them as “Text in black in Fig. 6A and Supplementary Fig. 5A”. We will add the precise number “25 of these genes” in the main text to clarify this. As a negative control set, we used the 153 BMP-only-repressed genes (184 - 31 = 153, i.e., excluding the PRDM16-BMP4 co-repressed genes). By computationally determining the nearest TSS to each PRDM16 peak, we identified PRDM16 peaks at 24/31 co-repressed genes and 84/153 BMP-only-repressed genes in the E12.5 ChP data. Fisher’s exact test comparing these proportions yields a P-value of 0.015.
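
      For readers who wish to reproduce this comparison, the 2x2 table implied by the counts above (24 of 31 co-repressed genes versus 84 of 153 BMP-only-repressed genes with a PRDM16 peak) can be tested with a short standard-library sketch. The function name is hypothetical, and because the response does not state whether the reported value is one- or two-sided, both are computed; the exact figure may also differ slightly between software implementations.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the table [[a, b], [c, d]].

    Returns (odds_ratio, p_greater, p_two_sided), with row and column
    totals held fixed (hypergeometric distribution).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of a table whose top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # One-sided: tables with at least as many counts in the top-left cell
    p_greater = sum(prob(x) for x in range(a, hi + 1))
    # Two-sided: all tables no more probable than the observed one
    p_two_sided = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_greater, 1.0), min(p_two_sided, 1.0)


# Rows: co-repressed (31) and BMP-only-repressed (153) genes
# Columns: with a PRDM16 peak, without a PRDM16 peak
odds_ratio, p_greater, p_two_sided = fisher_exact_2x2(24, 31 - 24, 84, 153 - 84)
print(f"odds ratio = {odds_ratio:.2f}")
print(f"one-sided P = {p_greater:.4f}, two-sided P = {p_two_sided:.4f}")
```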

      We are unsure how to interpret the second part of the comment: “And why wasn't the same experiment performed on the NSCs in which the other experiments are done so one can directly compare the results? Instead, as far as I could tell, there is only ChIP-qPCR for two genes in NSCs in Supplementary Figure 4D.” If the reviewer is asking why we did not sequence the material from sequential ChIP or validate more target genes, the reason is the limited amount of material. Sequential ChIP requires a large quantity of antibody and yields little material, barely sufficient for a few qPCR reactions after the second round of IP. This amount was far below the minimum required for library construction. The PRDM16 antibody was a gift, and the quantity we had was very limited. We made considerable efforts to optimize all available commercial antibodies for ChIP and CUT&Tag, but none of them worked in these assays.

      (6) In comparing RNA in situ between WT and PRDM16 KO in Figure 7, the authors state they use the Wnt2b signal to identify the border between CH and neocortex. However, the Wnt2b signal is shown in grey and it is impossible for this reviewer to see clear Wnt2b expression or where the boundaries are in Figure 7A. The authors also do not show where they placed the boundaries in their analysis. Furthermore, Figure 7B only shows insets for one of the regions being compared making it difficult to see differences from the other region. Finally, the authors do not show an example of their spot segmentation to judge whether their spot counting is reliable. Overall, this makes it difficult to judge whether the quantification in Figure 7C can be trusted.

      In the revised manuscript, we have included an individual channel for Wnt2b and marked the boundaries. We also provide full-view images and examples of spot segmentation in the new Supplementary Figure 8.

      (7) The correlation between mKi67 and Axin2 in Figure 7 is interesting but does not convincingly show that Wnt downstream of PRDM16 and BMP is responsible for the increased proliferation in PRDM16 mutants.

      We agree that this result (the correlation between mKi67 and Axin2) alone only suggests that Wnt signaling is related to the proliferation defect in the Prdm16 mutant, and does not necessarily mean that Wnt is downstream of PRDM16 and BMP. Our conclusion is backed up by two additional lines of evidence: the CUT&Tag data, in which PRDM16 binds to regulatory regions of Wnt7b and Wnt3a, and the finding that BMP and PRDM16 co-repress Wnt7b in vitro.

      An ideal experiment would show that down-regulating Wnt signaling in the Prdm16 mutant rescues the mutant phenotype. Such an experiment is technically challenging. Wnt plays diverse and essential roles in NSC regulation, and one would need a cell-type- and stage-specific tool to down-regulate Wnt in the Prdm16 mutant background. Moreover, Wnt genes are not the only targets regulated by PRDM16 in these cells, and down-regulating Wnt may not be sufficient to rescue the phenotype.

      Weaknesses of the presentation:

      Overall, the manuscript is not easy to read. This can cause confusion.

      We have revised the text to improve clarity.

      Reviewer #1 (Recommendations for the authors):

      (1) Overall, the manuscript is not easy to read. Here are some causes of confusion for which the presentation could be cleaned up:

      We are grateful for the reviewer’s suggestion. In the revised manuscript, we have made efforts to improve the clarity of the text.

      (a) Part of the first section is confusing in that some statements seem contradictory, in particular:

      "there is no overall patterning defect of ChP and CH in the Prdm16 mutant" (line 125)

      "Prdm16 depletion disrupted the transition from neural progenitors into ChP epithelia" (line 144)

      It would be helpful if the authors could reformulate this more clearly.

      We modified the text to clarify that while the BMP-patterned domain is not affected, the transition of NSCs into ChP epithelial cells is compromised in the Prdm16 mutant.

      (b) Flag_PRDM16, PRDM16_expressing, PRDM16_E, PRDM16 OE all seem to refer to the same PRDM16 overexpressing cells, which is very confusing. The authors should use consistent naming. Moreover, it would be good if they renamed these all to PRDM16_OE to indicate expression is not endogenous but driven by a constitutive promoter.

      We appreciate the comment and agree that the use of multiple terms to refer to the same PRDM16-overexpressing condition was confusing. Our original intention in using Prdm16_E was to distinguish cells expressing PRDM16 from the two other groups: wild-type cells and Prdm16_KO cells, which both lack PRDM16 protein expression. However, we acknowledge that Prdm16_E could be misinterpreted as indicating expression from the endogenous Prdm16 promoter. To avoid this confusion and ensure consistency, we have now standardized the terminology and refer to this condition as Prdm16_OE, indicating Flag-tagged PRDM16 expression driven by a constitutive promoter.

      (c) Line 179 states "generated a cell line by infecting Prdm16_KO cells with the same viral vector, expressing 3xNSL_Flag". Do the authors mean 3xNLS_Flag_Prdm16, so these are the Prdm16_KO_E cells by the notation suggested above? Or is this a control vector with Flag only? The following paragraph refers to Supplementary Figure 2C-F where the same construct is called KO_CDH, suggesting this was an empty CDH vector, without Flag, or Prdm16. This is confusing.

      We appreciate the reviewer’s careful reading and helpful comment. We acknowledge the confusion caused by the inconsistent terminology. To clarify: in line 179, we intended to describe an attempt to generate a Prdm16_KO cell line expressing 3xNLS_Flag_Prdm16, not a control vector with Flag only. However, despite repeated attempts, we were unable to establish this line due to low viral efficiency and the vulnerability of Prdm16_KO cells to infection with the large construct. Therefore, these cells were not included in the subsequent analyses.

      The term KO_CDH refers to Prdm16_KO cells infected with the empty CDH control vector, which lacks both Flag and Prdm16. This is the line used in the experiments shown in Supplementary Fig. 2C–F. We have revised the text throughout the manuscript to ensure consistent use of terminology and to avoid this confusion.

      (2) The introductory statements on lines 53-54 could use more references.

      Thanks for the suggestion. We have now included more references.

      (3) It would be helpful if all structures described in the introduction and first section were annotated in Figure 1, or otherwise, if a cartoon were included. For example, the cortical hem, and fourth ventricle.

      Thanks for the suggestion. We have now indicated the structures, ChP, CH and the fourth ventricle, in the images in Figure 1 and Supplementary Figure 1.

      (4) In line 115, "as previously shown.." - to keep the paper self-contained a figure illustrating the genetics of the KO allele would be helpful.

      Thanks for the suggestion. We have now included an illustration of the Prdm16 cGT allele in Figure 1B.

      (5) In Figure 1D a co-stain for a ChP marker would be helpful because it is hard to identify morphologically in the Prdm16 KO.

      We apologize for the lack of clarity. The KO allele contains a β-geo reporter driven by the endogenous Prdm16 promoter. The samples were co-stained for EdU, β-Gal and DAPI. To distinguish the ChP domain from the CH, we used the presence of β-Gal as a marker. We indicated this in the figure legend, and have now also clarified it in the revised text.

      (6) The details in Figure 1E are hard to see, a zoomed-in inset would help.

      A zoomed-in inset is now included in the figure.

      (7) Supplementary Figure 2A does not convincingly show that PRDM16 protein is undetectable since endogenous expression may be very low compared to the overexpression PRDM16_E cells so if the contrast is scaled together it could appear black like the KO.

      We appreciate the reviewer’s point and have carefully considered this concern. We concluded that PRDM16 protein is effectively undetectable in cultured wild-type NSCs based on direct comparison with brain tissue. Both cultured NSCs and brain sections were processed under similar immunostaining and imaging conditions. While PRDM16 showed robust and specific nuclear localization in embryonic brain sections (Fig. 1B and Supplementary Fig. 1A), only a small subset of cultured NSCs exhibited PRDM16 signal, primarily in the cytoplasm (middle panel of Fig. 2A). This stark contrast supports our conclusion that endogenous PRDM16 protein is either absent or significantly downregulated in vitro. Because of this limitation, we turned to over-expressing Prdm16 in NSC culture using a constitutive promoter. 

      (9) Line 182 "Following the washout step" - no such step had been described, maybe replace by "After washout of BMP".

      Yes, we have revised the text.

      (8) Line 214: "indicating a modest level" - what defines modest? Compared to what? Why is a few thousand moderate rather than low? Does it go to zero with inhibitors for pathways?

      Here, a modest level means a level lower than that after adding BMP4. To clarify this, we revised the description to “indicating endogenous levels of …”

      (9) The way qPCR data are displayed makes it difficult to appreciate the magnitude of changes, e.g. in Supplementary Figure 2B where a gap is introduced on the scale. Displaying log fold change / relative CT values would be more informative.

      We used a segmented Y-axis in Supplementary Figure 2B because the Prdm16 overexpression samples exhibited much higher expression levels than the other conditions. In response to this suggestion, we explored alternative ways to present the result, including plotting log-transformed values and log fold changes. However, these methods did not enhance the clarity of the differences; in fact, log scaling made the magnitude of change appear less apparent. To address this, we now present the overexpression samples in a separate graph, thereby eliminating the need for a broken Y-axis and improving the overall readability of the data.
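
      The compression introduced by log scaling can be illustrated with a small sketch. The expression values below are entirely hypothetical (they are not taken from Supplementary Figure 2B), and the function name is an assumption made for illustration.

```python
from math import log2


def log2_fold_changes(relative_expression, reference):
    """Return the log2 fold change of each condition relative to `reference`."""
    ref_value = relative_expression[reference]
    return {
        name: log2(value / ref_value)
        for name, value in relative_expression.items()
    }


# Hypothetical relative expression values, NOT the authors' data
example = {"wt": 1.0, "KO": 0.5, "OE": 1024.0}

for name, lfc in log2_fold_changes(example, "wt").items():
    print(f"{name}: linear = {example[name]:g}, log2 fold change = {lfc:+.1f}")
```

      On the log2 axis, a roughly thousand-fold difference spans only about ten units, which is why very large overexpression can look visually smaller after log transformation.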

      (10) Writing out "3 days" instead of 3D in Figure 2A would improve clarity. It would be good if the used time interval is repeated in other figures throughout the paper so it is still clear the comparison is between 0 and 3 days.

      We have changed “3D” to “3 days”. All BMP4 treatments in this study were 3 days.

      (11) Line 290: "we found that over 50% of SMAD4 and pSMAD1/5/8 binding peaks were consistent in Prdm16_E and Prdm16_KO cells, indicating that deletion of Prdm16 does not affect the general genomic binding ability of these proteins" - this only makes sense to state with appropriate controls because 50% seems like a big difference, what is the sample to sample variability for the same condition? Moreover, the next paragraph seems to contradict this, ending with "This result suggests that SMAD binding to these sites depends on PRDM16". The authors should probably clarify the writing.

      We appreciate the reviewer’s comment and agree that clarification was needed. Our point was that SMAD4 and pSMAD1/5/8 retain the ability to bind DNA broadly in the Prdm16 KO cells, with more than half of the original binding sites still occupied. This suggests that deletion of Prdm16 does not globally impair SMAD genomic binding. However, our primary interest lies in the subset of sites that show differential SMAD binding between wt and Prdm16 KO conditions, as these are likely to be PRDM16-dependent.

      In the following paragraph, we focused specifically on describing SMAD and PRDM16 co-bound sites. At these loci, SMAD4 and pSMAD1/5/8 showed reduced enrichment in the absence of PRDM16, suggesting PRDM16 facilitates SMAD binding at these particular regions. We have revised the text in the manuscript to more clearly distinguish between global SMAD binding and PRDM16-dependent sites.

      (12) Much more convincing than ChIP-qPCR for c-FOS for two loci in Figures 5F-G would be a global analysis of c-FOS ChIP-seq data.

      We agree that a global c-FOS ChIP-seq analysis would provide a more comprehensive view of c-FOS binding patterns. However, the primary focus of this study is the interaction between BMP signaling and PRDM16. The enrichment of AP-1 motifs at ectopic SMAD4 binding sites was an unexpected finding, which we validated using c-FOS ChIP-qPCR at selected loci. While a genome-wide analysis would be valuable, it falls beyond the current scope. We agree that future studies exploring the interplay among SMAD4/pSMAD, PRDM16, and AP-1 will be important and informative.

      (13) Figure 6A is hard to read. A heatmap would make it much easier to see differences in expression. Furthermore, if the point is to see the difference between ChP and CH, why not combine the different subclusters belonging to those structures? Finally, why are there 28 genes total when it is said the authors are evaluating a list of 31 genes and also displaying 6 genes that are not expressed (so the difference isn't that unexpressed genes are omitted)?

      For the scRNA-seq data, we chose violin plots because they display both gene expression levels and the number of cells that express each gene. However, we agree that the labels in Figure 6A were too small and difficult to read. We have revised the figure by increasing the font size and moving genes with low expression to Supplementary Figure 5A. Figure 6A now includes the 17 more highly expressed genes together with three markers, and Supplementary Figure 5A contains the 13 genes with low expression. One gene, Mrtfb, is missing from the scRNA-seq data and is thus not included. We have revised the description of the result in the main text and figure legends.

      Reviewer #2 (Public review):

      Summary:

      This article investigates the role of PRDM16 in regulating cell proliferation and differentiation during choroid plexus (ChP) development in mice. The study finds that PRDM16 acts as a corepressor in the BMP signaling pathway, which is crucial for ChP formation.

      The key findings of the study are:

      (1) PRDM16 promotes cell cycle exit in neural epithelial cells at the ChP primordium.

      (2) PRDM16 and BMP signaling work together to induce neural stem cell (NSC) quiescence in vitro.

      (3) BMP signaling and PRDM16 cooperatively repress proliferation genes.

      (4) PRDM16 assists genomic binding of SMAD4 and pSMAD1/5/8.

      (5) Genes co-regulated by SMADs and PRDM16 in NSCs are repressed in the developing ChP.

      (6) PRDM16 represses Wnt7b and Wnt activity in the developing ChP.

      (7) Levels of Wnt activity correlate with cell proliferation in the developing ChP and CH.

      In summary, this study identifies PRDM16 as a key regulator of the balance between BMP and Wnt signaling during ChP development. PRDM16 facilitates the repressive function of BMP signaling on cell proliferation while simultaneously suppressing Wnt signaling. This interplay between signaling pathways and PRDM16 is essential for the proper specification and differentiation of ChP epithelial cells. This study provides new insights into the molecular mechanisms governing ChP development and may have implications for understanding the pathogenesis of ChP tumors and other related diseases.

      Strengths:

      (1) Combining in vitro and in vivo experiments to provide a comprehensive understanding of PRDM16 function in ChP development.

      (2) Use of a variety of techniques, including immunostaining, RNA in situ hybridization, RT-qPCR, CUT&Tag, ChIP-seq, and SCRINSHOT.

      (3) Identifying a novel role for PRDM16 in regulating the balance between BMP and Wnt signaling.

      (4) Providing a mechanistic explanation for how PRDM16 enhances the repressive function of BMP signaling. The identification of SMAD palindromic motifs as preferred binding sites for the SMAD/PRDM16 complex suggests a specific mechanism for PRDM16-mediated gene repression.

      (5) Highlighting the potential clinical relevance of PRDM16 in the context of ChP tumors and other related diseases. By demonstrating the crucial role of PRDM16 in controlling ChP development, the study suggests that dysregulation of PRDM16 may contribute to the pathogenesis of these conditions.

      We thank the reviewer for the thorough and thoughtful summary of our study. We’re glad the key findings and significance of our work were clearly conveyed, particularly regarding the role of PRDM16 in coordinating BMP and Wnt signaling during ChP development. We also appreciate the recognition of our integrated approach and the potential implications for understanding ChP-related diseases.

      Weaknesses:

      (1) Limited investigation of the mechanism controlling PRDM16 protein stability and nuclear localization in vivo. The study observed that PRDM16 protein became nearly undetectable in NSCs cultured in vitro, despite high mRNA levels. While the authors speculate that post-translational modifications might regulate PRDM16 in NSCs similar to brown adipocytes, further investigation is needed to confirm this and understand the precise mechanism controlling PRDM16 protein levels in vivo.

      While the mechanisms controlling PRDM16 protein stability and nuclear localization in the developing brain are interesting, the scope of this paper is to reveal the function of PRDM16 in the choroid plexus and its interaction with BMP signaling. We will be happy to pursue this direction in our next study.

      (2) Reliance on overexpression of PRDM16 in NSC cultures. To study PRDM16 function in vitro, the authors used a lentiviral construct to constitutively express PRDM16 in NSCs. While this approach allowed them to overcome the issue of low PRDM16 protein levels in vitro, it is important to consider that overexpressing PRDM16 may not fully recapitulate its physiological role in regulating gene expression and cell behavior.

      As stated above, we acknowledge that findings from cultured NSCs may not directly apply to ChP cells in vivo, and we are cautious with our statements. The cell culture work aimed to identify potential mechanisms by which PRDM16 and SMADs interact to regulate gene expression, as well as target genes co-regulated by these factors. We do not expect all targets from cell culture to be regulated by PRDM16 and SMADs in the ChP, so we validated expression changes of several target genes in the developing ChP and have now included the new data in Fig. 7 and Supplementary Fig. 7. Out of the 31 genes identified from cultured cells, four cell cycle regulators, including Wnt7b, Id3, Spc24/25/Nuf2 and Mybl2, showed de-repression in the Prdm16 mutant ChP. These may be relevant downstream genes in the ChP, whereas other target genes may be cortical NSC-specific or less dependent on Prdm16 in vivo.

      (3) Lack of direct evidence for AP1 as the co-factor responsible for SMAD relocation in the absence of PRDM16. While the study identified the AP1 motif as enriched in SMAD binding sites in Prdm16 knockout cells, they only provided ChIP-qPCR validation for c-FOS binding at two specific loci (Wnt7b and Id3). Further investigation is needed to confirm the direct interaction between AP1 and SMAD proteins in the absence of PRDM16 and to rule out other potential co-factors.

      We agree that the enrichment of the AP1 motif at the PRDM16 and SMAD co-binding regions in Prdm16 KO cells only indirectly suggests AP1 as a co-factor for SMAD relocation. This is why we used ChIP-qPCR to examine the presence of c-FOS at these sites. Although we validated only two targets, the result confirms that c-FOS binds to these sites in the Prdm16 KO cells but not in Prdm16_expressing cells, suggesting that AP1 is a co-factor. Our results cannot rule out the presence of other co-factors.

      Reviewer #2 (Recommendations for the authors):

      Minor typo: [7, page 3] "sicne" should be "since".

      We appreciate the reviewer’s careful reading. We have now corrected the typo and revised parts of the text to improve clarity.

      Reviewer #3 (Public review):

      Summary:

      Bone morphogenetic protein (BMP) signaling instructs multiple processes during development including cell proliferation and differentiation. The authors set out to understand the role of PRDM16 in these various functions of BMP signaling. They find that PRDM16 and BMP co-operate to repress stem cell proliferation by regulating the genomic distribution of BMP pathway transcription factors. They additionally show that PRDM16 impacts choroid plexus epithelial cell specification. The authors provide evidence for a regulatory circuit (constituting of BMP, PRDM16, and Wnt) that influences stem cell proliferation/differentiation.

      Strengths:

      I find the topics studied by the authors in this study of general interest to the field, the experiments well-controlled and the analysis in the paper sound.

      We thank the reviewer for their positive feedback and thoughtful summary. We appreciate the recognition of our efforts to define the role of PRDM16 in BMP signaling and stem cell regulation, as well as the soundness of our experimental design and analysis.

      Weaknesses:

      I have no major scientific concerns. I have some minor recommendations that will help improve the paper (regarding the discussion).

      We have revised the discussion according to the suggestions.

      Reviewer #3 (Recommendations for the authors):

      Specific minor recommendations:

      Page 18. Line 526: In a footnote, the authors point out a recent report which in parallel was investigating the link between PRDM16 and SMAD4. There is substantial non-overlap between these two papers. To aid the reader, I would encourage the authors to discuss that paper in the discussion section of the manuscript itself, highlighting any similarities/differences in the topic/results.

      Thanks for the suggestion. We have now included the comparison in the discussion. Our study and this publication reach one consistent conclusion: PRDM16 functions as a co-repressor of SMAD4. However, the mechanisms differ. Our data suggest a model in which PRDM16 facilitates SMAD4/pSMAD binding to repress proliferation genes under high BMP conditions. In contrast, the other report suggests that SMAD4 steadily binds to the Prdm16 promoter and switches regulatory functions depending on the co-factors: together with PRDM16, SMAD4 represses gene expression, whereas with SMAD3, in response to high levels of TGF-b1, it activates gene expression. These differences could be due to different signaling pathways (BMP versus TGF-b) or contexts (NSCs versus pancreatic cancers), among other factors.

      Page 3. Line 65: typo 'since'

      We appreciate the reviewer’s careful reading. We have now corrected the typo and revised the text to improve clarity.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript describes a series of experiments documenting trophic egg production in a species of harvester ant, Pogonomyrmex rugosus. In brief, queens are the primary trophic egg producers, there is seasonality and periodicity to trophic egg production, trophic eggs differ in many basic dimensions and contents relative to reproductive eggs, and diets supplemented with trophic eggs had an effect on the queen/worker ratio produced (increasing worker production).

      The manuscript is very well prepared and the methods are sufficient. The outcomes are interesting and help fill gaps in knowledge, both on ants as well as insects, more generally. More context could enrich the study and flow could be improved.

      We thank the reviewer for these comments. We agree that the paper would benefit from more context. We have therefore greatly extended the introduction.

      Reviewer #2 (Public Review):

      The manuscript by Genzoni et al. provides evidence that trophic eggs laid by the queen in the ant Pogonomyrmex rugosus have an inhibitory effect on queen development. The authors also compare a number of features of trophic eggs, including protein, DNA, RNA, and miRNA content, to reproductive eggs. To support their argument that trophic eggs have an inhibitory effect on queen development, the authors show that trophic eggs have a lower content of protein, triglycerides, glycogen, and glucose than reproductive eggs, and that their miRNA distributions are different relative to reproductive eggs. Although the finding of an inhibitory influence of trophic eggs on queen development is indeed arresting, the egg cross-fostering experiment that supports this finding can be effectively boiled down to a single figure (Figure 6). The rest of the data are supplementary and correlative in nature (and can be combined), especially the miRNA differences shown between trophic and reproductive eggs. This means that the authors have not yet identified the mechanism through which the inhibitory effect on queen development is occurring. To this reviewer, this finding is more appropriate as a short report and not a research article. A full research article would be warranted if the authors had identified the mechanism underlying the inhibitory effect on queen development. Furthermore, the article is written poorly and lacks much background information necessary for the general reader to properly evaluate the robustness of the conclusions and to appreciate the significance of the findings.

      We thank the reviewer for these comments. We agree that the paper would benefit by having more background information and more discussion. We have followed this advice in the revision.

      Reviewer #3 (Public Review):

      In "Trophic eggs affect caste determination in the ant Pogonomyrmex rugosus" Genzoni et al. probe a fundamental question in sociobiology, what are the molecular and developmental processes governing caste determination? In many social insect lineages, caste determination is a major ontogenetic milestone that establishes the discrete queen and worker life histories that make up the fundamental units of their colonies. Over the last century, mechanisms of caste determination, particularly regulators of caste during development, have remained relatively elusive. Here, Genzoni et al. discovered an unexpected role for trophic eggs in suppressing queen development - where bi-potential larvae fed trophic eggs become significantly more likely to develop into workers instead of gynes (new queens). These results are unexpected, and potentially paradigm-shifting, given that previously trophic eggs have been hypothesized to evolve to act as an additional intracolony resource for colonies in potentially competitive environments or during specific times in colony ontogeny (colony foundation), where additional food sources independent of foraging would be beneficial. While the evidence and methods used are compelling (e.g., the sequence of reproductive vs. trophic egg deposition by single queens, which highlights that the production of trophic eggs is tightly regulated), the connective tissue linking many experiments is missing and the downstream mechanism is speculative (e.g., whether miRNA, proteins, triglycerides, glycogen levels in trophic eggs is what suppresses queen development). Overall, this research elevates the importance of trophic eggs in regulating queen and worker development but how this is achieved remains unknown.

      We thank the reviewer for these comments and agree that future work should focus on identifying the substances in trophic eggs that are responsible for caste determination.  

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Introduction:

      The context for this study is insufficiently developed in the introduction - it would be nice to have a more detailed survey of what is known about trophic eggs in insects, especially social insects. The end of the introduction nicely sets up the hypothesis through the prior work described by Helms Cahan et al. (2011) where they found JH supplementation increased trophic egg production and also increased worker size. I think that the introduction could give more context about egg production in Pogonomyrmex and other ants, including what is known about worker reproduction. For example, Suni et al. 2007 and Smith et al. 2007 both describe the absence of male production by workers in two different harvester ants. Workers tend to have underdeveloped ovaries when in the presence of the queen. Other species of ants are known to have worker reproduction seemingly for the purpose of nutrition (see Heinze and Hölldober 1995 and subsequent studies on Crematogaster smithi). Because some ants, including Pogonomyrmex, lack trophallaxis, it has been hypothesized that they distribute nutrients throughout the nest via trophic eggs as is seen in at least one other ant (Gobin and Ito 2000). Interestingly, Smith and Suarez (2009) speculated that the difference in nutrition of developing sexual versus worker larvae (as seen in their pupal stable isotope values) was due to trophic egg provisioning - they predicted the opposite as was found in this study, but their prediction was in line with that of Helms Cahan et al. (2011). This is all to say that there is a lot of context that could go into developing the ideas tested in this paper that is completely overlooked. The inclusion of more of what is known already would greatly enrich the introduction.

      We agree that it would be useful to provide a larger context to the study. We now provide more information on the life history of ants and explain under what situations queens and workers may produce trophic eggs. We also mention that some ants such as Crematogaster smithi have a special caste of “large workers” which are morphologically intermediate between winged queens and small workers and appear to be specialized in the production of unfertilized eggs. We now also mention the study of Gobin and Ito (2000), in which the authors show that trophic eggs may play an important role in food distribution within the colony, in particular in species where trophallaxis is rare or absent.

      Methods:

      L49: What lineage is represented in the colonies used? The collection location is near where both dependent-lineage (genetic caste determining) P. rugosus and "H" lineage exist. This is important to know. Further, depending on what these are, the authors should note whether this has relevance to the study. Not mentioning genetic caste determination in a paper that examines caste determination is problematic.

      This is a good point. We have now provided information at the very beginning of the Material and Method section that the queens had been collected in populations known not to have dependent-lineage (genetic caste determining) mechanisms of caste determination.

      L63 and throughout: It would be more efficient to have a paragraph that cites R (must be done) and RStudio once as the tool for all analyses. It also seems that most model construction and testing was done using lme4 - so just lay this out once instead of over and over.

      We agree and have updated the manuscript accordingly.

      L95: 'lenght' needs to be 'length' in the formula.

      Thanks, corrected.

      L151: A PCA was used but not described in the methods. This should be covered here. And while a Mantel test is used, I might consider a permANOVA as this more intuitively (for me, at least) goes along with the PCA.

      We added the PCA description in the Material and Method section.

      Results:

      I love Fig. 3! Super cool.

      Thanks for this positive comment.

      Discussion:

      It would be good to have more on egg cannibalism. This is reasonably well-studied and could be good extra context.

      We have added a paragraph in the discussion to mention that egg cannibalism is ubiquitous in ants.

      Supp Table 1: P. badius is missing and citations are incorrectly attributed to P. barbatus.

      P. badius was present in the Table but not with the other Pogonomyrmex species. For some genera the species were also not listed in alphabetic order. This has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      Comments on introduction:

      The introduction is missing information about caste determination in ants generally and Pogonomyrmex rugosus specifically. This is important because some colonies of Pogonomyrmex rugosus have been shown to undergo genetic caste determination, in which case the main result would be rendered insignificant. What is the evidence that caste determination in the lineages/colonies used is largely environmentally influenced and in what contexts/environmental factors? All of this should be made clear.

      This is a good point. We have expanded the introduction to discuss previous work on caste determination in Pogonomyrmex species with environmental caste determination and now also provide evidence at the beginning of the Material and Method section that the two populations studied do not have a system of genetic caste determination.

      Line 32 and throughout the paper: What is meant exactly by 'reproductive eggs'? Are these eggs that develop specifically into reproductives (i.e., queens/males) or all eggs that are non-trophic? If the latter, then it is best to refer to these eggs as 'viable' in order to prevent confusion.

      We agree and have updated the manuscript accordingly.

      Figure 1/Supp Table 1: It is surprising how few species are known to lay trophic eggs. Do the authors think this is an informative representation of the distribution of trophic egg production across subfamilies, or due to lack of study? Furthermore, the branches show ant subfamilies, not families. What does the question mark indicate? Also, the information in the table next to the phylogeny is not easy to understand. Having in the branches that information, in categories, shown in color for example, could be better and more informative. Finally, having the 'none' column with only one entry is confusing - discuss that only one species has been shown to definitely not lay trophic eggs in the text, but it does not add much to the figure.

      Trophic eggs are probably very common in ants, but this has not been very well studied. We added a sentence in the manuscript to make this clear.

      Thanks for noticing the family/subfamily error. This has been corrected in Figure 1 and Supplementary Table 1.

      The question mark indicates uncertainty about whether queens also contribute to the production of trophic eggs in one species (Lasius niger). We have now added information on that in the Figure legend.

      We agree with the reviewer that it would be easier to have the information on whether queens and workers produce trophic eggs on the branches of the tree. However, having the information on the branches would suggest that the “trait” evolved on this part of the tree. As we do not know when worker or queen production of trophic eggs exactly evolved, we prefer to keep the figure as it is.

      Finally, we have also removed the "none" column from the figure as suggested by the reviewer and discussed in the manuscript the fact that the absence of trophic eggs has been reported in only one ant species (Amblyopone silvestrii: Masuko 2003).

      Comments on materials and methods:

      Why did they settle on three trophic eggs per larva for their experimental setup?

      We used three trophic eggs because under natural conditions 50-65% of the eggs are trophic. The ratio of trophic eggs to viable eggs (larvae) was thus similar to that under natural conditions.

      Line 50: In what kind of setup were the ants kept? Plaster nests? Plastic boxes? Tubes? Was the setup dry or moist? I think this information is important to know in the context of trophic eggs.

      We now explain that colonies were maintained in plastic boxes with water tubes.

      Line 60: Were all the 43 queens isolated only once, or multiple times?

      Each of the 43 queens was isolated for 8 hours every day for 2 weeks, once before and once after hibernation (so they were isolated multiple times). We have changed the text to make clear that this was done for each of the 43 queens.

      Could isolating the queen away from workers/brood have had an effect on the type of eggs laid?

      This cannot be completely ruled out. However, it is possible to reliably determine the proportion of viable and trophic eggs only by isolating queens. Importantly, the main aim of these experiments was not to precisely determine the proportion of viable and trophic eggs, but to show that this proportion changes before and after hibernation and that queens do not lay viable and trophic eggs in a random sequence.

      Since it was established that only queens lay trophic eggs why was the isolation necessary?

      Yes, this was necessary because eggs are fragile and very difficult to collect in colonies with workers (as soon as eggs are laid they are piled up, and as soon as we disturb the nest, a worker takes them all and runs away with them). Moreover, it is possible that workers preferentially eat one type of egg, which would have required removing eggs as soon as the queens laid them. This would have been a huge disturbance for the colonies.

      Line 61: Is this hibernation natural or lab induced? What is the purpose of it? How long was the hibernation and at what temperature? Where are the references for the requirement of a diapause and its length?

      The hibernation was lab induced. We hibernated the queens because we previously showed that hibernation is important to trigger the production of gynes in P. rugosus colonies in the laboratory (Schwander et al 2008; Libbrecht et al 2013). Hibernation conditions were as described in Libbrecht et al (2013).  

      Line 73: If the queen is disturbed several times for three weeks, which effect does it have on its egg-laying rate and on the eggs laid? Were the eggs equally distributed in time in the recipient colonies with and without trophic eggs to avoid possible effects?

      It is difficult to say what effect the disturbance had on the number and type of eggs laid. But again, our aim was not to precisely determine these values but to determine whether there was an effect of hibernation on the proportion of trophic eggs. The recipient colonies with and without trophic eggs were formed in exactly the same way. No viable eggs were introduced in these colonies, but all first instar larvae were introduced in the same way, at the same time, and with random assignment. We have clarified this in the Material and Method section.

      Line 77: Before placing the freshly hatched larvae in recipient colonies, how long were the recipient colonies kept without eggs and how long were they fed before giving the eggs? Were they kept long enough without the queen to avoid possible effects of trophic eggs, or too long so that their behavior changed?

      The recipient colonies were created 7 to 10 days before receiving the first larvae and were fed ad libitum with grass seeds, flies and honey water from the beginning. Trophic eggs that would have been left over from the source colony should have been eaten within the first few days after creating the recipient colonies. However, even if some trophic eggs would have remained, this would not influence our conclusion that trophic eggs influence caste fate, given the fully randomized nature of our treatments and the considerable number of independent replicates. The same applies to potential changes in worker behavior following their isolation from the queen.

      Line 77: Is it known at what stage caste determination occurs in this species? Here first instar larvae were given trophic eggs or not. Does caste-determination occur at the first instar stage? If not, what effect could providing trophic eggs at other stages have on caste-determination?

      A previous study showed that there is a maternal effect on caste determination in the focal species (Schwander et al 2008). The mechanism underlying this maternal effect was hypothesized to be differential maternal provisioning of viable eggs. However, as we detail in the discussion, the new data presented in our study suggest that the mechanism is in fact a difference in the abundance of trophic eggs laid by queens. There is currently no information on when exactly caste determination occurs during development.

      Comments on results:

      Line 65: How does investigating the order of eggs laid help to "inform on the mechanisms of oogenesis"?

      We agree that the aim was not to study the mechanism of oogenesis. We have changed this sentence accordingly: “To assess whether viable and trophic eggs were laid in a random order, or whether eggs of a given type were laid in clusters, we isolated 11 queens for 10 hours, eight times over three weeks, and collected every hour the eggs laid”.
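
      As an illustration of how a "random order versus clusters" question can be examined, the following is a minimal sketch of a Wald-Wolfowitz runs test. It is not the authors' analysis, which is not described in this response; the function name `runs_test` and the example laying sequence are hypothetical.

```python
import math


def runs_test(sequence):
    """Wald-Wolfowitz runs test for a sequence with two categories.

    Returns (observed_runs, expected_runs, z). A strongly negative z means
    fewer runs than expected under random order, i.e. items of the same
    type tend to occur in clusters.
    """
    categories = sorted(set(sequence))
    if len(categories) != 2:
        raise ValueError("sequence must contain exactly two egg types")
    n1 = sum(1 for x in sequence if x == categories[0])
    n2 = len(sequence) - n1
    n = n1 + n2
    # A new run starts at every position where the egg type changes.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z


# Hypothetical hourly laying record for one queen (V = viable, T = trophic).
example = list("VVVVTTTTTVVVTTTT")
runs, expected, z = runs_test(example)
print(f"runs={runs}, expected={expected:.3f}, z={z:.2f}")
```

      The normal approximation used here is only reasonable for moderately long sequences; for short records per queen an exact or permutation-based version would be more appropriate.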

      Figure 2: There is no description/discussion of data shown in panels B, C, E, and F in the main text.

      We have added information in the main text that while viable eggs showed embryonic development at 25 and 65 hours (Fig. 2B, C), there was no such development for trophic eggs (Fig. 2E, F).

      Line 172: Please explain hibernation details and its significance on colony development/life cycle.

      We have added this information in the Material and Method section.

      Figure 6: How is B plotted? How could 0% of gynes have 100% survival?

      The survival is given for the larvae without considering caste. We have changed the X axis of panel B and reworded the Figure legend to clarify this.

      Is reduced DNA content just an outcome of reduced cell number within trophic eggs, i.e., was this a difference in cell type or cell number? Or is it some other adaptive reason?

      It is likely to be due to a reduction in cell number (trophic eggs have maternal DNA in the chorion, while viable eggs have in addition the cells from the developing zygote) but we do not have data to make this point.

      Is there a logical sequence to the sequence of egg production? The authors showed that the sequence is non-random, but can they identify in what way? What would the biological significance be?

      We could not identify a logical sequence. Plausibly, the production of the two types of eggs implies some changes in the metabolic processes during egg production resulting in queens producing batches of either viable or trophic eggs. This would be an interesting question to study, but this is beyond the scope of this paper.

      Figure 6b is difficult to follow, and more generally, legends for all figures can be made clearer and more easy to follow.

      We agree. We have now improved the legends of Fig 6B and the other figures.

      Lines 172-174: "The percentage of eggs that were trophic was higher before hibernation...than after. This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable" - are these data shown? It would be nice to see how the total egg-laying rate changes after hibernation. Also, is the proportion of trophic eggs laid similar between individual queens?

      No, the data were not shown, and we do not have sufficiently robust data to make this point. We have therefore removed the sentence “This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable” from the manuscript.

      Figure 6B: Do several colonies produce 100% gynes despite receiving trophic eggs? It would be interesting if the authors discussed why this might occur (e.g., the larvae are already fully determined to be queens and not responsive to whatever signal is in the trophic eggs).

      The reviewer is correct that 4 colonies produced 100% gynes despite receiving trophic eggs. However, the number of individuals produced in these four colonies was small (2,1,2,1, see supplementary Table 2). So, it is likely that it is just by chance that these colonies produced only gynes.
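
      The chance argument above can be made concrete with a short calculation. This is only a sketch: the colony sizes (2, 1, 2, 1) come from the response, but the per-larva gyne probability `p_gyne` is a hypothetical value chosen for illustration, not an estimate from the study, and larvae are assumed to develop independently.

```python
def prob_all_gynes(n_individuals, p_gyne):
    """Probability that every one of n independent larvae becomes a gyne."""
    return p_gyne ** n_individuals


# Number of individuals produced in the four colonies (from the response).
colony_sizes = [2, 1, 2, 1]

# Hypothetical per-larva probability of developing into a gyne.
p_gyne = 0.5

per_colony = [prob_all_gynes(n, p_gyne) for n in colony_sizes]
for n, p in zip(colony_sizes, per_colony):
    print(f"colony with {n} individual(s): P(100% gynes) = {p:.2f}")
```

      With so few individuals per colony, an all-gyne outcome is not a rare event under this assumption, which is the sense in which such colonies carry little information about the treatment effect.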

      Figure 5: Why a separation by "size distribution variation of miRNA"? What is the relevance of looking at size distributions as opposed to levels?

      We did this because there are many different miRNA species, as reflected by the presence of multiple size peaks rather than a single one. This is why we looked at the size distribution.

      Figure 2: The image of the viable embryo is not clear. If possible, redo the viable to show better quality images.

      Unfortunately, we no longer have colonies in the laboratory, so this is not possible.

      Comments on discussion:

      Lines 236-247: Can an explanation be provided as to why the effect of trophic eggs in P. rugosus is the opposite of those observed by studies referenced in this section? Could P. rugosus have any life history traits that might explain this observation?

      In the two mentioned studies there were other factors that co-varied with variation in the quantity of trophic eggs. We mentioned that and suggested that it would be useful to conduct experimental manipulation of the quantity of trophic eggs in the Argentine ant and P. barbatus (the two species where an effect of trophic eggs had been suggested).

      The discussion should include implications and future research of the discovery.

      We made some suggestions for experiments that should be performed in the future.

      The conclusion paragraph is too short and does not represent what was discussed.

      We added two sentences at the end of the paragraph to make suggestions of future studies that could be performed.

      Lines 231 to 247: Drastically reduce and move this whole part to the introduction to substantiate the assumption that trophic eggs play a nutritional role.

      We moved most of this paragraph to the introduction, as suggested by the reviewer.

      Reviewer #3 (Recommendations For The Authors):

      I would like to commend the authors on their study. The main findings of the paper are individually solid and provide novel insight into caste determination and the nature of trophic eggs. However, the inferences made from much of the data and connections between independent lines of evidence often extend too far and are unsubstantiated.

      We thank the reviewer for the positive comment. We made many changes in the manuscript to improve the discussion of our results.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript submission by Zhao et al. entitled, "Cardiac neurons expressing a glucagon-like receptor mediate cardiac arrhythmia induced by high-fat diet in Drosophila" the authors assert that cardiac arrhythmias in Drosophila on a high-fat diet are due in part to adipokinetic hormone (Akh) signaling activation. High-fat diet induces Akh secretion from activated endocrine neurons, which activate AkhR in posterior cardiac neurons. Silencing or deletion of Akh or AkhR blocks arrhythmia in Drosophila on a high-fat diet. Elimination of one of two AkhR-expressing cardiac neurons results in arrhythmia similar to a high-fat diet.

      Strengths:

      The authors propose a novel mechanism for high-fat diet-induced arrhythmia utilizing the Akh signaling pathway that signals to cardiac neurons.

      Weaknesses:

      Major comments:

      (1) The authors state, "Arrhythmic pathology is rooted in the cardiac conduction system." This assertion is incorrect as a blanket statement on arrhythmias. There are certain arrhythmias that have been attributable to the conduction system, such as bradycardic rhythms, heart block, sinus node reentry, inappropriate sinus tachycardia, AV nodal reentrant tachycardia, bundle branch reentry, fascicular ventricular tachycardia, or idiopathic ventricular fibrillation to name a few. However the etiological mechanism of many atrial and ventricular arrhythmias, such as atrial fibrillation or substrate-based ventricular tachycardia, are not rooted in the conduction system. The introduction should be revised to reflect a clear focus (away from?) on atrial fibrillation (AF). In addition, AF susceptibility is known to be modulated by autonomic tone, which is topically relevant (irrelevant?) to this manuscript.

      Thank you for the helpful comment. We rephrased the sentence as “Arrhythmic pathology is often rooted in the cardiac conduction system”.

      (2) The authors state that "HFD led to increased heartbeat and an irregular rhythm." In representative examples shown, HFD resulted in pauses, slower heart rate, and increased irregularity in rhythm but not consistently increased heart rate (Figures 1B, 3A, and 4C). Based on the cited work by Ocorr et al (https://doi.org/10.1073/pnas.0609278104), Drosophila heart rate is highly variable with periods of fast and slow rates, which the authors attributed to neuronal and hormonal inputs. Ocorr et al then describe the use of "semi-intact" flies to remove autonomic input to normalize heart rate. Were semi-intact flies used? If not, how was heart rate variability controlled? And how was heart rate "increase" quantified in high-fat diet compared to normal-fat diet? Lastly, how does one measure "arrhythmia" when there is so much heart rate variability in normal intact flies?

      We also observed that fly heart rate is highly variable with periods of fast and slow rates. To control heart rate variability, Ocorr et al. used semi-intact flies to record the heartbeat (https://doi.org/10.1073/pnas.0609278104). We consider it a rigorous method to get highly consistent results with high-quality videos/images. Since our work focuses on the neuronal inputs to the heart, we did not use the semi-intact method, because the dissection is likely to disrupt the neuronal processes. Using OCT, we recorded the heartbeat of intact flies in an 8 s time window, when the heartbeat was relatively stable. The different groups of flies, which were fed a high-fat diet or a normal-fat diet, were recorded using the same method. Thus, we could compare the differences in heart rate.
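
      To illustrate how heart rate and rhythm irregularity might be quantified from a single 8 s recording, here is a minimal sketch. It is not the authors' pipeline, which is not described in this response. The function name `heart_metrics`, the beat timestamps, and the choice of irregularity index (standard deviation of beat-to-beat periods divided by the median period, one convention used in fly cardiac assays) are all assumptions.

```python
import statistics


def heart_metrics(beat_times_s):
    """Return (heart_rate_hz, arrhythmia_index) from beat timestamps in seconds.

    heart_rate_hz is the inverse of the mean beat-to-beat period.
    arrhythmia_index is the standard deviation of the periods divided by
    their median (an assumed convention, not taken from the manuscript).
    """
    if len(beat_times_s) < 3:
        raise ValueError("need at least three beats to estimate variability")
    periods = [b - a for a, b in zip(beat_times_s, beat_times_s[1:])]
    heart_rate_hz = 1.0 / statistics.mean(periods)
    arrhythmia_index = statistics.stdev(periods) / statistics.median(periods)
    return heart_rate_hz, arrhythmia_index


# Hypothetical 8 s recordings: a regular 2 Hz rhythm and an irregular one.
regular = [i * 0.5 for i in range(17)]
irregular = [0.0, 0.4, 1.1, 1.5, 2.6, 3.0, 3.3, 4.5, 5.0, 5.4, 6.6, 7.0, 8.0]

rate_reg, ai_reg = heart_metrics(regular)
rate_irr, ai_irr = heart_metrics(irregular)
print(f"regular:   rate={rate_reg:.2f} Hz, index={ai_reg:.2f}")
print(f"irregular: rate={rate_irr:.2f} Hz, index={ai_irr:.2f}")
```

      Because the index is normalized to the median period, it separates irregularity from overall rate, so two groups can be compared on rhythm even when their mean heart rates differ.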

      (3) The authors state, "to test whether the HFD-induced increase in Akh in the APC affects APC neuron activity, we used CaLexA (https://doi.org/10.3109/01677063.2011.642910)." According to the reference, CaLexA is a tool to map active neurons and would not indicate, as the authors state, whether Akh affects APC neuron activity specifically. It is equally possible that APC neurons may be activated by HFD and produce more Akh. Please clarify this language.

      Thank you for clarifying the calcium reporter, CaLexA. We rephrased this sentence to “to test whether HFD affects APC neuron activity, we used CaLexA”.

      (4) Are the AkhR+ neurons parasympathetic or sympathetic? Please provide additional experimentation that characterizes these neurons. The AkhR+ neurons appear to be anti-arrhythmic. Please expand the discussion to include a working hypothesis of the overall findings on Akh, AkhR, and AkhR+ neurons.

      Noyes et al. showed that Akh treatment increases heartbeat (Noyes, B. E., F. N. Katz, and M. H. Schaffer. 1995. “Identification and Expression of the Drosophila Adipokinetic Hormone Gene.” Molecular and Cellular Endocrinology 109 (2): 133–41.), suggesting that AkhR+ neurons are sympathetic. We showed that a high-fat diet induced Akh expression and secretion, which led to stimulation of AkhR+ neurons and increased heart rate, supporting the sympathetic role of the AkhR+ neurons. Additional explanation of the sympathetic and anti-arrhythmic role of Akh, AkhR, and AkhR+ neurons was added to the discussion.

      (5) The authors state, "Heart function is dependent on glucose as an energy source." However, the heart's main energy source is fatty acids with minimal use of glucose (doi: 10.1016/j.cbpa.2006.09.014). Glucose becomes more utilized by cardiomyocytes under heart failure conditions. Please amend/revise this statement.

      Thank you for pointing this out and providing the reference. We rephrased this sentence as: “Heart function is dependent on continuous ATP production. Cardiac ATP in Drosophila might come from fatty acids, glucose, and lactate (Kodde et al., 2007), as well as trehalose.”

      Reviewer #2 (Public Review):

      This manuscript explores mechanisms underlying heart contractility problems in metabolic disease using Drosophila as a model. They confirm, as others have demonstrated, that a high-fat diet (HFD) induces cardiac problems in flies. They showed that a high-fat diet increased Akh mRNA levels and calcium levels in the Akh-producing cells (APC), suggesting there is increased production and release of this hormone in a HFD context. When they knock down Akh production in the APCs using RNAi they see that cardiac contractility problems are abolished. They similarly show that levels of the Akh receptor (Akhr) are increased on a HFD and that loss of Akhr also rescues contractility problems on a HFD.

      One highlight of the paper was the identification of a pair of neurons that express a receptor for the metabolic hormone Akh, and showing initial data that these neurons innervate the cardiac muscle. They then overexpress cell death gene reaper (rpr) in all Akhr-positive cells with Akhr-GAL4 and see that cardiac contractility becomes abnormal.

      However, this paper contains several findings that have been reported elsewhere and it contains key flaws in both experimental design and data interpretation. There is some rationale for doing the experiments, and the data and images are of good quality. However, others have shown that HFD induces cardiac contractility problems (Birse 2010), that Akh mRNA levels are changed with HFD (Liao 2021) that Akh modulates cardiac rhythms (Noyes 1995), so Figures 1-4 are largely a confirmation of what is already known. This limits the overall magnitude of the advances presented in these figures. Overall, the stated concerns limit the impact of the manuscript in advancing our understanding of heart contractility.

      We thank the reviewer for the positive comments and the instructive suggestions. Birse 2010 (PMID: 21035763) was cited in our manuscript. Liao 2021 showed that Akh mRNA levels are changed with HFD. We added the reference to the revised manuscript and modified the text as: “In consistent with a previous work (Liao et al., 2020), we showed that the expression of Akh was significantly up-regulated in the flies fed a HFD, compared to NFD-fed flies (Figure 2B)”. Our qPCR verified Liao’s results. On top of this, we investigated the calcium levels in the Akh-producing cells (APCs) and showed elevated calcium levels in the APCs of HFD-fed flies. In the revised version, we added more data to show that Akh protein levels were increased with HFD (Figure 2E-F). In line with Noyes' discovery, which showed that Akh injection caused cardioacceleration in prepupae, we showed that genetic manipulation of Akh expression affected heartbeat in adults.

      Reviewer #3 (Public Review):

      Zhao et al. provide new insights into the mechanism by which a high-fat diet (HFD) induces cardiac arrhythmia employing Drosophila as a model. HFD induces cardiac arrhythmia in both mammals and Drosophila. Both glucagon and its functional equivalent in Drosophila Akh are known to induce arrhythmia. The study demonstrates that Akh mRNA levels are increased by HFD and both Akh and its receptor are necessary for high-fat diet-induced cardiac arrhythmia, elucidating a novel link. Notably, Zhao et al. identify a pair of AKH receptor-expressing neurons located at the posterior of the heart tube. Interestingly, these neurons innervate the heart muscle and form synaptic connections, implying their roles in controlling the heart muscle. The study presented by Zhao et al. is intriguing, and the rigorous characterization of the AKH receptor-expressing neurons would significantly enhance our understanding of the molecular mechanism underlying HFD-induced cardiac arrhythmia.

      Many experiments presented in the manuscript are appropriate for supporting the conclusions, while additional controls and precise quantifications should help strengthen the authors' arguments. The key results obtained by loss of Akh (or AkhR) and genetic elimination of the identified AkhR-expressing cardiac neurons do not reconcile, complicating the overall interpretation.

      It is intriguing to see an increase in Akh mRNA levels in HFD-fed animals. This is a key result for linking HFD-induced arrhythmia to Akh. Thus, demonstrating that HFD also increases the Akh protein levels and Akh is secreted more should significantly strengthen the manuscript.

      Thank you for the positive comments and the instructive suggestions. We performed immunostaining to show that Akh protein levels increased, which is consistent with elevated Akh mRNA expression in HFD-fed flies. The data was added to Figure 2, panels E and F. Akh secretion from the APCs is regulated by APC activity (https://doi.org/10.1038/s41586-019-1675-4). We used a calcium reporter CaLexA (https://doi.org/10.3109/01677063.2011.642910) to monitor APC activity and showed that HFD increased APC activity (Figure 2, C-D).

      The experiments employing an AkhR null allele nicely demonstrate its requirement for HFD-induced cardiac arrhythmia. Depletion of Akh in Akh-expressing cells recapitulates the consequence of AkhR knockout, supporting that both Akh and its receptor are required for HFD-induced cardiac arrhythmia. Given that RNAi is associated with off-target effects and some RNAi reagents do not work, testing multiple independent RNAi lines is the standard procedure. It is also important to show the on-target effect of the RNAi reagents used in the study.

      Indeed, RNAi approaches can suffer from off-target effects. For the Akh experiments, we used the RNAi line BL_34960, which was generated using artificial microRNA-based shRNAs (DOI: 10.1038/nmeth.1592). Compared with long-hairpin constructs, shRNA constructs are expected to be advantageous, e.g., more efficient and with fewer off-target effects. We performed immunostaining to determine the efficiency of Akh-Gal4>UAS-Akh-RNAi and showed that anti-Akh fluorescence was diminished in Akh-Gal4>UAS-Akh-RNAi APCs. The data were added to Figure 3-figure supplement 1.

      The most exciting result is the identification of AkhR-expressing neurons located at the posterior part of the heart tube (ACNs). The authors attempted to determine the function of ACNs by expressing rpr with AkhR-GAL4, which would induce cell death in all AkhR-expressing cells, including ACNs. The experiments presented in Figure 6 are not straightforward to interpret. Moreover, the conclusion contradicts the main hypothesis that elevated Akh is the basis of HFD-induced arrhythmia. The results suggest the importance of AkhR-expressing cells for normal heartbeat. However, elimination of Akh or AkhR restores normal rhythm in HFD-fed animals, suggesting that Akh and AkhR are not important for maintaining normal rhythms. If Akh signaling in ACNs is key for HFD-induced arrhythmia, genetic elimination of ACNs should leave normal rhythm unaltered and rescue the HFD-induced arrhythmia. An important caveat is that the experiments do not test the specific role of ACNs. ACNs should be just a small part of the cells expressing AkhR. The experiments presented in Figure 6 cannot justify the authors' conclusion. Specific manipulation of ACNs will significantly improve the study. Moreover, the main hypothesis suggests that HFD may alter the activity of ACNs in a manner dependent on Akh and AkhR. Testing how HFD changes calcium, possibly by CaLexA (Figure 2) and/or GCaMP, in wild-type and AkhR mutants could be a way to connect ACNs to HFD-induced arrhythmia. Moreover, optogenetic manipulation of ACNs will allow for specific manipulation of ACNs, which is crucial for studying the specific role of ACNs in controlling cardiac rhythms.

      Thank you for the insightful comments. We have been trying to find a way to target only the AkhR neurons using split-Gal4. So far, this has not been successful. Akh/AkhR signaling likely plays a key role in the ACNs; however, we cannot rule out the possibility that ACNs also receive signals other than Akh in the modulation of heartbeat.

      Interestingly, expressing rpr with AkhR-GAL4 was insufficient to eliminate both ACNs. It is not clear why it didn't eliminate both ACNs. Given the incomplete penetrance, appropriate quantifications should be helpful. Additionally, the impact on other AkhR-expressing cells should be assessed. Adding more copies of UAS-rpr, AkhR-GAL4, or both may eliminate all ACNs and other AkhR-expressing cells. The authors could also try UAS-hid instead of UAS-rpr.

      We added more data to show that AkhR+ neurons are positive in anti-Akh staining, indicating the AkhR+ neurons indeed receive Akh.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Typo in line 765: "increased Akh section into the circulation." Section should be secretion.

      Thank you for finding the typo. We changed section to secretion.

      Reviewer #2 (Recommendations For The Authors):

      One interesting extension to our knowledge in Figures 3 & 4 is that loss of Akhr and loss of Akh both block the cardiac contractility defects that accompany a HFD. The main concern I have with the Akh finding is that the authors use only a GAL4 control and no UAS alone control. Metabolic phenotypes often show strain-specific effects, so to make conclusions it is essential that the authors include a UAS alone control alongside the other genotypes to be sure it does not rescue the cardiac contractility defects that accompany a HFD by itself.

      I am interested in the authors' identification of a pair of Akhr-positive neurons that innervate the cardiac muscle. I am not aware of any other studies identifying these neurons, or revealing their function. The contents of Figure 5 therefore represent the largest advance in the study. However, the characterization of these neurons is very superficial, and a lot more work to understand their regulation and function in a HFD context is needed to make conclusions about their role in any HFD-induced cardiac contractility problems. Or to determine how Akh influences the function of these specific neurons in an HFD context.

      The reason I say this is that the authors ablate all Akhr-positive cells in Figure 6 and show that this disturbs normal cardiac contractility. While studies on the one pair of Akhr-positive neurons would be really interesting, ablating all Akhr-positive cells, which includes the fat and many other cell types in the fly, is not a scientifically rigorous approach to answering this question. As a result, the authors are only able to make the claim that ablating many cell types throughout the animal disrupts cardiac contractility, which does not advance our understanding of mechanisms underlying heart contractility problems. In addition, because the experiments they designed did not test whether it was Akh binding to Akhr on those neurons that regulate cardiac contractility problems in a HFD context, their experiments do not support their model in Figure 7.

      The authors also make conclusions that are fairly speculative around Line 231 when describing their model in Figure 7. These claims are simply not supported by the data they present and must be removed. For example, the authors have not identified an endocrine-heart axis, they simply showed that changes in Akh can influence the heart, but this is not necessarily a direct effect on a specific cell type. They do not show data that Akh binds the newly identified Akhr-positive neuron pair to mediate the effects of HFD-induced contractility defects - they just ablate all Akhr-positive cells (fat, neurons, and other types) and show cardiac defects. If those neurons did mediate the abnormal cardiac rhythm promoted by Akh, then ablating those neurons (and not a large number of additional tissues) should rescue HFD-induced heart defects just like reducing Akhr or Akh did (but this is the opposite of what they see). Overall, concerns with experimental design, data interpretation, and relatively few findings that aren't reported elsewhere reduce the impact of this paper.

      We appreciate the positive comments and helpful suggestions. Indeed, it is important to get clean genetic access to the cardiac neurons. We intended to use the split-Gal4 system to target the AkhR cardiac neurons and tried to build a split-Gal4 driver, AkhR-p65.AD. Two rounds of injection were carried out. However, we did not recover a transgenic line.

      In the revised version, we performed immunostaining using Akh antibodies to show that anti-Akh fluorescence was observed in AkhR neurons (Figure 5-figure supplement 1), indicating an endocrine-heart axis.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Duilio M. Potenza et al. explores the role of Arginase II in cardiac aging, mainly using whole-body arg-ii knock-out mice. In this work, the authors have found that Arg-II exerts non-cell-autonomous effects on aging cardiomyocytes, fibroblasts, and endothelial cells mediated by IL-1b from aging macrophages. The authors have used arg II KO mice and an in vitro culture system to study the role of Arg II. The authors have also reported the cell-autonomous effect of Arg-II through mitochondrial ROS in fibroblasts that contribute to cardiac aging. These findings are sufficiently novel in cardiac aging and provide interesting insights. While the phenotypic data seems strong, the mechanistic details are unclear. How Arg II regulates IL-1b and modulates cardiac aging is still to be determined. The authors still need to determine whether Arg II in fibroblasts and endothelial cells contributes to cardiac fibrosis and cell death. This study also lacks a comprehensive understanding of the pathways modulated by Arg II to regulate cardiac aging.

      We sincerely appreciate the valuable feedback provided by the reviewer. It is gratifying to hear that our work provided novel information on the role of arginase-II in cardiac aging, which is a complex process involving various cell types and mechanisms. We have devoted considerable effort to performing new experiments to address the reviewer's comments and to delineate more detailed mechanisms of Arg-II in cardiac aging. Please see below our specific answers to each of the reviewer's points.

      Strengths:

      This study provides interesting information on the role of Arg II in cardiac aging.

      The phenotypic data in the arg II KO mice is convincing, and the authors have assessed most of the aging-related changes.

      The data is supported by an in vitro cell culture system.

      We appreciate this reviewer’s positive assessment on the strength of our study.

      Weaknesses:

      The manuscript needs more mechanistic details on how Arg II regulates IL-1b and modulates cardiac aging.

      We made a great effort and performed new experiments in a human monocyte cell line (THP1) in which iNOS is not expressed and not inducible by LPS, and in which the arg-ii gene was knocked out by CRISPR technology. Moreover, murine bone-marrow-derived macrophages in which the inos gene was ablated were also used for this purpose. We found that in the human THP1 monocytes, in which Arg-II but not iNOS is induced by LPS (100 ng/mL for 24 hours) (Suppl. Fig. 6A), mRNA and protein levels of the IL-1b precursor are markedly reduced in arg-ii knockout THP1<sup>arg-ii<sup>-/-</sup></sup> cells as compared to the THP1<sup>wt</sup> cells (Suppl. Fig. 6B and 6C), further confirming that Arg-II promotes IL-1b production, as also shown in RAW264.7 macrophages (Suppl. Fig. 5A and 5C). Moreover, in the mouse bone-marrow-derived macrophages, LPS-induced IL-1b production is inhibited by inos deficiency (BMDM<sup>inos-/-</sup> vs BMDM<sup>wt</sup>) (Suppl. Fig. 6D and 6E), while Arg-II levels are slightly enhanced in the BMDM<sup>inos-/-</sup> cells (Suppl. Fig. 6D and 6F). Altogether, these results suggest that iNOS slightly reduces Arg-II expression, that Arg-II and iNOS can be upregulated by LPS independently, and that both Arg-II and iNOS are required for IL-1b production upon LPS stimulation, as illustrated in Suppl. Fig. 6G. For detailed results and discussion, please see the answers to comment point 2 or point 6 raised by this reviewer.

      The authors used whole-body KO mice, and the role of macrophages in cardiac aging is not studied in this model. A macrophage-specific arg II Ko would be a better model.

      We fully agree with this comment of the reviewer. Unfortunately, a macrophage-specific arg-ii knockout animal model is not yet available. Future research should develop the macrophage-specific arg-ii<sup>-/-</sup> mouse model to confirm this conclusion in aging animals. Since Arg-II is also expressed in fibroblasts and endothelial cells and exerts cell-autonomous and paracrine functions, aging mouse models with conditional arg-ii knockout in the specific cell types would be the next step to elucidate the cell-specific function of Arg-II in cardiac aging. We have pointed out this aspect for future research on page 19, lines 2 to 6.

      Experiments need to validate the deficiency of Arg II in cardiomyocytes.

      As pointed out by this reviewer in comment point 10, Arg-II was previously reported to be expressed in isolated cardiomyocytes from rats (PMID: 16537391). Unfortunately, negative controls, i.e., arg-ii<sup>-/-</sup> samples, were not included in that study to exclude possible background signals. We made a great effort to investigate whether Arg-II is present in cardiomyocytes from different species, including mice, rats and humans, and have included old arg-ii<sup>-/-</sup> mouse samples as a negative control. This allows us to validate the antibody specificity and to exclude background noise beyond any reasonable doubt. The new experiments in Suppl. Fig. 4 confirm the specificity of the antibody against Arg-II in old mouse kidney, which is known to express Arg-II in the S3 proximal tubular cells (Huang J, et al. 2021). To exclude possible species-specific differences in the expression of Arg-II in cardiomyocytes, aged mouse and rat heart tissues were used for cellular localization of Arg-II by confocal immunofluorescence staining. As shown in Suppl. Fig. 4B and 4C, both species show Arg-II expression only in non-cardiomyocytes (cells between striated cardiomyocytes) (red arrows) but not in striated cardiomyocytes. Even in the rat myocardial infarction tissues, Arg-II was not found in cardiomyocytes but in endocardium cells (Suppl. Fig. 4B). In isolated cardiomyocytes exposed to hypoxia, a well-known strong stimulus for Arg-II protein levels, no Arg-II signals could be detected, while in fibroblasts from the same animals, elevated Arg-II levels under hypoxia are demonstrated (Fig. 5B). Furthermore, even RT-qPCR could not detect arg-ii mRNA in cardiomyocytes, whereas it was detected in non-cardiomyocytes (Fig. 5C). Altogether, these results demonstrate that Arg-II is not expressed, or is expressed at negligible levels, in cardiomyocytes but is expressed in non-cardiomyocytes. These new experiments with rat heart are included in the method section on page 20, the 1st paragraph. The results are described on page 7, the 1st paragraph, and discussed on page 12, the 2nd paragraph. The legend to Suppl. Fig. 4 is included in the file “Suppl. figure legend_R”.

      The authors have never investigated the possibility of NO involvement in this mice model.

      As mentioned above, we made a great effort and performed new experiments in a human monocyte cell line (THP1) in which iNOS is not expressed and not inducible by LPS, and in which the arg-ii gene was knocked out by CRISPR technology. Moreover, murine bone-marrow-derived macrophages in which the inos gene was ablated were also used for this purpose. The results show that Arg-II and iNOS can be upregulated by LPS independently of each other and that iNOS slightly reduces Arg-II expression. However, both Arg-II and iNOS are required for IL-1b production upon LPS stimulation. For detailed results and discussion, please see the answers to comment point 2 or point 6 raised by this reviewer.

      A co-culture system would be appropriate to understand the non-cell-autonomous functions of macrophages.

      We appreciate the suggestion by this reviewer regarding the co-culture system to test the non-cell-autonomous role of Arg-II. We think that our current model, which involves treating cells with conditioned media, is a well-established and effective method for demonstrating the non-cell-autonomous role of Arg-II. This approach allows us to observe the effects of Arg-II on surrounding cells through the factors present in the conditioned media released from macrophages. The co-culture system could be considered if the released factor in the conditioned medium were not stable. This is, however, not the case. Therefore, we are confident that our experimental model with conditioned medium is sufficient to demonstrate a paracrine effect of cell-cell interaction (please also see the answers to comment point 16).

      The Myocardial infarction data shown in the mice model may not be directly linked to cardiac aging.

      As we have introduced and discussed in the manuscript, aging is a predominant risk factor for cardiovascular disease (CVD). Studies in experimental animal models and in humans provide evidence that the aging heart is more vulnerable to stressors such as ischemia/reperfusion injury and myocardial infarction than the heart of young individuals. Even in the heart of apparently healthy individuals of old age, chronic inflammation, cardiomyocyte senescence, cell apoptosis, interstitial/perivascular tissue fibrosis, endothelial dysfunction and endothelial-mesenchymal transition (EndMT), and cardiac dysfunction with either preserved or reduced ejection fraction are observed. Our study aims to investigate the role of Arg-II in the cardiac aging phenotype and in age-associated cardiac vulnerability to stressors. Therefore, cardiac functional changes and myocardial infarction in response to ischemia/reperfusion injury are suitable surrogate parameters for this purpose.

      Reviewer #2 (Public Review):

      Summary:

      The results from this study demonstrated a cell-specific role of mitochondrial enzyme arginase-II (Arg-II) in heart aging and revealed a non-cell-autonomous effect of Arg-II on cardiomyocytes, fibroblasts, and endothelial cells through the crosstalk with macrophages via inflammatory factors, such as IL-1b, as well as a cell-autonomous effect of Arg-II through mtROS in fibroblasts contributing to cardiac aging phenotype. These findings highlight the significance of non-cardiomyocytes in the heart and bring new insights into the understanding of pathologies of cardiac aging. They also provide new evidence for the development of therapeutic strategies, such as targeting the ArgII activation in macrophages.

      We're grateful for the reviewer's positive feedback, acknowledging the significant findings of our study on the role of arginase-II (Arg-II) in cardiac aging. We appreciate this reviewer’s insight into the therapeutic potential of targeting Arg-II activation in macrophages and are excited about the implications for future interventions in age-related cardiac pathologies. Thank you for recognizing the importance of our work in advancing our understanding of cardiac aging and potential therapeutic strategies.

      Strengths:

      This study targets an important clinical challenge, and the results are interesting and innovative. The experimental design is rigorous, the results are solid, and the representation is clear. The conclusion is logical and justified.

      We thank this reviewer for the positive comment.

      Weaknesses:

      The discussion could be extended a little bit to improve the realm of the knowledge related to this study.

      We appreciate this comment and have added and revised our discussion on this aspect accordingly at the end of the discussion section on page 19.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have several critical concerns, specifically about the mechanism of how Arg-II plays a role in cardiac aging.

      My major concerns are:

      (1) The authors have shown non-cell-autonomous effects on aging cardiomyocytes, fibroblasts, and endothelial cells mediated by IL-1b from aging macrophages. A macrophage-specific Arg-II knock-out mouse model is a suitable and necessary control to establish claims.

      We fully agree with this comment of the reviewer. Unfortunately, a macrophage-specific arg-ii knockout animal model is not yet available. Future research should develop the macrophage-specific arg-ii<sup>-/-</sup> mouse model to confirm this conclusion in aging animals. Since Arg-II is also expressed in fibroblasts and endothelial cells and exerts cell-autonomous and paracrine functions, aging mouse models with conditional arg-ii knockout in the specific cell types would be the next step to elucidate the cell-specific function of Arg-II in cardiac aging. We have pointed out this aspect for future research on page 19, lines 2 to 6.

      (2) This study suggests that Arg-II exerts its effect through IL-1b in cardiac ageing. However, all experiments performed to demonstrate the link between ArgII and IL-1β are correlative at best. The underlying molecular mechanism, including transcription factors involved in the regulation of IL-1β by arg-ii, has not been demonstrated.

      We sincerely appreciate this reviewer’s comment on this aspect! To make it clear, a causal role of Arg-II in promoting IL-1β production in macrophages is evidenced by the experimental results showing that the old arg-ii<sup>-/-</sup> mouse heart has lower IL-1β levels than the age-matched wt mouse heart (Fig. 6A to 6D). We further showed that cellular IL-1β protein levels and release are reduced in old arg-ii<sup>-/-</sup> mouse splenic macrophages as compared to the wt cells (Fig. 7A, 7C, and 7D). This result is further confirmed with the mouse macrophage cell line RAW264.7 (Suppl. Fig. 5A and Suppl. Fig. 5C), in which we demonstrate that silencing arg-ii reduces IL-1β levels upon stimulation with LPS.

      In response to this reviewer’s comment (see comment point 6), we made a further effort to investigate the possible involvement of iNOS in Arg-II-regulated IL-1β production in macrophages stimulated with LPS. We performed new experiments in a human monocyte cell line (THP1) in which iNOS is not expressed and not inducible by LPS, and in which the arg-ii gene was knocked out by CRISPR technology.

      Moreover, murine bone-marrow-derived macrophages in which the inos gene was ablated were also used for this purpose. We found that in the human THP1 monocytes, in which Arg-II but not iNOS is induced by LPS (100 ng/mL for 24 hours) (Suppl. Fig. 6A), mRNA and protein levels of IL-1b are markedly reduced in arg-ii knockout THP1<sup>arg-ii<sup>-/-</sup></sup> cells as compared to the THP1<sup>wt</sup> cells (Suppl. Fig. 6B and 6C), further confirming that Arg-II promotes IL-1b production, as also shown in RAW264.7 macrophages (Suppl. Fig. 5A and 5C). The results suggest that Arg-II promotes IL-1b production independently of iNOS. Moreover, the role of iNOS in IL-1b production was also studied in the mouse bone-marrow-derived macrophages in which the inos gene is ablated. The results demonstrate that LPS-induced IL-1b production is inhibited by inos deficiency (BMDM<sup>inos-/-</sup> vs BMDM<sup>wt</sup>) (Suppl. Fig. 6D and 6E), while Arg-II levels are slightly enhanced in the BMDM<sup>inos-/-</sup> cells (Suppl. Fig. 6D and 6F). Since arginase and iNOS share the same metabolic substrate L-arginine, <sup>inos-/-</sup> is expected to increase IL-1b production. This is, however, not the case. A strong inhibition of IL-1β production in <sup>inos-/-</sup> macrophages is observed. These results imply that iNOS promotes IL-1β production independently of Arg-II and that the inhibitory effect of inos deficiency on IL-1β is dominant and able to counteract Arg-II’s stimulating effect on IL-1β production. Hence, our results demonstrate that Arg-II promotes IL-1β production in macrophages independently of iNOS. Altogether, these results suggest that iNOS slightly reduces Arg-II expression. Arg-II and iNOS can be upregulated by LPS independently. Both Arg-II and iNOS are required for IL-1b production upon LPS stimulation (this concept is illustrated in Suppl. Fig. 6G). The new results are described on page 8, the last paragraph, and page 9, the 1st paragraph, and presented in Suppl. Fig. 6. The legend to Suppl. Fig. 6 is described in the file “Supplementary figure legend-R”. The related experimental methods are updated on page 23, the last two paragraphs, and page 26, the last paragraph. The results are discussed on page 14, the last paragraph, and page 15, the first two paragraphs.

      (3) Figure 2: The authors have not validated the whole-body Arg-II knock-out mice for arg-ii ablation.

      Thanks for pointing out this missing information! We have added the information regarding genotyping of the mice in the method section on page 20, first paragraph. Moreover, Fig. 5C also confirms the genotyping of the non-cardiomyocyte cells isolated from wt and arg-ii<sup>-/-</sup> animals.

      (4) It is unclear why the authors have chosen to focus on IL-1β specifically, among other pro-inflammatory cytokines that were also downregulated in Arg-II-/- mice as demonstrated in Fig. 2A-D.

      We appreciate the reviewer's question, which provides an opportunity to delve deeper into our findings. In our investigation, we observed that aging is accompanied by elevated levels of various proinflammatory markers. Intriguingly, our data revealed that tnf-α remained unaffected by the ablation of arg-ii during aging in the heart tissues, while Il-1β showed a significant reduction in arg-ii<sup>-/-</sup> animals compared to age-matched wild-type (wt) mice (Fig. 2). Mcp1 is, however, a chemoattractant for macrophages, and F4-80 serves as a pan marker for macrophages. Moreover, our previous studies demonstrate a relationship between Arg-II and IL-1β in vascular disease and obesity and in age-associated renal and pulmonary fibrosis. Finally, IL-1β has been shown to play a causal role in patients with coronary atherosclerotic heart disease, as shown by the CANTOS trial. Therefore, we have focused on IL-1β in this study. We have now explained and strengthened this aspect in the manuscript on page 7, the last two lines, and page 8, the 1st paragraph, as follows:

      “Taking into account that our previous studies demonstrated a relationship of Arg-II and IL-1β in vascular disease and obesity (Ming et al., 2012) and in age-associated organ fibrosis such as renal and pulmonary fibrosis (Huang et al., 2021; Zhu et al., 2023), and IL-1β has been shown to play a causal role in patients with coronary atherosclerotic heart disease as shown by CANTOS trials (Ridker et al., 2017), we therefore focused on the role of IL-1β in crosstalk between macrophages and cardiac cells such as cardiomyocytes, fibroblasts and endothelial cells”.

      (5) Although macrophages are shown to be involved in cardiac ageing in the arg-ii mouse model, the authors have not estimated macrophage infiltration and expression of inflammatory or senescence markers in the hearts of these mice.

      Thank you very much for raising this important point! Taking the comments of the reviewer into account, we have performed new experiments, i.e., multiple immunofluorescent staining to analyze the infiltrated (CCR2<sup>+</sup>/F4-80<sup>+</sup>) and resident (LYVE1<sup>+</sup>/F4-80<sup>+</sup>) macrophage populations, to investigate to what extent Arg-II affects the infiltrated and resident macrophage populations in the aging heart and whether this is regulated by arg-ii<sup>-/-</sup>. The results show an age-associated increase in the numbers of F4/80<sup>+</sup> cells in the wt mouse heart, which is reduced in the age-matched arg-ii<sup>-/-</sup> animals (Fig. 2G). This result is in accordance with the result of f4/80 gene expression shown in Fig. 2A, demonstrating that arg-ii gene ablation reduces macrophage accumulation in the aging heart. Interestingly, resident macrophages as characterized by LYVE1<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2E and 2H) are predominant in the aging heart as compared to the infiltrated CCR2<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2F and 2I). The increase in both LYVE1<sup>+</sup>/F4-80<sup>+</sup> and CCR2<sup>+</sup>/F4-80<sup>+</sup> macrophages in the aging heart is reduced in arg-ii<sup>-/-</sup> mice (Fig. 2E, 2F, 2H, and 2I). These new results are described on page 6, the 1st paragraph, presented in Fig. 2E to 2I, and discussed on page 13, the 2nd paragraph. The legend to Fig. 2 is revised. The method for this additional experiment is included on page 22, the 1st paragraph.

      Moreover, the age-associated accumulation of senescent cells, as demonstrated by p16<sup>ink4</sup>-positive cells, is significantly reduced in arg-ii<sup>-/-</sup> animals. This new result is incorporated in Fig. 1 as Fig. 1G and 1H and described/discussed on page 5, the 2nd paragraph, and page 14, the 2nd last sentences of the 1st paragraph. The method of p16<sup>ink4</sup> staining is included in the method section on page 22, the 1st paragraph, line 7. The legend to Fig. 1 is revised accordingly.

      (6) Previously, Arg-II has been reported to serve a crucial role in ageing associated with reduced contractile function in rat hearts by regulating Nitric Oxide Synthase (PMID: 22160208). Elevated NO and superoxide have been shown to play crucial roles in the etiology of cardiovascular diseases (PMID: 24180388). Therefore, it is important to assess whether Nitric Oxide (NO) is involved in the aging-related phenotype in this mouse model.

      Following the reviewer's suggestion, we conducted new experiments to investigate the role of nitric oxide (NO) in Arg-II-induced IL-1β production in macrophages. We have addressed this question in our response to point 2.

      (7) Based on the results demonstrated in the study, ablation of Arg-II can be expected to cause a reduction in inflammation-associated phenotypes throughout the body at the multi-organ level. The observed improved cardiac phenotype could be an outcome of whole-body Arg-II ablation. It would be fruitful to develop a cardiac-specific Arg-II knockout mouse model to establish the role of Arg-II in the heart, independent of other organ systems.

      We agree with the reviewer on this point. Unfortunately, as explained above (see point 1), we are currently unable to perform the requested experiments because a cardiac-specific arg-ii knockout mouse model is not available. Moreover, such an approach is complicated by the absence of Arg-II in cardiomyocytes and by its expression in multiple cell types, including endothelial cells, fibroblasts, and macrophages of different origins (resident and monocyte-derived infiltrating cells). It is thus difficult to generate a cardiac-specific gene knockout mouse. Future work should investigate the roles of cell-specific Arg-II in cardiac aging by generating cell-specific arg-ii<sup>-/-</sup> mice. We very much appreciate this important aspect and have discussed the issue on page 19, lines 2 to 6.

      (8) Contrary to the findings in this paper, Arg-II has previously been reported to be essential for IL-10-mediated downregulation of pro-inflammatory cytokines, including IL-1β (PMID: 33674584).

      Thank you very much for mentioning this study! We have now thoroughly discussed the controversy on page 15, the last paragraph, and page 16, the 1st paragraph, as follows:

      “It is of note that a study reported that Arg-II is required for IL-10 mediated-inhibition of IL-1b in mouse BMDM upon LPS stimulation (Dowling et al., 2021), which suggests an anti-inflammatory function of Arg-II. The results of our present study, however, demonstrate that LPS enhances Arg-II and IL-1b levels in macrophages and knockout or silencing Arg-II reduces IL-1b production and release, demonstrating a pro-inflammatory effect of Arg-II. Our findings are supported by the study from another group, which shows decreased pro-inflammatory cytokine production including IL-6 and IL-1b in arg-ii<sup>-/-</sup> BMDM most likely through suppression of NFkB pathway, since arg-ii<sup>-/-</sup> BMDM reveals decreased activation of NFkB and IL-1b levels upon LPS stimulation (Uchida et al., 2023). Most importantly, our previous study also showed that re-introducing arg-ii gene back to the arg-ii<sup>-/-</sup> macrophages markedly enhances LPS-stimulated pro-inflammatory cytokine production (Ming et al., 2012), providing further evidence for a pro-inflammatory role of arg-ii under LPS stimulation. In support of this conclusion, chronic inflammatory diseases such as atherosclerosis and type 2 diabetes (Ming et al., 2012), inflammaging in lung (Zhu et al., 2023), kidney (Huang et al., 2021) and pancreas (Xiong, Yepuri, Necetin, et al., 2017) of aged animals or acute organ injury such as acute ischemic/reperfusion or cisplatin-induced renal injury are reduced in the arg-ii<sup>-/-</sup> mice (Uchida et al., 2023). The discrepant findings between these studies and that with IL-10 may implicate dichotomous functions of Arg-II in macrophages, depending on the experimental context or conditions. Nevertheless, our results strongly implicate a pro-inflammatory role of Arg-II in macrophages in the inflammaging in aging heart”.

      (9) The authors have only performed immunofluorescence-based experiments to show fibrotic and apoptotic phenotypes throughout this study. To verify these findings, we suggest that they additionally perform RT-PCR or western blotting analysis for fibrotic markers and apoptotic markers.

      The fibrotic aspect was analyzed not only by microscopy but also by a quantitative biochemical assay, namely hydroxyproline content assessment. Hydroxyproline is a major component of collagen and is largely restricted to collagen. Therefore, hydroxyproline levels can be used as an indicator of collagen content, as previously done in the lung (Zhu et al., 2023). As suggested by the reviewer, we have also measured collagen gene expression by RT-qPCR and found an age-related decline in collagen mRNA levels in both wt and arg-ii<sup>-/-</sup> mice, suggesting that the age-associated cardiac fibrosis and its prevention in arg-ii<sup>-/-</sup> mice are due to alterations in translational and/or post-translational regulation, including collagen synthesis and/or degradation. These results are in accordance with those reported in other published studies. We have pointed out this aspect on page 5, the 2nd paragraph:

      “The increased cardiac fibrosis in aging is however, associated with decreased mRNA levels of collagen-Ia (col-Ia) and collagen-IIIa (col-IIIa), the major isoforms of pre-collagen in the heart (Suppl. Fig. 2A and 2B), which is a well-known phenomenon in cardiac fibrotic remodelling (Besse et al., 1994; Horn et al., 2016). The results demonstrate that age-associated cardiac fibrosis and prevention in arg-ii<sup>-/-</sup> mice is due to alterations of translational and/or post-translational regulations including collagen synthesis and/or degradation”.

      The results are presented in Suppl. Fig. 2, and the legend to Suppl. Fig. 2 is included in the file “Suppl. figure legend_R”. Suppl. table 2 for primers is revised accordingly.

      We did not use additional markers to perform apoptosis assays with whole heart, since Fig. 3 provides good evidence that aging is associated with an increase in apoptotic cells in the heart, which is significantly reduced in the arg-ii<sup>-/-</sup> mice. The reduction of TUNEL-positive (apoptotic) cells in aged arg-ii<sup>-/-</sup> mice is mainly due to a decrease in apoptotic cardiomyocytes. Histological analysis allows the apoptotic cell types to be identified. Moreover, biochemical assays for apoptosis, such as caspase-3 cleavage in whole heart tissue, cannot distinguish apoptotic cell types and may not be sensitive enough for the aging heart, owing to the relatively low number of apoptotic cells in the aging heart compared with a myocardial infarction model.

      (10) Figure 4: arg-ii has previously been reported to be expressed in rat cardiomyocytes (PMID: 16537391). We strongly suggest the authors verify the expression of Arg-II via immunostaining in isolated cardiomyocytes (using published protocols), and by using multiple different cardiomyocyte-specific markers for colocalization studies to prove the lack of arg-ii expression beyond a reasonable doubt.

      As pointed out by this reviewer, Arg-II was previously reported to be expressed in cardiomyocytes isolated from rats (PMID: 16537391). Unfortunately, negative controls, i.e., arg-ii<sup>-/-</sup> samples, were not included in that study to exclude possible background signals. We made a great effort to investigate whether Arg-II is present in cardiomyocytes from different species, including mice, rats, and humans, and have included old arg-ii<sup>-/-</sup> mouse samples as a negative control. This allows us to validate the antibody specificity and rule out background noise beyond any reasonable doubt. The new experiments in Suppl. Fig. 4 confirm the specificity of the antibody against Arg-II in old mouse kidney, which is known to express Arg-II in the S3 proximal tubular cells (Huang J, et al. 2021). To exclude possible species-specific differences in Arg-II expression in cardiomyocytes, aged mouse and rat heart tissues were used for cellular localization of Arg-II by confocal immunofluorescence staining. As shown in Suppl. Fig. 4B and 4C, both species show Arg-II expression only in non-cardiomyocytes (cells between striated cardiomyocytes) (red arrows) but not in striated cardiomyocytes. Even in rat myocardial infarction tissue, Arg-II was not found in cardiomyocytes but in endocardial cells (Suppl. Fig. 4B). In isolated cardiomyocytes exposed to hypoxia, a well-known strong stimulus for Arg-II protein levels, no Arg-II signal could be detected, whereas fibroblasts from the same animals showed elevated Arg-II levels under hypoxia (Fig. 5B). Furthermore, RT-qPCR could not detect arg-ii mRNA in cardiomyocytes but did detect it in non-cardiomyocytes (Fig. 5C). Altogether, these results demonstrate that Arg-II is not expressed, or is expressed at negligible levels, in cardiomyocytes but is expressed in non-cardiomyocytes. The new experiments with rat heart are included in the method section on page 20, the 1st paragraph. The results are described on page 7, the 1st paragraph, and discussed on page 12, the 2nd paragraph. The legend to Suppl. Fig. 4 is included in the file “Suppl. figure legend_R”.

      (11) Figure 6G: It may be worthwhile to supplement arg-ii<sup>-/-</sup> old cells with IL-1beta to see if there is an increase in TUNEL-positive cells.

      IL-1β is a well-known pro-inflammatory cytokine that causes apoptosis in various cell types, including cardiomyocytes (Shen Y., et al., Tex Heart Inst J. 2015;42:109–116. doi: 10.14503/THIJ-14-4254; Liu Z. et. al., Cardiovasc Diabetol 2015;14,125. doi: 10.1186/s12933-015-0288-y; Li. Z., et al., Sci Adv 2020;6:eaay0589. doi: 10.1126/sciadv.aay0589). We very much appreciate the reviewer's interesting idea of investigating the apoptotic responses of cardiomyocytes from arg-ii<sup>-/-</sup> mice to IL-1β. We agree that cardiomyocytes from wt and arg-ii<sup>-/-</sup> mice may react differently to IL-1β, although cardiomyocytes do not express Arg-II, as demonstrated in our present study. If this is true, it must be due to non-cell-autonomous effects of a different aging microenvironment in the heart or to epigenetic modulation of the myocytes. We find this a very interesting aspect that requires further extensive investigation. Since our current study focuses on the effect of wt and arg-ii<sup>-/-</sup> macrophages on cardiomyocytes and non-cardiomyocytes, we prefer not to include this aspect in the present manuscript and would like to explore it in a follow-up study.

      (12) Figures 4-9: It would be interesting to see if the effect of ArgII in cardiac ageing is gender-specific. It is recommended to include experimental data with male mice in addition to the results demonstrated in female mice.

      As pointed out in the manuscript, we have focused on female mice because the age-associated increase in arg-ii expression is more pronounced in females than in males (Fig. 1A). As suggested by this reviewer, we performed additional experiments investigating the effects of arg-ii deficiency in male mice during aging, focusing on the pathophysiological outcomes of ischemia/reperfusion (I/R) injury in ex vivo experiments. The ex vivo functional experiments with the Langendorff system were performed in aged male mice (see Suppl. Fig. 9). Following I/R injury, wt male mice display reduced left ventricular developed pressure (LVDP), as well as reduced inotropic and lusitropic states (expressed as dP/dt max and dP/dt min, respectively). As previously reported (Murphy et al., 2007), we also found that old male mice are more prone to I/R injury than age-matched female animals. Specifically, 15 minutes of ischemia is enough to significantly affect left ventricular contractile function in male mice (Suppl. Fig. 9). In contrast, age-matched old female mice are relatively resistant to I/R injury, and at least 20 min of ischemia is necessary to induce a significant impairment of contractile function (Fig. 10). Similar to females, the post-I/R recovery of cardiac function is also significantly improved in male arg-ii<sup>-/-</sup> mice as compared to age-matched wt animals. In addition to functional recovery, the infarct size upon I/R injury in males, as assessed by triphenyl tetrazolium chloride (TTC) staining, is significantly reduced in the age-matched male arg-ii<sup>-/-</sup> animals (Suppl. Fig. 9C and 9D). Altogether, these results reveal a role for Arg-II in heart function impairment during aging in both sexes, with a higher vulnerability to stress in males. These new results are presented in Suppl. Fig. 9 and described on page 10, the last paragraph, and page 11. The results are discussed on page 18, the 2nd paragraph, as follows:

      “The fact that aged females have higher Arg-II but are more resistant to I/R injury seems contradictory to the detrimental effect of Arg-II in I/R injury. It is presumable that cardiac vulnerability to injuries stressors depends on multiple factors/mechanisms in aging. Other factors/mechanisms associated with sex may prevail and determine the higher sensitivity of male heart to I/R injury, which requires further investigation. Nevertheless, the results of our study show that Arg-II plays a role in cardiac I/R injury also in males”.

      The information on the experimental methods in the male animals is included on page 20, the last paragraph and page 21, the 1st paragraph. Legend to Suppl. Fig. 9 is included in the file “Suppl. figure legend_R”.

      (13) Figure 6G: cardiomyocytes from wild-type mice, when treated with macrophages, show 0% TUNEL-positive cells. Since it is unlikely to obtain no TUNEL staining in a cell population, there may be an experimental or analytical error.

      This panel is now Fig. 7F and 7G. The absence of TUNEL-positive cells is due to our specific experimental procedure. After tissue digestion, cardiomyocytes were plated on laminin-coated dishes. Laminin promotes the adhesion of surviving cells. Following plating, we performed thorough washing to remove damaged and partially adherent cells. This step ensures that only well-shaped, viable, and strongly adherent cells remain as bioassay cells. These “healthy” cells are then used for the experiments. Because the apoptotic cells are washed out, the bioassay cells show high viability. We have added this detailed information to the method section on page 24, the 2nd paragraph.

      (14) Figure 7J: Please assess whether arg-ii depletion also affects the mtROS phenotype.

      Following the suggestion of this reviewer, we performed new experiments showing that human cardiac fibroblasts (HCFs) exposed to hypoxia (1% O<sub>2</sub>, 48 hours), a known physiological trigger of Arg-II up-regulation, exhibit increased mtROS generation, which involves Arg-II (new Fig. 8M to 8P). We found that the Arg-II protein level and mtROS (assessed by mitoSOX staining) were both enhanced, accompanied by increased levels of HIF1α (Fig. 8M). Moreover, mito-TEMPO pre-incubation reduces mtROS, confirming the mitochondrial origin of the ROS. Silencing of arg-ii with rAd-mediated shRNA significantly reduces mtROS levels, demonstrating a role of Arg-II in the production of mitochondrial ROS in cardiac fibroblasts (Fig. 8M to 8P). We have included these results on page 9, the last paragraph, and discussed them on page 17, the 1st paragraph. The related method is described on page 26, the 2nd paragraph. The legend to Fig. 8 is updated on page 32.

      (15) Figure 8A-E: The authors have treated human-origin endothelial cells with mice-origin macrophage-conditioned media. It would be more suitable to treat the endothelial cells with human-origin macrophage-conditioned media.

      We acknowledge the concern regarding the use of mouse-origin macrophage-conditioned media on human-origin endothelial cells. Of note, the biological cross-reactivity of cytokines from one species on cells from a different species has been reported in the literature. A fairly strict threshold of 60% amino acid identity has been observed, above which cytokines tend to cross-react, and cytokines cross-react more often as their % amino acid identity increases (Scheerlinck JPY. Functional and structural comparison of cytokines in different species. Vet Immunol Immunopathol. 1999; 72:39-44. https://doi.org/10.1016/S0165-2427(99)00115-4). Taking IL-1β as an example, the 17.5 kDa mature mouse and human IL-1β share 92% aa sequence identity, suggesting a high cross-reactivity. Indeed, human IL-1β has shown biological cross-reactivity in mouse cells (Ledesma E., et al. Interleukin-1 beta (IL-1β) induces tumor necrosis factor alpha (TNF-α) expression on mouse myeloid multipotent cell line 32D cl3 and inhibits their proliferation. Cytokine. 2004; 26:66-72. https://doi.org/10.1016/j.cyto.2003.12.009). Moreover, our results also support the reported cross-reactivity between human and mouse IL-1β. The CM from mouse macrophages indeed showed biological activity in human endothelial cells. The observed effects of the conditioned media from aged wild-type macrophages on endothelial cells were specifically mediated through IL-1β. This conclusion is supported by our data showing that the upregulation induced by the conditioned media was significantly reduced by the addition of an IL-1β receptor blocker.
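      The identity criterion cited above can be illustrated with a minimal Python sketch. It is for illustration only and is not part of the manuscript's analysis: the function names are hypothetical, the input is assumed to be two already-aligned sequences of equal length with "-" marking gaps, and the toy peptides are invented rather than real IL-1β sequences. Percent identity is computed here over gap-free aligned positions, which is only one of several conventions in use.

```python
# Threshold taken from the Scheerlinck (1999) observation quoted in the text.
CROSS_REACTIVITY_THRESHOLD = 60.0


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over gap-free positions of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    return 100.0 * matches / compared


def likely_cross_reactive(aligned_a: str, aligned_b: str) -> bool:
    """Apply the 60% identity rule of thumb (a heuristic, not a guarantee)."""
    return percent_identity(aligned_a, aligned_b) > CROSS_REACTIVITY_THRESHOLD


# Invented toy peptides, NOT real cytokine sequences.
toy_species_1 = "ACDEFGHIKL"
toy_species_2 = "ACDEFGHIKV"
print(percent_identity(toy_species_1, toy_species_2))
print(likely_cross_reactive(toy_species_1, toy_species_2))
```

      Such a score only indicates that cross-reactivity is plausible; the functional evidence in this response remains the IL-1β receptor blocker experiment.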

      (16) The co-culture system would be more interesting to test the non-cell autonomous role of Arg II.

      We appreciate the reviewer's suggestion to use a co-culture system to test the non-cell-autonomous role of Arg-II. We believe that our current model, which involves treating cells with conditioned media, is a well-established and effective method for demonstrating the non-cell-autonomous role of Arg-II. This approach allows us to observe the effects of Arg-II on surrounding cells through the factors present in the conditioned media. A co-culture system would be worth considering if the factor released into the conditioned medium were unstable. This is, however, not the case. We are therefore confident that our experimental model with conditioned medium is adequate to demonstrate a paracrine effect in cell-cell interaction.

      Reviewer #2 (Recommendations For The Authors):

      Some minor comments may be considered to improve the realm of the knowledge related to this study.

      We appreciate this comment and have added and revised our discussion on this aspect accordingly at the end of the discussion section on page 19, the last 6 lines.

      (1) The current study showed strong evidence demonstrating the key role of cardiac macrophages in pathologies of cardiac aging, particularly, the macrophages (MФ) from the circulating blood (hematogenous). It is known that the heart is among the minority of organs in which substantial numbers of yolk-sac MФ persist in adulthood and play a crucial role in maintaining cardiac function. Thus, the adult mammalian heart contains two separate and discrete cardiac MФ subgroups, i.e., the resident MФs originated from yolk sac-derived progenitors and the hematogenous MФs recruited from circulating blood monocytes. These two subtypes of MФs may play distinctive roles in the aging heart and the response to cardiac injury. The author could extend the discussion on the possibility of the resident MФs in aging hearts, which could be further investigated in the future.

      We appreciate the suggestion and agree that it provides valuable insight into the study. Taking the comments of reviewer 1 into account, we have performed new experiments, i.e., co-immunostaining, to analyze the infiltrated (CCR2<sup>+</sup>/F4/80<sup>+</sup>) and resident (LYVE1<sup>+</sup>/F4/80<sup>+</sup>) macrophage populations and to investigate to what extent Arg-II affects these populations in the aging heart. We found that, in line with the gene expression of f4/80, immunofluorescence staining reveals an age-associated increase in the number of F4/80<sup>+</sup> cells in the wt mouse heart, which is reduced in the age-matched arg-ii<sup>-/-</sup> animals (Fig. 2E, F, G), demonstrating that arg-ii gene ablation reduces macrophage accumulation in the aging heart. Interestingly, resident macrophages, characterized as LYVE1<sup>+</sup>/F4/80<sup>+</sup> cells (Fig. 2E and 2H), are predominant in the aging heart as compared to the infiltrated CCR2<sup>+</sup>/F4/80<sup>+</sup> cells (Fig. 2F and 2I). The increase in both LYVE1<sup>+</sup>/F4/80<sup>+</sup> and CCR2<sup>+</sup>/F4/80<sup>+</sup> macrophages in the aging heart is reduced in arg-ii<sup>-/-</sup> mice (Fig. 2E, 2F, 2H, and 2I). These new results are described on page 6, the 1st paragraph, presented in Fig. 2E to 2I, and discussed on page 13, the 2nd paragraph. The legend to Fig. 2 is revised. The method for this additional experiment is included on page 22, the 1st paragraph.

      (2) It would be beneficial to the readers if the author could provide some explanation about why ArgII could not be detected in VSMCs in the mouse heart and the species difference between humans and mice. In addition, the author may provide an assumption on the possibility that there may also be a cross-talk between macrophages and VSMCs in the aging heart. A little bit more explanation in the Discussion will be helpful.

      We appreciate the suggestion and have discussed these points on page 19 as follows:

      “In this context, another interesting aspect is the cross-talk between macrophages and vascular SMC in the aging heart. In our present study, we could not detect Arg-II in vascular SMC of mouse heart but in that of human heart. This could be due to the difference in species-specific Arg-II expression in the heart or related to the disease conditions in human heart which is harvested from patients with cardiovascular diseases. Indeed, in the apoe<sup>-/-</sup> mouse atherosclerosis model, aortic SMCs do express Arg-II (Xiong et al., 2013). It is interesting to note that rodents hardly develop atherosclerosis as compared to humans. Whether this could be partly contributed by the different expression of Arg-II in vascular SMC between rodents and humans requires further investigation. In our present study, the aspect of the cross-talk between macrophages and vascular SMC is not studied. Since the crosstalk between macrophages and vascular SMC has been implicated in the context of atherogenesis as reviewed (Gong et al., 2025), further work shall investigate whether Arg-II expressing macrophages could interact with vascular SMC in the coronary arteries in the heart and contribute to the development of coronary artery disease and/or vascular remodelling and the underlying mechanisms”.

      (3) Please clarify the arrows in Figure 9C that indicate the infarct area in each splicing section from one heart.

      The arrows in Figure 9C (now Fig. 10C) are indeed utilized to indicate the sections displaying the infarcted area within each splicing section from one heart. We have explained the arrow in the figure legend (now Fig. 10 and also new Suppl. Fig. 9).

    1. Author response:

      Our response aims to address the following:

      The lack of pleiotropy is an unconfirmable assumption of MR, and the addition of those models is therefore quite important, as this is a primary weakness of the MR approach. Given that concern, I read the sensitivity analyses using pleiotropy-robust models as the main result, and in that case, they can't test their hypotheses as these models do not show a BMI instrumental variable association. The other weakness, which might be remedied, is that the power of the tests here is not described. When a hypothesis is tested with an under-powered model, the apparent lack of association could be due to inadequate sample size rather than a true null. Typically, when a statistically significant association is reported, power concerns are discounted as long as the study is not so small as to create spurious findings. That is the case with their primary BMI instrumental variable model - they find an association so we can presume it was adequately powered. But the primary models they share are not the pleiotropy-robust methods MR-Egger, weighted median, and weighted mode. The tests for these models are null, and that could mean a couple of things: (1) the original primary significant association between the BMI genetic instrument was due to pleiotropy, and they therefore don't have a robust model to explore the effects of the tobacco genetic instrument. (2) The power for the sensitivity analysis models (the pleiotropy-robust methods) is inadequate, and the authors share no discussion about the relative power of the different MR approaches. If they do have adequate power, then again, there is no need to explore the tobacco instrument.

      We would like to highlight that post-hoc power calculations are often considered redundant since the statistical power estimated for an observed association is directly related to its p-value[1]. In other words, the uncertainty of the association is already reflected in its 95% confidence interval. However, we understand power calculations may still be of interest to the reader, so we will incorporate them in the revised manuscript.
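      The redundancy noted above can be illustrated with a minimal Python sketch. It is not part of our analysis: it assumes a two-sided z-test and treats the observed effect as if it were the true effect, and the function name is hypothetical. Under these assumptions the post-hoc ("observed") power is a function of the p-value and the significance level alone, so it carries no information beyond the p-value.

```python
from statistics import NormalDist


def post_hoc_power(p_value: float, alpha: float = 0.05) -> float:
    """Observed power of a two-sided z-test, computed from its p-value alone.

    Assumes the observed effect equals the true effect (the usual
    post-hoc power convention).
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    std_normal = NormalDist()
    # |z| statistic implied by the two-sided p-value.
    z_observed = -std_normal.inv_cdf(p_value / 2.0)
    # Critical value for the chosen significance level.
    z_critical = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Probability that a new |z| drawn around z_observed exceeds z_critical.
    return (std_normal.cdf(z_observed - z_critical)
            + std_normal.cdf(-z_observed - z_critical))


for p in (0.5, 0.05, 0.01):
    print(p, round(post_hoc_power(p), 3))
```

      In this setting a result sitting exactly at the significance threshold has an observed power of roughly one half, and smaller p-values map monotonically to higher observed power.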

      We use inverse variance weighted (IVW) Mendelian randomization (MR) to obtain our main results, rather than the pleiotropy-robust methods mentioned by the reviewer/editors (i.e., MR-Egger, weighted median and weighted mode), because the former has greater statistical power than the latter[2]. Hence, instead of focussing on the statistical significance of the pleiotropy-robust analyses, we consider it more valuable to compare the consistency of the size and direction of the effect estimates across methods. Any evidence of such consistency increases our confidence in our main findings, since each method relies on different assumptions. As we cannot be sure about the presence and nature of horizontal pleiotropy, it is useful to compare results across methods even though they are not equally powered. It is true that our results for the genetically predicted effects of body mass index (BMI) on the risk of head and neck cancer (HNC) differ across methods. This is precisely what led us to question the validity of our main finding (suggesting a positive effect of BMI on HNC risk). We will clarify this in the discussion section of the revised manuscript as advised.
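      For readers less familiar with the IVW approach, the estimator can be sketched as follows. This is a minimal illustration rather than our analysis pipeline: it assumes a fixed-effect IVW model with first-order weights applied to per-variant summary statistics, the function name is hypothetical, and the numbers are invented.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-variant summary statistics.

    Each variant contributes a Wald ratio (beta_outcome / beta_exposure),
    weighted by beta_exposure**2 / se_outcome**2 (first-order weights).
    Returns the pooled estimate and its standard error.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not beta_exposure:
        raise ValueError("at least one variant is required")
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = sqrt(1.0 / total_weight)
    return estimate, standard_error


# Invented summary statistics for three hypothetical variants.
bx = [0.10, 0.20, 0.05]      # variant-exposure associations
by = [0.04, 0.11, 0.03]      # variant-outcome associations
se_y = [0.01, 0.02, 0.01]    # standard errors of the outcome associations
print(ivw_estimate(bx, by, se_y))
```

      Because the pooled estimate uses every variant with precision weights, its standard error is smaller than that of estimators that rely on a subset of variants or on an additional intercept term, which is the sense in which IVW is the better-powered method when its assumptions hold.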

      We understand that the reviewer/editors are concerned that we do not have a robust model to explore the role of tobacco consumption in the link between BMI and HNC. However, we have a different perspective on the matter. If indeed, the main IVW finding for BMI and HNC is due to pleiotropy (since some of the pleiotropy-robust methods suggest conflicting results), then the IVW multivariable MR method is a way to explore the potential source of this bias[3]. We were particularly interested in exploring the role of smoking in the observed association because smoking and adiposity are known to influence each other [4-9] and share a genetic basis[10, 11].

      References:

      (1) Heinsberg LW, Weeks DE: Post hoc power is not informative. Genet Epidemiol 2022, 46(7):390-394.

      (2) Burgess S, Butterworth A, Thompson SG: Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol 2013, 37(7):658-665.

      (3) Burgess S, Davey Smith G, Davies NM, Dudbridge F, Gill D, Glymour MM, Hartwig FP, Kutalik Z, Holmes MV, Minelli C et al: Guidelines for performing Mendelian randomization investigations: update for summer 2023. Wellcome Open Res 2019, 4:186.

      (4) Morris RW, Taylor AE, Fluharty ME, Bjorngaard JH, Asvold BO, Elvestad Gabrielsen M, Campbell A, Marioni R, Kumari M, Korhonen T et al: Heavier smoking may lead to a relative increase in waist circumference: evidence for a causal relationship from a Mendelian randomisation meta-analysis. The CARTA consortium. BMJ Open 2015, 5(8):e008808.

      (5) Taylor AE, Morris RW, Fluharty ME, Bjorngaard JH, Asvold BO, Gabrielsen ME, Campbell A, Marioni R, Kumari M, Hallfors J et al: Stratification by smoking status reveals an association of CHRNA5-A3-B4 genotype with body mass index in never smokers. PLoS Genet 2014, 10(12):e1004799.

      (6) Taylor AE, Richmond RC, Palviainen T, Loukola A, Wootton RE, Kaprio J, Relton CL, Davey Smith G, Munafo MR: The effect of body mass index on smoking behaviour and nicotine metabolism: a Mendelian randomization study. Hum Mol Genet 2019, 28(8):1322-1330.

      (7) Asvold BO, Bjorngaard JH, Carslake D, Gabrielsen ME, Skorpen F, Smith GD, Romundstad PR: Causal associations of tobacco smoking with cardiovascular risk factors: a Mendelian randomization analysis of the HUNT Study in Norway. Int J Epidemiol 2014, 43(5):1458-1470.

      (8) Carreras-Torres R, Johansson M, Haycock PC, Relton CL, Davey Smith G, Brennan P, Martin RM: Role of obesity in smoking behaviour: Mendelian randomisation study in UK Biobank. BMJ 2018, 361:k1767.

      (9) Freathy RM, Kazeem GR, Morris RW, Johnson PC, Paternoster L, Ebrahim S, Hattersley AT, Hill A, Hingorani AD, Holst C et al: Genetic variation at CHRNA5-CHRNA3-CHRNB4 interacts with smoking status to influence body mass index. Int J Epidemiol 2011, 40(6):1617-1628.

      (10) Thorgeirsson TE, Gudbjartsson DF, Sulem P, Besenbacher S, Styrkarsdottir U, Thorleifsson G, Walters GB, Consortium TAG, Oxford GSKC, consortium E et al: A common biological basis of obesity and nicotine addiction. Transl Psychiatry 2013, 3(10):e308.

      (11) Wills AG, Hopfer C: Phenotypic and genetic relationship between BMI and cigarette smoking in a sample of UK adults. Addict Behav 2019, 89:98-103.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Mast cells have previously been reported to play an important role in bacterial immune defense and act protectively in sepsis. However, many of these findings were based on studies using Kit mutant mice. In this study, the authors conducted a detailed investigation using mast cell-deficient Cpa3 Cre-Master mice. As a result, the authors found that the Cpa3 Cre-Master mice exhibited responses similar to wild-type mice in terms of bacterial immune defense. This suggests that the observed phenotype is not due to mast cell-dependent bacterial immune defense, but rather is associated with dysbiosis of the gut microbiota.

      Strengths:

      Mast cells have long been reported to play an important role in the protective response against sepsis, and their function in infection defense has been demonstrated. However, Kit mutant mice have been reported to exhibit impaired peristalsis, and several mast cell-specific genetically modified mouse lines have since been developed and examined in detail. This study presents an important finding by logically demonstrating that the exacerbation of sepsis in Kit mice is due to alterations in the gut microbiota, and that the phenotype previously thought to be mast cell-dependent was, in fact, not.

      In addition, the experiments were carefully designed using mice with matched genetic backgrounds. These findings underscore the importance of microbiota composition in interpreting immune phenotypes and highlight the need for co-housing controls in mutant mouse studies.

      A major strength of this work is the robustness of the CLP data, generated over eight years by three independent researchers across two institutions with large sample sizes, lending strong support to the conclusions.

      Weaknesses:

      The study assesses only a limited subset of gut bacterial species, leaving the extent to which E. coli expansion contributes to the observed phenotype unclear.

      We will add new data based on 16S rRNA sequencing to the revised version.

      Moreover, in the cohousing experiments, there is no evidence provided to confirm successful microbiota normalization between groups.

      We note that co-housing is a generally accepted method for microbiota equalization or conversion (Caruso et al., Cell Rep. 2019; Ridaura et al., Science 2013; reviewed in Moore et al., Clin. Transl. Immunol. 2016). In any case, Kit<sup>W/Wv</sup> mutants were made resistant to CLP by co-housing. Similar microbiota sequencing results between groups, while useful, would again only be correlative.

      A more detailed analysis of the microbial composition would be necessary to strengthen the reliability of the findings.

      See above

      It is also important to note that Cpa3-deficient mice exhibit not only mast cell depletion but also defects in basophils and T cells. These additional immunological alterations may counterbalance one another, potentially masking phenotypic changes and complicating interpretation.

      Regarding basophils in Cpa3<sup>Cre</sup> mice, compared to wild-type mice, basophils are reduced to about 39% of normal (Feyerabend et al., Immunity 2011). In Kit<sup>W/Wv</sup> mice, compared to wild-type mice, basophils are reduced to about 11% of normal. To our knowledge, there has been no phenotype reported in which a reduction in basophils compensates for the loss of mast cells. Given that Kit<sup>W/Wv</sup> mice have about threefold lower numbers of basophils, and are highly susceptible to sepsis, there is no evidence that a reduction in basophils is protective in mast cell-deficient mice. On the contrary, mice that were normal for mast cells but had their basophils depleted were more susceptible to sepsis (Piliponsky et al., Nat. Immunol. 2019). Hence, basophils appear to be protective, and their reduction increases susceptibility. In light of these data and considerations, there is no evidence for a reduction in basophils to counterbalance the loss of mast cells in Cpa3<sup>Cre</sup> mice.

      Regarding T cells, there is no evidence, and there are no reports, that Cpa3<sup>Cre</sup> mice have defects in T cells (Feyerabend et al., Immunity 2011, Feyerabend et al., Cell Metabolism 2016). Cpa3 is weakly and transiently expressed early in the T cell lineage (Feyerabend et al., Immunity 2009; for expression levels in T cells versus mast cells, see below figure from the Immgen Database). In summary, in contrast to the reviewer's claim, there are no known defects in T cell development or functions in Cpa3<sup>Cre</sup> mice.

      Author response image 1.

      Generated from the Immgen database. Shown are RNAseq gene expression levels of diverse T-cell and mast cell populations.

      Furthermore, it remains to be determined whether the altered gut microbiota observed in Kit<sup>W/Wv</sup> mice is a consequence of impaired intestinal motility, whether a similar phenotype is observed in Kit<sup>W-sh/W-sh</sup> mice, and whether comparable results occur in SCF-deficient models. Addressing these questions would provide greater clarity on the contribution of mast cells versus secondary factors in the observed phenotypes.

      Mice without mast cells (Cpa3<sup>Cre</sup> mice) are as resistant to sepsis as wild-type mice. Hence, mast cells are not involved in the immunity against sepsis, and 'secondary factors' are not involved in this simple experiment (both groups of mice, wild type and Cpa3<sup>Cre</sup> mice, were on the identical genetic background). Second, Kit<sup>W/Wv</sup> mice are also as resistant to sepsis as wild-type mice when confronted with the identical intestinal slurry. Therefore, Kit<sup>W/Wv</sup> mice have no immune deficit in response to sepsis. Hence, in our view, the underlying immunological question regarding the role of mast cells in sepsis has been conclusively addressed by our data. Future studies may address the mechanism that causes dysbiosis in Kit<sup>W/Wv</sup> mice, and other Kit mutants and steel mutants could be examined as well. These questions are, however, unrelated to the role of mast cells in sepsis, or the response of Kit<sup>W/Wv</sup> mice to sepsis, and would therefore not affect the central conclusion of our manuscript ("Susceptibility of Kit-mutant mice to sepsis caused by enteral dysbiosis, not mast cell deficiency").

      Given that Kit<sup>W/Wv</sup> mice exhibit impaired peristalsis, is the observed increase in E. coli a consequence of this dysfunction?

      See above

      Previous studies with BMMC reconstitution experiments have indicated that mast cells are a source of TNF - how does this align with the current findings?

      It is possible that cultured and transplanted mast cells (BMMC) produce TNF. Given that we did not find a reduction in TNF levels in the peritoneal lavage or serum in mice without mast cells undergoing sepsis, under physiological conditions mast cell-derived TNF does not seem to have a measurable impact on total TNF levels.

      Reviewer #2 (Public review):

      Summary:

      This study presents a useful finding that the high susceptibility to CLP sepsis of Kit-mutant mice is not due to mast cell deficiency, but to dysbiosis.

      However, the present data are insufficient and incomplete to support the conclusion, and would benefit from more rigorous approaches. With the mechanism part strengthened, this paper would be of interest to researchers on mast cell biology and mucosal immunology.

      We disagree with this view that our data are insufficient and incomplete. Our results demonstrate that mice lacking mast cells (Cpa3<sup>Cre</sup> mice) are as resistant to sepsis as wild-type mice, indicating that mast cells do not play a detectable role in immunity against sepsis. Additionally, we show that Kit<sup>W/Wv</sup> mice exhibit the same resistance to sepsis as wild-type mice when confronted with the identical intestinal slurry. This finding demonstrates that Kit<sup>W/Wv</sup> mice have no immune deficit in response to sepsis. These central data are both sufficient and complete, given that our data fully address the immunological questions regarding the role of mast cells in sepsis. Our study aimed to investigate the role of mast cells in sepsis, not to examine the mechanisms of dysbiosis or associated pathological phenotypes in Kit mutant controls.

      Recommendations:

      (1) The authors showed that E. coli increases in the cecum of Kit-mutant mice, which causes high CLP susceptibility. However, they did not provide any evidence E. coli is responsible for the high susceptibility.

      We showed that E. coli CFUs were increased in the cecum of Kit-mutant mice, but we did not state that this causes CLP susceptibility. We wrote: 'Hence, Kit<sup>W/Wv</sup> microbiota contains high levels of E. coli, which may underlie the observed pathogenicity'. We demonstrated that intestinal slurry from Kit<sup>W/Wv</sup> mice is more pathogenic compared to intestinal slurry from wild-type mice. However, we did not search for, or identify, the bacterial species that causes this increased pathogenicity because we were addressing the role of mast cells in sepsis.

      In the Figure 3 experiments, the authors administered the same number of cecal bacteria and did not show the number of E. coli after the administration.

      The samples were split: one aliquot was analysed by microbiology and the other aliquot was injected intraperitoneally. Fig. 3d shows the colony forming units (for Lactobacilli and E. coli) from aliquots of cecal slurry used in the intraperitoneal injection experiments shown in Fig. 3a-c. Hence, our data show the colony forming units that were injected into the mice. It is unclear to us why this is not the key information rather than 'the number of E. coli after the administration'.

      The authors should provide evidence showing that depletion of E. coli decreases susceptibility.

      See response to point 1 above.

      (2) The author should provide direct evidence of dysbiosis by, for example, shotgun sequencing of cecal and fecal contents.

      The large increase in E. coli counts in Kit<sup>W/Wv</sup> mice is evidence of dysbiosis. To obtain data beyond classical microbiology, we also performed 16S rRNA sequencing, which will be included in the revision.

      (3) In case the authors find dysbiosis, they should analyze the mechanisms by which Kit mutation causes dysbiosis.

      The mechanism that causes dysbiosis in Kit<sup>W/Wv</sup> mice (which emerged from our work) belongs to other research areas that address the role of Kit in intestinal pathophysiology. These questions are unrelated to the role of mast cells in sepsis, or the response of Kit<sup>W/Wv</sup> mice to sepsis. Regardless of the results of such experiments, the conclusion ("Susceptibility of Kit-mutant mice to sepsis caused by enteral dysbiosis, not mast cell deficiency") remains unaffected. In brief, further explorations of pathological phenotypes of a control mutant will not add to the core message. Along these lines, the review process and the revision shall center on making the core of a paper as conclusive as possible, and not widen a paper by requests 'tangential to the main conclusion' (Kaelin Jr. Nature 2017).

      References

      Caruso, R., Ono, M., Bunker, M. E., Núñez, G. & Inohara, N. Dynamic and Asymmetric Changes of the Microbial Communities after Cohousing in Laboratory Mice. Cell Rep. 27, 3401-3412.e3 (2019).

      Feyerabend, T. B. et al. Deletion of Notch1 Converts Pro-T Cells to Dendritic Cells and Promotes Thymic B Cells by Cell-Extrinsic and Cell-Intrinsic Mechanisms. Immunity 30, 67–79 (2009).

      Feyerabend, T. B. et al. Cre-Mediated Cell Ablation Contests Mast Cell Contribution in Models of Antibody- and T Cell-Mediated Autoimmunity. Immunity 35, 832–844 (2011).

      Feyerabend, T. B., Gutierrez, D. A. & Rodewald, H.-R. Of Mouse Models of Mast Cell Deficiency and Metabolic Syndrome. Cell Metab 24, 1–2 (2016).

      Kaelin Jr, W. G. Publish houses of brick, not mansions of straw. Nature 545, 387–387 (2017).

      Moore, R. J. & Stanley, D. Experimental design considerations in microbiota/inflammation studies. Clin. Transl. Immunol. 5, e92 (2016).

      Piliponsky, A. M. et al. Basophil-derived tumor necrosis factor can enhance survival in a sepsis model in mice. Nat. Immunol. 20, 129–140 (2019).

      Ridaura, V. K. et al. Gut Microbiota from Twins Discordant for Obesity Modulate Metabolism in Mice. Science 341, 1241214 (2013).

    1. Author response:

      The following is the authors’ response to the previous reviews

      In response to Reviewer #1, we have replaced the original images in Figure 1A with new immunofluorescence data showing matched DAPI staining density between control and AD patient samples. We also have updated the PINK1 staining images of mouse brain sections in Figure 1C to eliminate potential non-specific signals. These revisions provide clearer evidence supporting our conclusions about PINK1/pUb’s role in neurodegeneration.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this beautiful paper the authors examined the role and function of NR2F2 in testis development, and more specifically in fetal Leydig cell (FLC) development. It is well known by now that FLC develop from interstitial steroidogenic progenitors at around E12.5 and are crucial for testosterone and INSL3 production during embryonic development, which in turn shapes the internal and external genitalia of the male. Indeed, lack of testosterone or INSL3 is known to cause DSD as well as undescended testis, also termed cryptorchidism. The authors first characterized the expression pattern of the NR2F2 protein during testis development and then used two cKO systems of NR2F2, namely Wt1-creERT2 and Nr5a1-cre, to explore the phenotype of loss of NR2F2. They found in both cases that the mice present with undescended testis and a major reduction in FLC numbers. They show that NR2F2 has no effect on the amount and expression of the progenitor cells, but in its absence there are fewer FLC and they are immature.

      The effect of NR2F2 is cell autonomous and does not seem to affect other signalling pathways implicated in Leydig cell development, such as the DHH, PDGFRA and NOTCH pathways.

      Overall, this paper is excellent, very well written, fluent and clear. The data are well presented, and all the controls and statistics are in place. I think this paper will be of great interest to the field and paves the way for several interesting follow-up studies, as stated in the discussion.

      Reviewer #2 (Public review):

      The major conclusion of the manuscript is expressed in the title: "NR2F2 is required in the embryonic testis for Fetal Leydig Cell development" and also at the end of the introduction and all along the result part. All the authors' assertions are supported by very clear and statistically validated results from ISH, IHC, precise cell counting and gene expression levels by qPCR. The authors used two different conditional Nr2f2 gene ablation systems that demonstrate the same effects at the FLC level. They also showed that the haplo-insufficiency of Wt1 in the first system (knock-in Wt1-cre-ERT2) aggravated the situation in FLC differentiation by disturbing the differentiation of Sertoli cells and their secretion of pro-FLC factors, which had a confounding effect and encouraged them to use the second system. This demonstrates the great rigor with which the authors interpreted the results. In conclusion, all authors' claims and conclusions are justified by their high-quality results.

      Recommendations for the authors:

      We thank the reviewers for their comments which have improved and strengthened our manuscript. Please see our responses to specific comments below in blue.

      Reviewer #1 (Recommendations for the authors):

      I have several small comments:

      (1) There has recently been a preprint from the Yao lab about the role of NR2F2 in steroidogenic cells (https://www.biorxiv.org/content/10.1101/2024.09.16.613312v1). They performed cKO of NR2F2 using the Wt1creERT2 and found similar results. You should present and discuss this paper in light of your results.

      Estermann et al. report a very similar phenotype of FLC hypoplasia in an independent mouse model of Nr2f2 conditional mutation. We have now referred to this article in the discussion of our manuscript as suggested.

      (2) In the introduction I think it is important to mention that the steroidogenic progenitors are derived from Wnt5a positive cells (https://pubmed.ncbi.nlm.nih.gov/35705036/).

      We have mentioned this point in the introduction as suggested.

      (3) In both models you show a decrease in the number of FLC (60% or 40%) and yet they both present with undescended testis. It is important to discuss the fact that there is no need for a complete ablation of testosterone and INSL3 in order to get cryptorchidism.

      We have mentioned this point in the discussion as suggested.

      The fact that you get only partial reduction in FLC is likely due to redundancy with additional factors, possibly the ARX like you stated in the discussion and it will be interesting to explore that in the future but is beyond the scope of the current paper.

      We agree with the reviewer; this question could be addressed by analyzing Arx; Nr2f2 double mutants.

      (4) On page 8, line 11, you mention data not shown; I am not sure if this is allowed in the journal.

      The data is now shown in Figure S5A as suggested.

      (5) In Figure 2- it will be good if you add a schematic model of the mouse strains used as well as the experimental and control mice next to the Tam scheme. Similar scheme should be in figure 3 for Nr5a1-cre.

      We have modified Figures 2 and 3 as suggested.

      (6) There is a clear and pronounced effect of the testis cords number and size. It will be good if you could qualify testis cord numbers/ diameter in the mutants even if you do not follow in detail the effect on Sertoli cells

      We have quantified testis cord numbers and area in E14.5 Control and Wt1<sup>CreERT2/+</sup>; Nr2f2<sup>flox/flox</sup> testes. These data are now shown in Figure S2M.

      (7) It will be good to present the undescended testis in the Wt1-cre model in figure 2 and not in the supp figure

      The data is now shown in Figure 2H-I as suggested.

      (8) Please add labelling of the testis, kidney, bladder, vas deferens in figure 3 N+O and in the Wt1-cre model

      We have added the labels in Figures 2 and 3 as suggested.

      (9) In figure 5 which present both models- it will be good to use the scheme I suggested before to highlight which results refer to which ko model.

      We have modified Figure 5 as suggested.

      Reviewer #2 (Recommendations for the authors):  

      The work presented in this manuscript gave me food for thought. I have always been intrigued by the fact that of the large number of interstitial cells in the testis, a minority differentiate into mature androgen-producing Leydig cells. In other words, how is the number of functional steroidogenic cells defined from a large pool of progenitor cells (ARX and NR2F2 positive ones)? This may have a link with the levels of androgens produced (a kind of feedback control) or the effectiveness of these androgens on the target tissues (i.e.: as spermatogenesis efficiency in adults). In addition, there must be specific signals (probably linked to gonadotropins) that induce the recruitment of Leydig cells from the progenitor pool. Perhaps the genetic models generated in this study could help to address these questions. I leave it to the authors to judge.

      We agree with the reviewer. How NR2F2 (and other factors) integrate extrinsic cues to regulate the recruitment of a subset of interstitial steroidogenic progenitors along the Leydig cell differentiation pathway is a fascinating question beyond the scope of this work.

      In addition to this reflection, I propose a few minor modifications likely to improve the quality of the manuscript:

      (1) Page 3, line 3: I suggest replacing "growth" with "differentiation"

      We have modified the text as suggested.

      (2) Page 3, line 4: the "scrotum" is missing in the parenthesis. Please add it before "and penis"

      We have modified the text as suggested.

      (3) Page 5, lines 21-24: kidney hypoplasia is also evident in Fig S2H (stated in the figure legend). It could also be mentioned in this sentence, and it implies "...that NR2F2 function is required for testicular and kidney development."

      We have modified the text as suggested.

      (4) Page 5, lines 28-30. In addition to the reduction in the number of HSD3B-positive cells, HSD3B staining seems clearly fainter in mutant FLC (Fig 2M) compared to adrenal cells on the same section or FLC in control gonads. This fits well with other results on the level of steroidogenic enzymes (Fig 2O) and those presented thereafter (Fig S4 I-J and Fig 5). Perhaps the authors could mention this fact.

      We have modified the text as suggested in the results section “NR2F2 is required for FLC maturation” (Page 8).

      (5) Page 5, lines 31-34: testicular descent is highly sensitive to INSL3 in the mouse (in contrast with other species where androgens seem to be more critical). I was wondering if you could check a better phenotypic marker for the absence (or reduction) of androgens, such as the differentiation of epididymides by HE staining or the anogenital distance at birth.

      We have measured the anogenital distance at P0 and P1 as suggested and have included the corresponding graph in Fig. S3P.

      (6) Page 8, lines 21-22: "HSD3B positive FLC were smaller and more elongated". It is clear in Fig 5F but not evident in Fig 5D. Could the authors propose another image?

      We have modified Figure 5 as suggested and now provide another example of HSD3B positive FLCs in a Nr5a1Cre; Nr2f2<sup>flox/flox</sup> mutant gonad (Fig. 5D) and the corresponding control littermate (Fig. 5C).

      (7) Page 14, line 12: "(arrow in I)" should be "(arrow in H)"

      We have modified the text as suggested. Please note that ACTA2 expression is now shown in Figure S2 G-H.

      (8) Page 15, line 6: "Arrows indicate NR5A1 positive FLC". There is no arrow in Fig 4C,D, but a kind of scale bar on the enlargement shown in C.

      We have modified Figure 4 as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This paper provides a computational model of a synthetic task in which an agent needs to find a trajectory to a rewarding goal in a 2D-grid world, in which certain grid blocks incur a punishment. In a completely unrelated setup without explicit rewards, they then provide a model that explains data from an approach-avoidance experiment in which an agent needs to decide whether to approach or withdraw from a jellyfish in order to avoid a pain stimulus. Both models include components that are labelled as Pavlovian; hence the authors argue that their data show that the brain uses a Pavlovian fear system in complex navigational and approach-avoid decisions.

      Thanks to the reviewer’s comments, we have now added the following text to our Discussion section (Lines 290-302):

      “When it comes to our experiments, both the simulation and VR experiment models are related and derived from the same theoretical framework, maintaining an algebraic mapping. They differ only in task-specific adaptations, i.e. in their action sets and in their temporal difference learning rules: multi-step decisions in the grid world vs. the Rescorla-Wagner rule for single-step decisions in the VR task. This is also true for Dayan et al. [2006], who bridge Pavlovian bias in a Go-No Go task (negative auto-maintenance pecking task) and a grid world task. A further minor difference between the simulation and VR experiment models is the use of a baseline bias in the human experiment's RL and RLDDM models, where we also model reaction times with drift rates, a behaviour not often simulated in grid world simulations. As mentioned previously, we use the grid world tasks for didactic purposes, similar to Dayan et al. [2006] and in keeping with common test-beds for algorithms in reinforcement learning [Sutton et al., 1998]. The main focus of our work is on Pavlovian fear bias in safe exploration and learning, rather than on its role in complex navigational decisions. Future work can focus on capturing more sophisticated safe behaviours, such as escapes [Evans et al., 2019, Sporrer et al., 2023] and model-based planning, which span different aspects of the threat-imminence continuum [Mobbs et al., 2020].”
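
      The relation between the two learning rules mentioned in the quoted passage can be illustrated with a minimal sketch. It assumes a tabular Q-learning form of the temporal difference update and a plain Rescorla-Wagner update; the function names, the dictionary-based tables and the parameter values are hypothetical and are not taken from the manuscript. With a single-step, terminal decision there is no bootstrapped future value, so the temporal difference update reduces to the Rescorla-Wagner update.

```python
def td_update(q, state, action, reward, next_state,
              alpha=0.1, gamma=0.9, terminal=False):
    """One tabular Q-learning (temporal difference) update.

    q is a dict of dicts: q[state][action] -> value (illustrative layout).
    Returns the prediction error.
    """
    # No bootstrapping from the next state when the episode ends here.
    bootstrap = 0.0 if terminal else gamma * max(q[next_state].values())
    delta = reward + bootstrap - q[state][action]
    q[state][action] += alpha * delta
    return delta


def rw_update(v, cue, outcome, alpha=0.1):
    """One Rescorla-Wagner update on a cue -> value dict.

    Returns the prediction error.
    """
    delta = outcome - v[cue]
    v[cue] += alpha * delta
    return delta


# Single-step case: both rules move the estimate by the same amount.
q_table = {"jellyfish": {"approach": 0.0}}
v_table = {"jellyfish": 0.0}
td_update(q_table, "jellyfish", "approach", reward=-1.0,
          next_state=None, alpha=0.5, terminal=True)
rw_update(v_table, "jellyfish", outcome=-1.0, alpha=0.5)
print(q_table["jellyfish"]["approach"], v_table["jellyfish"])
```

      In a multi-step grid world the bootstrapped term is non-zero, which is the only place where the two updates in this sketch diverge.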

      In the first setup, they simulate a model in which a component they label as Pavlovian learns about punishment in each grid block, whereas a Q-learner learns about the optimal path to the goal, using a scalar loss function for rewards and punishments. Pavlovian and Q-learning components are then weighed at each step to produce an action. Unsurprisingly, the authors find that including the Pavlovian component in the model reduces the cumulative punishment incurred, and this increases as the weight of the Pavlovian system increases. The paper does not explore to what extent increasing the punishment loss (while keeping reward loss constant) would lead to the same outcomes with a simpler model architecture, so any claim that the Pavlovian component is required for such a result is not justified by the modelling. 

      Thanks to the reviewer’s comments, we have now added the following text to our Discussion section (Lines 303-313):

      “In our simulation experiments, we assume the coexistence of the Pavlovian fear system and the instrumental system to demonstrate the emergent safety-efficiency trade-off from their interaction. It is possible that similar behaviours could be modelled using an instrumental system alone, with higher punishment sensitivity; therefore, we do not argue for the necessity of the Pavlovian fear system here. Instead, the Pavlovian fear system itself could be a potential biologically plausible implementation of punishment sensitivity. Unlike punishment sensitivity (scaling of the punishments), which has not been robustly mapped to neural substrates in fMRI studies, the neural substrates for the Pavlovian fear system are well known (e.g., the limbic loop and amygdala, further see Supplementary Fig. 16). Additionally, the Pavlovian fear system provides a separate punishment memory that cannot be erased by greater rewards like [Elfwing and Seymour, 2017, Wang et al., 2018]. This fundamental point can be observed in our simple T-maze simulations, where the Pavlovian fear system encourages avoidance behaviour and the agent chooses the smaller reward instead of the greater reward.”

      In the second setup, an agent learns about punishments alone. "Pavlovian biases" have previously been demonstrated in this task (i.e. an overavoidance when the correct decision is to approach). The authors explore several models (all of which are dissimilar to the ones used in the first setup) to account for the Pavlovian biases. 

      Thanks to the reviewer’s comments, we have now added a paragraph in our Discussion section (Lines 290-302) explaining the similarity of our models and their integrated interpretation. We hope this addresses the reviewer’s concerns.

      Strengths: 

      Overall, the modelling exercises are interesting and relevant and incrementally expand the space of existing models. 

      Weaknesses: 

      I find the conclusions misleading, as they are not supported by the data. 

      First, the similarity between the models used in the two setups appears to be more semantic than computational or biological. So it is unclear to me how the results can be integrated. 

      Thanks to the reviewer’s comments, we have now added a paragraph in our Discussion section (Line 290 onwards) explaining the similarity of our models and their integrated interpretation. We hope this addresses the reviewer’s concerns.

      Secondly, the authors do not show "a computational advantage to maintaining a specific fear memory during exploratory decision-making" (as they claim in the abstract). Making such a claim would require showing an advantage in the first place. For the first setup, the simulation results will likely be replicated by a simple Q-learning model when scaling up the loss incurred for punishments, in which case the more complex model architecture would not confer an advantage. The second setup, in contrast, is so excessively artificial that even if a particular model conferred an advantage here, this is highly unlikely to translate into any real-world advantage for a biological agent. The experimental setup was developed to demonstrate the existence of Pavlovian biases, but it is not designed to conclusively investigate how they come about. In a nutshell, who in their right mind would touch a stinging jellyfish 88 times in a short period of time, as the subjects do on average in this task? Furthermore, in which real-life environment does withdrawal from a jellyfish lead to a sting, as in this task? 

      Crucially, simplistic models such as the present ones can easily solve specifically designed lab tasks with low dimensionality but they will fail in higher-dimensional settings. Biological behaviour in the face of threat is utterly complex and goes far beyond simplistic fight-flight-freeze distinctions (Evans et al., 2019). It would take a leap of faith to assume that human decision-making can be broken down into oversimplified sub-tasks of this sort (and if that were the case, this would require a meta-controller arbitrating the systems for all the sub-tasks, and this meta-controller would then struggle with the dimensionality).

      Thanks to the reviewer’s comments, we have now mentioned this point in Lines 299-302.

      On the face of it, the VR task provides higher "ecological validity" than previous screen-based tasks. However, in fact, it is only the visual stimulation that differs from a standard screen-based task, whereas the action space is exactly the same. As such, the benefit of VR does not become apparent, and its full potential is foregone. 

      If the authors are convinced that their model can - then data from naturalistic approach-avoidance VR tasks is publicly available, e.g. (Sporrer et al., 2023), so this should be rather easy to prove or disprove. In summary, I am doubtful that the models have any relevance for real-life human decision-making. 

      Finally, the authors seem to make much broader claims that their models can solve safety-efficiency dilemmas. However, a combination of a Pavlovian bias and an instrumental learner (study 1) via a fixed linear weighting does not seem to be "safe" in any strict sense. This will lead to the agent making decisions leading to death when the promised reward is large enough (outside perhaps a very specific region of the parameter space). Would it not be more helpful to prune the decision tree according to a fixed threshold (Huys et al., 2012)? So, in a way, the model is useful for avoiding cumulatively excessive pain but not instantaneous destruction. As such, it is not clear what real-life situation is modelled here. 

      We hope our additions to the Discussion section, from Line 290 to Line 313, address the reviewer’s concerns.
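
      The reviewer's point about a fixed linear weighting can be made concrete with a minimal sketch. It assumes that each action's propensity is a convex combination of an instrumental value and a Pavlovian punishment value, weighted by a fixed ω; this form, the function names and all numbers are illustrative assumptions and are not the manuscript's equations. Under this assumption, a sufficiently large promised reward outweighs any finite Pavlovian penalty unless ω is close to 1.

```python
def combined_value(q_instrumental, pavlovian_value, omega):
    """Assumed linear mixture of instrumental and Pavlovian values."""
    return (1.0 - omega) * q_instrumental + omega * pavlovian_value


def choose_action(candidates, omega):
    """Pick the action with the highest combined value.

    candidates maps an action name to a tuple
    (instrumental value, Pavlovian punishment value); hypothetical layout.
    """
    return max(candidates,
               key=lambda name: combined_value(*candidates[name], omega))


# Illustrative numbers only: the risky action carries a Pavlovian penalty of -10.
modest_reward = {"safe": (1.0, 0.0), "risky": (5.0, -10.0)}
large_reward = {"safe": (1.0, 0.0), "risky": (20.0, -10.0)}

print(choose_action(modest_reward, omega=0.5))
print(choose_action(large_reward, omega=0.5))
print(choose_action(large_reward, omega=0.9))
```

      With ω = 0.5 the risky action wins once its instrumental value exceeds 11 in this toy setting, whereas raising ω to 0.9 restores avoidance at the cost of forgoing the larger reward, which is the safety-efficiency trade-off in miniature.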

      A final caveat regarding Study 1 is the use of a PH associability term as a surrogate for uncertainty. The authors argue that this term provides a good fit to fear-conditioned SCR but that is only true in comparison to simpler RW-type models. Literature using a broader model space suggests that a formal account of uncertainty could fit this conditioned response even better (Tzovara et al., 2018). 

      We have now added a line discussing this (Lines 356-358):

      “Future work could also use a formal account of uncertainty which could fit the fear-conditioned skin-conductance response better than Pearce-Hall associability [Tzovara et al., 2018].”
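
      For readers unfamiliar with Pearce-Hall associability, the following is a minimal sketch of one commonly used hybrid form, in which associability tracks a running average of the absolute prediction error and scales the learning rate. The function name, the parameters kappa and eta, and their values are assumptions for illustration; how associability is mapped onto the flexible ω is not described in this response, so the sketch stops at the associability update itself.

```python
def pearce_hall_step(value, associability, outcome, kappa=0.5, eta=0.3):
    """One trial of a hybrid Pearce-Hall update (illustrative parameters).

    value          current prediction for the cue
    associability  current associability, which scales the learning rate
    outcome        observed outcome on this trial
    Returns the updated (value, associability).
    """
    delta = outcome - value
    new_value = value + kappa * associability * delta
    # Associability moves toward the size of the latest surprise.
    new_associability = (1.0 - eta) * associability + eta * abs(delta)
    return new_value, new_associability


# Repeated identical punishments: surprise shrinks, so associability falls.
value, associability = 0.0, 1.0
for trial in range(5):
    value, associability = pearce_hall_step(value, associability, outcome=-1.0)
    print(trial, round(value, 3), round(associability, 3))
```

      When outcomes become predictable the associability in this sketch decays, which is why it is often treated as a rough proxy for uncertainty rather than a formal estimate of it.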

      Reviewer #2 (Public review): 

      Summary: 

      The authors tested the efficiency of a model combining Pavlovian fear valuation and instrumental valuation. This model is amenable to many behavioral decision and learning setups - some of which have been or will be designed to test differences in patients with mental disorders (e.g., anxiety disorder, OCD, etc.). 

      Strengths: 

      (1) Simplicity of the model which can at the same time model rather complex environments. 

      (2) Introduction of a flexible omega parameter. 

      (3) Direct application to a rather advanced VR task. 

      (4) The paper is extremely well written. It was a joy to read. 

      Weaknesses: 

      Almost none! In very few cases, the explanations could be a bit better. 

      Thank you, we have added further explanations in the Discussion section. We have also improved the writing in the Abstract, Introduction, and Methods sections, taking into account the recommendations from Reviewers #2 and #3.

      Reviewer #2 (Recommendations for the authors): 

      (1) Why is there no flexible omega in Figures 3B and 3C? Did I miss this? 

      Thank you. We have now added additional text to explain our motivation in Experiment 2, which only varies the fixed omega and omits the flexible omega (Lines 136-140).

      “In this set of results, we wish to qualitatively tease apart the role of a Pavlovian bias in shaping and sculpting the instrumental value and also provide more insight into the resulting safety-efficiency trade-off. Having shown the benefits of a flexible ω in the previous section, here we only vary the fixed ω to illustrate the effect of a constant bias and are not concerned with the flexible bias in this experiment.”

      We encourage the reader to consider this akin to an additional study that explains how a Pavlovian bias to withdraw can play a role in avoiding punishments, similar to that of punishment sensitivity. This is particularly important because we have neural correlates for Pavlovian biases but so far lack a clear neural correlate for punishment sensitivity, as mentioned in our new additions to the Discussion section (Lines 303-313).

      (2) The introduction of the flexible omega and the PAL agent in the results is a bit sudden. Some more details are needed to understand this during the first read of this passage. 

      We thank Reviewer #2 for bringing this to our notice. We have attempted to refine this passage by including sentences such as:

      “The standard (rational) reinforcement learning system is modelled as the instrumental learning system. The additional Pavlovian fear system biases the withdrawal actions to aid in safe exploration, in line with our hypothesis.”

      “Both systems learn using a basic temporal difference updating rule (or in instances, its special case, the Rescorla-Wagner rule)”

      “We implement the flexible ω using Pearce-Hall associability (see equation 15 in Methods). The Pearce-Hall associability maintains a running average of absolute temporal difference errors (δ) as per equation 14. This acts as a crude but easy-to-compute metric for outcome uncertainty which gates the influence of the Pavlovian fear system, in line with our hypothesis. This implies that the higher the outcome uncertainty, as is the case in early exploration, the more cautious our agent will be, resulting in safer exploration.”
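
      The gating idea in the passage above can be illustrated with a minimal sketch. It assumes an exponential running average with rate `eta` and a clipped linear mapping from associability to ω with scale `kappa`; both names and the exact functional forms are assumptions made for illustration and may differ from equations 14 and 15 in the manuscript.

```python
def update_associability(assoc, td_error, eta=0.3):
    """Running average of absolute TD errors (Pearce-Hall style).

    `eta` is a hypothetical averaging rate, not a value from the manuscript.
    """
    return (1.0 - eta) * assoc + eta * abs(td_error)


def flexible_omega(assoc, kappa=1.0):
    """Map associability to a Pavlovian weight in [0, 1].

    The clipped linear mapping and `kappa` are assumptions for illustration.
    """
    return min(1.0, max(0.0, kappa * assoc))


def simulate(td_errors, assoc0=0.0, eta=0.3, kappa=1.0):
    """Return the sequence of omega values produced by a stream of TD errors."""
    assoc = assoc0
    omegas = []
    for delta in td_errors:
        assoc = update_associability(assoc, delta, eta)
        omegas.append(flexible_omega(assoc, kappa))
    return omegas


# Large, unpredictable errors (early exploration) versus small errors (after learning).
early = simulate([1.0, -1.0, 1.0, -1.0])
late = simulate([0.05, -0.05, 0.05, -0.05])
```

      Under these assumptions, a stream of large prediction errors drives ω upward, so the Pavlovian system has more influence, while small errors keep ω near zero and leave control mostly with the instrumental system.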

      (3) In my view, the possibility of modeling moving predators is extremely interesting. I would include Figure 8D and the corresponding explanation in the main text. 

      Response with revision: We thank the reviewer for finding our simulation on moving predators extremely interesting. Unfortunately, since our instrumental system is not model-based, and in particular does not explicitly model the predator dynamics, our simulation might not be a very accurate representation of real moving-predator environments. As pointed out by Reviewer #1, several systems other than Pavlovian fear responses are perhaps necessary for safe behaviour in such environments, and we hope to address these in future studies. Thanks again for taking an interest in our simulations.

      (4) The VR experiment should be mentioned more clearly in the abstract and the introduction. It should be mentioned a bit more clearly why VR was helpful and why the authors did not use a simple bird's eye grid world task. 

      I cannot assess the RLDDM and I did not check the code. 

      Thank you, we have now mentioned the VR experiment more clearly in the abstract and the introduction. We also now further mention that the VR experiment “builds upon previous Go-No Go studies studying Pavlovian-Instrumental transfer (Guitart-Masip et al, 2012; Cavanagh et al, 2013). The virtual-reality approach confers a greater ecological validity and the immersive nature may contribute better fear conditioning, making it easier to distinguish the aversive components.”

      A bird’s eye grid world may not invoke a strong withdrawal response, as seen in these immersive approach-withdrawal tasks where we can clearly distinguish a Pavlovian fear-based withdrawal response. We did include immersive VR maze results in the supplementary materials, but future work is needed to isolate the different systems at play in such a complex behaviour.

      Reviewer #3 (Public review): 

      Summary: 

      This paper aims to address the problem of exploring potentially rewarding environments that contain danger, based on the assumption that an independent Pavlovian fear learning system can help guide an agent during exploratory behaviour such that it avoids severe danger. This is important given that otherwise later gains seem to outweigh early threats, and agents may end up putting themselves in danger when it is advisable not to do so. 

      The authors develop a computational model of exploratory behaviour that accounts for both instrumental and Pavlovian influences, combining the two according to uncertainty in the rewards. The result is that Pavlovian avoidance has a greater influence when the agent is uncertain about rewards. 

      Strengths: 

      The study does a thorough job of testing this model using both simulations and data from human participants performing an avoidance task. Simulations demonstrate that the model can produce "safe" behaviour, where the agent may not necessarily achieve the highest possible reward but ensures that losses are limited. Interestingly, the model appears to describe human avoidance behaviour in a task that tests for Pavlovian avoidance influences better than a model that doesn't adapt the balance between Pavlovian and instrumental based on uncertainty. The methods are robust, and generally, there is little to criticise about the study. 

      Weaknesses: 

      The extent of the testing in human participants is fairly limited but goes far enough to demonstrate that the model can account for human behaviour in an exemplar task. There are, however, some elements of the model that are unrealistic (for example, the fact that pre-training is required to select actions with a Pavlovian bias would require the agent to explore the environment initially and encounter a vast amount of danger in order to learn how to avoid the danger later). The description of the models is also a little difficult to parse. 

      Thank you, we have now attempted to clarify these points in the Discussion section by adding the following text (Lines 313-321):

      “We next discuss the plausibility of pre-training to select the hardwired actions. In the human experiment, the withdrawal action is straightforwardly biased, as noted, while in the grid world, we assume a hardwired encoding of withdrawal actions for each state/grid. This innate encoding of withdrawal actions could be represented in the dPAG [Kim et al., 2013]. We implement this bias using pre-training, which we assume would be a product of evolution. Alternatively, this could be interpreted as deriving from an appropriate value initialization where the gradient over initialized values determines the action bias. Such aversive value initialization, driving avoidance of novel and threatening stimuli, has been observed in the tail of the striatum in mice, which is hypothesised to function as a Pavlovian fear/threat learning system [Menegas et al., 2018].”

      Reviewer #3 (Recommendations for the authors): 

      I have relatively little to suggest, as in my view the paper is robust, thorough, and creative, and does enough to support the primary argument being made at the most fundamental level. My suggestions for improvement are as follows: 

      (1) Some aspects of the model are potentially unrealistic (as described in the public review), and the paper may benefit from some discussion of these issues or attempts to make the model more realistic - i.e., to what extent is this plausible in explaining more complex avoidance behaviour? Primarily, the fact that pre-training is required to identify actions subject to Pavlovian bias seems unlikely to be effective in real-world situations - is there a better way to achieve this in cases where there isn't necessarily an instinctual Pavlovian response? 

      Thank you, we agree that the advantage of Pavlovian bias is restricted to the bias/instinctual Pavlovian response conferred by evolution. Future work is needed to model more complex avoidance behaviour such as escapes. We hope to have made this more clear with our edits to the Discussion (Lines 299-302) in our response to Reviewer #1’s comments, specifically:

      “The main focus of our work is on Pavlovian fear bias in safe exploration and learning, rather than on its role in complex navigational decisions. Future work can focus on capturing more sophisticated safe behaviours, such as escapes [Evans et al., 2019, Sporrer et al., 2023] and model-based planning which span different aspects of the threat-imminence continuum [Mobbs et al., 2020]”

      (2) The description of the model in the method can be a little hard to follow and would benefit from further explanation of certain parameters. In general, it would be good to ensure that all terms mentioned in equations are described clearly in the text (for example, in Equation 1 it isn't clear what k refers to). 

      Thank you, we have now added further information on all of the parameters in Equation 1 and improved the writing of the Methods section overall, for instance by using time subscripts when introducing the parameters to reduce confusion. We use the standard notation of the Sutton and Barto textbook. Here k refers to the number of timesteps into the future, and it is now explained better in the Methods section.
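
      As an illustration of the role of k, the following minimal sketch computes a discounted return in the standard Sutton and Barto form, where the reward received k timesteps into the future is weighted by γ^k. It assumes that Equation 1 is built on this standard definition; the function name and the default discount factor are chosen for illustration only.

```python
def discounted_return(future_rewards, gamma=0.9):
    """Sum over k of gamma**k * R_{t+k+1}.

    `future_rewards[k]` is the reward received k timesteps into the future,
    i.e. R_{t+k+1} in Sutton and Barto's notation. `gamma` is the discount factor.
    """
    return sum((gamma ** k) * reward for k, reward in enumerate(future_rewards))


# Three future rewards of 1, discounted with gamma = 0.5: 1 + 0.5 + 0.25
example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```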

      (3) Another point of clarification in Equation 1 - does the policy account for the Pavlovian influence or is this purely instrumental? 

      Thank you, Equation 1 is purely instrumental. We have now specifically mentioned this. The Pavlovian influence follows later. They are combined into propensities for action as per equations 11-13.

      (4) I was curious whether similar outcomes could be achieved by more complex instrumental models without the need for Pavlovian influences. For example, could different risk-sensitive decision rules (e.g., conditional value at risk) that rely only on the instrumental system afford safe behaviour without the need for an additional Pavlovian system? 

      Thank you for your comment. Yes, CVaR can achieve safe exploration and cautious behaviour in choices, similar to Pavlovian avoidance learning. However, we think the two differ in the following ways:

      (1) CVaR provides the correct solution to the wrong problem (an objective that only maximises the lower tail of the distribution of outcomes)

      (2) Pavlovian bias provides the wrong solution to the right problem (a normative objective, but with a Pavlovian bias which may be a vestige of evolution)

      Here we use the “wrong problem, wrong solution, wrong environment” categorisation terminology from Huys et al. 2015.

      Huys, Q. J., Guitart-Masip, M., Dolan, R. J., & Dayan, P. (2015). Decision-theoretic psychiatry. Clinical Psychological Science, 3(3), 400-421.
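
      To make the CVaR objective in point (1) concrete, the following minimal sketch computes a simple empirical estimate of lower-tail CVaR from a sample of outcomes. The function name, the tail fraction `alpha`, and the example outcomes are hypothetical and are not taken from the manuscript.

```python
import math


def cvar_lower_tail(outcomes, alpha=0.1):
    """Empirical CVaR: mean of the worst alpha-fraction of outcomes.

    Higher outcomes are assumed to be better, so the "worst" outcomes are the
    smallest ones. At least one outcome is always included in the tail.
    """
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(outcomes)
    n_tail = max(1, math.ceil(alpha * len(ordered)))
    tail = ordered[:n_tail]
    return sum(tail) / len(tail)


# Hypothetical outcomes: one rare large loss among otherwise good results.
outcomes = [-10.0, 0.0, 4.0, 6.0]
mean_value = sum(outcomes) / len(outcomes)
tail_value = cvar_lower_tail(outcomes, alpha=0.5)
```

      An agent that maximises the tail value rather than the mean is pulled away from options with rare but severe losses, which is the sense in which CVaR yields cautious choices while optimising a different objective from expected return.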

      Secondly, we find an effect of Pavlovian bias on reaction times: approach responses slow down and withdrawal responses speed up. We do not think this is best explained by a CVaR-type model, and this is a direction for future work. We think such model-based methods are slower to compute, whereas a Pavlovian withdrawal bias provides a quicker response.

      We have now included this in brief in Lines 280-288.

      (5) Figure 5 would benefit from a clearer caption as it is not necessarily clear from the current one that the left panels refer to choices and the right panels to reaction times. 

      Thank you, we have improved the caption for Fig. 5.

      (6) It would be good to include some indication of the quality of the model fits for the human behavioural study (i.e., diagnostics such as R-hat) to ensure that differences in model fit between models are not due to convergence issues with different models. This would be especially helpful for the RLDDM models as these can be difficult to fit successfully.

      Thank you, we observed that all Rhat values were strictly less than 1.05 (most parameters were less than 1.01 and generally close to 1), indicating that the models converged. We have now added this line to the results (Lines 246-248).

      Thanks to the reviewer’s comments, we have now added the following text to our Discussion section (Lines 290-302):

      “When it comes to our experiments, both the simulation and VR experiment models are related and derived from the same theoretical framework maintaining an algebraic mapping. They differ only in task-specific adaptations i.e. differ in action sets and differ in temporal difference learning rules - multi-step decisions in the grid world vs. Rescorla-Wagner rule for single-step decisions in the VR task. This is also true for Dayan et al. [2006] who bridge Pavlovian bias in a Go-No Go task (negative auto-maintenance pecking task) and a grid world task. A further minor difference between the simulation and VR experiment models is the use of a baseline bias in the human experiment's RL and the RLDDM model, where we also model reaction times with drift rates which is not a behaviour often simulated in the grid world simulations. As mentioned previously, we use the grid world tasks for didactic purposes, similar to Dayan et al. [2006] and common to test-beds for algorithms in reinforcement learning [Sutton et al., 1998]. The main focus of our work is on Pavlovian fear bias in safe exploration and learning, rather than on its role in complex navigational decisions. Future work can focus on capturing more sophisticated safe behaviours, such as escapes [Evans et al., 2019, Sporrer et al., 2023] and model-based planning, which span different aspects of the threat-imminence continuum [Mobbs et al., 2020].”

      In the first setup, they simulate a model in which a component they label as Pavlovian learns about punishment in each grid block, whereas a Q-learner learns about the optimal path to the goal, using a scalar loss function for rewards and punishments. Pavlovian and Q-learning components are then weighed at each step to produce an action. Unsurprisingly, the authors find that including the Pavlovian component in the model reduces the cumulative punishment incurred, and this increases as the weight of the Pavlovian system increases. The paper does not explore to what extent increasing the punishment loss (while keeping reward loss constant) would lead to the same outcomes with a simpler model architecture, so any claim that the Pavlovian component is required for such a result is not justified by the modelling.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Azlan et al. identified a novel maternal factor called Sakura that is required for proper oogenesis in Drosophila. They showed that Sakura is specifically expressed in the female germline cells. Consistent with its expression pattern, Sakura functioned autonomously in germline cells to ensure proper oogenesis. In Sakura KO flies, germline cells were lost during early oogenesis and often became tumorous before degenerating by apoptosis. In these tumorous germ cells, piRNA production was defective and many transposons were derepressed. Interestingly, Smad signaling, a critical signaling pathway for GSC maintenance, was abolished in sakura KO germline stem cells, resulting in ectopic expression of Bam in whole germline cells in the tumorous germline. A recent study reported that Bam acts together with the deubiquitinase Otu to stabilize Cyc A. In the absence of sakura, Cyc A was upregulated in tumorous germline cells in the germarium. Furthermore, the authors showed that Sakura co-immunoprecipitated Otu in ovarian extracts. A series of in vitro assays suggested that the Otu (1-339 aa) and Sakura (1-49 aa) are sufficient for their direct interaction. Finally, the authors demonstrated that the loss of otu phenocopies the loss of sakura, supporting their idea that Sakura plays a role in germ cell maintenance and differentiation through interaction with Otu during oogenesis.

      Strengths:

      To my knowledge, this is the first characterization of the role of the CG14545 gene. Each experiment seems to be well-designed and adequately controlled.

      Weaknesses:

      However, the conclusions from each experiment are somewhat separate, and the functional relationships between Sakura's functions are not well established. In other words, although the loss of Sakura in the germline causes pleiotropic effects, the cause-and-effect relationships between the individual defects remain unclear.

      Reviewer #2 (Public review):

      In this study, the authors identified CG14545 (and named it Sakura), as a key gene essential for Drosophila oogenesis. Genetic analyses revealed that Sakura is vital for both oogenesis progression and ultimate female fertility, playing a central role in the renewal and differentiation of germ stem cells (GSC).

      The absence of Sakura disrupts the Dpp/BMP signaling pathway, resulting in abnormal bam gene expression, which impairs GSC differentiation and leads to GSC loss. Additionally, Sakura is critical for maintaining normal levels of piRNAs. Also, the authors convincingly demonstrate that Sakura physically interacts with Otu, identifying the specific domains necessary for this interaction, suggesting a cooperative role in germline regulation. Importantly, the loss of otu produces similar defects to those observed in Sakura mutants, highlighting their functional collaboration.

      The authors provide compelling evidence that Sakura is a critical regulator of germ cell fate, maintenance, and differentiation in Drosophila. This regulatory role is mediated through the modulation of pMad and Bam expression. However, the phenotypes observed in the germarium appear to stem from reduced pMad levels, which subsequently trigger premature and ectopic expression of Bam. This aberrant Bam expression could lead to increased CycA levels and altered transcriptional regulation, impacting piRNA expression. Given Sakura's role in pMad expression, it would be insightful to investigate whether overexpression of Mad or pMad could mitigate these phenotypic defects (UAS-Mad line is available at Bloomington Drosophila Stock Center).

      As the reviewer suggested, we tested whether overexpression of Mad could rescue or mitigate the phenotypic defects caused by loss of sakura, using nos-Gal4-VP16 > UASp-Mad-GFP in the sakura<sup>null</sup> background. As shown in Fig S11, we did not observe any mitigation of defects.

      We also tested whether expressing a constitutively active form of Tkv, using UAS-Dcr2, NGT-Gal4 > UASp-tkv.Q235D in the sakura<sup>RNAi</sup> background, could mitigate the defects. As shown in Fig S12, we did not observe any mitigation of defects by this approach either.

      A major concern is the overstated role of Sakura in regulating Orb. The data does not reveal mislocalized Orb; rather, a mislocalized oocyte and cytoskeletal breakdown, which may be secondary consequences of defects in oocyte polarity and structure rather than direct misregulation of Orb. The conclusion that Sakura is necessary for Orb localization is not supported by the data. Orb still localizes to the oocyte until about stage 6. In the later stage, it looks like the cytoskeleton is broken down and the oocyte is not positioned properly, however, there is still Orb localization in the ~8-stage egg chamber in the oocyte. This phenotype points towards a defect in the transport of Orb and possibly all other factors that need to localize to the oocyte due to cytoskeletal breakdown, not Orb regulation directly. While this result is very interesting it needs further evaluation on the underlying mechanism. For example, the decrease in E-cadherin levels leads to a similar phenotype and Bam is known to regulate E-cadherin expression. Is Bam expressed in these later knockdowns?

      We examined Bam and DE-Cadherin expression in later RNAi knockdowns driven by ToskGal4. As shown in Fig S9, Bam was not expressed in these later knockdowns compared with controls. DE-Cadherin staining suggested a disorganized structure in late-stage egg chambers.

      We agree that we overstated the role of Sakura in regulating Orb in the initial manuscript. We changed the text to avoid this overstatement.

      The manuscript would benefit from a more balanced interpretation of the data concerning Sakura's role in Orb regulation. Furthermore, a more expanded discussion on Sakura's potential role in pMad regulation is needed. For example, since Otu and Bam are involved in translational regulation, do the authors think that Mad is not translated and therefore it is the reason for less pMad? Currently the discussion presents just a summary of the results and not an extension of possible interpretation discussed in context of present literature.

      We changed the text to avoid overstating the role of Sakura in regulating Orb localization.

      Based on our newly added results showing that transgenic overexpression of Mad could not rescue or mitigate the phenotypic defects of the sakura<sup>null</sup> mutant (Fig S11), we do not think that reduced translation of Mad is the reason for the lower pMad levels.

      Reviewer #3 (Public review):

      In this very thorough study, the authors characterize the function of a novel Drosophila gene, which they name Sakura. They start with the observation that sakura expression is predicted to be highly enriched in the ovary and they generate an anti-sakura antibody, a line with a GFP-tagged sakura transgene, and a sakura null allele to investigate sakura localization and function directly. They confirm the prediction that it is primarily expressed in the ovary and, specifically, that it is expressed in germ cells, and find that about 2/3 of the mutants lack germ cells completely and the remaining have tumorous ovaries. Further investigation reveals that Sakura is required for piRNA-mediated repression of transposons in germ cells. They also find evidence that sakura is important for germ cell specification during development and germline stem cell maintenance during adulthood. However, despite the role of sakura in maintaining germline stem cells, they find that sakura mutant germ cells also fail to differentiate properly such that mutant germline stem cell clones have an increased number of "GSC-like" cells. They attribute this phenotype to a failure in the repression of Bam by dpp signaling. Lastly, they demonstrate that sakura physically interacts with otu and that sakura and otu mutants have similar germ cell phenotypes. Overall, this study helps to advance the field by providing a characterization of a novel gene that is required for oogenesis. The data are generally high-quality and the new lines and reagents they generated will be useful for the field. However, there are some weaknesses and I would recommend that they address the comments in the Recommendations for the authors section below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      General Comments:

      (1) The gene nomenclature: As mentioned in the text, Sakura means cherry blossom and is one of the national flowers of Japan. I am not sure whether the phenotype of the CG14545 mutant is related to Sakura or not. I would like to suggest the authors reconsider the naming.

      The striking phenotype of the sakura mutant is tumorous and germless ovarioles. The tumorous phenotype, which exhibits many round fusomes in the germarium visualized by anti-Hts staining, looks like cherry blossoms blooming to us. Also, the germless phenotype reminds us of falling cherry blossoms, especially considering that the ratio of the tumorous phenotype decreases and that of the germless phenotype increases with fly age. Furthermore, “Sakura” symbolizes birth and renewal in Japanese culture (the last author of this manuscript is Japanese). Our findings indicated that the gene sakura is involved in the regulation of renewal and differentiation of GSCs (which leads to birth). These are the reasons for the naming, which we would like to keep.

      (2) In many of the microscopic photographs in the figures, especially for the merged confocal images, the resolution looks low, and the images appear blurred, making it difficult to judge the authors' claims. Also, the Alpha Fold structure in Figure 10A requires higher contrast images. The magnification of the images is often inadequate (e.g. Figures 3A, 3B, 5E, 7A, etc). The authors should take high-magnification images separately for the germarium and several different stages of the egg chambers and lay out the figures.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images.

      Specific Comments

      (1) How Sakura can cooperate with Otu remains unanswered. Sakura does not regulate deubiquitinase activity in vitro. Both sakura and otu appear to be involved in the Dpp-Smad signaling pathway and in the spatial control of Bam expression in the germarium, whereas Otu has been reported to act in concert with Bam to deubiquitinate and stabilize Cyc A for proper cystoblast differentiation. Therefore, it is plausible that the stabilization of Cyc A in the Sakura mutant is an indirect consequence of Bam misexpression and independent of the Sakura-Otu interaction. The authors may need to provide much deeper insight into the mechanism by which Sakura plays roles in these seemingly separable steps to orchestrate germ cell maintenance and differentiation during early oogenesis.

      Yes, it is possible that the stabilization of CycA in the sakura mutant is an indirect consequence of Bam misexpression and independent of the Sakura-Otu interaction. To test the significance and role of the Sakura-Otu interaction, we have attempted to identify Sakura point mutants that lose the interaction with Otu. Had such point mutants been obtained, we planned to test whether their transgene expression could rescue the phenotypes of the sakura mutant as the wild-type transgene did. However, after designing over 30 point mutants and testing their interaction with Otu, we have not yet obtained such a mutant version of Sakura. We will continue these efforts, but this is beyond the scope of the current study. We hope to address this important point in future studies.

      (2) Figure 3A and Figure 4: The authors show that piRNA production is abolished in Sakura KO ovaries. It is known that piRNA amplification (the ping-pong cycle) occurs in the Vasa-positive perinuclear nuage in nurse cells. Is the nuage normally formed in the absence of Sakura? The authors provide high-magnification images in the germarium expressing Vas-GFP. How does Sakura, and possibly Otu, contribute to piRNA production? Are the defects a direct or indirect consequence of the loss of Sakura?

      We provided higher-magnification images of germaria expressing Vasa-EGFP in the sakura mutant background (Fig 3A and 3B). Nuage formation does not seem to be dysregulated in the sakura mutant. Currently, we do not know whether the piRNA defects are a direct or indirect consequence of the loss of Sakura. This question cannot be answered easily. We hope to address this in future studies.

      (3) Figure 7 and Figure 12: The authors showed that Dpp-Smad signaling was abolished in Sakura KO germline cells. The same defects were also observed in otu mutant ovaries (Figure 12B). How does the Sakura-Otu axis contribute to the Dpp-Smad pathway in the germline?

      As we mentioned in the response to comment (1), we attempted to test the significance and role of the Sakura-Otu interaction, including in the Dpp-Smad pathway in the germline, but we have not yet been able to obtain loss-of-interaction mutant(s) of Sakura. We hope to address this in future studies.

      (4) Figure 9 and Fig 10: The authors raised antibodies against both Sakura and Otu, but their specificities were not provided. For Western blot data, the authors should provide whole gel images as source data files. Also, the authors argue that the Otu band they observed corresponds to the 98-kDa isoform (lines 302-304). The molecular weight on the Western blot alone would be insufficient to support this argument.

      When we submitted the initial manuscript, we also submitted original, uncropped, and unmodified whole Western blot images for all gel images to the eLife journal, as requested. We did the same for this revised submission. I believe eLife makes all those files available for downloading to readers.

      In the newly added Fig S13B, we used very young 2-5 hours ovaries and 3-7 days ovaries. The 2-5 hours ovaries contain mostly pre-differentiated germ cells. Older ovaries (3-7 days in our case here) contain all 14 stages of oogenesis, and later stages predominate in whole-ovary lysates.

      As reported in previous literature (Sass et al. 1995), we detected a higher abundance of the 104 kDa Otu isoform than the 98 kDa isoform in 2-5 hours ovaries and predominantly the 98 kDa isoform in 3-7 days ovaries (Fig S13B). These results confirmed that the major Otu isoform we detected in our Western blots, all of which used older ovaries except for the 2-5 hours ovaries in Fig S13B, is the 98 kDa isoform.

      (5) Otu has been reported to regulate ovo and Sxl in the female germline. Is Sakura involved in their regulation?

      We examined sxl alternative splicing pattern in sakura mutant ovaries. As shown in Fig S6, we detected the male-specific isoform of sxl RNA and a reduced level of the female-specific sxl isoform in sakura mutant ovaries. Thus Sakura seems to be involved in sxl splicing in the female germline, while further studies will be needed to understand whether Sakura has a direct or indirect role here.

      (6) Lines 443-447: The GSC loss phenotype in piwi mutant ovaries is thought to occur in a somatic cell-autonomous manner: both piwi-mutant germline clones and germline-specific piwi knockdown do not show the GSC-loss phenotype. In contrast, the authors provide compelling evidence that Sakura functions in the germline. Therefore, the Piwi-mediated GSC maintenance pathway is likely to be independent of the Sakura-Otu axis.

      We changed the text accordingly.

      Reviewer #2 (Recommendations for the authors):

      Overall, this is a cleanly written manuscript, with some sentences/sections that are confusing the way they are constructed (i.e. Line 37-38, 334, section on Flp/FRT experiments).

      We rewrote those sections to avoid confusion.

      Comment for all merged image data: the quality of the merged images is very poor - the individual channels are better but should also be reprocessed for more resolved image data sets. Also, it would be helpful to have boundaries drawn in an individual panel to identify the regions of the germarium, as cartooned in Figure S1A (which should be brought into Figure 1). F-actin or Vsg staining would have helped throughout the manuscript to enhance the visualization of described phenotypes.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images.

      We outlined the germarium in Fig 1E.

      We brought the former FigS1 into Fig 1A.

      We provided Phalloidin (F-Actin) staining images in Fig S7.

      All p-values seem off. I recommend running the data through the student t-test again.

      We used Student's t-test to calculate the p-values and confirmed that they are correct. We do not understand why the reviewer thinks that all p-values seem off.

      In the original manuscript, as mentioned in each figure legend, we used an asterisk (*) to indicate a p-value <0.05, without distinguishing whether it was <0.001, <0.01, or <0.05.

      Perhaps Reviewer 2 is suggesting that we use ***, **, and * to indicate p-values of <0.001, <0.01, and <0.05, respectively. If so, we have now followed this suggestion.
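
      The three-tier asterisk convention mentioned above can be sketched in a few lines of Python. This is only an illustration of the thresholds named in the response; the function name and the "ns" label for non-significant values are assumptions, not something taken from the manuscript.

```python
def significance_stars(p_value: float) -> str:
    """Map a p-value to the asterisk notation (<0.001, <0.01, <0.05)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"  # assumed label for non-significant results; not from the source


# Print the label assigned to a few example p-values.
for p in (0.0004, 0.003, 0.02, 0.2):
    print(p, significance_stars(p))
```

      Checking the strictest threshold first ensures that each p-value receives only the most stringent label it qualifies for.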

      Figure 1

      (1) Within the text, C is mentioned before A.

      We updated the text and now mention Fig 1A before Fig 1C.

      (2) B should be the supplemental figure.

      We moved the former Fig 1B to Supplemental Figure 1.

      (3) C - How were the different egg chamber stages selected in the WB? Naming them 'oocytes' is deceiving. Recommend labeling them as 'egg chambers', since an oocyte is claimed to be just the one-cell of that cyst.

      We changed the labeling to egg chambers.

      (4) Is the antibody not detecting Sakura in IF? There is no mention of this anywhere in the manuscript.

      While our Sakura antibody detects Sakura in IF, it seems to detect some other proteins as well. Since we have a Sakura-EGFP fly strain (which fully rescues sakura<sup>null</sup> phenotypes) that allows us to examine Sakura expression and localization without such non-specific signals, we relied on Sakura-EGFP rather than anti-Sakura antibodies for IF.

      (5) Expand on the reliance of the sakura-EGFP fly line. Does this overexpression cause any phenotypes?

      sakura-EGFP does not cause any phenotypes in the background of sakura[+/+] and sakura[+/-].

      (6) Line 95 "as shown below" is not clear that it's referencing panel D.

      We now referenced Fig 1D.

      (7) Re: Figures 1 E and F. There is no mention of Hts or Vasa proteins in the text. "Sakura-EGFP was not expressed in somatic cells such as terminal filament, cap cells, escort cells, or follicle cells (Figure 1E). In the egg chamber, Sakura-EGFP was detected in the cytoplasm of nurse cells and was enriched in developing oocytes (Figure 1F)". Outline these areas or label these structures/sites in the images. The color of Merge labels is confusing as the blue is not easily seen.

      We mentioned Hts and Vasa in the text. We labeled the structures/sites in the images and updated the color labeling.

      Figure 2

      (1) Entire figure is not essential to be a main figure, but rather supplemental.

      We respectfully disagree with the reviewer. The female fertility assay data, in which the sakura null mutant exhibits a strikingly strong phenotype that is completely rescued by our Sakura-EGFP transgene, are very important, and we would like to present them in a main figure.

      (2) 2A- one star (*) significance does not seem correct for the presented values between 0 and 100+.

      In the original manuscript, as mentioned in each figure legend, we used an asterisk (*) to indicate a p-value <0.05, without distinguishing whether it was <0.001, <0.01, or <0.05.

      Perhaps Reviewer 2 is suggesting that we use ***, **, and * to indicate p-values of <0.001, <0.01, and <0.05, respectively. If so, we have now followed this suggestion.

      (3) 2C images are extremely low quality. Should be presented as bigger panels.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images. We also presented them as bigger panels.

      Figure 3

      (1) "We observed that some sakura<sup>null</sup> /null ovarioles were devoid of germ cells ("germless"), while others retained germ cells (Fig 3A)" What is described is, that it is hard to see. Must have a zoomed-in panel.

      We provided zoomed-in panels in Fig 3B.

      (2) C - The control doesn't seem to match. Must zoom in.

      We provided matched control and also zoomed in.

      (3) For clarity, separate the tumorous and germless images.

      In the new image, only one tumorous and one germless ovariole are shown, with clear labeling and outlines, for clarity.

      (4) Use arrows to help clearly indicate the changes that occur. As they are presented, they are difficult to see.

      We updated all the panels to enhance clarity.

      (5) Line 158 seems like a strong statement since it could be indirect.

      We softened the statement.

      Figure 4

      (1) Line 188-189 - Conclusion is an overstatement.

      We softened the statement.

      (2) Is the piRNA reduction due to a change in transcription? Or a direct effect by Sakura?

      We do not know the answers to these questions. We hope to address these in future studies.

      Figure 5

      (1) D - It might make more sense if this graph showed % instead of the numbers.

      We did not understand the reviewer’s point. We think using numbers, not %, makes more sense.

      (2) Line 213 - explain why RNAi 2 was chosen when RNAi 1 looks stronger.

      The fly stock of RNAi line 2 is much healthier than that of RNAi line 1 (even without being driven by Gal4), for reasons we do not know. We were concerned that RNAi line 1 might carry an unwanted genetic background, so we chose RNAi line 2 to avoid this issue.

      (3) In Line 218 there's an extra parenthesis after the PGC acronym.

      We corrected the error.

      (4) TOsk-Gal4 fly is not in the Methods section.

      We mentioned TOsk-Gal4 in the Methods.

      Figure 6:

      (1) The FLP-FRT section must be rewritten.

      We rewrote the FLP-FRT section.

      (2) A - include statistics.

      We included statistics using the chi-square test.
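
      For readers unfamiliar with the test, a minimal sketch of a Pearson chi-square test of independence on a table of counts is given below. The counts are hypothetical and are not data from the study, the function name is illustrative, and the response does not state which software the authors used.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence for a contingency table.

    Returns (statistic, degrees_of_freedom, p_value). The p-value uses the
    closed forms available for 1 and 2 degrees of freedom and is None otherwise.
    Assumes every expected count is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    if dof == 1:
        p_value = math.erfc(math.sqrt(statistic / 2.0))
    elif dof == 2:
        p_value = math.exp(-statistic / 2.0)
    else:
        p_value = None
    return statistic, dof, p_value


# Hypothetical counts (NOT data from the study): rows are two genotypes,
# columns are numbers of ovarioles in two phenotype classes.
hypothetical_counts = [
    [10, 20],
    [20, 10],
]
stat, dof, p = chi_square_independence(hypothetical_counts)
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p:.4f}")
```

      The test compares observed counts with the counts expected if genotype and phenotype class were independent, so it suits categorical readouts where a t-test would not apply.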

      (3) B - is not recalled in the Results text.

      We referred to Fig 6B in the text.

      (4) Line 232 references Figure 3, but not a specific panel.

      We referred to Fig 3A, 3C, 3D, and 3E in the text.

      Figure 7/8 - can go to Supplemental.

      We moved Fig 8 to supplemental. However, we think Fig 7 data is important and therefore we would like to present them as a main figure.

      (1) There should be CycA expression in the control during the first 4 divisions.

      Yes, CycA expression is observed in the control during the first 4 divisions, although it is much weaker than in the sakura<sup>null</sup> clone.

      (2) Helpful to add the dotted lines to delineate (A) as well.

      We added a dotted outline for germarium in Fig 7A.

      (3) Line 263 CycA is miswritten as CyA.

      We corrected the typo.

      Figure 9

      (1) Otu antibody control?

      We validated the Otu antibody in the newly added Fig 10C and Fig S13A.

      (2) Which Sakura-EGFP line was used? sakura het. or null background? This isn't mentioned in the text, nor legend.

      We used Sakura-EGFP in the background of sakura[+/+]. We added this information in the methods and figure legend.

      (3) C - Why the switch to S2 cells? Not able to use the Otu antibody in the IP of ovaries?

      We can use the Otu antibody in the IP of ovaries. However, in the anti-Sakura Western blot after anti-Otu IP, the light chain bands of the Otu antibody overlap with the Sakura band. Therefore, we switched to S2 cells, where we could avoid this issue by using an epitope tag.

      Figure 10

      (1) A- The resolution of images of the ribbon protein structure is poor.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images.

      (2) A table summarizing the interactions between domains would help bring clarity to the data presented.

      We added a table summarizing the fragment interaction results.

      (3) Some images would be nice here to show that the truncations no longer colocalize.

      We did not understand the reviewer’s point. In our study, even for the full-length proteins, we have not shown any colocalization of Sakura and Otu in S2 cells or in ovaries, except that both are enriched in developing oocytes in egg chambers.

      Figure 12

      (1) A - control and RNAi lines do not match.

      We provided matched images.

      (2) In general, since for Sakura, only its binding to Otu was identified and since they phenocopy each other, doesn't most of the characterization of Sakura just look at Otu phenotypes? Does Sakura knockdown affect Otu localization or expression level (and vice versa)?

      We tested this by Western (Fig S15) and IF (Fig 12). Sakura knockdown did not decrease the Otu protein level, and Otu knockdown did not decrease the Sakura protein level (Fig S15). In sakura<sup>null</sup> clones, the Otu level was not notably affected (Fig 12), but Otu lost its localization to the posterior position within egg chambers.

      Figure S6

      (1) It is Luciferase, not Lucifarase.

      We corrected the typo.

      Reviewer #3 (Recommendations for the authors):

      (1) It is interesting that germless and tumorous phenotypes coexist in the same population of flies. Additional consideration of these essentially opposite phenotypes would significantly strengthen the study. For example, do they co-exist within the same fly and are the tumorous ovarioles present in newly eclosed flies or do they develop with age? The data in Figure 8 show that bam knockdown partially suppresses the germless phenotype. What effect does it have on the tumorous phenotype? Is transposon expression involved in either phenotype? Do Sakura mutant germline stem cell clones overgrow relative to wild-type cells in the same ovariole? Does sakura RNAi driven by NGT-Gal4 only cause germless ovaries or does it also cause tumorous phenotypes? What happens if the knockdown of Sakura is restricted to adulthood with a Gal80ts? It may not be necessary to answer all of these questions, but more insight into how these two phenotypes can be caused by loss of sakura would be helpful.

      We performed new experiments to answer these questions.

      do they co-exist within the same fly and are the tumorous ovarioles present in newly eclosed flies or do they develop with age?

      Tumorous and germless ovarioles coexist in the same fly (in the same ovary). Tumorous ovarioles are present in very young (0-1 day old) flies, including newly eclosed flies (Fig S5). With age, the ratio of germless ovarioles increases and that of tumorous ovarioles decreases (Fig S5).

      The data in Figure 8 show that bam knockdown partially suppresses the germless phenotype. What effect does it have on the tumorous phenotype?

      The effect of bam knockdown on the tumorous phenotype is shown in Fig S10. bam knockdown increased the ratio of tumorous ovarioles and the number of GSC-like cells.

      Is transposon expression involved in either phenotype?

      Since our transposon-piRNA reporter uses the germline-specific nos promoter, it is expressed only in germline cells, so we cannot examine transposon expression in germless ovarioles.

      Do Sakura mutant germline stem cell clones overgrow relative to wild-type cells in the same ovariole?

      Yes, Sakura mutant GSC clones overgrow. Please compare Fig 6C and Fig S8.

      Does sakura RNAi driven by NGT-Gal4 only cause germless ovaries or does it also cause tumorous phenotypes?

      Fig S10 and Fig S12 show the ovariole phenotypes of sakura RNAi driven by NGT-Gal4. It causes both germless and tumorous phenotypes.

      What happens if the knockdown of Sakura is restricted to adulthood with a Gal80ts?

      Our mosaic clones were induced at the adult stage, so we already have data on adulthood-specific loss of function. Gal80ts does not work well with nos-Gal4.

      (2) The idea that the excessive bam expression in tumorous ovaries is due to a failure of bam repression by dpp signaling is not well-supported by the data. Dpp signaling is activated in a very narrow region immediately adjacent to the niche but the images in Figure 7A show bam expression in cells that are very far away from the niche. Thus, it seems more likely to be due to a failure to turn bam expression off at the 16-cell stage than to a failure to keep it off in the niche region. To determine whether bam repression in the niche region is impaired, it would be important to examine cells adjacent to the niche directly at a higher magnification than is shown in Figure 7A.

      We provided higher magnification images of cells adjacent to the niche in new Fig 7A.

      We found that cells adjacent to the niche also express Bam-GFP.

      That said, we agree with the reviewer. A failure to turn bam expression off at the 16-cell stage may be an additional or even the main cause of bam misexpression in sakura mutants. We added this to the Discussion.

      (3) In addition, several minor comments should be addressed:

      a. Does anti-Sakura work for immunofluorescence?

      While our Sakura antibody detects Sakura in IF, it seems to detect some other proteins as well. Since we have a Sakura-EGFP fly strain that allows us to examine Sakura expression and localization without such non-specific signals, we relied on Sakura-EGFP rather than anti-Sakura antibodies.

      b. Please provide insets to show the phenotypes indicated by the different color stars in Figure 3C more clearly.

      We provided new, higher-magnification images to show the phenotypes more clearly.

      c. Please indicate the frequency of the expression patterns shown in Figure 4D (do all ovarioles in each genotype show those patterns or is there variable penetrance?).

      We indicated the frequency.

      d. An image showing TOskGal4 driving a fluorophore should be provided so that readers can see which cells express Gal4 with this driver combination.

      This has already been shown in ElMaghraby et al., GENETICS, 2022, 220(1), iyab179, so we did not repeat the experiment.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mallimadugula et al. combined Molecular Dynamics (MD) simulations, thiol-labeling experiments, and RNA-binding assays to study and compare the RNA-binding behavior of the Interferon Inhibitory Domain (IID) from Viral Protein 35 (VP35) of Zaire ebolavirus, Reston ebolavirus, and Marburg marburgvirus. Although the structures and sequences of these viruses are similar, the authors suggest that differences in RNA binding stem from variations in their intrinsic dynamics, particularly the opening of a cryptic pocket. More precisely, the dynamics of this pocket may influence whether the IID binds to RNA blunt ends or the RNA backbone.

      Overall, the authors present important findings to reveal how the intrinsic dynamics of proteins can influence their binding to molecules and, hence, their functions. They have used extensive biased simulations to characterize the opening of a pocket which was not clearly seen in experimental results - at least when the proteins were in their unbound forms. Biochemical assays further validated theoretical results and linked them to RNA binding modes. Thus, with the combination of biochemical assays and state-of-the-art Molecular Dynamics simulations, these results are clearly compelling.

      Strengths:

      The use of extensive Adaptive Sampling combined with biochemical assays clearly points to the opening of the Interferon Inhibitory Domain (IID) as a factor for RNA binding. This type of approach is especially useful to assess how protein dynamics can affect its function.

      Weaknesses:

      Although a connection between the cryptic pocket dynamics and RNA binding mode is proposed, the precise molecular mechanism linking pocket opening to RNA binding still remains unclear.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to determine whether a cryptic pocket in the VP35 protein of Zaire ebolavirus has a functional role in RNA binding and, by extension, in immune evasion. They sought to address whether this pocket could be an effective therapeutic target resistant to evolutionary evasion by studying its role in dsRNA binding among different filovirus VP35 homologs. Through simulations and experiments, they demonstrated that cryptic pocket dynamics modulate the RNA binding modes, directly influencing how VP35 variants block RIG-I and MDA5-mediated immune responses.

      The authors successfully achieved their aim, showing that the cryptic pocket is not a random structural feature but rather an allosteric regulator of dsRNA binding. Their results not only explain functional differences in VP35 homologs despite their structural similarity but also suggest that targeting this cryptic pocket may offer a viable strategy for drug development with reduced risk of resistance.

      This work represents a significant advance in the field of viral immunoevasion and therapeutic targeting of traditionally "undruggable" protein features. By demonstrating the functional relevance of cryptic pockets, the study challenges long-standing assumptions and provides a compelling basis for exploring new drug discovery strategies targeting these previously overlooked regions.

      Strengths:

      The combination of molecular simulations and experimental approaches is a major strength, enabling the authors to connect structural dynamics with functional outcomes. The use of homologous VP35 proteins from different filoviruses strengthens the study's generality, and the incorporation of point mutations adds mechanistic depth. Furthermore, the ability to reconcile functional differences that could not be explained by crystal structures alone highlights the utility of dynamic studies in uncovering hidden allosteric features.

      Weaknesses:

      While the methodology is robust, certain limitations should be acknowledged. For example, the study would benefit from a more detailed quantitative analysis of how specific mutations impact RNA binding and cryptic pocket dynamics, as this could provide greater mechanistic insight. This study would also benefit from providing a clear rationale for the selection of the amber03 force field and considering the inclusion of volume-based approaches for pocket analysis. Such revisions will strengthen the robustness and impact of the study.

      Reviewer #3 (Public review):

      Summary:

      The authors suggest a mechanism that explains the preference of viral protein 35 (VP35) homologs to bind the backbone of double-stranded RNA versus blunt ends. These preferences have a biological impact in terms of the ability of different viruses to escape the immune response of the host.

      The proposed mechanism involves the existence of a cryptic pocket, where VP35 binds the blunt ends of dsRNA when the cryptic pocket is closed and preferentially binds the RNA double-stranded backbone when the pocket is open.

      The authors performed MD simulations, thiol labelling experiments, fluorescence polarization assays, as well as point mutations to support their hypothesis.

      Strengths:

      This is a genuinely interesting scientific question, which is approached through multiple complementary experiments as well as extensive MD simulations. Moreover, structural biology studies focused on RNA-protein interactions are particularly rare, highlighting the importance of further research in this area.

      Weaknesses:

      - Sequence similarity between Ebola-Zaire and Reston (94% similarity) explains their similar behaviour in simulations and experimental assays. Marburg instead is a more distant homolog (~80% similarity relative to Ebola/Zaire). This difference in sequence and structure can explain the propensities, without the need to invoke the existence of a cryptic pocket.

      - No real evidence for the presence of a cryptic pocket is presented, but rather a distance probability distribution between two residues obtained from extensive MD simulations. It would be interesting to characterise the modelled RNA-protein interface in more detail.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Before assessing the overall quality and significance of this work, this reviewer needs to specify the context of this review. This reviewer's expertise lies in biased and unbiased molecular dynamics simulations and structural biology. Hence, while this reviewer can overall understand the results for thiol-labeling and RNA-binding assays, this review will not assess the quality of these biochemical assays and will mainly focus on the modelling results.

      Overall, the authors present important findings to reveal how the intrinsic dynamics of proteins can influence their binding to molecules and, hence, their functions. They have used extensive biased simulations to characterize the opening of a pocket which was not clearly seen in experimental results - at least when the proteins were in their unbound forms. Biochemical assays further validated theoretical results and linked them to RNA binding modes. Thus, with the combination of biochemical assays and state-of-the-art Molecular Dynamics simulations, these results are clearly compelling.

      Beyond the clear qualities of this work, I would like to mention a few points that may help to better contextualize and rationalize the results presented here.

      - First, both the introduction and discussion sections seem relatively condensed. Extending them to, for example, better describe the methodological context and discuss the methodological limitations and potential future developments related to biased simulations may help the reader get a better idea of the significance of this work.

      - The authors presented 3 homologs in this study: IIDs of Reston, Zaire, and Marburg viruses. While Zaire and Reston are relatively similar in terms of sequence (Figure S1), the sequences clearly differ between Marburg and the two other viruses. Can the authors indicate a similarity/identity score for each sequence alignment and extend Figure S1 to really compare the Marburg sequence with Reston and Zaire? Can they also discuss how these differences may impact the comparison of the three IIDs? This may also help the reader to understand why the authors sometimes compare the three viruses and sometimes focus only on comparing Zaire and Reston.

      We thank the reviewer for raising this point, and we agree that additional details about the sequence comparison provide more context for the substitutions we chose. Therefore, we have updated Fig S1 to include a detailed pairwise comparison of all the IID sequences, including the percentage sequence similarity and identity. We have also added the following sentences to the results section where we first introduced the substitutions between Zaire and Reston IIDs:

      “While the sequence of Marburg IID differs significantly from Reston and Zaire IIDs with a sequence identity of 42% and 45% respectively (Fig S1), the sequences of Reston and Zaire IID are 88% identical and 94% similar. Particularly, substitutions between these homologs are all distal to the RNA-binding interfaces and all the residues known to make contacts with dsRNA from structural studies are identical. Therefore, we reasoned that comparing these two homologs would help us identify minimal substitutions that control pocket opening probability and allow us to study its effect on dsRNA binding with minimal perturbation of other factors.”
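
      The distinction between percent identity and percent similarity in the quoted passage can be illustrated with a short sketch. The similarity groups below are an assumption made for illustration (alignment tools normally derive similarity from a substitution matrix), the sequences are invented toy fragments, and the choice of denominator (gap-free aligned columns) is one of several conventions.

```python
# Assumed grouping of amino acids with similar side chains; published
# similarity scores normally come from a substitution matrix instead.
SIMILAR_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"), set("DE"),
    set("NQ"), set("ST"), set("AG"),
]


def are_similar(a: str, b: str) -> bool:
    """True if two residues are identical or share an assumed similarity group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def percent_identity_and_similarity(seq_a: str, seq_b: str):
    """Return (% identity, % similarity) over gap-free columns of an alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Skip columns where either sequence has a gap.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("alignment has no gap-free columns")
    identical = sum(a == b for a, b in columns)
    similar = sum(are_similar(a, b) for a, b in columns)
    return 100.0 * identical / len(columns), 100.0 * similar / len(columns)


# Toy aligned fragments, invented for illustration only.
identity, similarity = percent_identity_and_similarity("KRIAL-DE", "KKIVLSDD")
print(f"identity = {identity:.1f}%, similarity = {similarity:.1f}%")
```

      Similarity is always at least as high as identity because identical residues also count as similar, which is why the quoted 94% similarity exceeds the 88% identity for the Reston and Zaire IIDs.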

      - In this work, the authors mentioned the cryptic pocket but only illustrated the opening of this pocket by using a simple distance between residues (Figure 2) and the SASA of one cysteine (Figure 3). In previous work by the authors (Cruz et al., Nature Communications, 2022), they better characterized the residues involved in RNA binding and in forming the cryptic pocket. Thus, would it be possible to better describe this cryptic pocket (residues involved, volume, etc.) and better explain how, structurally speaking, it can affect the RNA binding mode (blunt ends vs backbone)?

      We thank the reviewer for pointing out the need for clarification on the residues involved in RNA binding and pocket opening and the mechanism linking them. We have performed the CARDS analysis on Reston and Marburg IID simulations as we had done on Zaire IID simulations in Cruz et al, 2022. The results are shown in Fig S3 and discussed in the main text in the first results section.

      - As a counter-example, the authors used C315 for SASA calculation and thiol labeling (Figure 3). This cysteine is mainly buried, as seen from the SASA for Reston and Marburg and from thiol labelling (Figure 3 E,G,H). Would it be possible to also get thiol labeling rates for Cysteine 264 in Reston and its equivalent, to see a case where the residue is solvent exposed?

      We have shown the SASA for C264 from the simulations in Fig S4 and the thiol labeling rates for all 4 cysteines in Reston IID in Fig S6. Comparing these rates to those of all 4 cysteines obtained for Zaire IID (Fig 4 in Cruz et al., 2022), we observe that the rates for C264, which is expected to be exposed, are significantly faster than those of C315, which is largely buried in all variants.

      - I strongly support the authors' willingness to share their data by depositing them in an OSF repository. These data help this reviewer to assess some of the results produced by the authors and to better understand the dynamics of their respective systems. I have just a few comments that need to be addressed regarding these data:

        - While there are data for WT Reston and Marburg, there are no data for Zaire. Is this because these data correspond to the previous work (Cruz et al. 2022) (in this case, it would be good to make this clear in the main text) or is it an omission?

        - There is no center.xtc file in the Marburg-MSM directory.

        - There is no protmasses.pdb in the Reston-MSM directory.

      - In general, if possible, it would be good to use the same name for each type of file presented in each directory to help a potential user understand a bit more how to use these data.

      - If possible, adding a bit more metadata and explanation on the OSF webpage would be very beneficial to help others find these data. To help in this direction, the authors may have a look at the guidelines presented at the end of this article: https://elifesciences.org/articles/90061

      We thank the reviewer for pointing out the omissions from the OSF repository. We have added the missing files and followed a uniform naming convention. We have also added documentation in the metadata section of the OSF repository to help others use the data.  

      Indeed, the simulation data used for Zaire IID is available on the OSF repository corresponding to Cruz et al. 2022 at https://osf.io/5pg2a. We have also clarified this in the data availability section of the main text.  

      Minor point:

      In Figure 2, there is a slight bump in the 225-295 distance distribution around 1 nm for Reston. Can the authors comment on it? As these results are based on long AS, do the authors think this population is significant, even if it is very small?

      Comparing the probability distributions obtained from bootstrapping the frames used to calculate the MSM equilibrium probabilities (Revised Fig1), we observe that the bump in the Reston IID distribution persists in all bootstraps, indicating that it might indeed be significant. This is also consistent with our observation that cysteine 296 does get fully labeled in our thiol labeling experiments, albeit significantly more slowly than in the other homologs.

      Reviewer #2 (Recommendations for the authors):

      I recommend that the authors implement moderate revisions prior to the publication of this research article, addressing the identified weaknesses (see below).

      The authors should provide a rationale for their selection of the amber03 force field (Duan et al., JCTC 24, 1999-2012, 2003) for molecular dynamics simulations, particularly given the availability of more recent and optimized versions of the AMBER force fields. These newer force fields may offer improved parameterization for biomolecular systems, potentially enhancing the accuracy and reliability of the simulation results.

      We chose the Amber03 force field because it has performed well in much of our past work, including the original prediction of the cryptic pocket that we study in this manuscript. The results presented in this manuscript also demonstrate the predictive power of Amber03.

      Additionally, while the authors utilized solvent-accessible surface area (SASA) for cryptic pocket analysis, volume-based approaches may be more suitable for this purpose. Several studies (e.g., Sztain et al. J. Chem. Inf. Model. 2021, 61, 7, 3495-3501) have demonstrated the utility of volume analysis in identifying and characterizing cryptic pockets. The authors could consider incorporating such methodologies to provide a more comprehensive assessment of pocket dynamics.

      The authors propose that the cryptic pocket is not merely a random structural feature but functions as an allosteric regulator of dsRNA binding. To further substantiate this claim, an in-depth analysis of this allosteric effect using for instance network analysis could significantly enhance the study. Such an approach could identify key residues and interaction networks within the protein that mediate the allosteric regulation. This type of mechanistic insight would not only provide a stronger theoretical framework but also offer valuable information for the rational design of therapeutic interventions targeting the cryptic pocket.  

      We thank the reviewer for pointing out the need for clarification on the molecular mechanism linking the opening of the cryptic pocket to RNA binding. We have performed the CARDS analysis on Reston and Marburg IID simulations as was done on Zaire IID simulations in Cruz et al, 2022. The results are shown in Fig S3 and discussed in the main text in the first results section. Briefly, we do find a community (blue) comprising the pocket residues in Reston and Marburg IIDs as we did in Zaire. Similarly, we find that many of the RNA binding residues fall into the orange and green communities as in Zaire. However, there are differences in exactly which residues are clustered into which of these two communities. There are also differences in how strongly connected these communities are in the three homologs. Therefore, while we can conclude that pocket residues likely have varying influence on the RNA binding residues in the homologs, it is hard to say exactly what that variation is from this analysis alone.  

      Reviewer #3 (Recommendations for the authors):

      - MD simulations: All simulations were initialised from the 3 crystal structures, is that correct? In all cases, dsRNA was not included in the simulations, right? Were crystallographic Mg ions in the vicinity of the binding site included? These are known to influence structural dynamics to a large extent.

      All simulations were indeed initialized using only protein atoms from the crystal structures 3FKE, 4GHL, and 3L2A. Therefore, crystallographic Mg ions were not included in the simulations. However, we do agree with the reviewer and think that the effect of parameters such as salt concentration, specifically Mg ions which are known to be important for the stability of dsRNA, on the pocket opening equilibrium merits detailed study in future work.

      - Figure 2: Would it be possible to perform e.g. a block error analysis and show the statistical errors of the distributions?

      We agree that showing the statistical variation in the MSM equilibrium probabilities is important for comparing the different distributions. Therefore, we have updated Figs 2 and 5 to show the distributions obtained from MSMs constructed using 100 and 10 random samples of the data respectively to indicate the extent of the statistical variability in the MSM construction.  
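
      One common way to expose this kind of variability is to rebuild the model on resampled trajectories and look at the spread of the resulting equilibrium populations. The sketch below is a minimal, standard-library illustration of that idea and is not the authors' pipeline: the function names, the pseudocount, the lag of one frame, and the resampling of whole trajectories with replacement are all assumptions made for the example.

```python
import random


def count_transitions(trajs, n_states, lag=1):
    """Count state-to-state transitions at a fixed lag over all trajectories."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for traj in trajs:
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[a][b] += 1.0
    return counts


def stationary_distribution(counts, pseudocount=1e-3, n_iter=5000, tol=1e-12):
    """Row-normalize counts and power-iterate to the equilibrium populations."""
    n = len(counts)
    transition = []
    for row in counts:
        # The pseudocount (an assumption) keeps every row normalizable.
        padded = [c + pseudocount for c in row]
        total = sum(padded)
        transition.append([c / total for c in padded])
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, pi)) < tol
        pi = new
        if converged:
            break
    total = sum(pi)
    return [p / total for p in pi]


def bootstrap_populations(trajs, n_states, n_boot=100, seed=0):
    """Rebuild the model on resampled trajectories; return one distribution per sample."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_boot):
        resampled = [rng.choice(trajs) for _ in trajs]
        samples.append(stationary_distribution(count_transitions(resampled, n_states)))
    return samples
```

      The spread of each state's population across the returned samples (for example its standard deviation or a percentile range) then serves as an error bar on the corresponding distribution.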

      - More detailed structural biology experiments (such as NMR or HDX-MS) could potentially shed more light on the differential behaviour of the three different homologs, providing more evidence for the presence of the cryptic pocket.

      We agree that NMR and HDX-MS are powerful means to study dynamics and are actively exploring these approaches for our future work.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript by Tie et al., the authors couple the methodology they have developed to measure the LQ (localization quotient) of proteins within the Golgi apparatus with RUSH-based cargo release to quantify the speed of different cargos traveling through Golgi stacks in nocodazole-induced Golgi ministacks, in order to differentiate between the cisternal progression and stable compartment models of the Golgi apparatus. The debate between the cisternal progression model and the stable compartment model has been intense, has been going on for decades, and is important for understanding the basic function and organization of the Golgi apparatus. According to the stable compartment model, cisternae are stable structures and cargo moves along the Golgi apparatus in vesicular carriers. According to the cisternal progression model, Golgi cisternae themselves mature, acquiring a new identity from the cis face to the trans face, and act as transport carriers themselves. In this work, the authors provide a missing piece regarding the intra-Golgi transport speed of different cargoes as well as the speed of TGN exit and, based on the differences in the transport velocities of the cargoes tested, favor a stable compartment model. The authors argue that if there is cisternal progression, all cargoes should have a similar intra-Golgi transport speed, which is essentially the rate at which the Golgi cisternae mature. Furthermore, using a combination of BFA and nocodazole treatments, the authors show that the compartments remain stable in cells for at least 30-60 minutes after BFA treatment.

      Strengths:

      The method to accurately measure localization of a protein within the Golgi stack is rigorously tested in the previous publications from the same authors and in combination with pulse chase approaches has been used to quantify transport velocities of cargoes through the Golgi. This is a novel aspect in this paper and differences in intra-Golgi velocities for different cargoes tested makes a case for a stable compartment model.

      Weaknesses:

      Experiments are only tested in one cell line (HeLa cells) and predominantly derived from experimental paradigm using RUSH assays where a secretory cargo is released in a wave (not the most physiological condition) and therefore additional approaches would make a more compelling case for the model.

      We have added datasets from 293T cells in the revamped manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript describes the use of quantitative imaging approaches, which have been a key element of the lab's work over the past years, to address one of the major unresolved discussions in trafficking: intra-Golgi transport. The approach has been clearly described in the lab's previous papers and is clearly described here as well. The authors clearly address the weaknesses in this manuscript and do not overstate the conclusions drawn from the data. The only weakness not addressed is the concept of blocking COPI transport with BFA, which is a strong inhibitor and causes general disruption of the system. This is an interesting element of the paper, which I think could be improved upon by using more specific COPI inhibitors instead, although I understand that this is not necessarily straightforward.

      I commend the authors on their clear and precise presentation of this body of work, incorporating mathematical modelling with a fundamental question in cell biology. In all, I think that this is a very robust body of work, that provides a sound conclusion in support of the stable compartment model for the Golgi.

      General points:

      The manuscript contains a lot of background in its results sections, and the authors may wish to consider rebalancing the text: The section beginning at Line 175 is about 90% background and 10% data. Could some data currently in supplementary be included here to redress this balance, or this part combined with another?

      In the revamped manuscript, we have moved the background information on rapid partitioning and rim progression models to the Introduction.

      Reviewer #3 (Public Review):

      The manuscript by Tie et al. provides a quantitative assessment of intra-Golgi transport of diverse cargos. Quantitative approaches using fluorescence microscopy of RUSH-synchronized cargos, namely GLIM and measurement of Golgi residence time, previously developed by the authors' team (publications from 2016 to 2022), are used here.

      Most of the results have been already published by the same team in 2016, 2017, 2020 and 2021. In this manuscript, very few new data have been added. The authors have put together measurements of intra-Golgi transport kinetics and Golgi residence time of many cargos. The quantitative results are supported by a large number of Golgi mini-stacks/cells analyzed. They are discussed with regard to the intra-Golgi transport models being debated in the field, namely the cisternal maturation/progression model and the stable compartments model. However, over the past decades, the cisternal progression model has been mostly accepted thanks to many experimental data.

      The authors show that different cargos have distinct intra-Golgi transport kinetics and that the Golgi residence time of glycosyltransferases is high. From this and the experiment using brefeldinA, the authors suggest that the rim progression model, adapted from the stable compartments model, fits with their experimental data.

      Strengths:

      The major strength of this manuscript is that it puts together many quantitative results that the authors previously obtained and discusses them to give food for thought about the intra-Golgi transport mechanism.

      The analysis of intra-Golgi transport by fluorescence microscopy is difficult, and the authors' work is a tour de force even if their approach shows limitations, which are clearly stated. Their work is remarkable with regard to the number of Golgi markers and secretory cargos that have been analyzed.

      Weaknesses:

      As previously mentioned, most of the data provided here were already published and are thus accessible to the community. Is there a need to publish them again?

      The authors' discussion of the intra-Golgi transport models is rather simplistic. In the introduction, there is no mention of the most recent models, namely the rapid partitioning and the rim progression models. In my opinion, the tubular connections between cisternae and the diffusion/biochemical properties of cargos are not sufficiently taken into account when interpreting the results. Indeed, tubular connections and biochemical properties of the cargos may affect their transit through the Golgi and the kinetics with which they reach the TGN for Golgi exit.

      Nocodazole is being used to form Golgi mini-stacks, which are necessary to allow intra-Golgi measurement. The use of nocodazole might affect cellular homeostasis but this is clearly stated by the authors and is acceptable as we need to perturb the system to conduct this analysis. However, the manual selection of the Golgi mini-stack being analyzed raises a major concern. As far as I understood, the authors select the mini-stacks where the cargo and the Golgi reference markers are clearly detectable and separated, which might introduce a bias in the analysis.

      The term 'Golgi residence time' is used, but it corresponds to the residence time in the trans-cisterna only, as the cargo has been accumulated in the trans-Golgi by a 20 °C block. The kinetics of disappearance of the protein of interest are then monitored after a 20 °C to 37 °C switch.

      Another concern also lies in the differences that would be introduced by different expression levels of the cargo on the kinetics of their intra-Golgi transport and of their packaging into post-Golgi carriers.

      Please see below for our replies to intra-Golgi transport models, the Golgi residence time, and different expression levels of cargos.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The data shown by the authors to measure differential intra Golgi velocities based on previously established methodology make a case for a stable compartment model, however more data is needed to make a complete story and the clarity of presentation can be improved.

      We sincerely appreciate the reviewer's insightful, detailed, and constructive feedback. Your thoughtful comments have helped us refine our analyses, clarify key points, and strengthen the overall quality of our manuscript. We are grateful for the time and effort you have dedicated to reviewing our work and providing valuable suggestions. Your input has been instrumental in improving both the scientific rigor and presentation of our findings. Thank you for your thorough and thoughtful review.

      Main points:

      (1) Along with the studies in yeast, which the authors describe in this paper, the main evidence for the cisternal maturation model in mammalian cells comes from Bonfanti et al. (https://doi.org/10.1016/S0092-8674(00)81723-7), which used EM to visualize a wave of collagen through Golgi stacks. It is therefore important that this work include collagen as one of the cargos tested. Can the authors use RUSH-Col1AGFP (see: https://doi.org/10.1083/jcb.202005166) as a cargo to monitor intra-Golgi velocities?

      I understand that HeLa cells are not professional collagen-secreting cells, but the authors can use U2OS cells to measure collagen export and two other extreme (slow and fast) cargos to validate that the same trend in intra-Golgi transport velocities is seen in other cell lines. This will address three concerns: a. whether this is a HeLa-specific phenomenon; b. whether the transport of large cargoes like collagen agrees with their proposal; c. whether the same cargo has the same (or a similar) intra-Golgi speed and whether the trend between different cargoes is conserved across cell lines.

      Due to the difficulty of manipulating and imaging the procollagen-I RUSH reporter, we selected the collagenX-RUSH reporter (SBP-GFP-collagenX) instead. Our previous study (Tie et al., eLife, 2018) demonstrated that SBP-GFP-collagenX assembles into large-molecular-weight particles, each containing ~ 190 copies of SBP-GFP-collagenX. With an estimated mean size of ~ 40 nm, these aggregates are not as large as FM4 aggregates and procollagen-I (> 300 nm) and, therefore, are not excluded from conventional transport vesicles, which typically have a size of 50 – 100 nm. However, collagenX has distinct intra-Golgi transport behaviour from conventional secretory cargos: while conventional secretory cargos localize to the cisternal interior, collagenX partitions to the cisternal rim (Tie et al., eLife, 2018).

      We studied the intra-Golgi transport of SBP-GFP-collagenX in HeLa cells via GLIM and side averaging. The new results are included in Figure 3 of the revamped manuscript. CollagenX has similar intra-Golgi transport kinetics as conventional secretory cargos, displaying the first-order exponential function in LQ vs. time and velocity vs. time plots.

      The side-averaging images are consistent with previous and current results. CollagenX displays a double punctum during intra-Golgi transport, indicating a cisternal rim localization, as expected for large secretory cargos. Therefore, our new data demonstrate that large secretory cargos that partition to the cisternal rim might follow intra-Golgi transport kinetics similar to those of conventional secretory cargos that partition to the cisternal interior.

      We tested SBP-GFP-CD59 and SBP-GFP-Tac-TC, cargos with fast and slow intra-Golgi transport velocities, respectively, in 293T cells. Results are included in Figure 2, Supplementary Figure 2, and Table 1 of the revamped manuscript. We found that SBP-GFP-Tac-TC showed similar t<sub>intra</sub> values in HeLa and 293T cells (17 and 14 min, respectively). Considering our previous finding that glycosylation has an essential role in the Golgi exit (Sun et al., JBC, 2020), the distinct intra-Golgi transport kinetics of SBP-GFP-CD59 (t<sub>intra</sub> of 13 and 5 min in HeLa and 293T cells, respectively) might be due to differences in its luminal glycosylation between HeLa and 293T cells. Consistent with this hypothesis, SBP-GFP-Tac-TC does not have any glycosylation sites due to the truncation of the Tac luminal domain.

      (2) RUSH assay has its own caveats which authors also refer to in the manuscript. Authors should test their model by using pulse chase approaches by SNAP tagged constructs which will allow them to do pulse chase assays without the requirement to release cargo as a wave (see: doi: 10.1242/jcs.231373). It is not necessary to test all the cargoes but the two on the ends of the spectrum (slow and fast). To avoid massive overexpression, authors could express the proteins using weaker promoters. Authors could also use this approach to simultaneously measure the two cargoes by tagging them with CLIP and SNAP tags and doing the pulse chase simultaneously (see: DOI: 10.1083/jcb.202206132). In this case it may be difficult to stain both GM130 and TGN, but authors could monitor the rate of segregation from the GM130 signal.

      During the RUSH assay, the sudden release of a large amount of secretory reporters does not occur under native secretory conditions and, consequently, might introduce artifacts. The reviewer suggests using pulse-chase labeling of SNAP (or CLIP)-tagged secretory cargos, which occurs in a steady state and hence more closely resembles native secretory transport. This is an excellent suggestion. However, we have not yet tested this method due to the following concerns.

      The standard protocol involves blocking existing reporters, pulse-labeling newly synthesized reporters, and chasing their movement along the secretory pathway. However, the typical 20-minute pulse labeling period used in the two references would be too long, as a substantial portion of the reporters would already reach the trans-Golgi or exit the Golgi before the chase begins. Conversely, reducing the pulse labeling time would significantly weaken the GLIM signal.

      (3) While the intra-Golgi velocities are different for different cargoes tested, authors should show a control that the arrival of the cargoes from ER to the cis-Golgi follows similar kinetics or if there are differences there is no correlation with the intra-Golgi velocities. In other words, do cargoes which show slow intra-Golgi velocities also take more time to reach the cis-Golgi and vice versa.

      In nocodazole-induced Golgi ministacks, the ER exit site, ERGIC, and cis-Golgi are spatially closely associated. At the earliest measurable time point, 5 minutes after biotin treatment, we observed that the secretory cargo had already reached the cis-Golgi (Figure 2 and Supplementary Figure 2). The rapid ER-to-cis-Golgi transport exceeds the temporal resolution of our current protocol, making it difficult to address the reviewer’s question (see our reply to Minor Points (2) of Reviewer #2 for a more detailed discussion of this).

      (4) Were the different cargos traveling (at different speeds) through Golgi at the rims, or in the middle of ministack, or by vesicles?

      Please also refer to our reply to Question 1 of Reviewer #1. For the nocodazole-induced Golgi ministack, we previously investigated the lateral cisternal localization of RUSH secretory reporters using our en face average imaging (Tie et al., eLife, 2018). We found that small or conventional cargos (such as CD59 and E-cadherin) partition to the cisternal interior while large cargos (collagenX and FM4-CD8a) partition to the cisternal rim during their intra-Golgi transport. Using GLIM, we showed that the intra-Golgi transport kinetics of collagenX is similar to that of small cargos as both follow the first-order exponential function (Figure 3A-C). Therefore, cisternal rim partitioned large size secretory cargos might have intra-Golgi transport kinetics similar to those of cisternal interior partitioned conventional secretory cargos.

      (5) Figure 4, under both nocodazole and BFA treatment for 30mins, would the stacks have the same number (274 nm per LQ) as thickness? Or does it shrink a little? Considering extended BFA treatment reduced intact Golgi ministacks. This is important to understand the LQ numbers of those Golgi proteins. Besides, can they include one ERGIC marker in this assay, would it be approaching cis-Golgi? Images used for quantification in Figure 4 should be shown in the main figure.

      We define the axial size of the Golgi ministack as the axial distance from the GM130 to the GalT-mCherry, d<sub>(GM130-GalT-mCherry)</sub>, measured using the Gaussian centers of their line intensity profiles. As the reviewer suggested, we measured the axial size of the ministack during the nocodazole and BFA treatment. Indeed, we found a decrease in the ministack axial size from 300 ± 10 nm at 0 min to 190 ± 30 nm at 30 min of BFA treatment. This observation is further confirmed by our side average imaging. The new data is presented in Fig. 6G.
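
      The distance measurement described above can be sketched in a few lines. This is an illustration only, not the authors' image-analysis code: a three-point log-parabola estimate of the peak position stands in for a full Gaussian fit, the profiles are assumed to be background-subtracted and strictly positive around the peak, and the function names and the pixel size passed in are hypothetical.

```python
import math


def gaussian_center(profile):
    """Sub-pixel peak position of a roughly Gaussian line intensity profile.

    Fits a parabola through the logarithm of the three samples around the
    maximum; for a noise-free Gaussian this recovers the center exactly.
    """
    i = max(range(len(profile)), key=lambda k: profile[k])
    if i == 0 or i == len(profile) - 1:
        raise ValueError("peak lies on the edge of the profile")
    lm = math.log(profile[i - 1])
    l0 = math.log(profile[i])
    lp = math.log(profile[i + 1])
    denom = lm - 2.0 * l0 + lp
    if denom == 0.0:
        raise ValueError("profile is flat around its maximum")
    return i + 0.5 * (lm - lp) / denom


def axial_distance_nm(profile_a, profile_b, pixel_nm):
    """Distance between the centers of two marker profiles, in nanometres."""
    return abs(gaussian_center(profile_b) - gaussian_center(profile_a)) * pixel_nm
```

      With real, noisy profiles a least-squares Gaussian fit over the whole peak would be more robust than this three-point estimate; the sketch only shows how two fitted centers and a pixel size combine into a distance such as d<sub>(GM130-GalT-mCherry)</sub>.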

      Our study focuses on changes in the organization of the Golgi ministack, so we did not include ERGIC53 in the current analysis. Instead, we quantified the axial distance between GalT-mCherry and CD8a-furin, d<sub>(GalT-mCherry-CD8a-furin)</sub>, and found that it decreased from 200 ± 20 nm at 0 min to 100 ± 30 nm at 30 min of BFA treatment, suggesting the collapse of the TGN. The collapse of the TGN is further visualized by our side average imaging. The new data is presented in Fig. 6H.

      Therefore, our new data demonstrates that the Golgi ministack shrinks, and the TGN collapses under BFA treatment.

      Minor points:

      (1) The LQ data come from confocal/airy scan images, but no such images were shown in this paper. The authors can't assume every reader to have prior knowledge of their previous work. It will be beneficial to have one example image and how the LQ was measured.

      As advised by the reviewer, we have prepared Supplementary Figure 1 to provide a brief illustration of the principle behind GLIM and image processing steps involved.

      (2) The cargos used in this paper need to be introduced: what are they, how were they used in previous literature. Especially the furin constructs come out of the blue (also see point 7).

      As suggested by the reviewer, we have included a schematic diagram in Fig. 1 of the revised manuscript to illustrate all RUSH reporters and their corresponding ER hooks. In this diagram, we also highlight the key sequence differences in the cytosolic tails of different furin mutants.

      Additionally, we have added references for each RUSH reporter at the beginning of the Results and Discussion section.

      (3) There are two categories of exocytosis, constitutive and regulated. It is important to state that the phenomenon observed is in cells predominantly showing only constitutive secretion.

      As the reviewer advised, we have added the following sentences in the section titled “Limitations of the study”.

      “Third, all RUSH reporters used in this study are constitutive secretory cargos. As a result, the intra-Golgi transport dynamics observed here might not reflect those of regulated secretion, which involves the synchronized release of a large quantity of cargo in response to a specific signal.”

      (4) All the cargoes show a progressive reduction in instantaneous velocities from cis to medial to trans. The authors should discuss how they explain this mechanistically. Is the rate of vesicle production progressively decreasing from cis to trans and, if so, why?

      As our imaging methods cannot differentiate vesicles from the cisternal rim, we could not tell if the vesicle production rate had changed during the intra-Golgi transport. We have provided an explanation of the progressive reduction of the intra-Golgi transport velocity in the Results and Discussion section. Please see the text below.

      “The progressive reduction in intra-Golgi transport of secretory cargo might result from the enzyme matrix's retention at the trans-Golgi. As the secretory cargos progress along the Golgi stack from the cis to the trans-side, more and more cargos become temporarily retained in the trans-Golgi region, gradually reducing their overall intra-Golgi transport velocity. If the release or Golgi exit of these cargos from the enzyme matrix follows a constant probability per unit time, i.e., a first-order kinetics process, the rate of cargo exiting from the Golgi should follow the first-order exponential function. Since the mechanism underlying intra-Golgi transport kinetics reflects fundamental molecular and cellular processes of the Golgi, further experimental data are essential to rigorously test this hypothesis.”
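
      The link between a constant release probability per unit time and a first-order exponential can be checked with a toy simulation. The sketch below is not the authors' model: the rate constant, time step, cargo number, and function name are illustrative assumptions, and it only shows that such a release rule yields a remaining fraction that follows exp(-k t).

```python
import math
import random


def simulate_release(n_cargo, k_per_min, t_end_min, dt_min=0.1, seed=0):
    """Simulate cargo leaving a compartment with a constant hazard k.

    Returns (times, fraction_remaining). Each remaining cargo molecule is
    released in a time step with probability 1 - exp(-k * dt), independent
    of how long it has already been retained.
    """
    rng = random.Random(seed)
    p_step = 1.0 - math.exp(-k_per_min * dt_min)
    remaining = n_cargo
    times, fractions = [0.0], [1.0]
    n_steps = int(round(t_end_min / dt_min))
    for step in range(1, n_steps + 1):
        released = sum(1 for _ in range(remaining) if rng.random() < p_step)
        remaining -= released
        times.append(step * dt_min)
        fractions.append(remaining / n_cargo)
    return times, fractions
```

      For a half-time of 10 min (k = ln 2 / 10 per min), about half of the simulated cargo remains after 10 min and about one eighth after 30 min, as expected for first-order kinetics.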

      (5) The supp file 1 nicely listed the raw data for plotting, and n for numbers of ministacks. Could the authors also show number of cells or experiment repeats?

      In the revamped version of the Supplementary File 1, we have added the cell number for each LQ measurement.

      (6) This recent work used novel multiplexing methods to show that nocodazole-treated cells had similar protein organization as in control may be cited. It also showed the effect of BFA. https://www.cell.com/cell/abstract/S0092-8674(24)00236-8.

      We have added this reference to the Introduction section to support that nocodazole-induced Golgi ministacks have a similar organization as the native Golgi. However, our BFA treatment was combined with the nocodazole treatment, while this paper’s BFA treatment does not contain nocodazole.

      (7) Figure 1G-J, authors should show a schematic to show the difference between different furin constructs. Also, LQ values in Fig 1I start from 1. Authors may need to include even earlier timepoints.

      As suggested by the reviewer, we have shown the domain organization of wild type and mutant furin RUSH reporters in Figure 1, highlighting key amino acids in the cytosolic tail. Please also see our reply to Minor Points (2) of Reviewer #1.

      In the revised manuscript, Fig. 1I (SBP-GFP-CD8a-furin-AC #1) has been updated to become Fig. 2J. In this dataset, the first time point was selected at a relatively late stage (20 min), resulting in an initial LQ value of 0.92. However, this should not pose an issue, as SBP-GFP-CD8a-furin-AC reaches a plateau of ~ 1.6. The number of data points is sufficient to capture the rising phase and fit the first-order exponential function curve with an adjusted R<sup>2</sup> = 0.99. Furthermore, we have four independent datasets in total on the intra-Golgi transport of SBP-GFP-CD8a-furin-AC (#1-4), demonstrating the consistency of our measurements.
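
      To make the fitting step concrete, the sketch below fits a first-order exponential to LQ vs. time data and reports an adjusted R<sup>2</sup>. It is an illustration under stated assumptions rather than the authors' procedure: the model form LQ(t) = LQ0 + A(1 - 2^(-t/t_intra)), the grid search over t_intra, the parameter count of three in the adjusted R<sup>2</sup>, and all names are assumptions.

```python
def fit_first_order(times, lqs, tau_grid=None):
    """Fit LQ(t) = LQ0 + A * (1 - 2 ** (-t / t_intra)) by a grid search.

    For each candidate t_intra the model is linear in (LQ0, A), so those two
    parameters come from ordinary least squares; the t_intra with the
    smallest residual sum of squares wins.
    """
    n = len(times)
    n_params = 3  # LQ0, A, t_intra
    if n <= n_params:
        raise ValueError("need more data points than fitted parameters")
    if tau_grid is None:
        tau_grid = [0.5 + 0.05 * i for i in range(2000)]  # 0.5 to ~100 min
    mean_y = sum(lqs) / n
    best = None
    for tau in tau_grid:
        g = [1.0 - 2.0 ** (-t / tau) for t in times]
        mean_g = sum(g) / n
        sgg = sum((gi - mean_g) ** 2 for gi in g)
        if sgg == 0.0:
            continue
        sgy = sum((gi - mean_g) * (yi - mean_y) for gi, yi in zip(g, lqs))
        amp = sgy / sgg
        lq0 = mean_y - amp * mean_g
        rss = sum((yi - (lq0 + amp * gi)) ** 2 for gi, yi in zip(g, lqs))
        if best is None or rss < best[0]:
            best = (rss, tau, amp, lq0)
    rss, tau, amp, lq0 = best
    tss = sum((yi - mean_y) ** 2 for yi in lqs)
    # Adjusted R^2 penalizes the residual variance by the number of parameters.
    adj_r2 = 1.0 - (rss / (n - n_params)) / (tss / (n - 1))
    return {"t_intra": tau, "A": amp, "LQ0": lq0, "adj_R2": adj_r2}
```

      Because the curve is constrained by its rising phase and plateau together, a dataset whose first time point is late can still be fitted, provided enough points remain on the rising phase.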

      (8) Figure 2A need to show the data points, not just the lines.

      In the revamped manuscript, Fig. 2A has been updated to become Fig. 4A. The plot of Fig. 4A is calculated based on Equation 3, so it does not have data points. However, t<sub>intra</sub> is calculated based on the experimental LQ vs. t kinetic data.
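
      As an illustration of why such a velocity plot is a derived curve rather than a set of measurements, a first-order exponential in LQ and the velocity obtained by differentiating it can be written as follows. The parameterization (offset LQ<sub>0</sub>, amplitude A, half-time t<sub>intra</sub>) is an assumed form for the example and may differ from the manuscript's Equations 1-3, which are not reproduced here.

```latex
\begin{align}
LQ(t) &= LQ_0 + A\left(1 - e^{-\ln 2 \, t / t_{\mathrm{intra}}}\right) \\
v(t) = \frac{\mathrm{d}LQ}{\mathrm{d}t} &= \frac{A \ln 2}{t_{\mathrm{intra}}}\, e^{-\ln 2 \, t / t_{\mathrm{intra}}}
\end{align}
```

      Under this form, once A and t<sub>intra</sub> have been fitted to the LQ vs. t data, v(t) is fully determined, which is why it is drawn as a line only.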

      (9) Imaging and camera settings like exposure time, pixel size, etc should be reported in Methods.

      As suggested by the reviewer, we have supplied this information in the Materials and Methods section of the revised manuscript.

      (1) The exposure time and pixel size for the wide-field microscopy:

      “The image pixel size is 65 nm. The range of exposure time is 400 – 5000 ms for each channel.”

      (2) The exposure time and pixel size for the spinning disk confocal microscopy: “The image pixel size is 89 nm. The range of exposure time is 200 – 500 ms for each channel.”

      (3) The pixel dwelling time and pixel size for the Airyscan microscopy:

      “For side averaging, images were acquired under 63× objective (NA 1.40), zoomed in 3.5× to achieve 45 nm pixel size using the SR mode. The pixel dwelling time is 1.16 µs.”

      Reviewer #2 (Recommendations For The Authors):

      We sincerely appreciate the reviewer's insightful, detailed, and constructive feedback. Your thoughtful comments have helped us refine our analyses, clarify key points, and strengthen the overall quality of our manuscript. We are grateful for the time and effort you have dedicated to reviewing our work and providing valuable suggestions. Your input has been instrumental in improving both the scientific rigor and presentation of our findings. Thank you for your thorough and thoughtful review.

      Minor points:

      (1) Equation 2: A should be in front of the ln2. It's already resolved in equation 3, so likely only needs changing in the text

      As suggested by the reviewer, we have changed it accordingly.

      (2) Line 152: Why is there a lack of experimental data? High ER background and low Golgi signal make it difficult to select ministacks: it would be good to see examples of these images. Is 0 a relevant timepoint, as cargo is still at the ER? Instead, would a timepoint <5' better demonstrate the initial arrival of fast cargo, with 0' discarded?

      We observed that RUSH reporters typically do not exit the ER in < 5 min of biotin treatment, resulting in a high ER background and low Golgi signal. Example images of SBP-GFP-CD59 are shown below (scale bar: 10 µm). Possible reasons include: 1) the time required for biotin diffusion into the ER, 2) the time needed to displace the RUSH hook from the RUSH reporter, and 3) the time for recruitment of RUSH reporters to ER exit sites. As a result, we could not obtain LQs for time points earlier than 5 min during the biotin chase.

      Author response image 1.

      Despite the challenge in measuring LQs at early time points, 0 is still a relevant time point. At t = 0 min, RUSH reporters should be at the ER membrane near the ER exit site, a definitive pre-Golgi location along the Golgi axis, although we still don’t have a good method to determine its LQ.

      (3) Table 1 Line 474: 1-3 independent replicates: is there a better way of incorporating this into the table to make it more streamlined? It would be useful to see each cargo as a mean with error. Is there a more demonstrative way to present the table, for example (but does not have to be) fastest cargo first (Tintra) as in Table 2?

      As suggested by the reviewer, we revised Table 1. We calculated the mean and SD of t<sub>intra</sub> and arranged our RUSH reporters in ascending order based on their t<sub>intra</sub> values.
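
      The table reorganization described above amounts to a small summary computation, sketched here. The reporter names and values in the usage note are hypothetical placeholders, not data from the manuscript, and returning no SD for a single replicate is a design choice made for this example.

```python
from statistics import mean, stdev


def summarize_t_intra(replicates):
    """Return (reporter, mean, SD, n) rows sorted by ascending mean t_intra.

    `replicates` maps a reporter name to a list of t_intra values in minutes.
    The sample SD is undefined for a single replicate, so None is returned.
    """
    rows = []
    for reporter, values in replicates.items():
        sd = stdev(values) if len(values) > 1 else None
        rows.append((reporter, mean(values), sd, len(values)))
    return sorted(rows, key=lambda row: row[1])
```

      For example, calling `summarize_t_intra({"reporter_A": [12.0, 14.0, 16.0], "reporter_B": [5.0]})` lists `reporter_B` first and reports no SD for it, which makes explicit which entries rest on one replicate only.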

      (4) Line 264 / Fig 3B: It's unclear to me why the VHH-anti-GFP-mCherry internalisation approach was used, when the cells were expressing GFP, that could be used for imaging. Also, this introduces a question over trafficking of the VHH itself, to access the same compartments as the GFP-proteins are localised. It would be useful to describe the choice of this approach briefly in the text.

      Here, the surface-labeling approach is used to investigate whether GFP-Tac-TC possesses a Golgi retrieval pathway after its exocytosis to the plasma membrane. When VHH-anti-GFP-mCherry is added to the tissue culture medium, it binds to the cell surface-exposed GFP-fused MGAT1, MGAT2, Tac, Tac-TC, CD8a, and CD8a-TC. Next, VHH-anti-GFP-mCherry traces the internalized GFP-fused transmembrane proteins. The surface-labeling approach has two advantages in this case. 1) It is much more sensitive in revealing the small number of GFP-transmembrane proteins at the plasma membrane and endosomes, which are usually drowned in the strong Golgi and ER background fluorescence in the GFP channel. 2) While the GFP fluorescence distribution has reached a dynamic equilibrium, the surface-labeling approach can reveal the endocytic trafficking route and dynamics.

      As the reviewer suggested, we added the following sentence to describe the choice of cell-surface labeling: “By binding to the cell surface-exposed GFP, VHH-anti-GFP-mCherry serves as a sensitive probe to track the endocytic trafficking itinerary of the above GFP-fused transmembrane proteins”.

      Regarding the trafficking of VHH-anti-GFP-mCherry itself, in HeLa cells that do not express GFP-fused transmembrane proteins, VHH-anti-GFP-mCherry can be internalized by fluid-phase endocytosis. However, fluid-phase endocytosis is negligible under our experimental conditions, as we previously demonstrated (Sun et al., JCS, 2021; PMID: 34533190).

      (5) 446 Typo "internalization"

      It has been corrected.

      Reviewer #3 (Recommendations For The Authors):

      Below are my recommendations for the authors to improve their manuscript:

      We sincerely appreciate the reviewer's insightful, detailed, and constructive feedback. Your thoughtful comments have helped us refine our analyses, clarify key points, and strengthen the overall quality of our manuscript. We are grateful for the time and effort you have dedicated to reviewing our work and providing valuable suggestions. Your input has been instrumental in improving both the scientific rigor and presentation of our findings. Thank you for your thorough and thoughtful review.

      (1) Line 48: Tie et al. 2016 is cited. Please add references to the original work showing that cargos transit from cis to trans Golgi cisternae.

      After reviewing the literature, we identified two references that provide some of the earliest morphological evidence of secretory cargo transit from the cis- to the trans-Golgi:

      (1) Castle et al, JCB, 1972; PMID: 5025103

      (2) Bergmann and Singer, JCB, 1983; PMID: 6315743

      The first study utilized pulse-chase autoradiographic EM imaging to track secretory protein movement, while the second employed immuno-EM imaging to observe the synchronized release of VSVGtsO45. Accordingly, we have removed Tie et al., 2016 and replaced it with these newly identified references.

      (2) I would suggest to cite earlier (in the Introduction) the rapid partitioning and rim progression models.

      As suggested, we have moved the rapid partitioning and rim progression models to the Introduction section.

      (3) Figure 1: The LQ vs. time plot for SBP-GFP-CD8a-furinAC (panel I, 0.9 to 1.75 in 150 min) is different from Fig 7G of Tie et al. 2016 (LQ 0-1.5 in 100 min). Please comment on why those two sets of data are different.

      We appreciate the reviewer for pointing out this error. In our previous publication (Tie et al., MBoC, 2016), we presented a total of four datasets on SBP-GFP-CD8a-furin-AC. However, in the earlier version of our manuscript, we mistakenly listed only three datasets, inadvertently omitting Fig. 7G from Tie et al., MBoC, 2016.

      In the revised version, we have now included Fig. S2T (SBP-GFP-CD8a-furin-AC #4), which corresponds to Fig. 7G from Tie et al., MBoC, 2016.

      (4) As mentioned in the public review, I think measurement of the expression level of the cargos is necessary to compare their transport kinetics.

      The reviewer raises a valid concern that is challenging to address. All our data were obtained by imaging overexpressed reporters, and we assume that their overexpression does not significantly impact the Golgi or the secretory pathway. Our previous studies have demonstrated that overexpression does not substantially affect LQs (Figure S2 of Tie et al., MBoC, 2016, and Figure S1 of Tie et al., JCB, 2022).

      We acknowledge this concern as one of the limitations in our study at the end of our manuscript:

      “First, our approach relied on the overexpression of fluorescence protein-tagged cargos. The synchronized release of a large amount of cargo could significantly saturate and skew the intra-Golgi transport.” 

      (5) To my opinion, cisternal continuities would also affect retrograde transport (accelerate) (by diffusion for instance) and not only retrograde transport. Please comment on how this would affect intra-Golgi transport kinetics.

      We believe the reviewer is suggesting “cisternal continuities would also affect retrograde transport (accelerate) (by diffusion for instance) and not only anterograde transport.”

      Transient cisternal continuities have been reported to facilitate the anterograde transport of large quantities of secretory cargos (Beznoussenko et al., 2014; PMID: 24867214) (Marsh et al., 2004; PMID: 15064406) (Trucco et al., 2004; PMID: 15502824). However, we are not aware of any reports demonstrating that such continuities facilitate the retrograde transport of secretory cargo, although Trucco et al. (2004) speculated that Golgi enzymes might use these connections to diffuse bidirectionally (anterograde and retrograde direction). For this reason, we did not discuss this scenario in our manuscript.

      (6) Lines 188-190: I don't understand why the rapid partitioning model is excluded. Please detail more the arguments used for this statement.

      Below is the section from the Introduction that addresses the reviewer's question.

      “This model (rapid partitioning model) suggests that cargos rapidly diffuse throughout the Golgi stack, segregating into multiple post-translational processing and export domains, where cargos are packed into carriers bound for the plasma membrane. Nonetheless, synchronized traffic waves have been observed through various techniques, including EM (Trucco et al., 2004) and advanced light microscopy methods we developed, such as GLIM and side-averaging (Tie et al., 2016; Tie et al., 2022). These findings suggest that the rapid partitioning model might not accurately represent the true nature of the intra-Golgi transport.”

      (7) I would suggest replacing the 'Golgi residence time' by another name as it reflects mainly the time of Golgi exit if I am not mistaken.

      We believe the term “Golgi residence time” more accurately reflects the underlying mechanism – retention. The same approach to measure the Golgi residence time can also be applied to Golgi enzymes such as ST6GAL1. Its slow Golgi exit kinetics (t<sub>1/2</sub> = 5.3 hours) (Sun et al., JCS, 2021) should be primarily due to a strong Golgi retention at its steady state Golgi localization.

      In contrast, the conventional secretory cargos’ Golgi exit times are usually much shorter (t<sub>1/2</sub> < 20 min) (Table 2) due to weaker Golgi retention. In a broader sense, the Golgi exit kinetics of a secretory cargo should be influenced by its Golgi retention. Furthermore, we have consistently used the term “Golgi residence time” in our previous publications. So, we propose maintaining this terminology in the current manuscript.
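
      To make the comparison of the two half-times concrete, the following minimal sketch converts a half-time into a rate constant and into the fraction of cargo still in the Golgi at a given time. It assumes single-exponential (first-order) Golgi exit, which is an illustrative assumption and not a description of the fitting procedure used in the manuscript; the function names are hypothetical.

```python
import math


def rate_constant_from_half_time(t_half):
    """Return the first-order rate constant k = ln(2) / t_half (per time unit)."""
    if t_half <= 0:
        raise ValueError("t_half must be positive")
    return math.log(2) / t_half


def fraction_remaining(t, t_half):
    """Fraction of cargo still in the Golgi at time t, assuming exponential exit."""
    return math.exp(-rate_constant_from_half_time(t_half) * t)


# Half-times quoted in the response above, expressed in minutes.
ST6GAL1_T_HALF_MIN = 5.3 * 60  # ST6GAL1, t1/2 = 5.3 hours
CARGO_T_HALF_MIN = 20.0        # upper bound quoted for conventional secretory cargos

if __name__ == "__main__":
    for name, t_half in [
        ("ST6GAL1", ST6GAL1_T_HALF_MIN),
        ("secretory cargo (t1/2 = 20 min)", CARGO_T_HALF_MIN),
    ]:
        # Fraction still resident in the Golgi after one hour.
        print(name, round(fraction_remaining(60.0, t_half), 3))
```

      Under this assumption, a longer half-time corresponds directly to a smaller exit rate constant, which is the sense in which slow exit kinetics are read here as stronger retention.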

      (8) Lines 300-306: I would suggest that the authors remove this part as it is highly speculative and not supported by data.

      We have relocated this discussion to the section titled "Our data supports the rim progression model, a modified version of the stable compartment model."

      Our enzyme matrix hypothesis offers a potential explanation for key observations, including the differential cisternal localization of small and large cargos and the interior localization of Golgi enzymes. Cryo-FIB-ET has shown that the interior of Golgi cisternae is enriched with densely packed Golgi enzymes (Engel et al., PNAS, 2015; PMID: 26311849), supporting this hypothesis.

      Additionally, this hypothesis helps explain the gradual reduction in intra-Golgi transport velocities of secretory cargos, as requested by Reviewer #1 (Minor Points 4). For these reasons, we propose retaining this discussion in the manuscript.

      (9) In Figure 3B, the percentage of MGAT2-GFP cells with anti-GFP signal at the Golgi is 41%, while Sun et al. 2021 reported 25%. Please comment on this difference.

      We included more cells for the quantification. The percentage of cells showing Golgi localization of VHH-anti-GFP-mCherry is now 32% (n = 266 cells). The observed difference, 32% vs. 25% (Sun et al., JCS, 2021), is likely due to uncontrollable variations in experimental conditions, which might have influenced the endocytic Golgi targeting efficiency.
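
      As a rough guide to how much of such a difference could come from counting statistics alone, the sketch below computes a Wilson score interval for a single observed proportion. It uses only the two numbers stated above (32%, n = 266); the choice of the Wilson interval and the function name are assumptions made for illustration, and the calculation says nothing about day-to-day experimental variation or about the uncertainty of the earlier 25% estimate.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0 or not 0.0 <= p_hat <= 1.0:
        raise ValueError("need n > 0 and 0 <= p_hat <= 1")
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    )
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Proportion and sample size as stated in the response above.
    low, high = wilson_interval(0.32, 266)
    print(f"32% of n=266: 95% Wilson interval = [{low:.3f}, {high:.3f}]")
```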

      (10) The effects of brefeldin A are pleiotropic as it disassembles COPI and clathrin coats but also induces tubulation of endosomes. I would recommend using Golgicide A, which is more specific.

      We agree with the reviewer that Golgicide A might be more specific as an inhibitor of Arf1. We will certainly consider using this inhibitor next time.

    1. Author response:

      Reviewer #1:

      We appreciate the Reviewer's positive feedback on the strengths of our study.

      The timescales of the peptide recognition and unbinding process are much longer than what can be sampled from unbiased simulations. Therefore, the proposed mechanism of recognition should only be considered a hypothesis based on the results presented here. For example, peptides that do not dissociate within one one-microsecond MD simulation are considered to be stable binders. However, they may not have a viable way to bind to the narrow protein cleft in the first place.

      We thank the Reviewer for this valuable feedback. We agree with the Reviewer. Our work on the IRE1 cLD activation mechanism is focused on generating hypotheses of the binding mechanism driven by MD simulations. We recognize the limitations in defining a stable binder due to the time scales sampled. However, our primary focus was to sample and characterize a possible binding pose in the center of the cLD dimer. We will contextualize our statements about stable binders and limit our claims to stating that the protein-peptide complex is stable within 1 μs-long simulations. However, we believe that our finding that the cLD dimer groove is not able to accommodate peptides is solid, as the steric impediment described is present in all our replicas, both with and without peptides, in a cumulative sampling time of 72 μs. Additionally, we will include a plot showing the distribution of groove width across all replicas.

      Oftentimes, representative structures sampled from MD simulation are used to draw conclusions (e.g., Figure 4 about the role of R161 mutation in binding affinity). This is not appropriate as one unbinding event being observed or not observed in a microsecond-long trajectory does not provide sufficient information about the binding strength or the free energy difference.

      We thank the Reviewer for the insightful comment. As explained in the previous point, we believe that our simulations provide useful hypotheses, and we agree that we do not currently have data to comment on binding affinity. We will, therefore, remove all references to this term. We are aware of the limitations due to the timescale and agree that these limitations cannot be overcome with standard equilibrium simulations. To address these limitations, we plan to use orthogonal methods, namely MM/PB(GB)SA calculations for calculating binding free energies from existing trajectories (as performed by https://doi.org/10.1021/acs.jcim.4c00975). We will add predictions of all the peptides using AlphaFold 3, to confirm the binding region.

      Reviewer #2:

      We thank the Reviewer for their positive feedback.

      Improving presentation to include more computational details.

      We thank the Reviewer for raising this critical point. We agree that the manuscript is tailored for a biology audience, as the data are particularly relevant for that community. Nevertheless, we also understand the importance of providing sufficient methodological detail for computational readers. We will add appropriate computational information in the main text.

      More quantitative analysis in addition to visual structures.

      We will add an uncertainty estimate for the HDX calculations using bootstrapping and include additional information on bond distances for Y161. We will also incorporate time-series data showing the distance of the peptide from the groove across all replicas.
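
      For readers unfamiliar with the approach, a percentile bootstrap can be sketched as follows. This is a generic illustration and not the authors' analysis code: the function name, the number of resamples, and the input values are all hypothetical placeholders. Note also that consecutive MD frames are correlated, so resampling individual frames (as done here for simplicity) tends to underestimate the uncertainty; resampling whole replicas or blocks of frames is a common alternative.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement, recompute the statistic, and sort the estimates.
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


if __name__ == "__main__":
    # Hypothetical values for illustration only (not data from the study).
    example_values = [0.42, 0.47, 0.39, 0.51, 0.44, 0.46, 0.40, 0.49]
    low, high = bootstrap_ci(example_values)
    print(f"mean = {statistics.mean(example_values):.3f}, "
          f"95% CI = [{low:.3f}, {high:.3f}]")
```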

      Reviewer #3:

      We appreciate the Reviewer's positive feedback on our work.

      A potential weakness of the study is the usage of equilibrium (unbiased) molecular dynamics simulations so that processes and conformational changes on the microsecond time scale can be probed. Furthermore, there can be inaccuracies and biases in the description of unfolded peptides and protein segments due to the protein force fields. Here, it should be noted that the authors do acknowledge these possible limitations of their study in the conclusions.

      We appreciate the Reviewer's thoughtful comment. As noted in our response to Reviewer 1, we plan to address the concern about sampling by applying orthogonal methods. We agree with the Reviewer that some form of enhanced sampling is necessary if we want to assess binding in a more quantitative way, e.g., via free energy calculations. However, we also realize that applying any enhanced sampling scheme to our system is very challenging, given its large size and the complex peptide-protein interactions, which are not easily captured in a few collective variables. After a careful assessment and some preliminary tests, we decided that estimating free energies using enhanced sampling would necessitate a separate paper due to both the conceptual complexity of the project and the size of the necessary sampling campaign.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Contractile Injection Systems (CIS) are versatile machines that can form pores in membranes or deliver effectors. They can act extra or intracellularly. When intracellular they are positioned to face the exterior of the cell and hence should be anchored to the cell envelope. The authors previously reported the characterization of a CIS in Streptomyces coelicolor, including significant information on the architecture of the apparatus. However, how the tubular structure is attached to the envelope was not investigated. Here they provide a wealth of evidence to demonstrate that a specific gene within the CIS gene cluster, cisA, encodes a membrane protein that anchors the CIS to the envelope. More specifically, they show that:

      - CisA is not required for assembly of the structure but is important for proper contraction and CIS-mediated cell death

      - CisA is associated to the membrane (fluorescence microscopy, cell fractionation) through a transmembrane segment (lacZ-phoA topology fusions in E. coli)

      - Structural prediction of interaction between CisA and a CIS baseplate component

      - In addition they provide a high-resolution model structure of the >750-polypeptide Streptomyces CIS in its extended conformation, revealing new details of this fascinating machine, notably in the baseplate and cap complexes.

      All the experiments are well controlled, including trans-complementation of all tested phenotypes.

      One important piece of information that is missing is the oligomeric state of CisA.

      Thank you for this suggestion. We now provide information on the potential oligomeric state of CisA. We performed further AlphaFold3 modelling of CisA using an increasing number of CisA protomers (1 to 8). We ran predictions for the configuration using the sequence of the well-folded C-terminal CisA domain (amino acids 285-468), which includes the transmembrane domain and the conserved domain that shares similarities to carbohydrate-degrading domains. The obtained confidence scores (mean values for pTM=0.73, ipTM=0.7, n=5) indicate that CisA can assemble into a pentamer and that this oligomerization is mediated through the interaction of the C-terminal solute-binding-like superfamily domain.

      We have added this information to the revised manuscript (Fig. 3b/c) and further discuss the possible implications of CisA oligomerization for its proposed mode of action.

      While it would have been great to test the interaction between CisA and Cis11, and to perform cryo-electron microscopy assays of detergent-extracted CIS structures to maintain the interaction with CisA, I believe that the toxicity of CisA upon overexpression or upon expression in E. coli renders these studies difficult and will require a significant amount of time and optimization to be performed. It is worth mentioning that this study is of significant novelty in the CIS field because, except for Type VI secretion systems, very few membrane proteins or complexes responsible for CIS attachment have been identified and studied.

      We thank this reviewer for their highly supportive and positive comments on our manuscript and we are grateful for their recognition of the novelty of our study, particularly in the context of membrane proteins and complexes involved in CIS attachment.

      We agree that further experimental evidence on direct interaction between CisA and Cis11 would have strengthened our model on CisA function. However, as noted by this reviewer, this additional work is technically challenging and currently beyond the scope of this study.

      Reviewer #2 (Public review):

      Summary:

      The overall question that is addressed in this study is how the S. coelicolor contractile injection system (CISSc) works and affects both cell viability and differentiation, which it has been implicated to do in previous work from this group and others. The CISSc system has been enigmatic in the sense that it is free-floating in the cytoplasm in an extended form and is seen in contracted conformation (i.e. after having been triggered) mainly in dead and partially lysed cells, suggesting involvement in some kind of regulated cell death. So, how do the structure and function of the CISSc system compare to those of related CIS from other bacteria, does it interact with the cytoplasmic membrane, how does it do that, and is the membrane interaction involved in the suggested role in stress-induced, regulated cell death? The authors address these questions by investigating the role of a membrane protein, CisA, that is encoded by a gene in the CIS gene cluster in S. coelicolor. Further, they analyse the structure of the assembled CISSc, purified from the cytoplasm of S. coelicolor, using single-particle cryo-electron microscopy.

      Strengths:

      The beautiful visualisation of the CIS system both by cryo-electron tomography of intact bacterial cells and by single-particle electron microscopy of purified CIS assemblies are clearly the strengths of the paper, both in terms of methods and results. Further, the paper provides genetic evidence that the membrane protein CisA is required for the contraction of the CISSc assemblies that are seen in partially lysed or ghost cells of the wild type. The conclusion that CisA is a transmembrane protein and the inferred membrane topology are well supported by experimental data. The cryo-EM data suggest that CisA is not a stable part of the extended form of the CISSc assemblies. These findings raise the question of what CisA does.

      We thank Reviewer #2 for the overall positive evaluation of our manuscript and the constructive criticism.

      Weaknesses:

      The investigations of the role of CisA in function, membrane interaction, and triggering of contraction of CIS assemblies, are important parts of the paper and are highlighted in the title. However, the experimental data provided to answer these questions appear partially incomplete and not as conclusive as one would expect.

      We acknowledge that some questions raised by our work remain unanswered. We are currently unable to conduct additional experiments because the two leading postdoctoral researchers on this project have moved on to new positions. We currently don’t have the extra manpower with a similar skill set to pick up the project.

      The stress-induced loss of viability is only monitored with one method: an in vivo assay where cytoplasmic sfGFP signal is compared to FM5-95 membrane stain. Addition of a sublethal level of nisin led to loss of sfGFP signal in individual hyphae in the WT, but not in the cisA mutant (similarly to what was previously reported for a CIS-negative mutant). Technically, this experiment and the example images that are shown give rise to some concern. Only individual hyphal fragments are shown that do not look like healthy and growing S. coelicolor hyphae. Under the stated growth conditions, S. coelicolor strains would normally have grown as dense hyphal pellets. It is therefore surprising that only these unbranched hyphal fragments are shown in Fig. 4ab.

      We thank this Reviewer for their thoughtful criticism regarding the viability assays and the data presented in Figure 4. We acknowledge the importance of ensuring that the presented images reflect the physiological state of S. coelicolor under the stated growth conditions and recognize that hyphal fragments shown in Figure 4 do not fully capture the typical morphology of S. coelicolor. As pointed out by this reviewer, S. coelicolor grows in large hyphal clumps when cultured in liquid media, making the quantification of fluorescence intensities in hyphae expressing cytoplasmic GFP or stained with the membrane dye FM5-95 particularly challenging. To improve the image analysis and quantification of GFP and FM5-95-fluorescent intensities across the three S. coelicolor strains (wildtype, cisA deletion mutant and the complemented cisA mutant), we vortexed the cell samples before imaging to break up hyphal clumps, increasing hyphal fragments. The hyphae shown in our images were selected as representative examples across three biological replicates.

      Further, S. coelicolor would likely be in a stationary phase when grown 48 h in the rich medium that is stated, giving rise to concern about the physiological state of the hyphae that were used for the viability assay. It would be valuable to know whether actively growing mycelium is affected in the same way by the nisin treatment, and also whether the cell death effect could be detected by other methods.

      The reasoning behind growing S. coelicolor for 48 h before performing the fluorescence-based viability assay was that we (DOI: 10.1038/s41564-023-01341-x) and others (e.g., DOI: 10.1038/s41467-023-37087-7) previously showed that the levels of CIS particles peak at the transition from vegetative to reproductive/stationary growth, thus indicating that CIS activity is highest during this growth stage. The obtained results in this manuscript are consistent with previous results, in which we showed a similar effect on the viability of wildtype versus cis-deficient S. coelicolor strains (DOI: 10.1038/s41564-023-01341-x) using nisin, the protonophore CCCP and UV radiation. The results presented in this study and our previous study are based on biological triplicate experiments and appropriate controls. Furthermore, our results are in agreement with the findings reported in a complementary study by Vladimirov et al. (DOI: 10.1038/s41467-023-37087-7) that used a different approach (SYTO9/PI staining of hyphal pellets) to demonstrate that CIS-deficient mutants exhibit decreased hyphal death.

      Taken together, we believe that the results obtained from our fluorescence-based viability assay provide strong experimental evidence that functional CIS mediate hyphal cell death in response to exogenous stress.

      The model presented in Fig. 5 suggests that stress leads to a CisA-dependent attachment of CIS assemblies to the cytoplasmic membrane, and then triggering of contraction, leading to cell death. This model makes testable predictions that have not been challenged experimentally. Given that sublethal doses of nisin seem to trigger cell death, there appear to be possibilities to monitor whether activation of the system (via CisA?) indeed leads to at least temporally increased interaction of CIS with the membrane.

      We thank this reviewer for their suggestions on how to test our model further. This is a challenging experiment because we do not know the exact dynamics of how nisin stress is perceived and transmitted to CisA and CIS particles.

      In an attempt to address this point, we have performed co-immunoprecipitation experiments using S. coelicolor cells that produced CisA-FLAG as bait, and which were treated with a sub-lethal nisin concentration for 0/15/45 min.  Mass spectrometry analysis of co-eluted peptides did not show the presence of CIS-associated peptides at the analyzed timepoints. While we cannot exclude the possibility that our experimental assay requires further optimization to successfully demonstrate a CisA-CIS interaction (e.g. optimization of the use of detergents to improve the solubilization of CisA from Streptomyces membrane, which is currently not an established method), an alternative and equally valid hypothesis is that the interaction between CIS particles and CisA is transient and therefore difficult to capture. We would like to mention, however, that we did detect CisA peptides in crude purifications of CIS particles from nisin-stressed cells (Supplementary Table 2, manuscript: line 301/302), supporting our proposed model that CisA can associate with CIS particles in vivo.

      Further, would not the model predict that stress leads to an increased number of contracted CIS assemblies in the cytoplasm? No clear difference in length of the isolated assemblies in Fig. S7 is seen between untreated and nisin-exposed cells, and also no difference between assemblies from WT and cisA mutant hyphae.

      The reviewer is correct that there is no clear difference in length in the isolated CIS particles shown in Figure S7. This is in line with our results, which show that CisA is not required for the correct assembly of CIS particles and their ability to contract in the presence and absence of nisin treatment. The purpose of Figure S7 was to support this statement. We would like to note that the particles shown in Figure S7 were purified from cell lysates using a crude sheath preparation protocol, during which CIS particles generally contract irrespective of the presence or absence of CisA. Thus, we cannot comment on whether there is an increased number of contracted CIS assemblies in the cytoplasm of nisin-exposed cells. To answer this point, we would need to acquire additional cryo-electron tomograms (cryoET) of the different strains treated with nisin. CryoET is an extremely time- and labor-intensive task, and given that we currently don’t know the exact dynamics of the CIS-CisA interaction following exogenous stress, we believe this experiment is beyond the scope of this work.

      The interaction of CisA with the CIS assembly is critical for the model but is only supported by AlphaFold modelling, predicting interaction between cytoplasmic parts of CisA and Cis11 protein in the baseplate wedge. An experimental demonstration of this interaction would have strengthened the conclusions.

      We agree that direct experimental evidence of this interaction would have further strengthened the conclusions of our study, and we have extensively tried to provide additional experimental evidence. Unfortunately, because of the toxicity of cisA expression in E. coli and the possibly transient nature of the interaction under the experimental conditions used, we were unable to confirm this interaction by biochemical or biophysical techniques, such as co-purification or bacterial two-hybrid assays. Despite these technical challenges, we believe that the AlphaFold predictions provided a valuable hypothesis about the role of CisA in firing and the function of CIS particles in S. coelicolor.

      The cisA mutant showed a similarly accelerated sporulation as was previously reported for CIS-negative strains, which supports the conclusion that CisA is required for function of CISSc. But the results do not add any new insights into how CIS/CisA affects the progression of the developmental life cycle and whether this effect has anything to do with the regulated cell death that is caused by CIS. The same applies to the effect on secondary metabolite production, with no further mechanistic insights added, except reporting similar effects of CIS and CisA inactivations.

      Thank you for your feedback on this aspect of the manuscript. We would like to note that the main focus of this study was to provide further insight into how CIS contraction and firing are mediated in Streptomyces. We used the analysis of accelerated sporulation and secondary metabolite production as a readout to directly assess the functionality of CIS in the presence or absence of CisA and to complement the in situ cryoET data. In summary, our data significantly expand our knowledge of CIS function and firing in Streptomyces and suggest a model in which CisA plays an essential role in mediating the interaction of CIS particles with the membrane, which is required for CIS-mediated cell death. We discuss this model in more detail in the revised manuscript (Line 274-283).

      We agree that we still don’t fully understand the nature of the signals that trigger CIS contraction, but we do know that the production of CIS is an integral part of the Streptomyces multicellular life cycle, as demonstrated by two independent previous studies by us and others (DOI: 10.1038/s41564-023-01341-x and DOI: 10.1038/s41467-023-37087-7).

      We further speculate that the assembly and CisA-dependent firing of Streptomyces CIS particles could present a molecular mechanism to dismantle part of the vegetative mycelium. This form of “regulated cell death” could provide two key benefits: (1) to prevent the spread of local cellular damage to the rest of mycelium and (2) to provide additional nutrients for the rest of the mycelium to delay the terminal differentiation into spores, which in turn also affects the production of secondary metabolites.

      Concluding remarks:

      The work will be of interest to anyone interested in contractile injection systems, T6SS, or similar machineries, as well as to people working on the biology of streptomycetes. There is also a potential impact of the work in the understanding of how such molecular machineries could have been co-opted during evolution to become a mechanism for regulated cell death. However, this latter aspect remains poorly understood. Even though this paper adds excellent new structural insights and identifies a putative membrane anchor, it remains elusive how the Streptomyces CIS may lead to cell death. It is also unclear what the advantage would be to trigger death of hyphal compartments in response to stress, as well as how such cell death may impact (or accelerate) the developmental progression. Finally, it is inescapable to wonder whether the Streptomyces CIS could have any role in protection against phage infection.

      We thank Reviewer #2 for the overall supportive assessment of our work. We will briefly discuss the impact of functional CIS on Streptomyces development in the revised manuscript. We previously tested whether Streptomyces CIS could defend against phages but have not found any experimental evidence to support this idea (unpublished data). The analysis of phage defense mechanisms is an underdeveloped area in Streptomyces research, partly due to the currently limited availability of a diverse phage panel.

      Reviewer #3 (Public review):

      Summary:

      In this work, Casu et al. have reported the characterization of a previously uncharacterized membrane protein CisA encoded in a non-canonical contractile injection system of Streptomyces coelicolor, CISSc, which is a cytosolic CIS significantly distinct from both intracellular membrane-anchored T6SSs and extracellular CISs. The authors have presented the first high-resolution structure of the extended CISSc structure. It revealed important structural insights in this conformational state. To further explore how CISSc interacted with the cytoplasmic membrane, they further set out to investigate CisA, which was previously hypothesized to be the membrane adaptor. However, the structure revealed that it was not associated with CISSc. Using fluorescence microscopy and cell fractionation assays, the authors verified that CisA is indeed a membrane-associated protein. They further determined experimentally that CisA had a cytosolic N-terminal domain and a periplasmic C-terminus. The functional analysis of the cisA mutant revealed that it is not required for CISSc assembly but is essential for the contraction; as a result, the deletion significantly affects CISSc-mediated cell death upon stress, timely differentiation, as well as secondary metabolite production. Although the work did not resolve the mechanistic detail of how CisA interacts with the CISSc structure, it provides solid data and a strong foundation for future investigation toward understanding the mechanism of CISSc contraction, and potentially, the relation between the membrane association of CISSc, the sheath contraction and the cell death.

      Strengths:

      The paper is well structured, and the conclusions of the study are supported by solid data and careful data interpretation. The authors provided strong evidence on (1) the high-resolution structure of extended CISSc determined by cryo-EM, and the subsequent comparison with known eCIS structures, which sheds light on both its similarity and different features from other subtypes of eCISs in detail; (2) the topological features of CisA using fluorescence microscopic analysis, cell fractionation and PhoA-LacZα reporter assays; (3) functions of CisA in CISSc-mediated cell death and secondary metabolite production, likely via the regulation of sheath contraction.

      Weaknesses:

      (1) The data presented are not sufficient to provide mechanistic details of CisA-mediated CISSc contraction, as the authors are not able to experimentally demonstrate the direct interaction between CisA and the baseplate complex of CISSc (hypothesized to be via Cis11 by structural modeling), since they could not express cisA in E. coli due to its potential toxicity. Therefore, there is a lack of biochemical analysis of direct interaction between CisA and the baseplate wedge. In addition, there is no direct evidence showing that CisA is responsible for tethering CISSc to the membrane upon stress, and the spatial and temporal relation between membrane association and contraction remains unclear. Further investigation will be needed to address these questions in the future.

      We thank Reviewer #3 for the supportive evaluation and constructive feedback of our study in the non-public review. We appreciate the recognition of the technical limitations of experimentally demonstrating a direct interaction between CisA and the CIS baseplate complex, and we agree that further investigations in the future will hopefully provide a full mechanistic understanding of the spatiotemporal interaction of CisA and CIS particles and the subsequent CIS firing.

      To further improve the manuscript, we will revise the text and clarify figures and figure legends as suggested in the non-public review.

      Discussion:

      Overall, the work provides a valuable contribution to our understanding of the structure of a much less understood subtype of CISs, which is unique compared to both membrane-anchored T6SSs and host-membrane targeting eCISs. Importantly, the work serves as a good foundation to further investigate how the sheath contraction works here. The work contributes to expanding our understanding of the diverse CIS superfamilies.

      Thank you.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      - Magnification of the potential CisA-Cis11 model, with side chains at the interface, should be shown in Supplementary Figures 9/10 to help the reader appreciate the interaction between the two subunits.

      Done. A zoomed-in view of the relevant side chains at the CisA-Cis11 interface has been added to Supplementary Figure 9e. For clarity, we decided not to highlight these residues in Supplementary Figure 10 because they are identical to those in Supplementary Figure 9e.

      - A model where CisA is positioned onto the baseplate (by merging the CisA-Cis11 model and the baseplate structure) will also be informative for the reader.

      We agree that such a presentation would be helpful to visualize the proposed CisA-Cis11 interaction. However, the Cis11 residues predicted to bind CisA are buried in our cryoEM single-particle structure of the elongated Streptomyces CIS. This is not surprising, as the structure is based on a previously established non-contractile CIS mutant variant (PMCID: PMC10066040), which means we were only able to capture one specific configuration of the baseplate complex in the current work. This baseplate configuration is most likely structurally distinct from the baseplate configuration in contracted CIS particles. A similar observation was also reported for the baseplate complex of eCIS particles from Algoriphagus machipongonensis (PMCID: PMC8894135).

      We speculate that in Streptomyces, initial non-specific contacts between CisA and cytoplasmic CIS particles induce a rearrangement of baseplate components, resulting in the exposure of the relevant Cis11 residues, which in turn facilitates a transient interaction between CisA and Cis11. This interaction then leads to additional conformational changes within the baseplate complex, triggering sheath contraction and CIS firing.

      We believe that a transient binding step is a crucial part of the activation process, contributing to the dynamic nature of the system.

      - Providing information on the oligomeric state of CisA will strengthen the manuscript. The authors may consider blue-native gel analysis of CisA-3xFLAG extracted from Streptomyces or E. coli membranes, or in vivo chemical cross-linking coupled to SDS-PAGE analyses. If these fairly straightforward experiments are not possible, the authors may consider providing AF3 models of various CisA multimers.

      Thank you for these suggestions. Unfortunately, we currently don’t have the capability to conduct additional experiments. However, we have performed additional AF3 modelling to explore potential different configurations of CisA. The results of these analyses suggest that CisA can assemble into a pentamer (see also Response to reviewer 1). We speculate that CisA may exist in different oligomeric states and that membrane-localized CisA monomers oligomerize into a larger protein complex in response to a cellular or extracellular (e.g. nisin) signal, which could then directly or indirectly interact with CIS particles in the cytoplasm to facilitate their recruitment to the membrane and CIS firing. Such a stress-dependent conformational change of CisA could also be a safety mechanism to prevent accidental interaction of CisA with CIS particles and CIS firing.

      We now show the AF model for the predicted CisA pentamer in Figure 3b/c and discuss the potential implications of the different CisA configurations in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      - The quantification of contracted versus extended CIS assemblies in the cytoplasm is only presented for the tomograms from the cisA mutant (graph in Fig. S2d). However, there are no data for the WT and complemented mutant to compare with. It would help to add such data, or at least refer to the previous quantification done for the WT in the previous paper. Further, would it be possible to illustrate the difference by measuring lengths of CIS assemblies and plot length distributions (assuming the extended ones are long and contracted are short)?

      Thank you for your suggestions. We have included the results from our previous quantification of CIS assembly states observed in the WT in the revised manuscript (lines 106–110).

      In the acquired tomograms of CIS particles observed in intact and dead hyphae, we consistently observed only two CIS conformations: the fully extended state (average length of 233 nm, diameter of 18 nm) and the fully contracted state (average length of 124 nm, diameter of 23 nm). We have added this information to the revised manuscript (lines 112-114).
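
      As a rough illustration of the length-based comparison Reviewer #2 asked about, the two reported average lengths can be used to label measured particles as extended or contracted. This is a minimal sketch under stated assumptions, not the authors' analysis: the midpoint threshold, the function names, and the example measurements are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Average lengths quoted in the response above (nm).
EXTENDED_NM = 233.0
CONTRACTED_NM = 124.0

# Assumption: a particle is called "extended" if it is longer than the
# midpoint between the two reported averages. This cutoff is illustrative.
MIDPOINT_NM = (EXTENDED_NM + CONTRACTED_NM) / 2


def classify_cis(length_nm, threshold_nm=MIDPOINT_NM):
    """Label one particle by comparing its length with the threshold."""
    return "extended" if length_nm >= threshold_nm else "contracted"


def state_fractions(lengths_nm):
    """Return the fraction of particles assigned to each state."""
    if not lengths_nm:
        raise ValueError("at least one length measurement is required")
    counts = Counter(classify_cis(x) for x in lengths_nm)
    total = len(lengths_nm)
    return {s: counts.get(s, 0) / total for s in ("extended", "contracted")}


# Hypothetical measurements for illustration only (not data from the study).
example_lengths = [231.0, 236.5, 122.0, 126.4, 229.8]
print(state_fractions(example_lengths))
```

      Because only two well-separated conformations were observed, a single cutoff is enough in this sketch; a histogram of the same measurements would show the bimodal distribution the reviewer had in mind.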

      - The Western blot in Fig. 3d, top panel, contains additional bands that are not mentioned. Are they non-specific bands? Are they absent in the cisA mutant? It would help if the legend clarified what they are.

      Correct, these additional bands are non-specific bands, which are also visible in the lysate and soluble fraction of the wild-type sample (negative control, no FLAG-tagged protein). We have now labelled these bands in the figure and clarified the figure legend.

      - Fig. S8a needs improvement. It was not possible to clearly see the stated effect of cisA deletion on secondary metabolite production in these photos.

      We agree and have removed figure panel S8a from the manuscript. The quantification of total actinorhodin production shown in Figure S8b convincingly shows a significant reduction of actinorhodin production in the cisA deletion mutant compared to the wildtype and the complemented mutant.

      - It is not an important point, but the paragraph in lines 109-116 appears more like a re-iteration of the Introduction than Results.

      We agree. We have removed the highlighted text from the Results section and added some of the information to the introduction.

      - Line 206 appears to have a typo. Should it not be WT instead of WT cisA?

      Correct. This is a typo which has been fixed. Thank you.

      - At the end of the Discussion, it is suggested that a stepwise mechanism of recruiting CIS to the membrane and then triggering firing would prevent unwanted activation and self-inflicted death. Since both steps appear to be dependent on CisA, it would be good to spell out more clearly how such a stepwise mechanism would work and how it could prevent spontaneous and erroneous firing of the system.

      Thank you for this suggestion. We have revised the text to clarify the proposed stepwise mechanism. Based on additional structural modeling, we propose that the conserved extra-cytoplasmic domain of CisA may play a role in sensing stress signals. Binding of a ‘stress-associated molecule’ could induce a conformational change in CisA, a hypothesis supported by: (1) Foldseek protein structure searches, which suggest that the conserved C-terminal CisA domain resembles substrate/solute-binding proteins, and (2) AlphaFold3 models predicting that CisA can form a pentamer via its putative substrate-binding domain. This suggests that a transition from CisA monomers to pentamers in response to stress may serve as a key checkpoint, activating CisA and facilitating the recruitment of CIS assemblies to the membrane, either directly or indirectly. Conversely, in the absence of a stress signal, CisA is likely to remain in its monomeric (resting) form, incapable of triggering CIS firing. We have revised the discussion to explain the proposed model in more detail.

      We recognize that this model poses many testable hypotheses that we currently cannot test but aim to address in the future.

      Reviewer #3 (Recommendations for the authors):

      There are a few concerns potentially worth addressing to strengthen the study or for future investigation.

      (1) It would be worth considering moving the first part of the result ('CisA is required for CISSc contraction in situ') after presenting the structure of extended CISSc, and combining it with the last part of the result section ('CisA is essential for the cellular function of CISSc'), as both parts describe the functional characterization of CisA.

      We appreciate the reviewer’s suggestion but have chosen to retain the current order of the results. As this manuscript focuses on the role of CisA, we believe that first establishing a functional link between CisA and CIS contraction provides essential context and motivation for the study.

      (2) Line 169: it is not clear to me if the fusion of CisA with mCherry is functional (if it complements the native CisA). Moreover, it was not shown if its localization changes under nisin stress or in the strain with non-contractile CISSc.

      We have not tested if the CisA-mCherry fusion is fully functional. While we cannot exclude the possibility that the activity of this protein fusion is compromised in vivo, we believe that the described accumulation of CisA-mCherry at the membrane is accurate. This conclusion is further supported by the results obtained from protein fractionation experiments and the membrane topology assay (Figure 3).

      We did not examine if the localization of CisA-mCherry changes in CIS mutant strains under nisin-stress, but this is something we will follow up on in the future.

      (3) In ref 18, the previous work from the same team presented a functional fluorescent fusion of Cis2 (sheath), thus, it will be interesting to see if (i) Cis2 localization and dynamics is affected by the absence of CisA under normal and stressed conditions; (ii) if Cis2 shows any co-localization with CisA under normal and especially stressed conditions, and potentially, its timing correlation to ghost cell formation by time-lapse imaging of both fusions.

      We thank this reviewer for the suggestions, and we plan to address these questions in the future.

      (4) Line 261: it was hypothesized by authors that the cytosolic portion of CisA was required for interacting with Cis11. While it was not possible to verify the direct interaction at current state, a S. coelicolor mutant lacking this cytosolic domain may be of help to indirectly test the hypothesis. Moreover, it would be interesting to see if the cytosolic region alone is enough to induce the contraction upon stress (by removing the TM-C region). If so, whether it leads to cell death, or if it is insufficient to cause cell death without membrane association despite the sheath contraction. If not, it would suggest that membrane association occurs before contraction.

      These are really great suggestions and if we had the manpower and resources, we would have performed these experiments. We plan to follow up on these questions in the future.

      However, additional structural modelling of CisA indicates that CisA may exist in different configurations (see response to Reviewer #1 and #2), a monomeric and/or a pentameric configuration. In these structural models (revised Figure 3), CisA oligomerization is mediated by the annotated periplasmic solute-binding domain. It is conceivable that CisA oligomerization (e.g. in response to a stress signal) presents a critical checkpoint that results in a conformational change within CisA monomers that subsequently drives CisA oligomerization into a configuration primed to interact with CIS particles. We would therefore speculate that the expression of just the cytoplasmic CisA domain may not be sufficient for CIS contraction and cell death.

      (5) Line 263: as it was not possible to express full-length cisA in E. coli, making it difficult to assess the interaction between CisA and Cis11, it may be worth considering expressing the cytosolic portion of CisA (ΔTM-C) instead of full-length CisA, or alternatively performing a co-immunoprecipitation assay of CisA (i.e., with an affinity tag) from S. coelicolor cultures under stressed conditions. However, I am aware that these may be beyond the scope of this work but can be considered for future investigation in general.

      Thank you for your suggestions and your understanding that some of this work is beyond the scope of this work. We have performed CisA-FLAG co-immunoprecipitation experiments from S. coelicolor cultures that were treated with nisin for 0/15/45 min. However, mass spectrometry analysis of co-eluted peptides did not show the presence of CIS-associated peptides at the analysed timepoints. While we cannot exclude technical issues with our assays that resulted in an inefficient solubilization of CisA from Streptomyces membranes, an alternative hypothesis is that the interaction between CIS particles and CisA is very transient and therefore difficult to capture. We would like to mention, however, that we did detect CisA peptides in crude purifications of CIS particles from nisin-stressed cells (Supplementary Table 2, manuscript: line 301/302), supporting our proposed model that CisA can associate with CIS particles in vivo.

      Minor points:

      (1) I would suggest moving Supplementary Fig 2d, together with control quantification of the WT strain and complementation strain (similar to Fig 3g from ref 18), to the main Fig 1, as the quantitative representation would allow a better comparison without going back and forth to ref 18.

      Thank you for your suggestion. Instead of moving Supplementary Fig. 2d to the main figure, we have added additional information in lines 106–110 to discuss the previous quantification of CIS assembly states in the WT, as described in our earlier work. We believe this approach allows readers to easily reference our established quantification without compromising the flow of the main figures.

      (2) Line 52/785: as work of Ref 12 has recently been published DOI: 10.1126/sciadv.adp7088, the reference should be updated accordingly.

      This reference has been updated. Thank you.

      (3) A brief description of key differences between contracted (ref 18) and extended sheath structure will be a good addition for a broader audience.

      Thank you for this suggestion. We have added more information on lines 178–180.

      (4) Fig 3d: it is not clear how well the samples from different fractions were normalized in amount (volume and cell density), but there was an inconsistency in the amount of CisA-Flag in lysate, vs. soluble and membrane fractions (total protein amount combined from soluble fraction and membrane fraction together seemed to be more than in the lysate, while in theory it should be more or less equal; and the amount of WhiA from WT seemed to be less than from the CisA-Flag strain). In the method section, it was mentioned that 'The final pellet was dissolved in 1/10 of the initial volume with wash buffer (no urea). Equi-volume amounts of fractions were mixed with 2x SDS sample buffer and analyzed by immunoblotting.' But it is still not clear whether equivalent amounts (normalized to the same OD for example) were used and if we could directly compare. A brief clarification in the legend of how samples were prepared is needed.

      The samples were normalized by first using the same volume of starting material (similar culture density and incubation period for each strain) and by loading equal volumes of each fraction for analysis. After fractionation, equi-volume amounts of the soluble and membrane protein fractions were mixed with 2× SDS sample buffer and subjected to immunoblotting, ensuring a consistent basis for comparison between samples. We have revised the figure legend and Material and Method sections to make this clear.

      We agree that the amount of CisA-3xFLAG appears slightly lower in the “Lysate” fraction compared to the “Membrane” fraction in Figure 3d (now Fig. 3f). However, this does not affect the overall conclusion of this experiment, showing that CisA-3xFLAG is clearly enriched in the membrane fraction.

      For reference, please find below the uncropped version of this Western blot image. Based on the signal of the unspecific bands, we would like to argue that equal amounts of samples obtained from the WT control strain (no FLAG epitope present) and a strain producing CisA-3xFLAG were loaded for each of the fractions. When we revisited this data, we noted that the protein size marker was wrong. This has been fixed.

      Author response image 1.

      (5) Fig. 4f: statistical analysis is missing.

      The missing statistical analysis has been added to this figure and figure legend.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We wanted to clarify Reviewer #1’s latest comment in the last round of review, “Furthermore, the referee appreciates that the authors have echoed the concern regarding the limited statistical robustness of the observed scrambling events.” We appreciate the follow up information provided from Reviewer #1 that their comment is specifically about the low count alternative pathway events that we view at the dimer interface, and not the statistics of the manuscript overall as they believe that “the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations (Reviewer #1)”. We agree with the Reviewer and acknowledge that overall our coarse-grained study represents the most comprehensive single manuscript of the entire TMEM16 family to date.


      The following is the authors’ response to the original reviews.

      Public Review:

      Reviewer #1 (Public review):

      Summary:

      The manuscript investigates lipid scrambling mechanisms across TMEM16 family members using coarse-grained molecular dynamics (MD) simulations. While the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations, several critical issues undermine its novelty, impact, and alignment with experimental observations.

      Critical issues:

      (1) Lack of Novelty:

      The phenomenon of lipid scrambling via an open hydrophilic groove is already well-established in the literature, including through atomistic MD simulations. The authors themselves acknowledge this fact in their introduction and discussion. By employing coarse-grained simulations, the study essentially reiterates previously known findings with limited additional mechanistic insight. The repeated observation of scrambling occurring predominantly via the groove does not offer significant advancement beyond prior work.

      We agree with the reviewer’s statement regarding the lack of novelty when it comes to our observations of scrambling in the groove of open Ca2+-bound TMEM16 structures. However, we feel that the inclusion of closed structures in this study, which attempts to address the yet unanswered question of how scrambling by TMEM16s occurs in the absence of Ca2+, offers new observations for the field. In our study we specifically address to what extent the induced membrane deformation, which has been theorized to aid lipids cross the bilayer especially in the absence of Ca2+, contributes to the rate of scrambling (see references 36, 59, and 66). There are also several TMEM16F structures solved under activating conditions (bound to Ca2+ and in the presence of PIP2) which feature structural rearrangements to TM6 that may be indicative of an open state (PDB 6P48) and had not been tested in simulations. We show that these structures do not scramble and thereby present evidence against an out-of-the-groove scrambling mechanism for these states. Although we find a handful of examples of lipids being scrambled by Ca2+-free structures of TMEM16 scramblases, none of our simulations suggest that these events are related to the degree of deformation.

      (2) Redundancy Across Systems:

      The manuscript explores multiple TMEM16 family members in activating and non-activating conformations, but the conclusions remain largely confirmatory. The extensive dataset generated through coarse-grained MD simulations primarily reinforces established mechanistic models rather than uncovering fundamentally new insights. The effort, while statistically robust, feels excessive given the incremental nature of the findings.

      Again, we agree with the reviewer’s statement that our results largely confirm those published by other groups and our own. We think there is however value in comparing the scrambling competence of these TMEM16 structures in a consistent manner in a single study to reduce inconsistencies that may be introduced by different simulation methods, parameters, environmental variables such as lipid composition as used in other published works of single family members. The consistency across our simulations and high number of observed scrambling events have allowed us to confirm that the mechanism of scrambling is shared by multiple family members and relies most obviously on groove dilation.

      (3) Discrepancy with Experimental Observations:

      The use of coarse-grained simulations introduces inherent limitations in accurately representing lipid scrambling dynamics at the atomistic level. Experimental studies have highlighted nuances in lipid permeation that are not fully captured by coarse-grained models. This discrepancy raises questions about the biological relevance of the reported scrambling events, especially those occurring outside the canonical groove.

      We thank the reviewer for bringing up the possible inaccuracies introduced by coarse graining our simulations. This is also a concern for us, and we address this issue extensively in our discussion. As the reviewer pointed out above, our CG simulations have largely confirmed existing evidence in the field which we think speaks well to the transferability of observations from atomistic simulations to the coarse-grained level of detail. We have made both qualitative and quantitative comparisons between atomistic and coarse-grained simulations of nhTMEM16 and TMEM16F (Figure 1, Figure 4-figure supplement 1, Figure 4-figure supplement 5) showing the two methods give similar answers for where lipids interact with the protein, including outside of the canonical groove. We do not dispute the possible discrepancy between our simulations and experiment, but our goal is to share new nuanced ideas for the predicted TMEM16 scrambling mechanism that we hope will be tested by future experimental studies.

      (4) Alternative Scrambling Sites:

      The manuscript reports scrambling events at the dimer-dimer interface as a novel mechanism. While this observation is intriguing, it is not explored in sufficient detail to establish its functional significance. Furthermore, the low frequency of these events (relative to groove-mediated scrambling) suggests they may be artifacts of the simulation model rather than biologically meaningful pathways.

      We agree with the reviewer that our observed number of scrambling events in the dimer interface is too low to present it as strong evidence for it being the alternative mechanism for Ca2+-independent scrambling. This will require additional experiments and computational studies which we plan to do in future research. However, we are less certain that these are artifacts of the coarse-grained simulation system as we observed a similar event in an atomistic simulation of TMEM16F.

      Conclusion:

      Overall, while the study is technically sound and presents a large dataset of lipid scrambling events across multiple TMEM16 structures, it falls short in terms of novelty and mechanistic advancement. The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      Reviewer #2 (Public review):

      Summary:

      Stephens et al. present a comprehensive study of TMEM16-members via coarse-grained MD simulations (CGMD). They particularly focus on the scramblase ability of these proteins and aim to characterize the "energetics of scrambling". Through their simulations, the authors interestingly relate protein conformational states to the membrane's thickness and link those to the scrambling ability of TMEM members, measured as the trespassing tendency of lipids across leaflets. They validate their simulation with a direct qualitative comparison with Cryo-EM maps.

      Strengths:

      The study demonstrates an efficient use of CGMD simulations to explore lipid scrambling across various TMEM16 family members. By leveraging this approach, the authors are able to bypass some of the sampling limitations inherent in all-atom simulations, providing a more comprehensive and high-throughput analysis of lipid scrambling. Their comparison of different protein conformations, including open and closed groove states, presents a detailed exploration of how structural features influence scrambling activity, adding significant value to the field. A key contribution of this study is the finding that groove dilation plays a central role in lipid scrambling. The authors observe that for scrambling-competent TMEM16 structures, there is substantial membrane thinning and groove widening. The open Ca2+-bound nhTMEM16 structure (PDB ID 4WIS) was identified as the fastest scrambler in their simulations, with scrambling rates as high as 24.4 ± 5.2 events per μs. This structure also shows significant membrane thinning (up to 18 Å), which supports the hypothesis that groove dilation lowers the energetic barrier for lipid translocation, facilitating scrambling.

      The study also establishes a correlation between structural features and scrambling competence, though analyses often lack statistical robustness and quantitative comparisons. The simulations differentiate between open and closed conformations of TMEM16 structures, with open-groove structures exhibiting increased scrambling activity, while closed-groove structures do not. This finding aligns with previous research suggesting that the structural dynamics of the groove are critical for scrambling. Furthermore, the authors explore how the physical dimensions of the groove qualitatively correlate with observed scrambling rates. For example, TMEM16K induces increased membrane thinning in its open form, suggesting that membrane properties, along with structural features, play a role in modulating scrambling activity.

      Another significant finding is the concept of "out-of-the-groove" scrambling, where lipid translocation occurs outside the protein's groove. This observation introduces the possibility of alternate scrambling mechanisms that do not follow the traditional "credit-card model" of groove-mediated lipid scrambling. In their simulations, the authors note that these out-of-the-groove events predominantly occur at the dimer interface between TM3 and TM10, especially in mammalian TMEM16 structures. While these events were not observed in fungal TMEM16s, they may provide insight into Ca2+-independent scrambling mechanisms, as they do not require groove opening.

      Weaknesses:

      A significant challenge of the study is the discrepancy between the scrambling rates observed in CGMD simulations and those reported experimentally. Despite the authors' claim that the rates are in line with experiment, the observed differences can mean large energetic discrepancies in describing scrambling (larger than a 1 kT barrier in reality). For instance, the authors report scrambling rates of 10.7 events per μs for TMEM16F and 24.4 events per μs for nhTMEM16, which are several orders of magnitude faster than experimental rates. While the authors suggest that this discrepancy could be due to the Martini 3 force field's faster diffusion dynamics, this explanation does not fully account for the large difference in rates. A more thorough discussion of how the choice of force field and simulation parameters influences the results, and how these discrepancies can be reconciled with experimental data, would strengthen the conclusions. Likewise, rate calculations in the study are based on 10 μs simulations, while experimental scrambling occurs over seconds. This timescale discrepancy limits the study's accuracy, as the simulations may not capture rare or slow scrambling events that are observed experimentally and therefore might underestimate the kinetics of scrambling. It is, however, important to recognize that it is hard (borderline unachievable) to pinpoint reasonable kinetics for systems like this using the currently available computational power and force field accuracy. The faster diffusion in simulations may lead to overestimated scrambling rates, making the simulation results less comparable to real-world observations. I would therefore read the findings qualitatively rather than quantitatively. An interesting observation is the asymmetry in the scrambling rates of the two monomers. Since MARTINI is known to be limited in correctly sampling protein dynamics, the authors, in order to preserve the fold, have applied a strong (500 kJ mol⁻¹ nm⁻²) elastic network. However, I am wondering how the ENM applies across the dimer and whether any asymmetry can be noticed in the application of restraints for each monomer and at the dimer interface. How could this have biased the asymmetry in the scrambling rates observed between the monomers? Is this artificially obtained from restraining the initial structure, or is the asymmetry somehow gatekeeping the scrambling mechanism so that it occurs mainly across a single monomer? Answering this question would have far-reaching implications for better describing the mechanism of scrambling.

      The main aim of our computational survey was to directly compare all relevant published TMEM16 structures in both open and closed states using the Martini 3 CGMD force field. Our standardized simulation and analysis protocol allowed us to quantitatively compare scrambling rates across the TMEM16 family, something that has never been done before. We do acknowledge that direct comparison between simulated and experimental scrambling rates is complicated and is best interpreted qualitatively. In line with other reports (e.g., Li et al, PNAS 2024), lipid scrambling in CGMD is 2-3 orders of magnitude faster than typical experimental findings. In the CG simulation field, these increased dynamics due to the smoother energy landscape are a well-known phenomenon. In our view, this is a valuable trade-off for being able to capture statistically robust scrambling dynamics and gain mechanistic understanding in the first place, since these are currently challenging to obtain otherwise. For example, with all-atom MD it would have been near-impossible to conclude that groove openness and high scrambling rates are closely related, simply because one would only measure a handful of scrambling events in (at most) a handful of structures.
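
      To put the rate discussion above in energetic terms, a simple Arrhenius-type estimate can be sketched. It assumes that rates scale as exp(-ΔG/kT) with an unchanged prefactor, which is an illustrative simplification and not an analysis taken from the manuscript. Under that assumption, a speed-up of 2-3 orders of magnitude corresponds to an apparent barrier lowering of roughly 4.6 to 6.9 kT. The function names are hypothetical.

```python
import math


def barrier_shift_kT(rate_ratio):
    """Apparent barrier difference (in units of kT) implied by a rate ratio.

    Assumes k ~ exp(-dG / kT) with the same prefactor for both rates,
    so that dG_slow - dG_fast = kT * ln(k_fast / k_slow).
    """
    if rate_ratio <= 0:
        raise ValueError("rate_ratio must be positive")
    return math.log(rate_ratio)


def expected_events(rate_per_us, duration_us):
    """Number of events expected at a constant rate over a trajectory."""
    return rate_per_us * duration_us


# Speed-ups of 2 and 3 orders of magnitude, as discussed in the response.
for orders in (2, 3):
    print(orders, round(barrier_shift_kT(10 ** orders), 2))

# Rates quoted in the review (events per microsecond) over a 10 us run.
for name, rate in (("TMEM16F", 10.7), ("nhTMEM16", 24.4)):
    print(name, expected_events(rate, 10.0))
```

      The second loop shows why the coarse-grained approach yields usable statistics: at the quoted rates, a single 10 μs trajectory would contain on the order of a hundred or more events.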

      Considering the elastic network: the reviewer is correct in that the elastic network restrains the overall structure to the experimental conformation. This is necessary because the Martini 3 force field does not accurately model changes in secondary (and tertiary) structure. In fact, by retaining the structural information from the experimental structures, we argue that the elastic network helped us arrive at the conclusion that groove openness is the major contributing factor in determining a protein’s scrambling rate. This is best exemplified by the asymmetric X-ray structure of TMEM16K (5OC9), in which the groove of one subunit is more dilated than the other. In our simulation, this information was stored in the elastic network, yielding a 4x higher rate in the open groove than in the closed groove, within the same trajectory.
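
      The reviewer's question of whether the elastic network spans the dimer interface can be illustrated with a small sketch. It builds harmonic restraints between all bead pairs whose reference distance falls within a cutoff window and counts how many connect different chains. The cutoff values (0.5 and 0.9 nm), the bead representation, and the function names are assumptions made for illustration; only the 500 kJ mol⁻¹ nm⁻² force constant comes from the review, and this is not the authors' actual setup. Real Martini elastic networks typically also skip sequence neighbours, which this sketch omits.

```python
import math
from itertools import combinations

FORCE_CONSTANT = 500.0  # kJ mol^-1 nm^-2, value quoted in the review


def build_elastic_network(beads, lower_nm=0.5, upper_nm=0.9):
    """Return (i, j, d0) restraints for bead pairs within the cutoff window.

    beads: list of (chain_id, (x, y, z)) with coordinates in nm.
    The cutoffs are illustrative assumptions, not the study's settings.
    """
    bonds = []
    for (i, (_, a)), (j, (_, b)) in combinations(enumerate(beads), 2):
        d0 = math.dist(a, b)  # reference distance from the input structure
        if lower_nm <= d0 <= upper_nm:
            bonds.append((i, j, d0))
    return bonds


def harmonic_energy(d, d0, k=FORCE_CONSTANT):
    """Restraint energy 0.5 * k * (d - d0)^2 in kJ/mol."""
    return 0.5 * k * (d - d0) ** 2


def count_interchain_bonds(beads, bonds):
    """Count restraints that connect beads from different chains."""
    return sum(1 for i, j, _ in bonds if beads[i][0] != beads[j][0])


# Toy four-bead "dimer" for illustration only (not a real structure).
toy = [
    ("A", (0.0, 0.0, 0.0)),
    ("A", (0.6, 0.0, 0.0)),
    ("B", (1.2, 0.0, 0.0)),
    ("B", (3.0, 0.0, 0.0)),
]
network = build_elastic_network(toy)
print(len(network), count_interchain_bonds(toy, network))
```

      Because the reference distances are taken from the starting structure, any asymmetry between subunits in that structure is carried into the restraints, which is consistent with the authors' point about the asymmetric TMEM16K structure.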

      Notably, the manuscript does not explore the impact of membrane composition on scrambling rates. While the authors use a specific lipid composition (DOPC) in their simulations, they acknowledge that membrane composition can influence scrambling activity. However, the study does not explore how different lipids or membrane environments or varying membrane curvature and tension, could alter scrambling behaviour. I appreciate that this might have been beyond the scope of this particular paper and the authors plan to further chase these questions, as this work sets a strong protocol for this study. Contextualizing scrambling in the context of membrane composition is particularly relevant since the authors note that TMEM16K's scrambling rate increases tenfold in thinner membranes, suggesting that lipid-specific or membrane-thickness-dependent effects could play a role.

      Considering different membrane compositions: for this study, we chose to keep the membranes as simple as possible. We opted for pure DOPC membranes, because it has (1) negligible intrinsic curvature, (2) forms fluid membranes, and (3) was used previously by others (Li et al, PNAS 2024). As mentioned by the reviewer, we believe our current study defines a good, standardized protocol and solid baseline for future efforts looking into the additional effects of membrane composition, tension, and curvature that could all affect TMEM16-mediated lipid scrambling.

      Reviewer #3 (Public review):

      Strengths:

      The strength of this study emerges from a comparative analysis of multiple structural starting points and understanding global/local motions of the protein with respect to lipid movement. Although the protein is well-studied, both experimentally and computationally, the understanding of conformational events in different family members, especially membrane thickness less compared to fungal scramblases offers good insights.

      We appreciate the reviewer recognizing the value of the comparative study. In addition to valuable insights from previous experimental and computational work, we hope to put forward a unifying framework that highlights various TMEM16 structural features and membrane properties that underlie scrambling function.

      Weaknesses:

      The weakness of the work is to fully reconcile with experimental evidence of Ca²⁺-independent scrambling rates observed in prior studies, but this part is also challenging using coarse-grain molecular simulations. Previous reports have identified lipid crossing, packing defects, and other associated events, so it is difficult to place this paper in that context. However, the absence of validation leaves certain claims, like alternative scrambling pathways, speculative.

      Answer: It is generally difficult to quantitatively compare bulk measurements of scrambling phenomena with simulation results. The advantage of simulations is to directly observe the transient scrambling events at a spatial and temporal resolution that is currently unattainable for experiments. The current experimental evidence for the precise mechanism of Ca2+-independent scrambling is still under debate. We therefore hope to leverage the strength of MD and statistical rigor of coarse-grained simulations to generate testable hypotheses for further structural, biochemical, and computational studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      While we agree with what the reviewer may be hinting at regarding limitations of coarse-grained MD simulations, we believe that our study holds much more merit than this comment suggests. We have provided something that has yet to be done in the field: a comprehensive study that directly compares the scrambling rates of multiple TMEM16 family members in different conformations using identical simulation conditions. Our work clearly shows that a sufficiently dilated groove is the major structural feature that enables robust scrambling for all TMEM16 scramblase members with solved structures. While all TMEM16s cause significant distortion and thinning of the membrane, we assert that the extreme thinning observed around open grooves is significantly enhanced by the lipid scrambling itself as the two leaflets merge through lipid exchange. We saw no evidence that membrane thinning/distortion alone, in the absence of an open groove, could support scrambling at the rates observed under activating conditions or even the low rates observed in Ca2+-independent scrambling. Moreover, our handful of observations of scrambling events outside of the groove, which has not yet been reported in any study, opens an exciting new direction for studying alternative scrambling mechanisms. That said, we are currently following up on many of the observations reported here such as: scrambling events outside the groove, the kinetics of scrambling, the possibility that lipids line the groove of non-scramblers like TMEM16A, etc. This is being done experimentally with our collaborators through site-directed mutagenesis and with all-atom MD in our lab. Unfortunately, it is well beyond the scope of the current study to include all of this in the current paper.

      Reviewer #2 (Recommendations for the authors):

      Major comments and questions:

      (1) Line 214 and Figure 1- Figure Supplement 1: why have you only compared the final frame of the trajectory to the cryo-EM structure? Even if these comparisons are qualitative, they should be representative of the entire trajectory, not a single frame.

      We thank the reviewer for this suggestion and replaced the single-frame snapshots in Figure 1-figure supplement 1 with ensemble-averaged headgroup densities. The overall agreement between membrane shapes in CGMD and cryo-EM was not affected by this change.

      (2) Lines 228-231: You comment 'Residues in this site on nhTMEM16 and TMEMF also seem to play a role in scrambling but the mechanism by which they do so is unclear.' This is something you could attempt to quantify in the simulations by calculating the correlation between scrambling and protein-membrane interactions/contacts in this site. Can you speculate on a mechanism that might be a contributing factor?

      We probed the correlation between these residues and scrambling lipids, as suggested by the reviewer, and interestingly not all scrambling lipids interact with these residues. Yet there is strong lipid density in this vicinity (see insets in Figure 1 and Figure 4-figure supplement 2). These observations lead us to suspect these residues impact scrambling indirectly through influencing the conformation of the protein or flexibility and shape of the membrane. This interpretation fits with mutagenesis studies highlighting a role for these residues in scrambling (see refs 59, 62, and 67). Specifically, Falzone et al. 2022 (ref 59) suggested that they may thin the membrane near the groove, but this has not been tested via structure determination and a detailed model of how they impact scrambling is missing. We could address this question with in silico mutations; however, CG simulation is not an appropriate method to study large scale protein dynamics, and AA simulations are likely best, but beyond the scope of this paper.

      (3) Lines 240-245 and Figure 1B: This section discusses the coupling between membrane distortions and the sinusoidal curve around the protein, however, Figure 1B only shows snapshots of the membrane distortions. Is it possible to understand how these two collective variables are correlated quantitatively (as opposed to the current qualitative analysis)?

      We believe that it may be possible to quantitatively capture these two key features of the membrane, as we did previously with nhTMEM16 using our continuum elasticity-based model of the membrane (Bethel and Grabe 2016). Our model agreed with all-atom MD surfaces to within ~1 Å, hence showing good quantitative agreement throughout the entire membrane. However, we doubt that we could distill the essence of our model down to a simple functional relationship between the sinusoidal wave and pinching, which is what we think the reviewer is asking for. Rather, we believe that the large-scale sinusoidal distortion (collective variable 1) and pinching/distortion (collective variable 2) near the groove arise from the interplay of the specific protein surface chemistry for each protein (patterning of polar and non-polar residues) and the membrane. This is why we chose to simply report the distinct patterns that the family members impose on the surrounding membrane, which we think is fascinating. Specifically, Fig. 1B shows that different TMEM16 family members distort the membrane in different ways. Most notably, fungal TMEM16s feature a more pronounced sinusoidal deformation, whereas the mammalian members primarily produce local pinching. Then, in Fig. 3A we show that the thinning at the groove happens in all structures and is more pronounced in open, scrambling-competent conformations. In other words, proteins can show very strong thinning (e.g. TMEM16K, 5OC9) even though the membrane generally remains flat.

      (4) Lines 257-258: Authors comment that TMEM16A lacks scramblase activity yet can achieve a fully lipid-lined groove (note the typo - should be lipid-lined, not lipid-line). Is a fully lipid-lined groove a prerequisite for scramblase activity? Are lipid-lined grooves the only requirement for scramblase activity? Could the authors clarify exactly what the prerequisite for scramblase activity is to avoid any confusion; this will be useful for later descriptions (i.e. line 295) where scrambling competence is again referred to. Additionally, the associated figure panel (Figure 1D) shows a snapshot of this finding but lacks any statistical quantifications - is a fully lipid-lined groove a single event? Perhaps the additional analyses, such as the groove-lipid contacts, may be useful here.

      The definition of lipid scrambling is that a lipid fully transitions from one membrane leaflet to the other. While a single lipid could transition through the groove on its own, it is well documented in both atomistic and CG MD simulations, that lipid scrambling typically happens through a lipid-lined groove, as shown in Fig. 1A-B. The lipids tend to form strong choline-to-phosphate interactions with nearest neighbors that make this energetically favorable. That said, lipid-lined grooves are not sufficient for robust scrambling, which is what we show in Fig. 1D where the non-scrambler TMEM16A did in fact feature a lipid-lined groove. As suggested, we performed contact analysis and found that residue K645 on TM6 in the middle of the groove contacts lipids in 9.2% of the simulation frames.

      To get a better understanding of how populated the TM4-TM6 pathway is with lipids across all simulated structures, we determined for every simulation frame how many headgroup beads resided in the groove. This indicates that the ion-conductive state of TMEM16A (5OYB*, Fig. 1D) only had 1 lipid in the pathway, on average, meaning that the configuration shown in Fig. 1D is indeed exceptional. As a reference, our strongest scrambler, nhTMEM16 4WIS, had an average of 2.8 lipids in the groove. We added a table containing the means and standard deviations that resulted from this analysis as Figure 1-Table supplement 1.
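
      The per-frame occupancy count described above can be sketched as follows. This is only a minimal illustration and not the script used for the paper: the groove is approximated by a hypothetical axis-aligned box, and the function names, coordinates, and box limits are assumptions made for the example.

```python
from statistics import mean, pstdev

def count_in_groove(headgroups, box_min, box_max):
    """Count headgroup beads whose (x, y, z) position lies inside the box.

    The box is a hypothetical stand-in for the TM4-TM6 groove region.
    """
    return sum(
        all(lo <= c <= hi for c, lo, hi in zip(bead, box_min, box_max))
        for bead in headgroups
    )

def groove_occupancy(frames, box_min, box_max):
    """Return (mean, standard deviation) of the per-frame bead count."""
    counts = [count_in_groove(f, box_min, box_max) for f in frames]
    return mean(counts), pstdev(counts)

# Toy trajectory: three frames of headgroup coordinates (arbitrary units).
frames = [
    [(0.5, 0.5, 0.5), (5.0, 5.0, 5.0)],  # one bead inside the box
    [(0.2, 0.1, 0.9), (0.8, 0.8, 0.1)],  # two beads inside the box
    [(3.0, 3.0, 3.0)],                   # no bead inside the box
]
avg, sd = groove_occupancy(frames, (0, 0, 0), (1, 1, 1))
```

      In practice the groove region would be defined relative to the protein (for example, from TM4 and TM6 residue positions) rather than by a fixed box.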

      (5) Lines 295-298 : The scrambling rates of the Ca²⁺-bound and Ca²⁺-free structures fall within overlapping error margins, it becomes difficult to definitively state that Ca²⁺ binding significantly enhances scrambling activity. This undermines the claim that the Ca²⁺-bound structure is the strongest scrambler. The authors should conduct statistical analyses to determine if the difference between the two conditions is statistically significant.

      In contrast to the reviewer’s comment, we do not claim that Ca2+-binding itself enhances lipid scrambling. Instead, what we show is that WT structures that are solved in an open conformation (all of which are Ca2+-bound, except 6QM6) are robust scramblers. For nhTMEM16, we did not observe any scrambling events for the closed-groove proteins, making further statistical analysis redundant.

      (6) The authors claim that the scrambling rates derived from their MD simulations are in "excellent agreement" with experimental findings (lines 294-295), despite significant discrepancy between simulated and experimentally measured rates. For example, the simulated rate of 24.4 ± 5.2 events/µs for the open, Ca²⁺-bound fungal nhTMEM16 (PDB ID 4WIS) corresponds to approximately 24 million events per second, which is vastly higher than experimental rates. Experimental studies have reported scrambling rate constants of ~0.003 s⁻¹ for TMEM16 family members in the absence of Ca²⁺, measured under physiological conditions (https://doi.org/10.1038/s41467-019-11753-1 ). Even with Ca²⁺ activation, scrambling rates remain several orders of magnitude lower than the rates observed in simulations. Moreover, this highlights a larger problem: lipid scrambling rates occur over timescales that are not captured by these simulations. While the authors allude to these discrepancies (lines 605-606), they should be emphasised in the text, as opposed to the table caption. These should also be reconducted to differences between the membrane compositions of different studies.

      We agree with the spirit of the reviewer’s comment, and because of that, we were very careful not to claim that we reproduce experimental scrambling rates, just that the trends (scrambling-competent, or not) are correct. On lines 294-295, we actually said that the scrambling rates in our simulations excellently agree with “the presumed scrambling competence of each experimental structure”, which is true. 

      As explained extensively in the discussion section of our paper (and by many others), direct comparison between MD (e.g., Martini 3, but also atomistic force fields) dynamics and experimental measurements is challenging. The primary goal of our paper is to quantify and compare the scrambling capacity of different TMEM16 family members and different states, within a CGMD context.

      That said, we agree with the reviewer that we may have missed rare or long-timescale events (as is the case in any MD experiment) and added this point to the discussion.

      (7) To address these discrepancies, the authors should: i) emphasize that simulated rates serve as qualitative indicators of scrambling competence rather than absolute values comparable to experimental findings and ii) discuss potential reasons for the divergence, such as simulation timescale limitations or lipid bilayer compositions that may favor scrambling and force field inaccuracies.

      Please see our answer to question 6. Within the context of our CGMD survey, we confidently call our results quantitative. However, we agree with the reviewer that comparison with experimental scrambling rates is qualitative and should be interpreted with caution. To reflect this, we rewrote the first sentence of the relevant paragraph in the discussion section.

      (8) Line 310: Can the authors provide a rationale as to why one monomer has a wider groove than the other? Perhaps a contact analysis could be useful. See the comment above about ENM.

      The simulation of Ca2+-bound TMEM16K was initiated from an asymmetric X-ray structure in which chain B features a more dilated groove than chain A (PDB 5OC9). The backbones of TM4 and TM6 in the closed groove (A) are close enough together to be directly interconnected by the elastic network. In contrast, TM4 and TM6 in the more dilated subunit (B) are not restricted by the elastic network and, as a consequence, display some “breathing” behavior (Fig. 3B and Fig. 3-Suppl. 6A), giving rise to a ~4x higher scrambling rate. We explicitly added the word “cryo-EM” and the PDB ID to the sentence to emphasize that the asymmetry stems from the original experimental structure.

      When answering this question, we also corrected a mislabeled chain identifier in Fig. 2-Suppl. 3A: the original manuscript read ‘chain A’ where it should have read ‘chain B’.

      (9) Line 312: Authors speculate that increased groove width likely accounts for increased scrambling rates. For statistical significance, authors should attempt to correlate scrambling rates and groove width over the simulation period.

      The Reviewer is referring to our description of the scrambling rates we measured for TMEM16K, where we noted that the groove with the higher scrambling rate is also wider on average than the groove of the opposite subunit, which stays below 6 Å. We do not suggest that the correlation between scrambling and groove width is continuous, as the Reviewer may have interpreted from our original submission. Instead, we think it is a binary outcome: lipids cannot easily enter narrow grooves (< 6 Å), and hence scrambling can only occur once this threshold is reached, at which point it occurs at a near-constant rate. We showed this for 4 different family members in the original Fig. 3B, where scrambling events (black dots) were much more likely during, or right after, groove dilation to distances > 6 Å.

      (10) Line 359: Authors have plotted the minimum distance between residues TM4 and TM6 in Fig. 3A/B, claiming that a wide groove is required for scrambling. Upon closer examination, it is clear that several of these distributions overlap, reducing the statistical significance of these claims. Statistical tests (i.e. KS-tests) should be performed to determine whether the differences in distributions are significant.

      The Reviewer appears to be asking for a statistical test between the six distance distributions represented by the data in Fig. 3A for the scrambling competent structures (6QP6*, 8B8J, 6QM6, 7RXG, 4WIS, 5OC9), and we think this is being asked because it is believed that we are making a claim that the greater the distance, the greater the scrambling rate. If we have interpreted this comment correctly, we are not making this claim. Rather, we are simply stating that we only observe robust scrambling when the groove width regularly separates beyond 6 Å. The full distance distributions can now be found in Figure 3-figure supplement 6B, and we agree there is significant overlap between some of these distributions. However, the distinguishing characteristic of the 6 distributions from scrambling competent proteins is that they all access large distances, while the others do not. Notably, TMEM16F proteins (6QP6*, 8B8J) are below the 6 Å threshold on average, but they have wide standard deviations and spend well over ¼ of their time in the permissive regime (the upper error bar in the whisker plots in Fig. 3A is the 75% boundary).
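
      The threshold argument above (a groove can sit below 6 Å on average and still spend a substantial share of frames in the permissive regime) can be illustrated with a short sketch. The distance values below are invented for the example, the helper names are hypothetical, and the choice of the "inclusive" quartile method is an assumption rather than the convention used for Fig. 3A.

```python
from statistics import mean, quantiles

THRESHOLD = 6.0  # Å; the simulation-derived cut-off quoted in the text

def permissive_fraction(widths, threshold=THRESHOLD):
    """Fraction of frames in which the TM4/TM6 distance exceeds the threshold."""
    return sum(w > threshold for w in widths) / len(widths)

def summarize(widths, threshold=THRESHOLD):
    """Mean, quartiles, and permissive fraction of a distance time series."""
    q1, median, q3 = quantiles(widths, n=4, method="inclusive")
    return {
        "mean": mean(widths),
        "q1": q1,
        "median": median,
        "q3": q3,
        "permissive_fraction": permissive_fraction(widths, threshold),
    }

# Invented minimal TM4/TM6 distances (Å), one value per frame.
widths = [4.8, 5.1, 5.3, 5.6, 5.9, 6.2, 6.8, 7.4]
summary = summarize(widths)
```

      For this toy series the mean lies below the threshold while the upper quartile lies above it, which is the situation described for the TMEM16F structures.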

      (11) Line 363-364: The authors state that all TMEM16 structures thin the membrane. Could the authors include a description of how membrane thinning is calculated, for instance, is the entire membrane considered, or is thinning calculated on a membrane patch close to the protein? Do membrane patches closer to the transmembrane protein increase or decrease thickness due to hydrophobic packing interactions? The latter question is of particular concern since Martini3 has been shown to induce local thinning of the membrane close to transmembrane helices, yielding thicknesses 2-3 Å thinner than those reported experimentally (https://doi.org/10.1016/j.cplett.2023.140436). This could be an important consideration in the authors' comparison to the bulk membrane thickness (line 364). Finally, how is the 'bulk membrane thickness' measured (i.e., from the CG simulations, from AA simulations, or from experiments)?

      Regarding the calculation of thinning and bulk membrane thickness, as described in the Methods section “Quantification of membrane deformations”, the minimal membrane thickness, or thinning, is defined as the shortest distance between any two points from the interpolated upper and lower leaflet surfaces constructed using the glycerol beads (GL1 and GL2). Bulk membrane thickness is calculated by taking the vertical distance between the averaged glycerol surfaces at the membrane edge.
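
      The two definitions above can be written down compactly. The sketch below assumes the leaflet surfaces have already been interpolated onto a list of (x, y, z) points; the toy grid, the heights, and the `is_edge` criterion are hypothetical and only serve to make the example self-contained.

```python
from math import dist

def minimal_thickness(upper, lower):
    """Shortest 3D distance between any upper-surface and lower-surface point."""
    return min(dist(u, l) for u in upper for l in lower)

def bulk_thickness(upper, lower, is_edge):
    """Vertical distance between the mean heights of the edge points."""
    up = [p[2] for p in upper if is_edge(p)]
    lo = [p[2] for p in lower if is_edge(p)]
    return sum(up) / len(up) - sum(lo) / len(lo)

# Toy surfaces on a 5 x 5 grid (Å): flat leaflets with a pinch at the centre.
grid = [(x, y) for x in range(5) for y in range(5)]
upper = [(x, y, 12.0 if (x, y) == (2, 2) else 20.0) for x, y in grid]
lower = [(x, y, -10.0 if (x, y) == (2, 2) else -20.0) for x, y in grid]

def is_edge(point):
    """Hypothetical edge criterion: points on the outer rim of the grid."""
    return point[0] in (0, 4) or point[1] in (0, 4)

thinning = minimal_thickness(upper, lower)
bulk = bulk_thickness(upper, lower, is_edge)
```

      The all-pairs search is quadratic in the number of surface points, which is acceptable for a coarse grid but would call for a spatial index on finer surfaces.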

      The concern of localized membrane deformation due to force field artifacts is well-founded. However, the sinusoidal deformations shown here are much greater than 2-3 Å Martini3 imperfections, and they extend for up to 10 Å radially away from the protein into the bulk membrane (see Figure 3-figure supplement 1-5 for more of a description). Most importantly, the sinusoidal wave patterns set up by the proteins are very similar to those described in the previous continuum calculation and all-atom MD for nhTMEM16 (https://www.pnas.org/doi/full/10.1073/pnas.1607574113).

      (12) Line 374: The authors state a 'positive correlation' between membrane thinning/groove opening and scrambling rates. To support this claim, the authors should report the correlation coefficients.

      We have removed any discussion concerning correlations between the magnitude of the scrambling rate and the degree of membrane thinning/groove opening. Rather we simply state that opening beyond a threshold distance is required for robust scrambling, as shown in our analysis in Fig. 3A.

      Concerning the relation between thinning and scrambling: Instantaneous membrane thinning is poorly defined (because it is governed by fluctuations of single lipids), and therefore difficult to correlate with the timing of individual scrambling events in a meaningful way.  Moreover, as we state later in that same section, “we argue that the extremely thin membranes are likely correlated with groove opening, rather than being an independent contributing factor to lipid scrambling”.

      (13) Line 396: It is stated that TMEM16A is not a scramblase but the simulated scrambling activity is not zero. How can you be sure that you are monitoring the correct collective variable if you are getting a false positive with respect to experiments?

      We only observe 2 scrambling events in 10 ms, which is a very low rate compared to the scrambling-competent states. A previous large-scale Martini CG simulation survey that inspired our protocol (Li et al, PNAS 2024) employed a 1 event/ms cut-off to distinguish scramblers from non-scramblers. Hence, that study would have called TMEM16A a non-scrambler as well. We expect that false positives in this context might be an artifact of the CG force field, or it could be that TMEM16A can scramble but too slowly to be experimentally detected. Regarding the collective variable for lipid flipping, it is correct, and we know that this lipid actually flipped.
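
      The classification rule referred to above amounts to comparing an event rate with a cut-off. A minimal sketch follows; the function names are hypothetical, the time unit is simply whatever unit the cut-off is expressed in, and treating a rate exactly at the cut-off as a scrambler is an assumption, not something stated in the cited study.

```python
def scrambling_rate(n_events, sim_time):
    """Events per unit of simulated time."""
    if sim_time <= 0:
        raise ValueError("sim_time must be positive")
    return n_events / sim_time

def is_scrambler(n_events, sim_time, cutoff=1.0):
    """True if the observed rate reaches the cut-off (same time unit)."""
    return scrambling_rate(n_events, sim_time) >= cutoff

# Two events in ten time units, as quoted for TMEM16A.
tmem16a_rate = scrambling_rate(2, 10)
tmem16a_call = is_scrambler(2, 10)

# A purely hypothetical high-activity case for contrast.
active_call = is_scrambler(50, 10)
```

      With two events in ten time units the rate falls well below a cut-off of one event per time unit, so the structure is classified as a non-scrambler even though its event count is not zero.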

      (14) Line 402: Distance distributions for the electrostatic interactions between E633 and K645 should be included in the manuscript. This is also the case for the interactions between E843-K850 (lines 491-492).

      Our description of interactions between lipid headgroups and E633 and K645 in TMEM16A (5OYB*) is based on qualitative observations of the MD trajectory, and we highlight an example of this interaction in Figure 3-video 4. The video clearly shows that the lipid headgroups in the center of the groove orient themselves such that the phosphate bead (red) rests just above K645 (blue) and at other times the choline bead (blue) rests just below E633 (red). We do not think an additional plot with the distance distributions between lipids and these residues will add to our understanding of how lipids interact with residues in the TMEM16A pore.

      We made a similar qualitative observation for the interaction between the POPC choline and E843 and between the POPC phosphate and K850 while watching the AAMD simulation trajectory of TMEM16F (PDB ID 6QP6). Given that this was a single observation, and the same interaction does not appear in the CG simulation of the same structure (see simulation snapshots in Figure 4-figure supplement 5), we do not think additional analysis would add significantly to our understanding of which residues may stabilize lipids in the dimer interface.

      (15) Lines 450-451: 'As the groove opens, water is exposed to the membrane core and lipid headgroups insert themselves into the water-filled groove to bridge the leaflets.' Is this a qualitative observation? Could the authors report the correlation between groove dilation and the number of water permeation events?

      Yes, this is qualitative, and it sketches the order of events during scrambling, and we revised the main text starting at line 450 to indicate this. As illustrated by the density isosurfaces in Appendix 1-Figure 2A, the amount of water found in the closed versus open grooves is striking – there is a significant flood of water that connects the upper and lower solutions upon groove opening. Moreover, Appendix 1-Figure 2B shows much greater water permeation for open structures (4WIS, 7RXG, 5OC9, 8B8J, …) compared to closed structures (6QMB, 6QMA, 8B8Q, and many of the non-labeled data in the figure that all have closed grooves and near 0 water permeation). A notable exception is TMEM16A (7ZK3*8), which has water permeation but a closed groove and little-to-no lipid scrambling.

      Minor Comments:

      (1) Inconsistent use of '10' and 'ten' throughout.

      We would like to kindly point out that we do not find examples of inconsistent use.

      (2) Line 32: 'TM6 along with 3, 4 and 5...' should be 'TM6 along with TM3, TM4 and TM5...'. Same in line 142. Naming should stay consistent.

      Changes are reflected in the updated manuscript.

      (3) Line 141: do you mean traverse (i.e. to travel across)? Or transverse (i.e. to extend across the membrane)?

      This is a typo. We meant “traverse”. Thanks for pointing it out.

      (4) Line 142: 'greasy' should be 'strongly hydrophobic'.

      Changes are reflected in the updated manuscript.

      (5) Line 143-144: "credit card mechanism" requires quotation marks.

      Changes are reflected in the updated manuscript.

      (6) Line 144: state if Nectria haematococca is mammalian or fungal, this is not obvious for all readers.

      Changes are reflected in the updated manuscript.

      (7) Line 147-148: Is TMEM16A/TMEM16K fungal or mammalian? What was the residue before the mutation and which residue is mutated? Perhaps the nomenclature should read as TMEM16X10Y where X=the residue prior to the mutation, 10 is a placeholder for the residue number that is mutated and Y=the new residue following mutation.

      “TMEM16” is the protein family. “A” denotes the specific homolog rather than residue.  

      (8) Lines 157-158: same as 10, it is unclear if these are fungal or mammalian.

      Clarifications added.

      (9) Line 184: "...CGMD simulation" should be "...CGMD simulations".

      Changes made.

      (10) Line 191-192: It would help to create a table of all of the mutants (including if they are mammalian or fungal) summarizing the salt concentrations, lipid and detergent environments, the presence of modulators/activators, etc.

      We added this information to Appendix 1-Table 1 in the supplemental information. We did not specify NaCl concentrations, because all experimental procedures used standard physiological values for this (100-150 mM).

      (11) Line 210: inconsistencies with 'CG' and 'coarse-grain'.

      Changes made.

      (12) Figure 1 caption: '...totaling ~2μs (B)...' is missing the fullstop after 2μs.

      Changes made.

      (13) Figure 1B: it may be useful to label where the Ca2+ ion binds or include a schematic.

      We updated Fig. 1A to illustrate where Ca2+ binds.

      (14) Line 311: Are these mean distances? The authors should add standard deviations.

      Yes, they are. We added the standard deviations to the text.

      (15) Line 321-322: Perhaps a schematic in Figure 2 would be useful to visualize the structural features described here.

      We would kindly refer interested readers to reference [60].

      (16) Line 377: '...are likely a correlate of groove opening...' should read as: '...are likely correlated to groove opening...'.

      Thank you for pointing it out. Changes made.

      (17) Line 398: the '...empirically determined 6Å threshold for scrambling.' Was this determined from the simulations or from experiments? What does "empirically" mean here? Please state this.

      This value was determined from the simulations. Based on our analysis of the correlation between scrambling rate and groove dilation, we found that the minimal TM4/6 distance of 6 Å can distinguish between the high and low activity scramblers. The exact numerical value is somewhat arbitrary as there is a range of values around 6 Å that serve to distinguish scramblers from non-scramblers.

      (18) Figure 4: This figure should be labelled as A, B, C and D, with the figure caption updated accordingly.

      We updated Figure 4 and its caption.

      Reviewer #3 (Recommendations for Authors):

      The authors must do additional simulations to further validate their claim with different lipids and further substantiate dimer interface independent of Ca2+ ions.

      Thank you for the suggestion. We completely agree that studying scrambling in the context of a diverse lipid environment is an exciting area to explore. We are indeed actively working on a project that shares a similar idea. We decided not to include that study because we think the additional discussion involved would be excessive for the current manuscript. We do, however, look forward to publishing our findings in a separate manuscript in the near future. In terms of Ca2+-independent scrambling, we are planning mutagenesis studies with our experimental collaborator that target the residues we identified along the dimer interface.

      Since calcium ions are critical for the stability of these structures, authors should show that they were placed throughout the simulations consistently.

      As stated in the Methods section “Coarse-grained system preparation and simulation detail”, all Ca2+ ions were manually placed into the coarse-grained structure at the start of the simulation, at the positions they occupy in the experimental structure, and harmonically bonded to adjacent acidic residues throughout the duration of the simulation. We have also added a label to Fig. 1A to indicate where the two Ca2+ ions are located.

      The comparison with experimental structures should be consistent with complete simulation, and not the last structure of the trajectory. Depending on the conformational variability, this might be misleading.

      We agree and updated Figure 1-figure supplement 1 accordingly. The overall agreement between membrane shapes in CGMD and cryo-EM was not affected by this change.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Review:

      Reviewer #1 (Public review):

      Summary:

      Meteorin proteins were initially described as secreted neurotrophic factors. In this manuscript, Eggeler et al. demonstrate a novel role for Meteorins in establishing left-right axis formation in the zebrafish embryo. The authors generated null mutations in each of the three zebrafish meteorin genes - metrn, metrnla, and metrnlab. Triple mutant embryos displayed phenotypes strongly associated with left-right defects such as heart looping and visceral organ placement, and disrupted expression of Nodal-responsive genes, as did single mutants for metrn and metrnla. The authors then go on to demonstrate that these defects in left-right asymmetry are likely due to defects in Kupffer's Vesicle and the progenitor dorsal forerunner cells including impaired lumen formation and reduced fluid flow, reduced clustering among DFCs, impaired DFC migration, mislocalization of apical proteins ZO-1 and aPKC, and detachment of DFCs from the EVL. Notably, the authors found that expression of marker genes sox32 and sox17 was not affected, suggesting Meteorins are required for DFC/KV morphogenesis but not necessarily fate specification. Finally, the authors show genetic interaction between Meteorins and integrin receptors, which were previously implicated in left-right patterning. In a supplemental figure, the manuscript also presents data showing expression of meteorin genes around the chick Hensen's node, suggesting that the left-right patterning functions may be conserved among vertebrates.

      Strengths:

      Strengths of this study include the generation of a triple mutant line that targets all known zebrafish meteorin family members. The experiments presented in this study were rigorous, especially with respect to quantification and statistical analysis.

      Weaknesses:

      Although the authors convincingly demonstrate a role for Meteorins in zebrafish left-right patterning, data supporting a conserved role in other vertebrates is compelling but limited to one supplemental figure.

      We thank the reviewer for their thoughtful summary of our study and for highlighting the strengths of our work, including the generation of the triple mutant line and the rigor of our experimental design and quantitative analyses. We also appreciate the constructive feedback regarding the limited functional data supporting the conservation of Meteorin function in other vertebrates. We agree that this is an important aspect that could be further explored. While functional studies in additional species are beyond the current scope, we will consider such experiments in future work.

      We would like to highlight the phylogenetic analysis of Meteorin proteins we have already performed and included in the manuscript (Fig. S7D), which illustrates the evolutionary conservation of this protein family and supports the possibility of a conserved role in left-right patterning.

      Additionally, we have expanded the methods and discussion to include: (1) details on zebrafish viability in contrast to reported embryonic lethality in metrn mutant mice, (2) the background strains used in our study, (3) observed variability in DFC number and potential batch effects and (4) clarification of our 'convergence ratio' quantification approach.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript the authors describe their study on the role of meteorins in establishing the left-right organizer. The left-right organizer is a transient organ in vertebrate embryos in which rotating cilia cause a fluid flow that breaks the left-right symmetry and coordinates lateralization of internal organs such as gut and heart. In zebrafish, the left-right organizer (also named Kupffer's vesicle) is formed by dorsal forerunner cells, but very little is known about how dorsal forerunner cells coalesce and form this ciliated vesicle in the embryo. The authors mutated the three meteorin-coding genes in zebrafish and observed that mutations in each one of these cause laterality defects, with the strongest defects observed in the triple mutant. Loss of meteorins affects the expression of nodal genes, which play essential roles in establishing organ laterality. Meteorins are widely expressed in developing embryos and expression in lateral plate mesoderm and dorsal forerunner cells was observed. The meteorin triple mutant embryos display defects in the migration and clustering of the dorsal forerunner cells, impairing Kupffer's vesicle formation and cilia rotation. Finally, the authors show that meteorins genetically interact with integrins.

      Strengths:

      - These authors went through the lengthy process of generating triple mutants affecting all three meteorin genes. This provides robust genetic evidence on the role of meteorins in establishing organ laterality and avoids the difficulty of interpreting results that are confounded by redundant functions of meteorins.

      - The use of live imaging on triple mutants is appreciated.

      - High-quality imaging of dorsal forerunner cells to quantify cell migration and its relation to Kupffer's vesicle formation.

      Weaknesses:

      - Lack of a model for how meteorins regulate dorsal forerunner cell migration.

      - Only genetic data to suggest a link between meteorins and integrins

      - Besides its role in DFC migration, meteorins may also play a more direct role in regulating Nodal signaling, which is not addressed here.

      We appreciate the recognition of the strengths of our study, particularly the generation of the triple meteorin mutants and the use of high-resolution imaging to quantify DFC behavior and Kupffer’s vesicle formation—both of which were central to providing robust evidence for Meteorins' role in left-right patterning.

      We also value the reviewer’s comments on areas that need further exploration, including the need for a mechanistic model explaining how Meteorins regulate DFC migration, the genetic interaction with integrins, and the potential direct involvement of Meteorins in Nodal signaling.

      We agree that deeper mechanistic insights would strengthen the study. While our findings suggest that Meteorins influence DFC migration and clustering through integrin pathways, a detailed mechanistic dissection, particularly regarding the yet unidentified Meteorin receptor, lies beyond the current scope. However, we consider this a key aspect for future research and have discussed it further in the revised discussion section.

      In response to the reviewer’s suggestions, we have expanded the discussion to address the limitations of the current data linking Meteorins and integrins, including relevant citations to studies that implicate integrins in similar contexts. Additionally, we have added a more detailed discussion of the potential for Meteorins to directly influence Nodal signaling, and we cite a relevant study to support this possibility.

      Once again, we thank the reviewer for their insightful and constructive comments. These points raise important directions for future investigation that will further advance our understanding of Meteorin function in left-right axis formation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In the Results section (p. 9), the authors state, "...a reduced ZO-1 enrichment at the apical junctions of triplMUT GFP-positive DFCs could be detected." However, in Fig. 4F-G, the areas of ZO-1 enrichment indicated by arrowheads appear quite far from the DFCs themselves, making it unclear if these ZO-1-enriched areas are apical DFC junctions (as stated in the text) or instead are part of the EVL. Is it possible to include an additional cell membrane marker or other landmarks? In addition, the differences in ZO-1 accumulation between mutants and WT appear relatively modest. Is it possible to provide quantification of this effect?

      We appreciate the reviewer’s request for additional stainings and further clarification, and we would like to highlight that the requested quantifications of ZO-1 accumulation, including statistical analysis, are already provided in Fig. S5E.

      In mouse, loss of Meteorin is embryonic lethal yet the zebrafish triple mutants are viable. Could the authors discuss this discrepancy?

      We have expanded the discussion to address this point, suggesting that species-specific differences in compensatory mechanisms may explain the observed differences in viability. We would like to reiterate that while one study has reported embryonic lethality in metrn mutant mice, this specific mouse line has not been further investigated in any recent publications. Additionally, in collaboration with the lab of Alain Chédotal, we generated independent metrn and metrnl mutant mouse lines, which did not exhibit the phenotype described in the previously mentioned study.

      It has been reported that TL and AB strains exhibit variable numbers of DFCs and thus laterality defects (Moreno-Ayala et al., 2021, Cell Reports 34(2):108606). Would it be possible for the authors to report the background strains used in this study and those used to generate the meteorin knock-outs?

      We appreciate the comment highlighting the importance of specifying the background strains used in our study. We have now included this information in the methods section, detailing the zebrafish strains utilized throughout our experiments.

      For statistical analysis, would it be possible for the authors to report the number of clutches examined to control for batch effects (especially given the wide variability in DFC numbers as noted above)?

      For further clarification, we have now included an additional explanation of the number of clutches in the methods section.

      In the Methods section (p. 19), the description of how the convergence ratio was computed was somewhat unclear. Could the authors provide a citation or include a diagram/schematic?

      We have revised the Methods section to provide a clearer definition of the convergence ratio and have included a schematic (Fig. 4D) to illustrate how it was calculated.

      Reviewer #2 (Recommendations for the authors):

      - Meteorins are widely expressed in the embryo. Can the authors comment on whether meteorin expression is required in the dorsal forerunner cells (DFCs) or in other cells? This could be addressed by knockdown experiments in DFCs as described by others (PMID: 15716348)

      We thank the reviewer for this important comment. In our study, we have shown that Meteorins are not required for the identity of DFCs, as several DFC-specific markers remain expressed in the respective cells within the meteorin mutant background (see Fig. S4).

      - In fig1d and 1e the authors use heterotaxy to describe visceral organ placement. The embryo shown in 1d seems to display situs inversus instead of heterotaxy, which is defined as discordance in organ position. The authors should clarify this.

      We agree with the reviewer and have revised the figures and figure legends to clarify the distinction between situs inversus and heterotaxy.

      - In Fig2 the authors show that nodal pathway genes are reduced, suggesting reduced Nodal signaling. How do they explain this, given that loss of cilia rotation generally leads to randomization of Nodal signaling but not to a reduction in signaling?

      Following this suggestion we have now added a further discussion on the possibility that Meteorins could directly regulate Nodal signaling in addition to their role in DFC migration and have cited a relevant study.

      - Reduced Nodal signaling in the LPM leads to organ laterality defects. Most anterior tissues like the heart are more sensitive to perturbation in Nodal signaling in the LPM compared to more posterior organs like gut (see also PMID: 25684355). Since in triple mutants the position of the heart is more affected than the position of the visceral organs this suggests that meteorins play an additional role in Nodal signaling in the LPM. As others have shown that meteorins regulate nodal activity (PMID: 24558432), the authors should address this further.

      As described above, we have now added a further discussion on the possibility that Meteorins could directly regulate Nodal signaling in addition to their role in DFC migration and have cited a relevant study. Further investigation into a possible direct role of Meteorins in Nodal signaling will be pursued in future work.

      - The term 'convergence ratio' is not clearly described and confusing as convergence is also used for the movement of LPM cells towards the midline.

      As noted in response to Reviewer #1, we have revised the Methods section and included a schematic in Fig. 4D to better explain this parameter.

      We are grateful for the thoughtful critiques from both reviewers, which have been very constructive and improved the clarity of our study. We believe that the revisions we have made address the concerns raised, and we look forward to your evaluation of our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer 1 (Public review):

      (1) The authors state that they have reclassified the allelic expression status of 32 genes (shown in Table S5, Supplementary Figure 3). The concern is the source of the tissue or cell line which was originally used to make the classification of XCI status, and whether the comparisons are equivalent. For example, if cell lines (and not tissues) were used to define the XCI status for EGFL6, TSPAN6, and CXorf38, then how can the authors be sure that the escape status in whole tissues would be the same? Also, along these lines, the authors should consider whether escape status in previous studies using immortalized/cancer cell lines (such as the meta-analyses done in Balaton publication) would be different compared to healthy tissues (seems like it should be). Therefore, making comparisons between healthy whole tissues and cancer cell lines doesn't make sense.

      Indeed, many previous classifications were based on clonal cell lines, which could result in atypical patterns of escape due to the profound and varied effects of adaptation to culture. However, one of the primary goals of our study was to directly determine allele-specific expression from the X-chromosome in healthy primary tissues, in part to exclude the potential confounding effects of cell culture. 

      Whereas we do perform comparisons with cell culture-based classifications, we also provide detailed comparisons with the previous classification of Tukiainen et al, which also uses primary human tissues. In addition, whereas the comparison with Balaton et al is not optimal, we hold that it is valuable as it reveals which genes may exhibit aberrant escape patterns in culture. Finally, despite the above reservations, our comparison revealed an overwhelming agreement with previous research, which suggests that in the vast majority of cases, escape appears to be correctly maintained in culture.

      (2) The authors note that skewed XCI is prevalent in the human population, and cite some publications (references 8, 10-12). If RNAseq data is available from these female individuals with skewed XCI (such as ref 12), the authors should consider using their allelic expression pipeline to identify XCI status of more X-linked genes.

      Indeed, we completely agree and are in the process of obtaining this data, which has proven complex and time-consuming in the current regulatory environment.

      (3) It has been well established that the human inactive X has more XCI escape genes compared to the mouse inactive X. In light of the author's observations across human tissues, how does the XCI status compare with the same tissues in mice?

      This is a very interesting point, and a comparison we are currently working on. However, this is a major undertaking and one that is outside of the scope of this study. We do appreciate the differences between mice and humans at the X-chromosome level and could only speculate that the overlap is relatively small, as the number of escapees in mice has been shown to be far lower than in humans.

      Reviewer 2 (Public review):

      In my view there are only minor weaknesses in this work, that tend to come about due to the requirement to study individuals with highly skewed X inactivation. I wonder whether the cause of the highly skewed X inactivation may somehow influence the likelihood of observing tissue-specific escape from X inactivation. In this light, it would be interesting to further understand the genetic cause for the highly skewed X inactivation in each of these three cases in the whole exome sequencing data. Future additional studies may validate these findings using single-cell approaches in unrelated individuals across tissues, where there is normal X inactivation.

      We thank the reviewer for their positive assessment of our work. This is a point we have grappled with and continue to grapple with. We cannot rule out that the genetic cause of complete skewing may influence tissue-specific XCI. Moreover, the genetic cause for the non-mosaic XCI is currently unclear and is likely to vary between individuals, which could also result in inter-individual variation in tissue-specific escape. We are currently performing large prospective studies in the tissues of healthy females to specifically address this point.

      Reviewer 3 (Public review):

      There are very few, except that this escape catalogue is limited to 3 donors, based on a single (representative) tissue screen in 285 female donors, mostly using muscle samples. However, if only pituitary samples had been screened, nmXCI-1 would have been missed. Additional donors in the 285 representative samples cross a lower threshold of AE = 0.4. It would be worthwhile to query all tissues of the 285 donors to discover more nmXCI cases, as currently fewer than half of X-linked genes received a call using this very worthwhile approach.

      We thank the reviewer for their positive assessment of our work. Of course, we agree that a tissue-wide screen in all individuals would have been optimal and is a line of research we are currently pursuing. However, the analysis of allele-specific expression in all 5,000 RNA-seq samples is a massive undertaking and was simply not practicable within the time-scale of this study. 

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Thanks to the authors for an interesting manuscript! I enjoyed reading it and the care that has gone into explaining the analyses and the findings. There are a few recommendations that I have for strengthening the work.

      We thank the reviewer for the nice feedback. Much appreciated.

      (1) I would like to see a genetic analysis of the three individuals, to try and identify the genetic causes of the skewed X inactivation beyond just considering the XIC or translocations. The cause of the highly skewed X inactivation would be of interest to many.

      This is certainly a very interesting avenue of research and one that we are currently focusing on. However, in the current study we simply had too few skewed XCI females to assess this in an exhaustive manner. To tackle this issue, we have begun a prospective study of healthy females to identify additional non-mosaic females.

      (2) I wonder whether the cause of the skewed XCI may somehow influence the assessment of tissue-specific escape? If there is a problem with X inactivation itself, perhaps escape would also be different, making it appear more constitutive than tissue-specific?

      This is a point we have grappled with and continue to grapple with. We cannot rule out that the genetic cause of complete skewing may influence tissue-specific XCI. Moreover, the genetic cause for the non-mosaic XCI is currently unclear and is likely to vary between individuals, which could result in inter-individual variation in tissue-specific escape.

      (3) Presentation/wording suggestions:

      I think the abstract is likely a bit inaccessible to those outside the field. I am in the X inactivation field, but don't use the term non-mosaic X inactivation, but rather would call it highly skewed, or non-random X inactivation. In my view, it would be simpler for the abstract to call non-mosaic XCI highly skewed XCI instead, or to use more words to ensure it is clear for the reader.

      We agree that the terminology of completely skewed/non-mosaic XCI could be more clearly defined in the abstract and have clarified this. “Using females that are non-mosaic (completely skewed) for X-inactivation (nmXCI) has proven a powerful and natural genetic system for profiling X-inactivation in humans.”

      I would consider calling the always escape genes constitutive escapees, while the variable may be facultative.

      This is something we have also considered and have received differing feedback on. However, we will definitely keep this in mind for future publications.

      Line 132, it would be useful to explain median >0.475 as less than 2.5% of reads coming from the inactive allele here, not just in the methods. Can you also explain why this cutoff was chosen?

      We thank the reviewer for this clarification. A clarification has been added to the main text as suggested.

      The cutoff was applied to account for potential variations in skewing, given that we screened only a single tissue sample per individual. Although nmXCI females are theoretically expected to have 0% of reads originating from the 'inactive' allele, this is not always observed due to (a) technical errors such as PCR or sequencing inaccuracies, or (b) differences in skewing between tissue types.
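The arithmetic behind this cutoff can be sketched in a few lines. This is only an illustration, assuming allelic expression (AE) is computed per heterozygous SNP as |0.5 − ref/(ref + alt)|, which is what the stated equivalence between a median of 0.475 and 2.5% of reads from the inactive allele implies. The function names and read counts below are hypothetical and are not taken from the authors' pipeline.

```python
from statistics import median

def allelic_expression(ref_reads, alt_reads):
    """AE = |0.5 - ref/(ref+alt)|: 0 is balanced, 0.5 is fully mono-allelic."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads at this SNP")
    return abs(0.5 - ref_reads / total)

def is_candidate_nmxci(snp_counts, cutoff=0.475):
    """Flag a sample when the median AE over its SNPs exceeds the cutoff."""
    return median(allelic_expression(r, a) for r, a in snp_counts) > cutoff

# Hypothetical (ref, alt) read counts per heterozygous SNP.
skewed = [(990, 10), (5, 495), (300, 2)]   # about 1% of reads from the minor allele
mosaic = [(60, 40), (45, 55), (70, 30)]    # both alleles clearly expressed

print(is_candidate_nmxci(skewed), is_candidate_nmxci(mosaic))
```

Under this definition a SNP with exactly 2.5% of reads from the minor allele has AE = 0.475, so a median above the cutoff leaves a small margin for sequencing error and for modest differences in skewing between tissues.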

      Lines 156-160 describe how the heterozygous SNPs were identified in relation to Figure 2. I read these in the methods so that I could understand Figure 1, so I suggest moving this section up.

      We have moved the section as suggested by the reviewer.

      Line 156, consider adding in a sentence to describe what is shown in Figures 2A and B i.e, the overlap of SNPs and spread along the X.

      We have added a sentence describing what is shown in Figures 2A and 2B as suggested by the reviewer.

      Line 217, it would be useful to give the % of genes that show tissue-specific escape, to quantify rare.

      We have added a sentence quantifying ‘rare’ at the suggested line.

      (4) Typos:

      Line 119, missing 'the most' before extensive (and remove an).

      We thank the reviewer for pointing this out. This error has been corrected.

      Reviewer #3 (Recommendations for the authors):

      Some results in the supplementary figures were quite striking. What is going on with DDX3X and ZRSR2? How come total read counts are so different between individuals?

      Indeed, this is a very intriguing observation and one that we have simply failed to understand thus far. We are currently performing a large prospective study to obtain a greater number of non-mosaic females and tissue samples. Hopefully, additional observations across females will allow us to gain further insights into the inter-individual behaviour of DDX3X and ZRSR2.

      One item I would like to see added is some analysis to address the cause of these extremely skewed XCI individuals. The copy number analysis suggests there are some segmental deletions on the X in all three nmXCI cases. Where are these deletions, and do any fall in the region of the X-inactivation centre? Have the authors performed any analysis of potentially deleterious X-linked variants in the WGS or WES data? Why are these donors so skewed? It's interesting that UPIC was still more skewed than the other two.

      The features the reviewer points out are not segmental deletions; the same variation in coverage is found in all females we’ve looked at, including females with mosaic XCI (see Author response image 1 below, where the same pattern of slightly lower read counts is observed at the same sites in all female samples). No deletions were identified in the XIC region. No analysis was performed of deleterious X-linked variants. Why the donors are so skewed is unknown and intriguing. Indeed, identifying the origin of extreme skewing (including in the females in this study) is now the main focus of the group. Whereas UPIC had trisomy 17, which has likely resulted in the observed skewing, we have not yet found a genetic variant that could explain the skewing observed in 13PLJ or ZZPU.

      Author response image 1.

      Copy number as log2 ratio using 500kb bins across the X-chromosome for 3 mosaic XCI females (1QPFJ, OXRO, and RU1J) and 3 nmXCI females, UPIC, nmXCI-1 and nmXCI-2.

      This is not necessary to address with new analyses, but as alluded to above, the authors could screen more than a single representative tissue. And to apply this analysis to larger databases (UK biobank), which the authors may be planning to do already.

      This is an avenue of research we are currently investigating.

      The code is well-documented and accessible. Additional information on the manual reclassification (to deal with inflated binomial P-values) would be helpful. Why not require a minimal threshold for escape (10% of active X allele) in addition to a significant binomial P (inactive X exp. > 2.5% of active)?

      We thank the reviewer for this positive assessment of the code. 

      Indeed, how to define ‘escape’ is a vexed issue, and one we feel has been given undue weight within the field. In reality, studies of escape are often dealing with sparse data (e.g. read depth), few observations (genes and individuals) and substantial amounts of missing data. Thus, it is unlikely that a standard statistical approach will be sensitive and specific across different studies and data types. Similarly, cut-offs, though useful, would also need to be adjusted to the data type and quality in any given study.

      Whereas we initially used a significant binomial P-value as our sole test (often quoted as ‘best practice’), this resulted in widespread inflation of P-values. Thus, we switched to manually curating the allelic expression status of all 380 genes using the empirical guideline of allelic ratio >0.4 (also a commonly used cut-off) as indicating mono-allelic expression. We considered combining the binomial P-value with the cut-off but felt that this would result in an overly complex definition of escape and would unnecessarily exclude many genes from classification, due to the opposing effects of low/high read depth on the binomial and cut-off approaches respectively.
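The opposing effects of read depth on the two criteria can be illustrated with a short sketch. It is not the authors' code. It assumes a one-sided exact binomial test against a null of 2.5% of reads from the inactive allele, an allelic-ratio cut-off of 0.4 with the ratio computed as |0.5 − inactive/total|, and a significance level of 0.05. All names and read counts are hypothetical.

```python
from math import exp, lgamma, log

def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))
    return sum(exp(log_pmf(i)) for i in range(k, n + 1))

def classify(inactive_reads, total_reads, null_frac=0.025, ratio_cutoff=0.4, alpha=0.05):
    """Apply both criteria to one gene and report each verdict separately."""
    p_value = binom_upper_tail(inactive_reads, total_reads, null_frac)
    ratio = abs(0.5 - inactive_reads / total_reads)
    return {
        "binomial_escape": p_value < alpha,        # significantly more than 2.5% inactive reads
        "cutoff_monoallelic": ratio > ratio_cutoff,
    }

# Deep coverage: 4% inactive-allele reads is highly significant, yet the ratio stays above 0.4.
print(classify(80, 2000))
# Shallow coverage: 12.5% inactive-allele reads falls below the cut-off, yet the test lacks power.
print(classify(1, 8))
```

In both hypothetical genes the two criteria disagree, so requiring both would leave each gene without a call, which is the kind of exclusion described above.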

      Indeed, it is because accurate and objective ‘classification’ of escape is so difficult that we placed an emphasis on clearly displaying all data for each gene in each individual, allowing readers to see all the data on which each classification was based.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:  

      Reviewer #1 (Public review):  

      Summary:  

      This work examines the binding of several phosphonate compounds to a membrane-bound pyrophosphatase using several different approaches, including crystallography, electron paramagnetic resonance spectroscopy, and functional measurements of ion pumping and pyrophosphatase activity. The work attempts to synthesize these different approaches into a model of inhibition by phosphonates in which the two subunits of the functional dimer interact differently with the phosphonate.  

      Strengths:  

      This study integrates a variety of approaches, including structural biology, spectroscopic measurements of protein dynamics, and functional measurements. Overall, data analysis was thoughtful, with careful analysis of the substrate binding sites (for example calculation of polder omit maps).

      Weaknesses:  

      Unfortunately, the protein did not crystallize with the more potent phosphonate inhibitors. Instead, structures were solved with two compounds with weak inhibitory constants >200 micromolar, which limits the molecular insight into compounds that could possibly be developed into small molecule inhibitors. Likewise, the authors choose to focus the spectroscopy experiments on these weaker binders, missing an opportunity to provide insight into the interaction between more potent binders and the protein. 

      We acknowledge the reviewer's concern regarding the choice of weaker inhibitors. We attempted cocrystallization with all available inhibitors, including those with higher potency. However, despite numerous efforts, these potent inhibitors yielded low-resolution crystals, making them unsuitable for detailed structural analysis. Therefore, we chose to focus on the weaker binders, as we were able to obtain high-quality crystal structures for these compounds. This allowed us to perform DEER spectroscopy and monitor conformational TmPPase state ensembles in solution with the added advantage of accurately analysing the data against structural models derived from X-ray crystallography. Using these weaker inhibitors enabled a more precise interpretation of the DEER data, thus providing reliable insights into the conformational dynamics and inhibition mechanism. As suggested by the reviewer, in the revised version we have added new DEER experiments, conditions and analysis on two of the more potent inhibitors (alendronate and pamidronate) to provide additional insight into their interactions. Furthermore, we also added DEER data for the cytoplasmic side of TmPPase at a newly identified site, an endogenous cysteine residue that we spin labelled (C599R1), because the DEER data for the previous T211R1 cytoplasmic site were difficult to interpret owing to the highly dynamic nature of this region. The new pair C599R1 yielded high-quality DEER traces and indicated, more clearly than T211R1, distance distributions consistent with asymmetry across the sampled conditions. Again, as suggested by the reviewer, alendronate and pamidronate DEER measurements were also recorded for this site (cytoplasmic side; C599R1) as well as the periplasmic side (S525R1).

      In general, the manuscript falls short of providing any major new insight into membrane-bound pyrophosphatases, which are a very well-studied system. Subtle changes in the structures and ensemble distance distributions suggest that the molecular conformations might change a little bit under different conditions, but this isn't a very surprising outcome. It's not clear whether these changes are functionally important, or just part of the normal experimental/protein ensemble variation. 

      We respectfully disagree with the reviewer. The scale of the motions seen in solution (and now on a new reliable spin pair (C599R1) located on the cytoplasmic side) corresponds to that seen in the full panoply of crystal structures of mPPases. Some proteins undergo very large conformational changes during catalysis, such as the rotary ATPase. This one does not, meaning that the precise motions we describe here are relevant and observed in solution for the first time. Conformational changes in the ensemble, whether large or small, represent essential protein motions which underlie key mPPase catalytic function. These dynamic transitions are extremely challenging to monitor, especially in so many conditions, and our DEER spectroscopy data demonstrate the sensitivity and resolution necessary to monitor these subtle changes in equilibria, even if these are only a few Angstroms. For several of the conditions we investigated by DEER in solution, corresponding X-ray structures have been solved, with the derived distances agreeing well with the DEER distributions. This further validates the biological relevance of the structures, and reveals the complete conformational ensemble, intractable using other current approaches. Indeed, some conformational states were previously seen using serial time-resolved X-ray static structures and were consistent with asymmetry.

      The ZLD-bound crystal structure doesn't predict the DEER distances, and the conformation of Na+ binding site sidechains in the ZLD structure doesn't predict whether sodium currents occur. This might suggest that the ZLD structure captures a conformation that does not recapitulate what is happening in solution/ a membrane. 

      We agree with the reviewer that the ZLD-bound crystal structure does not predict the DEER distances. However, we believe this discrepancy arises from the steric bulkiness of the ZLD inhibitor, which prevents the closure of the hydrolytic centre. Additionally, the absence of Na+ at the ion gate in the ZLD-bound structure suggests that Na+ transport does not occur, a conclusion further supported by our electrometric measurements. We also agree that the distances observed in the DEER experiments might represent a potential new conformation in solution, not captured by the static X-ray structure, thereby offering new insights into the dynamic nature of the protein under physiological conditions. This serves to emphasize the complementarity of the DEER approach to X-ray crystallography and redoubles the importance of using both techniques. Finally, the static X-ray structures have not captured the asymmetric conformations that must exist to explain half-of-the-sites reactivity, whereas DEER yields distance distributions, across all 16 cases tested here (two mutants with eight conditions each), that are consistent with asymmetry.

      Reviewer #2 (Public review):  

      Summary:  

      Crystallographic analysis revealed the asymmetric conformation of the dimer in the inhibitor-bound state. Based on this result, which is consistent with previous time-resolved analysis, authors verified the dynamics and distance between spin introduced label by DEER spectroscopy in solution and predicted possible patterns of asymmetric dimer.  

      Strengths:  

      Crystal structures with inhibitor bound provide detailed coordination in the binding pocket thus useful information for the mPPase field and maybe for drug development.  

      Weaknesses:  

      The distance information measured by DEER is advantageous for verifying the dynamics and structure of membrane protein in solution. However, regarding T211 data, which, as the authors themselves stated, lacks measurement precision, it is unclear for readers how confident one can judge the conclusion leading from these data for the cytoplasmic side. 

      We thank the reviewer for acknowledging the advantageous use of the DEER methodology for identifying dynamic states of membrane proteins in solution. In our original manuscript, we used two sites in our analysis: S525 (periplasm) and T211 (cytoplasm). S525R1 yielded high-quality DEER data, while T211R1 yielded weak (or no) visual oscillations, leading to broad distributions for the several conditions tested. In the revised manuscript, we have now added a third site on the cytoplasmic side (C599R1, located at TMH14), which yielded high-quality DEER data comparable to those for S525R1. Both the C599R1 and S525R1 spin pairs generated distance distributions for all 16 conditions (two mutants with eight conditions each) that were described well by the solution-state ensemble adopting a predominantly asymmetric conformation.  

      Furthermore, we have tailored our interpretation of the T211R1 DEER data, and refrain from using the data to draw conclusions about the TmPPase conformational ensemble in the presence of different inhibitors. However, we still opted to include the T211R1 data in the SI because they confirm an important structural feature of mPPase in solution conditions: the intrinsically dynamic behaviour of loop5-6, where T211 is located. This observation in solution is also consistent with our previous (Kellosalo et al., Science, 2012; Li et al., Nat. Commun, 2016; Vidilaseris et al., Sci. Adv., 2019; Strauss et al., EMBO Rep., 2024) and current X-ray crystallography data. To reiterate, we excluded T211R1 from any analysis relating to mPPase asymmetry, and our conclusions were based entirely on the S525R1 and new C599R1 DEER data, which allowed us to monitor both sides of the membrane.  

      The distance information for the luminal site, which the authors claim is more accurate, does not indicate either the possibility or the basis for why it is the ensemble of two components and not simply a structure with a shorter distance than the crystal structure.  

      We thank the reviewer for pointing out this possibility and alternative interpretation of our DEER data. We now provide further analysis to show that our DEER data from reporters on both sides of the membrane are highly consistent with asymmetry (although they cannot completely exclude other explanations), and we have rephrased the text to be inclusive of other possibilities. Importantly, this additional possibility does not affect the current interpretation of the data in our manuscript. Furthermore, we have removed Fig. 6 from the manuscript, and we now include a direct comparison of the in silico predicted distribution from the asymmetric hybrid structure with the eight conditions tested, for both mutants (i.e. S525R1 and C599R1).

      Reviewer #3 (Public review):  

      Summary:  

      Membrane-bound pyrophosphatases (mPPases) are homodimeric proteins that hydrolyze pyrophosphate and pump H+/Na+ across membranes. They are attractive drug targets against protist pathogens. Non-hydrolysable PPi analogue bisphosphonates such as risedronate (RSD) and pamidronate (PMD) serve as primary drugs currently used. Bisphosphonates have a P-C-P bond whose central carbon can accommodate up to two substituents, allowing a large compound variability. Here the authors solved two TmPPase structures in complex with the bisphosphonates etidronate (ETD) and zoledronate (ZLD) and monitored their conformational ensemble using DEER spectroscopy in solution. These results reveal the inhibition mechanism of these compounds, which is crucial for developing future small molecule inhibitors.  

      Strengths:  

      The authors show that seven different bisphosphonates can inhibit TmPPase with IC50 values in the micromolar range. Branched aliphatic and aromatic modifications showed weaker inhibition.  

      High-resolution structures for TmPPase with ETD (3.2 Å) and ZLD (3.3 Å) are determined. These structures reveal the binding mode and shed light on the inhibition mechanism. The nature of modification on the bisphosphonate alters the conformation of the binding pocket.  

      The conformational heterogeneity is further investigated using DEER spectroscopy under several conditions.  

      Weaknesses:  

      The authors observed asymmetry in the TmPPase-ETD structure above the hydrolytic center. The structural asymmetry arises due to differences in the orientation of ETD within each monomer at the active site. As a result, loop5-6 of the two monomers is oriented differently, resulting in the observed asymmetry. The authors attempt to further establish this asymmetry using DEER spectroscopy experiments. However, the (over)interpretation of these data leads to more confusion than any further understanding. DEER data suggest that the asymmetry observed in the TmPPase-ETD structure in this region might be funneled from the broad conformational space under the crystallization conditions. 

      We respectfully disagree with the reviewer. The asymmetry was previously established using serial time-resolved crystallography (Strauss et al., EMBO Rep, 2024) and biochemical assays (e.g. Malinen et al., Prot. Sci., 2022; Artukka et al., Biochem J, 2018; Luoto et al., PNAS, 2013) and was partially seen in one static structure (Vidilaseris et al., Sci Adv 2019). The DEER data here show that the previously proposed asymmetry is also present within the TmPPase conformational ensemble in solution conditions, and this is consistent across all DEER data. Although we cannot rule out the possibility that the TmPPase monomers adopt a metastable intermediate state, in such a case we would expect the distance changes reported by DEER to be symmetric across both membrane sides. However, we observe a symmetry breaking between the cytoplasmic and periplasmic TmPPase sites. Indeed, DEER data yield distance distributions similar to that of the hybrid asymmetric structure under all conditions: apo, +Ca, +Ca/ETD, +ETD, +ZLD, +IDP, +PAM and +ALE.

      DEER data for position T211R1 at the enzyme entrance reveal a highly flexible conformation of loop5-6 (and do not provide any direct evidence for asymmetry, Figure EV8).

      Please see the relevant response above. We acknowledge that T211 is indeed situated on a highly dynamic loop, which is important for gating, and our DEER data confirm the high flexibility of this protein region. Given that we did not observe dipolar oscillations, which led to broad distributions, we stated in the original manuscript that we would not establish the presence of any asymmetry in solution on the basis of T211, relying instead on the S525R1 and the new C599R1 sites, for which we have acquired high-quality DEER data, as noted by all reviewers. We have provided data at the C599R1 position (on the same cytoplasmic side as T211, for which we have now limited our analysis to a minimum), which provide further evidence for asymmetry, including two new conditions.

      Similarly, data for position S525R1 near the exit channel do not directly support the proposed asymmetry for ETD.  

      The reviewer appears to suggest that we hold the S525R1 DEER data as direct proof of asymmetry; we contest this, on the grounds that directly proving asymmetry would require time-resolved DEER measurements, far beyond the scope of this work. Rather, we have applied DEER measurements to explore whether asymmetry (observed previously via time-resolved X-ray crystallography) is also present (or indeed a possibility) in solution. All our S525R1 and C599R1 DEER data (recorded for eight conditions) are consistent with asymmetry (see also the detailed response above).

      Despite the high quality of the data, they reveal a very similar distance distribution. The reported changes in distances are very small (+/- 0.3 nm), which can be accommodated by a change of spin label rotamer distribution alone. Further, these spin labels are located on a flexible loop, thereby making it difficult to directly relate any distance changes to the global conformation

      We thank the reviewer for recognising the high quality of our DEER data for the S525R1 site, which we now complement with a new pair on the cytoplasmic-facing membrane side (C599R1) with DEER data of comparable quality. Visual oscillations in the raw traces, as seen here for both spin pairs, reportedly lead to highly accurate and reliable distributions, able to separate (in fortuitous cases) helical movements of only a few Angstroms (Peter et al., Nature Comms 13:4396, 2022; Klose et al., Biophys J 120:4842-4858, 2021). The ability of DEER/PELDOR to offer near-Angstrom resolution was also previously demonstrated by the acquisition and solution of high-resolution multi-subunit spin-labelled membrane protein structures (Pliotas et al., PNAS, 2012; Pliotas et al., Nat Struct Mol Biol, 2015; Pliotas, Methods Enzymol, 2017), as was its ability to detect small conformational changes (of similar magnitude to those in mPPase) in different integral membrane protein systems (Kapsalis et al., Nature Comms, 2019; Kubatova et al., PNAS, 2023; Schmidt et al., JACS, 2024; Lane et al., Structure, 2024; Hett et al., JACS, 2021; Zhao et al., Nature, 2024), occurring under different conditions and/or stimuli in solution and/or lipid environment. The changes here are not below the detection sensitivity of DEER: there are ~7 Angstroms between the two modal distance extremes (+Ca vs +IDP for S525R1), with all other conditions showing intermediate changes.  

      We agree with the reviewer that these changes are relatively small, but they are expected for membrane ion pumps. Indeed, none of the mPPase structures show helical movements of greater than half a turn, and that only in helices 6 and 12. There appear to be larger-scale loop closing motions of the 5-6 loop that includes T211, due to the presence of E217 which binds to one of the Mg<sup>2+</sup> ions that coordinate the leaving group phosphate. This is, inter alia, the reason that this loop is so flexible: it cannot order before substrate is bound.  

      The reviewer suggests that the subtle distance shifts detected arise only from changes in the label rotamer distribution. However, the concerted nature of the modal distance shifts with respect to multiple different conditions at a single labelling site strongly suggests that preferential rotamer orientations are not the cause. Indeed, for so many spin labels to undergo an arbitrary shift such that the modal distance of the entire distribution changes, in the absence of any conformational change, appears improbable. Here we have the resolution to detect such subtle differences by DEER, given that there are unambiguous shifts in our time-domain data (i.e. the position of the minimum of the first dipolar oscillation) (Fig 4) and these are reflected in the modal distances in the distributions. We also refrain from performing any quantitative analysis and use only qualitative trends in modal distance shifts, all of which support our proposed model of a symmetry breaking across the membrane face. To further belabour this point, we do not quantify the DEER data (for instance through parametric fitting) to extract populations of different conformational states, and we appreciate that to do so would be highly prone to error; however, we do (and can, we feel, without over-interpretation) assert that the modal distances shift.  

      The interpretations listed below are not supported by the data presented:  

      (1) 'In the presence of Ca2+, the distance distribution shifts towards shorter distances, suggesting that the two monomers come closer at the periplasmic side, and consistent with the predicted distances derived from the TmPPase:Ca structure.'

      Problem: This is a far-stretched interpretation of a tiny change, which is not reliable for the reasons described in the paragraph above. 

      While we agree overall with the reviewer's assessment that ±0.3 nm is a small (not a minor) change, there are literature examples quantifying (or using for quantification) distribution peaks separated by a similar Δr (Kubatova et al., PNAS, 2023; Schmidt et al., JACS, 2024; Hett et al., JACS, 2021; Zhao et al., Nature, 2024). Moreover, the time-domain data clearly indicate that the position of the first minimum of the dipolar oscillation shifts to shorter dipolar evolution time. The sensitivity of the time-domain data to subtle changes in dipolar coupling frequency is significantly better than that of the distance distributions.
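
      The link between a shift in the first minimum of the time trace and a change in distance can be illustrated with the standard point-dipole relation for two nitroxide spins, ν_dd ≈ 52.04 MHz nm³ / r³. The following is a minimal sketch, not the analysis used in this work: it assumes g ≈ 2.0023 for both spins, considers only the perpendicular (dominant) component of the Pake pattern, and approximates the first minimum as half a period of that component. The function names are hypothetical, and the example distances are the modal values quoted for C599R1 elsewhere in this response.

```python
# Illustrative sketch only: point-dipole estimate of where the first minimum
# of a DEER trace falls for a given spin-spin distance.
# Assumption: two nitroxides with g ~ 2.0023, perpendicular Pake component only.

DIPOLAR_CONSTANT_MHZ_NM3 = 52.04  # MHz * nm^3 for a nitroxide pair


def dipolar_frequency_mhz(r_nm: float) -> float:
    """Perpendicular dipolar coupling frequency (MHz) at distance r (nm)."""
    if r_nm <= 0:
        raise ValueError("distance must be positive")
    return DIPOLAR_CONSTANT_MHZ_NM3 / r_nm ** 3


def first_minimum_us(r_nm: float) -> float:
    """Approximate time (microseconds) of the first minimum of cos(2*pi*nu*t)."""
    return 1.0 / (2.0 * dipolar_frequency_mhz(r_nm))


if __name__ == "__main__":
    # Modal distances (nm) quoted for C599R1: closed, hybrid, open.
    for r in (4.8, 5.8, 6.8):
        print(f"r = {r:.1f} nm: nu = {dipolar_frequency_mhz(r):.3f} MHz, "
              f"first minimum near {first_minimum_us(r):.2f} us")
```

      Because the frequency scales as 1/r³, a shift of a few Angstroms at these distances moves the first minimum by an appreciable fraction of a microsecond, which is why the raw trace can be more sensitive to small changes than the regularised distribution.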

      Importantly, we have fitted Gaussians to the experimental distance distributions of 525R1 output by the Comparative DEER Analyzer 2.0 and observed a change in the distribution width in the presence of Ca2+, implying that the rotameric freedom of the spin label is restricted. However, the CW-EPR spectra for 525R1 indicate that the rotational correlation time of the spin label is highly consistent between conditions (the spectra are almost identical); this cannot be explained simply by rotameric preference of the spin label (as asserted by Reviewer 3), as there is no (further) immobilisation observed from the CW-EPR of the apo state (Figure EV9) to that in the presence of Ca2+. Furthermore, in the absence of conformational changes, it is reasonable to assume (and demonstrable from the CW-EPR data) that the rotamer cloud should not significantly change between conditions. However, Gaussian fits of the two extreme cases yielding the longest (i.e., in the presence of IDP) and shortest (in the presence of ZLD) modal distances for the 525R1 DEER data indicated significant (i.e., above the noise floor after Tikhonov validation) probability density for the IDP condition at 50 Å (P(r) = 0.18). This occurs at four standard deviations above the mean of the Gaussian fit to the +ZLD condition, which by random chance should occur with <0.007% probability.  
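
      The quoted bound can be checked against the tail of a normal distribution. The sketch below is an illustration under the assumption that the fitted +ZLD distribution is exactly Gaussian; whether the figure above refers to a one-sided or two-sided tail is not stated, so both are computed. The function names are hypothetical.

```python
# Illustrative sketch only: probability of a value lying at least z standard
# deviations from the mean of a Gaussian, via the complementary error function.
import math


def upper_tail(z: float) -> float:
    """P(X >= mean + z * sigma) for a normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def two_sided_tail(z: float) -> float:
    """P(|X - mean| >= z * sigma) for a normal distribution."""
    return math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    z = 4.0
    print(f"one-sided tail at {z} sigma: {100 * upper_tail(z):.5f} %")
    print(f"two-sided tail at {z} sigma: {100 * two_sided_tail(z):.5f} %")
```

      Under either convention the probability at four standard deviations falls below 0.007%.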

      As in the previous response, the method can detect changes of this magnitude, which are not small but physiologically relevant and expected for integral membrane proteins such as mPPases. Indeed, even in equally (or more) complex systems such as heptameric mechanosensitive channel proteins, DEER provided sub-Angstrom accuracy when a spin-labelled high-resolution XRC structure was solved (Pliotas et al., PNAS, 2012; Pliotas et al., Nat Struct Mol Biol, 2015). Although this is an ideal and not very common case, in which DEER accuracy was experimentally validated by another high-resolution structural method on a modified membrane protein, it demonstrates the power of the method: when strong oscillations are present in the raw DEER data (as here for mPPase S525R1 and C599R1), Angstrom resolution is achievable in such challenging protein classes, even when multiple distances are present.

      (2) 'Based on the DEER data on the IDP-bound TmPPase, we observed significant deviations between the experimental and the in silico distances derived from the TmPPase:IDP X-ray structure for both cytoplasmic- (T211R1) and periplasmic-end (S525R1) sites (Figure 4D and Figure EV8D). This deviation could be explained by the dimer adopting an asymmetric conformation under the physiological conditions used for DEER, with one monomer in a closed state and the other in an open state.'  

      Problem: The authors are trying to establish asymmetry using the DEER data. Unfortunately, no significant difference is observed (between simulation and experiment) for position 525 as the authors claim (Figure 4D bottom panel). The observed difference for position 211 must be accounted for by the flexibility and the data provide no direct evidence for any asymmetry.  

      Reviewer 3 is incorrect in suggesting that we are trying to prove asymmetry through the DEER data. That is a well-known fact in the literature (e.g. Vidilaseris et al, Sci Adv 2019) where we show (1) that the exit channel inhibitor ATC (i.e. close to S525R1) binds better in solution to the TmPPase:PPi complex than the TmPPase:PPi<sub>2</sub> complex, and (2) that ATC binds in an asymmetric fashion to the TmPPase:IDP<sub>2</sub> complex with just one ATC dimer on one of the exit channels. We merely use the DEER data to support this well-established fact.  

      However, because we agree that the DEER data in the presence of IDP do not provide direct proof of asymmetry, particularly for the cytoplasmic-facing mutant T211R1, we have refrained from interpreting the T211R1 data beyond their showing a highly dynamic loop region (as evidenced by the broad distributions). As pointed out by the reviewer, the differences in distance distributions between conditions observed for T211R1 likely arise from conformational heterogeneity in solution. Furthermore, we now report DEER data on a new site (C599R1), which is also on the cytoplasmic side and yields high-quality DEER data comparable to the S525R1 data (commended for their quality by both reviewers). The C599R1 measurements show that in all conditions tested, highly similar distributions are observed, inconsistent with the in silico predicted distance distributions from the symmetric X-ray structures, but consistent with an asymmetric hybrid structure (i.e. open-closed) in solution. Importantly, the difference between the fully open (6.8 nm modal distance) and fully closed (4.8 nm modal distance) states of the C599R1 dimer is larger than for the S525R1 dimer pair. Thus, delineating the asymmetric hybrid conformation from the symmetric conformations is more robust.

      (3) 'Our new structures, together with DEER distance measurements that monitor the conformational ensemble equilibrium of TmPPase in solution, provide further solid experimental evidence of asymmetry in gating and transitional changes upon substrate/inhibitor binding.'  

      Problem: See above. The DEER data do not support any asymmetry. 

      We feel that the reviewer's comments here are somewhat unfounded. All the DEER data (for the 525R1 periplasmic and C599R1 cytoplasmic sites) are described, most parsimoniously, using an asymmetric hybrid structure. In particular, the new C599R1 distance distributions are poorly described by the symmetric X-ray crystal structures, with a conserved modal distance of approx. 5.8 nm throughout the tested conditions that aligns well with the in silico predictions from the asymmetric hybrid structure. Additionally, all S525R1 and C599R1 data well exceed the relevant criteria of the recent white paper (Schiemann et al., 2021, JACS) from the EPR community to be considered reliably interpretable: strong visual oscillations in the raw traces; a signal-to-noise ratio w.r.t. modulation depth of > 20 in all cases; replicates performed and added to the main text or supplementary material; near-quantitative labelling efficiency (evidenced by the lack of free spin label signal in the CW-EPR spectra); and analysis using the CDA (now Figure EV10) to avoid confirmation bias.

      While the DEER data do not prove asymmetry, we do not claim proof of asymmetry in the above sentence. We concede to rephrase the offending sentence above as: “Our new structures, together with DEER distance measurements that monitor the conformational ensemble of TmPPase in solution, do not exclude asymmetry in gating and transitional changes upon substrate/inhibitor binding and are consistent with our proposed model.” We feel that this reframed conjecture of asymmetry is well founded; indeed, comparing all the 16 experimentally derived DEER distance distributions for the 525R1 and 599R1 sites with in-silico modelling performed on the hybridised asymmetric structure (i.e., comprised of one monomer bound to Ca2+ and another bound to IDP) yields overlap coefficients (Islam and Roux, JPC B, 2015) of >0.85. This implies the envelope of the modelled distance distribution is quantitatively inside the envelope of the experimental distance distributions. Thus, the DEER data support asymmetry (previously observed by time-resolved XRC) in solution, and while we appreciate that ideally one would measure time-resolved DEER to directly correlate kinetics of conformational changes within the ensemble to the catalytic cycle of mPPase, (and this is something we aim to do in the future), it is far beyond the scope of this study.
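
      To make the notion of an overlap coefficient concrete, the sketch below compares two distance distributions sampled on a common grid. It is a minimal illustration that assumes the overlap is the sum of the pointwise minimum of the two unit-normalised distributions; the exact definition used by Islam and Roux, and in our analysis, may differ. The Gaussian parameters are hypothetical and are not our data.

```python
# Illustrative sketch only: overlap between two distance distributions P(r)
# sampled on the same grid.
# Assumption: overlap = sum of min(p_i, q_i) after normalising each to unit sum.
import math


def gaussian_distribution(grid, mean, sigma):
    """Unnormalised Gaussian P(r) evaluated on a distance grid (nm)."""
    return [math.exp(-0.5 * ((r - mean) / sigma) ** 2) for r in grid]


def normalise(p):
    """Scale a non-negative distribution so that its values sum to one."""
    total = sum(p)
    if total <= 0:
        raise ValueError("distribution must have positive total weight")
    return [value / total for value in p]


def overlap_coefficient(p, q):
    """Overlap in [0, 1]; 1 means identical shapes, 0 means disjoint support."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same grid")
    return sum(min(a, b) for a, b in zip(normalise(p), normalise(q)))


if __name__ == "__main__":
    grid = [2.0 + 0.05 * i for i in range(121)]  # 2.0 to 8.0 nm
    # Hypothetical experimental and modelled distributions.
    experimental = gaussian_distribution(grid, mean=5.8, sigma=0.40)
    modelled = gaussian_distribution(grid, mean=5.7, sigma=0.35)
    print(f"overlap = {overlap_coefficient(experimental, modelled):.2f}")
```

      A value close to 1 indicates that the two distributions occupy largely the same distance range with similar weights.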

      Indeed, half-of-the-sites reactivity has been demonstrated in at least the following papers: Vidilaseris et al., Sci. Adv., 2019; Strauss et al., EMBO Rep., 2024; Malinen et al., Prot. Sci., 2022; Artukka et al., Biochem J, 2018; Luoto et al., PNAS, 2013. Half-of-the-sites activity requires asymmetry in the mechanism, and therefore asymmetric motions in the active site (viz 211) and exit channel (viz 525). As mentioned above, we have demonstrated this for other inhibitors (Vidilaseris et al 2019) and as part of a time-resolved experiment (Strauss et al 2024). In fact, given the wealth of evidence showing that the symmetrical crystal structures sample a non- or less-productive conformation of the protein, it would be quixotic to propose that the DEER experiments, in solution, do not generate asymmetric conformations. Such a proposal certainly doesn’t obey Occam’s razor of choosing the simplest possible explanation that covers the data.

      (4) Based on these observations, and the DEER data for +IDP, which is consistent with an asymmetric conformation of TmPPase being present in solution, we propose five distinct models of TmPPase (Figure 7).  

      Problem: Again, the DEER data do not support any asymmetry and the authors may revisit the proposed models. 

      We have revised the proposed models and limited them to four asymmetric models to clearly illustrate the apo/+Ca/+Ca:ETD-state (model 1) and highlight the distinct binding patterns of the various inhibitors (ETD, ZLD and IDP; models 2-4), which result in a variety of closed/open-open states. In this version, we clarify that the proposed models are not based solely on the DEER data; they are grounded in both current and previously solved structures, and all DEER data, recorded for multiple conditions and inhibitors and for reporters facing opposite sides of the membrane, are highly consistent with them.

      (5) 'In model 2 (Figure 7), one active site is semi-closed, while the other remains open. This is supported by the distance distributions for S525R1 and T211R1 for +Ca/ETD informed by DEER, which agrees with the in silico distance predictions generated by the asymmetric TmPPase:ETD X-ray structure'  

      Problem: Neither convincing nor supported by the data 

      We respectfully disagree with the reviewer. However, owing to the conformational heterogeneity of T211R1, we now exclude T211R1 data from quantitative interpretation of changes to the conformational ensemble. Instead, we include new DEER data from site C599R1, which provides high-quality and convincing data that is consistent with asymmetry at the cytoplasmic face, and inconsistent with in silico distance distributions derived from symmetric X-ray crystal structures. Furthermore, the S525R1 distance distributions for the +ETD (corresponding to +Ca/ETD) and +ZLD conditions were directly compared with both the apo-state distance distribution (corresponding to a fully open, symmetric conformation) and the in silico predicted distributions of the asymmetric hybrid structure (corresponding to an open-closed conformation). Overlap coefficients were calculated (given in the main text) that indicated the +ETD (corresponding to +Ca/ETD) and +ZLD S525R1 distributions were more consistent with the apo-state distance distribution. This suggests that while on the cytosolic face of the membrane, an open-closed conformation is favoured, on the periplasmic face, a symmetric open-open conformation is favoured.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):   

      (1) The DEER experiments were performed with the two crystallized inhibitors, ETD and ZLD, along with previously characterized IDP. It would increase the impact if a tighter-binding phosphonate was examined since the inhibitory mechanism of these molecules is of greater interest. 

      We acknowledge the reviewer's concern regarding the choice of weaker inhibitors. We chose to focus on the weaker binders because we were able to obtain high-quality crystal structures for these compounds. This allowed us to perform DEER spectroscopy with the added advantage of accurately analysing the data against structural models derived from X-ray crystallography. In the revised version, we also include results from alendronate and pamidronate, two of the tighter inhibitors, which show results similar to and consistent with the others.

      (2) I'm not able to find the concentrations of ETD and ZLD used for the DEER experiments. This information should be added to the Methods section on sample prep for EPR. 

      This information is already given in the Methods section on sample preparation for EPR spectroscopy (page 24), where we indicated that the protein aliquots were incubated with a final concentration of 2 mM inhibitors or 10 mM CaCl2 (30 min, RT). However, we recognise that this may not have been sufficiently clear. To clarify, we now explicitly state that the concentration of ETD and ZLD (amongst other inhibitors) used for the DEER experiments is 2 mM.  

      (3) There should be additional detail about the electrometry replicates. Does "triplicate" mean three measurements on the same sensor, three different sensors, and different protein preparations? At a minimum, data should be collected from three different sensors to ensure that the negative results (lack of current) for ETD and ZLD are not due to a failed sensor prep. In addition, data from the other replicates should be shown in a supplementary figure, either the traces, or in a summary figure. Are the traces shown collected on the same sensor? They could be, in principle, since the inhibitor is washed away after each perfusion. 

      Yes, by 'triplicate', we mean three measurements taken on the same sensor. All traces shown were collected from a single sensor. Thank you for your advice; we now show here additional data from other sensors that display the same pattern. As for the possibility of a failed sensor preparation, this is unlikely since we always ensure the sensor quality with the substrate (PPi) as a positive control after each measurement.

      Author response image 1.

      (4) I'm confused by the NEM modification assay, and I don't think there is enough information in this manuscript for a reader to figure out what is happening. Why is the protein active if an inhibitor is present? I understand that there is a conformational change in the presence of the inhibitor that buries a cysteine, but the inhibitor itself should diminish function, correct? Is the inhibitor removed before testing the function? In addition, it would be clearer if the cysteines that are modified are indicated in the main text. I don't understand what is being shown in Figure Ev2. Shouldn't the accessible cysteines in the apo form be shown? Finally, the sentence "IDP has been reported to prevent the NEM modification..." does not make sense to me. Should the word "by" be removed from this sentence? 

      We apologize for the confusion. Yes, the inhibitors were removed before testing the protein function. In Figure EV2, the accessible cysteines are shown for both the apo and IDP-bound states. As seen, the accessible cysteines in the IDP-bound states are fewer than those in the apo state, meaning fewer cysteines are available for modification. Consequently, more activity is retained when IDP binds due to the reduction in accessible cysteines. We have addressed this in the manuscript (see the method section on the NEM modification assay).

      (5) Why does the model in Figure 7 show the small molecules bound to only one subunit, when they are crystallized in both subunits? 

      We propose that the small molecules bound to the two subunits in the crystal structure is likely a result of substrate inhibition, given the excess inhibitor used during crystallisation (e.g. Artukka, et al., Biochemical Journal, 2018; Vidilaseris, et al., Science Advances, 2022). Our PELDOR data indicate that in solution, the small molecules bound to TmPPase are in an intermediate state between both subunits being closed and both being open, most likely with at least one subunit in an open state. This is also consistent with previous kinetic studies (Anashkin, V. A., et al., International Journal of Molecular Sciences, 22, 2021), which showed that the binding constant of IDP to the second subunit is around 120 times higher than that of the first subunit.

      (6) The authors argue that the two ETDs bound in the two protomers adopt distinct conformations. Can this be further supported, for example, by swapping the position of the two ETDs between the two protomers and calculating a difference map (there should be corresponding negative/positive density if the modelling of the two different conformations is robust)? 

      As per the reviewer suggestion, we swapped the positions of the two ETDs between the protomers and calculated the difference electron density map. This analysis, presented in Figure EV3, reveals corresponding negative and positive electron density peaks, indicating that the ETDs indeed adopt distinct conformations in each protomer, supporting the accuracy of our modeling.

      (7) Are the changes in loop conformation possibly due to crystal packing differences for the two protomers? 

      We examined the crystal packing of the two protomers and found no interactions at the loop regions (red coloured in Author response image 2 below) that could be attributed to crystal packing differences. Therefore, we rule out this possibility.

      Author response image 2.

      (8) Typos:  

      Legend for Figure EV2 cystine - cysteine  

      Page 14, last sentence of the first paragraph: further - further  

      Figure 6 legend: there is no reference to panel B.  

      Thanks for pointing out the typos, now they are fixed.

      Reviewer #2 (Recommendations for the authors):  

      (1) T211 is located on the same loop where ligand/inhibitor-coordinating side chains (E217, D218) are located. It has not been tested whether spin labeling here would affect inhibitor binding. 

      We tested the activity of all mutants before spin labelling, but not the activity of the spin-labelled mutants. MTSSL spin labels are typically not structurally perturbing. In particular, the T211R1 site that the reviewer is referring to is now not included in our interpretation of conformational changes occurring during mPPase’s functional cycle.

      (2) Why should the spin label be introduced to T211, which is recognized as a flexible region in the crystal structure? Authors should search for suitable residues except for T211 and other residues in this loop to evaluate the cytoplasmic distance. 

      We acknowledge the reviewer’s concern regarding the flexibility of the T211 region for spin labelling. Given the challenges associated with TmPPase, including reduced protein expression, loss of function, or inaccessibility upon spin labelling at certain sites, we have explored alternative residues. After extensive testing, we identified C599 as a suitable site for spin labelling, which resulted in high-quality DEER data. The results from spin labelling at C599 have been incorporated into the revised manuscript.

      (3) On the other hand, DEER data for S525 is solid, as the authors stated. This residue is located on the luminal side of the enzyme. However, the description of the luminal side structure and the comparison of symmetric/asymmetric dimer in this par are missing in the paper. 

      We thank the reviewer for their positive assessment of the S525R1 DEER data. The data for the 525 and now also the 599 spin pairs are indeed solid, given the strong visible oscillations we observed, particularly in such a challenging system.

      We presented the periplasmic sites in the crystal structure dimer (Figure 4A), highlighting both the symmetrical region and the asymmetric model in Figure 4. In the revised version, we include additional details about this region and our rationale for labeling at position S525.

      (4) The conclusion models (Figure 7) are misleading. In the crystal structure, the 5-6Loop distance between each monomer should be close given the location of the dimer interface, and the actual distance between T211 in the structure (for example, in 5lzq) is about 10A. Nevertheless, the model depicts this distance longer than S525 (40.7A in 5LZQ), which would give a false impression. 

      We would like to apologize for the misleading model. We have now corrected the models to ensure they are consistent with their respective regions in the crystal structures.

      (5) P8 last paragraph  

      It is hard to imagine that in a crystal lattice, the straight inhibitor always binds to monomer A, and the neighboring monomer is always attached to a slightly tilted inhibitor, which causes asymmetry. For example, wouldn't it mean that it would first bind to one of them, which would then affect the neighboring monomer via 5-6 Loop, which would then affect its binding pose? So in this case, the inhibitor did not ARAISE asymmetry, and this is where it is misleading for readers. 

      We apologize for the confusion. What we intended to convey is that the first inhibitor binds to one protomer, which then affects the conformation of the neighbouring monomer, ultimately influencing its binding pose. This is required for half-of-the-sites reactivity, which is well-established in this system. This is reflected in our crystal structure, where we observed asymmetry in the loop 5-6 region and the ETD orientation between the two protomers. We have addressed this in the manuscript accordingly.

      (6) P11 L4 EV10 instead of EV8? 

      Thank you for pointing this out. We have corrected it accordingly.

      (7) P11 L5 It is difficult to determine whether the peak is broad or sharp. Should be evaluated quantitatively by showing the half-value width of the peak. This may also be helpful to judge whether the peak is a mixture of two components or a single one. 

      We have taken this analysis out and rephrased the offending sentence. We have also added the FWHM values, as the Reviewer suggested, and the corresponding standard deviations for the distance distributions (approximating each distribution as Gaussian).
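
      For readers comparing the two width measures mentioned here, the following is a minimal sketch of the standard Gaussian relation FWHM = 2·sqrt(2·ln 2)·σ (about 2.355 σ). It is an illustration only, not the analysis code used for the manuscript; the function names are hypothetical, and the relation holds only to the extent that a distance distribution is well approximated by a single Gaussian.

```python
import math

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.355 * sigma).
GAUSSIAN_FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_from_sigma(sigma):
    """Return the full width at half maximum of a Gaussian with standard deviation sigma."""
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    return GAUSSIAN_FWHM_FACTOR * sigma


def sigma_from_fwhm(fwhm):
    """Return the standard deviation of a Gaussian with the given FWHM."""
    if fwhm < 0:
        raise ValueError("fwhm must be non-negative")
    return fwhm / GAUSSIAN_FWHM_FACTOR
```

      Under this assumption, either value can be recovered from the other, so reporting both mainly helps readers who are used to one convention.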

      (8) Throughout the paper, the topology of the enzyme may be difficult to follow for readers who are not experts in this field. Please indicate the membrane plane's location or a figure's viewpoint in the caption. 

      We acknowledge the importance of making our figures accessible to all readers. In the revised manuscript, we have enhanced the clarity of our figures by explicitly indicating the membrane plane’s location and specifying the viewpoint in each figure caption. For example, we have added annotations such as “Top view of the superposition of chain A (cyan) and chain B (wheat), showing the relative movements (black arrow) of helices. The membrane plane is indicated by dashed lines.”

      (9) Figure 2B Check the color of the helix.  

      IDP and ETD are almost the same color, so it is difficult to see the superposition. It would be easier to understand the reading by, for example, using a lighter or transparent color set only for IDPs.  

      We acknowledge the reviewer's concern regarding the colour similarity between the IDP and ETD in Figure 2B, which hinders clear differentiation. To enhance visual distinction, we have adjusted the colour scheme by changing the TmPPase:IDP structure colour to light blue. This modification improves the clarity of the superposition, making the structural differences more discernible.

      (10) Figure 2C Check the coordination state (dotted line), there appears to be coordination between E217Cg and Mg. Also, water that is located near N492 appears to be a bit distant from Mg, why does this act as a ligand? Stereo view or view from different angles, and distance information would help the reader understand the bonding state in more detail.  

      Yes, we confirm that Mg<sup>2+</sup> is coordinated by the oxygen atoms from both the side chain and main chain of residue E217. The water molecule near N492 is not directly coordinated with Mg<sup>2+</sup> but interacts with the O5 atom of one of the phosphate groups in ETD. To enhance clarity, we have updated Figure 2C (and other related figures) to include stereo views.  

      (11) Figure 5A: in the Bottom view (lower left), the symmetric dimer does not look symmetric. Better to view from a 2-fold axis exactly.  

      We have taken this figure out entirely and instead added a direct comparison of the in silico predicted distributions from the asymmetric hybrid structure with all 16 experimental DEER distributions. We have added the symmetric and asymmetric structures to Fig. 4A and show the symmetric structure viewed along the 2-fold axis, as suggested.

      (12) Figure 5B: Indicate which data is plotted in the caption.  

      As mentioned above, we have taken this figure out, as we felt that quantifying two overlapping populations from a single Gaussian was an over-interpretation of the data, and at the suggestion of Reviewer 3, we have tailored our interpretation here.

      (13) Figure EV8:  

      Because the authors discuss a lot about their conclusive model based on this data, Figure EV8 should be treated as a main figure, not a supplement. However, this reviewer has serious concerns about the measurement in this figure. Because DEER for T211 is too noisy, I don't see the point in discussing this in detail. For example, in the Ca/ETD data, there is a peak near 50A, but it would be difficult for TM5 to move away from this distance unless the protein unfolds. I do not find it meaningful to discuss using measurement results in which such an impossible distance is detected as a peak.  

      A: Show top view as in Figure 5  

      D: 2nd row dotted line. Regarding the in silico model that is used as a reference to compare the distance information, the distance of 40-50 A for T211 in the Ca-bound form is hard to imagine. PDB 4av6 model shows that T211 is disordered and not visible, but given the position of the TM5 helix, it does not appear to be that different from the IDR binding structure (5LZQ, 10A between two T211). The structures of in silico models are not shown in the figure, as it is only mentioned as modeled in RoseTTAFold. Please indicate their structures, especially focused on the relative orientation of T211 and S525 in the dimer, which would allow readers to determine the distances.  

      We acknowledge the reviewer’s concerns regarding Figure EV8 and the DEER data for T211R1. Upon re-evaluation, we recognize that the non-oscillating nature of the DEER data for T211R1 leads to broad distributions, indicating increased conformational dynamics, which is expected for a highly dynamic loop. Consequently, we have limited the discussion and interpretation of T211R1 in the revised manuscript and focused more on C599R1.

      Reviewer #3 (Recommendations for the authors):  

      A careful interpretation of the data in view of these limitations and without directly linking to asymmetry could solve the problem of the over-interpretation of the DEER data.  

      We respectfully disagree with the reviewer. Please see our detailed response above.  

      Additional comments:  

      (1) Did the authors use a Cys-less construct for spin labeling and DEER experiments?  

      We utilized a nearly Cys-less construct in which all native cysteines were mutated to serine, except for Cys183, which was retained due to its buried location and functional importance. We then introduced single cysteine mutations for spin labelling. For C599, Ser599 was reverted to cysteine.

      (2) The time data for position T211R1 is too short for most cases (Figure EV8D) for a reliable distance determination. No confidence interval is given for the '+Ca' sample distance distributions.  

      We recorded longer time traces for two of the conditions to better assign the background. We did not use the 211R1 data to reach any conclusions regarding asymmetry, which were based on the 525R1 and the 599R1 data. We now include the T211R1 data simply to indicate the high mobility observed at loop 5-6. We have added the confidence interval for the +Ca condition.

      (3) It is recommended to mention the 2+1 artefact obvious at the end of the DEER data. 

      In the methods section, we have mentioned that the “2+1” artefact present at the end of the S525R1 and T211R1 DEER data likely arises from using a 65 MHz offset rather than an 80 MHz offset (as for the C599R1 data), the latter of which avoids significant overlap of the pump and detection pulses. We also mention in the methods section that, owing to the intense “2+1” artefact, we decided to truncate the artefact away to minimise its impact on data treatment. We used the lower offset of 65 MHz to maximise the achievable signal-to-noise ratio (SNR), as the detected echo was quite weak, particularly for the T211R1 data. This was further exacerbated by the poor transverse relaxation time observed at that site.

      (4) Please check the number of significant digits for all the reported values. 

      We have addressed the number of significant digits as requested.

      (5) Please report the mean distances from DEER experiments with the standard deviation or FWHM.

      We have addressed this in the revised manuscript: we report modal distances rather than mean distances and provide the FWHM and standard deviation.

    1. Author response:

      Reviewer #1 (Public Review):

      In this manuscript, Tran et al. investigate the interaction between BICC1 and ADPKD genes in renal cystogenesis. Using biochemical approaches, they reveal a physical association between Bicc1 and PC1 or PC2 and identify the motifs in each protein required for binding. Through genetic analyses, they demonstrate that Bicc1 inactivation synergizes with Pkd1 or Pkd2 inactivation to exacerbate PKD-associated phenotypes in Xenopus embryos and potentially in mouse models. Furthermore, by analyzing a large cohort of PKD patients, the authors identify compound BICC1 variants alongside PKD1 or PKD2 variants in trans, as well as homozygous BICC1 variants in patients with early-onset and severe disease presentation. They also show that these BICC1 variants repress PC2 expression in cultured cells.

      Overall, the concept that BICC1 variants modify PKD severity is plausible, the data are robust, and the conclusions are largely supported. However, several aspects of the study require clarification and discussion:

      (1) The authors devote significant effort to characterizing the physical interaction between Bicc1 and Pkd2. However, the study does not examine or discuss how this interaction relates to Bicc1's well-established role in posttranscriptional regulation of Pkd2 mRNA stability and translation efficiency.

      The reviewer is correct that the present study has not addressed the downstream consequences of this interaction, considering that Bicc1 is a posttranscriptional regulator of Pkd2 (and potentially Pkd1). We think that the Bicc1/Pkd1/Pkd2 complex retains Bicc1 in the cytoplasm and thus restricts its participation in posttranscriptional regulation. As we do not yet have experimental data to support this model, we have not included it in the manuscript. Yet, we will update the discussion of the manuscript to further elaborate on the potential mechanism of the Bicc1/Pkd1/Pkd2 complex.

      (2) Bicc1 inactivation appears to downregulate Pkd1 expression, yet it remains unclear whether Bicc1 regulates Pkd1 through direct interaction or by antagonizing miR-17, as observed in Pkd2 regulation. This should be further examined or discussed.

      This is a very interesting comment. The group of Vishal Patel published that PKD1 is regulated by a miR-17 binding site in its 3’UTR (PMID: 35965273). We, however, have not evaluated whether BICC1 participates in this regulation. A definitive answer would require us to utilize some of the mice described in the above reference, which is beyond the scope of this manuscript. We will, however, revise the discussion to elaborate on this potential mechanism.

      (3) The evidence supporting Bicc1 and ADPKD gene cooperativity, particularly with Pkd1, in mouse models is not entirely convincing, likely due to substantial variability and the aggressive nature of Bpk/Bpk mice. Increasing the number of animals or using a milder Bicc1 strain, such as jcpk heterozygotes, could help substantiate the genetic interaction.

      We initially performed the analysis using the complete Bicc1 knockout that we previously reported (PMID: 20215348), focusing on compound heterozygotes. Yet, as with the Pkd1/Pkd2 compound heterozygotes (PMID: 12140187), no cyst development was observed up to P21, when we sacrificed the mice. Our strain is similar to the above-mentioned jcpk, which is characterized by a short, abnormal transcript thought to result in a null allele (PMID: 12682776). We thank the reviewer for pointing us to the reference showing that heterozygous mice develop glomerular cysts as adults (PMID: 7723240). This suggestion is an interesting idea that we will investigate. In general, we agree with the reviewer that a better understanding of the contribution of Bicc1 to the adult PKD phenotype will be critical. To this end, we are currently generating a floxed allele of Bicc1 that will allow us to address the cooperativity in the adult kidney when, for example, crossed to the Pkd1<sup>RC/RC</sup> mice. Yet, these experiments are unfortunately beyond the scope of this manuscript.

      Reviewer #2 (Public Review):

      Tran and colleagues report evidence supporting the expected yet undemonstrated interaction between the Pkd1 and Pkd2 gene products Pc1 and Pc2 and the Bicc1 protein in vitro, in mice, and collaterally, in Xenopus and HEK293T cells. The authors go on to convincingly identify two large and non-overlapping regions of the Bicc1 protein important for each interaction and to perform gene dosage experiments in mice that suggest that Bicc1 loss of function may compound with Pkd1 and Pkd2 decreased function, resulting in PKD-like renal phenotypes of different severity. These results led to examining a cohort of very early onset PKD patients to find three instances of co-existing mutations in PKD1 (or PKD2) and BICC1. Finally, preliminary transcriptomics of edited lines gave variable and subtle differences that align with the theme that Bicc1 may contribute to the PKD defects, yet are mechanistically inconclusive.

      These results are potentially interesting, despite the limitation, also recognized by the authors, that BICC1 mutations seem exceedingly rare in PKD patients and may not "significantly contribute to the mutational load in ADPKD or ARPKD". The manuscript has several intrinsic limitations that must be addressed.

      As mentioned above, the study was designed to explore whether there is an interaction between BICC1 and PKD1/PKD2 and whether this interaction is functionally important. How this translates into clinical relevance will require additional studies (and we have addressed this in the discussion of the manuscript).

      The manuscript contains factual errors, imprecisions, and language ambiguities. This has the effect of making this reviewer wonder how thorough the research reported and analyses have been.

      We respectfully disagree with the reviewer on the latter interpretation. The study was performed with rigor. We have carefully assessed the critiques raised by the reviewer. Most of the criticisms raised by the reviewer will be easily addressed in the revised version of the manuscript. Yet, none of the critiques raised by the reviewer seems to directly impact the overall interpretation of the data.

      Reviewer #3 (Public Review):

      Summary:

      This study investigates the role of BICC1 in the regulation of PKD1 and PKD2 and its impact on cystogenesis in ADPKD. By utilizing co-IP and functional assays, the authors demonstrate physical, functional, and regulatory interactions between these three proteins.

      Strengths:

      (1) The scientific principles and methodology adopted in this study are excellent, logical, and reveal important insights into the molecular basis of cystogenesis.

      (2) The functional studies in animal models provide tantalizing data that may lead to a further understanding and may consequently lead to the ultimate goal of finding a molecular therapy for this incurable condition.

      (3) In describing the patients from the Arab cohort, the authors have provided excellent human data for further investigation in large ADPKD cohorts. Even though there was no patient material available, such as HUREC, the authors have studied the effects of BICC1 mutations and demonstrated its functional importance in a Xenopus model.

      Weaknesses:

      This is a well-conducted study and could have been even more impactful if primary patient material was available to the authors. A further study in HUREC cells investigating the critical regulatory role of BICC1 and potential interaction with mir-17 may yet lead to a modifiable therapeutic target.

      This is an excellent suggestion. We agree with the reviewer that it would have been interesting to analyze HUREC material from the affected patients. Unfortunately, besides DNA and the phenotypic analysis described in the manuscript, neither human tissue nor primary patient-derived cells were collected before the two patients with the BICC1 p.Ser240Pro mutation passed away. To address this missing link, we have, as a first pass, generated HEK293T cells carrying the BICC1 p.Ser240Pro variant. While these are admittedly not kidney epithelial cells, they do show a reduced level of PC2 expression. These data are shown in the manuscript. We have not yet addressed how this relates to its crosstalk with miR-17.

      Conclusion:

      The authors achieve their aims. The results reliably demonstrate the physical and functional interaction between BICC1 and PKD1/PKD2 genes and their products.

      The impact is hopefully going to be manifold:

      (1) Progressing the understanding of the regulation of the expression of PKD1/PKD2 genes.

      (2) Role of BICC1 in the miR/PKD1/2 complex should be the next step in the quest for a modifiable therapeutic target.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Filamentous fungi are established workhorses in biotechnology, with Aspergillus oryzae as a prominent example with a thousand-year history. Still, the cell biology and biochemical properties of the production strains are not well understood. The paper of the Takeshita group describes the change in nuclear numbers and correlates it to different production capacities. They used microfluidic devices to really correlate the production with nuclear numbers. In addition, they used microdissection to understand expression profile changes and found an increase in ribosomes. The analysis of two genes involved in cell volume control in S. pombe did not reveal conclusive answers to explain the phenomenon. It appears that it is a multi-trait phenotype. Finally, they identified SNPs in many industrial strains and tried to correlate them to the capability of increasing their nuclear numbers.

      The methods used in the paper range from high-quality cell biology, Raman spectroscopy, to atomic force and electron microscopy, and from laser microdissection to the use of microfluidic devices to study individual hyphae.

      This is a very interesting, biotechnologically relevant paper with the application of excellent cell biology. I have only minor suggestions for improvement.

      We sincerely appreciate your fair and positive evaluation of our work. Thank you for your suggestions for improvement. We respond to each of them appropriately.

      Reviewer #2 (Public review):

      Summary:

      In the study presented by Itani and colleagues, it is shown that some strains of Aspergillus oryzae - especially those used industrially for the production of sake and soy sauce - develop hyphae with a significantly increased number of nuclei and cell volume over time. These thick hyphae are formed by branching from normal hyphae and grow faster and therefore dominate the colonies. The number of nuclei positively correlates with the thicker hyphae and also the amount of secreted enzymes. The addition of nutrients such as yeast extract or certain amino acids enhanced this effect. Genome and transcriptome analyses identified genes, including rseA, that are associated with the increased number of nuclei and enzyme production. The authors conclude from their data that glycosyltransferases, calcium channels, and the TOR regulatory cascade are involved in the regulation of cell volume and number of nuclei. Thicker hyphae and an increased number of nuclei were also observed in high-production strains of other industrially used fungi such as Trichoderma reesei and Penicillium chrysogenum, leading to the hypothesis that the mentioned phenotypes are characteristic of production strains, which is of significant interest for fungal biotechnology.

      Strengths:

      The study is very comprehensive and involves the application of diverse state-of-the-art cell biological, biochemical, and genetic methods. Overall, the data are properly controlled and analyzed, figures and movies are of excellent quality.

      The results are particularly interesting with regard to the elucidation of molecular mechanisms that regulate the size of fungal hyphae and their number of nuclei. For this, the authors have discovered a very good model: (regular) strains with a low number of nuclei and strains with a high number of nuclei. Also, the results can be expected to be of interest for the further optimization of industrially relevant filamentous fungi.

      Weaknesses:

      There are only a few open questions concerning the activity of the many nuclei in production strains (active versus inactive), their number of chromosomes (haploid/diploid), and whether hyper-branching always leads to propagation of nuclei.

      We are very grateful for your recognition of our findings, the proposed model, and their significance for future applications. We are grateful for the questions, which contribute to a more accurate understanding.

      Our responses to each are provided below. Necessary experiments are in progress.

      Reviewer #3 (Public review):

      Summary:

      The authors seek to determine the underlying traits that support the exceptional capacity of Aspergillus oryzae to secrete enzymes and heterologous proteins. To do so, they leverage the availability of multiple domesticated isolates of A. oryzae along with other Aspergillus species to perform comparative imaging and genomic analysis.

      Strengths:

      The strength of this study lies in the use of multifaceted approaches to identify significant differences in hyphal morphology that correlate with enzyme secretion, which is then followed by the use of genomics to identify candidate functions that underlie these differences.

      Weaknesses:

      There are aspects of the methods that would benefit from the inclusion of more detail on how experiments were performed and data interpreted.

      Overall, the authors have achieved their aims in that they are able to clearly document the presence of two distinct hyphal forms in A. oryzae and other Aspergillus species, and to correlate the presence of the thicker, rapidly growing form with enhanced enzyme secretion. The image analysis is convincing. The discovery that the addition of yeast extract and specific amino acids can stimulate the formation of the novel hyphal form is also notable. Although the conclusions are generally supported by the results, this is perhaps less so for the genetic analysis as it remains unclear how direct the role of RseA and the calcium transporters might be in supporting the formation of the thicker hyphae.

      The results presented here will impact the field. The complexity of hyphal morphology and how it affects secretion is not well understood despite the importance of these processes for the fungal lifestyle. In addition, the description of approaches that can be used to facilitate the study of these different hyphal forms (i.e., stimulation using yeast extract or specific amino acids) will benefit future efforts to understand the molecular basis of their formation.

      We are very grateful for your fair and thoughtful evaluation of our work. We agree that the genetic analysis in the latter part is relatively weaker compared to the imaging analysis in the first half. Rather than a single mutation causing a dramatic phenotypic change, we believe that the accumulation of various mutations through breeding leads to the observed phenotype, making it difficult to clearly demonstrate causality. Since transcriptome and SNP analyses have revealed key pathways and phenotypes, it would be gratifying if these insights could contribute to future applications utilizing filamentous fungi.

    1. Author response:

      (1) General Statements

      As you will see in our attached rebuttal to the reviewers, we have added several new experiments and revised the manuscript to fully address their concerns.

      (2) Point-by-point description of the revisions

      Reviewer #1:

      Evidence, reproducibility and clarity

      Summary:

      The manuscript by Yang et al. describes a new CME accessory protein. CCDC32 has been previously suggested to interact with AP2 and in the present work the authors confirm this interaction and show that it is a bona fide CME regulator. In agreement with its interaction with AP2, CCDC32 recruitment to CCPs mirrors the accumulation of clathrin. Knockdown of CCDC32 reduces the amount of productive CCPs, suggestive of a stabilisation role in early clathrin assemblies. Immunoprecipitation experiments mapped the interaction of CCDC32 to the α-appendage of the AP2 complex α-subunit. Finally, the authors show that the CCDC32 nonsense mutations found in patients with cardio-facial-neuro-developmental syndrome disrupt the interaction of this protein with the AP2 complex. The manuscript is well written and the conclusions regarding the role of CCDC32 in CME are supported by good quality data. As detailed below, a few improvements/clarifications are needed to reinforce some of the conclusions, especially the ones regarding CFNDS.

      We thank the referee for their positive comments. In light of a recently published paper describing CCDC32 as a co-chaperone required for AP2 assembly (Wan et al., PNAS, 2024, see reviewer 2), we have added several additional experiments to address all concerns and consequently gained further insight into CCDC32-AP2 interactions and the important dual role of CCDC32 in regulating CME. 

      Major comments:

      (1) Why could the protein only be visualized at CCPs after knockdown of the endogenous protein? This is highly unusual, especially in stable cell lines. Could it be that the tag is interfering with the function of the expressed protein, rendering it incapable of outcompeting the endogenous one? Does this point to a regulated recruitment?

      The reviewer is correct that this would be unusual; however, it is not the case. We misspoke in the text (although the figure legend was correct): these experiments were performed without siRNA knockdown, and we can indeed detect eGFP-CCDC32 being recruited to CCPs in the presence of endogenous protein. Nonetheless, we repeated the experiment to be certain (see Author response image 1).

      Author response image 1.

      Cohort-averaged fluorescence intensity traces of CCPs (marked with mRuby-CLCa) and CCP-enriched eGFP-CCDC32(FL).

      (2) The disease mutation used in the paper does not correspond to the truncation found in patients. The authors use a 1-54 truncation, but the patients described in Harel et al. have frameshifts at positions 19 (Thr19Tyrfs*12) and 64 (Glu64Glyfs*12), while the patient described in Abdalla et al. has a deletion of two introns, leading to a frameshift around amino acid 90. Moreover, to precisely test the function of these disease mutations, one would need to add the extra amino acids generated by the frameshift. For example, as denoted in the mutation description in Harel et al., the frameshift at position 19 changes Threonine 19 to a Tyrosine and adds a run of 12 extra amino acids (Thr19Tyrfs*12).

      The labels of the disease mutants p.(Thr19Tyrfs*12) and p.(Glu64Glyfs*12) are based on a 194 aa polypeptide version of CCDC32 initiated at a nonconventional start site that contains a 9 aa peptide (VRGSCLRFQ) upstream of the N-terminus we show. Thus, we are indeed using the appropriate mutation site (see: https://www.uniprot.org/uniprotkb/Q9BV29/entry). The reviewer is correct that we have not included the extra 12 aa in our construct; however, as these residues are not present in the other CFNDS mutants, we think it unlikely that they contribute to the disease phenotype. Rather, as neither of the clinically observed mutations contains the 78-98 aa sequence required for AP2 binding and CME function, we are confident that this defect contributes to the disease. Thus, we are including the data on the CCDC32(1-54) mutant, as we believe these results provide a valuable physiological context to our studies.

      (3) The frameshift caused by the CFNDS mutations (especially the one studied) will likely lead to nonsense mediated RNA decay (NMD). The frameshift is well within the rules where NMD generally kicks in. Therefore, I am unsure about the functional insights of expressing a disease-related protein which is likely not present in patients.

      We thank the reviewer for bringing up this concern. However, as shown in new Figure S1, the mutant protein is expressed at levels comparable to the WT, suggesting that NMD is not occurring.

      (4) Coiled coils generally form stable dimers. The typically hydrophobic core of these structures is not suitable for transient interactions. This complicates the interpretation of the results regarding the role of this region as the place where the interaction to AP2 occurs. If the coiled coil holds a stable CCDC32 dimer, disrupting this dimer could reduce the affinity to AP2 (by reduced avidity) to the actual binding site. A construct with an orthogonal dimeriser or a pulldown of the delta78-98 protein with the GST AP2a-AD could be a good way to sort this issue.

      We were unable to model a stable dimer (or other oligomer) of this protein with high confidence using AlphaFold 3.0. Moreover, we were unable to detect endogenous CCDC32 coimmunoprecipitating with eGFP-CCDC32 (Fig. S6C). Thus, we believe that the moniker, based solely on the alpha-helical content of the protein, is a misnomer. We have explained this in the main text.

      Minor comments:

      (1) The authors interchangeably use the term "flat CCPs" and "flat clathrin lattices". While these are indeed related, flat clathrin lattices have been also used to refer to "clathrin plaques". To avoid confusion, I suggest sticking to the term "flat CCPs" to refer to the CCPs which are in their early stages of maturation.

      Agreed. Thank you for the suggestion. We have renamed these structures flat clathrin assemblies, as they do not acquire the curvature needed to classify them as pits, and do not grow to the size that would classify them as plaques.

      Significance

      General assessment:

      CME drives the internalisation of hundreds of receptors and surface proteins in practically all tissues, making it an essential process for various physiological processes. This versatility comes at the cost of a large number of molecular players and regulators. To understand this complexity, unravelling all the components of this process is vital. The manuscript by Yang et al. gives an important contribution to this effort as it describes a new CME regulator, CCDC32, which acts directly at the main CME adaptor AP2. The link to disease is interesting, but the authors need to refine their experiments. The requirement for endogenous knockdown for recruitment of the tagged CCDC32 is unusual and requires further exploration.

      Advance:

      The increased frequency of abortive events presented by CCDC32 knockdown cells is very interesting, as it hints at an active mechanism that regulates the stabilisation and growth of clathrin coated pits. The exact way clathrin coated pits are stabilised is still an open question in the field.

      Audience:

      This is a basic research manuscript. However, given the essential role of CME in physiology and the growing number of CME players involved in disease, this manuscript can reach broader audiences.

      We thank the referee for recognizing the ‘interesting’ advances our studies have made and for considering these studies as ‘an important contribution’ to ‘an essential process for various physiological processes’ and able ‘to reach broader audiences’. We have addressed and reconciled the reviewer’s concerns in our revised manuscript. 

      Field of expertise of the reviewer:

      Clathrin mediated endocytosis, cell biology, microscopy, biochemistry.

      Reviewer #2:

      Evidence, reproducibility and clarity

      In this manuscript, the authors demonstrate that CCDC32 regulates clathrin-mediated endocytosis (CME). Some of the findings are consistent with a recent report by Wan et al. (2024 PNAS), such as the observation that CCDC32 depletion reduces transferrin uptake and diminishes the formation of clathrin-coated pits. The primary function of CCDC32 is to regulate AP2 assembly, and its depletion leads to AP2 degradation. However, this study did not examine AP2 expression levels. CCDC32 may bind to the appendage domain of AP2 alpha, but it also binds to the core domain of AP2 alpha.

      We thank the reviewer for drawing our attention to the Wan et al. paper, which appeared while this work was under review. However, our in vivo data are not fully consistent with the report from Wan et al. The discrepancies reveal a dual function of CCDC32 in CME that was masked by complete knockout vs siRNA knockdown of the protein, and also likely affected by the position of the GFP-tag (C- vs N-terminal) on this small protein. Thus:

      -  Contrary to Wan et al., we do not detect any loss of AP2 expression (see new Figure S3A-B) upon siRNA knockdown. Most likely the ~40% residual CCDC32 present after siRNA knockdown is sufficient to fulfill its catalytic chaperone function but not its structural role in regulating CME beyond the AP2 assembly step.  

      - Contrary to Wan et al., we have shown that CCDC32 does interact with the intact AP2 complex (Figure S3C and 6B,C): all 4 subunits of the AP2 complex co-IP with full length eGFP-CCDC32. Interestingly, whereas full length CCDC32 pulls down the intact AP2 complex, the ∆78-98 mutant retains its ability to pull down the β2-µ2 hemicomplex, but its interactions with α:σ2 are severely reduced. While this result is consistent with the report of Wan et al. that CCDC32 binds to the α:σ2 hemicomplex, it also suggests that the interactions between CCDC32 and AP2 are more complex and will require further study.

      - Contrary to Wan et al., we provide strong evidence that CCDC32 is recruited to CCPs. Interestingly, modeling with AlphaFold 3.0 identifies a highly probable interaction between alpha helices encoded by residues 66-91 on CCDC32 and residues 418-438 on α. The latter are masked by µ2-C in the closed conformation of the AP2 core, but exposed in the open conformation triggered by cargo binding, suggesting that CCDC32 might only bind to membrane-bound AP2.

      Thus, our findings are indeed novel and indicate striking multifunctional roles for CCDC32 in CME, making the protein well worth further study. 

      (1) Besides its role in AP2 assembly, CCDC32 may potentially have another function on the membrane. However, there is no direct evidence showing that CCDC32 associates with the plasma membrane.

      We disagree: our data clearly show that CCDC32 is recruited to CCPs (Fig. 1B) and that CCPs that fail to recruit CCDC32 are short-lived and likely abortive (Fig. 1C). Wan et al. did not observe any colocalization of C-terminally tagged CCDC32 to CCPs, whereas we detect recruitment of our N-terminally tagged construct, which we also show is functional (Fig. 6F). Further, we have demonstrated the importance of the C-terminal region of CCDC32 in membrane association (see new Fig. S7). Thus, we speculate that a C-terminally tagged CCDC32 might not be fully functional. Indeed, SIM images of the C-terminally tagged CCDC32 in Wan et al. show large (~100 nm) structures in the cytosol, which may reflect aggregation.

      (2) CCDC32 binds to multiple regions on AP2, including the core domain. It is important to distinguish the functional roles of these different binding sites.

      We have localized the AP2-ear binding region to residues 78-99 and shown these to be critical for the functions we have identified. As described above, we now include data that are complementary to those of Wan et al. However, our data also clearly point to additional binding modalities. We agree that it will be important to map these additional interactions and identify their functional roles, but this is beyond the scope of this paper.

      (3) AP2 expression levels should be examined in CCDC32 depleted cells. If AP2 is gone, it is not surprising that clathrin-coated pits are defective.

      Agreed and we have confirmed this by western blotting (Figure S3A-B) and detect no reduction in levels of any of the AP2 subunits in CCDC32 siRNA knockdown cells. As stated above this could be due to residual CCDC32 present in the siRNA KD vs the CRISPR-mediated gene KO.

      (4) If the authors aim to establish a secondary function for CCDC32, they need to thoroughly discuss the known chaperone function of CCDC32 and consider whether and how CCDC32 regulates a downstream step in CME.

      Agreed. We have described the Wan et al paper, which came out while our manuscript was in review, in our Introduction.  As described above, there are areas of agreement and of discrepancies, which are thoroughly documented and discussed throughout the revised manuscript.  

      (5) The quality of Figure 1A is very low, making it difficult to assess the localization and quantify the data.

      The low signal:noise in Fig. 1A that the reviewer is concerned about is due to a diffuse distribution of CCDC32 on the inner surface of the plasma membrane. We now more explicitly describe this binding, which we believe reflects a specific interaction mediated by the C-terminus of CCDC32; thus the degree of diffuse membrane binding we observe follows: eGFP-CCDC32(FL) > eGFP-CCDC32(∆78-98) > eGFP-CCDC32(1-54) ~ eGFP/background (see new Fig. S7). Importantly, the colocalization of CCDC32 at CCPs is confirmed by the dynamic imaging of CCPs (Fig 1B).

      (6) In Figure 6, why aren't AP2 mu and sigma subunits shown?

      Agreed. Not being aware of CCDC32’s possible dual role as a chaperone, we had assumed that the AP2 complex was intact.  We have now added this data in Figure 6 B,C and Fig. S3C, as discussed above. 

      Page 5, top, this sentence is confusing: "their surface area (~17 x 10 nm<sup>2</sup>) remains significantly less than that required for the average 100 nm diameter CCV (~3.2 x 103 nm<sup>2</sup>)."

      Thank you for the criticism. We have clarified the sentence and corrected a typo, which would definitely be confusing.  The section now reads,  “While the flat CCSs we detected in CCDC32 knockdown cells were significantly larger than in control cells (Fig. 4D, mean diameter of 147 nm vs. 127 nm, respectively), they are much smaller than typical long-lived flat clathrin lattices (d≥300 nm)(Grove et al., 2014). Indeed, the surface area of the flat CCSs that accumulate in CCDC32 KD cells (mean ~1.69 x 10<sup>4</sup> nm<sup>2</sup>) remains significantly less than the surface area of an average 100 nm diameter CCV (~3.14 x 10<sup>4</sup> nm<sup>2</sup>). Thus, we refer to these structures as ‘flat clathrin assemblies’ because they are neither curved ‘pits’ nor large ‘lattices’. Rather, the flat clathrin assemblies represent early, likely defective, intermediates in CCP formation.” 
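      The two surface areas quoted in the revised passage can be checked with simple geometry. The sketch below is illustrative only and rests on two assumptions that the response does not state: a flat CCS is treated as a circular disc of the reported mean diameter, and a CCV as a sphere. The function names are hypothetical. The disc area computed from the mean diameter need not equal a mean of per-structure areas, so agreement to within rounding is all that should be expected.

```python
import math

def disc_area_nm2(diameter_nm: float) -> float:
    """Area of a flat circular patch (assumption: flat CCSs approximated as discs)."""
    return math.pi * (diameter_nm / 2) ** 2

def sphere_surface_nm2(diameter_nm: float) -> float:
    """Surface area of a sphere (assumption: a CCV approximated as a sphere)."""
    return 4 * math.pi * (diameter_nm / 2) ** 2

flat_kd = disc_area_nm2(147.0)    # mean diameter in CCDC32 knockdown cells
flat_ctrl = disc_area_nm2(127.0)  # mean diameter in control cells
ccv = sphere_surface_nm2(100.0)   # average 100 nm diameter CCV

print(f"flat CCS (KD):      {flat_kd:.3e} nm^2")
print(f"flat CCS (control): {flat_ctrl:.3e} nm^2")
print(f"100 nm CCV:         {ccv:.3e} nm^2")
print(f"KD flat CCS / CCV:  {flat_kd / ccv:.2f}")
```

      Under these assumptions the flat assemblies in knockdown cells carry roughly half the clathrin-coated area of a single 100 nm vesicle, which is in line with the statement that they remain smaller than what a CCV requires.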

      Significance

      Overall, while this work presents some interesting ideas, it remains unclear whether CCDC32 regulates AP2 beyond the assembly step.

      Our responses above argue that we have indeed established that CCDC32 regulates AP2 beyond the assembly step. We have also identified several discrepancies between our findings and those reported by Wan et al., most notably binding between CCDC32 and mature AP2 complexes and the AP2-dependent recruitment of CCDC32 to CCPs.  It is possible that these discrepancies may be due to the position of the GFP tag (ours is N-terminal, theirs is C-terminal; we show that the N-terminal tagged CCDC32 rescues the knockdown phenotype, while Wan et al., do not provide evidence for functionality of the C-terminal construct). 

      Reviewer #3: 

      Evidence, reproducibility and clarity (Required): 

      In this manuscript, Yang et al. characterize the endocytic accessory protein CCDC32, which has implications in cardio-facio-neuro-developmental syndrome (CFNDS). The authors clearly demonstrate that the protein CCDC32 has a role in the early stages of endocytosis, mainly through the interaction with the major endocytic adaptor protein AP2, and they identify regions taking part in this recognition. Through live cell fluorescence imaging and electron microscopy of endocytic pits, the authors characterize the lifetimes of endocytic sites, the formation rate of endocytic sites and pits and the invagination depth, in addition to transferrin receptor (TfnR) uptake experiments. Binding between CCDC32 and CCDC32 mutants to the AP2 alpha appendage domain is assessed by pull down experiments. Together, these experiments allow deriving a phenotype of CCDC32 knock-down and CCDC32 mutants within endocytosis, which is a very robust system, in which defects are not so easily detected. A mutation of CCDC32, known to play a role in CFNDS, is also addressed in this study and shown to have endocytic defects.

      We thank the reviewer for their positive remarks regarding the quality of our data and the strength of our conclusions.  

      In summary, the authors present a strong combination of techniques, assessing the impact of CCDC32 in clathrin mediated endocytosis and its binding to AP2, whereby the following major and minor points remain to be addressed: 

      - The authors show that CCDC32 depletion leads to the formation of brighter and static clathrin coated structures (Figure 2), but that these were only prevalent to 7.8% and masked the 'normal' dynamic CCPs. At the same time, the authors show that the absence of CCDC32 induces pits with shorter life times (Figure 1 and Figure 2), the 'majority' of the pits.

      Clarification is needed as to how the authors arrive at these conclusions and these numbers. The authors should also provide (and visualize) the corresponding statistics. The same statement is made again later on in the manuscript, where the authors explain their electron microscopy data. Was the number derived from there? 

      These points are critical to understanding CCDC32's role in endocytosis and are key to understanding the model presented in Figure 8. The numbers of how many pits accumulate in flat lattices versus normal endocytosis progression and the actual time scales could be included in this model and would make the figure much stronger.

      Thank you for these comments.  We understand the paradox between the visual impression and the reality of our dynamic measurements. We have been visually misled by this in previous work (Chen et al., 2020), which emphasizes the importance of unbiased image analysis afforded to us through the well-documented cmeAnalysis pipeline, developed by us (Aguet et al., 2013) and now used by many others (e.g. (He et al., 2020)). 

      The % of static structures was not derived from electron microscopy data, but quantified using cmeAnalysis, which automatically provides the lifetime distribution of CCPs. We have now clarified this in the manuscript and added a histogram (Fig. S4) quantifying the fraction of CCPs in lifetime cohorts <20s, 21-60s, 61-100s, 101-150s and >150s (static).
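      To make the cohort histogram concrete, the following is a minimal sketch of how a list of CCP lifetimes could be binned into the cohorts named above. It is not the cmeAnalysis implementation: the function names, the example lifetimes, and the treatment of a lifetime of exactly 20 s (placed in the shortest cohort here) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical cohort upper bounds (seconds), mirroring the labels in the response.
# Assumption: a lifetime of exactly 20 s falls in the shortest cohort.
COHORTS = [(20, "<=20s"), (60, "21-60s"), (100, "61-100s"), (150, "101-150s")]
STATIC = ">150s (static)"

def cohort_label(lifetime_s: float) -> str:
    """Return the cohort label for a single CCP lifetime."""
    for upper, label in COHORTS:
        if lifetime_s <= upper:
            return label
    return STATIC

def cohort_fractions(lifetimes_s):
    """Fraction of CCPs in each lifetime cohort."""
    if not lifetimes_s:
        raise ValueError("need at least one lifetime")
    counts = Counter(cohort_label(t) for t in lifetimes_s)
    labels = [label for _, label in COHORTS] + [STATIC]
    return {label: counts[label] / len(lifetimes_s) for label in labels}

# Made-up lifetimes (seconds) for illustration only; not data from the study.
example = [8, 15, 19, 32, 45, 58, 75, 90, 120, 200]
fractions = cohort_fractions(example)
for label, frac in fractions.items():
    print(f"{label:>15}: {frac:.0%}")
```

      Reporting fractions rather than raw counts is what allows movies with different numbers of tracked CCPs to be compared on one histogram.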

      - In relation to the above point, the statistics of Figure 2E-G and the analysis leading there should also be explained in more detail: For example, what are the individual points in the plot (also in Figures 6G and 7G)? The authors should also use a few phrases to explain software they use, for example DASC, in the main text. 

      Each point in these bar graphs represents a movie, where n≥12. These details have been added to the respective figure legend. We have also added a brief description of DASC analysis in the text. 

      -  There are several questions related to the knock-down experiments that need to be addressed:

      Firstly, knock-down of CCDC32 does not seem to be very strong (Figure S2B). Can the level of knock-down be quantified? 

      We have now quantified the KD efficiency. It is ~60%. This turns out to be fortuitous (see responses to reviewer 2), as a recent publication, which came out after we completed our study, has shown by CRISPR-mediated knockout that CCDC32 also plays an essential chaperone function required for AP2 assembly. We do not see any reduction in AP2 levels or its complex formation under our conditions (see new Supplemental Figure S3), which suggests that the effects of CCDC32 on CCP dynamics are more sensitive to CCDC32 concentration than its roles as a chaperone. Our phenotypes would have been masked by more efficient depletion of CCDC32.

      In page 6 it is indicated that the eGFP-CCDC32(1-54) and eGFP-CCDC32(∆78-98) constructs are siRNA-resistant. However in Fig S2B, these proteins do not show any signal in the western blot, so it is not clear if they are expressed or simply not detected by the antibody. The presence of these proteins after silencing endogenous CCDC32 needs to be confirmed to support Figures 6 and Figures 7, which critically rely on the presence of the CCDC32 mutants. 

      Unfortunately, the C-terminally truncated CCDC32 proteins are not detected because they lack the antibody epitope; indeed, even the ∆78-98 deletion is poorly detected (compare the GFP blot in new S1A with the anti-CCDC32 blot in S1B). However, these constructs contain the same siRNA-resistance mutation as the full length protein. That they are expressed and siRNA resistant can be seen in Fig. S2A (now Fig. S1A) blotting for GFP.

      In Figures 6 and 7, siRNA knock-down of CCDC32 is only indicated for sub-figures F to G. Is this really the case? If not, the authors should clarify. The siRNA knock-down in Figure 1 is also only mentioned in the text, not in the figure legend. The authors should pay attention to make their figure legends easy to understand and unambiguous. 

      No, it is not the case.  Thank you for pointing out the uncertainty. We have added these details to the Figure legends and checked all Figure legends to ensure that they clearly describe the data shown.  

      - It is not exactly clear how the curves in Figure 3C (lower panel) on the invagination depth were obtained. Can the authors clarify this a bit more? For example, what are kT and kE in Figure 3A? What is I0? And how did the authors derive the logarithmic function used to quantify the invagination depth? In the main text, the authors say that the traces were 'logarithmically transformed'. This is not a technical term. The authors should refer to the actual equation used in the figure. 

      This analysis was developed by the Kirchhausen lab (Saffarian and Kirchhausen, 2008). We have added these details and reference them in the Figure legend and in the text. We also now use the more accurate descriptor ‘log-transformed’.

      - In the discussion, the claim 'The resulting dysregulation of AP2 inhibits CME, which further results in the development of CFNDS.' is maybe a bit too strong of a statement. Firstly, because the authors show themselves that CME is perturbed, but by no means inhibited. Secondly, the molecular link to CFNDS remains unclear. Even though CCDC32 mutants seem to be responsible for CFNDS and one of the mutant has been shown in this study to have a defect in endocytosis and AP2 binding, a direct link between CCDC32's function in endocytosis and CFNDS remains elusive. The authors should thus provide a more balanced discussion on this topic. 

      We have modified and softened our conclusions, which now read that the phenotypes we see likely “contribute to” rather than “cause” the disease.

      - In Figure S1, the authors annotate the presence of a coiled-coil domain, which they also use later on in the manuscript to generate mutations. Could the authors specify (and cite) where and how this coiled-coil domain has been identified? Is this predicted helix indeed a coiled-coil domain, or just a helix, as indicated by the authors in the discussion?

      See response to Reviewer 1, point 4. We have changed this wording to alpha-helix. The ‘coiled-coil’ reference is historical and unlikely a true reflection of CCDC32 structure. AlphaFold 3.0 predictions were unable to identify with certainty any coiled-coil structures, even if we modelled potential dimers or trimers; and we find no evidence of dimerization of CCDC32 in vivo. We have clarified this in the text.

      Minor comments

      - In general, a more detailed explanation of the microscopy techniques used and the information they report would be beneficial to provide access to the article also to non-expert readers in the field. This concerns particularly the analysis methods used, for example: 

      How were the cohort-averaged fluorescence intensity and lifetime traces obtained? 

      How do the tools cmeAnalysis and DASC work? A brief explanation would be helpful. 

      We have expanded Methods to add these details, and also described them in the main text. 

      - The axis label of Figure 2B is not quite clear. What does 'TfnR uptake % of surface bound' mean? Maybe the authors could explain this in more detail in the figure legend? Is the drop in uptake efficiency also accessible by visual inspection of the images? It would be interesting to see that. 

      This is a standard measure of CME efficiency. 'TfnR uptake % of surface bound' = Internalized TfnR/Surface bound TfnR. Again, images may be misleading as defects in CME lead to increased levels of TfnR on the cell surface, which in turn would result in more Tfn uptake even if the rate of CME is decreased.
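      The point that raw images can mislead follows directly from the ratio. The sketch below uses invented numbers in arbitrary units (not measurements from the study) and a hypothetical helper function to show how a cell with more surface TfnR can internalize more transferrin in absolute terms while still having lower uptake efficiency.

```python
def uptake_efficiency_pct(internalized: float, surface_bound: float) -> float:
    """'TfnR uptake % of surface bound' = internalized TfnR / surface-bound TfnR x 100."""
    if surface_bound <= 0:
        raise ValueError("surface-bound signal must be positive")
    return 100.0 * internalized / surface_bound

# Invented values in arbitrary units, for illustration only.
control = {"surface": 100.0, "internalized": 40.0}
knockdown = {"surface": 200.0, "internalized": 50.0}  # more receptor stranded at the surface

ctrl_eff = uptake_efficiency_pct(control["internalized"], control["surface"])
kd_eff = uptake_efficiency_pct(knockdown["internalized"], knockdown["surface"])

print(f"control:   internalized={control['internalized']}, efficiency={ctrl_eff:.0f}%")
print(f"knockdown: internalized={knockdown['internalized']}, efficiency={kd_eff:.0f}%")
```

      In this made-up case the knockdown cell shows the brighter internal signal, yet its efficiency is lower, which is why the normalized measure is reported instead of the images alone.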

      - Figure 4: How is the occupancy of CCPs in the plasma membrane measured? What are the criteria used to divide CCSs into Flat, Dome or Sphere categories? 

      We have expanded Methods to add these details. Based on the degree of invagination, the shapes of CCSs were classified as either: flat CCSs with no obvious invagination; dome-shaped CCSs that had a hemispherical or less invaginated shape with visible edges of the clathrin lattice; and spherical CCSs that had a round shape with the invisible edges of clathrin lattice in 2D projection images. In most cases, the shapes were obvious in 2D PREM images. In uncertain cases, the degree of CCS invagination was determined using images tilted at ±10–20 degrees. The areas of CCSs were measured using ImageJ and used for the calculation of the CCS occupancy on the plasma membrane.

      - Figure 5B: Can the authors explain, where exactly the GFP was engineered into AP2 alpha? This construct does not seem to be explained in the methods section. 

      We have added this information. The construct, which corresponds to an insertion of GFP into the flexible hinge region of AP2, at aa649, was first described by (Mino et al., 2020) and shown to be fully functional.  This information has been added to the Methods section.

      - Figure S1B: The authors should indicate the colour code used for the structural model.

      We have expanded our structural modeling using AlphaFold 3.0 in light of the recent publication suggesting that CCDC32 interacts with the µ2 subunit and does not bind full length AP2. These results are described in the text. The color coding now reflects certainty values given by AlphaFold 3.0 (Fig. S6B, D).

      - The list of primers referred to in the materials and methods section does not exist. There is a Table S1, but this contains different data. The actual Table S1 is not referenced in the main text. This should be done. 

      We apologize for this error. We have now added this information in Table S2.

      Significance (Required):

      In this study, the authors analyse a so-far poorly understood endocytic accessory protein, CCDC32, and its implication for endocytosis. The experimental tool set used, allowing to quantify CCP dynamics and invagination is clearly a strength of the article that allows assessing the impact of an accessory protein towards the endocytic uptake mechanism, which is normally very robust towards mutations. Only through this detailed analysis of endocytosis progression could the authors detect clear differences in the presence and absence of CCDC32 and its mutants. If the above points are successfully addressed, the study will provide very interesting and highly relevant work allowing a better understanding of the early phases in CME with implication for disease. 

      The study is thus of potential interest to an audience interested in CME, in disease and its molecular reasons, as well as for readers interested in intrinsically disordered proteins to a certain extent, claiming thus a relatively broad audience. The presented results may initiate further studies of the so-far poorly understood and less well known accessory protein CCDC32.

      We thank the reviewer for their positive comments on the significance of our findings and the importance of our detailed phenotypic analysis made possible by quantitative live cell microscopy. We also believe that our new structural modeling of CCDC32 and our findings of complex and extensive interactions with AP2 make the reviewer’s point regarding intrinsically disordered proteins even more interesting and relevant to a broad audience. We trust that our revisions indeed address the reviewer’s concerns.

      The field of expertise of the reviewer is structural biology, biochemistry and clathrin mediated endocytosis. Expertise in cell biology is rather superficial.

      References:

      Aguet, F., Costin N. Antonescu, M. Mettlen, Sandra L. Schmid, and G. Danuser. 2013. Advances in Analysis of Low Signal-to-Noise Images Link Dynamin and AP2 to the Functions of an Endocytic Checkpoint. Developmental Cell. 26:279-291.

      Chen, Z., R.E. Mino, M. Mettlen, P. Michaely, M. Bhave, D.K. Reed, and S.L. Schmid. 2020. Wbox2: A clathrin terminal domain–derived peptide inhibitor of clathrin-mediated endocytosis. Journal of Cell Biology. 219.

      Grove, J., D.J. Metcalf, A.E. Knight, S.T. Wavre-Shapton, T. Sun, E.D. Protonotarios, L.D. Griffin, J. Lippincott-Schwartz, and M. Marsh. 2014. Flat clathrin lattices: stable features of the plasma membrane. Mol Biol Cell. 25:3581-3594.

      He, K., E. Song, S. Upadhyayula, S. Dang, R. Gaudin, W. Skillern, K. Bu, B.R. Capraro, I. Rapoport, I. Kusters, M. Ma, and T. Kirchhausen. 2020. Dynamics of Auxilin 1 and GAK in clathrin-mediated traffic. J Cell Biol. 219.

      Mino, R.E., Z. Chen, M. Mettlen, and S.L. Schmid. 2020. An internally eGFP-tagged α-adaptin is a fully functional and improved fiduciary marker for clathrin-coated pit dynamics. Traffic. 21:603-616.

      Saffarian, S., and T. Kirchhausen. 2008. Differential evanescence nanometry: live-cell fluorescence measurements with 10-nm axial resolution on the plasma membrane. Biophys J. 94:2333-2342.

    1. Author Response:

      We sincerely thank the reviewers and the editorial team for their thoughtful and constructive evaluation of our manuscript. We are very pleased that both reviewers and the Reviewing Editor found the work to be compelling and of interest to the community studying membrane-associated condensates. Below we outline our planned revisions in response to the public reviews.

      Reviewer #1

      We appreciate Reviewer #1’s positive evaluation of the study’s significance and the utility of our theoretical framework.

      1. Understandably, the authors used one system to test their theory (ZO-1). However, to establish a theoretical framework, this is sufficient.

      Response: We acknowledge this limitation. While we agree that additional systems would strengthen the generality of our theory, we note that the focus of this work is to introduce and validate a theoretical framework. As the reviewer notes, this is sufficient for establishing the framework. Nonetheless, we are open to further collaborations or future studies to test the model with other systems.

      Reviewer #2

      We are grateful for Reviewer #2’s detailed comments and will address each of the points as follows:

      1. In the theoretical section, what has previously been known, compared to which equations are new, should be made more clear.

      Response: We will revise the theory section to clearly distinguish previously established formulations from novel contributions.

      2. Some assumptions in the model are made purely for convenience and without sufficient accompanying physical justification. E.g., the authors should justify, on physical grounds, why binding rate effects are/could be larger than the other fluxes.

      Response: We will expand the discussion to provide key physical justification, especially to explain why binding rate effects are/could be larger than the other fluxes.

      3. I feel that further mechanistic explanation as to why bulk phase separation widens the regime of surface phase separation is warranted.

      Response: We will elaborate on the mechanism underlying this coupling.

      4. The major advantage of the non-dilute theory as compared with a best parameterized dilute (or homogenous) theory requires further clarification/evidence with respect to capturing the experimental data.

      Response: We will clarify this comparison more explicitly and highlight how the non-dilute model captures key nonlinear behaviors and concentration-dependent adsorption phenomena that the dilute model fails to reproduce.

      5. Discrete (particle-based) molecular modelling could help to delineate the quantitative improvements that the non-dilute theory has over the previous state-of-the-art. Also, this could help test theoretical statements regarding the roles of bulk-phase separation, which were not explored experimentally.

      Response: We appreciate the suggestion and agree that such modeling would be valuable. However, this is beyond the scope of the current study. We will add a discussion on how discrete simulations could be used to further test our theory in future work.

      6. Discussion of the caveats and limitations of the theory and modelling is missing from the text.

      Response:  We will add a paragraph outlining caveats and limitations of the modelling.

      We believe these changes will significantly improve the clarity and impact of our manuscript, and we thank the reviewers again for their valuable input.

    1. Author response:

      We thank the reviewers for their thoughtful and constructive feedback. As the reviewers noted, dissecting the contributions of Gtr1/2 and Pib2 to TORC1 signaling across diverse nutrient states is a technically and conceptually challenging problem. Indeed, many of the issues raised—including the interpretation of non-canonical TORC1 readouts (e.g., Rps6, Par32), the influence of strain auxotrophy and media composition, and the limitations of phosphoproteomic analysis performed under a single growth condition—underscore the challenges of working with the TORC1 signaling system.

      In response to the reviewers’ comments, we have undertaken a broader and more systematic analysis of TORC1 regulation across defined nitrogen transitions, building directly on the signaling framework established in Figures 6 and 8 of this manuscript. This work, which includes expanded phosphoproteomic profiling and the use of refined genetic tools, supports and extends the key conclusions of Cecil et al. Specifically, it reinforces the existence of a Pib2-dependent TORC1 output under nitrogen-limited conditions and further clarifies the physiological relevance of the intermediate TORC1 activity state. Due to the scope and depth of this expanded work, we are reporting those findings in a separate publication. Nonetheless, we view the data presented here as a key foundational step in establishing a non-redundant framework for Gtr1/2- and Pib2-dependent control of TORC1.

      We have therefore made minor changes to the manuscript to clarify our use of different growth media and to temper our conclusions where appropriate. These changes, together with the context of ongoing work, should reinforce the value of Cecil et al. in advancing our understanding of TORC1 and nutrient signaling in eukaryotes.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work Jeong and colleagues focus on exploring the role of the acyltransferase ZDHHC9 in myelinating OLs, in particular in the palmitoylation of several myelin proteins. After confirming the specific enrichment of the Zdhhc9 transcript in mouse and human OLs, the authors examine the subcellular localization of the protein in vitro and observe that, in comparison with other isoforms, ZDHHC9 localizes at OL cell bodies and at discrete puncta in the processes. These observations (Figures 1 and 2) led the authors to hypothesize that ZDHHC9 plays an important role in myelination. No gross changes were detected in OL development in Zdhhc9 KO mice and analyses from P28 Zdhhc9 KO mice crossed with Mobp-EGFP reporter mice did not show changes in EGFP+ OL differentiation (Figure 3).

      However, and given the observed subcellular localization of ZDHHC9 in OL processes (Figure 2) and the observation that the percentage of unmyelinated axons is increased in Zdhhc9 KO (Figure 6), early time points to examine the differentiated pools of OLs and their capacity to extend processes/contact axons need to be considered.

      We appreciate this point, but due to the order in which experiments were performed, the ZDHHC9 KO mouse colony that we maintained after initial submission of this work contains homozygous MOBP-EGFP, but not the mT/mG transgene that would be most optimal for the proposed experiment. We hope the reviewer appreciates that it would take considerable time and effort regarding mouse breeding to cross out the MOBP and add back the mT/mG. We nonetheless appreciate the importance of the point raised and therefore examined an earlier developmental time point (P21, 3 weeks) to quantify OLs and NG2+ OPCs. In our updated Fig 3C1-C3, we use Mobp-EGFP mice to show that Zdhhc9 KO does not significantly affect the number of EGFP+ OLs at this time point in the cortex, corpus callosum and spinal cord. We also show that in corpus callosum, Zdhhc9 KO does not significantly affect the number of NG2+ OPCs at this earlier time point (Fig 3D, E). Furthermore, immunostaining to detect BCAS1, a marker of pre-mature OLs, also revealed no qualitative difference with ZDHHC9 loss at P21. We show representative images from these BCAS1 experiments in an updated Fig S3. While these new experiments do not address the morphology of OLs in Zdhhc9 KO, they do provide further evidence that deficits in myelination in young Zdhhc9 KO mice (Figure 6) are not likely due to gross differences in OPC or OL numbers during development.

      Maturation of OLs in Zdhhc9 KO was examined by crossing Zdhhc9 KO with Pdgfra-CreER;R26-EGFP mice and tracking newly EGFP-labelled OPCs after tamoxifen administration. No changes in the numbers of EGFP+ OLs were detected. The authors concluded that the loss of ZDHHC9 does not alter oligodendrogenesis in either the young or mature CNS. The authors observed defects in Zdhhc9 KO OL protrusions that they attributed to abnormal OL membrane expansion (Fig 4 and 5). Can they show evidence for this?

      This is an important point, and we appreciate the opportunity to explain the reasoning behind our initial statement more fully, while noting that other explanations are possible. Fig 5B (an Imaris-assisted reconstruction using the EGFP cell fill/morphology marker) highlights large spheroid-like distensions along OL processes. We reason that these spheroids are enclosed by the OL lipid membrane because if the membrane were ruptured, the EGFP signal would likely diffuse. This in turn suggests that the caliber of the OL process at the position of the spheroid is grossly abnormal i.e. the membrane has hyper-expanded. Given that OL membrane growth during myelination extends in two directions, i.e., spiral growth to the axonal surface and longitudinal growth along the axon, it is possible that spheroid-like structures are formed by uneven myelin growth. We recognize that we cannot yet conclude whether and how spheroid formation might be linked to the myelination deficit that we observe in Zdhhc9 KO mice. However, defining the subcellular mechanism for spheroid formation may provide further insights into this issue. We have therefore largely retained the original statement but have added the reasoning above to our revised Discussion.

      The authors report that Zdhhc9 KO primary and secondary branches in OLs were longer, some contained spheroid-like swellings and the OL protrusion complexity was higher. However, these data are partially contradictory to what they show in OL differentiation experiments in vitro (Fig 7). There is also no evidence for increased membrane expansion in Zdhhc9 knockdown myelin-forming cells in culture. How to reconcile this?

      We appreciate the reviewer’s interest in this issue. Several non-mutually exclusive factors could account for the differences in OL morphology in vitro versus in vivo caused by Zdhhc9 loss. First, morphology in vivo may well be influenced by the axons and/or other extrinsic components around each OL that are not present in our primary cultures. Second, OL growth in vivo is highly 3-dimensional, whereas growth in culture is largely 2-dimensional – it may be difficult to support formation of spheroids (by definition, a 3-dimensional structure) in the latter situation. Finally, Zdhhc9 is absent in vivo from the beginning of development until the time points examined, whereas in our cultured OL experiments, Zdhhc9 shRNA is virally delivered to OPC cultures at DIV2 and likely acutely affects Zdhhc9 expression predominantly in committed OLs (following the switch to differentiation medium at DIV3). These differences may also affect the ability of other PATs or, potentially, palmitoylation-independent subcellular processes, to compensate for Zdhhc9 loss. We have more fully explained these points in our revised Discussion. 

      Reviewer #2 (Public Review):

      This study provides an in-depth exploration of the impact of X-linked ZDHHC9 gene mutations on cognitive deficits and epilepsy, with a particular focus on the expression and function of ZDHHC9 in myelin-forming oligodendrocytes (OLs). These findings offer crucial insights into understanding ZDHHC9-related X-linked intellectual disability (XLID) and shed light on the regulatory mechanisms of palmitoylation in myelination. The experimental design and analysis of results are convincing, providing a valuable reference for further research in this field. However, upon careful review, I believe the article still needs further improvement and supplementation in the following aspects:

      (1) Regarding the subcellular localization experiment of ZDHHC9 mutants in OL, it is currently limited to in vitro cultured OL, lacking validation in vivo OL or myelin sheath. Additionally, it is necessary to investigate whether the abnormal subcellular localization of ZDHHC9 mutants affects their enzyme activity and palmitoylation modification of substrate proteins.

      This is an important point but is technically challenging to address in vivo as it would likely require delivery of AAV to express ZDHHC9wt and XLID mutants specifically in OLs, preferably in the absence of endogenous ZDHHC9. We hope the reviewers would agree that this experiment is beyond the scope of the current study. However, we did compare the ability of ZDHHC9wt and XLID mutants to palmitoylate MBP, and to autopalmitoylate (sometimes used as a surrogate measure of PAT activity) in transfected heterologous cells. Although we recognize that this over-expression system is less physiological than a native OL, it has the benefit of being able to readily compare transfected wt vs mutant forms of ZDHHC9 with minimal contribution from endogenous ZDHHC9. Intriguingly, using this system, we found that autopalmitoylation activity of the XLID ZDHHC9-P150S mutant does not differ significantly from that of ZDHHC9wt, and that this mutant is still capable of palmitoylating MBP. Moreover, the R96W mutant, while impaired in autopalmitoylation, still palmitoylated MBP approximately 50% as effectively as ZDHHC9wt in our cell-based assay. These findings suggest that ZDHHC9-P150S and, probably, ZDHHC9-R96W mutants might still be able to palmitoylate substrates in OLs if they were properly localized. This possibility in turn suggests that impaired subcellular targeting in addition to, or instead of, impaired catalytic activity, may be a key factor in certain cases of ZDHHC9-associated XLID. We have expanded our Figure 8 (new panels 8E-G) to show these additional experiments and have summarized the conclusions above in our revised Discussion. We thank the reviewer for suggesting that we further investigate this issue.

      (2) The experimental period (P21+21 days) using genetic labeling to track the development of myelinating cells may not be long enough. It is recommended to extend the observation time and analyze at more time points to more comprehensively reflect the impact of Zdhhc9 KO.

      We appreciate this point from the reviewer but, regrettably, we did not maintain the Pdgfra-CreER; R26-EGFP; Zdhhc9 KO mouse line and hope the reviewer appreciates that it would take considerable time and effort to rederive this line and then perform the suggested extended time course experiments. However, we note for the reviewer that our preliminary studies did not reveal any effect of Zdhhc9 KO on the number of MOBP-EGFP+ OLs in 6-month-old mice (not shown), consistent with a model in which Zdhhc9 loss does not affect OPC-OL commitment per se.

      (3) The author speculates that Zdhhc9 may regulate myelination by affecting the membrane localization of specific myelin proteins, but lacks direct experimental evidence to support this. It is suggested to detect the expression and distribution of relevant proteins in the myelin of Zdhhc9 KO mice.

      We share the reviewer’s interest in this point but realized that it is more technically challenging to address than might be initially thought. The main protein we would implicate and seek to test is MBP, but we already found that there is no gross change in MBP distribution in vivo in Zdhhc9 KO mice (Fig 3A). However, an anti-MBP antibody recognizes all forms of MBP, not just the specific splice variants whose palmitoylation is affected by ZDHHC9 loss. Specifically assessing nanoscale distribution of these splice variants would require a way (e.g. anti-MBP splice form-specific antibodies that are compatible with immuno-EM) to distinguish these variants from other, non-palmitoylated forms of MBP. Although such an antibody could be an important tool, we hope the reviewers would agree that developing and characterizing such a reagent is beyond the scope of the current study.

      We do, however, note that the lack of gross change in MBP distribution and levels in Zdhhc9 KO mice is consistent with the relatively mild phenotype of these mice, compared with shiverer (shi/shi) mice, in which MBP is completely lost. In shiverer, CNS compact myelin is almost absent (PMID: 671037; PMID: 88695; PMID: 460693) and, as the name suggests, mice display a shivering gait, and exhibit seizures and early death. In contrast, Zdhhc9 KO mice show only subtle behavioral deficits (PMID: 29944857). These differences are all consistent with a model in which Zdhhc9 KO mice, despite their significantly reduced MBP palmitoylation (Fig 8), have grossly normal distribution and levels of MBP when all splice variants are assessed (Fig 3, Fig 8). It is not inconceivable that Zdhhc9 KO mice have a nanoscale change in the distribution of MBP, particularly of specific palmitoylated splice variants, within myelin that profoundly affects myelin ultrastructure, without grossly altering MBP distribution. However, an alternative and not mutually exclusive possibility is that aberrant palmitoylation of other Zdhhc9 substrates accounts for, or contributes to, the abnormalities in myelin at the ultrastructural level. Addressing this issue would require a multi-pronged approach, not just to assess palmitoylation and distribution of such proteins in Zdhhc9 KO, but also to test whether they are direct Zdhhc9 substrates, in order to rule out indirect effects. We hope reviewers would agree that this is best left to a separate study. However, in our revised Discussion we now summarize what can be inferred regarding Zdhhc9-dependent effects on total and splice variant-specific distribution and levels of MBP.

      (4) Although the article mentions the association of Zdhhc9 with intellectual disabilities, it does not involve behavioral analysis of Zdhhc9 KO mice. It is recommended to supplement some behavioral experimental data to support the important role of Zdhhc9 in maintaining normal cognitive function, enhancing the clinical relevance of the article.

      We appreciate this point from the reviewer. The behavior of the same ZDHHC9 KO mouse line that we used was reported in PMID: 31747610 and in PMID: 29944857. In the former study, Zdhhc9 KO mice were reported to display seizures reminiscent of phenotypes in human patients with ZDHHC9 mutation. The latter study assessed performance of Zdhhc9 KO mice in several tasks that test cognitive function. Specifically, the KO mice were reported to display “altered behaviour in the open-field test, elevated plus maze and acoustic startle test that is consistent with a reduced anxiety level; a reduced hang time in the hanging wire test that suggests underlying hypotonia but which may also be linked to reduced anxiety [and] deficits in the Morris water maze test of hippocampal-dependent spatial learning and memory.” We have incorporated these findings into our revised Discussion, where we summarize how these phenotypes are common, not just to human patients with ZDHHC9 mutation, but also to other human neurodevelopmental conditions and mouse models in which ID is a common feature.

      (5) For the abnormal myelination observed in Zdhhc9 KO mice, including unmyelinated large-diameter axons and excessively myelinated small-diameter axons, the article lacks in-depth research and explanation on the exact mechanism and mode of action of ZDHHC9 in regulating myelination.

      We share the reviewer’s interest in this point but again note that gaining definitive insights into this issue is far from trivial. Convincing evidence of a causative mechanism would require an exhaustive identification of ZDHHC9 in vivo substrates, followed by point mutation of substrate palmitoylation site(s) to determine the extent to which loss of palmitoylation of such protein(s) phenocopies ZDHHC9 loss. Nonetheless, it is possible to break this question down and to summarize what we do and do not know. For example, our experiments in cultured OLs show that ZDHHC9 loss causes cell-autonomous deficits in morphological maturation of these cells. We also know that ZDHHC9 loss results in impaired palmitoylation of MBP, a direct substrate for ZDHHC9. Moreover, loss of ZDHHC9 at Golgi outposts in OLs (a phenotype observed with several XLID-associated mutant forms of ZDHHC9, even those with no significant loss of catalytic activity) correlates with intellectual disability. Together, these findings are consistent with a model in which ZDHHC9 action at OL Golgi outposts is critical for normal myelination. However, it is yet to be determined whether the key substrates of ZDHHC9 include MBP, other palmitoyl-proteins that are key constituents of CNS myelin, or proteins whose palmitoylation is important for myelin protein trafficking and targeting. Another non-mutually exclusive possibility is that ZDHHC9 acts at Golgi outposts but indirectly, for example to drive the expression of myelin protein genes. Future experiments, including but not limited to palmitoyl-proteomics in ZDHHC9 (OL-specific) KO mice, will be needed to provide more definitive insights into this issue. We have expanded our Discussion of links between ZDHHC9 mutation and impaired myelination to summarize the above points.

      (6) The function of ZDHHC9 in OL may be related to the Golgi apparatus, but its exact role in these structures is still unclear. It is suggested to discuss in more detail the role of ZDHHC9 in the Golgi apparatus in the discussion section.

      We appreciate this point, which we considered as related to point (5) above. In our revised Discussion we highlight how ZDHHC9 action at Golgi outposts may involve direct palmitoylation of myelin proteins, palmitoylation of proteins that direct myelin proteins to the myelin membrane and/or activation of gene expression programs that serve to drive myelination. We further note that these possibilities are not mutually exclusive.

      (7) More experimental support and in-depth research are needed on the detailed mechanism of how ZDHHC9 and Golga7 cooperatively regulate MBP palmitoylation, and how this decrease in palmitoylation level leads to myelination defects.

      This is another important point – our new experiments suggest that, although some XLID mutations markedly affect ZDHHC9’s ability to palmitoylate MBP, others do not, yet all of the mutant forms fail to localize to Golgi outposts. These findings are consistent with a model in which the subcellular location at which ZDHHC9 palmitoylates MBP, and potentially other substrates, is critical for normal myelination. Interestingly, despite their marked differences in basal catalytic activity (as assessed by autopalmitoylation), wt and all XLID forms of ZDHHC9 appear to show enhanced activity (measured by both auto- and MBP palmitoylation) in the presence of Golga7, suggesting that the association with Golga7 (which also localizes to Golgi outposts) is central to ZDHHC9 activity. This model is also highly consistent with the biased expression of Golga7 in OLs, compared to other CNS cell types (Fig 1E, 1F). Moreover, XLID-associated mutant forms of ZDHHC9 also show reduced protein stability and are impaired in their ability to form complexes with Golga7 (also known as Golgi Complex Protein 16kDa; GCP16; PMID: 37035671). Failure of ZDHHC9 XLID mutants to localize to Golgi outposts may thus be due to aberrant trafficking of mutant ZDHHC9 per se, but may also involve impaired association/stabilization of ZDHHC9/Golga7 complexes at these locations. Again, it is possible that either or both of these mechanisms, which are not mutually exclusive, contribute to impaired MBP palmitoylation and/or myelination deficits. We summarize these points in our revised Discussion.

      In summary, it is recommended that the authors address the above issues through additional experiments and improved discussions to further strengthen the credibility and clinical relevance of the article.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      No gross changes were detected in OL development in Zdhhc9 KO mice and analyses from P28 Zdhhc9 KO mice crossed with Mobp-EGFP reporter mice did not show changes in EGFP+ OL differentiation (Figure 3). However, and given the observed subcellular localization of ZDHHC9 in OL processes (Figure 2) and the observation that the percentage of unmyelinated axons is increased in Zdhhc9 KO (Figure 6), ***early time points to examine the differentiated pools of OLs and their capacity to extend processes/contact axons need to be considered***.

      We appreciate this point, but due to the order in which experiments were performed, the ZDHHC9 KO mouse colony that we maintained after initial submission of this work contains homozygous MOBP-EGFP, but not the mT/mG transgene that would be most optimal for the proposed experiment. We hope the reviewer appreciates that it would take considerable time and effort regarding mouse breeding to cross out the MOBP and add back the mT/mG. We nonetheless appreciate the importance of the point raised and therefore examined an earlier developmental time point (P21, 3 weeks) to quantify OLs and NG2+ OPCs. In our updated Fig 3C1-C3, we use Mobp-EGFP mice to show that Zdhhc9 KO does not significantly affect the number of EGFP+ OLs at this time point in the cortex, corpus callosum and spinal cord. We also show that in corpus callosum, Zdhhc9 KO does not significantly affect the number of NG2+ OPCs at this earlier time point (Fig 3D, E). Furthermore, immunostaining to detect BCAS1, a marker of pre-mature OLs, also revealed no qualitative difference with ZDHHC9 loss at P21. We show representative images from these BCAS1 experiments in an updated Fig S3. While these new experiments do not address the morphology of OLs in Zdhhc9 KO, they do provide further evidence that deficits in myelination in young Zdhhc9 KO mice (Figure 6) are not likely due to gross differences in OPC or OL numbers during development.

      The authors observed defects in Zdhhc9 KO OL protrusions that they attributed to abnormal OL membrane expansion (Fig 4 and 5). Can they show evidence for this?

      This is an important point, and we appreciate the opportunity to explain the reasoning behind our initial statement more fully, while noting that other explanations are possible. Fig 5B (an Imaris-assisted reconstruction using the EGFP cell fill/morphology marker) highlights large spheroid-like distensions along OL processes. We reason that these spheroids are enclosed by the OL lipid membrane because if the membrane were ruptured, the EGFP signal would likely diffuse. This in turn suggests that the caliber of the OL process at the position of the spheroid is grossly abnormal i.e. the membrane has hyper-expanded. Given that OL membrane growth during myelination extends in two directions, i.e., spiral growth to the axonal surface and longitudinal growth along the axon, it is possible that spheroid-like structures are formed by uneven myelin growth. We recognize that we cannot yet conclude whether and how spheroid formation might be linked to the myelination deficit that we observe in Zdhhc9 KO mice.

      However, defining the subcellular mechanism for spheroid formation may provide further insights into this issue. We have therefore largely retained the original statement but have added the reasoning above to our revised Discussion.

      The authors report that Zdhhc9 KO primary and secondary branches in OLs were longer, some contained spheroid-like swellings and the OL protrusion complexity was higher. However, these data are partially contradictory to what they show in OL differentiation experiments in vitro (Fig 7). There is also no evidence for increased membrane expansion in Zdhhc9 knockdown myelin-forming cells in culture. How do they reconcile these different findings?

      We appreciate the reviewer’s interest in this issue. Several non-mutually exclusive factors could account for the differences in OL morphology in vitro versus in vivo caused by Zdhhc9 loss. First, morphology in vivo may well be influenced by the axons and/or other extrinsic components around each OL that are not present in our primary cultures. Second, OL growth in vivo is highly 3-dimensional, whereas growth in culture is largely 2-dimensional – it may be difficult to support formation of spheroids (by definition, a 3-dimensional structure) in the latter situation. Finally, Zdhhc9 is absent in vivo from the beginning of development until the time points examined, whereas in our cultured OL experiments, Zdhhc9 shRNA is virally delivered to OPC cultures at DIV2 and likely acutely affects Zdhhc9 expression predominantly in committed OLs (following the switch to differentiation medium at DIV3). These differences may also affect the ability of other PATs or, potentially, palmitoylation-independent subcellular processes, to compensate for Zdhhc9 loss. We have more fully explained these points in our revised Discussion. 

      Page 7: "The OL processes in this culture condition correspond to large lipid-rich membranous sheets that form spiral membrane expansion on axons in vivo (49)." At which stage are authors referring to? OL processes are extended in culture before membrane formation and this is not clear here. In a 3-days differentiation culture, most OLs have not yet formed a myelin sheath (eg., Figure 2 in Zuchero et al., 2015, Dev Cell).

      We appreciate the reviewer highlighting this point. We first note that our oligodendrocyte (OL) culture conditions differ from the immunopanning method used by Zuchero et al., 2015 (original reference: Emery and Dugas, 2013), which may affect the time course and progression of OL process elaboration and/or myelin sheath formation. We further note that in our cultures most EGFP+ processes are also MBP+ at the time point examined (strictly 3 days plus 9 hours post-differentiation). It thus seems likely that these MBP+ structures largely correspond to the MBP+ wrapping sheaths that occur in vivo. We have therefore retained our original statement but have added this further explanation.

      Minor: Figure 6 (Legend): Time points should be indicated throughout the panels.

      We have added this information as requested.

      Reviewer #2 (Recommendations For The Authors):

      (1) Regarding the subcellular localization experiment of ZDHHC9 mutants in OL, it is currently limited to in vitro cultured OL, lacking validation in vivo OL or myelin sheath. Additionally, it is necessary to investigate whether the abnormal subcellular localization of ZDHHC9 mutants affects their enzyme activity and palmitoylation modification of substrate proteins.

      We thank the reviewer for raising this point. New data in our revised Figure 8 compares autopalmitoylation (sometimes used as a surrogate measure of PAT activity) of ZDHHC9wt and XLID mutants, and their ability to palmitoylate MBP in transfected cells. Intriguingly, we found that autopalmitoylation activity of the ZDHHC9-P150S mutant does not differ significantly from that of ZDHHC9wt, and that this mutant is still capable of palmitoylating MBP. Moreover, the R96W mutant, while impaired in autopalmitoylation, still palmitoylated MBP approximately 50% as effectively as ZDHHC9wt in our cell-based assay. These findings suggest that ZDHHC9-P150S and, probably, ZDHHC9-R96W mutants might still be able to palmitoylate substrates in OLs if they were properly localized. This possibility in turn suggests that impaired subcellular targeting in addition to, or instead of, impaired catalytic activity, may be a key factor in certain cases of ZDHHC9-associated XLID. We have expanded our Figure 8 to show these new experiments and have summarized the conclusions above in our revised Discussion. We thank the reviewer for suggesting that we further investigate this issue.

      (2) The experimental period (P21+21 days) using genetic labeling to track the development of myelinating cells may not be long enough. It is recommended to extend the observation time and analyze at more time points to more comprehensively reflect the impact of Zdhhc9 KO.

      We appreciate this point from the reviewer but, regrettably, we did not maintain the Pdgfra-CreER; R26-EGFP; Zdhhc9 KO mouse line and hope the reviewer appreciates that it would take considerable time and effort to rederive this line and then perform the suggested extended time course experiments. However, we note for the reviewer that our preliminary studies did not reveal any effect of Zdhhc9 KO on the number of MOBP-EGFP+ OLs in 6-month-old mice (not shown), consistent with a model in which Zdhhc9 loss does not affect OPC-OL commitment per se.

      (3) The author speculates that Zdhhc9 may regulate myelination by affecting the membrane localization of specific myelin proteins, but lacks direct experimental evidence to support this. It is suggested to detect the expression and distribution of relevant proteins in the myelin of Zdhhc9 KO mice.

      We share the reviewer’s interest in this point but realized that it is more technically challenging to address than might be initially thought. The main protein we would implicate and seek to test is MBP, but we already found that there is no gross change in MBP distribution in vivo in Zdhhc9 KO mice (Fig 3A). However, an anti-MBP antibody recognizes all forms of MBP, not just the specific splice variants whose palmitoylation is affected by ZDHHC9 loss. Specifically assessing nanoscale distribution of these splice variants would require a way (e.g. an anti-MBP splice form-specific antibody that is compatible with immuno-EM) to distinguish these variants from other, non-palmitoylated forms of MBP. Although such an antibody could be an important tool, we hope the reviewers would agree that developing and characterizing such a reagent is beyond the scope of the current study.

      We do, however, note that the lack of gross change in MBP distribution and levels in Zdhhc9 KO mice is consistent with the relatively mild phenotype of these mice, compared with shiverer (shi/shi) mice, in which MBP is completely lost. In shiverer, CNS compact myelin is almost absent (PMID: 671037; PMID: 88695; PMID: 460693) and, as the name suggests, mice display a shivering gait, and exhibit seizures and early death. In contrast, Zdhhc9 KO mice show only subtle behavioral deficits (PMID: 29944857). These differences are all consistent with a model in which Zdhhc9 KO mice, despite their significantly reduced MBP palmitoylation (Fig 8), have grossly normal distribution and levels of MBP when all splice variants are assessed (Fig 3, Fig 8). It is not inconceivable that Zdhhc9 KO mice have a nanoscale change in the distribution of MBP, particularly of specific palmitoylated splice variants, within myelin that profoundly affects myelin ultrastructure, without grossly altering MBP distribution. However, an alternative and not mutually exclusive possibility is that aberrant palmitoylation of other Zdhhc9 substrates accounts for, or contributes to, the abnormalities in myelin at the ultrastructural level. Addressing this issue would require a multi-pronged approach, not just to assess palmitoylation and distribution of such proteins in Zdhhc9 KO, but also to test whether they are direct Zdhhc9 substrates, in order to rule out indirect effects. We hope reviewers would agree that this is best left to a separate study. However, in our revised Discussion we now summarize what can be inferred regarding Zdhhc9-dependent effects on total and splice variant-specific distribution and levels of MBP.

      (4) Although the article mentions the association of Zdhhc9 with intellectual disabilities, it does not involve behavioral analysis of Zdhhc9 KO mice. It is recommended to supplement some behavioral experimental data to support the important role of Zdhhc9 in maintaining normal cognitive function, enhancing the clinical relevance of the article.

      We appreciate this point from the reviewer. The behavior of the same ZDHHC9 KO mouse line that we used was reported in PMID: 31747610 and in PMID: 29944857. In the former study, Zdhhc9 KO mice were reported to display seizures reminiscent of phenotypes in human patients with ZDHHC9 mutation. The latter study assessed performance of Zdhhc9 KO mice in several tasks that test cognitive function. Specifically, the KO mice were reported to display “altered behaviour in the open-field test, elevated plus maze and acoustic startle test that is consistent with a reduced anxiety level; a reduced hang time in the hanging wire test that suggests underlying hypotonia but which may also be linked to reduced anxiety [and] deficits in the Morris water maze test of hippocampal-dependent spatial learning and memory.” We have incorporated these findings into our revised Discussion, where we summarize how these phenotypes are common, not just to human patients with ZDHHC9 mutation, but also to other human neurodevelopmental conditions and mouse models in which ID is a common feature.

      (5) For the abnormal myelination observed in Zdhhc9 KO mice, including unmyelinated large-diameter axons and excessively myelinated small-diameter axons, the article lacks in-depth research and explanation on the exact mechanism and mode of action of ZDHHC9 in regulating myelination.

      We share the reviewer’s interest in this point but again note that gaining definitive insights into this issue is far from trivial. Convincing evidence of a causative mechanism would require an exhaustive identification of ZDHHC9 in vivo substrates, followed by point mutation of substrate palmitoylation site(s) to determine the extent to which loss of palmitoylation of such protein(s) phenocopies ZDHHC9 loss. Nonetheless, it is possible to break this question down and to summarize what we do and do not know. For example, our experiments in cultured OLs show that ZDHHC9 loss causes cell-autonomous deficits in morphological maturation of these cells. We also know that ZDHHC9 loss results in impaired palmitoylation of MBP, a direct substrate for ZDHHC9. Moreover, loss of ZDHHC9 at Golgi outposts in OLs (a phenotype observed with several XLID-associated mutant forms of ZDHHC9, even those with no significant loss of catalytic activity) correlates with intellectual disability. Together, these findings are consistent with a model in which ZDHHC9 action at OL Golgi outposts is critical for normal myelination. However, it is yet to be determined whether the key substrates of ZDHHC9 include MBP, other palmitoyl-proteins that are key constituents of CNS myelin, or proteins whose palmitoylation is important for myelin protein trafficking and targeting. Another non-mutually exclusive possibility is that ZDHHC9 acts at Golgi outposts but indirectly, for example to drive the expression of myelin protein genes. Future experiments, including but not limited to palmitoyl-proteomics in ZDHHC9 (OL-specific) KO mice, will be needed to provide more definitive insights into this issue. We have expanded our Discussion of links between ZDHHC9 mutation and impaired myelination to summarize the above points.

      (6) The function of ZDHHC9 in OL may be related to the Golgi apparatus, but its exact role in these structures is still unclear. It is suggested to discuss in more detail the role of ZDHHC9 in the Golgi apparatus in the discussion section.

      We appreciate this point, which we considered as related to point (5) above. In our revised Discussion we highlight how ZDHHC9 action at Golgi outposts may involve direct palmitoylation of myelin proteins, palmitoylation of proteins that direct myelin proteins to the myelin membrane and/or activation of gene expression programs that serve to drive myelination. We further note that these possibilities are not mutually exclusive.

      (7) More experimental support and in-depth research are needed on the detailed mechanism of how ZDHHC9 and Golga7 cooperatively regulate MBP palmitoylation, and how this decrease in palmitoylation level leads to myelination defects.

      This is another important point – our new experiments suggest that, although some XLID mutations markedly affect ZDHHC9’s ability to palmitoylate MBP, others do not, yet all of the mutant forms fail to localize to Golgi outposts. These findings are consistent with a model in which the subcellular location at which ZDHHC9 palmitoylates MBP, and potentially other substrates, is critical for normal myelination. Interestingly, despite their marked differences in basal catalytic activity (as assessed by autopalmitoylation), wt and all XLID forms of ZDHHC9 appear to show enhanced activity (measured by both auto- and MBP palmitoylation) in the presence of Golga7, suggesting that the association with Golga7 (which also localizes to Golgi outposts) is central to ZDHHC9 activity. This model is also highly consistent with the biased expression of Golga7 in OLs, compared to other CNS cell types (Fig 1E, 1F). Moreover, XLID-associated mutant forms of ZDHHC9 also show reduced protein stability and are impaired in their ability to form complexes with Golga7 (also known as Golgi Complex Protein 16kDa; GCP16; PMID: 37035671). Failure of ZDHHC9 XLID mutants to localize to Golgi outposts may thus be due to aberrant trafficking of mutant ZDHHC9 per se, but may also involve impaired association/stabilization of ZDHHC9/Golga7 complexes at these locations. Again, it is possible that either or both of these mechanisms, which are not mutually exclusive, contribute to impaired MBP palmitoylation and/or myelination deficits. We summarize these points in our revised Discussion.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      This manuscript determines how PA28g, a proteasome regulator that is overexpressed in tumors, and C1QBP, a mitochondrial protein for maintaining oxidative phosphorylation that plays a role in tumor progression, interact in tumor cells to promote their growth, migration and invasion. Evidence for the interaction and its impact on mitochondrial form and function was provided although it is not particularly strong.

      The revised manuscript corrected mislabeled data in figures and provides more details in figure legends. Misleading sentences and typos were corrected. However, key experiments that were suggested in previous reviews were not done, such as making point mutations to disrupt the protein interactions and assess the consequence on protein stability and function. Results from these experiments are critical to determine whether the major conclusions are fully supported by the data.

      The second revision of the manuscript included the proximity ligation data to support the PA28g-C1QBP interaction in cells. However, the method and data were not described in sufficient detail for readers to understand. The revision also includes the structural models of the PA28g-C1QBP complex predicted by AlphaFold. However, the method and data were not described with details for readers to understand how this structural modeling was done, what is the quality of the resulting models, and the physical nature of the protein-protein interaction such as what kind of the non-covalent interactions exist in the interface of the protein complexes. Furthermore, while the interactions mediated by the protein fragments were tested by pull-down experiments, the interactions mediated by the three residues were not tested by mutagenesis and pull-down experiments. In summary, the revision was improved, but further improvement is needed.

      Thank you very much for your comments.

      (1) Based on your suggestion, we predicted the possible interaction sites using AlphaFold 3 and found that mutations in amino acids 76 and 78 of C1QBP affect the interaction with PA28γ (Revised Appendix Figure 1J). Subsequently, a pulldown experiment showed that mutating these two sites (T76A, G78N) decreased the amount of C1QBP that could bind to PA28γ (Revised Figure 1J). These results confirm that PA28γ interacts with C1QBP in a manner dependent on the N-terminus of C1QBP. These findings are now included in the Results section of the revised manuscript: “In addition, we employed AlphaFold 3 to perform energy minimization and predict hydrogen bonds between the C1QBP N-terminus (amino acids 1-167) and the PA28γ protein interaction region. The results suggest that the T76 and G78 residues of C1QBP may be key contributors to the interaction. Consistently, coimmunoprecipitation analysis demonstrated that mutations at these sites (C1QBPT76A and C1QBPG78N) significantly reduced the binding ability to PA28γ (Fig. 1J and Appendix Fig. 1J).” We believe this additional validation strengthens the robustness of our findings.

      (2) According to your suggestion, we have added a description of the results of PLA in the figure legend (Revised Figure 1C) and the method of PLA in the appendix file (Revised Appendix file, Part “Proximity Ligation Assay”). The revised text reads as follows: (C) PLA image of UM1 cells shows the interaction between C1QBP and PA28γ in both cytoplasm and nucleus (red fluorescence).

      (3) In light of your suggestion, we have enriched the description of the AlphaFold 3 analysis in the appendix file (Revised Appendix file, Pages 10-11). The revised text reads as follows:

      “Prediction and Analysis of Protein Interactions

      Protein Sequence Retrieval and Structure Prediction

      The protein sequences of C1QBP and PA28γ were obtained from the AlphaFold Protein Structure Database. Structural predictions of the protein-protein interaction between C1QBP and PA28γ were conducted using AlphaFold 3. The pLDDT (predicted local distance difference test) values were utilized to assess the confidence of the predicted models. Models with a pLDDT score above 70 were considered confident, while those with a score above 90 were categorized as very high confidence. These values were annotated in the figures to indicate the reliability of the structural predictions.”

      “Protein Preparation and Structure Optimization

      The best-scored model for the C1QBP-PA28γ interaction predicted by AlphaFold 3 was selected for further analysis. The model was imported into MOE 2022 (Molecular Operating Environment) software for protein preparation. This process included the removal of water molecules and other heteroatoms, followed by the addition of hydrogen atoms to the structure. This step was essential for optimizing the protein’s 3D conformation and ensuring the correctness of the protonation states at physiological pH.”

      “Energy Minimization and Hydrogen Bond Prediction

      The protein structure was subjected to energy minimization using the Amber10:EHT force field (Amber10 parameters combined with Extended Hückel Theory), with R-Field 1:80 settings to refine the model’s geometry. The minimization process was performed to optimize the protein’s internal energy and ensure a stable conformation, followed by calculation of hydrogen bond interactions. The interaction energies and hydrogen bonds were analyzed to identify potential binding sites and to assess the stability of the predicted protein-protein complex.”
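      The pLDDT thresholds quoted above can be illustrated with a minimal sketch. It is not part of the authors' pipeline: the function names, the label for scores at or below 70, and the example scores are all hypothetical and serve only to show how the two cut-offs (above 70, above 90) partition per-residue scores.

```python
def plddt_confidence(score: float) -> str:
    """Map a pLDDT score (0-100) to the confidence label used in the text."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("pLDDT must lie between 0 and 100")
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    # Assumption: the text does not name the band at or below 70.
    return "low"


def summarize(per_residue: list[float]) -> dict[str, int]:
    """Count how many residues fall into each confidence band."""
    counts = {"very high": 0, "confident": 0, "low": 0}
    for score in per_residue:
        counts[plddt_confidence(score)] += 1
    return counts


# Hypothetical per-residue scores, for illustration only.
example_scores = [95.2, 88.1, 71.0, 64.3, 91.7]
print(summarize(example_scores))
```

      A per-residue summary of this kind would indicate whether the predicted interface residues themselves fall into the confident bands, rather than only the model as a whole.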

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      The authors sought to examine the associations between child age, reports of parent-child relationship quality, and neural activity patterns while children (and also their parents) watched a movie clip. Major methodological strengths include the sample of 3-8 year-old children in China (rare in fMRI research for both age range and non-Western samples), use of a movie clip previously demonstrated to capture theory of mind constructs at the neural level, measurement of caregiver-child neural synchrony, and assessment of neural maturity. Results provide important new information about parent-child neural synchronization during this movie and associations with reports of parent-child relationship quality. The work is a notable advance in understanding the link between the caregiving context and the neural construction of theory of mind networks in the developing brain.

      We are grateful for the reviewer’s generous and thoughtful summary of our work. We particularly appreciate the recognition of the methodological strengths—including the rare developmental sample, culturally diverse context, and use of naturalistic, theory of mind-relevant stimuli—as well as the importance of integrating neural synchrony and relational variables. The reviewer’s comments affirm the core motivation behind this study: to advance our understanding of how the caregiving environment shapes the neurodevelopment of social cognition in early childhood. We have taken all specific suggestions seriously and hope the revised manuscript more clearly communicates these contributions.

      We appreciate that the authors wanted to show support for a mediational mechanism. However, we suggest that the authors drop the structural equation modeling because the data are cross-sectional so mediation is not appropriate. Other issues include the weak justification of including the parent-child neural synchronization as part of parenting.... it could just as easily be a mechanism of change or driven by the child rather than a component of parenting behavior. The paper would be strengthened by looking at associations between selected variables of interest that are MOST relevant to the imaging task in a regression type of model. Furthermore, the authors need to be more explicit about corrections for multiple comparisons throughout the manuscript; some of the associations are fairly weak so claims may need to be tempered if they don't survive correction.

      Thank you for your feedback on the use of SEM in our study. We recognize the limitations of using SEM to infer mediation with cross-sectional data and acknowledge that longitudinal designs are better suited for such analyses. However, our goal was not to establish causality but to explore potential pathways linking parenting, personal traits, and Theory of Mind (ToM) behavior to social cognition outcomes. SEM allowed us to simultaneously examine the relationships among these latent constructs, providing a cohesive framework for understanding the interplay of these factors. That said, we understand your concern and are willing to revise the manuscript to de-emphasize causal interpretations of the SEM findings.

      We thank the reviewer for raising the issue of correction for multiple comparisons. We confirm that all correlation analyses reported in the manuscript have been corrected for multiple comparisons using the False Discovery Rate (FDR) procedure. In the revised manuscript, we now explicitly indicate FDR correction for all relevant p-values to ensure clarity and transparency. Where this information was previously missing, we have corrected the oversight and clearly labeled the results as FDR-corrected or uncorrected where appropriate. Additionally, we have carefully reviewed our interpretation of all reported associations. For any results that were close to the significance threshold, we have tempered our claims and now describe them as marginally significant associations to avoid overstating our findings.

      The corresponding changes have been made in the Discussion section of the revised manuscript.
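      For readers unfamiliar with FDR correction, the sketch below shows one common variant, the Benjamini-Hochberg step-up procedure. The response does not state which FDR procedure was used, so the choice of Benjamini-Hochberg is an assumption, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical uncorrected p-values, for illustration only.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
significant = [q < 0.05 for q in adjusted_p]
print(adjusted_p, significant)
```

      In this hypothetical set, the two p-values near 0.04 would pass an uncorrected 0.05 threshold but not the adjusted one, which is the situation in which a claim would need to be tempered.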

      Reverse correlation analysis is sensible given what prior developmental fMRI studies have done. But reverse correlation analysis may be more prone to overfitting and noise, and lacks sensitivity to multivariate patterns. Might inter-subject correlation be useful for *within* the child group? This would minimize noise and allow for non-linear patterns to emerge.

      We appreciate the reviewer’s thoughtful suggestion regarding potential limitations of reverse correlation analysis. While we agree that inter-subject correlation (ISC) within the child group may be useful in other contexts, our primary goal in using reverse correlation was not to identify temporally distributed or multivariate response patterns, but rather to isolate specific events within the naturalistic stimulus that reliably evoke Theory of Mind (ToM) and Social Pain-related responses in adults—who possess more stable and mature neural signatures. These adult-derived events serve as anchors for subsequent developmental comparisons and provide a principled way to define timepoints of interest that are behaviorally and theoretically meaningful.

      Using reverse correlation in adults allows us to identify canonical ToM and Social Pain events in a data-driven yet hypothesis-informed manner. We then examine how children’s neural responses to these same events vary with age, neural maturity, and dyadic synchrony. This approach is consistent with prior work in developmental social neuroscience (e.g., Richardson et al., 2018) and offers a valid framework for identifying interpretable social-cognitive events in naturalistic stimuli.

      We have now clarified the rationale for using adult-based reverse correlation in the revised manuscript and explicitly stated its advantages for identifying targeted ToM and Social Pain content in the stimulus.

      The corresponding changes have been made on page 17 of the revised manuscript.

      “We employed reverse correlation analysis in adults to identify discrete events within the movie that elicited reliable neural responses across participants in ToM and SPM networks.

      Adult data were chosen for this analysis because of the relative stability and maturity of adults’ social brain responses, allowing for robust detection of canonical ToM and social pain-related moments. These events, once identified, served as stimulus-locked timepoints for subsequent analyses in the child cohort. This approach enables us to examine how children's responses to well-characterized, socially meaningful events vary with age and parent-child dyadic dynamics.”
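      The logic of reverse correlation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each adult contributes one z-scored network timecourse, each timepoint is tested against zero with a one-sample t statistic, and an event is a run of at least two consecutive timepoints above a threshold. The function names, the threshold, the minimum run length, and the data are all hypothetical.

```python
import math
import statistics


def timepoint_t(values: list[float]) -> float:
    """One-sample t statistic (against zero) for one timepoint across subjects."""
    sd = statistics.stdev(values)
    if sd == 0:
        return 0.0  # degenerate case: no variance across subjects
    return statistics.fmean(values) / (sd / math.sqrt(len(values)))


def find_events(timecourses: list[list[float]], t_threshold: float,
                min_length: int = 2) -> list[tuple[int, int]]:
    """Return (first, last) indices of runs where the group response is reliably positive."""
    n_time = len(timecourses[0])
    above = [timepoint_t([subj[t] for subj in timecourses]) > t_threshold
             for t in range(n_time)]
    events, start = [], None
    # A trailing False closes any run that reaches the final timepoint.
    for t, flag in enumerate(above + [False]):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            if t - start >= min_length:
                events.append((start, t - 1))
            start = None
    return events


# Hypothetical z-scored timecourses: 4 adults x 8 timepoints.
adults = [
    [0.1, -0.2, 1.0, 1.2, 0.9, 0.0, 1.1, -0.1],
    [-0.1, 0.2, 1.1, 0.8, 1.0, 0.1, -0.9, 0.1],
    [0.2, -0.1, 0.9, 1.0, 1.1, -0.1, 1.0, 0.0],
    [-0.2, 0.1, 1.2, 1.1, 0.8, 0.0, -1.0, 0.2],
]
# 2.353 is the one-tailed critical t for p < .05 with 3 degrees of freedom.
print(find_events(adults, t_threshold=2.353))
```

      Timepoint 6 in this example shows large but inconsistent responses across adults and is therefore not flagged, which is the sense in which the identified events are those that evoke reliable responses.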

      No learning effects or temporal lagged effects are tested in the current study, so the results do not support the authors' conclusions that the data speak to Bandura's social learning theory. The authors do mention theories of biobehavioral synchrony in the introduction but do not discuss this framework in the discussion (which is most directly relevant to the data). The data can also speak to other neurodevelopmental theories of development (e.g., neuroconstructivist approaches), but the authors do not discuss them. The manuscript would benefit from significantly revising the framework to focus more on biobehavioral synchrony data and other neurodevelopmental approaches given the prior work done in this area rather than a social psychology framework that is not directly evaluated.

      We appreciate the reviewer’s thoughtful and constructive feedback. We agree that the current study does not directly test mechanisms central to Bandura’s social learning theory, such as observational learning over time or behavioral modeling. In light of this, we have significantly revised the theoretical framing of the manuscript to focus more directly on the biobehavioral synchrony framework, which more accurately reflects the dyadic neural measures employed in this study and is better supported by our findings.

      Specifically, we have expanded the Discussion to contextualize our findings in terms of biobehavioral synchrony, emphasizing how inter-subject neural synchronization may reflect coordinated parent-child engagement and emotional attunement. We have also incorporated insights from neurodevelopmental and neuroconstructivist models, acknowledging that social cognitive development is shaped by dynamic interactions between neural maturation and environmental input over time.

      Although we continue to briefly reference Bandura’s theory to situate our findings within broader social-cognitive frameworks, we have clearly delineated the boundaries of what our data can support and have tempered previous claims. These changes are intended to better align our conceptual framing with the empirical evidence and relevant theoretical models.

      The corresponding changes have been made on pages 11-12 of the revised manuscript.

      “Insights into mechanisms of Neuroconstructivist Perspectives and Bandura’s social learning theory

      Our findings align with a neuroconstructivist perspective, which conceptualizes brain development as an emergent outcome of reciprocal interactions between biological constraints and context-specific environmental inputs. Rather than presuming fixed traits or linear maturation, this perspective highlights how neural circuits adaptively organize in response to experience, gradually supporting increasingly complex cognitive functions [49]. It offers a particularly powerful lens for understanding how early caregiving environments modulate the maturation of social brain networks.

      Building on this framework, the present study reveals that moment-to-moment neural synchrony between parent and child, especially during emotionally salient or socially meaningful moments, is associated with enhanced Theory of Mind performance and reduced dyadic conflict. This suggests that beyond age-dependent neural maturation, dyadic neural coupling may serve as a relational signal, embedding real-time interpersonal dynamics into the child’s developing neural architecture [1]. Our data demonstrate that children’s brains are not merely passively maturing, but are also shaped by the relational texture of their lived experiences, particularly interactions characterized by emotional engagement and joint attention. Importantly, this adds a new dimension to neuroconstructivist theory: it is not simply whether the environment shapes development, but how the quality of interpersonal input dynamically calibrates neural specialization. Interpersonal variation leaves detectable signatures in the brain, and our use of neural synchrony as a dyadic metric illustrates one potential pathway through which caregiving relationships exert formative influence on the developing social brain.

      The contribution of this work lies not in reiterating the interplay of nature and nurture, but in specifying the mechanistic role of interpersonal neural alignment as a real-time, context-sensitive developmental input. Neural synchrony between parent and child may function as a form of relationally grounded, temporally structured experience that tunes the child’s social brain toward contextually relevant signals. Unlike generalized enrichment, this form of neural alignment is inherently personalized and contingent—features that may be especially potent in shaping social cognitive circuits during early childhood.

      Although our study was not designed to directly examine learning mechanisms such as imitation or reinforcement, the findings can be viewed as broadly consistent with social learning theory. Bandura's theory posits that human behavior is shaped by observational learning and modeling from others in one's environment [2-4]. According to Bandura, children acquire social cognitive skills by observing and interacting with their parents and other significant figures in their environment. This dynamic interplay shapes their ability to understand and predict the behavior of others, which is crucial for the development of ToM and other social competencies.”

      References

      (1) Hughes, C. et al. Origins of individual differences in theory of mind: From nature to nurture? Child development 76, 356-370 (2005).

      (2) Koole, S. L. & Tschacher, W. Synchrony in psychotherapy: A review and an integrative framework for the therapeutic alliance. Frontiers in psychology 7, 862 (2016).

      (3) Liu, D., Wellman, H. M., Tardif, T. & Sabbagh, M. A. Theory of mind development in Chinese children: a meta-analysis of false-belief understanding across cultures and languages. Developmental Psychology 44, 523 (2008).

      (4) Frith, U. & Frith, C. D. Development and neurophysiology of mentalizing. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 358, 459-473 (2003).

      The significance and impact of the findings would be clearer if the authors more clearly situated the findings in the context of (a) other movie and theory of mind fMRI task data during development; and (b) existing data on parent-child neural synchrony (often uses fNIRS or EEG). What principles of brain and social cognition development do these data speak to? What is new?

      We thank the reviewer for this thoughtful comment. In response, we have revised the Discussion section to more clearly situate our findings within two key literatures: (a) fMRI studies examining Theory of Mind using movie-based and traditional task paradigms across development, and (b) research on parent-child neural synchrony. We now articulate more explicitly how our findings advance current understanding of the neural architecture of social cognition in childhood, and how they contribute new insights into the relational processes shaping brain function. These revisions clarify the conceptual and empirical novelty of our study, particularly in its use of naturalistic fMRI, simultaneous child-parent dyads, and integration of neural maturity with interpersonal synchrony.

      The corresponding changes have been made on page 12 of the revised manuscript.

      “Our findings contribute to and extend prior research using fMRI paradigms to investigate ToM development in children. Previous work has shown that these networks become increasingly specialized and differentiated throughout childhood [1-3]. The current study extends these findings by demonstrating that the development of social brain networks is a gradual process that continues beyond the preschool years and is related to children's chronological age. This finding is consistent with behavioral research indicating that ToM and social abilities continue to develop and refine throughout middle childhood and adolescence [4]. Importantly, we move beyond prior work by combining reverse correlation with naturalistic stimuli to isolate discrete, behaviorally meaningful events (e.g., mental state attribution, social rejection) and relate children’s brain responses to adult patterns and social outcomes. This event-level analysis in a dyadic context offers greater ecological and interpretive precision than traditional block or condition-based designs. Our study provides novel evidence for the neural underpinnings of this protracted development, suggesting that the functional maturation of social brain networks may support the continued acquisition and refinement of social cognitive skills.

      In parallel, our study builds on and extends a growing body of work on parent-child neural synchrony, much of which has relied on fNIRS or EEG hyperscanning to demonstrate interpersonal alignment during communication, shared attention, or cooperative tasks [5-7]. While these modalities offer fine temporal resolution, they are limited in spatial precision and typically focus on surface-level cortical regions such as the prefrontal cortex. By contrast, our naturalistic fMRI approach enables the examination of deep and distributed brain networks—specifically those supporting social cognition—within child-parent dyads during emotionally and cognitively rich scenarios. Intriguingly, we found that neural synchronization during movie viewing was higher in child-mother dyads compared to child-stranger dyads.”

      References

      (1) Jacoby, N., Bruneau, E., Koster-Hale, J. & Saxe, R. Localizing Pain Matrix and Theory of Mind networks with both verbal and non-verbal stimuli. Neuroimage 126, 39-48 (2016).

      Astington, J. W. & Jenkins, J. M. A longitudinal study of the relation between language and theory-of-mind development. Developmental Psychology 35, 1311 (1999).

      (2) Carter, E. J. & Pelphrey, K. A. School-aged children exhibit domain-specific responses to biological motion. Social Neuroscience 1, 396-411 (2006).

      (3) Cantlon, J. F., Pinel, P., Dehaene, S. & Pelphrey, K. A. Cortical representations of symbols, objects, and faces are pruned back during early childhood. Cerebral Cortex 21, 191-199 (2011).

      (4) Im-Bolter, N., Agostino, A. & Owens-Jaffray, K. Theory of mind in middle childhood and early adolescence: Different from before? Journal of experimental child psychology 149, 98-115 (2016).

      (5) Deng, X. et al. Parental involvement affects parent-adolescents brain-to-brain synchrony when experiencing different emotions together: an EEG-based hyperscanning study. Behavioural brain research 458, 114734 (2024).

      (6) Miller, J. G. et al. Inter-brain synchrony in mother-child dyads during cooperation: an fNIRS hyperscanning study. Neuropsychologia 124, 117-124 (2019).

      (7) Nguyen, T., Bánki, A., Markova, G. & Hoehl, S. Studying parent-child interaction with hyperscanning. Progress in brain research 254, 1-24 (2020).

      There is little discussion about the study limitations, considerations about the generalizability of the findings, and important next steps and future directions. What can the data tell us, and what can it NOT tell us?

      We appreciate the reviewer’s recommendation to elaborate on the study’s limitations, generalizability, and future directions. In response, we have added a dedicated section to the Discussion that critically addresses these considerations. We acknowledge the cross-sectional nature of the study, the modest sample size, and the use of a single stimulus context as key limitations. We also clarify the inferences that can be drawn from our data and what remains speculative. Finally, we outline specific future research directions.

      The corresponding changes have been made on pages 13-14 of the revised manuscript.

      “While leveraging a naturalistic movie-viewing paradigm allowed us to study children's spontaneous neural responses during a semi-structured yet engaging task, dedicated experimental designs are still needed to make stronger inferences about the cognitive processes involved. Additionally, our region-of-interest approach precluded examination of whole-brain networks; future work could explore developmental changes in broader functional circuits. The cross-sectional nature of our study is a further limitation, as it cannot definitively establish the causal directions of the observed relationships. Longitudinal designs tracking children's brain development and social cognitive abilities over time would help clarify whether early parenting impacts later neural maturation and behavioral outcomes, or vice versa. Our sample was restricted to mother-child dyads, leaving open questions about potential differences in father-child relationships and gender effects on parenting neurobiology. Larger and more diverse samples would enhance the generalizability of the findings.

      Several future directions emerge from this research. First, combining naturalistic neuroimaging with structured cognitive tasks could elucidate the specific mental processes underlying children's neural responses during movie viewing. Examining how these processes relate to real-world social behavior would further bridge neurocognitive function and ecological validity. Longitudinal studies beginning in infancy could chart the developmental trajectories of parent-child neural synchrony and their impact on long-term social outcomes. Such work could also explore sensitive periods when parenting may be most influential on social brain maturation. Finally, expanding this multimodal approach to clinical populations like autism could yield insights into atypical social cognitive development and inform tailored intervention strategies targeting parent-child relationships and neural plasticity.”

      To evaluate associations between child neural activity patterns during the movie AND parent-child synchronization patterns AND other variables such as parent-child communication and theory of mind behavior, it seems like a robust approach could be to examine whether similar synchronization patterns are associated with similar scores on different variables. Would allow for non-linear and multivariate associations.

      We greatly appreciate the reviewer’s thoughtful suggestion regarding the use of similarity-based or multivariate analyses to assess whether dyads with similar neural synchronization profiles also exhibit similar scores on behavioral or relational variables. We agree that this type of analysis—such as representational similarity analysis (RSA) or inter-subject pattern similarity—offers a powerful framework for capturing non-linear and multivariate associations, and could provide deeper insights into shared neurobehavioral patterns across participants. However, the analytic logic of similarity-based approaches typically requires the availability of comparable measures across individuals or dyads (e.g., child A and child B must both have measures of brain activity, behavior, and environment). In the present study, our focus was on the child as the behavioral and developmental target, and we did not collect parallel behavioral or cognitive variables from the parent side (e.g., adult Theory of Mind ability, emotional traits, parenting style questionnaires beyond dyadic reports). As a result, it was not feasible to construct pairwise similarity matrices across dyads that include both neural synchrony and matched behavioral dimensions from both individuals.

      Instead, our study was designed to examine how child-level outcomes (e.g., Theory of Mind performance, social functioning) are associated with (a) the child’s neural responses to specific social events, and (b) the degree of neural synchronization with their mother, as a marker of relational engagement. The analytical emphasis, therefore, remained on within-child variation, modulated by the quality of the parent-child interaction.

      Were there associations between parent-child neural synchronization and child age? What was the association between neural maturity and parent-child neural synchronization?

      We thank the reviewer for raising this important point regarding associations between parent-child neural synchronization (ISS), child age, and neural maturity.

      As reported in the original manuscript, we did not observe significant correlations between parent-child ISS and child age for either the Theory of Mind (ToM) or Social Pain Matrix (SPM) networks (all ps > 0.1). In an additional analysis, we also found no significant correlation between ISS and neural maturity (Author response image 1, r = 0.2503, p = 0.1533).

      These findings indicate that parent-child neural synchronization in this naturalistic viewing context is not simply explained by age-related maturation or children's neural maturity level. Instead, ISS may predominantly reflect real-time interpersonal engagement or relational dynamics rather than individual developmental trajectories or neural maturity.

      Author response image 1.

      Scatterplot showing the association between parent-child inter-subject synchronization (ISS) and neural maturity, averaged across the Theory of Mind (ToM) and Social Pain Matrix (SPM) networks. Each point represents one dyad. No significant correlation was observed between ISS and neural maturity (r = 0.2503, p = 0.1533), suggesting that interpersonal neural synchronization and individual neural maturation may reflect dissociable aspects of social brain development.
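
      As a consistency check on the statistics reported above, a Pearson correlation can be converted to a two-tailed p-value through the t distribution with n - 2 degrees of freedom. The sketch below is illustrative only: it assumes that all 34 retained dyads entered the correlation (the response does not state n for this analysis), and the function names are hypothetical. It uses only the Python standard library and integrates the t density numerically with Simpson's rule.

```python
import math


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def correlation_p_value(r, n, steps=2000):
    """Return (t statistic, two-tailed p-value) for a Pearson r from n pairs."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1.0 - r * r))
    # Simpson's rule (steps must be even) for the density area between 0 and t.
    h = t / steps
    area = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_density(i * h, df)
    area *= h / 3
    return t, 1.0 - 2.0 * area


# Assumption: n = 34 dyads; r is the value reported for ISS vs. neural maturity.
t_stat, p_two_tailed = correlation_p_value(0.2503, 34)
```

      Under the n = 34 assumption, the computed p-value falls close to the reported p = 0.1533, which suggests the reported r and p are mutually consistent for the full analytic sample.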

      The rationale for splitting the ages into 3 groups is unclear and creates small groups that could be more prone to spurious associations. Why not look at age continuously?

      We thank the reviewer for raising this important point. We fully agree that analyzing age as a continuous variable is statistically more robust and minimizes concerns about spurious associations due to arbitrary groupings.

      To clarify, all primary statistical models—including correlational analyses—treated age as a continuous variable, and our core developmental inferences are based on these continuous-age findings.

      In addition to these analyses, we included age group comparisons as a supplementary approach, guided by both theoretical considerations and visual inspection of the data. Specifically, we aimed to explore whether functional differentiation between social brain networks (e.g., ToM and SPM) might begin to emerge non-linearly or earlier than expected, particularly in the youngest children. Such early neural divergence may not be well-captured by linear trends alone. The grouped analysis allowed us to illustrate that network differentiation was already observable in children under age 5, suggesting that certain aspects of social brain organization may emerge earlier than classically assumed.

      We have now clarified this rationale in the revised manuscript and emphasized that the group-based analysis was used solely to highlight developmental shifts that may not follow a linear pattern, and not for formal hypothesis testing.

      The corresponding changes have been made on page 9 of the revised manuscript.

      “While our primary analyses treated age as a continuous variable, we also performed exploratory group-based comparisons to probe for potential non-linear developmental shifts in social brain network organization. This approach revealed that the differentiation between ToM and SPM networks was already present in the youngest group (ages 3–4), suggesting that early neural specialization may begin prior to the age at which ToM behavior is reliably observed. These group-level observations provide complementary evidence to the continuous analyses and may inform future work examining sensitive periods or early markers of social brain development.”

      Tables would be improved if they were more professionally formatted (e.g., names of the variables rather than variable abbreviation codes).

      We appreciate the reviewer’s suggestion to improve the clarity and professionalism of our tables. In the revised manuscript, we have reformatted all tables to include full variable names rather than abbreviations or coded labels, and we ensured consistency in terminology across the manuscript text, tables, and figure legends. We have also added explanatory footnotes where needed to clarify any derived or composite measures. We hope these revisions improve the accessibility and readability of the results for a broader audience.

      Reviewer #2:

      Summary:

      This study investigates the impact of mother-child neural synchronization and the quality of parent-child relationships on the development of Theory of Mind (ToM) and social cognition. Utilizing a naturalistic fMRI movie-viewing paradigm, the authors analyzed inter-subject neural synchronization in mother-child dyads and explored the connections between neural maturity, parental caregiving, and social cognitive outcomes. The findings indicate age-related maturation in ToM and social pain networks, emphasizing the importance of dyadic interactions in shaping ToM performance and social skills, thereby enhancing our understanding of the environmental and intrinsic influences on social cognition.

      Strengths:

      This research addresses a significant question in developmental neuroscience, by linking social brain development with children's behaviors and parenting. It also uses a robust methodology by incorporating neural synchrony measures, naturalistic stimuli, and a substantial sample of mother-child dyads to enhance its ecological validity. Furthermore, the SEM approach provides a nuanced understanding of the developmental pathways associated with Theory of Mind (ToM).

      We appreciate the reviewer’s positive evaluation and valuable comments. We have thoroughly revised the manuscript to address the concerns raised, and we provide a point-by-point response to each issue below. We believe the revised manuscript is significantly improved.

      Upon reviewing the introduction, I feel that the first goal - developmental changes of the social brain and its relation to age - seems somewhat distinct from the other two goals and the main research question of the manuscript. The authors might consider revising this section to enhance the overall coherence of the manuscript. Additionally, the introduction lacks a clear background and rationale for the importance of examining age-related changes in the social brain.

      We thank the reviewer for this thoughtful observation. In response, we have revised the Introduction to better integrate the developmental aspect of the social brain with the broader research aims. We now explicitly link age-related changes in social brain organization to the emergence of social cognitive abilities and highlight why early childhood (ages 3–8) represents a particularly formative period. This revision clarifies that our first aim—examining functional specialization and neural maturity in Theory of Mind (ToM) and Social Pain Matrix (SPM) networks—serves as a developmental foundation for understanding how dyadic influences, such as neural synchrony and caregiving quality, shape children’s social cognition.

      We have also improved the rationale for examining age-related change, drawing on key literature in developmental neuroscience to show how the early emergence and specialization of social brain networks provide a necessary context for interpreting interpersonal neural dynamics.

      The corresponding changes have been made on page 3 of the revised manuscript.

      “These findings suggest that the development of specialized brain regions for reasoning about others' mental states and physical sensations is a gradual process that continues throughout childhood.

      Understanding how these networks differentiate with age is essential not only for mapping typical brain development, but also for contextualizing the role of environmental influences. By establishing normative patterns of neural maturity and differentiation, we can better interpret how relational experiences—such as caregiver-child synchrony and parenting quality—modulate these trajectories. Thus, our first goal provides a developmental anchor that grounds our investigation of interpersonal and environmental contributions to social brain function.”

      The manuscript uses both "mother-child" and "parent-child" terminology. Does this imply that only mothers participated in the fMRI scans while fathers completed the questionnaires? If so, have the authors considered the potential impact of parental roles (father vs. mother)?

      We thank the reviewer for raising this important point regarding terminology and parental roles. To clarify, all participating caregivers in the current study were biological mothers, and all behavioral questionnaires were also completed by these same mothers. No fathers were included in this study. We have revised the manuscript throughout to consistently use the term “mother-child” when referring to the specific dyads in our sample.

      We also appreciate the opportunity to elaborate on the rationale for including only mothers. Prior research has shown that maternal and paternal influences on child development are not interchangeable, and that the neural correlates of caregiving behaviors differ between mothers and fathers. For example, studies have demonstrated distinct patterns of brain activation during social and emotional processing in mothers versus fathers (Abraham et al., 2014; Swain et al., 2014). Given these differences, we deliberately focused on mother-child dyads to maintain neurobiological consistency in our analysis and reduce variance associated with heterogeneous caregiving roles. We now clarify this rationale in the revised Methods and Discussion sections.

      The corresponding changes have been made on page 14 of the revised manuscript.

      “We chose to focus exclusively on mother-child dyads in this study based on prior evidence suggesting distinct neural and behavioral caregiving profiles between mothers and fathers [1-2], allowing us to maintain role consistency and reduce variability in dyadic interactions.

      Our sample was restricted to mother-child dyads, leaving open questions about potential differences in father-child relationships and gender effects on parenting neurobiology [1]. Larger and more diverse samples would enhance the generalizability of the findings.”

      Reference:

      (1) Swain, J. E. et al. Approaching the biology of human parental attachment: Brain imaging, oxytocin and coordinated assessments of mothers and fathers. Brain research 1580, 78-101 (2014).

      (2) Abraham, E. et al. Father's brain is sensitive to childcare experiences. Proceedings of the National Academy of Sciences 111, 9792-9797 (2014).

      There is inconsistent usage of the terms ISC and ISS in the text and figures, both of which appear to refer to synchronization derived from correlation analysis. It would be beneficial to maintain consistency throughout the manuscript.

      We thank the reviewer for highlighting the inconsistent use of “ISC” and “ISS” in the original manuscript. We agree that clarity and consistency in terminology are essential. In response, we have revised the manuscript to consistently use “ISS” (inter-subject synchronization) throughout the text, figures, tables, and legends.
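
      For readers unfamiliar with the measure, one common way to quantify inter-subject synchronization for a dyad is the Pearson correlation between the two participants' ROI time courses during the movie, Fisher z-transformed before averaging or group statistics. The sketch below assumes this definition (the response says only that ISS is derived from correlation analysis). The function names and toy time courses are hypothetical and are not the study data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length time courses."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def iss(adult_ts, child_ts):
    """Fisher z-transformed correlation between an adult and a child time course."""
    return math.atanh(pearson(adult_ts, child_ts))


def stranger_iss(child_ts, other_mothers):
    """Mean ISS between a child and mothers from other dyads (control pairing)."""
    return sum(iss(m, child_ts) for m in other_mothers) / len(other_mothers)


# Toy ROI time courses (hypothetical values).
child = [0.1, 0.5, 0.9, 0.4, -0.2, -0.6]
own_mother = [0.0, 0.4, 1.0, 0.5, -0.1, -0.7]
other_mothers = [
    [0.3, -0.2, 0.1, 0.6, -0.4, 0.2],
    [-0.5, 0.1, 0.2, -0.3, 0.4, 0.0],
]

dyad_value = iss(own_mother, child)
control_value = stranger_iss(child, other_mothers)
```

      Comparing `dyad_value` against `control_value` mirrors the mother-child versus child-stranger contrast described in the reviews; the actual preprocessing and ROI averaging used in the study are not specified here.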

      Of the 50 dyads, 16 were excluded due to data quality issues, which constitutes a significant proportion. It would be helpful to know whether these excluded dyads exhibited any distinctive characteristics. Providing information on demographic or behavioral differences, such as Theory of Mind (ToM) performance and age range, between the excluded and included dyads would enhance the assessment of the findings' generalizability.

      We thank the reviewer for this important observation. We agree that understanding the characteristics of excluded participants is essential for assessing the generalizability of the findings.

      In response, we conducted comparative analyses between included and excluded dyads (N = 34 included; N = 16 excluded) on key demographic and behavioral variables, including child age, gender, and Theory of Mind (ToM) performance. These analyses revealed no significant differences between groups on any of these measures (ps > 0.1), suggesting that data exclusion due to quality issues (e.g., excessive motion, incomplete scans) did not introduce systematic bias.

      We have now added this information to the Results and Methods sections of the manuscript.

      The corresponding changes have been made on pages 6 and 17 of the revised manuscript.

      “Of the 50 initial mother-child dyads recruited, 16 were excluded due to excessive head motion (n = 11), incomplete scan sessions (n = 3), or technical issues during data acquisition (n = 2). The final sample consisted of 34 dyads. To assess potential bias introduced by data exclusion, we compared included and excluded dyads on child age, gender, and Theory of Mind performance. No significant differences were found across these variables (all ps > 0.1), suggesting that the analytic sample was demographically representative of the full cohort.

      Comparison between included and excluded dyads revealed no significant differences in child age (t = 1.23, p = 0.24), ToM scores (t = -0.54, p = 0.59), or sex distribution (χ² < 0.01, p = 0.98), indicating that data exclusion did not bias the sample in a systematic way.”
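
      The included-versus-excluded comparison described above can be sketched as a two-sample t-test for continuous variables and a chi-square test for sex distribution. The example below is a minimal illustration under stated assumptions: it uses Welch's unequal-variance t-test (the response does not say which t-test variant was used), an uncorrected Pearson chi-square, and hypothetical values that are not the study data.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical child ages in years, NOT the study data.
included_age = [3.2, 4.1, 5.0, 5.8, 6.5, 7.3, 7.9]
excluded_age = [3.0, 3.9, 4.6, 6.1, 7.0]
t_age, df_age = welch_t(included_age, excluded_age)

# Rows: included / excluded; columns: girls / boys (hypothetical counts).
chi2_sex = chi_square_2x2([[17, 17], [8, 8]])
```

      A chi-square statistic near zero, as in this balanced toy table, corresponds to the kind of result reported for sex distribution (χ² < 0.01).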

      The article does not adhere to the standard practice of using a resting state as a baseline for subtracting from task synchronization. Is there a rationale for this approach? Not controlling for a baseline may lead to issues, such as whether resting state synchronization already differs between subjects with varying characteristics.

      We thank the reviewer for raising this important methodological point. We agree that controlling for baseline synchronization, such as using a resting-state scan as a comparison, can help disambiguate whether task-induced synchrony reflects genuine stimulus-driven coupling or baseline differences across individuals or dyads.

      In the present study, we focused on inter-subject synchronization (ISS) during naturalistic movie viewing, a task condition that has been widely used in previous developmental and social neuroscience research to assess shared neural engagement. We did not include a resting-state scan in the current protocol due to time constraints and the young age of our participants (ages 3–8), as longer scanning sessions often result in increased motion and reduced data quality in pediatric populations. Moreover, many prior studies using ISS in naturalistic paradigms have similarly focused on task-driven synchrony without subtracting a resting baseline (e.g., Hasson et al., 2004; Nguyen et al., 2020; Reindl et al., 2018).

      That said, we acknowledge that baseline neural synchrony across dyads may vary depending on individual or relational characteristics (e.g., temperament, arousal, attentional style), and this remains an important question for future research. In the revised Discussion, we now explicitly note the absence of a resting-state baseline as a limitation and highlight the need for future studies to examine how resting and task-based ISS may interact, particularly in the context of child-caregiver dyads.

      The corresponding changes have been made on page 13 of the revised manuscript.

      “Another limitation of the current design is the lack of a resting-state baseline for inter-subject synchronization. While our focus was on synchronization during naturalistic social processing, we cannot determine whether individual differences in ISS reflect purely task-induced coupling or are partially shaped by trait-level synchrony present at rest. Including both resting and task conditions in future work would allow for stronger inferences about stimulus-specific versus baseline-driven synchronization, especially in relation to interpersonal factors such as relationship quality or social responsiveness.”

      The title of the manuscript suggests a direct influence of mother-child interactions on children's social brain and theory of mind. However, the use of structural equation modeling (SEM) may not fully establish causal relationships. It is possible that the development of children's social brain and ToM also enhances mother-child neural synchronization. The authors should address this alternative hypothesis of the potential bidirectional relationship in the discussion and exercise caution regarding terms that imply causality in the title and throughout the manuscript.

      We appreciate the reviewer’s careful attention to issues of causality in our manuscript. We agree that our cross-sectional design limits causal inference, and that the use of structural equation modeling (SEM) in this context does not allow for conclusions about directional or mechanistic pathways. In response, we have revised the Discussion to explicitly acknowledge these limitations, and now include an expanded section on the potential for bidirectional or co-constructed processes, consistent with neuroconstructivist frameworks.

      We have also tempered the interpretation of our SEM findings, avoiding causal language throughout the manuscript and clarifying that our analyses are exploratory and associational in nature. We hope that these changes provide a more cautious and developmentally grounded interpretation of the data.

      With regard to the title, we respectfully chose to retain the original wording, as we believe it captures the thematic focus and central research question of the paper—namely, the potential role of mother-child interaction in the development of children’s social brain and Theory of Mind. While we understand the reviewer’s concern, we note that the interpretation of this phrasing is contextualized within the manuscript, which now includes clear qualifications regarding the limits of causal inference. We have taken care to ensure that no claims of unidirectional causality are made in the body of the paper.

      The corresponding changes have been made on pages 11-12 of the revised manuscript.

      “Our findings align with a neuroconstructivist perspective, which conceptualizes brain development as an emergent outcome of reciprocal interactions between biological constraints and context-specific environmental inputs. Rather than presuming fixed traits or linear maturation, this perspective highlights how neural circuits adaptively organize in response to experience, gradually supporting increasingly complex cognitive functions [54]. It offers a particularly powerful lens for understanding how early caregiving environments modulate the maturation of social brain networks.

      Building on this framework, the present study reveals that moment-to-moment neural synchrony between parent and child, especially during emotionally salient or socially meaningful moments, is associated with enhanced Theory of Mind performance and reduced dyadic conflict. This suggests that beyond age-dependent neural maturation, dyadic neural coupling may serve as a relational signal, embedding real-time interpersonal dynamics into the child’s developing neural architecture. Our data demonstrate that children’s brains are not merely passively maturing, but are also shaped by the relational texture of their lived experiences—particularly interactions characterized by emotional engagement and joint attention. Importantly, this adds a new dimension to neuroconstructivist theory: it is not simply whether the environment shapes development, but how the quality of interpersonal input dynamically calibrates neural specialization. Interpersonal variation leaves detectable signatures in the brain, and our use of neural synchrony as a dyadic metric illustrates one potential pathway through which caregiving relationships exert formative influence on the developing social brain.

      The contribution of this work lies not in reiterating the interplay of nature and nurture, but in specifying the mechanistic role of interpersonal neural alignment as a real-time, context-sensitive developmental input. Neural synchrony between parent and child may function as a form of relationally grounded, temporally structured experience that tunes the child’s social brain toward contextually relevant signals. Unlike generalized enrichment, this form of neural alignment is inherently personalized and contingent—features that may be especially potent in shaping social cognitive circuits during early childhood.

      The cross-sectional nature of our study is a further limitation, as it cannot definitively establish the causal directions of the observed relationships. Longitudinal designs tracking children's brain development and social cognitive abilities over time would help clarify whether early parenting impacts later neural maturation and behavioral outcomes, or vice versa.”

      I would appreciate more details about the 14 Theory of Mind (ToM) tasks, which could be included in supplemental materials. The authors score them on a scale from 0 to 14 (each task 1 point); however, the tasks likely vary in difficulty and should carry different weights in the total score (for example, the test and the control questions should have different weights). Many studies have utilized the seven tasks according to Wellman and Liu (2004), categorizing them into "basic ToM" and "advanced ToM." Different components of ToM could influence the findings of the current study, which should be further examined by a more in-depth analysis.

      We thank the reviewer for raising this important point regarding the structure and scoring of the Theory of Mind (ToM) tasks. We will provide a detailed description of all 14 tasks in the Supplemental Materials, including their content, targeted mental state concepts (e.g., beliefs, desires, intentions), and design features (e.g., test/control items, task format).

      We fully agree that ToM tasks differ in complexity, and in principle, a weighted or component-based scoring approach (e.g., distinguishing basic and advanced ToM) could offer greater interpretive value. However, in our study design, tasks were administered in a fixed sequence from lower to higher difficulty, and testing was terminated if the child was unable to successfully complete three consecutive tasks. This approach is developmentally appropriate for younger children but results in non-random missingness for more advanced tasks—particularly among children at the lower end of the age range (3–4 years).

      Given this adaptive task structure, re-scoring using weighted or subscale-based approaches would introduce systematic bias, as children who struggled with early items were not administered more complex ones. As a result, a full breakdown by task type (e.g., basic vs. advanced ToM) would only reflect a restricted subsample and would not be comparable across the full cohort. For this reason, we retained the unit-weighted total ToM score as the most developmentally valid and comparable metric across participants.
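
      The consequence of the discontinue rule described above can be made concrete with a small sketch. It scores a fixed-order battery with one point per task and stops after three consecutive failures, as stated in the response. The function name and the response pattern are hypothetical, and the sketch is not the authors' scoring script.

```python
def administer_tom(responses, stop_after=3):
    """Score a fixed-order ToM battery with a discontinue rule.

    responses: 0/1 outcomes the child would give on each task,
               ordered from easiest to hardest.
    Returns (total_score, n_administered).
    """
    score = 0
    consecutive_failures = 0
    administered = 0
    for passed in responses:
        administered += 1
        if passed:
            score += 1
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures == stop_after:
                break  # testing ends; later tasks are never presented
    return score, administered


# Hypothetical 14-task response pattern for a younger child.
younger = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
score, n_given = administer_tom(younger)
```

      In this toy case testing stops after the seventh task, so the remaining tasks are missing rather than failed. Any "advanced ToM" subscale built from those later tasks would therefore be undefined for this child, which is the non-random missingness the response refers to.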

      Reviewer #3:

      Summary:

      The article explores the role of mother-child interactions in the development of children's social cognition, focusing on Theory of Mind (ToM) and Social Pain Matrix (SPM) networks. Using a naturalistic fMRI paradigm involving movie viewing, the study examines relationships among children's neural development, mother-child neural synchronization, and interaction quality. The authors identified a developmental pattern in these networks, showing that they become more functionally distinct with age. Additionally, they found stronger neural synchronization between child-mother pairs compared to child-stranger pairs, with this synchronization and neural maturation of the networks associated with the mother-child relationship and parenting quality.

      Strengths:

      This is a well-written paper, and using dyadic fMRI and naturalistic stimuli enhances its ecological validity, providing valuable insights into the dynamic interplay between brain development and social interactions. However, I have some concerns regarding the analysis and interpretation of the findings. I have outlined these concerns below in the order they appear in the manuscript, which I hope will be helpful for the revision.

      We appreciate the reviewer’s thoughtful and constructive summary of the manuscript. The concerns raised regarding aspects of the analysis and interpretation have been carefully considered. Detailed point-by-point responses are provided below, along with descriptions of the corresponding revisions made to improve the clarity, precision, and interpretive caution of the manuscript.

      Given the importance of social cognition in this study, please cite a foundational empirical or review paper on social cognition to support its definition. The current first citation is primarily related to ASD research, which may not fully capture the broader context of social cognition development.

      We thank the reviewer for this helpful suggestion. We agree that a broader, foundational reference is more appropriate for introducing the concept of social cognition. In response, we have revised the Introduction to include a widely cited theoretical or review paper on social cognition to provide a more general developmental context.

      The corresponding changes have been made on page 3 of the revised manuscript.

      “Social cognition, defined as the ability to interpret and predict others' behavior based on their beliefs and intentions and to interact in complex social environments and relationships, is a crucial aspect of human development [1-2].”

      (1) Adolphs, R. The social brain: neural basis of social knowledge. Annual review of psychology 60, 693-716 (2009).

      (2) Frith, C. D. & Frith, U. Mechanisms of social cognition. Annual review of psychology 63, 287-313 (2012).

      It is standard practice to report the final sample size in the Abstract and Introduction, rather than the initial recruited sample, as high attrition rates are common in pediatric studies. For example, this study recruited 50 mother-child dyads, and only 34 remained after quality control. This information is crucial for interpreting the results and conclusions. I recommend reporting the final sample size in the abstract and introduction but specifying in the Methods that an additional 16 mother-child dyads were initially recruited or that 50 dyads were originally collected.

      We thank the reviewer for this helpful recommendation. In the original version of the manuscript, the Abstract and Introduction referenced the total number of dyads recruited (N = 50). In line with standard reporting practices and to ensure clarity regarding the analytic sample, we have now revised both the Abstract and Introduction to report the final sample size (N = 34). The full recruitment and exclusion details—including the number of dyads removed due to excessive motion or technical issues—are now clearly described in the Methods section.

      The corresponding changes have been made on pages 1 and 4 of the revised manuscript.

      In the "Neural maturity reflects the development of the social brain" section, the authors report the across-network correlation for adults, finding a negative correlation between ToM and SPM. However, the cross-network correlations for the three child groups are not reported. The statement that "the two networks were already functionally distinct in the youngest group of children we tested" is based solely on within-network positive correlations, which does not fully demonstrate functional distinctness. Including cross-network correlations for the child groups would strengthen this conclusion.

      We thank the reviewer for this insightful comment. We agree that within-network correlations alone do not fully establish functional distinctness, particularly in early development. To more directly test whether the ToM and SPM networks were already differentiated in children, we have now included the cross-network correlations between the two networks for each of the three age groups in the revised manuscript. These findings support and strengthen our original claim that the ToM and SPM networks are functionally dissociable even in early childhood, and we have revised the relevant Results sections accordingly to reflect this.

      The corresponding changes have been made on page 7 of the revised manuscript.

      “In children, each network also exhibited positive correlations within-network and negative correlations across networks (within-ToM correlation M(s.e.) = 0.31(0.04); within-SPM correlation M(s.e.) = 0.29(0.04); across-network M(s.e.) = −0.09(0.02)).

      In the Pre-junior group only (3-4 years old children, n = 12), both ToM and SPM networks had positive within-network correlations (within-ToM correlation M(s.e.) = 0.29(0.06); within-SPM correlation M(s.e.) = 0.23(0.05); across-network M(s.e.) = −0.05(0.02)).”
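
      The within-network and across-network values quoted above are averages of pairwise correlations between ROI time courses. A minimal sketch of that computation for a single child follows. It assumes plain Pearson correlations averaged over ROI pairs; the ROI time courses are hypothetical toy values, and the study's actual ROI set and preprocessing are not reproduced here.

```python
import math
from itertools import combinations, product


def pearson(x, y):
    """Pearson correlation between two equal-length time courses."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_correlation(pairs):
    """Average Pearson correlation over an iterable of (roi_a, roi_b) pairs."""
    values = [pearson(a, b) for a, b in pairs]
    return sum(values) / len(values)


# Hypothetical ROI time courses for one child (two ROIs per network).
tom_rois = [[1, 2, 3, 4, 5, 4], [2, 3, 5, 6, 8, 6]]
spm_rois = [[5, 4, 3, 1, 1, 2], [6, 6, 4, 3, 1, 2]]

within_tom = mean_correlation(combinations(tom_rois, 2))  # ToM-ToM pairs
within_spm = mean_correlation(combinations(spm_rois, 2))  # SPM-SPM pairs
across = mean_correlation(product(tom_rois, spm_rois))    # ToM-SPM pairs
```

      Functional distinctness in the sense discussed here corresponds to positive within-network averages together with a lower (here negative) across-network average.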

      The ROIs for the ToM and SPM networks are defined based on previous literature, applying the same ROIs across all age groups. While I understand this is a common approach, it's important to note that this assumption may not fully hold, as network architecture can evolve with age. The functional ROIs or components of a network might shift, with regions potentially joining or exiting a network or changing in size as children develop. For instance, Mark H. Johnson's interactive specialization theory suggests that network composition may adapt over developmental stages. Although the authors follow the approach of Richardson et al. (2018), it would be beneficial to discuss this limitation in the Discussion. An alternative approach would be to apply data-driven analysis to justify the selection of the ROIs for the two networks.

      We thank the reviewer for this thoughtful and theoretically grounded comment. In our study, we followed the approach of Richardson et al. (2018), using a priori ROIs defined from adult meta-analyses and ToM/SPM task studies. This approach facilitates comparison with prior work and provides anatomical consistency across participants. However, we fully agree that applying adult-defined ROIs to pediatric populations involves important assumptions about the stability of network architecture across development, which may not fully hold in early childhood.

      We have now addressed this limitation more explicitly in the revised Discussion, emphasizing that the fixed-ROI approach may not capture the dynamic reorganization of social brain networks during development.

      The corresponding changes have been made on page 13 of the revised manuscript.

      “Moreover, the ROIs used to define the ToM and SPM networks were based on meta-analyses and task studies primarily conducted with adults. While this approach promotes comparability with existing literature, it assumes that the spatial organization of these networks is stable across age groups. However, theories of interactive specialization suggest that the composition and boundaries of functional networks may undergo reorganization during development, with regions potentially entering or exiting networks based on experience and maturational processes. As a result, the current analysis may not fully capture age-specific functional architecture, particularly in younger children. Future studies using data-driven or age-appropriate parcellation methods could provide more precise characterizations of how social brain networks are constructed and differentiated throughout childhood.”

      The current sample size (N = 34 dyads) is a limitation, particularly given the use of SEM, which generally requires larger samples for stable results. Although the model fit appears adequate, this does not guarantee reliability with the current sample size. I suggest discussing this limitation in more detail in the Discussion.

      We thank the reviewer for highlighting the limitations of applying structural equation modeling (SEM) with a relatively modest sample size. We agree that SEM generally benefits from larger samples to ensure model stability and parameter reliability, and that satisfactory model fit does not guarantee robustness in small-sample contexts.

      In the revised Discussion, we now more clearly acknowledge that the use of SEM in the current study is exploratory in nature, and that all results should be interpreted with caution due to potential sample size-related constraints. The model was constructed to provide an integrated view of the observed associations rather than to establish definitive pathways. We have also added a note that future research with larger samples and longitudinal designs will be needed to validate and extend the proposed model.

      The corresponding changes have been made on page 13 of the revised manuscript.

      “In addition, the modest sample size (N = 34 dyads) presents limitations for the application of structural equation modeling (SEM), which typically requires larger samples for stable estimation and generalizable inferences. While the model fit was acceptable, the results should be interpreted as exploratory and hypothesis-generating, rather than confirmatory. Future studies with larger, independent samples will be important for validating the structure and directionality of the proposed relationships.”

      Based on the above comment, I believe that conclusions regarding the relationship between social network development, parenting, and support for Bandura's theory should be tempered. The current conclusions may be too strong given the study's limitations.

      We thank the reviewer for this important and balanced observation. We agree that the conclusions drawn from the current study should reflect the exploratory nature of the analyses, as well as the methodological limitations, including the modest sample size and cross-sectional design.

      In response, we have revised the Conclusion section to use more cautious, associative language when describing the observed relationships among social brain development, parenting factors, and Theory of Mind outcomes. In particular, we have tempered statements regarding support for Bandura’s social learning theory, clarifying that while our findings are consistent with social learning frameworks, the data do not allow for direct tests of modeling or observational learning mechanisms.

      We hope these revisions help clarify the scope of the findings and improve the conceptual rigor of the manuscript.

      The corresponding changes have been made on page 14 of the revised manuscript.

      “Our study provides novel evidence that children's social cognitive development may be shaped by the intricate interplay between environmental influences, such as parenting, and biological factors, such as neural maturation. Our findings contribute to a growing understanding of the factors associated with social cognitive development and suggest the potential importance of parenting in this process. Specifically, the study points to the possible role of the parent-child relationship in supporting the development of social brain circuitry and highlights the relevance of family-based approaches for addressing social difficulties. The observed neural synchronization between parent and child, which was associated with relationship quality, underscores the potential significance of positive parental engagement in fostering social cognitive skills. Future longitudinal and clinical research can build on this multimodal approach to further clarify the neurobehavioral mechanisms underlying social cognitive development. Such research may help inform more effective strategies for promoting healthy social functioning and mitigating social deficits through targeted family-based interventions.”

      The SPM (pain) network is associated with empathic abilities, also an important aspect of social skills. It would be relevant to explore whether (or explain why) SPM development and child-mother synchronization are (or are not) related to parenting and the parent-child relationship.

      We thank the reviewer for this thoughtful and important comment regarding the role of the Social Pain Matrix (SPM) network in social cognition and empathy. We agree that this network represents a critical component of social-cognitive development and is theoretically linked to affective processing and interpersonal understanding.

      We would like to clarify that in our existing analyses, already included in the original submission and detailed in the Supplemental Results, SPM network measures showed significant associations with behavioral outcomes similar to those of the ToM network. These outcomes included children's performance on ToM tasks as well as broader measures of social functioning. We have added more discussion in the supplementary results.

      “To further investigate the specificity of our findings, we conducted additional control analyses focusing on the individual components of the social brain networks examined in our study: the Theory of Mind (ToM) and Social Pain Matrix (SPM) networks.

      When analyzing these networks separately, we found significant correlations between neural maturity and age, as well as between inter-subject synchronization (ISS) and parent-child relationship quality for both the ToM and SPM networks individually (Fig. S1). Specifically, neural maturity within each network was positively correlated with age, indicating that both networks undergo maturation during childhood. Similarly, ISS within each network was negatively correlated with parent-child conflict scores, suggesting that both networks contribute to the observed relationship between neural synchrony and parent-child relationship quality.

      These results highlight the importance of considering the social brain as an integrated system, where the ToM and SPM networks work in concert to support social cognitive development. While each network shows age-related maturation and sensitivity to parent-child relationship quality, their combined functioning appears to be crucial for predicting broader social cognitive outcomes.”

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Astrocytes are known to express neuroligins 1-3. Within neurons, these cell adhesion molecules perform important roles in synapse formation and function. Within astrocytes, a significant role for neuroligin 2 in determining excitatory synapse formation and astrocyte morphology was shown in 2017. However, there has been no assessment of what happens to synapses or astrocyte morphology when all three major forms of neuroligins within astrocytes (isoforms 1-3) are deleted using a well-characterized, astrocyte-specific, and inducible cre line. By using such selective mouse genetic methods, the authors here show that neuroligin 1-3 expression in astrocytes is not consequential for synapse function or for astrocyte morphology. They reach these conclusions with careful experiments employing quantitative western blot analyses, imaging and electrophysiology. They also characterize the specificity of the cre line they used. Overall, this is a very clear and strong paper that is supported by rigorous experiments. The discussion considers the findings carefully in relation to past work. This paper is of high importance, because it now raises the fundamental question of exactly what neuroligins 1-3 are actually doing in astrocytes. In addition, it enriches our understanding of the mechanisms by which astrocytes participate in synapse formation and function. The paper is very clear, well written and well illustrated with raw and average data.

      Comments on revisions:

      My previous comments have been addressed. I have no additional points to make and congratulate the authors.

      Thank you for your acceptance.

      Reviewer #2 (Public Review):

      In the present manuscript, Golf et al. investigate the consequences of astrocyte-specific deletion of Neuroligin (Nlgn) family cell adhesion proteins on synapse structure and function in the brain. Decades of prior research had shown that Neuroligins mediate their effects at synapses through their role in the postsynaptic compartment of neurons and their transsynaptic interaction with presynaptic Neurexins. More recently, it was proposed for the first time that Neuroligins expressed by astrocytes can also bind to presynaptic Neurexins to regulate synaptogenesis (Stogsdill et al. 2017, Nature). However, several aspects of the model proposed by Stogsdill et al. on astrocytic Neuroligin function conflict with prior evidence on the role of Neuroligins at synapses, prompting Golf et al. to further investigate astrocytic Neuroligin function in the current study. Using postnatal conditional deletion of Nlgn1-3 specifically from astrocytes in mice, Golf et al. show that virtually no changes in the expression of synaptic proteins or in the properties of synaptic transmission at either excitatory or inhibitory synapses are observed. Moreover, no alterations in the morphology of astrocytes themselves were found. To further extend this finding, the authors additionally analyzed human neurons co-cultured with mouse glia lacking expression of Nlgn1-4. No difference in excitatory synaptic transmission was observed between neurons cultured in the presence of wildtype vs. Nlgn1-4 conditional knockout glia. The authors conclude that while Neuroligins are indeed expressed in astrocytes and are hence likely to play some role there, this role does not include any direct consequences on synaptic structure and function, in direct contrast to the model proposed by Stogsdill et al.

      Overall, this is a strong study that addresses a fundamental and highly relevant question in the field of synaptic neuroscience. Neuroligins are not only key regulators of synaptic function, they have also been linked to numerous psychiatric and neurodevelopmental disorders, highlighting the need to precisely define their mechanisms of action. The authors take a wide range of approaches to convincingly demonstrate that under their experimental conditions, Nlgn1-3 are efficiently deleted from astrocytes in vivo, and that this deletion does not lead to major alterations in the levels of synaptic proteins or in synaptic transmission at excitatory or inhibitory synapses, or in the morphology of astrocytes. While the co-culture experiments are somewhat more difficult to interpret due to lack of a control for the effect of wildtype mouse astrocytes on human neurons, they are also consistent with the notion that deletion of Nlgn1-4 from astrocytes has no consequences for the function of excitatory synapses. Together, the data from this study provide compelling and important evidence that, whatever the role of astrocytic Neuroligins may be, they do not contribute substantially to synapse formation or function under the conditions investigated.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors have fully addressed my concerns, and have in particular conducted a very elegant and compelling analysis of the degree of deletion of astrocytic Nlgn1-3/4 in their models. This greatly strengthens the main claims of their study and the fundamental nature of their conclusions for the field of synapse biology.

      I am somewhat less convinced by the newly added experiment to investigate deletion of Nlgns1-4 from glia in glia-neuron co-cultures. The authors provide no evidence to show that either WT or cKO glia have any effect on synapse formation or function in human neurons, and therefore, the current lack of a difference could equally result from the fact that both WT and cKO glia were non-functional altogether. The authors cite two studies to state that human neurons do not form synapses in the absence of astrocytes, Zhang et al. 2013 and Huang et al. 2017, but neither seems to be listed in the references (unless Zhang et al. 2014 was meant), making it difficult to assess the relevance of these data. However, since the data on astrocytic Nlgn1-3 deletion in vivo are compelling on their own, I do not see the co-culture experiment as essential for the main conclusions of the study.

      Minor comment:

      Please add the information on the strain background of the mice to the methods section of the manuscript. Strain background can have a significant impact on many aspects of neuronal function, and this information is therefore essential for the interpretation of potential differences to other studies.

      We deeply apologize for forgetting to include the two important references mentioned by the reviewer in the reference list. We understand that the reviewer as a result could not assess the validity of our statement that co-culture of glia is required for efficient synapse formation by human neurons that are induced from ES or iPS cells. Note that this conclusion does not postulate that all synapse formation requires glia, since the cited papers demonstrate that human neurons induced by our protocol still form scarce synapses without glia. This observation has been confirmed in many different experiments that were performed after the data presented in the cited papers. As a result of this extensive prior documentation that human neurons produced by forced expression of Ngn2 require coculture of glia for efficient synapse formation, we do not feel that we need to repeat this basic characterization of our culture system again to validate multiple previous papers and hope the reviewer will concur. We have additionally added the relevant mouse strain information to the methods section.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      The experimental rigor and design of the nocturnal IOP experiments were weak, with low n values and differing methods of IOP measurement (conscious versus anesthetized). The same method of IOP measurement needs to be used for all measurements to make any conclusions on the circadian patterns of IOP in each condition.

      One of the goals of our study was to confirm the results from the Patel et al. (2021; PMID: 33853948) study, in which nocturnal IOP measurements were conducted in anesthetized mice and diurnal IOP measurements in awake animals, but we agree with both Reviewers that IOP should be measured under identical experimental conditions. Parenthetically, the number of animals per treatment paradigm in the original version (N = 4) was sufficient to produce statistical significance for diurnal control vs diurnal TGFB, and diurnal control vs nocturnal control conditions.

      To address the comment, we generated an additional cohort of TGFB2-expressing mice (N = 6) in which nocturnal and diurnal measurements were performed in awake animals. The results are shown in the revised Figure 6. Similar to the anesthetized cohort, the diurnal IOP in Lv-TGFB2 mice was statistically indistinguishable from the nocturnal value, indicating that TGFB2-induced OHT is not additive to physiological (circadian) OHT. The TRPV4-dependence of ocular hypertension induced by physiological and pathological methods suggests that the channel functions as a final common mechanism for ocular hypertension.

      Reviewer #2 (Public review):

      Figure 1A-C. Often there is a difference between the massage (message?, op. authors) and transcript data. I recommend the authors to confirm with qPCR data with another mode of protein measurements.

      We are not sure we understand the Reviewer’s comment regarding the “difference between the message and transcript data” but note that the mRNA data shown in panels A & B are confirmatory of previously published transcriptomic and proteomic screens (e.g., Fleenor et al., IOVS 2006; Bollinger et al., IOVS 2011; Callaghan et al., Scientific Reports 2022; Li et al., Current Eye Research 2022, etc.) and were included to show that the transcriptional response of canonical SMAD and pro-fibrotic genes unfolds as predicted from previous work. With regard to TRPV4 signaling, we expand transcriptomic data with protein analysis (Western blots) and functional analyses (measurements of TRPV4-mediated current and calcium imaging). Transcriptomic, protein expression, electrophysiological and imaging experiments revealed a remarkable consistency in TGFB2-dependence of gene (Fig. 1C) and protein expression (Fig. 1D), transmembrane current (Fig. 3C) and intracellular calcium (Fig. 2).

      Parenthetically, we attempted to get a sense for the TGFB2-dependence of Piezo1 protein expression by conducting Western blots with multiple antibodies and experimental conditions. These efforts were unsuccessful, presumably due to the complexity (30-40 TM domains) and large molecular weight (280-300 kDa) of the protein. We note, however, that Piezo1 signaling cannot account for the observed OHT given that studies by us and others (Yarishkin et al., 2021, PMID: 33226641 and Zhu et al., 2021; PMID: 33532718) associated Piezo1 signaling with facility increases. The revised manuscript reads: “The suppression of outflow facility by Piezo1 inhibitors applied under in vitro and in vivo conditions (39, 81) instead suggests that Piezo1 opposes the hypertensive functions of TRPV4.” The preprint by Redmon et al. (bioRxiv 2024, PMID 39041037) expands the TRPV4-dependence of OHT to microbead-induced, steroid-induced and nocturnal models of OHT to indicate that TRPV4 functions as a universal driver of elevated IOP. We reiterate this in the revised Discussion.

      Does direct TRPV4 activation also induce the expression of these markers? Does inhibition of TRPV4, after TGF-β treatment, prevent the expression of these markers? Is TRPV4 acting downstream of this response?

      An RNASeq study conducted by us (Rudzitis et al., under review) suggests that the agonist GSK101 has minimal effect on the fibrotic and canonical pathways shown in panels A and B. These data are beyond the scope of the present study and will be published elsewhere; however, we include the data associated with the genes depicted in panels A and B for the reviewer at the end of this Response.

      We conducted an additional series of experiments to test whether TGFB2-induced upregulation of the TRPV4 and Piezo1 genes is itself TRPV4-dependent. As shown in the new SFig. 1, upregulation of the two genes is unaffected by TRPV4 inhibition.

      Figure 1D. Beta tubulin is not a membrane marker. Having staining of b tubulin in membrane fraction shows contamination from the cytoplasm. Does the overall expression also increase?

      β-Tubulin associates with the plasma membrane by binding to integral membrane proteins in the plasma and organellar membranes through palmitoylation and attachment to linker proteins and as an integral component of exocytotic vesicles (Wolff, BBA 2009; Hogerheide et al., PNAS 2017). The protein is often used as a loading control for the TRPV4 protein (please see https://www.cellsignal.com/products/primary-antibodies/trpv4-antibody/65893; Grove et al., Science Signaling 2019 and Moore et al., PNAS 2013). Parenthetically, our RNASeq studies did not find modulation of β-tubulin expression by TGFβ2 [CNR and DK, unpublished observations].

      We examined the overall (cytosolic and membrane) TRPV4 expression and observed, similarly to the membrane fraction alone (Figure 2), upregulation following cytokine stimulation:

      Author response image 1.

      Western blot, total protein extract from control and TGFβ2-treated TM cells [Alomone antibody].

      These results, in our estimation, do not add to the overall narrative and were not included in the paper.

      Figure 4A: it is not very clear. I recommend including a zoom image or better resolution image.

      We include a whole-page image as the new SFigure 4.

      Figure 5B and 6B. Why is there a difference between groups in the pre-injection panel? In Figure 5A, there is no pre-injection difference between LV-TGFβ and LV-control, whereas in 5B there is a significant difference between these groups.

      We revised the Figure Legends to clarify that “pre-injection” in Figures 5B and 6B refers to IOP measurements before the intracameral injection of HC-06, not before the injection of lentiviral constructs.

      Discussion section. Line 279: "TRPV4 channels in cells treated with TGFβ2 are likely to be constitutively active" ... needs to be discussed further.

      We rewrote the paragraph to clarify that TRPV4 is a thermosensitive channel that is expected to be constitutively active at the incubator temperature:

      “The effectiveness of TRPV4 inhibition in suppressing TGFB2-induced contractility (Fig. 4) is consistent with constitutive activation of TRPV4 channels in incubator-cultured cells. TRPV4 is a thermosensitive channel (Q10 ~10). Mouse TRPV4 is activated by physiological temperatures (Chung et al., 2003; Shibasaki et al., 2007) with peak activation between ~34 and 37 °C (Guler et al., 2003). The several-fold increase in functional expression of the channel in TGFB2-treated cells (Fig. 2) would be expected to promote tonic influx of Ca2+ and Ca2+-dependent cellular signaling. The abrogation of the contractile response in the presence of HC-06 indicates that TRPV4-mediated Ca2+ influx represents the principal source of calcium that drives the contractile response. Consistent with this, supplementation with the agonist GSK101 was sufficient to evoke TM contraction (Fig. 4B).”
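      For readers unfamiliar with the Q10 notation used in the quoted paragraph, the standard temperature coefficient can be written as below. This is a general textbook definition, not a formula taken from the manuscript; the symbols R_1 and R_2 (a rate, such as channel current, measured at temperatures T_1 and T_2 in °C) are assumptions introduced here for illustration.

```latex
% Standard definition of the temperature coefficient Q10.
% R_1, R_2: rates measured at temperatures T_1, T_2 (illustrative symbols).
\[
  Q_{10} = \left(\frac{R_2}{R_1}\right)^{\frac{10}{T_2 - T_1}}
  \quad\Longleftrightarrow\quad
  \frac{R_2}{R_1} = Q_{10}^{\,\frac{T_2 - T_1}{10}}
\]
% Illustrative arithmetic only: with Q10 = 10, a 10 degree C increase
% gives R_2/R_1 = 10, and a 5 degree C increase gives 10^{0.5}, about 3.2.
```

      Under this definition, a Q10 of roughly 10 would imply a steep dependence of activity on temperature across the range between room and incubator temperature, which is the sense in which the response describes the channel as thermosensitive.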

      Line 280: "The residual contractility in HC-06-treated cells may reflect TGFβ2-mediated contributions from Piezo1." Piezo1 has a low threshold for mechanosensitivity. How do the authors discuss the observation that, in the presence of Piezo1, TRPV4 has a more prominent mechanosensory function? Is this tied to TGFβ signalling?

      This is an interesting question. Our macroscopic and single channel recordings of Piezo1 activity in TM cells recapitulate the time course published in the original Coste et al. (2010) study, showing the channel inactivates within 10-100 msec (Yarishkin et al., 2021). Thus, it is likely that the channel is largely inactivated during chronic ocular hypertension. Indeed, it has been suggested that resting membrane tension alone may be sufficient to inactivate Piezo1 (Lewis and Grandl, 2015), with cells grown on stiff substrates (e.g., under our experimental conditions) experiencing almost complete Piezo1 inactivation. We propose that the primary function of Piezo channels may be to sense and transduce transient mechanical loading. The remarkable IOP-lowering effectiveness of TRPV4 antagonists and knockdown indicates that - in contrast to Piezo1 - TRPV4 activation is sustained.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The complete strain name for the Trpv4-/- mice is missing.

      Corrected.

      The layout for Figure 6 is confusing as HC-06 was only used in panels B and C but the labels are above panel A.

      Corrected.

      Reviewer #2 (Recommendations for the authors):

      Only two mice were used for the nocturnal IOP experiments. Retreating the same mice in opposite eyes and counting it as n=4 is not rigorous or justified.

      The number of mice investigated in the original submission was four. In Week 1, two mice underwent PBS injections and two mice were treated with HC-06. After the baseline was re-established in Week 2, the treatments were reversed.

      We supplemented these numbers with an additional cohort of 6 mice, with identical results re: nocturnal vs diurnal IOP. These data are presented in the revised Figure 6.
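      The crossover schedule described in this response can be sketched as follows. This is only an illustration of the design as stated (four mice, two per treatment in Week 1, treatments reversed afterwards); the mouse identifiers and variable names are hypothetical and do not come from the manuscript.

```python
# Minimal sketch of the two-period crossover described in the response.
# Mouse IDs are hypothetical; the source only states that four mice were used.
mice = ["m1", "m2", "m3", "m4"]

# Week 1: two mice receive PBS, two receive HC-06.
week1 = {m: ("PBS" if i < 2 else "HC-06") for i, m in enumerate(mice)}

# After baseline is re-established, each mouse receives the other treatment.
swap = {"PBS": "HC-06", "HC-06": "PBS"}
week2 = {m: swap[t] for m, t in week1.items()}

# Per-mouse treatment sequence (first period, second period).
schedule = {m: (week1[m], week2[m]) for m in mice}

# Number of mice that received each treatment at some point.
per_treatment = {t: sum(t in seq for seq in schedule.values()) for t in swap}

for mouse, seq in schedule.items():
    print(mouse, "->", " then ".join(seq))
print(per_treatment)
```

      In this sketch every animal contributes one measurement per treatment, which is how four animals yield four observations per condition; the observations are paired within animals rather than independent.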

      Why are daytime IOPs measured in awake mice but nocturnal IOPs measured in isoflurane-anesthetized mice? Anesthesia is well known to affect IOP, and using two different methods could alter the results, especially when comparing between the groups. This could be why you did not see a nocturnal rise in the TGFB-injected eyes. The same method needs to be used for all measurements to make any conclusions on the circadian patterns of IOP in each condition.

      This is a good point; please see our response above.

    1. Author response:

      Reviewer #1:

      Point 1

      Not many weaknesses, but probably validation at more enhancers could have made the paper stronger.

      We experimentally validated two sets of enhancers from two distinct tissues and observed similar effects. While this supports the idea that the TEAD-tissue-specific TF interaction we observe is not restricted to a single tissue, we agree that testing additional enhancers from a third tissue would strengthen our conclusions. We will acknowledge in the discussion that including a third tissue could provide additional support for the generality of our findings.

      Reviewer #2:

      Point 1

      The authors propose a mechanism of a TF trio (TEAD - CHD4 - tissue-specific TFs). However, only one validation experiment checked CHD4. CHD4 binding was not mentioned at all in the other cases.

      Indeed, CHD4 binding was experimentally validated at only one enhancer. This was a deliberate decision based on two key considerations:

      (1) Consistent functional response across enhancers: We tested multiple enhancers (n = 8) for functional response to the TEAD+YAP and GATA4/6 combination. All enhancers tested exhibited the same trend: attenuation of GATA-mediated activation upon co-expression of TEAD or TEAD/YAP. This consistent pattern supports a shared mechanism across these elements.

      (2) Substantial prior evidence supporting CHD4 recruitment by both GATA4 and YAP: Specifically, CHD4 recruitment by GATA4 has been described in the context of cardiovascular development[1], and CHD4 can also be recruited by the TEAD coactivator YAP[2]. Furthermore, published genomic occupancy data from embryonic heart tissue show widespread co-binding of GATA4, TEAD, and CHD4[1,3], including at most of the cardiac enhancers we functionally tested (4 out of 5).

      Given the consistent enhancer responses and the supporting literature and genomic data indicating TEAD-CHD4 co-occupancy, we chose to validate CHD4 binding at a representative enhancer as a proof of concept.

      We will clarify this rationale in the revised manuscript to better address this concern.

      Reviewer #2:

      Point 2

      The authors integrated E12.5 TEAD binding with E11.5 acetylation data, and it would be important to show that this experimental approach is valid or otherwise qualify its limitations.

      We will provide additional evidence in support of this approach in the revised manuscript or alternatively acknowledge its limitations.

      Reviewer #2:

      Point 3

      Motif co-occurrence analysis was extended to claiming TF interactions without further validation.

      We thank the reviewer for pointing out this important distinction. We reviewed the manuscript and identified seven instances where TF interactions were mentioned. Four of these correctly refer to previously established protein-protein interactions. For the remaining instances, we will adjust the wording to reflect the level of evidence, e.g., by describing combinatorial binding based on motif co-occurrence rather than implying direct interaction.
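      To make the distinction between motif co-occurrence and direct interaction concrete, a minimal sketch of a co-occurrence test is given below. It is not the authors' pipeline: the enhancer names, motif sets, and function names are hypothetical, and the one-sided hypergeometric test is one common choice assumed here for illustration.

```python
from math import comb

def cooccurrence_table(enhancers, motif_a, motif_b):
    """Count enhancers containing both motifs, only A, only B, or neither."""
    both = a_only = b_only = neither = 0
    for motifs in enhancers.values():
        has_a, has_b = motif_a in motifs, motif_b in motifs
        if has_a and has_b:
            both += 1
        elif has_a:
            a_only += 1
        elif has_b:
            b_only += 1
        else:
            neither += 1
    return both, a_only, b_only, neither

def enrichment_p(both, a_only, b_only, neither):
    """One-sided hypergeometric P(overlap >= observed) under independence."""
    n = both + a_only + b_only + neither   # all enhancers
    with_a = both + a_only                 # enhancers carrying motif A
    with_b = both + b_only                 # enhancers carrying motif B
    upper = min(with_a, with_b)
    tail = sum(comb(with_a, k) * comb(n - with_a, with_b - k)
               for k in range(both, upper + 1))
    return tail / comb(n, with_b)

# Hypothetical toy data: enhancer name -> set of motifs found in it.
enhancers = {
    "e1": {"TEAD", "GATA"}, "e2": {"TEAD", "GATA"}, "e3": {"TEAD"},
    "e4": {"GATA"}, "e5": {"MEF2"}, "e6": set(),
}
table = cooccurrence_table(enhancers, "TEAD", "GATA")
p_value = enrichment_p(*table)
print(table, p_value)
```

      A small p-value from such a test would indicate only that two motifs appear together in enhancers more often than expected by chance; it says nothing about whether the corresponding proteins physically contact each other, which is why the response limits its wording to combinatorial binding.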

      Reviewer #3:

      Point 1

      Much of this manuscript focuses on confirming transcription factor relationships that have been reported previously. For example, it is well known that GATA4 interacts with MEF2 in the ventricle. There are limited new or unexpected associations discussed and tested.

      We thank the reviewer for this important observation and see the recurrence of known interactions, such as GATA4-MEF2, not as a drawback, but as an important validation of our methodology.

      The identification of novel TF-TF combinations was geared toward uncovering shared regulatory principles across diverse human developmental tissues. While analysing 13 heterogeneous embryonic tissues introduced limitations, such as cellular complexity that may obscure rare interactions, it also allowed the identification of robust, recurrent patterns across tissues.  Indeed, using this approach, we identified the widespread combinatorial effect of TEAD in partnership with lineage-specific TFs, which is explored more in depth in the manuscript.

      Another main goal of the study was to develop and demonstrate a generalizable strategy for identifying combinatorial TF binding patterns that underlie tissue-specific gene regulation. Given the inherent heterogeneity of the embryonic organs analysed, the approach is naturally biased toward recovering the most prevalent, and often well-characterized, TF combinations. While we fully acknowledge this limitation, we believe that the ability to robustly recover well-established TF partnerships across multiple organs provides a valuable proof of concept. The next step will be to apply this strategy to single-cell RNA datasets, in order to define TF relationships at higher resolution, for example, resolving associations down to specific family members that cooperate within distinct lineages or cell types, and identifying less frequent or underrepresented TF-TF relationships.

      In this context, we believe that our strategy has successfully highlighted shared enhancer logic and offers a framework for future high-resolution dissection of TF cooperativity at the single-cell level. The rationale for analysing heterogeneous tissues, along with its limitations, will be addressed in the revised version.

      Reviewer #3:

      Point 2

      Embryonic tissues are highly heterogeneous, limiting the utility of the bulk ChIP-seq employed in these analyses. Does the cellular heterogeneity explain the discrepancy between TEAD binding and histone acetylation? Similarly, how does conservation between species affect the TF predictions?

      We thank the reviewer for raising these important points. We acknowledge the limitations of using bulk ChIP-seq data in the context of complex embryonic tissues (see also previous point). We cannot exclude that the discrepancy between TEAD binding and histone acetylation is an effect of cellular heterogeneity. Indeed, we mention in the results “Our ventricle-specific enhancers were sampled at a single time point and likely represent enhancers that are selectively active in different cell types and developmental stages, given the heterogeneity of cell types in the ventricle”. The limitation of bulk ChIP-seq will be addressed in the discussion. In the specific case of the enhancers selected for validation, the binding site sequences are conserved between species, suggesting that the cis-regulatory activity is likely to be similar in both.

      Reviewer #3:

      Point 3

      Some of the interpretations should also be fleshed out a bit more to clarify the advantage of the analyses presented here. For example, if Gata4 and Foxa2 transcripts are expressed during different stages of development, then it's likely that (as stated by the authors) these motifs are not used during the same stage of development. But examining the flanking regions wasn't necessary to make that statement. This type of conclusion seems tangential to the benefit of this analysis, which is to understand which TFs work together in a single organ at a single time point.

      We appreciate the reviewer’s comment and the opportunity to clarify our interpretation. The reviewer refers to the finding that GATA4 and FOXA2 motifs are flanked by different sets of motifs in liver enhancers, suggesting that these TFs operate within distinct regulatory contexts.

      Our aim was not to state that GATA4 and FOXA2 do not function simultaneously—this can indeed be inferred from their non-overlapping expression patterns. Rather, we intended to highlight the potential of our approach, even when applied to bulk data, to resolve distinct regulatory modules that may act in different subpopulations of cells or developmental windows within the same tissue.

      We will revise the relevant section of the manuscript to make this interpretative point clearer.

      Reviewer #3:

      Point 4

      This manuscript hinges on luciferase assays whose results can be difficult to translate to complex gene regulation networks. Many motifs are often clustered together, which makes designing experiments at endogenous loci important in studies such as this one.

      We agree with the Reviewer that luciferase assays represent an oversimplified model of gene regulation and do not fully capture the complexity of endogenous regulatory networks. We will explicitly acknowledge this limitation in the discussion.

      Mutagenesis of TEAD and tissue-specific TF motifs at endogenous loci would provide more conclusive evidence. However, our goal was to test the generality of the TEAD effect across multiple enhancers and tissues. Despite its limitations, a luciferase-based assay was the most feasible approach, as an endogenous strategy would not have allowed us to assess a broader set of enhancers efficiently. Additionally, the presence of recurrent motifs and the potential functional redundancy among enhancers targeting the same gene can complicate the interpretation of single-locus perturbations.

      References

      (1) Robbe ZL, Shi W, Wasson LK, Scialdone AP, Wilczewski CM, Sheng X, et al. CHD4 is recruited by GATA4 and NKX2-5 to repress noncardiac gene programs in the developing heart. Genes Dev. 2022 Apr 1;36(7–8):468–82.

      (2) Kim M, Kim T, Johnson RL, Lim DS. Transcriptional Co-repressor Function of the Hippo Pathway Transducers YAP and TAZ. Cell Rep. 2015 Apr;11(2):270–82.

      (3) Akerberg BN, Gu F, VanDusen NJ, Zhang X, Dong R, Li K, et al. A reference map of murine cardiac transcription factor chromatin occupancy identifies dynamic and conserved enhancers. Nat Commun. 2019 Oct 28;10(1):4907.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths: 

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: 

      (1) a large set of behavioral attributes, 

      (2) with inter-individual variability, that are 

      (3) stable over time. 

      A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings and extends the experiments from temporal stability to examining the correlation of locomotion features between different contexts.

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      We thank the reviewer for his exceptionally kind assessment of our work!

      Weaknesses: 

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. 

      We have now uploaded a high-resolution PDF to GitHub (https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality/blob/main/S8.pdf), and this is also mentioned in the figure legend for Fig. S8.

      Why were five or so parameters selected from the full set? How were these selected? 

      The five parameters (% of time walked, walking speed, vector strength, angular velocity, and centrophobicity) were selected because they describe key aspects of the investigated behaviors that can be compared directly across assays. Importantly, several parameters we typically use (e.g., Linneweber et al., 2020) cannot be applied under certain conditions, such as darkness or the absence of visual cues. Furthermore, these five parameters encompass three critical aspects of navigation across standard visual behavioral arenas: (1) The “exploration” category is characterized by parameters describing the fly’s activity. (2) Parameters related to “attention” reflect heightened responses to visual cues, but unlike commonly used metrics such as angle or stripe deviations (e.g., Coulomb, 2012; Linneweber et al., 2020), they can also be measured in the absence of visual cues and are therefore suitable for cross-assay comparisons. (3) The parameter “centrophobicity,” used as a potential indicator of anxiety, is conceptually linked to the open-field test in mice, where the ratio of wall-to-open-field activity is frequently calculated as a measurement of anxiety (see, for example, Carter, Sheh, 2015, chapter 2: https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniques-in-neuroscience). Admittedly, this view is frequently challenged in mice, but it has a long history, which is why we use it.

      Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset? 

      As noted above, we included only a subset of parameters in our final analysis, as many were unsuitable for comparison across assays, even though they provide valuable assay-specific information that is important for relating these results to previous publications.

      The correlation analysis is used to establish stability between assays. For temporal retesting, "stability" is certainly the appropriate word, but between contexts, it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency". 

      Thank you for this suggestion. During the preparation of the manuscript, we indeed frequently alternated between the terms “stability” and “consistency,” and decided to go with “stability” as the only descriptor, to keep it simple. We now fully agree with the reviewer’s argument and have replaced “stability” with “consistency” throughout the current version of the manuscript in order to increase clarity and coherence.

      The parameters are considered one by one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability' and analyses of single-parameter variability stability.

      We agree with the reviewer that a multivariate analysis adds clear advantages in terms of statistical power, in addition to our chosen approach. At the same time, we believe that the simplicity of our initial analysis, for both correlational and mean data, makes it easy for readers to understand and reproduce our results. While preparing the previous version of the manuscript we were skeptical, since more complex analyses often involve numerous choices, which can complicate reproducibility. For instance, a recent study in personality psychology (Paul et al., 2024) highlighted the risks of “forking paths” in statistical analysis, showing that certain choices of statistical methods could even reverse findings, a concern mitigated by our simple, straightforward approach. Still, in preparation of this revised version of the manuscript, we accepted the reviewer’s advice and reanalyzed the data using a generalized linear model. This analysis nicely recapitulates our initial findings and is now summarized in a single figure (Fig. 9).

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23 °C and 32 °C are correlated by 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32 °C variance is predictable by the 23 °C variance. Is it fair to say that a 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?

      We agree that this is an important question. Our paper clearly demonstrates that individuality always plays a role in decision-making (and, in this context, any behavioral output can be considered a decision). However, the non-linear relationship between certain situations and the individual’s behavior often reduces the predictive value (or correlation) across contexts, sometimes quite drastically.

      For instance, temperature has a relatively linear effect on certain behavioral parameters, leading to predictable changes across individuals. As a result, correlations across temperature conditions are often similar to those observed across time within the same situation. In contrast, this predictability diminishes when comparing conditions like the presence or absence of visual stimuli, the use of different arenas, or different modalities.

      For this reason, we believe that significance remains the best indicator for describing how measurable individuality persists, even across vastly different situations.
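
      The reviewer's arithmetic can be reproduced with a minimal Python sketch. It only squares each reported correlation coefficient to obtain the share of variance explained; the function name is an illustrative choice, and the two r values are the ones quoted in the comment above.

```python
def variance_explained(r: float) -> float:
    """Coefficient of determination (R^2) for a simple linear relationship."""
    return r * r


# r values quoted in the reviewer's comment; labels are descriptive only
examples = [
    ("% time walked, 23 °C vs 32 °C", 0.263),
    ("vector strength, Buridan vs Y-maze", -0.197),
]

for label, r in examples:
    r2 = variance_explained(r)
    print(f"{label}: r = {r:+.3f}, R^2 = {r2:.3f}, unexplained = {1 - r2:.1%}")
```

      Statistical significance and effect size answer different questions here: a small r can be highly significant in a large sample while still leaving most of the variance unexplained.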

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining the correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?

      We thank the reviewer for this suggestion, and we have now addressed this point. To account for slope effects, we have now introduced in-group ranks for our linear model computation (see Fig. 9). 
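
      A rank transform of this kind can be sketched in plain Python. This is not the authors' analysis code, and the walking speeds below are invented for illustration: each condition is converted to within-group ranks, and the ranks are then correlated (which is equivalent to a Spearman correlation). Note that a Pearson correlation is already unaffected by a shift or rescaling of one group, so ranks mainly help when the change between conditions is monotonic but non-linear.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical walking speeds of the same five flies in two conditions
speed_cool = [4.1, 5.0, 5.6, 6.3, 7.2]
speed_warm = [9.0, 9.4, 12.8, 11.9, 20.5]

r_raw = pearson(speed_cool, speed_warm)
r_rank = pearson(ranks(speed_cool), ranks(speed_warm))
print(f"raw r = {r_raw:.3f}, in-group rank r = {r_rank:.3f}")
```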

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general and with regard to these specific parameters? Is the increased walking speed at higher temperatures necessarily due to an increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      We agree that grouping our parameters into traits like exploration, attention, and anxiety always includes subjective decisions. The classification into these three categories is even considered partially controversial in the mouse-specific literature, which uses the term “anxiety” in similar experiments (see, for example, Carter, Sheh, 2015, chapter 2: https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniques-in-neuroscience). Nevertheless, we believe that readers greatly benefit from these categories, since they make it easier to understand (beyond mathematical correlations) which aspects of the flies’ individuality can be considered consistent across situations. Furthermore, these categories serve as a bridge to compare insights from very distinct models.

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      We assume the reviewer is referring to Figure 3a. The detailed experimental protocol can be found in the Materials and Methods section under Setup 2: IndyTrax Multi-Arena Platform. We have now clarified this in the mentioned figure legend.

      Using the current single-correlation analysis approach, the aims would benefit from rewording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The reviewer raises an important point about hierarchies within the concept of animal individuality or personality. We agree that this is best addressed by first focusing on single behavioral traits/parameters and then integrating several trait properties into a cohesive concept of animal personality (holistic individuality). To ensure consistency throughout the text, we have now thoroughly reviewed the entire manuscript to clearly distinguish between single-parameter variability stability/consistency and holistic individuality/personality.

      The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify the successful transfer of open hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      We uploaded all code and materials to GitHub as soon as we received the reviewers’ comments. All files and materials can be accessed at https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality, which is now frequently mentioned throughout the revised manuscript.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms. 

      We thank the reviewer again for the extensive and constructive feedback.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths: 

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting it to their own needs.

      We thank the reviewer for highlighting the strengths of our study.

      Weaknesses/Limitations: 

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting and temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low-risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context. 

      We agree with the reviewer that the definition of environmental context can differ between fields and that behavioral context is defined differently, particularly in ecology. Nevertheless, we highlight that our alterations of environmental context are highly stereotypic, well-defined, and free from interpretation: we modified only what we state in the experimental description, without designing a specific situation that individuals might again perceive differently. For example, comparing a context with a predator and one without might produce a binary response, because one fraction of the tested individuals might perceive the predator in the predator situation while the other does not.

      The analytical framework in terms of statistical methods is lacking. It appears as though the authors used correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The paper would be a lot stronger, and my guess is, much more streamlined if the authors employed hierarchical mixed models to analyse these data; these models could capture and estimate differences in individual behavior across time and situations simultaneously. Along with this, it's currently unclear whether and how any statistical inference was performed. Right now, it appears as though any results describing how individuality changes across situations are largely descriptive (i.e. a visual comparison of the strengths of the correlation coefficients?).

      The reviewer raises an important point, also raised by reviewer #1. On one hand, we agree with both reviewers that a more aggregated analysis has clear advantages like more statistical power and has the potential to streamline our manuscript, which is why we added such an analysis (see below). On the other hand, we would also like to defend the initial approach we took, since we think that the simplicity of the analysis for both correlational and mean data is easy to understand and reproduce. More complex analyses necessarily include the selection of a specific statistical toolbox by the experimenters and based on these decisions, different analyses become less comparable and more and more complicated to reproduce, unless the entire decision tree is flawlessly documented. For instance, a recent personality psychology paper investigated the relationship between statistical paths within the decision tree (forking analysis) and their results, leading to very surprising results (Paul et al., 2024), since some paths even reversed their findings. Such a variance in conclusions is hardly possible with the rather simplistic and easily reproducible analysis we performed. One of the major strengths of our study is the simple experimental design, allowing for rather simple and easy to understand analyses.

      We nevertheless took the reviewer’s advice very seriously and reanalyzed the data using a generalized linear model, which largely recapitulated the findings of our previously performed “low-tech” analysis in a single figure (Fig. 9).

      Another pretty major weakness is that right now, I can't find any explicit mention of how many flies were used and whether they were re-used across situations. Some sort of overall schematic showing exactly how many measurements were made in which rigs and with which flies would be very beneficial. 

      We apologize for this inconvenience. A detailed overview of male and female sample sizes is listed in the supplemental boxplots next to the plots (e.g., Fig. S6). Apparently, this was not visible enough. Therefore, we have now also uniformly added the sample sizes to the main figure legends.

      I don't necessarily doubt the robustness of the results and my guess is that the authors' interpretations would remain the same, but a more appropriate modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation.

      As described above, we have now added the suggested analyses. We hope that the reviewer will appreciate the new Fig. 9, which, in our opinion, largely confirms our previous findings using a more appropriate generalized linear modelling framework.

      Reviewer #3 (Public Review): 

      This manuscript is a continuation of past work by the last author where they looked at stochasticity in developmental processes leading to inter-individual behavioural differences. In that work, the focus was on a specific behaviour under specific conditions while probing the neural basis of the variability. In this work, the authors set out to describe in detail how stable the individuality of animal behaviours is in the context of various external and internal influences. They identify a few behaviours to monitor (read outs of attention, exploration, and 'anxiety'); some external stimuli (temperature, contrast, nature of visual cues, and spatial environment); and two internal states (walking and flying).

      They then use high-throughput behavioural arenas - most of which they have built and made plans available for others to replicate - to quantify and compare combinations of these behaviours, stimuli, and internal states. This detailed analysis reveals that:

      (1) Many individualistic behaviours remain stable over the course of many days. 

      (2) That some of these (walking speed) remain stable over changing visual cues. Others (walking speed and centrophobicity) remain stable at different temperatures.

      (3) All the behaviours they tested failed to remain stable over the spatially varying environment (arena shape).

      (4) Only angular velocity (a readout of attention) remains stable across varying internal states (walking and flying).

      Thus, the authors conclude that there is a hierarchy in the influence of external stimuli and internal states on the stability of individual behaviours.

      The manuscript is a technical feat with the authors having built many new high-throughput assays. The number of animals is large and many variables have been tested - different types of behavioural paradigms, flying vs walking, varying visual stimuli, and different temperatures among others.

      We thank the reviewer for this extraordinarily kind assessment of our work!

      Recommendations for the authors:  

      Reviewing Editor (Recommendations For The Authors): 

      While appreciating the effort and quality of the work that went into this manuscript, the reviewers identified a few key points that would greatly benefit this work.

      (1) Statistical methods adopted. The dataset produced through this work is large, with multiple conditions and comparisons that can be made to infer parameters that both define and affect the individualistic behaviour of an animal. Hierarchical mixed models would be a more appropriate approach to handle such datasets and infer statistically the influence of different parameters on behaviours. We recommend that the authors take this approach in the analyses of their data.

      (2) Brevity in the text. We urge the authors to take advantage of eLife's flexible template and take care to elaborate on the text in the results section, the methods adopted, the legends, and the guides to the legends embedded in the main text. The findings are likely to be of interest to a broad audience, and the writing currently targets the specialist.

      Reviewer #2 (Recommendations For The Authors): 

      I want to start by saying this seems like a really cool study! It's an impressive amount of work and addressing a pretty basic question that is interesting (at least I think so!)

      We thank the reviewer again for this assessment!

      That said, I would really strongly recommend the authors embrace using mixed/hierarchical models to analyze their data. They're producing some really impressive data and just doing Pearson correlation coefficients across time points and situations is very clunky and actually losing out on a lot of information. The most up-to-date, state-of-the-art are mixed models - these models can handle very complex (or not so complex) random structures which can estimate variance and importantly, covariance, in individual intercepts both over time and across situations. I actually think this could add some really cool insights into the data and allow you to characterize the patterns you're seeing in far more detail. It's datasets exactly like this that are tailor-made for these complex variance partitioning models!

      As mentioned before, we have now adopted a more appropriate GLM-based data analysis (see above).
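
      The simplest form of the variance partitioning the reviewer describes can be sketched in Python. This is neither the GLM reported in Fig. 9 nor a full mixed model: it is the classical ANOVA-based repeatability (intraclass correlation) for a balanced design, used here as a stand-in that needs no external library. The data layout, function name, and example values are assumptions made for illustration.

```python
from statistics import mean


def repeatability(measurements):
    """ANOVA-based repeatability for a balanced design.

    measurements: list of per-individual lists, each holding the same number
    of repeated measures of one behavioral parameter.
    Returns among-individual variance / (among + within variance).
    Assumes the data are not all identical (total variance > 0).
    """
    n = len(measurements)      # number of individuals
    k = len(measurements[0])   # repeated measures per individual
    if any(len(m) != k for m in measurements):
        raise ValueError("balanced design required: equal repeats per individual")

    ind_means = [mean(m) for m in measurements]
    grand = mean(ind_means)

    ms_among = k * sum((m - grand) ** 2 for m in ind_means) / (n - 1)
    ms_within = sum(
        (x - im) ** 2 for m, im in zip(measurements, ind_means) for x in m
    ) / (n * (k - 1))

    # Negative estimates of the among-individual component are truncated at 0
    var_among = max(0.0, (ms_among - ms_within) / k)
    return var_among / (var_among + ms_within)


# Hypothetical walking speeds: four flies, three sessions each
flies = [
    [5.1, 5.4, 4.9],
    [7.8, 8.1, 7.5],
    [6.0, 6.4, 6.2],
    [9.2, 8.8, 9.5],
]
print(f"repeatability = {repeatability(flies):.2f}")
```

      A mixed model generalizes this idea by estimating the same variance components while also handling unbalanced data, fixed effects such as temperature, and covariance between situations.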

      Regardless of which statistical methods you decide to use, please explicitly state in your methods exactly what analyses you did. That is completely lacking now and was a bit frustrating. As such, it's completely unclear whether or how statistical inference was performed. How did you do the behavioral clustering? 

      We apologize that these points were not clearly represented in the previous version of the manuscript. We have now significantly extended the methods section to include a separate paragraph on the statistical methods used, in order to address this critique and hope that the revised version is clear now.

      Also, I could not for the life of me figure out how many flies had been measured. Were they reused across the situation? Or not?

      We reused the same flies across situations whenever possible. However, having one fly experience all assays consecutively was not feasible due to their fragility. Instead, individual flies were exposed to at least 2 of the 3 groups of assays used here: the IndyTrax setup, the Buridan arenas and variants thereof, and the virtual arenas. Hence, we have compared flies across entirely different setups, but the number of times flies can be retested is limited (otherwise, sample sizes drop over time, and the flies go through too many experimental alterations). To make this clearer, we have elaborated on this point in the main text and added group sample sizes to the figure legends.

      What are these "groups" and "populations" that are referred to in the results (e.g. lines 384, 391, 409)?

      We apologize for using these two terms somewhat interchangeably without proper introduction/distinction. We have now made this clearer at the beginning of the results in the main text by focusing on the term ‘group’. By ‘group’ we refer to all individuals tested in the same situation, whose values are averaged for group-level comparisons. Sample sizes in the figure legends now indicate group/population sizes to make this clearer.

      Some of the rationale for the development of the behavioral rigs would have actually been nice to include in the intro, rather than in the results.

      This rationale is introduced at the beginning of the last paragraph of the introduction. We hope that this now becomes clear in the revised version of the manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      This manuscript would do well to take advantage of eLife's flexible word limit. I sense that it has been written in brevity for a different journal but I would urge the authors to revisit this and unpack the language here - in the text, in the figure legends, in references to the figures within the text. The way it's currently written, though not misleading, will only speak to the super-specialist or the super-invested :). But the findings are nice, and it would be nice to tailor it to a broader audience.

      We appreciate this suggestion. Initially, we were hoping that we had described our results as clearly and briefly as possible. We apologize if that was not always the case. The comments and requests of all three reviewers have now led to a series of additions to both the main text and the methods, resulting in a significantly expanded manuscript. We hope that these additions are helpful for the general, non-expert audience.

    1. Author response:

      The following is the authors’ response to the original reviews

      Overview of changes in the revision

      We thank the reviewers for the very helpful comments and have extensively revised the paper. We provide point-by-point responses below and here briefly highlight the major changes:

      (1) We expanded the discussion of the relevant literature in children and adults.

      (2) We improved the contextualization of our experimental design within previous reinforcement studies in both cognitive and motor domains highlighting the interplay between the two.

      (3) We reorganized the primary and supplementary results to better communicate the findings of the studies.

      (4) The modeling has been significantly revised and extended. We now formally compare 31 noise-based models and one value-based model and this led to a different model from the original being the preferred model. This has to a large extent cleaned up the modeling results. The preferred model is a special case (with no exploration after success) of the model proposed in Therrien et al. (2018). We also provide examples of individual fits of the model, fit all four tasks and show group fits for all, examine fits vs. data for the clamp phases by age, provide measures of relative and absolute goodness of fit, and examine how the optimal level of exploration varies with motor noise.
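
      The kind of noise-based model described in item (4) can be illustrated with a minimal Python simulation. This is a sketch under stated assumptions, not the authors' fitted model: it assumes Gaussian motor and exploration noise, exploration added only on trials that follow a failure, and an aim that shifts by the full exploratory component when a reach is rewarded. All names, parameter values, and the reward zone are hypothetical.

```python
import random
from statistics import pstdev


def simulate(n_trials, sigma_motor, sigma_explore, in_reward_zone=None,
             clamp=None, seed=0):
    """Simulate reaches; return a list of (reach, rewarded) pairs.

    clamp=None: feedback comes from in_reward_zone(reach).
    clamp=True / clamp=False: success / failure is forced on every trial.
    """
    rng = random.Random(seed)
    aim = 0.0
    prev_rewarded = True  # assumption: no exploration on the first trial
    trials = []
    for _ in range(n_trials):
        # Exploration noise is drawn only after an unrewarded trial
        explore = 0.0 if prev_rewarded else rng.gauss(0.0, sigma_explore)
        reach = aim + explore + rng.gauss(0.0, sigma_motor)
        rewarded = in_reward_zone(reach) if clamp is None else clamp
        if rewarded:
            aim += explore  # keep the exploratory shift that was rewarded
        trials.append((reach, rewarded))
        prev_rewarded = rewarded
    return trials


# Clamp phases: variability should separate the two noise sources
sd_success = pstdev([r for r, _ in simulate(5000, 1.0, 2.0, clamp=True)])
sd_failure = pstdev([r for r, _ in simulate(5000, 1.0, 2.0, clamp=False)])
print(f"success clamp SD = {sd_success:.2f}, failure clamp SD = {sd_failure:.2f}")

# Learning phase with a hypothetical reward zone between 1 and 3
learning = simulate(200, 0.5, 1.0, in_reward_zone=lambda x: 1.0 <= x <= 3.0)
```

      Under these assumptions, reach variability in a success clamp reflects motor noise alone, while in a failure clamp the variances of motor and exploration noise add, which is what makes the clamp phases useful for separating the two.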

      Reviewer #1 (Public review):

      Summary:

      Here the authors address how reinforcement-based sensorimotor adaptation changes throughout development. To address this question, they collected many participants in ages that ranged from small children (3 years old) to adulthood (18+ years old). The authors used four experiments to manipulate whether binary and positive reinforcement was provided probabilistically (e.g., 30 or 50%) versus deterministically (e.g., 100%), and continuous (infinite possible locations) versus discrete (binned possible locations) when the probability of reinforcement varied along the span of a large redundant target. The authors found that both movement variability and the extent of adaptation changed with age.

      Thank you for reviewing our work. One note of clarification: this work focuses on reinforcement-based learning throughout development but does not evaluate sensorimotor adaptation. The four tasks presented in this work are completed with veridical trajectory feedback (no perturbation).

      The goal is to understand how children at different ages adjust their movements in response to reward feedback, not to evaluate sensorimotor adaptation. We now explain this distinction on line 35.

      Strengths:

      The major strength of the paper is the number of participants collected (n = 385). The authors also answer their primary question, that reinforcement-based sensorimotor adaptation changes throughout development, which was shown by utilizing established experimental designs and computational modelling.

      Thank you.

      Weaknesses:

      Potential concerns involve inconsistent findings with secondary analyses, current assumptions that impact both interpretation and computational modelling, and a lack of clearly stated hypotheses.

      (1) Multiple regression and Mediation Analyses.

      The challenge with these secondary analyses is that:

      (a) The results are inconsistent between Experiments 1 and 2, and the analysis was not performed for Experiments 3 and 4,

      (b) The authors used a two-stage procedure of using multiple regression to determine what variables to use for the mediation analysis, and

      (c) The authors already have a trial-by-trial model that is arguably more insightful.

      Given this, some suggested changes are to:

      (a) Perform the mediation analysis with all the possible variables (i.e., not informed by multiple regression) to see if the results are consistent.

      (b) Move the regression/mediation analysis to Supplementary, since it is slightly distracting given current inconsistencies and that the trial-by-trial model is arguably more insightful.

      Based on these comments, we have chosen to remove the multiple regression and mediation analyses. We agree that they were distracting and that the trial-by-trial model allows for differentiation of motor noise from exploration variability in the learning block.

      (2) Variability for different phases and model assumptions:

      A nice feature of the experimental design is the use of success and failure clamps. These clamped phases, along with baseline, are useful because they can provide insights into the partitioning of motor and exploratory noise. Based on the assumptions of the model, the success clamp would only reflect variability due to motor noise (excludes variability due to exploratory noise and any variability due to updates in reach aim). Thus, it is reasonable to expect that the success clamps would have lower variability than the failure clamps (which it obviously does in Figure 6), and presumably baseline (which provides success and failure feedback, thus would contain motor noise and likely some exploratory noise).

      However, in Figure 6, one visually observes greater variability during the success clamp (where it is assumed variability only comes from motor noise) compared to baseline, where variability would come from:

      (a) Motor noise.

      (b) Likely some exploratory noise since there were some failures.

      (c) Updates in reach aim.

      Thanks for this comment. It made us realize that some of our terminology was unintentionally misleading. Reaching to discrete targets in the Baseline block was done to a) determine if participants could move successfully to targets that are the same width as the 100% reward zone in the continuous targets and b) determine if there are age dependent changes in movement precision. We now realize that the term Baseline Variability was misleading and should really be called Baseline Precision.

      This is an important distinction that bears on this reviewer's comment. In clamp trials, participants move to continuous targets. In baseline, participants move to discrete targets presented at different locations. Clamp Variability cannot be directly compared to Baseline Precision because they are qualitatively different. Since the target changes on each baseline trial, we would not expect updating of desired reach (the target is the desired reach) and there is therefore no updating of reach based on success or failure. The SD we calculate over baseline trials is the endpoint variability of the reach locations relative to the target centers. In success clamp, there are no targets so the task is qualitatively different.

      We have updated the text to clarify terminology, expand upon our operational definitions, and motivate the distinct role of the baseline block in our task paradigm (line 674).

      Given the comment above, can the authors please:

      (a) Statistically compare movement variability between the baseline, success clamp, and failure clamp phases.

      Given our explanation in the previous point we don't think that comparing baseline to the clamp makes sense as the trials are qualitatively different.

      (b) The authors have examined how their model predicts variability during success clamps and failure clamps, but can they also please show predictions for baseline (similar to that of Cashaback et al., 2019; Supplementary B, which alternatively used a no feedback baseline)?

      Again, we do not think it makes sense to predict the baseline which as we mention above has discrete targets compared to the continuous targets in the learning phase.

      (c) Can the authors show whether participants updated their aim towards their last successful reach during the success clamp? This would be a particularly insightful analysis of model assumptions.

      We have now compared 31 models (see full details in next response) which include the 7 models in Roth et al. (2023). Several of these model variants have updating even after success (with so-called planning noise). We also now fit the model to the data that includes the clamp phases (we can't easily fit to success clamp alone as there are only 10 trials). We find that the preferred model is one that does not include updating after success.

      (d) Different sources of movement variability have been proposed in the literature, as have different related models. One possibility is that the nervous system has knowledge of 'planned (noise)' movement variability that is always present, irrespective of success (van Beers, R.J. (2009). Motor learning is optimally tuned to the properties of motor noise. Neuron, 63(3), 406-417). The authors have used slightly different variations of their model in the past. Roth et al (2023) directly compared several different plausible models with various combinations of motor, planned, and exploratory noise (Roth A, 2023, "Reinforcement-based processes actively regulate motor exploration along redundant solution manifolds." Proceedings of the Royal Society B 290: 20231475: see Supplemental). Their best-fit model seems similar to the one the authors propose here, but the current paper has the added benefit of the success and failure clamps to tease the different potential models apart. In light of the results of a), b), and c), the authors are encouraged to provide a paragraph on how their model relates to the various sources of movement variability and other models proposed in the literature.

      Thank you for this. We realized that the models presented in Roth et al. (2023) as well as in other papers, are all special cases of a more general model. Moreover, in total there are 30 possible variants of the full model so we have now fit all 31 models to our larger datasets and performed model selection (Results and Methods). All the models can be efficiently fit by Kalman smoother to the actual data (rather than to summary statistics which has sometimes been done). For model selection, we fit only the 100 learning trials and chose the preferred model based on BIC on the children's data (Figure 5—figure Supplement 1). After selecting the preferred model we then refit this model to all trials including the clamps so as to obtain the best parameter estimates.
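
      As a minimal sketch of the model-selection step described above: for a model with k free parameters fit to n observations with maximized log-likelihood ln L, the standard BIC is k ln(n) - 2 ln L, and the model with the lowest BIC is preferred. The function names and the example numbers below are hypothetical illustrations, not our fitting code or our results.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate the preferred model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def preferred_model(fits, n_obs):
    """Pick the model with the lowest BIC.

    fits maps a model name to (maximized log-likelihood, number of free parameters).
    Returns the winning name and the dictionary of BIC scores.
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Hypothetical log-likelihoods for two candidate models fit to 100 learning trials.
example_fits = {
    "two_param": (-250.0, 2),
    "three_param": (-249.5, 3),
}
best, scores = preferred_model(example_fits, n_obs=100)
```

      In this made-up example the extra parameter improves the likelihood only slightly, so the complexity penalty favors the simpler model.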

      The preferred model was the same whether we combined the continuous and discrete probabilistic data or just examined each task separately, either for only the children or for the children and adults combined. The preferred model is a special case (no exploration after success) of the one proposed in Therrien et al. (2018) and has exploration variability (after failure) and motor noise with full updating with exploration variability (if any) after success. This model differs from the model in the original submission, which included a partial update of the desired reach after exploration; this was considered the learning rate. The current model suggests a unity learning rate.
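
      The trial-by-trial structure of the preferred model can be sketched as follows. This is an illustrative simulation under our reading of the description above, not the fitting code: exploration is zero-mean Gaussian and is added only after a failure, motor noise is added on every trial, and after a success the desired reach is shifted by the full exploration term (unity learning rate). The function name, the `reward_prob` callback, and the choice to start without exploration on the first trial are assumptions made for the example; workspace limits are ignored.

```python
import random


def simulate_reaches(n_trials, sigma_motor, sigma_explore, reward_prob,
                     start=0.0, seed=0):
    """Simulate reaches under a noise-based explore-and-update rule.

    reward_prob is a hypothetical function mapping a reach location to the
    probability of reward on that trial.
    """
    rng = random.Random(seed)
    desired = start
    prev_success = True  # assumption: no exploration on the very first trial
    reaches, outcomes = [], []
    for _ in range(n_trials):
        # Exploration is drawn only if the previous trial failed.
        explore = 0.0 if prev_success else rng.gauss(0.0, sigma_explore)
        # Motor noise corrupts every reach and is never learned from.
        motor = rng.gauss(0.0, sigma_motor)
        reach = desired + explore + motor
        success = rng.random() < reward_prob(reach)
        if success:
            # Unity learning rate: keep the whole exploratory shift.
            desired = desired + explore
        reaches.append(reach)
        outcomes.append(success)
        prev_success = success
    return reaches, outcomes
```

      Passing `reward_prob=lambda x: 1.0` mimics a success clamp, in which reach variability comes from motor noise alone, while `lambda x: 0.0` mimics a failure clamp, in which reaches scatter around a fixed aim with both motor and exploration variability.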

      In addition, as suggested by another reviewer, we also fit a value-based model which we adapted from the model described in Giron et al. (2023). This model was not preferred.

      We have added a paragraph to the Discussion highlighting different sources of variability and links to our model comparison.

      (e) line 155. Why would the success clamp be composed of both motor and exploratory noise? Please clarify in the text.

      This sentence was written to refer to clamps in general and not just success clamps. However, in the revision this sentence seemed unnecessary so we have removed it.

      (3) Hypotheses:

      The introduction did not have any hypotheses of development and reinforcement, despite the discussion above setting up potential hypotheses. Did the authors have any hypotheses related to why they might expect age to change motor noise, exploratory noise, and learning rates? If so, what would the experimental behaviour look like to confirm these hypotheses? Currently, the manuscript reads more as an exploratory study, which is certainly fine if true, it should just be explicitly stated in the introduction. Note: on line 144, this is a prediction, not a hypothesis. Line 225: this idea could be sharpened. I believe the authors are speaking to the idea of having more explicit knowledge of action-target pairings changing behaviour.

      We have included our hypotheses and predictions at two points in the paper. In the introduction we modified the text to:

      "We hypothesized that children's reinforcement learning abilities would improve with age, and depend on the developmental trajectory of exploration variability, learning rate (how much people adjust their reach after success), and motor noise (here defined as all sources of noise associated with movement, including sensory noise, memory noise, and motor noise). We think that these factors depend on the developmental progression of neural circuits that contribute to reinforcement learning abilities (Raznahan et al., 2014; Nelson et al., 2000; Schultz, 1998)."

      In results we modified the sentence to:

      "We predicted that discrete targets could increase exploration by encouraging children to move to a different target after failure."

      Reviewer #2 (Public review):

      Summary:

      In this study, Hill and colleagues use a novel reinforcement-based motor learning task ("RML"), asking how aspects of RML change over the course of development from toddler years through adolescence. Multiple versions of the RML task were used in different samples, which varied on two dimensions: whether the reward probability of a given hand movement direction was deterministic or probabilistic, and whether the solution space had continuous reach targets or discrete reach targets. Using analyses of both raw behavioral data and model fits, the authors report four main results: First, developmental improvements reflected 3 clear changes, including increases in exploration, an increase in the RL learning rate, and a reduction of intrinsic motor noise. Second, changes to the task that made it discrete and/or deterministic both rescued performance in the youngest age groups, suggesting that observed deficits could be linked to continuous/probabilistic learning settings. Overall, the results shed light on how RML changes throughout human development, and the modeling characterizes the specific learning deficits seen in the youngest ages.

      Strengths:

      (1) This impressive work addresses an understudied subfield of motor control/psychology - the developmental trajectory of motor learning. It is thus timely and will interest many researchers.

      (2) The task, analysis, and modeling methods are very strong. The empirical findings are rather clear and compelling, and the analysis approaches are convincing. Thus, at the empirical level, this study has very few weaknesses.

      (3) The large sample sizes and in-lab replications further reflect the laudable rigor of the study.

      (4) The main and supplemental figures are clear and concise.

      Thank you.

      Weaknesses:

      (1) Framing.

      One weakness of the current paper is the framing, namely w/r/t what can be considered "cognitive" versus "non-cognitive" ("procedural?") here. In the Intro, for example, it is stated that there are specific features of RML tasks that deviate from cognitive tasks. This is of course true in terms of having a continuous choice space and motor noise, but spatially correlated reward functions are not a unique feature of motor learning (see e.g. Giron et al., 2023, NHB). Given the result here that simplifying the spatial memory demands of the task greatly improved learning for the youngest cohort, it is hard to say whether the task is truly getting at a motor learning process or more generic cognitive capacities for spatial learning, working memory, and hypothesis testing. This is not a logical problem with the design, as spatial reasoning and working memory are intrinsically tied to motor learning. However, I think the framing of the study could be revised to focus in on what the authors truly think is motor about the task versus more general psychological mechanisms. Indeed, it may be the case that deficits in motor learning in young children are mostly about cognitive factors, which is still an interesting result!

      Thank you for these comments on the framing of our study. We now clearly acknowledge that all motor tasks have cognitive components (new paragraph at line 65). We also explain why we think our tasks have features not present in typical cognitive tasks.

      (2) Links to other scholarship.

      If I'm not mistaken, a common observation in studies of the development of reinforcement learning is a decrease in exploration over development (e.g., Nussenbaum and Hartley, 2019; Giron et al., 2023; Schulz et al., 2019); this contrasts with the current results which instead show an increase. It would be nice to see a more direct discussion of previous findings showing decreases in exploration over development, and why the current study deviates from that. It could also be useful for the authors to bring in concepts of different types of exploration (e.g. "directed" vs "random"), in their interpretations and potentially in their modeling.

      We recognize that our results differ from prior work. The optimal exploration pattern differs from task to task. We now discuss that exploration is not one size fits all; its benefits vary depending upon the task. We have added the following paragraphs to the Discussion section:

      "One major finding from this study is that exploration variability increases with age. Some other studies of development have shown that exploration can decrease with age indicating that adults explore less compared to children (Schulz et al., 2019; Meder et al., 2021; Giron et al., 2023). We believe the divergence between our work and these previous findings is largely due to the experimental design of our study and the role of motor noise. In the paradigm used initially by Schulz et al. (2019) and replicated in different age groups by Meder et al. (2021) and Giron et al. (2023), participants push buttons on a two-dimensional grid to reveal continuous-valued rewards that are spatially correlated. Participants are unaware that there is a maximum reward available and therefore children may continue to explore to reduce uncertainty if they have difficulty evaluating whether they have reached a maximum. In our task by contrast, participants are given binary reward and told that there is a region in which reaches will always be rewarded. Motor noise is an additional factor which plays a key role in our reaching task but minimal if any role in the discretized grid task. As we show in simulations of our task, as motor noise goes down (as it is known to do through development) the optimal amount of exploration goes up (see Figure 7—figure Supplement 2 and Appendix 1). Therefore, the behavior of our participants is rational in terms of increasing exploration as motor noise decreases.

      A key result in our study is that exploration in our task reflects sensitivity to failure. Older children make larger adjustments after failure compared to younger children to find the highly rewarded zone more quickly. Dhawale et al. (2017) discuss the different contexts in which a participant may explore versus exploit (i.e., stick at the same position). Exploration is beneficial when reward is low as this indicates that the current solution is no longer ideal, and the participant should search for a better solution. Konrad et al. (2025) have recently shown this behavior in a real-world throwing task where 6 to 12 year old children increased throwing variability after missed trials and minimized variability after successful trials. This has also been shown in a postural motor control task where participants were more variable after non-rewarded trials compared to rewarded trials (Van Mastrigt et al., 2020). In general, these studies suggest that the optimal amount of exploration is dependent on the specifics of the task."

      (3) Modeling.

      First, I may have missed something, but it is unclear to me if the model is actually accounting for the gradient of rewards (e.g., if I get a probabilistic reward moving at 45°, but then don't get one at 40°, I should be more likely to try 50° next than 35°). I couldn't tell from the current equations if this was the case, or if exploration was essentially "unsigned," nor if the multiple-trials-back regression analysis would truly capture signed behavior. If the model is sensitive to the gradient, it would be nice if this was more clear in the Methods. If not, it would be interesting to have a model that does "function approximation" of the task space, and see if that improves the fit or explains developmental changes.

      The model we use (similar to Roth et al. (2023) and Therrien et al. (2016, 2018)) does not model the gradient. Exploration is always zero-mean Gaussian. As suggested by the reviewer, we now also fit a value-based model (described starting at line 810) which we adapted from the model presented in Giron et al. (2023). We show that the exploration and noise-based model is preferred over the value-based model.

      The multiple-trials-back regression was unsigned as the intent was to look at the magnitude and not the direction of the change in movement. We have decided to remove this analysis from the manuscript as it was a source of confusion and secondary analysis that did not add substantially to the findings of these studies.

      Second, I am curious if the current modeling approach could incorporate a kind of "action hysteresis" (aka perseveration), such that regardless of previous outcomes, the same action is biased to be repeated (or, based on parameter settings, avoided).

      In some sense, the learning rate in the model in the original submission is highly related to perseveration. For example, if the learning rate is 0, then there is complete perseveration as you simply repeat the same desired movement. If the rate is 1, there is no perseveration, and values in between reflect different amounts of perseveration. Therefore, it is not easy to separate learning rate from perseveration. Adding perseveration as another parameter would likely make it and the learning rate unidentifiable. However, we now compare 31 models and those that have a non-unity learning rate are not preferred, suggesting there is little perseveration.
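
      The link between learning rate and perseveration can be illustrated with a one-line update rule. This is a hedged sketch assuming that the partial update takes the form desired + learning_rate * exploration after a success and leaves the desired reach unchanged after a failure; the function name and the numbers are hypothetical.

```python
def update_desired(desired, explore, success, learning_rate):
    """Return the next desired reach given the last exploratory shift and outcome."""
    if not success:
        # After a failure the aim is not updated.
        return desired
    # learning_rate = 0 repeats the same desired movement (complete perseveration);
    # learning_rate = 1 adopts the whole exploratory shift (no perseveration).
    return desired + learning_rate * explore


perseverate = update_desired(0.2, 0.1, True, 0.0)
partial = update_desired(0.2, 0.1, True, 0.5)
full = update_desired(0.2, 0.1, True, 1.0)
```

      Because a separate perseveration bias would pull the next desired reach back toward the previous one in the same way as a smaller learning rate, the two parameters would trade off against each other in a fit.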

      (4) Psychological mechanisms. There is a line of work that shows that when children and adults perform RL tasks they use a combination of working memory and trial-by-trial incremental learning processes (e.g., Master et al., 2020; Collins and Frank 2012). Thus, the observed increase in the learning rate over development could in theory reflect improvements in instrumental learning, working memory, or both. Could it be that older participants are better at remembering their recent movements in short-term memory (Hadjiosif et al., 2023; Hillman et al., 2024)?

      We agree that cognitive processes, such as working memory or visuospatial processing, play a role in our task and describe cognitive elements of our task in the introduction (new paragraph at line 65). However, the sensorimotor model we fit to the data does a good job of explaining the variation across age, which suggests that age-dependent cognitive processes probably play a smaller role.

      Reviewer #3 (Public review):

      Summary:

      The study investigates reinforcement learning across the lifespan with a large sample of participants recruited for an online game. It finds that children gradually develop their abilities to learn reward probability, possibly hindered by their immature spatial processing and probabilistic reasoning abilities. Motor noise, reinforcement learning rate, and exploration after a failure all contribute to children's subpar performance.

      Strengths:

      (1) The paradigm is novel because it requires continuous movement to indicate people's choices, as opposed to discrete actions in previous studies.

      (2) A large sample of participants were recruited.

      (3) The model-based analysis provides further insights into the development of reinforcement learning ability.

      Thank you.

      Weaknesses:

      (1) The adequacy of model-based analysis is questionable, given the current presentation and some inconsistency in the results.

      Thank you for raising this concern. We have substantially revised the model from our first submission. We now compare 31 noise-based models and 1 value-based model and fit all of the tasks with the preferred model. We perform model selection using the two tasks with the largest datasets to identify the preferred model. From the preferred model, we found the parameter fits for each individual dataset and simulated the trial by trial behavior allowing comparison between all four tasks. We now show examples of individual fits as well as provide a measure of goodness of fit. The expansion of our modeling approach has resolved inconsistencies and sharpened the conclusions drawn from our model.

      (2) The task should not be labeled as reinforcement motor learning, as it is not about learning a motor skill or adapting to sensorimotor perturbations. It is a classical reinforcement learning paradigm.

      We now make it clear that our reinforcement learning task has both motor and cognitive demands, but does not fall entirely within one of these domains. We use the term motor learning because it captures the fact that participants maximize reward by making different movements, corrupted by motor noise, to unmarked locations on a continuous target zone. When we look at previous publications, it is clear that our task is similar to those that also refer to this as reinforcement motor learning: Cashaback et al. (2019) (reaching task using a robotic arm in adults), Van Mastrigt et al. (2020) (weight shifting task in adults), and Konrad et al. (2025) (real-world throwing task in children). All of these tasks involve trial-by-trial learning through reinforcement to make the movement that is most effective for a given situation. We feel it is important to link our work to these previous studies and prefer to preserve the terminology of reinforcement motor learning.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Thank you for this summary. Rather than repeat the extended text from the responses to the reviewers here, we point the Editor to the appropriate reviewer responses for each issue raised.

      The reviewers and editors have rated the significance of the findings in your manuscript as "Valuable" and the strength of evidence as "Solid" (see eLife evaluation). A consultancy discussion session to integrate the public reviews and recommendations per reviewer (listed below) has resulted in key recommendations for increasing the significance and strength of evidence:

      To increase the Significance of the findings, please consider the following:

      (1) Address and reframe the paper around whether the task is truly getting at a motor learning process or more generic cognitive decision-making capacities such as spatial memory, reward processing, and hypothesis testing.

      We have revised the paper to address the comments on the framing of our work. Please see responses to the public review comments of Reviewers #2 and #3.

      (2) It would be beneficial to specify the differences between traditional reinforcement algorithms (i.e., using softmax functions to explore, and build representations of state-action-reward) and the reinforcement learning models used here (i.e., explore with movement variability, update reach aim towards the last successful action), and compare present findings to previous cognitive reinforcement learning studies in children.

      Please see response to the public review comments of Reviewer #1 in which we explain the expansion of our modeling approach to fit a value-based model as well as 31 other noise-based models. In our response to the public review comments of Reviewer #2, we comment on our expanded discussion of how our findings compare with previous cognitive reinforcement learning studies.

      To move the "Strength of Evidence" to "Convincing", please consider doing the following:

      (1) Address some apparently inconsistent and unrealistic values of motor noise, exploration noise, and learning rate shown for individual participants (e.g., Figure 5b; see comments reviewers 1 and take the following additional steps: plotting r squares for individual participants, discussing whether individual values of the fitted parameters are plausible and whether model parameters in each age group can extrapolate to the two clamp conditions and baselines.

      We have substantially updated our modeling approach. Now that we compare 31 noise-based models, the preferred model does not show any inconsistent or unrealistic values (see response to Reviewer #3). Additionally, we now show example individual fits and provide both relative and absolute goodness of fit (see response to Reviewer #3).

      (2) Relatedly, to further justify if model assumptions are met, it would be valuable to show that the current learning model fits the data better than alternative models presented in the literature by the authors themselves and by others (reviewer 1). This could include alternative development models that formalise the proposed explanations for age-related change: poor spatial memory, reward/outcome processing, and exploration strategies (reviewer 2).

      Please see response to public review comments of Reviewer #1 in which we explain that we have now fit a value-based model as well as 31 other noise-based models providing a comparison of previous models as well as novel models. This led to a slightly different model being preferred over the model in the original submission (updated model has a learning rate of unity). These models span many of the processes previously proposed for such tasks. We feel that 32 models span a reasonable amount of space and do not believe we have the power to include memory issues or heuristic exploration strategies in the model.

      (3) Perform the mediation analysis with all the possible variables (i.e., not informed by multiple regression) to see if the results are more consistent across studies and with the current approach (see comments reviewer 1).

      Please see response to public review comments of Reviewer #1. We chose to focus only on the model based analysis because it allowed us to distinguish between exploration variability and motor noise.

      Please see below for further specific recommendations from each reviewer.

      Reviewer #1 (Recommendations for the author):

      (1) In general, there should be more discussion and contextualization of other binary reinforcement tasks used in the motor literature. For example, work from Jeroen Smeets, Katinka van der Kooij, and Joseph Galea.

      Thank you for this comment. We have edited the Introduction to better contextualize our work within the reinforcement motor learning literature (see line 67 and line 83).

      (2) Line 32. Very minor. This sentence is fine, but perhaps could be slightly improved. “select a location along a continuous and infinite set of possible options (anywhere along the span of the bridge)"

      Thank you for this comment. We have edited the sentence to reflect this suggestion.

      (3) Line 57. To avoid some confusion in successive sentences: Perhaps, "Both children over 12 and adolescents...".

      Thank you for this comment. We have edited the sentence to reflect this suggestion.

      (4) Line 80. This is arguably not a mechanistic model, since it is likely not capturing the reward/reinforcement machinery used by the nervous system, such as updating the expected value using reward prediction errors/dopamine. That said, this phenomenological model, and other similar models in the field, do very well to capture behaviour with a very simple set of explore and update rules.

      We use mechanistic in the standard sense in modeling, as in Levenstein et al. (2023), for example. The contrast is not with neural modeling, but with normative modeling, in which one develops a model to optimize a function (or descriptive models as to what a system is trying to achieve). In mechanistic modeling one proposes a mechanism, and this can be at a state-space level (as in our case) or a neural level (as suggested by the reviewer), but both are considered mechanistic, just at different levels. Quoting Levenstein "... mechanistic models, in which complex processes are summarized in schematic or conceptual structures that represent general properties of components and their interactions, are also commonly used." We now reference the Levenstein paper to clarify what we mean by mechanistic.

      (5) Figure 1. It would be useful to state that the x-axis in Figure 1 is in normalized units, depending on the device.

      Thank you for this comment. We have added a description of the x-axis units to the Fig. 1 caption.

      (6) Were there differences in behaviour for these different devices? e.g., how different was motor noise for the mouse, trackpad, and touchscreen?

      Thank you for this question. We did not find a significant effect of device on learning or precision in the baseline block. We have added these one-way ANOVA results for each task in Supplementary Table 1.

      (7) Line 98. Please state that participants received reinforcement feedback during baseline.

      Thank you for this comment. We have updated the text to specify that participants receive reward feedback during the baseline block.

      (8) Line 99. Did the distance from the last baseline trial influence whether the participant learned or did not learn? For example, would it place them too far from the peak success location such that it impacted learning?

      Thank you for this question. We looked at whether the position of movement on the last baseline block trial was correlated with the first movement position in the learning block. We did not find any correlations between these positions for any of the tasks. Interestingly, we found that the majority of participants move to the center of the workspace on the first trial of the learning block for all tasks (either in the presence of the novel continuous target scene or the presentation of 7 targets all at once). We do not think that the last movement in the baseline block "primed" the participant for the location of the success zone in the learning block. We have added the following sentence to the Results section:

      "Note that the reach location for the first learning trial was not affected by (correlated with) the target position on the last baseline trial (p > 0.3 for both children and adults, separately)."

      (9) The term learning distance could be improved. Perhaps use distance from target.

      Thank you for this comment. We appreciate that a learning distance defined with 0 as the best value is counterintuitive. We have changed the language to use "distance from target" as the learning metric.

      (10) Line 188. This equation is correct, but to estimate what the standard deviation by the distribution of changes in reach position is more involved. Not sure if the authors carried out this full procedure, which is described in Cashaback et al., 2019; Supplemental 2.

      There appears to be no Supplemental 2 in the referenced paper, so we assume the reviewer is referring to Supplemental B, which deals with a shuffling procedure to examine lag-1 correlations.

      In our tasks, we have only 9 trials to analyze in each clamp phase, so we do not feel a shuffling analysis is warranted. In these blocks, we are not trying to 'estimate what the standard deviation by the distribution of changes in reach position' but are instead calculating the standard deviation of the reach locations and comparing the model fit (for which the reviewer says the formula is correct) with the data. We are unclear what additional steps the reviewer is suggesting. In our updated model analysis, we fit the data including the clamp phases for better parameter estimation. We use simulations to estimate the s.d. in the clamp phase (as we ensure in simulations that the data do not fall outside the workspace), making the previous analytic formulas approximations that are no longer used.

      (11) Line 197-199. Having done the demo task, it is somewhat surprising that a 3-year-old could understand these instructions (whose comprehension can be very different from even a 5-year old).

      Thank you for raising this concern. We recognize that the younger participants likely have different comprehension levels compared to older participants. However, we believe that the majority of even the youngest participants were able to sufficiently understand the goal of the task to move in a way to get the video clip to play. We intentionally designed the tasks to be simple such that the only instructions the child needed to understand were that the goal was to get the video clip to play as much as possible and the video clip played based on their movement. Though the majority of younger children struggled to learn well on the probabilistic tasks, they were able to learn well on the deterministic tasks where the task instructions were virtually identical with the exception of how many places in the workspace could gain reward. On the continuous probabilistic task, we did have a small number (n = 3) of 3 to 5 year olds who exhibited more mature learning ability which gives us confidence that the younger children were able to understand the task goal.

      (12) Line 497: Can the authors please report the F-score and p-value separately for each of these one-way ANOVA (the device is of particular interest here).

      Thank you for this request. We have added a supplementary table (Supplementary Table 1) with the results of these ANOVAs.

      (13) Past work has discussed how motivation influences learning, which is a function of success rate (van der Kooij, K., in 't Veld, L., & Hennink, T. (2021). Motivation as a function of success frequency. Motivation and Emotion, 45, 759-768.). Can the authors please discuss how that may change throughout development?

      Thank you for this comment. While motivation most probably plays a role in learning, in particular in a game environment, this was out of the scope of the direct focus of this work and not something that our studies were designed to test. We have added the following sentence to the discussion section to address this comment:

      "We also recognize that other processes, such as memory and motivation, could affect performance on these tasks however our study was not designed to test these processes directly and future work would benefit from exploring these other components more explicitly."

      (14) Supplement 6. This analysis is somewhat incomplete because it does not consider success.

      Pekny and colleagues (2015) looked at 3 trials back but considered both success and reward. However, their analysis has issues since successive time points are not i.i.d., and spurious relationships can arise. This issue is brought up by Dhawale (Dhawale, A. K., Miyamoto, Y. R., Smith, M. A., & Ölveczky, B. P. (2019). Adaptive regulation of motor variability. Current Biology, 29(21), 3551-3562.). Perhaps it is best to remove this analysis from the paper.

      Thank you for this comment. We have decided to remove this secondary analysis from the paper as it was a source of confusion and did not add to the understanding and interpretation of our behavioral results.

      Reviewer #2 (Recommendations for the author):

      (1) The path length ratio analyses in the supplemental are interesting but are not mentioned in the main paper. I think it would be helpful to mention these as they are somewhat dramatic effects.

      Thank you for this comment. Path length ratios are defined in the Methods, and the results are briefly summarized in the Results section with a pointer to the supplementary figures. We have updated the text to report the age-related differences in path length ratios more explicitly.

      (2) The second to last paragraph of the intro could use a sentence motivating the use of the different task features (deterministic/probabilistic and discrete/continuous).

      Thank you for this comment. We have added an additional motivating sentence to the introduction.

      Reviewer #3 (Recommendations for the author):

      The paper labeled the task as one for reinforcement motor learning, which is not quite appropriate in my opinion. Motor learning typically refers to either skill learning or motor adaptation, the former for improving speed-accuracy tradeoffs in a certain (often new) motor skill task and the latter for accommodating some sensorimotor perturbations for an existing motor skill task. The gaming task here is for neither. It is more like a decision-making task with a slight contribution to motor execution, i.e., motor noise. I would recommend the authors label the learning as reinforcement learning instead of reinforcement motor learning.

      Thank you for this comment. As noted in the response to the public review comments, we agree that this task has components of classical reinforcement learning (i.e. responding to a binary reward) but we specifically designed it to require the learning of a movement within a novel game environment. We have added a new paragraph to the introduction where we acknowledge the interplay between cognitive and motor mechanisms while also underscoring the features in our task that we think are not present in typical cognitive tasks.

      My major concern is whether the model adequately captures subjects' behavior and whether we can conclude with confidence from model fitting. Motor noise, exploration noise, and learning rate, which fit individual learning patterns (Figure 5b), show some quite unrealistic values. For example, some subjects have nearly zero motor noise and a 100% learning rate.

      We have now compared 31 models and the preferred model is different from the one in the first submission. The parameter fits of the new model do not saturate in any way and appear reasonable to us. The updates to the model analysis have addressed the concern of previously seen unrealistic values in the prior draft.

      Currently, the paper does not report the fitting quality for individual subjects. It is good to have an exemplary subject's fit shown, too. My guess is that the r-squared would be quite low for this type of data. Still, given that the children's data is noisier, it might be good to use the adult data to show how good the fitting can be (individual fits, r squares, whether the fitted parameters make sense, whether it can extrapolate to the two clamp phases). Indeed, the reliability of model fitting affects how we should view the age effect of these model parameters.

      We now show fits to individual subjects. However, because the fit is a Kalman smoother, it reproduces the data perfectly by generating its best estimate of motor noise and exploration variability on each trial to fully account for the data. In that sense R<sup>2</sup> is always 1 and is therefore not informative.

      While the BIC analysis with the other model variants provides a relative goodness of fit, it is not straightforward to provide an absolute goodness of fit such as a standard R<sup>2</sup> for a feedforward simulation of the model given the parameters (rather than the output of the Kalman smoother). There are two problems. First, there is no single model output. Each time the model is simulated with the fit parameters it produces a different output (due to motor noise, exploration variability and reward stochasticity). Second, the model is not meant to reproduce the actual motor noise, exploration variability and reward stochasticity of a trial. For example, the model could fit pure Gaussian motor noise across trials (for a poor learner) by accurately fitting the standard deviation of motor noise, but it would not be expected to match each data point and so would have a traditional R<sup>2</sup> of 0.

      To provide an overall goodness of fit, we have to reduce the noise component. To do so, we examined the traditional R<sup>2</sup> between the average of all the children's data and the average simulation of the model (from the median of 1000 simulations per participant) so as to reduce the stochastic variation. The results for the continuous probabilistic and discrete probabilistic tasks are R<sup>2</sup> of 0.41 and 0.72, respectively.
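
      The reason averaging helps can be seen in a small numerical sketch. It is not the authors' analysis: the learning curve, noise level, and participant count below are hypothetical, and one noisy simulation per participant stands in for the median of 1000 simulations. It only illustrates that the traditional R<sup>2</sup> between a single noisy trace and a single noisy simulation is poor even when both share the same underlying curve, whereas R<sup>2</sup> between group averages is much higher.

```python
import numpy as np


def r_squared(observed, predicted):
    """Traditional R^2 = 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


rng = np.random.default_rng(0)
n_participants, n_trials = 100, 120  # hypothetical group size; 120 trials as in the task

# Hypothetical shared learning curve (distance from target shrinking over trials).
true_curve = 0.5 * np.exp(-np.arange(n_trials) / 30.0)
noise_sd = 0.3  # hypothetical trial-to-trial variability

# "Data" and "model simulations" share the curve but have independent noise.
data = true_curve + rng.normal(0.0, noise_sd, (n_participants, n_trials))
sims = true_curve + rng.normal(0.0, noise_sd, (n_participants, n_trials))

single_r2 = r_squared(data[0], sims[0])                     # one participant, one simulation
group_r2 = r_squared(data.mean(axis=0), sims.mean(axis=0))  # averages across participants

print(f"single-participant R^2: {single_r2:.2f}")
print(f"group-average R^2:      {group_r2:.2f}")
```

      Averaging across participants shrinks the independent noise while leaving the shared curve intact, which is why the group-level comparison is the more meaningful summary here.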

      Note that variability in the "success clamp" does not change across ages (Figure 4C) and does not contribute to the learning effect (Figure 4F). However, it is regarded as reflecting motor noise (Figure 5C), which then decreases over age from the model fitting (Figure 5B). How do we reconcile these contradictions? Again, this calls the model fitting into question.

      For the success clamp, we have only 9 trials to calculate variability, which limits our power to detect a significant effect of age. In contrast, the model uses all 120 trials to estimate motor noise. There is a downward trend with age in the behavioral data, which we now show overlaid on the fits of the model for both probabilistic conditions (Figure 5—figure Supplement 4 and Figure 6—figure Supplement 4). These show a reasonable match, and although the variance explained is 16% and 56% (we limit to 9 trials so as to match the fail clamp), the correlations are 0.52 and 0.78, suggesting a reasonable relation, although there may be other small sources of variability not captured in the model.

      Figure 5C: it appears one bivariate outlier contributes a lot to the overall significant correlation here for the "success clamp".

      After removing that point from the original Fig. 5C, the correlation was still significant, and we feel the plots mentioned in the previous point add useful information on this issue. With the new model this figure has changed.

      It is still a concern that the young children did not understand the instructions. Nine 3-to-8 children (out of 48) were better explained by the noisy only model than the full model. In contrast, ten of the rest of the participants (out of 98) were better explained by the noisy-only model. It appears that there is a higher percentage of the "young" children who didn't get the instruction than the older ones.

      Thank you for this comment. We did take participant comprehension of the task into consideration during the task design. We specifically designed it so that the instructions were simple and straightforward. The child simply needs to understand that the underlying goal is to make the video clip play as often as possible and that they must move the penguin to certain positions to get it to play. By having a very simple task goal, we are able to test a naturalistic response to reinforcement in the absence of an explicit strategy in a task suited even for young children.

      We used the updated reinforcement learning model to assess whether an individual's performance is consistent with understanding the task. In the case of a child who does not understand the task, we expect that they simply have motor noise on their reach and, crucially, that they would not explore more after failure, nor update their reach after success. Therefore, we used a likelihood ratio test to examine whether the preferred model was significantly better at explaining each participant's data compared to the model variant which had only motor noise (Model 1). Focusing on only the youngest children (age 3-5), this analysis showed that 43, 59, 65 and 86% of children (out of N = 21, 22, 20 and 21) for the continuous probabilistic, discrete probabilistic, continuous deterministic, and discrete deterministic conditions, respectively, were better fit with the preferred model, indicating non-zero exploration after failure. In the 3-5 year old group for the discrete deterministic condition, 18 out of 21 had performance better fit by the preferred model, suggesting this age group understands the basic task of moving in different directions to find a rewarding location.

      The reduced numbers fit by the preferred model for the other conditions likely reflect differences in the task conditions (continuous and/or probabilistic) rather than a lack of understanding of the goal of the task. We include this analysis as a new subsection at the end of the Results.

      Supplementary Figure 2: the first panel should belong to a 3-year-old not a 5-year-old? How are these panels organized? This is kind of confusing.

      Thank you for this comment. Figure 2—figure Supplement 1 and Figure 2—figure Supplement 2 are arranged with devices in the columns and a sample from each age bin in the rows. For example, in Figure 2—figure Supplement 1, column 1, row 1 is a mouse-using participant age 3 to 5 years old, while column 3, row 2 is a touch-screen-using participant age 6 to 8 years old. We have edited the labeling on both figures to make the arrangement of the data clearer.

      Line 222: make this a complete sentence.

      This sentence has been edited to a complete sentence.

      Line 331: grammar.

      This sentence has been edited for grammar.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public review):

      The first part of the manuscript is not particularly novel, and it would be beneficial to clearly state which aspects of the analyses and derivations are different from previous literature. For example, the derivation that rank-1 RNNs cannot implement selection vector modulation is already present in the Extended Discussion of Pagan et al., 2022 (Equations 42-43). Similarly, it would be helpful to more clearly explain how the proposed pathway-based information flow analysis differs from the circuit diagram of latent dynamics in Dubreuil et al., 2022.

      We thank the reviewer for the insightful comments and for giving us a good opportunity to clarify the novelty of our analyses and derivations. In general, as the reviewer pointed out, the major novelty of our work lies in explicitly linking selection mechanisms (proposed in Mante et al. 2013) with circuit-level descriptions of low-rank RNNs (developed in Dubreuil et al. 2022). This is made possible through a set of analyses and derivations integrating both linearized dynamical systems analysis (Mante et al., 2013) and the circuit diagram of latent dynamics (Dubreuil et al. 2022). Specifically, starting from rank-3 RNN models, we first derived the circuit diagram of latent dynamics (Eqs. 18 and 19) by applying the theory developed in Dubreuil et al. 2022. However, without further analysis, there is no explicit link between these latent dynamics and the selection mechanism. In this manuscript, based on the line attractor assumption, we linearized the latent dynamics around the line attractor (Mante et al., 2013), which enabled us to explicitly solve the equations (from eq. 20 to eq. 27) and derive an explicit formula for the effective coupling of information flow (Fig. 5A). This formula for the effective coupling strength supported an explicit pathway-based definition of selection vector modulation (Fig. 5) and of the selection vector (Fig. 6), the core result of this manuscript. Importantly, the same analysis can be extended to higher-order low-rank RNNs (Eqs. 47-55), suggesting the general applicability of our result. We have revised the manuscript to clearly state the novelty of our work. Please see Lines 292-294.

      Because this set of analyses and derivations integrates many results from the previous literature, it naturally shares many similarities with previous results, as the reviewer pointed out. Below, we compare our work with the previous studies mentioned by the reviewer:

      (1) For example, the derivation that rank-1 RNNs cannot implement selection vector modulation is already present in the Extended Discussion of Pagan et al., 2022 (Equations 42-43). 

      On this point, we fully agree with the reviewer that the derivation of rank-1 RNNs’ limitations in implementing selection vector modulation is not particularly novel. We started from rank-1 RNNs because they are the simplest examples that reveal the intriguing link between the connectivity property and the modulation mechanism, and they thereby serve as an ideal introduction to the subsequent in-depth discussion for general audiences. In the original manuscript, we cited the Pagan et al. 2023 note but may not have made this explicit enough. As the reviewer pointed out, the derivation has been added to the latest version of the Pagan et al. paper (Pagan et al. 2024); we now cite the Pagan et al. 2024 paper and make it clear that the derivation appears there. Please see Lines 186-188 in the main text.

      (2) Similarly, it would be helpful to more clearly explain how the proposed pathway-based information flow analysis differs from the circuit diagram of latent dynamics in Dubreuil et al., 2022.

      As we explained earlier, the latent dynamics in Dubreuil et al. alone did not provide an explicit link between the circuit diagram and selection mechanisms. Our analysis goes beyond the theory developed in Dubreuil et al. 2022 by integrating the linearized dynamical systems analysis (Mante et al. 2013), eventually providing a previously unknown explicit link between the circuit diagram and selection mechanisms.

      With regard to the results linking selection vector modulation and dimensionality, more work is required to understand the generality of these results, and how practical it would be to apply this type of analysis to neural recordings. For example, it is possible to build a network that uses input modulation and to greatly increase the dimensionality of the network simply by adding additional dimensions that do not directly contribute to the computation. Similarly, neural responses might have additional high-dimensional activity unrelated to the task. My understanding is that the currently proposed method would classify such networks incorrectly, and it is reasonable to imagine that the dimensionality of activity in high-order brain regions will be strongly dependent on activity that does not relate to this task.

      We thank the reviewer for this insightful comment. As the reviewer suggested, we did more work to better understand the generality and applicability of the index proposed in the manuscript.

      Firstly, to see whether the currently proposed method can work when a significant amount of neural activity variance is irrelevant to the task, we manually added irrelevant neural activity to the trained RNNs (termed redundant RNNs; see Methods for details, Lines 1200-1215). As expected, we found that for these redundant RNNs, the correlation between the proposed index and the proportion of selection vector modulation indeed disappeared (Figure 7-figure supplement 4B). In fact, in the original version of our manuscript, we presented an extreme example of this idea in our discussion, where we designed two RNNs with theoretically identical neural activity patterns, one relying purely on input modulation and the other on selection vector modulation (Figure 7-figure supplement 3). For this extreme example, any activity-based index alone would fail to differentiate between the two mechanisms, which illustrates the challenge of distinguishing different selection mechanisms when task-irrelevant neural activity is added.

      Secondly, we asked why the proposed index works well for the trained RNNs, which is somewhat surprising in the first place, as the reviewer pointed out. One possibility is that for trained RNNs, the task-irrelevant neural activity is minimal. To test this possibility, we conducted in-silico lesion experiments on the trained RNNs. The main idea is that if an RNN contains a large portion of task-irrelevant variance, there will exist a subspace (termed the task-irrelevant subspace) that captures this part of the variance, and removing this subspace will not affect the network’s behavior. Based on this idea, we developed an optimization method to identify such a task-irrelevant subspace for any given RNN (see Methods for details, Lines 1216-1244). The results show that in the originally trained RNNs, the identified task-irrelevant subspace can explain only a small portion of the neural activity variance (Figure 7-figure supplement 4, panel C). As a control, when applying the same optimization method to the redundant RNNs, we found that the identified task-irrelevant subspace can explain a significantly larger portion of the neural activity variance (Figure 7-figure supplement 4, panel C). Taken together, we concluded that the index works for trained RNNs because most of the neural activity variance of a network learned through backpropagation is task-relevant.

      Therefore, this set of analyses provides an understanding of why the proposed index works for trained RNNs and fails for the redundant RNNs. We have added these analyses to the Discussion. See Lines 601-610. As the reviewer pointed out, it is highly likely that task-irrelevant neural activity variance exists in higher-order brain regions, so the proposed index may not work well in neural recordings. With this understanding, we have toned down the conclusions related to experimentally testable predictions in the main text (e.g., in the Abstract and Introduction). We thank the reviewer again for helping us improve the clarity of our work.

      Finally, a number of aspects of the analysis are not clear. The most important element to clarify is how the authors quantify the "proportion of selection vector modulation" in vanilla RNNs (Figures 7d and 7g). I could not find information about this in the Methods, yet this is a critical element of the study results. In Mante et al., 2013 and in Pagan et al., 2022 this was done by analyzing the RNN linearized dynamics around fixed points: is this the approach used also in this study? Also, how are the authors producing the trial-averaged analyses shown in Figures 2f and 3f? The methods used to produce this type of plot differ in Mante et al., 2013 and Pagan et al., 2022, and it is necessary for the authors to explain how this was computed in this case.

      We thank the reviewer for the valuable comments. Yes, for the proportion of selection vector modulation (Figure 7D and 7G) we employed the method used in Mante et al., 2013. For the trial-averaged analyses shown in Figures 2f and 3f, we followed a procedure used in Mante et al., 2013. In the revised version, we have added the related information. See Lines 852-853 and 872-889. We thank the reviewer again for improving the clarity of our work.

      I am also confused by a number of analyses done to verify mathematical derivations, which seem to suggest that the results are close to identical, but not exactly identical. For example, in the histogram in Figure 6b, or the histogram in Figure 7-figure supplement 3d: what is the source of the small variability leading to some of the indices being less than 1?

      In Figure 6B, the two selection vectors are considered theoretically equivalent under the mean-field assumption. However, because the RNNs we use have a finite number of neurons, finite-size effects inevitably cause slight deviations from perfect equivalence.

      To verify this, we generated rank-3 RNNs of different sizes in the experiment for Figure 6b (see the Supplementary section “Building rank-3 RNNs with both input and selection vector modulations”). Specifically, for a fixed number of neurons 𝑁, we independently sampled 𝛼, 𝛽 and 𝛾 from a Uniform(0,1) distribution and built an RNN with 𝑁 neurons based on the procedure as in Figure 5C. We then computed the selection vector for the RNN in a given context (for example, context 1) in two ways:

      (1) via linearized dynamical system analysis following Mante et al. (2013), producing the selection vector sv<sup>classical</sup>

      (2) using the theoretical derivation

      Author response image 1.

      Cosine angles between the selection vectors computed using the two methods in RNNs of different sizes. Black bars indicate median values.

      We repeated this process 1000 times for each 𝑁 and measured the cosine angle between these two selection vectors. As shown in Author response image 1, as 𝑁 increases, the cosine angles approach 1 more consistently, indicating that the two selection vectors become nearly equivalent in larger RNNs. Conversely, smaller RNNs display more pronounced finite-size effects, which accounts for indices slightly below 1.
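
      The scaling of such finite-size deviations can be illustrated with a toy example. This is not the RNN construction described above: the vectors `m` and `n`, the perturbed vector, and the number of repeats are all hypothetical. Two independent Gaussian loading vectors have an empirical overlap that vanishes in the mean-field limit but is of order 1/√N for finite N, so a vector that equals `m` only when this overlap is exactly zero has a cosine with `m` slightly below 1, approaching 1 as N grows.

```python
import numpy as np


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def median_cosine(n_neurons, n_repeats=200, seed=0):
    """Median cosine between m and a finite-size-perturbed copy of m."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_repeats):
        m = rng.normal(size=n_neurons)
        n = rng.normal(size=n_neurons)
        # Empirical overlap: zero in the mean-field limit, O(1/sqrt(N)) otherwise.
        overlap = (m @ n) / n_neurons
        # Toy "finite-N" vector: coincides with m only if the overlap vanishes.
        finite_n_vector = m + overlap * n
        values.append(cosine(m, finite_n_vector))
    return float(np.median(values))


medians = {n: median_cosine(n) for n in (50, 500, 5000)}
for n, value in medians.items():
    print(f"N = {n:5d}: median cosine = {value:.6f}")
```

      In this toy setting the gap between the median cosine and 1 shrinks roughly in proportion to 1/N, which matches the qualitative trend described for larger networks.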

      Reviewer 2 (Public review):

      The introduction could have been written in a more accessible manner for any non-expert readers.

      We sincerely thank the reviewer for the constructive feedback on the introduction and have revised it accordingly.

      Reviewer #2 (Recommendations for the authors):

      The level of mastery of the low-rank framework is altogether impressive. I need however to point to a technical detail. The derivations of the information flow assume that the vectors m and vectors I are orthogonal (e.g. in Equation 14). This is not necessarily the case in trained networks, and Figure 2F suggests this is not the case in the trained rank 1 network. In that situation, the overlap between m and I leads to an additional term in the Equation going directly from the input to the output vector (see, e.g., Equation 15 in Beiran et al. Neuron 2023). In general, these kind of overlaps can contribute an additional pathway in higher rank networks too.

      We thank the reviewer for the valuable comments. The derivations presented in Equation 14 do not actually require that the vectors 𝒎 and 𝑰 are orthogonal. Rather, our definition of the task variable differs slightly from the one in Beiran et al. (2023). Consider a rank-1 RNN with a single input channel:

      Author response image 2.

      Difference between our definition of the task variable and that of previous work. (A) Our definition of the task variable. (B) Definition of the task variable in Beiran et al. 2023.

      As long as 𝒎 and 𝑰 are linearly independent, the state 𝒙(𝑡) can be uniquely written as a linear combination of the two vectors (Author response image 2):

      where and are the task variables associated with 𝒎 and 𝑰, respectively. Substituting this expression into the dynamical equations yields:

      Hence, there is no additional term directly linking the input to the output vector in our formulation. By contrast, in Beiran et al. (2023), the input vector 𝑰 is decomposed into components parallel (𝑰<sub>∥</sub>) and perpendicular (𝑰<sub>⊥</sub>) to 𝒎, and the task variables are defined as (Figure 4-figure supplement 3B):

      This leads to dynamics of the form:

      thus creating an additional direct term from the input to the output vector under their definition.
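
      The difference between the two conventions can be checked numerically. The sketch below is an illustration under stated assumptions, not code from either paper: the names `kappa_m` and `kappa_I`, the network size, and the specific vectors are hypothetical, and the orthogonal convention is assumed to express the state in the basis (𝒎, 𝑰<sub>⊥</sub>). It shows that when 𝒎 and 𝑰 are linearly independent but not orthogonal, the coefficients in the oblique basis (𝒎, 𝑰) are unique, while in the orthogonal basis the coefficient along 𝒎 absorbs an extra contribution from the input.

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons = 200  # hypothetical network size

m = rng.normal(size=n_neurons)
I = 0.6 * m + rng.normal(size=n_neurons)  # deliberately NOT orthogonal to m

# Hypothetical task-variable values along m and I at some time t.
kappa_m, kappa_I = 0.7, -1.3
x = kappa_m * m + kappa_I * I

# Oblique basis (m, I): the coefficients are unique when m and I are independent.
oblique, *_ = np.linalg.lstsq(np.column_stack([m, I]), x, rcond=None)

# Orthogonal basis (m, I_perp): split I into parts parallel and perpendicular to m.
parallel_coef = (I @ m) / (m @ m)
I_perp = I - parallel_coef * m
orthogonal, *_ = np.linalg.lstsq(np.column_stack([m, I_perp]), x, rcond=None)

print("oblique basis    (m, I):     ", oblique)
print("orthogonal basis (m, I_perp):", orthogonal)
# The m-coefficient in the orthogonal basis picks up a direct input contribution.
print("extra input term on m:", kappa_I * parallel_coef)
```

      The coefficient along the input direction is the same in both bases; only the coefficient along 𝒎 differs, by exactly the parallel component of the input.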

      The designed rank 3 network relies on a multi-population structure. This is explained clearly in the methods, but it could be stressed more in the main text to dispel the notion that higher-rank networks may not need a multi-population structure to perform this task (cf Dubreuil et al 2022).

      Thank you for the valuable comments. In the revised version, we emphasize this point by adding the following sentence: “our rank-3 network relies on a multi-population structure, consistent with the notion that higher-rank networks still require a multi-population structure to perform flexible computations (Dubreuil et al. 2022)”. See Lines 238-240.

      (3) An important result in Pagan et al and Mante et al is that the line attractor direction is invariant across contexts. I believe this is explicitly enforced in the models studied here, but this could be made more clear. It would be interesting to discuss the importance of this constraint.

      We thank the reviewer for the valuable comments. In our hand-crafted RNN examples (Figures 3–6), we enforce the choice axis to be identical across the two contexts (Figure R4B). Even in the rank-1 example (Figure 2), where we analyze a trained RNN, the choice axis still shows a substantial overlap between the two contexts (Figure R4A). However, in the trained vanilla RNNs shown in Figure 7, when the regularization term is relatively small, the overlap in the choice axis between contexts is smaller (Figure R4C), i.e., the line attractor direction shifts between contexts.

      Author response image 3.

      Cosine angle between the choice axes in two contexts for different RNNs. (A) Rank-1 RNNs in Figure 2. (B) Rank-3 RNNs in Figure 3-6. (C) Vanilla RNNs in Figure 7.

      Our theoretical framework can also accommodate situations where the direction of the choice axis changes. For instance, consider the rank-3 RNN in Figure 6, where the choice axis is defined as with 𝐺 being a diagonal matrix whose elements represent the slopes of each neuron’s activation function. Since these slopes can change across contexts, itself can vary across contexts. Likewise, the input representation direction may be written as , allowing both the choice axis and the input axis to adapt to the context. The selection vector is given by:

      Here, we no longer assume that is context-invariant; rather, we only assume its norm remains the same across contexts. Under this weaker assumption, we still have

      Substituting these into the equations yields the following expressions for input modulation and selection vector modulation:

      Figure 6B: it was not clear to me what exactly is plotted here.

      We thank the reviewer for pointing out the missing explanation. In Figure 6B, we show the distribution of the cosine angles between two ways of computing the selection vector for randomly generated rank-3 RNNs. Specifically, we generated 1000 RNNs according to the procedure in Figure 5C, with each RNN defined by parameters 𝛼, 𝛽 and 𝛾 independently sampled from a Uniform(0,1) distribution. For each RNN, we computed the selection vector in a given context (e.g., context 1 or 2) in two ways:

      (1)  via linearized dynamical system analysis following Mante et al. (2013), producing the selection vector sv<sup>classical</sup> (classical in Figure 6B),

      (2)  using the theoretical derivation (“our’s” in Figure 6B)

      We repeated this process 1000 times, measured the cosine angle between these two selection vectors, and plotted the resulting distribution for context 1 (gray) and context 2 (blue) in Figure 6B. The figure shows that the selection vectors computed via the two methods are almost equal, as evidenced by the cosine angles clustering very close to 1.

      We have revised it accordingly. See Lines 1135-1143.

      In Figure 7, how was the effective dimension of vanilla RNNs controlled or varied? The metric used (effective dimension) is relatively non-standard, it would be useful to give some intuition to the reader about it.

      We thank the reviewer for these valuable comments.

      Controlling the effective dimension

      When training vanilla RNNs, we included a regularization term in the loss function of the form

      where w<sub>reg</sub> is a regularization coefficient. By adjusting w<sub>reg</sub>, we can influence the distribution of singular values of the connectivity matrix 𝐽. When w<sub>reg</sub> is larger, the learned 𝐽 tends to have fewer large singular values and hence a lower effective dimension; when w<sub>reg</sub> is small, more singular values remain large, increasing the matrix’s effective dimension.

      Definition and intuition: effective dimension

      Consider a connectivity matrix 𝐽 with singular values . The matrix’s rank is the number of nonzero singular values. However, rank alone can overlook differences in how quickly those singular values decay. To capture this, we define the effective dimension as:

      Each term lies between 0 and 1, so the effective dimension satisfies:

      When all nonzero singular values are equal, edim(𝐽) equals the matrix rank. But if some singular values are much smaller than others, effective dimension will be closer to 1. For example:

      -  𝐽<sub>1</sub> has nonzero singular values (1, 0.1, 0.01). Its effective dimension is 1.0101, indicating that most of the variance is captured by the largest singular value.

      -  𝐽<sub>0</sub> has nonzero singular values (1, 0.8, 0.7). Its effective dimension is 2.13, which reflects that multiple singular values contribute significantly.

      Hence, while both 𝐽<sub>1</sub> and 𝐽<sub>0</sub> are rank-3 matrices, their effective dimensions highlight the difference in how each matrix distributes its variance.
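
      The two worked values above can be checked with a short sketch. It assumes that the effective dimension is the sum of squared singular values divided by the largest squared singular value; this form is inferred from the quoted examples (it reproduces 1.0101 and 2.13, and equals the rank when all nonzero singular values are equal) and is not copied from the manuscript. The function name `effective_dimension` is hypothetical.

```python
def effective_dimension(singular_values):
    """Assumed definition: sum_i sigma_i^2 / max_i sigma_i^2 over nonzero sigma_i."""
    squares = [s * s for s in singular_values if s > 0]
    if not squares:
        return 0.0  # a zero matrix has no nonzero singular values
    return sum(squares) / max(squares)

# Singular values from the two examples in the text.
print(round(effective_dimension([1.0, 0.1, 0.01]), 4))  # 1.0101
print(round(effective_dimension([1.0, 0.8, 0.7]), 2))   # 2.13

# Equal singular values: the effective dimension equals the rank.
print(effective_dimension([2.0, 2.0, 2.0]))  # 3.0
```

      Under this assumed form, each term of the sum lies between 0 and 1, so the result is bounded below by 1 and above by the rank, in line with the statement above.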

      We have added the intuition underlying this concept in Methods (see Lines 1135-1143). We thank the reviewer for improving the clarity of our work. 

      Eqs 19&21: n^T_r should be n^T_dv?

      Thank you for pointing out this mistake. We have fixed it in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This article investigates the phenotype of macrophages with a pathogenic role in arthritis, particularly focusing on arthritis induced by immune checkpoint inhibitor (ICI) therapy. 

      Building on prior data from monocyte-macrophage coculture with fibroblasts, the authors hypothesized a unique role for the combined actions of prostaglandin PGE2 and TNF. The authors studied this combined state using an in vitro model with macrophages derived from monocytes of healthy donors. They complemented this with single-cell transcriptomic and epigenetic data from patients with ICI-RA, specifically, macrophages sorted out of synovial fluid and tissue samples. The study addressed critical questions regarding the regulation of PGE2 and TNF: Are their actions co-regulated or antagonistic? How do they interact with IFN-γ in shaping macrophage responses? 

      This study is the first to specifically investigate a macrophage subset responsive to the PGE2 and TNF combination in the context of ICI-RA, describes a new and easily reproducible in vitro model, and studies the role of IFNgamma regulation of this particular Mф subset. 

      Strengths: 

      Methodological quality: The authors employed a robust combination of approaches, including validation of bulk RNA-seq findings through complementary methods. The methods description is excellent and allows for reproducible research. Importantly, the authors compared their in vitro model with ex vivo single-cell data, demonstrating that their model accurately reflects the molecular mechanisms driving the pathogenicity of this macrophage subset. 

      Weaknesses: 

      Introduction: The introduction lacks a paragraph providing an overview of ICI-induced arthritis pathogenesis and a comparison with other types of arthritis. Including this would help contextualize the study for a broader audience.

      Thank you for this suggestion. We have added a paragraph on ICI-arthritis to the Introduction (pg. 4, middle paragraph).

      Results Section: At the beginning of the results section, the experimental setup should be described in greater detail to make an easier transition into the results for the reader, rather than relying just on references to Figure 1 captions.

      We have clarified the experimental setup (pg. 5).  

      There is insufficient comparison between single-cell RNA-seq data from ICI-induced arthritis and previously published single-cell RA datasets. Such a comparison may include DEGs and GSEA, pathway analysis comparison for similar subsets of cells. Ideally, an integration with previous datasets with RA-tissue-derived primary monocytes would allow for a direct comparison of subsets and their transcriptomic features.

      We thank the Reviewer for this suggestion, which has increased the impact of our data and analysis. A computationally rigorous representation mapping approach showed that ICI-arthritis myeloid subsets predominantly mapped onto 4 previously defined RA subsets including IL-1β+ cells. This result was corroborated using a complementary data integration approach. Analysis of (TNF + PGE)-induced gene sets (TP signatures) in ICI-arthritis myeloid cells projected onto the RA subsets using the AUCell package showed elevated TP gene expression in similar ICI-arthritis and RA monocytic cell subsets. We also found mutually exclusive expression of TP and IFN signatures in distinct RA and ICI-arthritis myeloid cell subsets, which supports that the opposing cross-regulation between IFN-γ and PGE2 pathways that we identified in vitro also functions similarly in vivo. This analysis is shown in the new Fig. 3, described on pg. 7, and discussed on pp. 13-14.

      While it's understandable that arthritis samples are limited in numbers and myeloid cell numbers, it would still be interesting to see the results of PGE2+TNF in vitro stimulation on the primary RA or ICI-RA macrophages. It would be valuable to see RNA-Seq signatures of patient cell reactivation in comparison to primary stimulation of healthy donor-derived monocytes.

      We agree that this would be interesting, but given the limited number of samples and their distribution amongst many studies and investigators, this is beyond the scope of the current study.

      Discussion: Prior single-cell studies of RA and RA macrophage subpopulations from 2019, 2020, 2023 publications deserve more discussion. A thorough comparison with these datasets would place the study in a broader scientific context. 

      Creating an integrated RA myeloid cell atlas that combines ICI-RA data into the RA landscape would be ideal to add value to the field. 

      As one of the next research goals, TNF blockade data in RA and ICI-RA patients would be interesting to add to such an integrated atlas. Combining responders and non-responders to TNF blockade would help to understand patient stratification with the myeloid pathogenic phenotypes. It would be great to read the authors' opinion on this in the Discussion section. 

      Please see our response to point 3 above. This point is addressed in Fig. 3, pg. 7, and pp. 13-14, which includes a discussion of responders and nonresponders and patient stratification.  

      Conclusion: The authors demonstrated that while PGE2 maintains the inflammatory profile of macrophages, it also induces a distinct phenotype in simultaneous PGE2 and TNF treatment. The study of this specific subset in single-cell data from ICI-RA patients sheds light on the pathogenic mechanisms underlying this condition, however, how it compares with conventional RA is not clear from the manuscript. 

      Given the substantial incidence of ICI-induced autoimmune arthritis, understanding the unique macrophage subsets involved for future targeting them therapeutically is an important challenge. The findings are significant for immunologists, cancer researchers, and specialists in autoimmune diseases, making the study relevant to a broad scientific audience. 

      Reviewer #2 (Public review): 

      Summary/Significance of the findings: 

      The authors have done a great job by extensively carrying out transcriptomic and epigenomic analyses in the primary human/mouse monocytes/macrophages to investigate TNF-PGE2 (TP) crosstalk and their regulation by IFN-γ in the Rheumatoid arthritis (RA) synovial macrophages. They proposed that TP induces inflammatory genes via a novel regulatory axis whereby IFN-γ and PGE2 oppose each other to determine the balance between two distinct TNF-induced inflammatory gene expression programs relevant to RA and ICI-arthritis. 

      Strengths: 

      The authors have done a great job on RT-qPCR analysis of gene expression in primary human monocytes stimulated with TNF and showing the selective agonists of PGE2 receptors EP2 and EP4 that signal predominantly via cAMP. They have beautifully shown IFN-γ opposes the effects of PGE2 on TNF-induced gene expression. They found that TP signature genes are activated by cooperation of PGE2-induced AP-1, CEBP, and NR4A with TNF-induced NF-κB activity. On the other hand, they found that IFN-γ suppressed induction of AP-1, CEBP, and NR4A activity to ablate induction of IL-1, Notch, and neutrophil chemokine genes but promoted expression of distinct inflammatory genes such as TNF and T cell chemokines like CXCL10 indicating that TP induces inflammatory genes via IFN-γ in the RA and ICI-arthritis.

      Weaknesses: 

      (1) The authors carried out most of the assays in the monocytes/macrophages. How do APCs such as dendritic cells behave in response to TP treatment at similar doses?

      We agree that this is an interesting topic, especially as TNF + PGE2 is one of the standard methods of maturing in vitro-generated human DCs and promoting antigen-presenting function. As DC maturation is quite different from monocyte activation, this would represent a new study and is beyond the scope of the current manuscript. We have instead added a paragraph to the discussion (pg. 12) and cited the literature on DC maturation by TNF + PGE2, including one of our older papers (PMID: 18678606; 2008).

      (2) The authors studied the transcriptome and epigenome at 3 h and 24 h post-treatment. What happens to TP-induced inflammatory genes at 12 h, 36 h, 48 h, and 72 h post-treatment? It is critical to see whether the upregulated/downregulated genes normalise or stay the same throughout the innate immune response.

      We now clarify that subsets of inducible genes showed distinct kinetics of induction with transient expression at 3 hr versus sustained expression over the 24 hr stimulation period as shown in Supplementary Fig. 1 (pg. 5).

      (3) The authors showed IL1-axis in response to the TP-treatment. Do other cytokine axes get modulated? If yes, then how do they cooperate to reduce/induce inflammatory responses along this proposed axis?

      This is an interesting question, which we approached using a combination of pathway analysis and targeted inspection of pathways important in the pathogenesis of RA, which is the inflammatory condition most relevant for this study. In addition to genes in the IL-1-NF-κB core inflammatory pathway, pathway analysis of genes induced by TP co-stimulation showed enrichment of genes related to leukocyte chemotaxis, in particular neutrophil migration. Accordingly, TP co-stimulation increased expression of CSF3, which plays a key role in mobilizing neutrophils from the bone marrow, and major neutrophil chemokines CXCL1, CXCL2, CXCL3 and CXCL5 that recruit neutrophils to sites of inflammation including in inflammatory arthritis. Analysis of the late response to TNF similarly showed enrichment of genes important in chemotaxis, and suppression of genes in the cholesterol biosynthetic pathway, which we and others have previously linked to IFN responses. Targeted inspection of genes in additional pathways implicated in RA pathogenesis showed increased expression of genes in the Notch pathway. We believe that these pathways work together with the IL-1 pathway to increase immune cell recruitment and activation in inflammatory responses; these results are described on pp. 5-6 and are incorporated into Figures 1, 2 and Supplementary Fig. 2.

      Overall, the data looks good and acceptable but I need to confirm the above-mentioned criticisms. 

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors):   

      The discussion section of the manuscript claims: "In this study, we utilized transcriptomics to demonstrate a 'TNF + PGE2' (TP) signature in RA and ICI-arthritis IL-1β+ synovial macrophages." This statement is misleading, as no new transcriptomic data from RA synovial samples were generated in this study. To support such a claim, the authors would need to compare primary monocytes or macrophages from RA patients using bulk RNA-seq or single-cell RNA-seq. Based on the current data, the comparison is limited to bulk RNA-seq findings from the authors' in vitro model and prior monocyte-fibroblast coculture studies.

      We have modified the abstract and discussion (pg. 10) to reflect that we have compared an in vitro generated TP signature with gene expression in previously identified RA macrophage subsets.

    1. Author response:

      [The following is the authors’ response to the original reviews.]

      We extend our sincere thanks to the editor, referees for eLife, and other commentators who have written evaluations of this manuscript, either in whole or in part. Sources of these comments were highly varied, including within the bioRxiv preprint server, social media (including many comments received on X/Twitter and some YouTube presentations and interviews), comments made by colleagues to journalists, and also some reviews of the work published in other academic journals. Some of these are formal and referenced with citations. Others were informal but nonetheless expressed perspectives that helped enable us to revise the manuscript with the inclusion of broader perspectives than the formal review process. It is beyond the scope of this summary to list every one of these, which have often been brought to the attention of different coauthors, but we begin by acknowledging the very wide array of peer and public commentary that has contributed to this work. The reaction speaks to a broad interest in open discussion and review of preprints.

      As we compiled this summary of changes to the manuscript, we recognized that many colleagues made comments about the process of preprint dissemination and evaluation rather than the data or analyses in the manuscript. Addressing such comments is outside the scope of this revised manuscript, but we do feel that a broader discussion of these comments would be valuable in another venue. Many commentators have expressed confusion about the eLife system of evaluation of preprints, which differs from the editorial acceptance or rejection practiced in most academic journals. As authors in many different nations, in varied fields, and in varied career stages, we ourselves are still working to understand how the academic publication landscape is changing, and how best to prepare work for new models of evaluation and dissemination. 

      The manuscript and coauthor list reflect an interdisciplinary collaboration. Analyses presented in the manuscript come from a wide range of scientific disciplines. These range from skeletal inventory, morphology, and description, spatial taphonomy, analysis of bone fracture patterns and bone surface modifications, sedimentology, geochemistry, and traditional survey and mapping. The manuscript additionally draws upon a large number of previous studies of the Rising Star cave system and the Dinaledi Subsystem, which have shaped our current work. No analysis within any one area of research stands alone within this body of work: all are interpreted in conjunction with the outcomes of other analyses and data from other areas of research. Any single analysis in isolation might be consistent with many different hypotheses for the formation of sediments and disposition of the skeletal remains. But testing a hypothesis requires considering all data in combination and not leaving out data that do not fit the hypothesis. We highlight this general principle at the outset because a number of the comments from referees and outside specialists have presented alternative hypotheses that may arguably be consistent with one kind of analysis that we have presented, while seeming to overlook other analyses, data, or previous work that exclude these alternatives. In our revision, we have expanded all sections describing results to consider not only the results of each analysis, but how the combination of data from different kinds of analysis relate to hypotheses for the deposition and subsequent history of the Homo naledi remains. We address some specific examples and how we have responded to these in our summary of changes below. 

      General organization:

      The referee and editor comments are mostly general and not line-by-line questions, and we have compiled them and treated them as a group in this summary of changes, except where specifically noted. 

      The editorial comments on the previous version included the suggestion that the manuscript should be reorganized to test “natural” (i.e. noncultural) hypotheses for the situations that we examine. The editorial comment suggested this as a “null hypothesis” testing approach. Some outside comments also viewed noncultural deposition as a null hypothesis to be rejected. We do not concur that noncultural processes should be construed as a null hypothesis, as we discuss further below. However, because of the clear editorial opinion we elected to revise the manuscript to make more explicit how the data and analyses test noncultural depositional hypotheses first, followed by testing of cultural hypotheses. This reorganization means that the revised manuscript now examines each hypothesis separately in turn. 

      Taking this approach resulted in a substantial reorganization of the “Results” section of the manuscript. The “Results” section now begins with summaries of analyses and data conducted on material from each excavation area. After the presentation of data and analyses from each area, we then present a separate section for each of several hypotheses for the disposition and sedimentary context of the remains. These hypotheses include deposition of bodies upon a talus (as hypothesized in some previous work), slow sedimentary burial on a cave floor or within a natural depression, rapid burial by gravity-driven slumping, and burial of naturally mummified remains. We then include sections to test the hypothesis of primary cultural burial and secondary cultural burial. This approach adds substantial length to the Results. While some elements may be repeated across sections, we do consider the new version to be easier to take piece by piece for a reader trying to understand how each hypothesis relates to the evidence. 

      The Results section includes analyses on several different excavation areas within the Dinaledi Subsystem. Each of these presents somewhat different patterns of data. We conceived of this manuscript as combining these distinct areas because each of them provides information about the formation history of the Homo naledi-associated sediments and the deposition of the Homo naledi remains. Together they speak more strongly than separately. In the previous version of the manuscript, two areas of excavation were considered in detail (Dinaledi Feature 1 and the Hill Antechamber Feature), with a third area (the Puzzle Box area) included only in the Discussion and with reference to prior work. We now describe the new work undertaken after the 2013-2014 excavations in more detail. This includes an overview of areas in the Hill Antechamber and Dinaledi Chamber that have not yielded substantial H. naledi remains and that thereby help contextualize the spatial concentration of H. naledi skeletal material. The most substantial change in the data presented is a much expanded reanalysis of the Puzzle Box area. This reanalysis provides greater clarity on how previously published descriptions relate to the new evidence. The reanalysis also provides the data to integrate the detailed information on bone identification, fragmentation, and spatial taphonomy from this area with the new excavation results from the other areas.

      In addition to Results, the reorganization also affected the manuscript’s Introduction section. Where the previous version led directly from a brief review of Pleistocene burial into the description of the results, this revised manuscript now includes a review of previous studies of the Rising Star cave system. This review directly addresses referee comments that express some hesitation to accept previous results concerning the structure and formation of sediments, the accessibility of the Dinaledi Subsystem, the geochronological setting of the H. naledi remains, and the relation of the Dinaledi Subsystem to nearby cave areas. Some parts of this overview are further expanded in the Supplementary Information to enable readers to dive more deeply into the previous literature on the site formation and geological configuration of the Rising Star cave system without needing to digest the entirety of the cited sources. 

      The Discussion section of the revised manuscript is differentiated from Results and focuses on several areas where the evidence presented in this study may benefit from greater context. One new section addresses hypothesis testing and parsimony for Pleistocene burial evidence, which we address at greater length in this summary below. The majority of the Discussion concerns the criteria for recognizing evidence for burial as applied in other studies. In this research we employ a minimal definition but other researchers have applied varied criteria. We consider whether these other criteria have relevance in light of our observations and whether they are essential to the recognition of burial evidence more broadly. 

      Vocabulary:

      We introduce the term “cultural burial” in this revised manuscript to refer to the burial of dead bodies as a mortuary practice. “Burial” as an unmodified term may refer to the passive covering of remains by sedimentary processes. Use of the term “intentional burial” would raise the question of interpreting intent, which we do not presume based on the evidence presented in this research. The relevant question in this case is whether the process of burial reflects repeated behavior by a group. As we received input from various colleagues it became clear that burial itself is a highly loaded term. In particular there is a common assumption within the literature and among professionals that burial must by definition be symbolic. We do not take any position on that question in this manuscript, and it is our hope that the term “cultural burial” may focus the conversation around the extent that the behavioral evidence is repeated and patterned. 

      Sedimentology and geochemistry of Dinaledi Feature 1:

      Reviewer 4 provided detailed comments on the sedimentological and geochemical context that we report in the manuscript. One outside review (Foecke et al. 2024) included some of the points raised by reviewer 4, and additionally addressed the reporting of geochemical and sedimentological data in previous work that we cite. 

      To address these comments we have revised the sedimentary context and micromorphology of sediments associated with Dinaledi Feature 1. In the new text we demonstrate the lack of microstratigraphy (supported by grain size analysis) in the unlithified mud clast breccia (UMCB), while such a microstratigraphy is observed in the laminated orange-red mudstones (LORM) that contribute clasts to the UMCB. Thus, we emphasize the presence and importance of a laterally continuous layer of LORM nature occurring at a level that appears to be the maximum depth of fossil occurrence. This layer is severely broken under extensive accumulation of fossils such as Feature 1 and only evidenced by abundant LORM clasts within and around the fossils. 

      We have completely reworked the geochemical context associated with Feature 1 following the comments of reviewer 4. We described the variations and trends observed in the major oxides separately from trace and rare-earth elements. We used Harker variation plots to assess relationships between these element groups with CaO and Zn, followed by principal component analysis of all elements analyzed. The new geochemical analysis clearly shows that Feature 1 is associated with localized trace element signatures that exist in the sediments only in association with the fossil bones, which suggests a lack of postdepositional mobilization of the fossils and sediments. We additionally have included a fuller description of XRF methods.

      To clarify the relation of all results to the features described in this study, we removed the geochemical and sedimentological samples from other sites within the Dinaledi Subsystem. These localities within the fissure network represent only surface collection of sediment, as no excavation results are available from those sites to allow for comparison in the context of assessing evidence of burial. These were initially included for comparison, but have now been removed to avoid confusion.  

      Micromorphology of sediments:

      Some referees (1, 3, and 4) and other commentators (including Martinón-Torres et al. 2024) have suggested that the previous manuscript was deficient due to an insufficient inclusion of micromorphological analysis of sediments. Because these commentators have emphasized this kind of evidence as particularly important, we review here what we have included and how our revision has addressed this comment. Previous work in the Dinaledi Chamber (Dirks et al., 2015; 2017) included thin section illustrations and analysis of sediment facies, including sediments in direct association with H. naledi remains within the Puzzle Box area. The previous work by Wiersma and coworkers (2020) used micromorphological analysis as one of several approaches to test the formation history of Unit 3 sediments in the Dinaledi Subsystem, leading to the interpretation of autobrecciation of earlier Unit 1 sediment. In the previous version of this manuscript we provided citations to this earlier work. The previous manuscript also provided new thin section illustrations of Unit 3 sediment near Dinaledi Feature 1 to place the disrupted layer of orange sediment (now designated the laminated orange silty mudstone unit) into context. 

      In the new revised manuscript we have added to this information in three ways. First, as noted above in response to reviewer 4, we have revised and added to our discussion of micromorphology within and adjacent to the Dinaledi Feature 1. Second, we have included more discussion in the Supplementary Information of previous descriptions of sediment facies and associated thin section analysis, with illustrations from prior work (CC-BY licensed) brought into this paper as supplementary figures, so that readers can examine these without following the citations. Third, we have included Figure 10 in the manuscript which includes six panels with microtomographic sections from the Hill Antechamber Feature. This figure illustrates the consistency of sub-unit 3b sediment in direct contact with H. naledi skeletal material, including anatomically associated skeletal elements, with previous analyses that demonstrate the angular outlines and chaotic orientations of LORM clasts. It also shows density contrasts of sediment in immediate contact with some skeletal elements, the loose texture of this sediment with air-filled voids, and apparent invertebrate burrowing activity. To our knowledge this is the first application of microtomography to sediment structure in association with a Pleistocene burial feature. 

      To forestall possible comments that the revised manuscript does not sufficiently employ micromorphological observations, or that any one particular approach to micromorphology is the standard, we present here some context from related studies of evidence from other research groups working at varied sites in Africa, Europe, and Asia. Hodgkins et al. (2021) noted: “Only a handful of micromorphological studies have been conducted on human burials and even fewer have been conducted on suspected burials from Paleolithic or hunter-gatherer contexts.” In that study, one supplementary figure with four photomicrographs of thin sections of sediments was presented. Interpretation of the evidence for a burial pit by Hodgkins et al. (2021) noted the more open microstructure of sediment but otherwise did not rely upon the thin section data in characterizing the sediments associated with grave fill. Martinón-Torres et al. (2021) included one Extended Data figure illustrating thin sections of sediments and bone, with two panels showing sediments (the remainder showing bone histology). The micromorphological analysis presented in the supplementary information of that paper was restricted to description of two microfacies associated with the proposed “pit” in that study. That study did carry out microCT scanning of the partially-prepared skeletal remains but did not report any sediment analysis from the microtomographic results. Maloney et al. (2022) reported no micromorphological or thin section analysis. Pomeroy et al. (2020a) included one illustration of a thin section; this study may be regarded as a preliminary account rather than a full description of the work undertaken. Goldberg et al. (2017) analyzed the geoarchaeology of the Roc de Marsal deposits in which possible burial-associated sediments had been fully excavated in the 1960s, providing new morphological assessments of sediment facies; the supplementary information to this work included five scans (not microscans) of sediment thin sections and no microphotographs. Fewlass et al. (2023) presented no thin section or micromorphological illustrations or methods. In summary of this research, we note that in one case micromorphological study provided observations that contributed to the evidence for a pit, in other cases micromorphological data did not test this hypothesis, and many researchers do not apply micromorphological techniques in their particular contexts.

      Sediment micromorphology is a growing area of research and may have much to provide to the understanding of ancient burial evidence as its standards continue to develop (Pomeroy et al. 2020b). In particular microtomographic analysis of sediments, as we have initiated in this study, may open new horizons that are not possible with more destructive thin-section preparation. In this manuscript, the thin section data reveals valuable evidence about the disruption of sediment structure by features within the Dinaledi Chamber, and microtomographic analysis further documents that the Hill Antechamber Feature reflects similar processes, in addition to possible post-burial diagenesis and invertebrate activity. Following up in detail on these processes will require further analysis outside the scope of this manuscript. 

      Access into the Dinaledi Subsystem:

      Reviewer 1 emphasizes the difficulty of access into the Dinaledi Subsystem as a reason why the burial hypothesis is not parsimonious. Similar comments have been made by several outside commentators who question whether past accessibility into the Dinaledi Subsystem may at one time have been substantially different from the situation documented in previous work. Several pieces of evidence are relevant to these questions and we have included some discussion of them in the Introduction, and additionally include a section in the Supplementary Information (“Entrances to the cave system”) to provide additional context for these questions. Homo naledi remains are found not only within the Dinaledi Subsystem but also in other parts of the cave system including the Lesedi Chamber, which is similarly difficult for non-expert cavers to access. The body plan, mass, and specific morphology of H. naledi suggest that this species would be vastly more suited to moving and climbing within narrow underground passages than living people. On this basis it is not unparsimonious to suggest that the evidence resulted from H. naledi activity within these spaces. We note that the accessibility of the subsystem is not strictly relevant to the hypothesis of cultural burial, although the location of the remains does inform the overall context which may reflect a selection of a location perceived as special in some way. 

      Stuffing bodies down the entry to the subsystem:

      Reviewer 3 suggests that one explanation for the emplacement of articulated remains at the top of the sloping floor of the Hill Antechamber is that bodies were “stuffed” into the chute that comprises the entry point of the subsystem and passively buried by additional accumulation of remains. This was one hypothesis presented in earlier work (Dirks et al. 2015) and considered there as a minimal explanation because it did not entail the entry of H. naledi individuals into the subsystem. Further exploration (Elliott et al. 2021), ongoing survey work, and this manuscript have all produced data that reject this hypothesis. The revised manuscript includes a section in the Results, “Deposition upon a talus with passive burial”, that examines this hypothesis in light of the data.

      Recognition of pits:

      Referees 3 and 4 and several additional commentators have emphasized that the recognition of pit features is necessary to the hypothesis of burial, and questioned whether the data presented in the manuscript were sufficient to demonstrate that pits were present. We have revised the manuscript in several ways to clarify how all the different kinds of evidence from the subsystem test the hypothesis that pits were present. This includes the presentation of a minimal definition of burial to include a pit dug by hominins, criteria for recognizing that a pit was present, and an evaluation of the evidence in each case to make clear how the evidence relates to the presence of a pit and subsequent infill. As referee 3 notes, it can be challenging to recognize a pit when sediment is relatively homogeneous. This point was emphasized in the review by Pomeroy and coworkers (2020b), who reflected on the difficulty of seeing evidence for shallow pits constructed by hominins, and we have cited this in the main text. As a result, the evidence for pits has been a recurrent topic of debate for most Pleistocene burial sites. However, in addition to the sedimentological and contextual evidence in the cases we describe, the current version also reflects upon other possible mechanisms for the accumulation of bones or bodies. The data show that the sedimentary fill associated with the H. naledi remains in the cases we examine could not have passively accumulated slowly and is not indicative of mass movement by slumping or other high-energy flow. To further put these results into context, we added a section to the Discussion that briefly reviews prior work on distinguishing pits in Pleistocene burial contexts, including the substantial number of sites with accepted burial evidence for which no evidence of a pit is present.

      Extent of articulation and anatomical association:

      We have added significantly greater detail to the descriptions of articulated remains and orientation of remains in order to describe more specifically the configuration of the skeletal material. We also provide 14 figures in main text (13 of them new) to illustrate the configuration of skeletal remains in our data. For the Puzzle Box area, this now includes substantial evidence on the individuation of skeletal fragments, which enables us to illustrate the spatial configuration of remains associated with the DH7 partial skeleton, as well as the spatial position of fragments refitted as part of the DH1, DH2, DH3, and DH4 crania. For Dinaledi Feature 1 and the Hill Antechamber Feature we now provide figures that key skeletal parts as identified, including material that is unexcavated where possible, and a skeletal part representation figure for elements excavated from Dinaledi Feature 1. 

      Archaeothanatology:

      Reviewer 2 suggests that a greater focus on the archaeothanatology literature would be helpful to the analysis, with specific reference to the sequence of joint disarticulation, the collapse of sediment and remains into voids created by decomposition, and associated fragmentation of the remains. In the revised manuscript we have provided additional analysis of the Hill Antechamber Feature with this approach in mind. This includes greater detail and illustration of our current hypothesis for individuation of elements. We now discuss a hypothesis of body disposition, describe the persistent joints and articulation of elements, and examine likely decomposition scenarios associated with these remains. Additionally, we expand our description and illustration of the orientation of remains and degree of anatomical association and articulation within Dinaledi Feature 1. For this feature and for the Hill Antechamber Feature we have revised the text to describe how fracturing and crushing patterns are consistent with downward pressure from overlying sediment and material. In these features, postdepositional fracturing occurred subsequent to the decomposition of soft tissue and partial loss of organic integrity of the bone. We also indicate that the loss by postdepositional processes of most long bone epiphyses, vertebral bodies, and other portions of the skeleton less rich in cortical bone, poses a challenge for testing the anatomical associations of the remaining elements. This is a primary reason why we have taken a conservative approach to identification of elements and possible associations. 

      A further aspect of the site revealed by our analysis is the selective reworking of sediments within the Puzzle Box area subsequent to the primary deposition of some bodies. The skeletal evidence from this area includes body parts with elements in anatomical association or articulation, juxtaposed closely with bone fragments at varied pitch and orientation. This complexity of events evidenced within this area is a challenge for approaches that have been developed primarily based on comparative data from single-burial situations. In these discussions we deepen our use of references as suggested by the referee.   

      Burial positions:

      Reviewer 2 further suggests that illustrations of hypothesized burial positions would be valuable. We recognize that a hypothesized burial position may be an appealing illustration, and that some recent studies have created such illustrations in the context of their scientific articles. However such illustrations generally include a great deal of speculation and artist imagination, and tend to have an emotive character. We have added more discussion to the manuscript of possible primary disposition in the case of the Hill Antechamber Feature as discussed above. We have not created new illustrations of hypothesized burial positions for this revision. 

      Carnivore involvement:

      Referee 1 suggests that the manuscript should provide further consideration of whether carnivore activity may have introduced bones or bodies into the cave system. The reorganized Introduction now includes a review of previous work, and an expanded discussion within the Supplementary Information (“Hypotheses tested in previous work”). This includes a review of literature on the topic of carnivore accumulation and the evidence from the Dinaledi and Lesedi Chamber that rejects this hypothesis. 

      Water transport and mud:

      The eLife referees broadly accepted previous work showing that water inundation or mass flow of water-saturated sediment did not occur within the history of Unit 2 and 3 sediments, including those associated with H. naledi remains. However, several outside commentators did refer specifically to water flow or mud flow as a mechanism for slumping of deposits and possible sedimentary covering of the remains. To address these comments we have added a section to the Supplementary Information (“Description of the sedimentary deposits of the Dinaledi Subsystem”) that reviews previous work on the sedimentary units and formation processes documented in this area. We also include a subsection specifically discussing the term “mud” as used in the description of the sedimentology within the system, as this term has clearly been confusing for nonspecialists who have read and commented on the work. We appreciate the referees’ attention to the previous work and its terminology.

      Redescription of areas of the cave system:

      Reviewer 1 suggests that a detailed reanalysis of all portions of the cave system in and around the Dinaledi Subsystem is warranted to reject the hypothesis that bodies entered the space passively and were scattered from the floor by natural (i.e. noncultural) processes. The referee suggests that National Geographic could help us with these efforts. To address this comment we have made several changes to the manuscript. As noted above, we have added material in Supplementary Information to review the geochronology of the Dinaledi Subsystem and nearby Dragon’s Back Chamber, together with a discussion of the connections between these spaces. 

      Most directly in response to this comment we provide additional documentation of the possibility of movement of bodies or body parts by gravity within the subsystem itself. This includes detailed floor maps based on photogrammetry and LIDAR measurement, where these are physically possible, presented in Figures 2 and 3. In some parts of the subsystem the necessary equipment cannot be used due to the extremely confined spaces, and for these areas our maps are based on traditional survey methods. In addition to plan maps we have included a figure showing the elevation of the subsystem floor in a cross-section that includes key excavation areas, showing their relative elevation. All figures that illustrate excavation areas are now keyed to their location with reference to a subsystem plan. These data have been provided in previous publications but the visualization in the revised manuscript should make the relationship of areas clear for readers. The Introduction now includes text that discusses the configuration of the Hill Antechamber, Dinaledi Chamber, and nearby areas, and also discusses the instances in which gravity-driven movement may be possible, at the same time reviewing that gravity-driven movement from the entry point of the subsystem to most of the localities with hominin skeletal remains is not possible. 

      Within the Results, we have added a section on the relationship of features to their surroundings in order to assist readers in understanding the context of these bone-bearing areas and the evidence this context brings to the hypothesis in question. We have also included within this new section a discussion of the discrete nature of these features, a question that has been raised by outside commentators. 

      Passive sedimentation upon a cave floor or within a natural depression:

      Reviewer 3 suggests that the situation in the Dinaledi Subsystem may be similar to a European cave where a cave bear skeleton might remain articulated on a cave floor (or we can add, within a hollow for hibernation), later to be covered in sediment. The reviewer suggests that articulation is therefore no evidence of burial, and suggests that further documentation of disarticulation processes is essential to demonstrating the processes that buried the remains. We concur that articulation by itself is not sufficient evidence of cultural burial. To address this comment we have included a section in the Results that tests the hypothesis that bodies were exposed upon the cave floor or within a natural depression. To a considerable degree, additional data about disarticulation processes subsequent to deposition are provided in our reanalysis of the Puzzle Box area, including evidence for selective reworking of material after burial. 

      Postdepositional movement and floor drains:

      Reviewer 3 notes that previous work has suggested that subsurface floor drains may have caused some postdepositional movement of skeletal remains. The hypothesis of postdepositional slumping or downslope movement has also been discussed by some external commentators (including Martinón-Torres et al. 2024). We have addressed this question in several places within the revised manuscript. As we now review, previous discussion of floor drains attempted to explain the subvertical orientation of many skeletal elements excavated from the Puzzle Box area. The arrangement of these bones reflects reworking as described in our previous work, and without considering the possibility of reworking by hominins, one mechanism that conceivably might cause reworking was downward movement of sediments into subsurface drains. Further exploration and mapping, combined with additional excavation into the sediments beneath the Puzzle Box area provided more information relevant to this hypothesis. In particular this evidence shows that subsurface drains cannot explain the arrangement of skeletal material observed within the Puzzle Box area. As now discussed in the text, the reworking is selective and initiated from above rather than below. This is best explained by hominin activity subsequent to burial. 

      In a new section of the Results we discuss slumping as a hypothesis for the deposition of the remains. This includes discussion of downslope movement within the Hill Antechamber and the idea that floor drains may have been a mechanism for sediment reworking in and around the Puzzle Box area and Dinaledi Feature 1. As described in this section the evidence does not support these hypotheses. 

      Hypothesis testing and parsimony:

      Referees 1 and 3 and the editorial guidance all suggested that a more appropriate presentation would adopt a null hypothesis and test it. The specific suggestion that the null hypothesis should be a natural sedimentary process of deposition was provided not only by these reviewers but also by some outside commentators. To address this comment, we have edited the manuscript in two ways. The first is the addition of a section to the Discussion that specifically discusses hypothesis testing and parsimony as related to Pleistocene evidence of cultural burial. This includes a brief synopsis of recent disciplinary conversations and citation of work by other groups of authors, none of whom adopted this “null hypothesis” approach in their published work. 

      As we now describe in the manuscript, previous work on the Dinaledi evidence never assumed any role for H. naledi in the burial of remains. Reading the reviewer reports caused us to realize that this previous work had followed exactly the “null hypothesis” approach that some suggested we follow. By following this null hypothesis approach, we neglected a valuable avenue of investigation. In retrospect, we see how this approach impeded us from understanding the pattern of evidence within the Puzzle Box area. Thus in the revised manuscript we have mentioned this history within the Discussion and also presented more of the background to our previous work in the Introduction. Hopefully by including this discussion of these issues, the manuscript will broaden conversation about the relation of parsimony to these issues. 

      Language and presentation style:

      Reviewer 4 criticizes our presentation, suggesting that the text “gives the impression that a hypothesis was formulated before data were collected.” Other outside commentators have mentioned this notion also, including Martinón-Torres et al. (2024) who suggest that the study began from a preferred hypothesis and gathered data to support it. The accurate communication of results and hypotheses in a scientific article is a broader issue than this one study. Preferences about presentation style vary across fields of study as well as across languages. We do not regret using plain language where possible. In any study that combines data and methods from different scientific disciplines, the use of plain language is particularly important to avoid misunderstandings where terms may mean different things in different fields. 

      The essential question raised by these comments is whether it is appropriate to present the results of a study in terms of the hypothesis that is best supported. As noted above, we read carefully many recent studies of Pleistocene burial evidence. We note that in each of these studies that concluded that burial is the best hypothesis, the authors framed their results in the same way as our previous manuscript: an introduction that briefly reviews background evidence for treatment of the dead, a presentation of results focused on how each analysis supports the hypothesis of burial for the case, and then in some (but not all) cases discussion of why some alternative hypotheses could be rejected. We do not infer from this that these other studies started from a presupposition and collected data only to confirm it. Rather, this is a simple matter of presentation style. 

      The alternative to this approach is to present an exhaustive list of possible hypotheses and to describe how the data relate to each of them, at the end selecting the best. This is the approach that we have followed in the revised manuscript, as described above under the direction of the reviewer and editorial guidance. This approach has the advantage of bringing together evidence in different combinations to show how each data point rejects some hypotheses while supporting others. It has the disadvantage of length and repetition. 

      Possible artifact:

      We have chosen to keep the description of the possible artifact associated with the Hill Antechamber Feature in the Supplementary Information. We do this while acknowledging that this is against the opinion of reviewer 4, who felt the description should be removed unless the object in question is fully excavated and physically analyzed. The previous version of the manuscript did not rely upon the stone as positive evidence of grave goods or symbolic content, and it noted that the data do not test whether the possible artifact was placed or was intentionally modified. However this did not satisfy reviewer 4, and some outside commentators likewise asserted that the object must be a “geofact” and that it should be removed. 

      We have three arguments against this line of thinking. First, we do not omit data from our reporting. Whether Homo naledi shaped the rock or not, used it as a tool or not, whether the rock was placed with the body or not, it is unquestionably there. Omitting this one object from the report would be simply dishonest. Second, the data on this rock are at 16 micron resolution. While physical inspection of its surface may eventually reveal trace evidence and will enable better characterization of the raw material, no mode of surface scanning will produce better evidence about the object’s shape. Third, the position of this possible artifact within the feature provides significant information about the deposition of the skeletal material and associated sediments. The pitch, orientation, and position of the stone are not consistent with slow deposition but are consistent with the hypothesis that the surrounding sediment was rapidly emplaced at the same time as the articulated elements less than 2 cm away.

      In the current version, we have redoubled our efforts to provide information about the position and shape of this stone while not presupposing the intentionality of its shape or placement. We add here that the attitude expressed by referee 4 and other commentators, if followed at other sites, would certainly lead to the loss or underreporting of evidence, which we are trying to avoid.  

      Consistency versus variability of behavior:

      As described in the revised manuscript, different features within the Dinaledi Subsystem exhibit some shared characteristics. At the same time, they vary in positioning, representation of individuals and extent of commingling. Other localities within the subsystem and broader cave system present different evidence. Some commentators have questioned whether the patterning is consistent with a single common explanation, or whether multiple explanations are necessary. To address this line of questioning, we have added several elements to the manuscript. We created a new section on secondary cultural burial, discussing whether any of the situations may reflect this practice. In the Discussion, we briefly review the ways in which the different features support the involvement of H. naledi without interpreting anything about the intentionality or meaning of the behavior. We further added a section to the Discussion to consider whether variation among the features reflects variation in mortuary practices by H. naledi. One aspect of this section briefly cites variation in the location and treatment of skeletal remains at other sites with evidence of burial. 

      Grave goods:

      Some commentators have argued that grave goods are a necessary criterion for recognizing evidence of ancient burial. We added a section to the Discussion to review evidence of grave goods at other Pleistocene sites where burial is accepted. 

      References:

      • Dirks, P. H., Berger, L. R., Roberts, E. M., Kramers, J. D., Hawks, J., Randolph-Quinney, P. S., Elliott, M., Musiba, C. M., Churchill, S. E., de Ruiter, D. J., Schmid, P., Backwell, L. R., Belyanin, G. A., Boshoff, P., Hunter, K. L., Feuerriegel, E. M., Gurtov, A., Harrison, J. du G., Hunter, R., … Tucker, S. (2015). Geological and taphonomic context for the new hominin species Homo naledi from the Dinaledi Chamber, South Africa. eLife, 4, e09561. https://doi.org/10.7554/eLife.09561

      • Dirks, P. H., Roberts, E. M., Hilbert-Wolf, H., Kramers, J. D., Hawks, J., Dosseto, A., Duval, M., Elliott, M., Evans, M., Grün, R., Hellstrom, J., Herries, A. I., Joannes-Boyau, R., Makhubela, T. V., Placzek, C. J., Robbins, J., Spandler, C., Wiersma, J., Woodhead, J., & Berger, L. R. (2017). The age of Homo naledi and associated sediments in the Rising Star Cave, South Africa. eLife, 6, e24231. https://doi.org/10.7554/eLife.24231

      • Elliott, M., Makhubela, T., Brophy, J., Churchill, S., Peixotto, B., Feuerriegel, E., Morris, H., Van Rooyen, D., Ramalepa, M., Tsikoane, M., Kruger, A., Spandler, C., Kramers, J., Roberts, E., Dirks, P., Hawks, J., & Berger, L. R. (2021). Expanded Explorations of the Dinaledi Subsystem, Rising Star Cave System, South Africa. PaleoAnthropology, 2021(1), 15–22. https://doi.org/10.48738/2021.iss1.68

      • Fewlass, H., Zavala, E. I., Fagault, Y., Tuna, T., Bard, E., Hublin, J.-J., Hajdinjak, M., & Wilczyński, J. (2023). Chronological and genetic analysis of an Upper Palaeolithic female infant burial from Borsuka Cave, Poland. iScience, 26(12). https://doi.org/10.1016/j.isci.2023.108283

      • Foecke, Kimberly K., Queffelec, Alain, & Pickering, Robyn. (n.d.). No Sedimentological Evidence for Deliberate Burial by Homo naledi – A Case Study Highlighting the Need for Best Practices in Geochemical Studies Within Archaeology and Paleoanthropology. PaleoAnthropology, 2024. https://doi.org/10.48738/202x.issx.xxx

      • Goldberg, P., Aldeias, V., Dibble, H., McPherron, S., Sandgathe, D., & Turq, A. (2017). Testing the Roc de Marsal Neandertal “Burial” with Geoarchaeology. Archaeological and Anthropological Sciences, 9(6), 1005–1015. https://doi.org/10.1007/s12520-013-0163-2

      • Maloney, T. R., Dilkes-Hall, I. E., Vlok, M., Oktaviana, A. A., Setiawan, P., Priyatno, A. A. D., Ririmasse, M., Geria, I. M., Effendy, M. A. R., Istiawan, B., Atmoko, F. T., Adhityatama, S., Moffat, I., Joannes-Boyau, R., Brumm, A., & Aubert, M. (2022). Surgical amputation of a limb 31,000 years ago in Borneo. Nature, 609(7927), 547–551. https://doi.org/10.1038/s41586-022-05160-8

      • Martinón-Torres, M., d’Errico, F., Santos, E., Álvaro Gallo, A., Amano, N., Archer, W., Armitage, S. J., Arsuaga, J. L., Bermúdez de Castro, J. M., Blinkhorn, J., Crowther, A., Douka, K., Dubernet, S., Faulkner, P., Fernández-Colón, P., Kourampas, N., González García, J., Larreina, D., Le Bourdonnec, F.-X., … Petraglia, M. D. (2021). Earliest known human burial in Africa. Nature, 593(7857), Article 7857. https://doi.org/10.1038/s41586-021-03457-8

      • Martinón-Torres, M., Garate, D., Herries, A. I. R., & Petraglia, M. D. (2023). No scientific evidence that Homo naledi buried their dead and produced rock art. Journal of Human Evolution, 103464. https://doi.org/10.1016/j.jhevol.2023.103464

      • Pomeroy, E., Bennett, P., Hunt, C. O., Reynolds, T., Farr, L., Frouin, M., Holman, J., Lane, R., French, C., & Barker, G. (2020a). New Neanderthal remains associated with the ‘flower burial’ at Shanidar Cave. Antiquity, 94(373), 11–26. https://doi.org/10.15184/aqy.2019.207

      • Pomeroy, E., Hunt, C. O., Reynolds, T., Abdulmutalb, D., Asouti, E., Bennett, P., Bosch, M., Burke, A., Farr, L., Foley, R., French, C., Frumkin, A., Goldberg, P., Hill, E., Kabukcu, C., Lahr, M. M., Lane, R., Marean, C., Maureille, B., … Barker, G. (2020b). Issues of theory and method in the analysis of Paleolithic mortuary behavior: A view from Shanidar Cave. Evolutionary Anthropology: Issues, News, and Reviews, 29(5), 263–279. https://doi.org/10.1002/evan.21854

      • Robbins, J. L., Dirks, P. H. G. M., Roberts, E. M., Kramers, J. D., Makhubela, T. V., Hilbert-Wolf, H. L., Elliott, M., Wiersma, J. P., Placzek, C. J., Evans, M., & Berger, L. R. (2021). Providing context to the Homo naledi fossils: Constraints from flowstones on the age of sediment deposits in Rising Star Cave, South Africa. Chemical Geology, 567, 120108. https://doi.org/10.1016/j.chemgeo.2021.120108

      • Wiersma, J. P., Roberts, E. M., & Dirks, P. H. G. M. (2020). Formation of mud clast breccias and the process of sedimentary autobrecciation in the hominin-bearing (Homo naledi) Rising Star Cave system, South Africa. Sedimentology, 67(2), 897–919. https://doi.org/10.1111/sed.12666

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work tried to map the synaptic connectivity between the inputs and outputs of the song premotor nucleus, HVC in zebra finches to understand how sensory (auditory) to motor circuits interact to coordinate song production and learning. The authors optimized the optogenetic technique via AAV to manipulate auditory inputs from a specific auditory area one-by-one and recorded synaptic activity from a neuron with whole-cell recording from slice preparation with identification of the projection area by retrograde neuronal tracing. This thorough and detailed analysis provides compelling evidence of synaptic connections between 4 major auditory inputs (3 forebrain and 1 thalamic region) within three projection neurons in the HVC; all areas give monosynaptic excitatory inputs and polysynaptic inhibitory inputs, but proportions of projection to each projection neuron varied. They also find specific reciprocal connections between mMAN and Av. Taken together the authors provide the map of the synaptic connection between intercortical sensory to motor areas which is suggested to be involved in zebra finch song production and learning.

      Strengths:

      The authors optimized optogenetic tools with eGtACR1 by using AAV, which allows them to manipulate synaptic inputs in a projection-specific manner in zebra finches. They also identify HVC cell types based on projection area. With their technical advance and thorough experiments, they provided a detailed map of synaptic connections.

      Weaknesses:

      As it is the study in brain slice, the functional implication of synaptic connectivity is limited. Especially as all the experiments were done in the adult preparation, there could be a gap in discussing the functions of developmental song learning.

      We thank the reviewer for their appreciation of our work. Although we agree that there can be limitations to brain slice preparations, the approaches used here for synaptic connectivity mapping are well-designed to identify long-range synaptic connectivity patterns. Optogenetic stimulation of axon terminals in brain slices does not require intact axons and works well when axons are cut, allowing identification of all inputs expressing optogenetic channels from afferent regions. Terminal stimulation in slices yields stable post-synaptic responses for hours without rundown, assuring that polysynaptic and monosynaptic connections can be reliably identified in our brain slices. Additionally, conducting similar types of experiments in vivo can run into important limitations. First, the extent of TTX and 4-AP diffusion, which is necessary for identification of long-range monosynaptic connections, can be difficult to verify in vivo, potentially confounding identification of monosynaptic connectivity. Second, conducting whole-cell patch-clamp experiments in vivo, particularly in deeper brain regions, is technically challenging, and would limit the number of cells that can be patched and increase the number of animals needed.

      We agree that there may well be important differences between adult connectivity and connectivity patterns in the juvenile brain. Indeed, learning and experience during development almost certainly shape connectivity patterns and these patterns of connectivity may change incrementally and/or dynamically during development. Ultimately, adult connectivity patterns are the result of changes in the brain that accrue over development. Given that this is the first study mapping long-range connectivity of HVC input-output pathways, we reasoned that the adult connectivity would provide a critical reference allowing future studies to map different stages of juvenile connectivity and the changes in connectivity driven by milestones like forming a tutor song memory, sensorimotor learning, and song crystallization.

      In this revision we worked to better highlight the points raised above and thank the reviewer for their comments.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes synaptic connectivity in the songbird cortex's four main classes of sensory neuron afferents onto three known classes of projection neurons of the pre-motor cortical region HVC. HVC is a region associated with the generation of learned bird songs. Investigators here use all male zebra finches to examine the functional anatomy of this region using patch clamp methods combined with optogenetic activation of select neuronal groups.

      Strengths:

      The quality of the recordings is extremely high and the quantity of data is on a very significant scale, this will certainly aid the field.

      Weaknesses:

      The authors could make the figures a little easier to navigate. Most of the figures use actual anatomical images, but it would be nice to have these linked with a zebra finch atlas in more of a cartoon format accompanying each fluorescence image. Additionally, for the most part, figures showing the labeling lack scale bar values (in µm). These should be added to the figures, not just given in the legends.

      The authors could make it clear in the abstract that this is all male zebra finches - perhaps this is obvious given the bird song focus, but it should be stated. The number of recordings from each neuron class and the overall number of birds employed should be clearly stated in the methods (this is in the figures, but it should say n=birds or cells as appropriate).

      The authors should consider sharing the actual electrophysiology records as data.

      We thank the reviewer for their assessment of our research and suggestions. We have implemented many of these suggestions and provide details in our response to their specific Recommendations. Additionally, we are organizing our data and will make it publicly available with the version of record.

      Reviewer #3 (Public review):

      Nucleus HVC is critical both for song production as well as learning and arguably, sitting at the top of the song control system, is the most critical node in this circuit receiving a multitude of inputs and sending precisely timed commands that determine the temporal structure of song. The complexity of this structure and its underlying organization seem to become more apparent with each experimental manipulation, and yet our understanding of the underlying circuit organization remains relatively poorly understood. In this study, Trusel and Roberts use classic whole-cell patch clamp techniques in brain slices coupled with optogenetic stimulation of select inputs to provide a careful characterization and quantification of synaptic inputs into HVC. By identifying individual projection neurons using retrograde tracer injections combined with pharmacological manipulations, they classify monosynaptic inputs onto each of the three main classes of glutamatergic projection neurons in HVC (RA-, Area X- and Av-projecting neurons). This study is remarkable in the amount of information that it generates, and the tremendous labor involved for each experiment, from the expression of opsins in each of the target inputs (Uva, NIf, mMAN, and Av), the retrograde labelling of each type of projection neuron, and ultimately the optical stimulation of infected axons while recording from identified projection neurons. Taken together, this study makes an important contribution to increasing our identification, and ultimately understanding, of the basic synaptic elements that make up the circuit organization of HVC, and how external inputs, which we know to be critical for song production and learning, contribute to the intrinsic computations within this critical circuit.

      This study is impressive in its scope, rigorous in its implementation, and thoughtful regarding its limitations. The manuscript is well-written, and I appreciate the clarity with which the authors use our latest understanding of the evolutionary origins of this circuit to place these studies within a larger context and their relevance to the study of vocal control, including human speech. My comments are minor and primarily about legibility, clarification of certain manipulations, and organization of some of the summary figures.

      We thank the reviewer for their thoughtful assessment of our research.

      Recommendations for the authors:

      The following recommendations were considered by all reviewers to be important to incorporate for improving this paper:

      (1) Clarify the site of viral injection and the possibility of labeling other structures a) Show images of viral injection sites.

      We provide a representative image of viral expression for each pathway studied in this manuscript. Please see panel A in Figures 2-3 and 5-6 showing our viral expression in Uva, NIf, mMAN, and Av respectively.  

      b) Include in discussion caveats that the virus may spread beyond the boundaries of structures (e.g. especially injections into NIF could spread into Field L).

      For each HVC afferent nucleus we have now included a sentence describing the possible spread of viral infection in surrounding structures in the Results. We also now expanded the image from the Av section to include NIf, to showcase lack of viral expression in NIf (see Fig. 6A).

      (2) Clarify the logic and precise methods of the TTX and 4-AP experiments

      a) Please see the detailed issue raised by Reviewer 3, Major Point 1 below.

      The TTX and 4AP application is the gold-standard of opsin-assisted synaptic circuit interrogation, pioneered by the Svoboda lab in 2009 (Petreanu, Mao et al. 2009) and widely used to assess monosynaptic connectivity in multiple brain circuits, as summarized in a recent review (Linders, Supiot et al. 2022). We now better describe the logic of this approach in the second paragraph of the Results section and cite the first description of this method from the Svoboda lab and a recent review weighing this method with other optogenetic methods for tracing synaptic connections in the brain.

      (3) Include caveats in discussion

      a) Note that there may be other inputs to HVC that were not examined in this study (e.g. CMM, Field L)

      In our original manuscript we did state “Although a complete description of HVC circuitry will require the examination of other potential inputs (i.e. RA<sub>HVC</sub> PNs, A11 glutamatergic neurons (Roberts, Klein et al. 2008, Ben-Tov, Duarte et al. 2023)) and a characterization of interneuron synaptic connectivity, here we provide a map of the synaptic connections between the 4 best described afferents to HVC and its 3 populations of projection neurons” in the last paragraph of the Discussion. We have now edited this sentence to include the projection from NCM to HVC and cited Louder et al., 2024.

      We have extensively mapped input pathways to HVC, and consistent with Vates (Vates, Broome et al. 1996) we have not found evidence that Field L projects to HVC. Rather, it projects to the shelf region outside of HVC. Consistent with this, we do not see retrogradely labeled neurons in Field L following tracer injections confined to HVC (see Fig. 3G). Additionally, we find that CM projections to HVC arise from the nucleus Avalanche (Roberts, Hisey et al. 2017), which we specifically examine in this study. We do not dispute that there may be other pathways projecting to HVC that will need to be examined in the future, including known projections from neuromodulatory regions and RA, from developmentally restricted pathway(s) like NCM (Louder, Kuroda et al. 2024), and from yet unidentified pathways.

      b) Also note that birds in this study were adults and that some inputs to HVC likely to be important for learning may recede during development (e.g. Louder et al, 2024).

      In the second to last paragraph of the Discussion we now state: While our opsin-assisted circuit mapping provides us with a new level of insight into HVC synaptic circuitry, there are limitations to this research that should be considered. All circuit mapping in this study was carried out in brain slices from adult male zebra finches. Future studies will be needed to examine how this adult connectivity pattern relates to patterns of connectivity in juveniles during sensory or sensorimotor phases of vocal learning and connectivity patterns in female birds.   

      (4) Consider cosmetic changes to figures as suggested by Reviewers 2-3 below.

      We thank the reviewers for their suggestions and have implemented the changes as best we can.

      (5) Address all minor issues raised below.

      Reviewer #1 (Recommendations for the authors):

      I see this study is well designed to answer the author's specific question, mapping synaptic auditory-motor connections within HVC. Their experiments with advanced techniques of projection-specific optogenetic manipulation of synaptic inputs and retrograde identification of projection areas revealed input-output combination selective synaptic mapping.

      As I found this study advanced our knowledge with the compelling dataset, I have only some minor comments here.

      (1) One technical concern is we don't see how much the virus infection was focused on the target area and if we can ignore the effect of synaptic connectivity from surrounding areas. As the amount of virus they injected is large (1.5 µl) and target areas are small, we assume the virus might spread to the surrounding area, such as Field L, which also projects to HVC, when targeting NIf. While I think the majority of the projections were from their target areas, it would be better to mention (also the images with larger view areas) the possibilities of projections of surrounding areas.

      We agree with the reviewer's concern about the specificity of viral expression. For this reason, we included sample images of the viral expression in each target area (panel A in Fig. 2,3,5,6). We have now also included a sentence at the beginning of each subsection of our Results to describe how we have ensured interpretability of the results. Uva and mMAN’s surrounding areas are not known to project to HVC. Possible cross-infection is an issue for Av and NIf, and we checked each bird’s injection site to ensure that eGtACR1+ cells were not visible in the unintended HVC-projecting areas.

      As mentioned in our response to the public comment, consistent with Vates (Vates, Broome et al. 1996) we do not see evidence that Field L projects directly to HVC (see Fig. 3G).

      (2) Another technical concern is the damage to axonal projections. While I understand the authors stimulated axonal terminals, axonal projections were assumed to be cut and their ability to release neurotransmitters would be reduced, especially after long-term survival or repeated stimulation. Mentioning whether projection pathways were within their 230 µm thick slice (probably depends on input sites) or not, and the effect of axonal cut, would be helpful.

      We agree that slice electrophysiology has limitations. However, we disagree with the claim of reduced reliability or stability of the evoked response. We and others find that electrical and optogenetic repeated terminal stimulation in slices can yield stable post-synaptic responses for tens of minutes and even hours (Bliss and Gardner-Medwin 1973, Bliss and Lomo 1973, Liu, Kurotani et al. 2004, Pastalkova, Serrano et al. 2006, Xu, Yu et al. 2009, Trusel, Cavaccini et al. 2015, Trusel, Nuno-Perez et al. 2019). Indeed, long-term synaptic plasticity experiments in most preparations and across brain areas rely on such stability of the presynaptic machinery for synaptic release, despite axons being severed from their parent soma. Our assumption is that the vast majority, if not all, connections between axon terminals and their cell body in the afferent regions have been cut in our preparations. Nonetheless, the diversity of outcomes we report (currents returning after TTX+4AP or not, depending on the specific combination of input and HVC-PN class) is consistent with the robustness of the synaptic interrogation method.

      (3) While I understand this study focused on 4 major input areas and the authors provide good pictures of synaptic HVC connections from those areas, HVC has been reported to receive auditory inputs from other areas as well (CMM, Field L, etc.). It is worth mentioning that there are other auditory inputs and would be interesting to discuss coordination with the inputs from other areas.

      We have extensively mapped input pathways to HVC, and consistent with Vates (Vates, Broome et al. 1996) we have not found evidence that Field L projects to HVC. Rather, it projects to the shelf region outside of HVC. Consistent with this, we do not see retrogradely labeled neurons in Field L following tracer injections confined to HVC (see Fig. 3G). Additionally, we find that CM projections to HVC arise from the nucleus Avalanche (Roberts, Hisey et al. 2017), which we specifically examine in this study. We do not dispute that there may be other pathways projecting to HVC that will need to be examined in the future, including known projections from neuromodulatory regions and RA, from developmentally restricted pathway(s) like NCM (Louder, Kuroda et al. 2024), and from yet unidentified pathways.

      (4) The HVC local neuronal connections have been reported to be modified, and a recent study revealed transient auditory inputs into HVC during the song learning period. The authors discuss the functions of HVC synaptic connections in song learning (the title also says synaptic connection for song learning); however, the experiments were done in adults and do not discuss the possibility of different synaptic connection mapping in juveniles in the song learning period. Mentioning the neuronal activities and connectivity changes during song learning is important. Also, it would be helpful for the readers to discuss the potential differences between juveniles/adults if they want to discuss the functions of song learning.

      We now mention in the Discussion that this is an important caveat of our research and that future studies will be needed to examine how these adult connectivity patterns relate to connectivity patterns in juveniles during sensory or sensorimotor phases of vocal learning and connectivity patterns in female birds. Nonetheless, the title and abstract cite song learning because it is important for the broader public to understand that at least some of these afferent brain regions carry an essential role in song learning (Foster and Bottjer 2001, Roberts, Gobes et al. 2012, Roberts, Hisey et al. 2017, Zhao, Garcia-Oscos et al. 2019, Koparkar, Warren et al. 2024).

      Reviewer #2 (Recommendations for the authors):

      The work is very detailed and will be an important resource to those working in the field. The recordings are of a high quality and lots of information is included, such as measures of response kinetics, amplitude, and pharmacological confirmation of excitatory and inhibitory synaptic responses. In general, I feel the quality is extremely high and the quantity of data is on a very significant, exhaustive scale that will certainly aid the field. I have come to this conclusion as a non zebra finch person, but I feel the connection information shown will be of benefit given its high quality.

      Figure 7 is a nice way of showing the overall organization. Optional suggestion, consider highlighting anything in Figure 7 that results in a new understanding of the song system as compared to previous work on anatomy and function.

      We thank the reviewer for the kind comments about our research. We have highlighted our newly found connection between mMAN and Av. All the connections onto the HVC PNs in Panel B are also newly identified in this study.

      Reviewer #3 (Recommendations for the authors):

      Major points

      (1) Clarification regarding methods for determining monosynaptic events:

      One of the manipulations that I struggled the most with was those describing the use of TTX + 4AP to isolate monosynaptic events. Not being as familiar with the use of optically based photostimulation of axons to release transmitter locally, I was initially confused by statements such as "we found that oEPSC returned after application of TTX+4AP". This might be clear to someone performing these manipulations, but a bit more clarification would be helpful. Should I assume that an existing monosynaptic EPSC would be masked by co-occurring polysynaptic IPSCs which disappear following application of TTX + 4AP, thereby unmasking the monosynaptic EPSC, thereby causing the EPSC to "return"? A word that I am not sure works. Continuing my confusion with these experiments, I am unsure how this cocktail of drugs is added, if it is even added as a cocktail, which is what I initially assumed. The methods and the results are not so clear if they are added in sequence and why and if traces are recorded after the addition of both drugs or if they are recorded for TTX and then again for TTX + 4AP. Finally, looking at the traces in the experimental figures (e.g. Figures 2F, 3F, 5F, and 6F), it is difficult to see what is being shown, at least for me. First, the authors need to describe better in the results why they stimulate twice in short succession and why they seem to use the response to the second pulse (unless I am mistaken) to measure the monosynaptic event. Second, I was confused by the traces (which are very small) in the presence of TTX. I would have expected to see a response if there was a monosynaptic EPSC but I only seem to see a flat line.

      The confusion that I list above might be due in part to my ignorance, but it is important in these types of papers not to assume too much expertise if you want readers with a less sophisticated understanding of synaptic physiology to understand the data. In other words, a little bit more clarity and hand-holding would be welcome.

      We understand the reviewer’s confusion about the methodology. In voltage clamp, the amplifier injects current through the electrode, maintaining the membrane voltage at -70 mV, which is near the equilibrium potential for Cl-. Therefore, the only synaptic current evoked by light stimulation is due to cation influx, mainly through AMPA receptors (see Fig. 1), and co-occurring polysynaptic IPSCs wouldn’t be visible. We examine those by holding the membrane voltage at +10 mV (see Fig. 1). TTX application suppresses voltage-dependent Na+ channels and therefore stops all neurotransmission. We show the traces upon TTX to show that the currents we were recording prior to TTX application were of synaptic origin, and not due to accidental expression of opsin in the patched cell. Also, this ensures that any current visible after 4AP application is due to monosynaptic transmission and not to a failure of TTX application.

      After recording and light stimulation with TTX, we then add 4AP, which is a blocker of presynaptic K+ channels. This prevents the repolarization of the terminals that would occur in response to opsin-mediated local depolarization. 4AP application, therefore, allows local opsin-driven depolarizations to reach the threshold for Ca2+-dependent vesicle docking and release. This procedure selectively reveals or unmasks the monosynaptic currents because any non-monosynaptically connected neuron would still need voltage-dependent Na+ channels to effectively produce indirect neurotransmission onto the patched cell. The TTX and 4AP application is the gold-standard of opsin-assisted synaptic circuit interrogation, pioneered by the Svoboda lab in 2009 and widely used to assess monosynaptic connectivity in multiple brain circuits, as summarized in a recent review (Linders et al., 2022). We now include 2 more sentences near the beginning of the Results to clarify this process and directly point to the Linders review for researchers wanting a deeper explanation of this technique.

      The double stimulation is unrelated to our testing of monosynaptic connections. We originally conducted the experiments by delivering 2 pulses of light separated by 50 ms, a common way to examine the paired-pulse ratio (PPR), a physiological measure used to probe synapses for short-term plasticity and release probability. However, through discussions with colleagues we realized that the slow decay time of eGtACR1 may complicate interpretation of the response to the second light pulse. Thus, we elected not to report these results and indicated this in the Methods section: “We calculated the paired-pulse ratio (PPR) as the amplitude of the second peak divided by the amplitude of the first peak elicited by the twin stimuli, however due to slow kinetics of eGtACR1 the results would be difficult to interpret, and therefore we are not currently reporting them.”
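      The ratio quoted from the Methods can be sketched as follows. This is only an illustration, assuming the two peak amplitudes have already been measured from the trace; the function name and the example amplitudes are hypothetical and are not values from the study.

```python
def paired_pulse_ratio(first_peak_pa, second_peak_pa):
    """Amplitude of the second peak divided by the amplitude of the first."""
    if first_peak_pa == 0:
        raise ValueError("first peak amplitude must be non-zero")
    return second_peak_pa / first_peak_pa


# Hypothetical inward-current peak amplitudes in pA (illustrative only).
ppr = paired_pulse_ratio(-120.0, -90.0)
print(ppr)
```

      A ratio below 1 would conventionally be read as paired-pulse depression and a ratio above 1 as facilitation, although, as noted above, slow opsin kinetics can confound that reading.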

      (2) Suggestions for improving summary figures:

      Summary Figure 1a: The circuit diagram (schematic to the right of 1a) is OK but I initially found it a bit difficult to interpret. For example, it is not clear why pink RA projecting neurons don't reach as far to the right as X or Av projecting neurons, suggesting that they are not really projection neurons. Also, the big question marks in the intermediate zone are not entirely intuitive. It seems there might be a better way of representing this. It might also be worth stating in the figure legend that the interconnectivity patterns shown in the figure between PNs in HVC are based on specific prior studies.

      We thank the reviewer for the constructive criticism. We have modified the figure to extend the RA projection line and mentioned in the figure legend that connectivity between PNs is based on prior studies.

      Summary Figure 1a: I am not sure I love this figure. There are a few minor issues. First, there are too many browns [NIf/Av and mMAN], which makes it more challenging to clearly disambiguate the different projections. Second, it is unclear why this figure does not represent projections from RA to HVC. My biggest concern with this figure is that it oversimplifies some of the findings. From the figure, one gets the impression that Uva only projects to RA-PNs and that Av only projects to X-PNs even though the authors show connections to other PNs. With the small sample size in this current study for each projection and each PN type, one really cannot rule out that these "minority" projections are not important. I, therefore, suggest that the authors qualitatively represent the strength/probability of connections by weighting with thickness of afferent connections.

      We assume the reviewer is commenting on our summary figure panel 7B. We agree with the referee that this is a simplified representation of our findings. We had indeed indicated in the legend that this was just a “Schematic of the HVC afferent connectivity map resulting from the present work” and that “For conceptualization purposes, afferent connectivity to HVC-PNs is shown only when the rate of monosynaptic connectivity reaches 50% of neurons examined”. We have added a title to highlight that this is but a simplification. We have now adjusted the colors to make the figure easier to follow. Based on the reviewer's critique we searched for a better method for summarizing the complex connectivity patterns described in this research. We settled on a Sankey diagram of connectivity. This is now Figure 7C. In this diagram, we are able to show the proportion of connections from each input pathway onto each class of neuron and whether these connections are poly- or monosynaptic. We find this to be a straightforward way of displaying all of the connectivity patterns identified in our figure 2-3 and 4-5 and look forward to understanding if the reviewers find this a useful way of illustrating our findings.
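      The 50% display rule quoted from the legend can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the function name, the pathway labels, and all counts below are hypothetical placeholders, not data from the study.

```python
def connections_to_draw(counts, threshold=0.5):
    """Return the (input, PN class) pairs whose monosynaptic rate reaches the threshold.

    counts maps (input, pn_class) -> (n_monosynaptic, n_tested).
    """
    shown = []
    for pair, (n_mono, n_tested) in counts.items():
        if n_tested > 0 and n_mono / n_tested >= threshold:
            shown.append(pair)
    return shown


# Hypothetical placeholder counts (illustrative only, not from the paper).
example_counts = {
    ("input_A", "PN_class_1"): (6, 6),
    ("input_A", "PN_class_2"): (2, 8),
    ("input_B", "PN_class_1"): (4, 8),
}
shown = connections_to_draw(example_counts)
print(shown)
```

      A thresholded schematic of this kind hides pairs that fall below the cutoff, which is why a proportional display such as the Sankey diagram conveys the minority connections more completely.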

      Minor points:

      (1) Line 50 - typo - song circuits.

      Thank you for catching this.

      (2) Line 106 - 111 - The findings suggest that 100% of Uva projections onto HVCRA neurons are monosynaptic. However, because the authors only tested 6 neurons, their statements that their findings are so different from other studies should be somewhat tempered, since these other studies (e.g. Moll et al.) looked at 251 neurons in HVC and sampling bias could still somewhat explain the difference.

      We observed oEPSCs in 43 of 51 (84.3%) HVC-RA neurons recorded (mean rise time = 2.4 ms) and monosynaptic connections onto 100% of the HVC-RA neurons tested (n = 6). Moll et al. combined electrical stimulation of Uva with two-photon calcium imaging (GCaMP6s) of putative HVC-RA neurons (n = 251 neurons). We should note that these are putative HVC-RA neurons because they were not visually identified using retrograde tracing or using some other molecular handle. They report that only ~16% of HVC-RA neurons showed reliable calcium responses following Uva stimulation. Although the experiments by Moll et al. are technically impressive, calcium imaging is an insensitive technique for measuring post-synaptic responses, particularly subthreshold responses, when compared to whole-cell patch-clamp recordings. This approach cannot identify monosynaptic connections and is likely sensitive only to suprathreshold activity, which probably relies on recruitment of other polysynaptic inputs onto the neurons in HVC. Furthermore, as indicated in the Discussion, our opsin-mediated synaptic interrogation recruits any eGtACR1+ Uva terminal in the slice and therefore will have a great likelihood of revealing any existing connections.

      A limitation of whole-cell patch-clamp recording is that it is a laborious, low-throughput technique. Future experiments using better imaging approaches, like voltage imaging, may be able to weigh in on differences between what we report here using whole-cell patch-clamp recordings from visually identified HVC-RA neurons combined with optogenetic manipulations of Uva terminals and the calcium imaging results reported by Moll. Nonetheless, whole-cell patch-clamp recordings combined with optogenetic manipulations are likely to remain the most sensitive method for identifying synaptic connectivity.

      (3) Figure 2G - the significance of white circles is not clear.

      The figure legend indicates that those highlight and mark the position of “retrogradely labeled HVC-projecting neurons in Uva (cyan, white circles)” to facilitate identification of colocalization with the in-situ markers.

      (4) Line 135 - Cardin et al. (J. Neurophys. 2004) is the first to show that song production does not require Nif.

      We thank the reviewer for pointing this out and we have cited this important study.

      (5) Line 183 - This is a confusing sentence because I initially thought that mMAN-mMANHVC PNs was a category!

      We switched the dash with a colon.

      (6) Figure 4d could use some arrows to identify what is shown. It is assumed that the box represents mMAN. Should it be assumed that Av is not in the plane of this section? If not, this should be stated in the legend. It is also unclear where the anterograde projections are. Is this the dark highway that goes from the box to the dorsal surface? If yes, this should be indicated, but it should also be made clear why the projections go both in the dorsal as well as the ventral directions.

      The inset, as indicated by the lines around it, is a magnification of the terminal fields in Av. We added an explanation of the inset.

      (7) Discussion. In the introduction, the authors mention projections from RA to HVC but never end up studying them in the current manuscript which seems like a missed opportunity and perhaps even a weakness of the study. In the discussion, it would certainly be good for the authors to at least discuss the possible significance of these projections and perhaps why they decided not to study them.

      We thank the reviewer for the comment. Unfortunately, we couldn’t reliably evoke interpretable currents from RA, and we elected to publish the current version of the paper with these 4 major inputs. Nonetheless, we have indicated in the Introduction and in the Discussion that more inputs (e.g. RA, A11, NCM) remain to be evaluated. 

      (8) Line 622 - Is this reference incomplete?

      We thank the reviewer. We have corrected the reference.

      • Ben-Tov, M., F. Duarte and R. Mooney (2023). "A neural hub for holistic courtship displays." Curr Biol 33(9): 1640-1653 e1645.

      • Bliss, T. V. and A. R. Gardner-Medwin (1973). "Long-lasting potentiation of synaptic transmission in the dentate area of the unanaestetized rabbit following stimulation of the perforant path." J Physiol 232(2): 357-374.

      • Bliss, T. V. and T. Lomo (1973). "Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path." J Physiol 232(2): 331-356.

      • Foster, E. F. and S. W. Bottjer (2001). "Lesions of a telencephalic nucleus in male zebra finches: Influences on vocal behavior in juveniles and adults." J Neurobiol 46(2): 142-165.

      • Koparkar, A., T. L. Warren, J. D. Charlesworth, S. Shin, M. S. Brainard and L. Veit (2024). "Lesions in a songbird vocal circuit increase variability in song syntax." Elife 13.

      • Linders, L. E., L. F. Supiot, W. Du, R. D'Angelo, R. A. H. Adan, D. Riga and F. J. Meye (2022). "Studying Synaptic Connectivity and Strength with Optogenetics and Patch-Clamp Electrophysiology." Int J Mol Sci 23(19).

      • Liu, H. N., T. Kurotani, M. Ren, K. Yamada, Y. Yoshimura and Y. Komatsu (2004). "Presynaptic activity and Ca2+ entry are required for the maintenance of NMDA receptor-independent LTP at visual cortical excitatory synapses." J Neurophysiol 92(2): 1077-1087.

      • Louder, M. I. M., M. Kuroda, D. Taniguchi, J. A. Komorowska-Muller, Y. Morohashi, M. Takahashi, M. Sanchez-Valpuesta, K. Wada, Y. Okada, H. Hioki and Y. Yazaki-Sugiyama (2024). "Transient sensorimotor projections in the developmental song learning period." Cell Rep 43(5): 114196.

      • Pastalkova, E., P. Serrano, D. Pinkhasova, E. Wallace, A. A. Fenton and T. C. Sacktor (2006). "Storage of spatial information by the maintenance mechanism of LTP." Science 313(5790): 1141-1144.

      • Petreanu, L., T. Mao, S. M. Sternson and K. Svoboda (2009). "The subcellular organization of neocortical excitatory connections." Nature 457(7233): 1142-1145.

      • Roberts, T. F., S. M. Gobes, M. Murugan, B. P. Olveczky and R. Mooney (2012). "Motor circuits are required to encode a sensory model for imitative learning." Nat Neurosci 15(10): 1454-1459.

      • Roberts, T. F., E. Hisey, M. Tanaka, M. G. Kearney, G. Chattree, C. F. Yang, N. M. Shah and R. Mooney (2017). "Identification of a motor-to-auditory pathway important for vocal learning." Nat Neurosci 20(7): 978-986.

      • Roberts, T. F., M. E. Klein, M. F. Kubke, J. M. Wild and R. Mooney (2008). "Telencephalic neurons monosynaptically link brainstem and forebrain premotor networks necessary for song." J Neurosci 28(13): 3479-3489.

      • Trusel, M., A. Cavaccini, M. Gritti, B. Greco, P. P. Saintot, C. Nazzaro, M. Cerovic, I. Morella, R. Brambilla and R. Tonini (2015). "Coordinated Regulation of Synaptic Plasticity at Striatopallidal and Striatonigral Neurons Orchestrates Motor Control." Cell Rep 13(7): 1353-1365.

      • Trusel, M., A. Nuno-Perez, S. Lecca, H. Harada, A. L. Lalive, M. Congiu, K. Takemoto, T. Takahashi, F. Ferraguti and M. Mameli (2019). "Punishment-Predictive Cues Guide Avoidance through Potentiation of Hypothalamus-to-Habenula Synapses." Neuron 102(1): 120-127.e124.

      • Vates, G. E., B. M. Broome, C. V. Mello and F. Nottebohm (1996). "Auditory pathways of caudal telencephalon and their relation to the song system of adult male zebra finches." Journal of Comparative Neurology 366(4): 613-642.

      • Xu, T., X. Yu, A. J. Perlik, W. F. Tobin, J. A. Zweig, K. Tennant, T. Jones and Y. Zuo (2009). "Rapid formation and selective stabilization of synapses for enduring motor memories." Nature 462(7275): 915-919.

      • Zhao, W., F. Garcia-Oscos, D. Dinh and T. F. Roberts (2019). "Inception of memories that guide vocal learning in the songbird." Science 366: 83 - 89.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Wang et al., recorded concurrent EEG-fMRI in 107 participants during nocturnal NREM sleep to investigate brain activity and connectivity related to slow oscillations (SO), sleep spindles, and in particular their co-occurrence. The authors found SO-spindle coupling to be correlated with increased thalamic and hippocampal activity, and with increased functional connectivity from the hippocampus to the thalamus and from the thalamus to the neocortex, especially the medial prefrontal cortex (mPFC). They concluded the brain-wide activation pattern to resemble episodic memory processing, but to be dissociated from task-related processing and suggest that the thalamus plays a crucial role in coordinating the hippocampal-cortical dialogue during sleep.

      The paper offers an impressively large and highly valuable dataset that provides the opportunity for gaining important new insights into the network substrate involved in SOs, spindles, and their coupling. However, the paper does unfortunately not exploit the full potential of this dataset with the analyses currently provided, and the interpretation of the results is often not backed up by the results presented. I have the following specific comments.

      Thank you for your thoughtful and constructive feedback. We greatly appreciate your recognition of the strengths of our dataset and findings. Below, we address your specific comments and provide responses to each point you raised to ensure our methods and results are as transparent and comprehensible as possible. We hope these revisions address your comments and further strengthen our manuscript.

      (1) The introduction is lacking sufficient review of the already existing literature on EEG-fMRI during sleep and the BOLD-correlates of slow oscillations and spindles in particular (Laufs et al., 2007; Schabus et al., 2007; Horovitz et al., 2008; Laufs, 2008; Czisch et al., 2009; Picchioni et al., 2010; Spoormaker et al., 2010; Caporro et al., 2011; Bergmann et al., 2012; Hale et al., 2016; Fogel et al., 2017; Moehlman et al., 2018; Ilhan-Bayrakci et al., 2022). The few studies mentioned are not discussed in terms of the methods used or insights gained.

      We acknowledge the need for a more comprehensive review of prior EEG-fMRI studies investigating BOLD correlates of slow oscillations and spindles. However, not all of these articles relate to sleep SOs or spindles. Several (Hale et al., 2016; Horovitz et al., 2008; Laufs, 2008; Laufs, Walker, & Lund, 2007; Spoormaker et al., 2010) mainly focus on methodology for EEG-fMRI, sleep stages, or brain networks, which are not the focus of our study. Thank you again for your attention to the comprehensiveness of our literature review. We will expand the introduction to include a more detailed discussion of the existing literature, ensuring that the contributions of previous EEG-fMRI sleep studies are adequately acknowledged.

      Introduction, Page 4 Lines 62-76

      “Investigating these sleep-related neural processes in humans is challenging because it requires tracking transient sleep rhythms while simultaneously assessing their widespread brain activation. Recent advances in simultaneous EEG-fMRI techniques provide a unique opportunity to explore these processes. EEG allows for precise event-based detection of neural signal, while fMRI provides insight into the broader spatial patterns of brain activation and functional connectivity (Horovitz et al., 2008; Huang et al., 2024; Laufs, 2008; Laufs, Walker, & Lund, 2007; Schabus et al., 2007; Spoormaker et al., 2010). Previous EEG-fMRI studies on sleep have focused on classifying sleep stages or examining the neural correlates of specific waves (Bergmann et al., 2012; Caporro et al., 2012; Czisch et al., 2009; Fogel et al., 2017; Hale et al., 2016; Ilhan-Bayrakcı et al., 2022; Moehlman et al., 2019; Picchioni et al., 2011). These studies have generally reported that slow oscillations are associated with widespread cortical and subcortical BOLD changes, whereas spindles elicit activation in the thalamus, as well as in several cortical and paralimbic regions. Although these findings provide valuable insights into the BOLD correlates of sleep rhythms, they often do not employ sophisticated temporal modeling (Huang et al., 2024), to capture the dynamic interactions between different oscillatory events, e.g., the coupling between SOs and spindles.”

      (2) The paper falls short in discussing the specific insights gained into the neurobiological substrate of the investigated slow oscillations, spindles, and their interactions. The validity of the inverse inference approach ("Open ended cognitive state decoding"), assuming certain cognitive functions to be related to these oscillations because of the brain regions/networks activated in temporal association with these events, is debatable at best. It is also unclear why eventually only episodic memory processing-like brain-wide activation is discussed further, even though 16 of 50 feature terms from the NeuroSynth v3 dataset were significant (episodic memory, declarative memory, working memory, task representation, language, learning, faces, visuospatial processing, category recognition, cognitive control, reading, cued attention, inhibition, and action).

      Thank you for pointing this out, particularly regarding the use of inverse inference approaches such as “open-ended cognitive state decoding.” Given the concerns about the indirectness of this approach, we decided to remove its related content and results from Figure 3 in the main text and include it in Supplementary Figure 7. We will refocus the main text on direct neurobiological insights gained from our EEG-fMRI analyses, particularly emphasizing the hippocampal-thalamocortical network dynamics underlying SO-spindle coupling, and we will acknowledge the exploratory nature of these findings and highlight their limitations.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) Hippocampal activation during SO-spindles is stated as a main hypothesis of the paper - for good reasons - however, other regions (e.g., several cortical as well as thalamic) would be equally expected given the known origin of both oscillations and the existing sleep-EEG-fMRI literature. However, this focus on the hippocampus contrasts with the focus on investigating the key role of the thalamus instead in the Results section.

      We appreciate your insight regarding the relative emphasis on hippocampal and thalamic activation in our study. We recognize that the manuscript may currently present an inconsistency between our initial hypothesis and the main focus of the results. To address this concern, we will ensure that our Introduction and Discussion sections explicitly discuss both regions, highlighting the complementary roles of the hippocampus (memory processing and reactivation) and the thalamus (spindle generation and cortico-hippocampal coordination) in SO-spindle dynamics.

      Introduction, Page 5 Lines 87-103

      “To address this gap, our study investigates brain-wide activation and functional connectivity patterns associated with SO-spindle coupling, and employs a cognitive state decoding approach (Margulies et al., 2016; Yarkoni et al., 2011)—albeit indirectly—to infer potential cognitive functions. In the current study, we used simultaneous EEG-fMRI recordings during nocturnal naps (detailed sleep staging results are provided in the Methods and Table S1) in 107 participants. Although directly detecting hippocampal ripples using scalp EEG or fMRI is challenging, we expected that hippocampal activation in fMRI would coincide with SO-spindle coupling detected by EEG, given that SOs, spindles, and ripples frequently co-occur during NREM sleep. We also anticipated a critical role of the thalamus, particularly thalamic spindles, in coordinating hippocampal-cortical communication.

      We found significant coupling between SOs and spindles during NREM sleep (N2/3), with spindle peaks occurring slightly before the SO peak. This coupling was associated with increased activation in both the thalamus and hippocampus, with functional connectivity patterns suggesting thalamic coordination of hippocampal-cortical communication. These findings highlight the key role of the thalamus in coordinating hippocampal-cortical interactions during human sleep and provide new insights into the neural mechanisms underlying sleep-dependent brain communication. A deeper understanding of these mechanisms may contribute to future neuromodulation approaches aimed at enhancing sleep-dependent cognitive function and treating sleep-related disorders.”

      Discussion, Page 16-17 Lines 292-307

      “When modeling the timing of these sleep rhythms in the fMRI, we observed hippocampal activation selectively during SO-spindle events. This suggests the possibility of triple coupling (SOs–spindles–ripples), even though our scalp EEG was not sufficiently sensitive to detect hippocampal ripples—key markers of memory replay (Buzsáki, 2015). Recent iEEG evidence indicates that ripples often co-occur with both spindles (Ngo, Fell, & Staresina, 2020) and SOs (Staresina et al., 2015; Staresina et al., 2023). Therefore, the hippocampal involvement during SO-spindle events in our study may reflect memory replay from the hippocampus, propagated via thalamic spindles to distributed cortical regions.

      The thalamus, known to generate spindles (Halassa et al., 2011), plays a key role in producing and coordinating sleep rhythms (Coulon, Budde, & Pape, 2012; Crunelli et al., 2018), while the hippocampus is essential for memory consolidation (Buzsáki, 2015; Diba & Buzsáki, 2007; Singh, Norman, & Schapiro, 2022). The increased hippocampal and thalamic activity, along with strengthened connectivity between these regions and the mPFC during SO-spindle events, underscores a hippocampal-thalamic-neocortical information flow. This aligns with recent findings suggesting the thalamus orchestrates neocortical oscillations during sleep (Schreiner et al., 2022). The thalamus and hippocampus thus appear central to memory consolidation during sleep, guiding information transfer to the neocortex, e.g., mPFC.”

      (4) The study included an impressive number of 107 subjects. It is surprising though that only 31 subjects had to be excluded under these difficult recording conditions, especially since no adaptation night was performed. Since only subjects were excluded who slept less than 10 min (or had excessive head movements) there are likely several datasets included with comparably short durations and only a small number of SOs and spindles and even less combined SO-spindle events. A comprehensive table should be provided (supplement) including for each subject (included and excluded) the duration of included NREM sleep, number of SOs, spindles, and SO+spindle events. Also, some descriptive statistics (mean/SD/range) would be helpful.

      We appreciate your recognition of our sample size and the challenges associated with simultaneous EEG-fMRI sleep recordings. We acknowledge the importance of transparently reporting individual subject data, particularly regarding sleep duration and the number of detected SOs, spindles, and SO-spindle events. To address this, we will provide comprehensive tables in the supplementary materials: one containing descriptive information about sleep-related characteristics (Table S1), and others containing detailed information about sleep waves at each sleep stage for all 107 subjects (Tables S2-S4), listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.

      However, most of the excluded participants were unable to fall asleep or slept too briefly and therefore had essentially no NREM sleep, so NREM sleep duration and the numbers of SOs, spindles, and coupling events could not be counted for them.

      Supplementary Materials, Page 42-54, Table S1-S4

      (5) Was the 20-channel head coil dedicated for EEG-fMRI measurements? How were the electrode cables guided through/out of the head coil? Usually, the 64-channel head coil is used for EEG-fMRI measurements in a Siemens PRISMA 3T scanner, which has a cable duct at the back that allows to guide the cables straight out of the head coil (to minimize MR-related artifacts). The choice for the 20-channel head coil should be motivated. Photos of the recording setup would also be helpful.

      Thank you for your comment regarding our choice of the 20-channel head coil for EEG-fMRI measurements. We acknowledge that the 64-channel head coil is commonly used in Siemens PRISMA 3T scanners; however, the 20-channel coil was selected due to specific practical and technical considerations in our study. In particular, the 20-channel head coil was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil allowed us to maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.

      We have made this clearer in the revised manuscript. 

      Methods, Page 20 Lines 385-392

      “All MRI data were acquired using a 20-channel head coil on a research-dedicated 3-Tesla Siemens Magnetom Prisma MRI scanner. Earplugs and cushions were provided for noise protection and head motion restriction. We chose the 20-channel head coil because it was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil helped maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.”

      (6) Was the EEG sampling synchronized to the MR scanner (gradient system) clock (the 10 MHz signal; not referring to the volume TTL triggers here)? This is a requirement for stable gradient artifact shape over time and thus accurate gradient noise removal.

      Thank you for raising this important point. We confirm that the EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This synchronization was achieved using the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift. As a result, the gradient artifact waveform remained stable across volumes, allowing for more effective artifact correction during preprocessing. We appreciate your attention to this critical aspect of EEG-fMRI data acquisition.

      We have made this clearer in the revised manuscript. 

      Methods, Page 19-20 Lines 371-383

      “EEG was recorded simultaneously with fMRI data using an MR-compatible EEG amplifier system (BrainAmps MR-Plus, Brain Products, Germany), along with a specialized electrode cap. The recording was done using 64 channels in the international 10/20 system, with the reference channel positioned at FCz. In order to adhere to polysomnography (PSG) recording standards, six electrodes were removed from the EEG cap: one for electrocardiogram (ECG) recording, two for electrooculogram (EOG) recording, and three for electromyogram (EMG) recording. EEG data was recorded at a sample rate of 5000 Hz, the resistance of the reference and ground channels was kept below 10 kΩ, and the resistance of the other channels was kept below 20 kΩ. To synchronize the EEG and fMRI recordings, the BrainVision recording software (BrainProducts, Germany) was utilized to capture triggers from the MRI scanner. The EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This was achieved via the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift.”
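
Why clock synchronization matters for artifact removal can be illustrated with a toy sketch. It assumes a simple template-subtraction scheme (averaging the artifact across volumes and subtracting it), which is one common approach and is not stated in the response to be the authors' method. The artifact shape, amplitudes, volume count, and drift value below are hypothetical; only the 5000 Hz sampling rate and the 2000 ms TR come from the text.

```python
import math
import random

def subtract_average_artifact(signal, samples_per_volume):
    """Estimate the artifact as the mean waveform across volumes and subtract it."""
    n_volumes = len(signal) // samples_per_volume
    template = [
        sum(signal[v * samples_per_volume + i] for v in range(n_volumes)) / n_volumes
        for i in range(samples_per_volume)
    ]
    return [
        signal[v * samples_per_volume + i] - template[i]
        for v in range(n_volumes)
        for i in range(samples_per_volume)
    ]

def rms(values):
    """Root-mean-square amplitude."""
    return math.sqrt(sum(x * x for x in values) / len(values))

def simulate(drift_per_volume, samples_per_volume=200, n_volumes=50, seed=0):
    """Toy EEG (sd 10) plus a large periodic artifact; drift is in samples per volume."""
    rng = random.Random(seed)
    signal = []
    for v in range(n_volumes):
        for i in range(samples_per_volume):
            phase = 2 * math.pi * 7 * (i + v * drift_per_volume) / samples_per_volume
            signal.append(1000.0 * math.sin(phase) + rng.gauss(0.0, 10.0))
    return signal

# Values from the text: 5000 Hz sampling and TR = 2000 ms give 10,000 samples per volume.
samples_per_tr = 5000 * 2000 // 1000

# Hypothetical comparison: no drift (synchronized clocks) vs. a small drift per volume.
rms_sync = rms(subtract_average_artifact(simulate(0.0), 200))
rms_drift = rms(subtract_average_artifact(simulate(0.05), 200))
print(samples_per_tr, round(rms_sync, 1), round(rms_drift, 1))
```

In this toy setting the residual after subtraction stays near the level of the underlying signal when the artifact repeats sample-for-sample, and grows substantially once the artifact drifts relative to the sampling grid.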

      (7) The TR is quite long and the voxel size is quite large in comparison to state-of-the-art EPI sequences. What was the rationale behind choosing a sequence with relatively low temporal and spatial resolution?

      We acknowledge that our chosen TR and voxel size are relatively long and large compared to state-of-the-art EPI sequences. This decision was made to optimize the signal-to-noise ratio (SNR) and reduce susceptibility-related distortions, which are particularly critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. A longer TR allowed us to sample whole-brain activity with sufficient coverage, while a larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures such as the thalamus and hippocampus, which are key regions of interest in our study. We appreciate your concern and hope this clarification provides sufficient rationale for our sequence parameters.

      We have made this clearer in the revised manuscript. 

      Methods, Page 20-21 Lines 398-408

      “Then, the “sleep” session began after the participants were instructed to try and fall asleep. For the functional scans, whole-brain images were acquired using a k-space and steady-state T2*-weighted gradient echo-planar imaging (EPI) sequence that is sensitive to the BOLD contrast. This measures local magnetic changes caused by changes in blood oxygenation that accompany neural activity (sequence specification: 33 slices in interleaved ascending order, TR = 2000 ms, TE = 30 ms, voxel size = 3.5 × 3.5 × 4.2 mm³, FA = 90°, matrix = 64 × 64, gap = 0.7 mm). A relatively long TR and larger voxel size were chosen to optimize SNR and reduce susceptibility-related distortions, which are critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. The longer TR allowed whole-brain coverage with sufficient temporal resolution, while the larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures (e.g., the thalamus and hippocampus), which are key regions of interest in this study.”

      (8) The anatomically defined ROIs are quite large. It should be elaborated on how this might reduce sensitivity to sleep rhythm-specific activity within sub-regions, especially for the thalamus, which has distinct nuclei involved in sleep functions.

      We appreciate your insight regarding the use of anatomically defined ROIs and their potential limitations in detecting sleep rhythm-specific activity within sub-regions, particularly in the thalamus. Given the distinct functional roles of thalamic nuclei in sleep processes, we acknowledge that using a single, large thalamic ROI may reduce sensitivity to localized activity patterns. To address this, we will discuss this limitation in the revised manuscript, acknowledging that our approach prioritizes whole-structure effects but may not fully capture nucleus-specific contributions.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (9) The study reports SO & spindle amplitudes & densities, as well as SO+spindle coupling, to be larger during N2/3 sleep compared to N1 and REM sleep, which is trivial but can be seen as a sanity check of the data. However, the amount of SOs and spindles reported for N1 and REM sleep is concerning, as per definition there should be hardly any (if SOs or spindles occur in N1 it becomes by definition N2, and the interval between spindles has to be considerably large in REM to still be scored as such). Thus, on the one hand, the report of these comparisons takes too much space in the main manuscript as it is trivial, but on the other hand, it raises concerns about the validity of the scoring.

      We appreciate your concern regarding the reported presence of SOs and spindles in N1 and REM sleep and its potential implications. Our methods for detecting SOs, spindles, and their coupling were originally designed only for N2 and N3 sleep data, based on the characteristics of those data, and are widely recognized and used in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because the detection methods for SOs and spindles are percentile-based, they will always detect a certain number of events when applied to data from other stages (N1 and REM), and how these events differ from those detected in N2/N3 remains unclear. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only for sanity checks.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”
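
The point about percentile-based thresholds can be made concrete with a small sketch. It is not the authors' detection pipeline: the 75th-percentile cutoff, the function names, and the synthetic amplitude distributions are all assumptions chosen for illustration. It shows only that a threshold defined relative to each stage's own data flags a similar share of candidates in every stage, whatever their absolute size.

```python
import random

def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac

def detect_by_percentile(amplitudes, q=75.0):
    """Keep candidates whose amplitude exceeds the stage's own q-th percentile."""
    cutoff = percentile(amplitudes, q)
    return [a for a in amplitudes if a > cutoff]

rng = random.Random(0)
# Hypothetical candidate amplitudes (in µV): large in N2/N3, small in REM.
n23 = [rng.gauss(60.0, 15.0) for _ in range(1000)]
rem = [rng.gauss(15.0, 4.0) for _ in range(1000)]

n23_events = detect_by_percentile(n23)
rem_events = detect_by_percentile(rem)

# Fraction of candidates flagged per stage, then the amplitude ranges of the flagged events.
print(len(n23_events) / len(n23), len(rem_events) / len(rem))
print(round(min(n23_events), 1), round(max(rem_events), 1))
```

Under these assumptions, event counts in N1 or REM say little on their own; comparing amplitudes or densities against N2/N3 within the same subject is what makes the sanity check informative.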

      (10) Why was electrode F3 used to quantify the occurrence of SOs and spindles? Why not a midline frontal electrode like Fz (or a number of frontal electrodes for SOs) and Cz (or a number of centroparietal electrodes) for spindles to be closer to their maximum topography?

      We appreciate your suggestion regarding electrode selection for SO and spindle quantification. Our choice of F3 was primarily based on previous studies (Massimini et al., 2004; Molle et al., 2011), where bilateral frontal electrodes are commonly used for detecting SOs and spindles. Additionally, we considered the impact of MRI-related noise and, after a comprehensive evaluation, determined that F3 provided an optimal balance between signal quality and artifact minimization. We also acknowledge that alternative electrode choices, such as Fz for SOs and Cz for spindles, could provide additional insights into their topographical distributions.

      (11) Functional connectivity (hippocampus -> thalamus -> cortex (mPFC)) is reported to be increased during SO-spindle coupling and interpreted as evidence for coordination of hippocampo-neocortical communication likely by thalamic spindles. However, functional connectivity was only analysed during coupled SO+spindle events, not during isolated SOs or isolated spindles. Without the direct comparison of the connectivity patterns between these three events, it remains unclear whether this is specific for coupled SO+spindle events or rather associated with one or both of the other isolated events. The PPIs need to be conducted for those isolated events as well and compared statistically to the coupled events.

      We appreciate your critical perspective on our functional connectivity analysis and the interpretation of hippocampus-thalamus-cortex (mPFC) interactions during SO-spindle coupling. We acknowledge that, in the current analysis, functional connectivity was only examined during coupled SO-spindle events, without direct comparison to isolated SOs or isolated spindles. To address this concern, we have conducted PPI analyses for all three ROIs (hippocampus, thalamus, mPFC) and all three event types (SO-spindle couplings, isolated SOs, and isolated spindles). Our results indicate that neither isolated SOs nor isolated spindles yielded significant connectivity changes in any of the three ROIs, as none survived multiple-comparison correction. This suggests that the observed connectivity increase is specific to SO-spindle coupling, rather than being independently driven by either SOs or spindles alone.

      Results, Page 14 Lines 248-255

      “Crucially, the interaction between FC and SO-spindle coupling revealed that only the functional connectivity of hippocampus -> thalamus (ROI analysis, t(106) = 1.86, p = 0.0328) and thalamus -> mPFC (ROI analysis, t(106) = 1.98, p = 0.0251) significantly increased during SO-spindle coupling, with no significant changes in any other pathway (Fig. 4e). We also conducted PPI analyses for the other two events (SOs and spindles), and neither yielded significant connectivity changes in the three ROIs, as all failed to survive whole-brain FWE correction at the cluster level (p < 0.05). Together, these findings suggest that the thalamus, likely via spindles, coordinates hippocampal-cortical communication selectively during SO-spindle coupling, but not during isolated SOs or spindles.”
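
For readers unfamiliar with PPI, the following is a minimal sketch of the idea, under simplifying assumptions: it works on simulated signals, omits the haemodynamic convolution and deconvolution steps that full implementations use, and every name and number in it is hypothetical. The interaction regressor is the product of the mean-centred seed signal and the mean-centred event indicator; a positive weight on it in a GLM would mean the seed-target relationship is stronger during events.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def ppi_regressor(seed, psych):
    """Interaction term: mean-centred seed signal times mean-centred event indicator."""
    ms = sum(seed) / len(seed)
    mp = sum(psych) / len(psych)
    return [(s - ms) * (p - mp) for s, p in zip(seed, psych)]

rng = random.Random(1)
n_scans = 400
# Hypothetical indicator: 1 for scans containing an SO-spindle event, else 0.
psych = [1 if rng.random() < 0.3 else 0 for _ in range(n_scans)]
# Hypothetical seed signal (for example, a hippocampal ROI time course).
seed = [rng.gauss(0.0, 1.0) for _ in range(n_scans)]
# Simulated target (for example, thalamus): coupling to the seed is stronger during events.
target = [(0.2 + 0.6 * p) * s + rng.gauss(0.0, 0.5) for s, p in zip(seed, psych)]

# Regressor that would enter the GLM alongside the seed and the event indicator.
interaction = ppi_regressor(seed, psych)

# Equivalent intuition: the seed-target slope differs between event and non-event scans.
event = [i for i, p in enumerate(psych) if p == 1]
rest = [i for i, p in enumerate(psych) if p == 0]
slope_event = slope([seed[i] for i in event], [target[i] for i in event])
slope_rest = slope([seed[i] for i in rest], [target[i] for i in rest])
print(round(slope_event, 2), round(slope_rest, 2))
```

Running the same construction separately for coupled events, isolated SOs, and isolated spindles is what allows the specificity comparison described in the response.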

      (12) The limited temporal resolution of fMRI does indeed not allow for easily distinguishing between fMRI activation patterns related to SO-up- vs. SO-down-states. For this, one could try to extract the amplitudes of SO-up- and SO-down-states separately for each SO event and model them as two separate parametric modulators (with the risk of collinearity as they are likely correlated).

      We appreciate your insightful comment regarding the challenge of distinguishing fMRI activation patterns related to SO-up vs. SO-down states due to the limited temporal resolution of fMRI. While our current analysis does not differentiate between these two phases, we acknowledge that separately modeling SO-up and SO-down states using parametric modulators could provide a more refined understanding of their distinct neural correlates. However, as you note, this approach carries the risk of collinearity, and the two amplitudes are indeed highly correlated across all subjects in our results (r = 0.98). Future studies could address this with high-temporal-resolution techniques. While implementing this in the current study is beyond our scope, we will acknowledge this limitation in the Discussion section.
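
To give a sense of scale for the collinearity concern, the sketch below computes the variance inflation factor for two predictors, VIF = 1 / (1 - r²). It assumes, for illustration only, that the two parametric modulators would be correlated in the design matrix about as strongly as the reported across-subject amplitudes (r = 0.98); the function name is hypothetical.

```python
import math

def variance_inflation(r):
    """Variance inflation factor for one of two predictors with Pearson correlation r."""
    return 1.0 / (1.0 - r * r)

r = 0.98  # correlation between SO peak and trough amplitudes reported in the response
vif = variance_inflation(r)
se_inflation = math.sqrt(vif)  # factor by which each coefficient's standard error grows
print(f"VIF = {vif:.2f}, standard-error inflation = {se_inflation:.2f}x")
```

Under that assumption, the standard error of each modulator's estimate would be roughly five times larger than with uncorrelated regressors, which is why separating the two states within a single model is difficult.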

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses, given that it occurs during deep sleep. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from the DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (13) L327: "It is likely that our findings of diminished DMN activity reflect brain activity during the SO DOWN-state, as this state consistently shows higher amplitude compared to the UP-state within subjects, which is why we modelled the SO trough as its onset in the fMRI analysis." This conclusion is not justified as the fact that SO down-states are larger in amplitude does not mean their impact on the BOLD response is larger.

      We appreciate your concern regarding our interpretation of diminished DMN activity reflecting the SO down-state. We acknowledge that the current wording is somewhat misleading. Our interpretation is that the reduced DMN activity could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained when modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from the DOWN-state. We will make this clear in the Discussion section.

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses, given that it occurs during deep sleep. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from the DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      (14) Line 77: "In the current study, while directly capturing hippocampal ripples with scalp EEG or fMRI is difficult, we expect to observe hippocampal activation in fMRI whenever SOs-spindles coupling is detected by EEG, if SOs- spindles-ripples triple coupling occurs during human NREM sleep". Not all SO-spindle events are associated with ripples (Staresina et al., 2015), but hippocampal activation may also be expected based on the occurrence of spindles alone (Bergmann et al., 2012).

      We appreciate your clarification regarding the relationship between SO-spindle coupling and hippocampal ripples. We acknowledge that not all SO-spindle events are necessarily accompanied by ripples (Staresina et al., 2015). However, previous research indicates that hippocampal ripples are significantly more likely to occur during SO-spindle coupling events. This suggests that while ripple occurrence is not guaranteed, SO-spindle coupling creates a favorable network state for ripple generation and potential hippocampal activation. To ensure accuracy, we will revise the manuscript to delete this misleading sentence from the Introduction and acknowledge in the Discussion that our results cannot conclusively demonstrate the triple coupling of SOs, spindles, and hippocampal ripples.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      Reviewer #2 (Public review):

      In this study, Wang and colleagues aimed to explore brain-wide activation patterns associated with NREM sleep oscillations, including slow oscillations (SOs), spindles, and SO-spindle coupling events. Their findings reveal that SO-spindle events corresponded with increased activation in both the thalamus and hippocampus. Additionally, they observed that SO-spindle coupling was linked to heightened functional connectivity from the hippocampus to the thalamus, and from the thalamus to the medial prefrontal cortex, three key regions involved in memory consolidation and episodic memory processes.

      This study's findings are timely and highly relevant to the field. The authors' extensive data collection, involving 107 participants sleeping in an fMRI scanner while undergoing simultaneous EEG recording, deserves special recognition. If shared, this unique dataset could lead to further valuable insights. While the conclusions seem overall well supported by the data, some aspects with regard to the detection of sleep oscillations need clarification.

      The authors report that coupled SO-spindle events were most frequent during NREM sleep (2.46 ± 0.06 events/min), but they also observed a surprisingly high occurrence of these events during N1 and REM sleep (2.23 ± 0.09 and 2.32 ± 0.09 events/min, respectively), where SO-spindle coupling would not typically be expected. Combined with the relatively modest SO amplitudes reported (~25 µV, whereas >75 µV would be expected when using mastoids as reference electrodes), this raises the possibility that the parameters used for event detection may not have been conservative enough - or that sleep staging was inaccurately performed. This issue could present a significant challenge, as the fMRI findings are largely dependent on the reliability of these detected events.

      Thank you very much for your thorough and encouraging review. We appreciate your recognition of the significance and relevance of our study and dataset, particularly in highlighting how simultaneous EEG-fMRI recordings can provide complementary insights into the temporal dynamics of neural oscillations and their associated spatial activation patterns during sleep. In the sections that follow, we address each of your comments in detail. We have revised the text and conducted additional analyses wherever possible to strengthen our argument and clarify our methodological choices. We believe these revisions improve the clarity and rigor of our work, and we thank you for helping us refine it.

      We appreciate your insightful comments regarding the detection of sleep oscillations. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only for sanity checks.

      Regarding the reported SO amplitudes (~25 µV), during preprocessing, we applied the Signal Space Projection (SSP) method to more effectively remove MRI gradient artifacts and cardiac pulse noise. While this approach enhances data quality, it also reduces overall signal power, leading to systematically lower reported amplitudes. Despite this, the SOs detected in NREM sleep (especially N2/N3) remain physiologically meaningful and are consistent with previous fMRI studies using similar artifact removal techniques. We appreciate your careful evaluation and valuable suggestions.

      In addition, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics (Table S1), as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”
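
      The point that a percentile-based criterion will always return some events, whatever the sleep stage, can be illustrated with a minimal sketch. The 75th-percentile cut-off, the function names, and the simulated amplitudes below are illustrative assumptions, not the parameters or data of the study; the sketch also assumes that the cut-off is computed separately within each segment.

```python
import random


def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def detect_by_percentile(amplitudes, q=75.0):
    """Keep candidate events whose amplitude exceeds the q-th percentile
    of the candidates from the same recording segment (assumed rule)."""
    cutoff = percentile(amplitudes, q)
    return [a for a in amplitudes if a > cutoff]


random.seed(0)
# Hypothetical candidate amplitudes (µV) for two stages with different scales.
n2n3 = [random.gauss(25.0, 8.0) for _ in range(400)]
rem = [random.gauss(8.0, 2.0) for _ in range(400)]

kept_n2n3 = detect_by_percentile(n2n3)
kept_rem = detect_by_percentile(rem)

# Both segments keep roughly the top quarter of their candidates, because the
# cut-off is relative to each segment rather than an absolute amplitude.
print(len(kept_n2n3), len(kept_rem))
```

      Under these assumptions the number of detections says little about whether the events in N1 or REM resemble genuine N2/N3 oscillations, which is why the counts in those stages serve only as a sanity check.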

      Supplementary Materials, Page 42-54, Table S1-S4

      Reviewer #3 (Public review):

      Summary:

      Wang et al. examined the brain activity patterns during sleep, especially when locked to canonical sleep rhythms such as SOs, spindles, and their coupling. Analyzing data from a large sample, the authors found significant coupling between spindles and SOs, particularly during the upstate of the SO. Moreover, the authors examined the patterns of whole-brain activity locked to these sleep rhythms. To understand the functional significance of these brain activities, the authors further conducted open-ended cognitive state decoding and found that a variety of cognitive processes may be involved during SO-spindle coupling and during other sleep events. The authors next conducted functional connectivity analyses and found enhanced connectivity between the hippocampus, the thalamus, and the medial PFC. These results reinforced the theoretical model of sleep-dependent memory consolidation, such that SO-spindle coupling is conducive to systems-level memory reactivation and consolidation.

      Strengths:

      There are obvious strengths in this work, including the large sample size, state-of-the-art neuroimaging and neural oscillation analyses, and the richness of results.

      Weaknesses:

      Despite these strengths and the insights gained, there are weaknesses in the design, the analyses, and inferences.

      Thank you for your detailed and thoughtful review of our manuscript. We are delighted that you recognize the large sample size, the advanced neuroimaging and neural oscillation analyses, and the richness of the results. In the following sections, we provide detailed responses to each of your comments. We have revised the text and conducted additional analyses to strengthen our arguments and clarify our methodological choices. We believe these revisions enhance the clarity and rigor of our work, and we sincerely appreciate your thoughtful feedback in helping us refine the manuscript.

      (1) A repeating statement in the manuscript is that brain activity could indicate memory reactivation and thus consolidation. This is indeed a highly relevant question that could be informed by the current data/results. However, an inherent weakness of the design is that there is no memory task before and after sleep. Thus, it is difficult (if not impossible) to make a strong argument linking SO/spindle/coupling-locked brain activity with memory reactivation or consolidation.

      We appreciate your suggestion regarding the lack of a pre- and post-sleep memory task in our study design. We acknowledge that, in the absence of behavioral measures, it is hard to directly link SO-spindle coupling to memory consolidation in an outcome-driven manner. Our interpretation is instead based on the well-established role of these oscillations in memory processes, as demonstrated in previous studies. We sincerely appreciate this feedback and will adjust our Discussion accordingly to reflect a more precise interpretation of our findings.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (2) Relatedly, to understand the functional implications of the sleep rhythm-locked brain activity, the authors employed the "open-ended cognitive state decoding" method. While this method is interesting, it is rather indirect given that there were no behavioral indices in the manuscript. Thus, discussions based on these analyses are speculative at best. Please either tone down the language or find additional evidence to support these claims.

      Moreover, the results from this method are difficult to understand. Figure 3e showed that for all three types of sleep events (SO, spindle, SO-spindle), the same mental states (e.g., working memory, episodic memory, declarative memory) showed opposite directions of activation (left and right panels showed negative and positive activation, respectively). How to interpret these conflicting results? This ambiguity is also reflected by the term used: declarative memory and episodic memories are both indexed in the results. Yet these two processes can be largely overlapped. So which specific memory processes do these brain activity patterns reflect? The Discussion shall discuss these results and the limitations of this method.

      We appreciate your critical assessment of the open-ended cognitive state decoding method and its interpretational challenges. Given the concerns about the indirectness of this approach, we decided to remove its related content and results from Figure 3 in the main text and include it in Supplementary Figure 7. 

      Due to the complexity of memory-related processes, we acknowledge that distinguishing between episodic and declarative memory based solely on this approach is not straightforward. We will revise the Supplementary Materials to explicitly discuss these limitations and clarify that our findings do not isolate specific cognitive processes but rather suggest general associations with memory-related networks.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) The coupling strength is somewhat inconsistent with prior results (Hahn et al., 2020, eLife; Helfrich et al., 2018, Neuron). Specifically, Helfrich et al. showed that among young adults, the spindle is coupled to the peak of the SO. Here, the authors reported that the spindles were coupled to down-to-up transitions of SO and before the SO peak. It is possible that participants' age may influence the coupling (see Helfrich et al., 2018). Please discuss the findings in the context of previous research on SO-spindle coupling.

      We appreciate your concern regarding the temporal characteristics of SO-spindle coupling. We acknowledge that the SO-spindle coupling phase results in our study are not identical to those reported by Hahn et al. (2020) and Helfrich et al. (2018). However, these differences may arise from slight variations in event detection parameters, which can influence the precise phase estimation of coupling. Notably, Hahn et al. (2020) also reported slight discrepancies in their group-level coupling phase results, highlighting that methodological differences can contribute to variability across studies. Furthermore, our findings are consistent with those of Schreiner et al. (2021), further supporting the robustness of our observations.

      That said, we acknowledge that our original description of SO-spindle coupling as occurring at the "transition from the lower state to the upper state" was not entirely precise. The -π/2 phase represents the true transition point, while our observed coupling phase is actually closer to the SO peak rather than strictly at the transition. We will revise this statement in the manuscript to ensure clarity and accuracy in describing the coupling phase.  
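
      The distinction between the transition point and the peak can be made concrete with a short sketch. It assumes a cosine phase convention (0 = SO peak, ±π = SO trough, -π/2 = down-to-up transition), which matches the statement above; the π/4-wide labelling bins, the function names, and the example phases are hypothetical and chosen only for illustration.

```python
import math


def circular_mean(phases):
    """Mean direction (radians, in (-pi, pi]) of a list of phase angles."""
    s = sum(math.sin(p) for p in phases)
    c = sum(math.cos(p) for p in phases)
    return math.atan2(s, c)


def describe_phase(phi):
    """Label a phase under the assumed cosine convention:
    0 = SO peak (UP-state), +/-pi = SO trough (DOWN-state),
    -pi/2 = down-to-up transition. Bin edges are arbitrary."""
    if -math.pi / 4 <= phi <= math.pi / 4:
        return "near peak"
    if -3 * math.pi / 4 <= phi < -math.pi / 4:
        return "down-to-up transition"
    if math.pi / 4 < phi <= 3 * math.pi / 4:
        return "up-to-down transition"
    return "near trough"


# Hypothetical preferred phases (radians), clustered shortly before the peak.
phases = [-0.6, -0.4, -0.5, -0.3, -0.7]
mean_phase = circular_mean(phases)
print(round(mean_phase, 2), describe_phase(mean_phase))
```

      With these toy values the mean phase is negative but small in magnitude, so it is labelled as lying near the peak rather than at the transition.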

      Discussion, Page 16 Lines 283-291

      “Our data provide insights into the neurobiological underpinnings of these sleep rhythms. SOs, originating mainly in neocortical areas such as the mPFC, alternate between DOWN- and UP-states. The thalamus generates sleep spindles, which in turn couple with SOs. Our finding that spindle peaks consistently occurred slightly before the UP-state peak of SOs (in 83 out of 107 participants), concurs with prior studies, including Schreiner et al. (2021). Yet it differs from some results suggesting spindles might peak right at the SO UP-state (Hahn et al., 2020; Helfrich et al., 2018). Such discrepancies could arise from differences in detection algorithms, participant age (Helfrich et al., 2018), or subtle variations in cortical-thalamic timing. Nonetheless, these results underscore the importance of coordinated SO-spindle interplay in supporting sleep-dependent processes.”

      (4) The discussion is rather superficial with only two pages, without delving into many important arguments regarding the possible functional significance of these results. For example, the author wrote, "This internal processing contrasts with the brain patterns associated with external tasks, such as working memory." This statement is made without any references on working memory and without delineating why WM is considered an external task, even though working memory operations can be internal. Similarly, for the interesting results on SO and reduced DMN activity, the authors wrote "The DMN is typically active during wakeful rest and is associated with self-referential processes like mind-wandering, daydreaming, and task representation (Yeshurun, Nguyen, & Hasson, 2021). Its reduced activity during SOs may signal a shift towards endogenous processes such as memory consolidation." This argument is flawed. DMN is active during self-referential processing and mind-wandering, i.e., when the brain shifts from external stimuli processing to internal mental processing. During sleep, endogenous memory reactivation and consolidation are also part of the internal mental processing given the lack of external environmental stimulation. So why would DMN activity be reduced during SOs or during memory consolidation? Were there differences in DMN activity between SO and SO-spindle coupling events?

      We appreciate your concerns regarding the brevity of the discussion and the need for clearer theoretical arguments. We will expand this section to provide more in-depth interpretations of our findings in the context of prior literature. Regarding working memory (WM), we acknowledge that our phrasing was ambiguous. We will modify this statement in the Discussion section.

      For the SO-related reduction in DMN activity, we recognize the need for a more precise explanation. This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of the DMN during SO events is related to a shift from internal cognition to external responses, given that these events occur during deep sleep. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained if modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from the DOWN-state.

      To address your final question, we have conducted an additional post hoc comparison of DMN activity between isolated SOs and SO-spindle coupling events. Our results indicate that DMN activation during SOs was significantly lower than during SO-spindle coupling (t(106) = -4.17, p < 1e-4). This suggests that SO-spindle coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. We appreciate your constructive feedback and will integrate these expanded analyses and discussions into our revised manuscript.
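
      A statistic of the form t(106) with 107 participants is what a within-subject (paired) comparison would produce, since the degrees of freedom equal the number of participants minus one. The sketch below shows that computation under the assumption that a paired t-test was used; the function name and the four-subject toy values are hypothetical and are not the study's data.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two equal-length
    lists of per-subject values (e.g. ROI estimates in two conditions)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical per-subject DMN estimates for two event types.
dmn_so = [1.0, 2.0, 3.0, 4.0]
dmn_coupling = [2.0, 4.0, 5.0, 8.0]

t_value, dof = paired_t(dmn_so, dmn_coupling)
# A negative t means the first condition is lower on average.
print(round(t_value, 3), dof)
```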

      Results, Page 11 Lines 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t(106) = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t(106) = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t(106) = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t(106) = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”

      Discussion, Page 17-18 Lines 308-332

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of the DMN during SO events is related to a shift from internal cognition to external responses, given that these events occur during deep sleep. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from the DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.

      To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      Recommendations for the authors:

      Reviewing Editor Comment:

      The reviewers think that you are working on a relevant and important topic. They are praising the large sample size used in the study. The reviewers are not all in line regarding the overall significance of the findings, but they all agree the paper would strongly benefit from some extra work, as all reviewers raise various critical points that need serious consideration.

      We appreciate your recognition of the relevance and importance of our study, as well as your acknowledgment of the large sample size as a strength of our work. We understand that there are differing perspectives regarding the overall significance of our findings, and we value the constructive critiques provided. We are committed to addressing the key concerns raised by all reviewers, including refining our analyses, clarifying our interpretations, and incorporating additional discussions to strengthen the manuscript. Below, we address your specific recommendations and provide responses to each point you raised to ensure our methods and results are as transparent and comprehensible as possible. We believe that these revisions will significantly enhance the rigor and impact of our study, and we sincerely appreciate your thoughtful feedback in helping us improve our work.

      Reviewer #1 (Recommendations for the authors):

      (1) The phrase "overnight sleep" suggests an entire night, while these were rather "nocturnal naps". Please rephrase.

      Response: Thank you for pointing this out. We have revised the phrasing in our manuscript to "nocturnal naps" instead of "overnight sleep" to more accurately reflect the duration of the sleep recordings.

      (2) Sleep staging results (macroscopic sleep architecture) should be provided in more detail (at least min and % of the different sleep stages, sleep onset latency, total sleep duration, total recording duration), at least mean/SD/range.

      Thank you for this suggestion. We will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics. This information will help provide a clearer overview of the macroscopic sleep architecture in our dataset.

      Reviewer #2 (Recommendations for the authors):

      In order to allow for a better estimation of the reliability of the detected sleep events, please:

      (1) Provide densities and absolute numbers of all detected SOs and spindles (N1, NREM, and REM sleep).

      Thank you for pointing this out. We will provide comprehensive tables in the supplementary materials containing detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: 1) duration of each sleep stage; 2) number of detected SOs; 3) number of detected spindles; 4) number of detected SO-spindle coupling events; 5) density of detected SOs; 6) density of detected spindles; 7) density of detected SO-spindle coupling events.

      Supplementary Materials, Page 43-54, Table S2-S4
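
      For readers unfamiliar with the density measure, the sketch below shows how a per-stage density in events per minute follows from the event count and the stage duration. The function name, the dictionary layout, and the single-subject values are hypothetical.

```python
def event_density(n_events, stage_duration_min):
    """Events per minute; None when the stage was not recorded."""
    if stage_duration_min <= 0:
        return None
    return n_events / stage_duration_min


# Hypothetical single-subject counts and stage durations (minutes).
subject = {
    "N1": {"duration_min": 10.0, "coupled": 22},
    "N2/N3": {"duration_min": 40.0, "coupled": 100},
    "REM": {"duration_min": 0.0, "coupled": 0},
}

densities = {
    stage: event_density(row["coupled"], row["duration_min"])
    for stage, row in subject.items()
}
print(densities)
```

      Returning None for an unrecorded stage keeps such subjects out of group means instead of counting them as zero density.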

      (2) Show ERPs for all detected SOs and spindles (per sleep stage).

      Thank you for the suggestion. We will provide ERPs for all detected SOs and spindles, separated by sleep stage (N1, N2&N3, and REM) in supplementary Fig. S2-S4. These ERP waveforms will help illustrate the characteristic temporal profiles of SOs and spindles across different sleep stages.

      Methods, Page 25, Line 525-532

      “Event-related potentials (ERP) analysis. After completing the detection of each sleep rhythm event, we performed ERP analyses for SOs, spindles, and coupling events in different sleep stages. Specifically, for SO events, we took the trough of the DOWN-state of each SO as the zero-time point, then extracted data in a [-2 s to 2 s] window from the broadband (0.1–30 Hz) EEG and used [-2 s to -0.5 s] for baseline correction; the results were then averaged across 107 subjects (see Fig. S2a). For spindle events, we used the peak of each spindle as the zero-time point and applied the same data extraction window and baseline correction before averaging across 107 subjects (see Fig. S2b). Finally, for SO-spindle coupling events, we followed the same procedure used for SO events (see Fig. 2a, Figs. S3–S4).”
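
      The epoching and baseline-correction steps described above can be sketched as follows. This is a minimal illustration for a single recording, assuming events are given as sample indices; the function names, the 10 Hz sampling rate, and the toy signal are hypothetical, and the filtering and across-subject averaging are omitted.

```python
def extract_epoch(signal, center, sfreq, tmin=-2.0, tmax=2.0):
    """Return samples in [tmin, tmax] s around `center` (a sample index),
    or None if the window runs past the edges of the recording."""
    start = center + int(round(tmin * sfreq))
    stop = center + int(round(tmax * sfreq)) + 1
    if start < 0 or stop > len(signal):
        return None
    return signal[start:stop]


def baseline_correct(epoch, sfreq, tmin=-2.0, base=(-2.0, -0.5)):
    """Subtract the mean of the baseline window from every sample."""
    i0 = int(round((base[0] - tmin) * sfreq))
    i1 = int(round((base[1] - tmin) * sfreq)) + 1
    baseline_mean = sum(epoch[i0:i1]) / (i1 - i0)
    return [x - baseline_mean for x in epoch]


def erp(signal, event_samples, sfreq):
    """Average baseline-corrected epochs across events."""
    epochs = []
    for center in event_samples:
        epoch = extract_epoch(signal, center, sfreq)
        if epoch is not None:
            epochs.append(baseline_correct(epoch, sfreq))
    return [sum(column) / len(epochs) for column in zip(*epochs)]


# Toy recording: constant 5 µV offset with a -15 µV sample at each event.
sfreq = 10.0
signal = [5.0] * 200
for trough in (50, 120):
    signal[trough] = -15.0

# The event at sample 5 is too close to the start and is skipped.
average = erp(signal, [5, 50, 120], sfreq)
print(len(average), average[20])
```

      In this sketch, time zero sits at the middle sample of each epoch, so the baseline-corrected deflection at the event appears at index 20 of the 41-sample average.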

      (3) Provide detailed info concerning sleep characteristics (time spent in each sleep stage etc.).

      Thank you for this suggestion. As in the response above, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics.

      Supplementary Materials, Page 42, Table S1 (same as above)

      (4) What would happen if more stringent parameters were used for event detection? Would the authors still observe a significant number of SO spindles during N1 and REM? Would this affect the fMRI-related results?

      Thank you for this suggestion. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).

      Furthermore, in order to explore the impact of this on our fMRI results, we conducted an additional sensitivity analysis by applying different detection parameters for SOs. Specifically, we adjusted amplitude percentile thresholds for SO detection (the parameter that has the greatest impact on the results). We used the hippocampal activation value during N2&N3 stage SO-spindle coupling as an anchor value and found that when the parameters gradually became stricter, the results were similar to or even better than the current results. However, when we continued to increase the threshold, the results began to gradually decrease until the threshold was increased to 80%, and the results were no longer significant. This indicates that our results are robust within a specific range of parameters, but as the threshold increases, the number of trials decreases, ultimately weakening the statistical power of the fMRI analysis.

      Thank you again for your suggestions on sleep rhythm event detection. We will add the results in Supplementary and revise our manuscript accordingly.

      Results, Page 11, Line 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t(106) = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t(106) = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t(106) = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t(106) = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”

      Finally, we sincerely thank you all again for your thoughtful and constructive feedback. Your insights have been invaluable in refining our analyses, strengthening our interpretations, and improving the clarity and rigor of our manuscript. We appreciate the time and effort you have dedicated to reviewing our work, and we are grateful for the opportunity to enhance our study based on your recommendations.

      References:

      Bergmann, T. O., Mölle, M., Diedrichs, J., Born, J., & Siebner, H. R. (2012). Sleep spindle-related reactivation of category-specific cortical regions after learning face-scene associations. NeuroImage, 59(3), 2733-2742. 

      Buzsáki, G. (2015). Hippocampal sharp wave‐ripple: A cognitive biomarker for episodic memory and planning. Hippocampus, 25(10), 1073-1188. 

      Caporro, M., Haneef, Z., Yeh, H. J., Lenartowicz, A., Buttinelli, C., Parvizi, J., & Stern, J. M. (2012). Functional MRI of sleep spindles and K-complexes. Clinical Neurophysiology, 123(2), 303-309.

      Coulon, P., Budde, T., & Pape, H.-C. (2012). The sleep relay—the role of the thalamus in central and decentral sleep regulation. Pflügers Archiv-European Journal of Physiology, 463, 53-71. 

      Crunelli, V., Lőrincz, M. L., Connelly, W. M., David, F., Hughes, S. W., Lambert, R. C., Leresche, N., & Errington, A. C. (2018). Dual function of thalamic low-vigilance state oscillations: rhythm-regulation and plasticity. Nature Reviews Neuroscience, 19(2), 107-118. 

      Czisch, M., Wehrle, R., Stiegler, A., Peters, H., Andrade, K., Holsboer, F., & Sämann, P. G. (2009). Acoustic oddball during NREM sleep: a combined EEG/fMRI study. PLoS ONE, 4(8), e6749.

      Diba, K., & Buzsáki, G. (2007). Forward and reverse hippocampal place-cell sequences during ripples. Nature Neuroscience, 10(10), 1241. 

      Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuroscience, 11(2), 114-126. 

      Fogel, S., Albouy, G., King, B. R., Lungu, O., Vien, C., Bore, A., Pinsard, B., Benali, H., Carrier, J., & Doyon, J. (2017). Reactivation or transformation? Motor memory consolidation associated with cerebral activation time-locked to sleep spindles. PLoS ONE, 12(4), e0174755.

      Hahn, M. A., Heib, D., Schabus, M., Hoedlmoser, K., & Helfrich, R. F. (2020). Slow oscillation-spindle coupling predicts enhanced memory formation from childhood to adolescence. Elife, 9, e53730. 

      Halassa, M. M., Siegle, J. H., Ritt, J. T., Ting, J. T., Feng, G., & Moore, C. I. (2011). Selective optical drive of thalamic reticular nucleus generates thalamic bursts and cortical spindles. Nature Neuroscience, 14(9), 1118-1120. 

      Hale, J. R., White, T. P., Mayhew, S. D., Wilson, R. S., Rollings, D. T., Khalsa, S., Arvanitis, T. N., & Bagshaw, A. P. (2016). Altered thalamocortical and intra-thalamic functional connectivity during light sleep compared with wake. NeuroImage, 125, 657-667. 

      Helfrich, R. F., Lendner, J. D., Mander, B. A., Guillen, H., Paff, M., Mnatsakanyan, L., Vadera, S., Walker, M. P., Lin, J. J., & Knight, R. T. (2019). Bidirectional prefrontal-hippocampal dynamics organize information transfer during sleep in humans. Nature Communications, 10(1), 3572. 

      Helfrich, R. F., Mander, B. A., Jagust, W. J., Knight, R. T., & Walker, M. P. (2018). Old brains come uncoupled in sleep: slow wave-spindle synchrony, brain atrophy, and forgetting. Neuron, 97(1), 221-230. e224. 

      Horovitz, S. G., Fukunaga, M., de Zwart, J. A., van Gelderen, P., Fulton, S. C., Balkin, T. J., & Duyn, J. H. (2008). Low frequency BOLD fluctuations during resting wakefulness and light sleep: A simultaneous EEG‐fMRI study. Human brain mapping, 29(6), 671-682. 

      Huang, Q., Xiao, Z., Yu, Q., Luo, Y., Xu, J., Qu, Y., Dolan, R., Behrens, T., & Liu, Y. (2024). Replay-triggered brain-wide activation in humans. Nature Communications, 15(1), 7185. 

      Ilhan-Bayrakcı, M., Cabral-Calderin, Y., Bergmann, T. O., Tüscher, O., & Stroh, A. (2022). Individual slow wave events give rise to macroscopic fMRI signatures and drive the strength of the BOLD signal in human resting-state EEG-fMRI recordings. Cerebral Cortex, 32(21), 4782-4796. 

      Laufs, H. (2008). Endogenous brain oscillations and related networks detected by surface EEG‐combined fMRI. Human brain mapping, 29(7), 762-769. 

      Laufs, H., Walker, M. C., & Lund, T. E. (2007). ‘Brain activation and hypothalamic functional connectivity during human non-rapid eye movement sleep: an EEG/fMRI study’—its limitations and an alternative approach. Brain, 130(7), e75. 

      Margulies, D. S., Ghosh, S. S., Goulas, A., Falkiewicz, M., Huntenburg, J. M., Langs, G., Bezgin, G., Eickhoff, S. B., Castellanos, F. X., & Petrides, M. (2016). Situating the default-mode network along a principal gradient of macroscale cortical organization. Proceedings of the National Academy of Sciences, 113(44), 12574-12579. 

      Massimini, M., Huber, R., Ferrarelli, F., Hill, S., & Tononi, G. (2004). The sleep slow oscillation as a traveling wave. Journal of Neuroscience, 24(31), 6862-6870. 

      Moehlman, T. M., de Zwart, J. A., Chappel-Farley, M. G., Liu, X., McClain, I. B., Chang, C., Mandelkow, H., Özbay, P. S., Johnson, N. L., & Bieber, R. E. (2019). All-night functional magnetic resonance imaging sleep studies. Journal of neuroscience methods, 316, 83-98. 

      Molle, M., Bergmann, T. O., Marshall, L., & Born, J. (2011). Fast and slow spindles during the sleep slow oscillation: disparate coalescence and engagement in memory processing. Sleep, 34(10), 1411-1421. 

      Ngo, H.-V., Fell, J., & Staresina, B. (2020). Sleep spindles mediate hippocampal-neocortical coupling during long-duration ripples. Elife, 9, e57011. 

      Picchioni, D., Horovitz, S. G., Fukunaga, M., Carr, W. S., Meltzer, J. A., Balkin, T. J., Duyn, J. H., & Braun, A. R. (2011). Infraslow EEG oscillations organize large-scale cortical– subcortical interactions during sleep: a combined EEG/fMRI study. Brain research, 1374, 63-72. 

      Schabus, M., Dang-Vu, T. T., Albouy, G., Balteau, E., Boly, M., Carrier, J., Darsaud, A., Degueldre, C., Desseilles, M., & Gais, S. (2007). Hemodynamic cerebral correlates of sleep spindles during human non-rapid eye movement sleep. Proceedings of the National Academy of Sciences, 104(32), 13164-13169. 

      Schreiner, T., Kaufmann, E., Noachtar, S., Mehrkens, J.-H., & Staudigl, T. (2022). The human thalamus orchestrates neocortical oscillations during NREM sleep. Nature communications, 13(1), 5231. 

      Schreiner, T., Petzka, M., Staudigl, T., & Staresina, B. P. (2021). Endogenous memory reactivation during sleep in humans is clocked by slow oscillation-spindle complexes. Nature Communications, 12(1), 3112. 

      Singh, D., Norman, K. A., & Schapiro, A. C. (2022). A model of autonomous interactions between hippocampus and neocortex driving sleep-dependent memory consolidation. Proceedings of the National Academy of Sciences, 119(44), e2123432119. 

      Spoormaker, V. I., Schröter, M. S., Gleiser, P. M., Andrade, K. C., Dresler, M., Wehrle, R., Sämann, P. G., & Czisch, M. (2010). Development of a large-scale functional brain network during human non-rapid eye movement sleep. Journal of Neuroscience, 30(34), 11379-11387. 

      Staresina, B. P., Bergmann, T. O., Bonnefond, M., van der Meij, R., Jensen, O., Deuker, L., Elger, C. E., Axmacher, N., & Fell, J. (2015). Hierarchical nesting of slow oscillations, spindles and ripples in the human hippocampus during sleep. Nature Neuroscience, 18(11), 1679-1686. 

      Staresina, B. P., Niediek, J., Borger, V., Surges, R., & Mormann, F. (2023). How coupled slow oscillations, spindles and ripples coordinate neuronal processing and communication during human sleep. Nature Neuroscience, 1-9. 

      Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature methods, 8(8), 665-670. 

      Yeshurun, Y., Nguyen, M., & Hasson, U. (2021). The default mode network: where the idiosyncratic self meets the shared social world. Nature Reviews Neuroscience, 1-12.

    1. Author response:

      The following is the authors’ response to the original reviews

      Main revision made to the manuscript

      The main revision made to the manuscript is to reconcile our findings with the line attractor model. The revision is based on Reviewer 1’s comment on reinterpreting our results as a superposition of an attractor model with fast timescale dynamics. We expanded our analysis regime to the start of a trial and characterized the overall within-trial dynamics to reinterpret our findings.

      We first acknowledge that our results are not in contradiction with evidence integration on a line attractor. As pointed out by the reviewers, our finding that the integration of reward outcome explains the reversal probability activity x_rev (Figure 3) is compatible with the line attractor model. However, the reward integration equation is an algebraic relation and does not characterize the dynamics of reversal probability activity. A closer analysis of the neural dynamics is therefore needed to assess the feasibility of a line attractor.

      In the revised manuscript, we show that x_rev exhibits two different activity modes (Figure 4). First, x_rev has substantial non-stationary dynamics during a trial, and this non-stationary activity is incompatible with the line attractor model, as claimed in the original manuscript. Second, we present new results showing that x_rev is stationary (i.e., constant in time) and stable (i.e., contracting) at the start of a trial. These two properties indicate that x_rev behaves as a point attractor at the start of a trial, which is compatible with the line attractor model.
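      The two properties can be illustrated with a minimal sketch. This is not the analysis behind Figure 4: the function names, the threshold `tol`, and the toy relaxation dynamics are assumptions introduced here for illustration. Stationarity is checked with a finite-difference estimate of the time derivative, and contraction by asking whether a perturbed trajectory ends closer to the unperturbed one than it started.

```python
def finite_difference(x, dt):
    """Forward-difference estimate of dx/dt for a 1-D trajectory."""
    return [(b - a) / dt for a, b in zip(x[:-1], x[1:])]


def is_stationary(x, dt, tol):
    """True if |dx/dt| stays below tol (a hypothetical threshold) at every step."""
    return all(abs(v) < tol for v in finite_difference(x, dt))


def is_contracting(reference, perturbed):
    """True if the perturbed trajectory ends closer to the reference than it began."""
    return abs(perturbed[-1] - reference[-1]) < abs(perturbed[0] - reference[0])


def relax(x0, x_star, tau, dt, n_steps):
    """Euler steps of dx/dt = -(x - x_star) / tau, a toy system with a stable fixed point."""
    x = [x0]
    for _ in range(n_steps):
        x.append(x[-1] - dt * (x[-1] - x_star) / tau)
    return x


# Toy data only: a trajectory sitting at the fixed point and a perturbed copy.
reference = relax(x0=1.0, x_star=1.0, tau=0.1, dt=0.01, n_steps=50)
perturbed = relax(x0=1.5, x_star=1.0, tau=0.1, dt=0.01, n_steps=50)
print(is_stationary(reference, dt=0.01, tol=1e-6))  # stationarity of the unperturbed state
print(is_contracting(reference, perturbed))         # decay of the perturbation
```

      With recorded or simulated x_rev, the same two checks would be applied to the window at the start of a trial and, for contrast, to the window during the trial.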

      We further analyze how the two activity modes are linked (Figure 4, Support vector regression). We show that the non-stationary activity is predictable from the stationary activity if the underlying dynamics can be inferred. In other words, the non-stationary activity during a trial is generated by underlying dynamics whose initial condition is provided by the stationary state at the start of the trial.
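      The idea that the stationary state determines the subsequent trajectory can be sketched in a few lines. To stay dependency-free, ordinary least squares at each time point stands in here for the support vector regression used in the manuscript; the function names and the toy trajectories are assumptions made for illustration, not the authors' code or data.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_trajectory_model(initial_states, trajectories):
    """One (slope, intercept) pair per time point, mapping the initial state to x_rev(t)."""
    n_time = len(trajectories[0])
    return [fit_line(initial_states, [traj[t] for traj in trajectories])
            for t in range(n_time)]


def predict_trajectory(model, x0):
    """Predicted within-trial trajectory launched from the stationary state x0."""
    return [slope * x0 + intercept for slope, intercept in model]


# Toy training set: every trial is its initial state plus a shared within-trial bump.
bump = [0.0, 0.75, 1.0, 0.75, 0.0]
initial_states = [0.0, 0.5, 1.0]
trajectories = [[x0 + b for b in bump] for x0 in initial_states]

model = fit_trajectory_model(initial_states, trajectories)
print(predict_trajectory(model, 0.25))  # prediction for a held-out initial state
```

      A nonlinear regressor would be needed if the mapping from initial state to trajectory were not linear, which is one reason a kernel method such as support vector regression is a natural choice.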

      These results suggest an extension of the line attractor model where an attractor state at the start of a trial provides an initial condition from which non-stationary activity is generated during a trial by an underlying dynamics associated with task-related behavior (Figure 4, Augmented model). 

      The separability of non-stationary trajectories (Figures 5 and 6) is a property of the non-stationary dynamics that allows separable points in the initial stationary state to remain separable during a trial, thus making it possible to represent distinct probabilistic values in non-stationary activity.

      This revised interpretation of our results (1) retains our original claim that the non-stationary dynamics during a trial is incompatible with the line attractor model and (2) introduces an attractor state at the start of a trial, which is compatible with the line attractor model. Our analysis shows that the two activity modes are linked by an underlying dynamics, and the attractor state serves as the initial state that launches the non-stationary activity.

      Responses to the Public Reviews:

      Reviewer #1:

      (1) To better explain the reversal learning task and the network training method, we added a detailed description of the RNN and monkey task structure (Result Section 1), included a schematic of the target outputs (Figure 1B), explained the rationale behind using an inhibitory network model (Method Section 1) and explained the supervised RNN training scheme (Result Section 1). This information can also be found in the Methods.

      (2) Our understanding is that the augmented model discussed above is aligned with the model suggested by Reviewer 1: “a curved line attractor, with faster timescale dynamics superimposed on this structure”. It is likely that the “fast” non-stationary activity observed during the trial is driven by task-related behavior and is thus transient. For instance, we do not observe such non-stationary activity in the inter-trial interval, when the task-related behavior is absent. For this reason, the non-stationary trajectories were not considered to be part of the attractor. Instead, they are transient activity generated by the underlying neural dynamics associated with task-related behavior. We believe this characterization of the faster timescale dynamics is consistent with Reviewer 1’s view, and we wanted to clarify that there are two different activity modes.

      (3) We appreciate the comment from Reviewers 1 and 2 that TDR may be limited in isolating the neural subspace of interest. Our study presents what can be learned from TDR, which is by no means the only way to interpret the neural data. Applying other methods for isolating task-related neural activity is left for future work.

      We would appreciate it if the reviewers could share their thoughts on which alternative methods could better isolate the reversal probability activity.
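      For readers unfamiliar with TDR, its regression core can be sketched as follows. This is a simplified illustration under stated assumptions: it regresses each unit's activity on a single task variable and omits the denoising and orthogonalization steps that full TDR pipelines often include. The function names and toy numbers are hypothetical.

```python
import math


def regression_slope(task_variable, activity):
    """Least-squares slope of one unit's activity against the task variable across trials."""
    n = len(task_variable)
    mean_p = sum(task_variable) / n
    mean_r = sum(activity) / n
    num = sum((p - mean_p) * (r - mean_r) for p, r in zip(task_variable, activity))
    den = sum((p - mean_p) ** 2 for p in task_variable)
    return num / den


def tdr_axis(task_variable, rates):
    """Unit-norm vector of per-unit slopes; rates[i][k] is unit i's activity on trial k."""
    beta = [regression_slope(task_variable, unit) for unit in rates]
    norm = math.sqrt(sum(b * b for b in beta))
    return [b / norm for b in beta]


def project(axis, population_state):
    """One-dimensional readout of a population state along the regression axis."""
    return sum(a * s for a, s in zip(axis, population_state))


# Toy example: two units whose activity scales with reversal probability.
reversal_probability = [0.0, 0.5, 1.0]
rates = [[0.0, 1.5, 3.0],
         [0.0, 2.0, 4.0]]
axis = tdr_axis(reversal_probability, rates)
print(axis, project(axis, [3.0, 4.0]))
```

      Because the axis is defined by regression, it captures the direction that covaries with the task variable, which need not coincide with the direction along which the network computes it. That is the limitation raised by the reviewers.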

      Reviewer #2:

      (1) (i) We respectfully disagree with Reviewer 2’s comment that “no action is required to be performed by neurons in the RNN”. In our network setup, the RNN output learns to take a sign (+ or -), as Reviewer 2 pointed out, to make a choice. This is how the RNN takes an action. It is unclear to us what Reviewer 2 intended by “action” and how reaching a target value (not just taking a sign) would make a significant difference in how the network performs the task.

      (ii) From Reviewer 2’s comment that “no intervening behavior is thus performed by neurons”, we noticed that the term “intervening behavior” has caused confusion. It refers to task-related behavior, such as making choices or receiving reward, that the subject must perform across trials before reversing its preferred choice. These are the behaviors that intervene before the reversal of the preferred choice. To clarify its meaning, in the revised manuscript, we changed the term to “task-related behavior” and put it in context. For example, in the Introduction we state that “However, during a trial, task-related behavior, such as making decisions or receiving feedback, produced …”

      (iii) As pointed out by Reviewer 2, the lack of a fixation period in the RNN could lead to differences between the neural dynamics of the RNN and the PFC, especially at the start of a trial. We address this issue in Result Section 4, where we analyze the stationary activity at the start of a trial. We find that fixing the choice output to zero before a choice is made promotes stationary activity and makes the RNN activity more similar to the PFC activity.

      Reviewer #3:

      (1) (i) In the previous study (Figure 1 in [Bartolo and Averbeck ‘20]), it was shown that neural activity can predict the behavioral reversal trial. This is the reason we examined the neural activity in the trials centered at the behavioral reversal trial. We explained in Result Section 2 that we followed this line of analysis in our study.

      (ii) We would like to emphasize that the main point of Figures 4 and 5 is to show the separability of neural trajectories: the entire trajectory shifts without overlapping. It is not obvious that high-dimensional neural population activity from two trials should remain separated when it is compressed into a one-dimensional subspace. The one-dimensional activities can easily collide because they are compressed into a low-dimensional space. We revised the manuscript to bring out these points. We added an opening paragraph that discusses the separability of trajectories and revised the main text to highlight the findings on separability.

      (iii) We agree with Reviewer 3 that it would be interesting to look at what happens in other subspaces of neural activity that are not related to reversal probability and to characterize how different neural subspaces interact with each other. However, the focus of this paper is the reversal probability activity, and we consider these questions to be outside the scope of the current paper. We point out that, using the same dataset, neural activity related to other experimental variables was analyzed in other papers [Bartolo and Averbeck ’20; Tang, Bartolo and Averbeck ’21].
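      The notion of separability discussed in (ii) can be made concrete with a small check. This is one possible operationalization, assumed here for illustration and not necessarily the measure used in the manuscript: a set of one-dimensional trajectories is called separable if their rank order at the start of the trial is preserved, without ties, at every later time point.

```python
def rank_order(values):
    """Indices of the trajectories sorted by their value at one time point."""
    return sorted(range(len(values)), key=lambda i: values[i])


def are_separable(trajectories):
    """True if the initial rank order is preserved, with no ties, at every time point."""
    initial_order = rank_order([traj[0] for traj in trajectories])
    for t in range(len(trajectories[0])):
        values = [traj[t] for traj in trajectories]
        has_tie = len(set(values)) < len(values)
        if has_tie or rank_order(values) != initial_order:
            return False
    return True


# Toy trajectories: three shifted copies of the same bump never collide ...
shifted = [[0.0, 1.0, 0.0], [0.5, 1.5, 0.5], [1.0, 2.0, 1.0]]
# ... whereas two trajectories that cross are not separable.
crossing = [[0.0, 1.0, 2.0], [1.0, 1.0, 0.0]]
print(are_separable(shifted), are_separable(crossing))
```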

      (2) (i) In the revised manuscript, we added an explanation of the rationale behind choosing an inhibitory network as a simplified model for the balanced state. In brief, a network with strong inhibitory recurrent connections and strong excitatory external input operates in the balanced state, as in the standard excitatory-inhibitory network. We included references that studied this inhibitory network. We also explained the technical reason (GPU memory) for choosing the inhibitory model.

      (ii) We thank the reviewer for pointing out that the original manuscript did not mention how the feedback and cue were initialized. They were random vectors sampled from a Gaussian distribution. We added this information in the revised manuscript. In our opinion, it is common to use random external inputs for training RNNs, as it is a priori unclear how to choose them. In fact, it is possible to analyze the effects of random feedback on the one-dimensional x_rev dynamics by projecting the random feedback vector onto the reversal probability vector. This is shown in Figure 4F.

      (iii) We agree that it would be more natural to train the RNN to solve the task without using the Bayesian model. We point out this issue in the Discussion in the revised manuscript.
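      The projection mentioned in (2)(ii) is a scalar product between the feedback input vector and the reversal probability direction. The sketch below assumes Gaussian random vectors and a hypothetical network size; `rev_axis` is a random placeholder for the reversal probability vector, which in practice would come from the regression analysis.

```python
import math
import random


def dot(u, v):
    """Scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def projection_onto(direction, vector):
    """Scalar component of `vector` along `direction` (which need not have unit norm)."""
    return dot(direction, vector) / math.sqrt(dot(direction, direction))


rng = random.Random(0)
n_neurons = 200  # hypothetical network size
feedback = [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]  # random feedback input vector
rev_axis = [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]  # placeholder reversal probability vector
print(projection_onto(rev_axis, feedback))
```

      The sign and size of this component indicate how a feedback input pushes the one-dimensional x_rev activity.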

      Recommendations for the authors:

      Reviewer #1:

      (1) My understanding of network training was that a Bayesian ideal observer signaled target output based on previous reward outcomes. However, the authors never mention that networks are trained by supervised learning in the main text until the last paragraph of the discussion. There is no mention that there was an offset in the target based on the behavior of the monkeys in the main text. These are really important things to consider in the context of the network solution after training. I couldn't actually find any figure that presents the target output for the network. Did I miss something key here?

      In Result Section 1, we added a paragraph that describes in detail how the RNN is trained. We explained that the network is first simulated and then the choice outputs and reward outcomes are fed into the Bayesian model to infer the scheduled reversal trial. A few trials are added to the inferred reversal trial to obtain the behavioral reversal trial, as found in a previous study [Bartolo and Averbeck ‘20]. Then the network weights are updated by backpropagation-through-time via supervised learning. 

      In the original manuscript, the target output for the network was described in Methods Section 2.5, Step 4. To make this information readily accessible, we added a schematic in Figure 1B that shows the scheduled, inferred and behavioral reversal trials. It also shows how the target choice outputs are defined. They switch abruptly at the behavioral reversal trial.
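      The chain from outcomes to targets described above can be sketched as follows. This is a simplified illustration, not the model used in the manuscript: it assumes two options, a known reward probability `p_high` for the better option, a single reversal per block, a uniform prior over the reversal trial, and a hypothetical delay of two trials between the inferred and the behavioral reversal. The mapping of +1 and -1 to the two options is likewise an assumption.

```python
import math


def log_likelihood(choices, rewards, reversal, p_high):
    """Log-likelihood of the outcomes if option 0 is best before `reversal` and option 1 after."""
    total = 0.0
    for trial, (choice, reward) in enumerate(zip(choices, rewards)):
        best = 0 if trial < reversal else 1
        p_reward = p_high if choice == best else 1.0 - p_high
        total += math.log(p_reward if reward == 1 else 1.0 - p_reward)
    return total


def map_reversal_trial(choices, rewards, p_high=0.7):
    """MAP estimate of the reversal trial under a uniform prior (equal to the ML estimate)."""
    candidates = range(len(choices) + 1)  # the last candidate means "no reversal in this block"
    return max(candidates, key=lambda r: log_likelihood(choices, rewards, r, p_high))


def target_choice_outputs(n_trials, t_star):
    """Targets of +1 before the behavioral reversal trial t_star and -1 from t_star onward."""
    return [1.0 if trial < t_star else -1.0 for trial in range(n_trials)]


# Toy block: option 0 is always chosen; it is rewarded on trials 0-4 and unrewarded on trials 5-9.
choices = [0] * 10
rewards = [1] * 5 + [0] * 5
inferred = map_reversal_trial(choices, rewards)
t_star = inferred + 2  # hypothetical delay of two trials
targets = target_choice_outputs(len(choices), t_star)
print(inferred, t_star, targets)
```

      In training, targets of this form would then serve as the supervised signal for backpropagation through time.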

      (2) The role of block structure in the task is an important consideration. What are the statistics of block switches? The authors say on average the reversals are every 36 trials, but also say there are random block switches. The reviewer's notes suggest that both the networks and monkeys may be learning about the typical duration of blocks, which could influence their expectations of reversals. This aspect of the task design should be explained more thoroughly and considered in the context of Figure 1E and 5 results.

      We provided a more detailed description of the reversal learning task in Result Section 1. We clarified that (1) a task is completed by executing a block of a fixed number of trials and (2) the reversal of the reward schedule occurs at a random trial around the middle of a block. The difference in the number of trials per block performed by the RNNs (36) and the monkeys (80) is also explained. We also pointed out the differences in how the reversal trial is randomly sampled.

      However, it is unclear what Reviewer 1 meant by random block switches. Our reversal learning task is completed when a block of a fixed number of trials is executed. Reversal of the reward schedule occurs only once, on a randomly selected trial in the block, and the reversed reward schedule is maintained until the end of the block. It is different from other versions of reversal learning where the reward schedule switches multiple times across trials. We clarified this point in Result Section 1.

      (3) The relationship between the supervised learning approach used in the RNNs and reinforcement learning was confused in the discussion. "Although RNNs in our study were trained via supervised learning, animals learn a reversal-learning task from reward feedback, making it into a reinforcement learning (RL) problem." This is fundamentally not true. In the case of this work, the outcome of the previous trial updates the target output, rather than the trial and error type learning as is typical in reinforcement learning. Networks are not learning by reinforcement learning and this statement is confusing.

      We agree with Reviewer 1’s comment that the statement in the original manuscript is confusing. Our intention was to point out that our study used supervised learning, which differs from how animals learn by reinforcement learning in real life. We revised the sentence in the Discussion as follows:

      “The RNNs in our study were trained via supervised learning. However, in real life, animals learn a reversal learning task via reinforcement learning (RL), i.e., learn the task from reward outcomes.”

      (4) The distinction between line attractors and the dynamic trajectories described by the authors deserves further investigation. A significant concern arises from the authors' use of targeted dimensionality reduction (TDR), a form of regression, to identify the axis determining reversal probability. While this approach can reveal interesting patterns in the data, it may not necessarily isolate the dimension along which the RNN computes reversal probability. This limitation could lead to misinterpretation of the underlying neural dynamics.

      a) This manuscript cites work described in "Prefrontal cortex as a meta-reinforcement learning system," which examined a similar task. In that study, the authors identified a v-shaped curve in the principal component space of network states, representing the probability of choosing left or right.

      Importantly, this curve is topologically equivalent to a line and likely represents a line attractor. However, regressing against reversal probability in such a case would show that a single principal component (PC2) directly correlates with reversal probability.

      b) The dynamics observed in the current study bear a striking resemblance to this structure, with the addition of intervening loops in the network state corresponding to within-trial state evolution. Crucially, these observations do not preclude the existence of a line attractor. Instead, they may reflect the network's need to produce fast timescale dynamics within each trial, superimposed on the slower dynamics of the line attractor.

      c) This alternative interpretation suggests that reward signals could function as inputs that shift the network state along the line attractor, with information being maintained across trials. The fast "intervening behaviors" observed by the authors could represent faster timescale dynamics occurring on top of the underlying line attractor dynamics, without erasing the accumulated evidence for reversals.

      d) Given these considerations, the authors' conclusion that their results are better described by separable dynamic trajectories rather than fixed points on a line attractor may be premature. The observed dynamics could potentially be reconciled with a more nuanced understanding of line attractor models, where the attractor itself may be curved and coexist with faster timescale dynamics.

      We appreciate the insightful comments on (1) the similarity of the work by Wang et al ’18 with our findings and (2) an alternative interpretation that augments the line attractor with fast timescale dynamics. 

      (1) We added a discussion of the work by Wang et al ’18 in Result Section 2 to point out the similarity of their findings in the principal component space with ours in the x_rev and x_choice space. We commented that such network dynamics could emerge when learning to perform the reversal learning task, regardless of the training scheme.

      We also mention that the RL approach in Wang et al ’18 does not consider within-trial dynamics and therefore lacks the non-stationary activity observed during the trial in the PFC of monkeys and in our trained RNNs.

      (2) We revised our original manuscript substantially to reconcile the line attractor model with the non-stationary activity observed during a trial.

      Here are the highlights of the revised interpretation of the PFC and RNN activity:

      - The dynamics of x_rev consist of two activity modes, i.e., stationary activity at the start of a trial and non-stationary activity during the trial. A schematic of the augmented model that reconciles the two activity modes is shown in Figure 4A. Analyses of the time derivative (dx_rev/dt) and of the contractivity of the stationary state are shown in Figure 4B,C to demonstrate the two activity modes.

      - We discuss in Result Section 4 main text that the stationary activity is consistent with the line attractor model, but the non-stationary activity deviates from the model. 

      - The two activity modes are linked dynamically. There is an underlying dynamics that can map the stationary state to the non-stationary trajectory. This is shown by predicting the non-stationary trajectory from the stationary state using a support vector regression model. The prediction results are shown in Figure 4D,E,F.

      - We discuss in Result Section 4 an extension of the standard line attractor model: points on the line attractor can serve as initial states that launch non-stationary activity associated with task-related behavior.

      - The separability of neural trajectories presented in Result Section 5 is framed as a property of the non-stationary dynamics associated with task-related behavior.

      To strengthen their claims, the authors should:

      (1) Provide a more detailed description of their RNN training paradigm and task structure, including clear illustrations of target outputs.

      (2) Discuss how their findings relate to and potentially extend previous work on similar tasks, particularly addressing the similarities and differences with the v-shaped state organization observed in reinforcement learning contexts. (https://www.nature.com/articles/s41593-018-0147-8 Figure1).

      (3) Explore whether their results could be consistent with a curved line attractor model, rather than treating line attractors and dynamic trajectories as mutually exclusive alternatives.

      Our response to these three comments is described above.

      Addressing these points would significantly enhance the impact of the study and provide a more nuanced understanding of how reversal probabilities are represented in neural circuits.

      In conclusion, while this study provides interesting insights into the neural representation of reversal probability, there are several areas where the methodology and interpretations could be refined.

      Additional Minor Concerns:

      (1) Network Training and Reversal Timing: The authors mention that the network was trained to switch after a reversal to match animal behavior, stating "Maximum a Posterior (MAP) of the reversal probability converges a few trials past the MAP estimate." More explanation of how this training strategy relates to actual animal behavior would enhance the reader's understanding of the meaning of the model's similarity to animal behavior in Figure 1.

      In Method Section 2.5, we described how our observation that the running estimate of MAP converges a few trials after the actual MAP is analogous to the animal’s reversal behavior.

      “This observation can be interpreted as follows. If a subject performing the reversal learning task employs the ideal observer model to detect the trial at which reward schedule is reversed, the subject can infer the reversal of reward schedule a few trials past the actual reversal and then switch its preferred choice. This delay in behavioral reversal, relative to the reversal of reward schedule, is analogous to the monkeys switching their preferred choice a few trials after the reversal of reward schedule.”

      In Step 4, we also mentioned that the target choice outputs are defined based on our observation in Step 3.

      “We used the observation from Step 3 to define target choice outputs that switch abruptly a few trials after the reversal of reward schedule, denoted as $t^*$ in the following. An example of target outputs is shown in Fig. 1B.”

      (2) How is the network simulated in step 1 of training? Is it just randomly initialized? What defines this network structure?

      The initial state at the start of a block was random. We think the initial state is less relevant, as the external inputs (i.e., cue and feedback) are strong and drive the network dynamics. We mentioned this setup and observation in Step 1 of training.

      “Step 1. Simulate the network starting from a random initial state, apply the external inputs, i.e., cue and feedback inputs, at each trial and store the network choices and reward outcomes at all the trials in a block. The network dynamics is driven by the external inputs applied periodically over the trials.”
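      Step 1 can be illustrated with a toy simulation. Everything in this sketch is an assumption made for illustration rather than the network of the manuscript: the rate equation, the rectified-linear nonlinearity, the parameter values, the 70/30 reward probabilities, and the way the feedback input is signed by the previous outcome. Only the block length of 36 trials, the random initial state, the random cue and feedback vectors, and the non-positive (inhibitory) recurrent weights follow the description in this response.

```python
import random


def relu(v):
    return v if v > 0.0 else 0.0


def simulate_block(n_neurons=20, n_trials=36, steps_per_trial=50, reversal_trial=18, seed=0):
    """Simulate one block and return the network choices and reward outcomes per trial."""
    rng = random.Random(seed)
    dt = 0.1
    # Inhibitory recurrent weights: all entries are non-positive.
    weights = [[-abs(rng.gauss(0.0, 1.0)) / n_neurons for _ in range(n_neurons)]
               for _ in range(n_neurons)]
    cue = [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]       # random cue input
    feedback = [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]  # random feedback input
    readout = [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]   # choice readout weights
    x = [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]         # random initial state
    choices, rewards = [], []
    last_reward = 0
    for trial in range(n_trials):
        feedback_sign = 1.0 if last_reward else -1.0
        for _ in range(steps_per_trial):
            rates = [relu(v) for v in x]
            # Euler step of dx/dt = -x + W * relu(x) + cue + signed feedback.
            x = [xi + dt * (-xi + sum(w * r for w, r in zip(row, rates)) + c + feedback_sign * f)
                 for xi, row, c, f in zip(x, weights, cue, feedback)]
        output = sum(a * relu(v) for a, v in zip(readout, x))
        choice = 1 if output >= 0.0 else -1        # the sign of the output is the choice
        best = 1 if trial < reversal_trial else -1
        p_reward = 0.7 if choice == best else 0.3  # assumed reward schedule
        last_reward = 1 if rng.random() < p_reward else 0
        choices.append(choice)
        rewards.append(last_reward)
    return choices, rewards


choices, rewards = simulate_block(seed=1)
print(len(choices), len(rewards))
```

      The stored choices and rewards are what the subsequent steps would pass to the Bayesian model to infer the reversal trial.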

      (3) Clarification on Learning Approach: More description of the approach in the main text would be beneficial. The statement "Here, we trained RNNs that learned from a Bayesian inference model to mimic the behavioral strategies of monkeys performing the reversal learning task [2, 4]" is somewhat confusing, as the model isn't directly fit to monkey data. A more detailed explanation of how the Bayesian inference model relates to monkey behavior and how it's used in RNN training would improve clarity.

      We described the learning approach in more detail, but also tried to be concise without going into technical details.

      We revised the sentence in Introduction as follows:

      “We sought to train RNNs to mimic the behavioral strategies of monkeys performing the reversal learning task. Previous studies [Costa et al. ’15; Bartolo and Averbeck ’20] have shown that a Bayesian inference model can capture a key aspect of the monkey's behavioral strategy, i.e., adhering to the preferred choice until the reversal of reward is detected and then switching abruptly. We trained the RNNs to replicate this behavioral strategy by training them on target behaviors generated from the Bayesian model.”

      We also added a paragraph in Result Section 1 that explains in detail how the training approach works.

      (4) In Figure 1B, it would be helpful to show the target output.

      We added a panel in Figure 1B that shows a schematic of how the target output is generated.

      (5) An important point to consider is that a line attractor can be curved while still being topologically equivalent to a line. This nuance makes Figure 4A somewhat difficult to interpret. It might be helpful to discuss how the observed dynamics relate to potentially curved line attractors, which could provide a more nuanced understanding of the neural representations.

      As discussed above, we interpret the “curved” activity during the trial as non-stationary activity. We do not think this non-stationary activity would be characterized as an attractor. An attractor is (1) a minimal set of states that is (2) invariant under the dynamics and (3) attracting when perturbed into its neighborhood [Strogatz, Nonlinear dynamics and chaos]. If we consider the autonomous system without the behavior-related external input as the base system, then the non-stationary states could satisfy (2) and (3) but not (1), so they are not part of the attractor. If we include the behavior-related external input in the autonomous dynamics, then it may be possible that the non-stationary trajectories are part of the attractor. We adopted the former interpretation, as the behavior-related inputs are external and transient.

      (6) The results of the perturbation experiments seem to follow necessarily from the way x_rev was defined. It would be valuable to clarify if there's more to these results than what appears to be a direct consequence of the definition, or if there are subtleties in the experimental design or analysis that aren't immediately apparent.

      The neural activity x_rev is correlated to the reversal probability, but it is unclear if the activity in this neural subspace is causally linked to behavioral variables, such as choice output. We added this explanation at the beginning of Results Section 7 to clarify the reason for performing the perturbation experiments.

      “The neural activity $x_{rev}$ is obtained by identifying a neural subspace correlated to reversal probability. However, it remains to be shown if activity within this neural subspace is causally linked to behavioral variables, such as choice output.”

      Reviewer #2:

      Below is a list of things I have found difficult to understand, and been puzzled/concerned about while reading the manuscript:

      (1) It would be nice to say a bit more about the dataset that has been used for PFC analysis, e.g. number of neurons used and in what conditions is Figure 2A obtained (one has to go to supplementary to get the reference).

      We added information about the PFC dataset in the opening paragraph of Result Section 2 to provide an overview of what type of neural data we’ve analyzed. It includes information about the number of recorded neurons, recording method and spike binning process.

      (2) It would be nice to give more detail about the monkey task and better explain its trial structure.

      In Result Section 1 we added a description of the overall task structure (and how it differs from other versions of the reversal learning task), the RNN / monkey trial structure, and the differences between the RNN and monkey tasks.

      (3) In the introduction it is mentioned that during the hold period, the probability of reversal is represented. Where does this statement come from?

      The fact that neural activity during a hold period, i.e., fixation period before presenting the target images, encodes the probability of reversal was demonstrated in a previous study (Bartolo and Averbeck ’20). 

      We realize that our intention was to state that, during the hold period, the reversal probability activity is stationary as in the line attractor model, rather than to emphasize that the probability of reversal is represented during this period. We revised the sentence to convey this message. In addition, we revised the entire paragraph to reinterpret our findings: there are two activity modes where the stationary activity is consistent with the line attractor model but the non-stationary activity deviates from it.

      (4) "Around the behavioral reversal trial, reversal probabilities were represented by a family of rank-ordered trajectories that shifted monotonically". This sentence is confusing and hard to understand.

      Thank you for pointing this out. We rewrote the paragraph to reflect our revised interpretation. This sentence was removed, as it can be considered part of the result on separable trajectories.

      (5) For clarity, in the first section, when it is written that "The reversal behavior of trained RNNs was similar to the monkey's behavior on the same task" it would be nice to be more precise, that this is to be expected given the strategy used to train the network.

      We removed this sentence as it makes a blanket statement. Instead, we compared the behavioral outputs of the RNNs and the monkeys one by one.

      We added a sentence in Result Section 1 stating that the RNNs’ abrupt behavioral reversal is expected, as they are trained to mimic the target choice outputs of the Bayesian model.

      “Such abrupt reversal behavior was expected as the RNNs were trained to mimic the target outputs of the Bayesian inference model.”

      (6) What is the value of tau used in eq (1), and how does it compare to trial duration?

      We described the value of the time constant tau in Eq (1) and also discussed in Result Section 1 that tau = 20 ms is much shorter than the 500 ms trial duration; the persistent behavior seen in trained RNNs is therefore due to learning.
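
      The timescale separation can be illustrated with a short calculation. This is a minimal sketch that assumes a single leaky unit decaying passively with no recurrent or external input, which is a simplification and not the full Eq (1); the function name is hypothetical.

```python
import math

TAU_MS = 20.0     # time constant tau quoted in the response
TRIAL_MS = 500.0  # trial duration quoted in the response


def remaining_fraction(elapsed_ms, tau_ms):
    """Fraction of activity left after passive exponential decay.

    Assumes dx/dt = -x / tau, i.e. no recurrent or external drive.
    """
    return math.exp(-elapsed_ms / tau_ms)


n_time_constants = TRIAL_MS / TAU_MS
fraction = remaining_fraction(TRIAL_MS, TAU_MS)
print(f"one trial spans {n_time_constants:.0f} time constants")
print(f"fraction of activity remaining after one trial: {fraction:.2e}")
```

      Under this assumption, passive decay alone would erase essentially all activity within one trial, so any persistence across trials has to come from the learned recurrent dynamics.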

      (7) It would be nice to expand around the notion of « temporally flexible representation » to help readers grasp what this means.

      Instead of stating that the separable dynamic trajectories have “temporally flexible representation”, we break down in what sense it is temporally flexible: separable dynamic trajectories can accommodate the effects that task-related behavior has on generating non-stationary neural dynamics.

      “In sum, our results show that, in a probabilistic reversal learning task, recurrent neural networks encode reversal probability by adopting, not only stationary states as in a line attractor, but also separable dynamic trajectories that can represent distinct probabilistic values while accommodating non-stationary dynamics associated with task-related behavior.”

      Reviewer #3:

      (1) Data:

      It would be useful to describe the experimental task, recording setup, and analyses in much more detail - both in the text and in the methods. What part of PFC are the recordings from? How many neurons were recorded over how many sessions? Which other papers have they been used in? All of these things are important for the reader to know, but are not listed anywhere. There are also some inconsistencies, with the main text e.g. listing the 'typical block length' as 36 trials, and the methods listing the block length as 24 trials (if this is a difference between the biological data and RNN, that should be more explicit and motivated).

      We provided a more detailed description of the monkey experimental task and PFC recordings in Result Section 1. We also added a new section in Methods 2.1 to describe the monkey experiment.

      The experimental analyses should be explained in more detail in the methods. There is e.g. no detailed description of the analysis in Figure 6F.

      We added a new section in Methods 6 to describe how the residual PFC activity is computed. It also describes the RNN perturbation experiments.

      Finally, it would be useful for more analyses of monkey behaviour and performance, either in the main text or supplementary figures.

      We did not pursue this comment as it is unclear how additional behavioral analyses would improve the manuscript.

      (2) Model:

      When fitting the network, 'step 1' of training in 2.3 seems superfluous. The posterior update from getting a reward at A is the same as that from not getting a reward at B (and vice versa), and it is therefore completely independent of the network choice. The reversal trial can therefore be inferred without ever simulating the network, simply by generating a sample of which trials have the 'good' option being rewarded and which trials have the 'bad' option being rewarded.

      We respectfully disagree with Reviewer 3’s comment that the reversal trial can be inferred without ever simulating the network. The only way for the network to know about the underlying reward schedule is to perform the task by itself. By simulating the network, it can sample the options and the reward outcomes. 

      Our understanding is that Reviewer 3 described a strategy that a human would use to perform this task. Our goal was to train the RNN to perform the task.

      Do the blocks always start with choice A being optimal? Is everything similar if the network is trained with a variable initial rewarded option? E.g. in Fig 6, would you see the appropriate swap in the effect of the perturbation on choice probability if choice B was initially optimal?

      Thank you for pointing out that the initial high-value option can be random. When setting up the reward schedule, the initial high-value option was chosen randomly from two choice outputs and, at the scheduled reversal, it was switched to the other option. We did not describe this in the original manuscript.

      We added a description in Training Scheme Step 4 that the initial high-value option is selected randomly. This is also explained in Result Section 1 when we give an overview of the RNN training procedure.

      (3) Content:

      It is rarely explained what the error bars represent (e.g. Figures 3B, 4C, ...) - this should be clear in all figures.

      We added that the error bars represent the standard error of the mean.

      Figure 2A: this colour scheme is not great. There are abrupt colour changes both before and after the 'reversal' trial, and both of the extremes are hard to see.

      We changed the color scheme to contrast pre- and post-reversal trials without the abrupt color change.

      Figure 3E/F: how is prediction accuracy defined?

      We added that the prediction accuracy is based on Pearson correlation.
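
      For readers unfamiliar with the metric, a minimal sketch of a Pearson-correlation accuracy score follows. It assumes the score compares a predicted sequence with an observed one; the function name and the example values are hypothetical and are not taken from the manuscript.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)


# Hypothetical values: predicted vs. observed activity over five trials.
accuracy = pearson_r([0.1, 0.3, 0.5, 0.6, 0.9], [0.2, 0.25, 0.55, 0.7, 0.8])
print(f"prediction accuracy (Pearson r): {accuracy:.3f}")
```

      A score of 1 indicates a perfect linear relationship, 0 indicates none, and negative values indicate an inverse relationship.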

      Figure 4B: why focus on the derivative of the dynamics? The subsequent plots looking at the actual trajectories are much easier to understand. Also - what is 'relative trial' relative to?

      The derivative was analyzed to demonstrate stationarity or non-stationarity of the neural activity. We think it will be clearer in the revised manuscript that the derivative allows us to characterize those two activity modes.

      Relative trial number indicates the trial position relative to the behavioral reversal trial. We added this description to the figures when “relative trial” is used.

      Figure 4C: what do these analyses look like if you match the trial numbers for the shift in trajectories? As it is now, there will presumably be more rewarded trials early and late in each block, and more unrewarded trials around the reversal point. Does this introduce biases in the analysis? A related question is (i) why the black lines are different in the top and bottom plots, and (ii) why the ends of the black lines are discontinuous with the beginnings of the red/blue lines.

      We could not understand what Reviewer 3 was asking in this comment. It’d help if Reviewer 3 could clarify the following question:

      “Figure 4C: what do these analyses look like if you match the trial numbers for the shift in trajectories?”

      Question (i): We wanted to look at how the trajectory shifts in the subsequent trial if a reward is or is not received in the current trial. The top panel analyzed all the trials in which the subsequent trial did not receive a reward. The bottom panel analyzed all the trials in which the subsequent trial received a reward. So, the trials analyzed in the top and bottom panels are different, and the black lines (x_rev of “current” trial) in the top and bottom panels are different.

      Question (ii): The black line comes from the trial preceding that of the red/blue lines, so if trials were designed to be continuous across the inter-trial-interval, then the black and red/blue lines should be continuous. However, in the monkey experiment, the inter-trial-intervals were variable, so the end of the current trial does not match the start of the next trial. The neural trajectories presented in the manuscript did not include the activity in this inter-trial-interval.

      Figure 6C: are the individual dots different RNNs? Claiming that there is a decrease in Delta x_choice for a v_+ stimulation is very misleading.

      Yes, individual dots are different RNN perturbations. We added an explanation of the dots in the Figure 7C caption.

      We agree with the comment that \Delta x_choice did not decrease. This sentence was removed. Instead, we revised the manuscript to state that x_choice for v_+ stimulation was smaller than the x_choice for v_- stimulation. We performed a KS test to confirm statistical significance.
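
      For reference, the two-sample KS statistic is the largest gap between the empirical cumulative distribution functions of the two samples. The sketch below computes only that statistic; it assumes a two-sample test was meant, the function name and example values are hypothetical, and a p-value would in practice come from a statistics library such as `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d_max = 0.0
    # Both ECDFs are right-continuous step functions, so the largest gap
    # is attained at one of the observed data points.
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max


# Hypothetical x_choice values under the two stimulation directions.
x_choice_v_plus = [0.10, 0.15, 0.22, 0.30, 0.35]
x_choice_v_minus = [0.28, 0.40, 0.45, 0.52, 0.60]
d_stat = ks_two_sample_statistic(x_choice_v_plus, x_choice_v_minus)
print(f"KS statistic D = {d_stat:.2f}")
```

      A statistic near 1 means the two distributions barely overlap, while a value near 0 means they are nearly indistinguishable.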

      Discussion: "...exhibited behaviour consistent with an ideal Bayesian observer, as found in our study". The RNN was explicitly trained to reproduce an ideal Bayesian observer, so this can only really be considered an assumption (not a result) in the present study.

      We agree that the statement in the original manuscript is inaccurate. It was revised to reflect that, in the other study, behavior outputs similar to a Bayesian observer emerged by simply learning to do the task, instead of directly mimicking the outputs of a Bayesian observer as done in our study.

      “Authors showed that trained RNNs exhibited behavior outputs consistent with an ideal Bayesian observer without explicitly learning from the Bayesian observer. This finding shows that the behavioral strategies of monkeys could emerge by simply learning to do the task, instead of directly mimicking the outputs of Bayesian observer as done in our study.”

      Methods: Would the results differ if your Bayesian observer model used the true prior (i.e. the reversal happens in the middle 10 trials) rather than a uniform prior? Given the extensive literature on prior effects on animal behaviour, it is reasonable to expect that monkeys incorporate some non-uniform prior over the reversal point.

      Thank you for pointing out the non-uniform prior. We haven’t conducted this analysis, but would guess that the convergence to the posterior distribution would be faster. We’d have to perform further analysis, which is out of the scope of this paper, to investigate whether the posterior distribution would be different from what we obtained from a uniform prior.

      Making the code available would make the work more transparent and useful to the community.

      The code is available in the following Github repository: https://github.com/chrismkkim/LearnToReverse

    1. Author response:

      Reviewer #1 (Public review):

      This study investigates the sex determination mechanism in the clonal ant Ooceraea biroi, focusing on a candidate complementary sex determination (CSD) locus-one of the key mechanisms supporting haplodiploid sex determination in hymenopteran insects. Using whole genome sequencing, the authors analyze diploid females and the rarely occurring diploid males of O. biroi, identifying a 46 kb candidate region that is consistently heterozygous in females and predominantly homozygous in diploid males. This region shows elevated genetic diversity, as expected under balancing selection. The study also reports the presence of an lncRNA near this heterozygous region, which, though only distantly related in sequence, resembles the ANTSR lncRNA involved in female development in the Argentine ant, Linepithema humile (Pan et al. 2024). Together, these findings suggest a potentially conserved sex determination mechanism across ant species. However, while the analyses are well conducted and the paper is clearly written, the insights are largely incremental. The central conclusion - that the sex determination locus is conserved in ants - was already proposed and experimentally supported by Pan et al. (2024), who included O. biroi among the studied species and validated the locus's functional role in the Argentine ant. The present study thus largely reiterates existing findings without providing novel conceptual or experimental advances.

      Although it is true that Pan et al., 2024 demonstrated (in Figure 4 of their paper) that the synteny of the region flanking ANTSR is conserved across aculeate Hymenoptera (including O. biroi), Reviewer 1’s claim that that paper provides experimental support for the hypothesis that the sex determination locus is conserved in ants is inaccurate. Pan et al., 2024 only performed experimental work in a single ant species (Linepithema humile) and merely compared reference genomes of multiple species to show synteny of the region, rather than functionally mapping or characterizing these regions.

      Other comments:

      The mapping is based on a very small sample size: 19 females and 16 diploid males, and these all derive from a single clonal line. This implies a rather high probability for false-positive inference. In combination with the fact that only 11 out of the 16 genotyped males are actually homozygous at the candidate locus, I think a more careful interpretation regarding the role of the mapped region in sex determination would be appropriate. The main argument supporting the role of the candidate region in sex determination is based on the putative homology with the lncRNA involved in sex determination in the Argentine ant, but this argument was made in a previous study (as mentioned above).

      Our main argument supporting the role of the candidate region in sex determination is not based on putative homology with the lncRNA in L. humile. Instead, our main argument comes from our genetic mapping (in Fig. 2), and the elevated nucleotide diversity within the identified region (Fig. 4). Additionally, we highlight that multiple genes within our mapped region are homologous to those in mapped sex determining regions in both L. humile and Vollenhovia emeryi, possibly including the lncRNA.

      In response to the Reviewer’s assertion that the mapping is based on a small sample size from a single clonal line, we want to highlight that we used all diploid males available to us. Although the primary shortcoming of a small sample size is to increase the probability of a false negative, small sample sizes can also produce false positives. We used two approaches to explore the statistical robustness of our conclusions. First, we generated a null distribution by randomly shuffling sex labels within colonies and calculating the probability of observing our CSD index values by chance (shown in Fig. 2). Second, we directly tested the association between homozygosity and sex using Fisher’s Exact Test (shown in Supplementary Fig. S2). In both cases, the association of the candidate locus with sex was statistically significant after multiple-testing correction using the Benjamini-Hochberg False Discovery Rate. These approaches are clearly described in the “CSD Index Mapping” section of the Methods.
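
      A minimal sketch of the second approach at a single locus follows; it is not the analysis code used for the manuscript. The male counts (11 of 16 homozygous) come from the review above; the female counts assume all 19 females are heterozygous, as the candidate region is described; the extra p-values passed to the correction step are hypothetical placeholders for other loci.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k column-1 individuals in row 1.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more likely than the observed one.
    total = sum(prob(k) for k in range(k_min, k_max + 1)
                if prob(k) <= p_observed * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Rows: males, females. Columns: homozygous, heterozygous.
p_locus = fisher_exact_two_sided(11, 5, 0, 19)
# 0.2 and 0.6 are hypothetical p-values standing in for other loci.
adjusted = benjamini_hochberg([p_locus, 0.2, 0.6])
print(f"raw p = {p_locus:.2e}, BH-adjusted p = {adjusted[0]:.2e}")
```

      The genome-wide analysis applies the correction across all tested windows, so the number of tests is far larger than in this three-value illustration.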

      We also note that, because complementary sex determination loci are expected to evolve under balancing selection, our finding that the mapped region exhibits a peak of nucleotide diversity lends orthogonal support to the notion that the mapped locus is indeed a complementary sex determination locus.

      The fourth paragraph of the results and the sixth paragraph of the discussion are devoted to explaining the possible reasons why only 11/16 genotyped males are homozygous in the mapped region. The revised manuscript will include an additional sentence (in what will be lines 384-388) in this paragraph that includes the possible explanation that this locus is, in fact, a false positive, while also emphasizing that we find this possibility to be unlikely given our multiple lines of evidence.

      In response to Reviewer 1’s suggestion that we carefully interpret the role of the mapped region in sex determination, we highlight our careful wording choices, nearly always referring to the mapped locus as a “candidate sex determination locus” in the title and throughout the manuscript. For consistency, the revised manuscript version will change the second results subheading from “The O. biroi CSD locus is homologous to another ant sex determination locus but not to honeybee csd” to “O. biroi’s candidate CSD locus is homologous to another ant sex determination locus but not to honeybee csd,” and will add the word “candidate” in what will be line 320 at the beginning of the Discussion, and will change “putative” to “candidate” in what will be line 426 at the end of the Discussion.

      In the abstract, it is stated that CSD loci have been mapped in honeybees and two ant species, but we know little about their evolutionary history. But CSD candidate loci were also mapped in a wasp with multi-locus CSD (study cited in the introduction). This wasp is also parthenogenetic via central fusion automixis and produces diploid males. This is a very similar situation to the present study and should be referenced and discussed accordingly, particularly since the authors make the interesting suggestion that their ant also has multi-locus CSD and neither the wasp nor the ant has tra homologs in the CSD candidate regions. Also, is there any homology to the CSD candidate regions in the wasp species and the studied ant?

      In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of diploid males being produced via losses of heterozygosity during asexual reproduction, the revised manuscript will include the following sentence: “Therefore, if O. biroi uses CSD, diploid males might result from losses of heterozygosity at sex determination loci (Fig. 1C), similar to what is thought to occur in other asexual Hymenoptera that produce diploid males (Rabeling and Kronauer 2012; Matthey-Doret et al. 2019).”

      We note, however, that in their 2019 study, Matthey-Doret et al. did not directly test the hypothesis that diploid males result from losses of heterozygosity at CSD loci during asexual reproduction, because the diploid males they used for their mapping study came from inbred crosses in a sexual population of that species.

      We address this further below, but we want to emphasize that we do not intend to argue that O. biroi has multiple CSD loci. Instead, we suggest that additional, undetected CSD loci are one possible explanation for the absence of diploid males from any clonal line other than clonal line A. In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of multilocus CSD, the revised manuscript version will include the following additional sentence in the fifth paragraph of the discussion: “Multi-locus CSD has been suggested to limit the extent of diploid male production in asexual species under some circumstances (Vorburger 2013; Matthey-Doret et al. 2019).”

      Regarding Reviewer 1’s question about homology between the putative CSD loci from the (Matthey-Doret et al. 2019) study and O. biroi, we note that there is no homology. The revised manuscript version will have an additional Supplementary Table (which will be the new Supplementary Table S3) that will report the results of this homology search. The revised manuscript will also include the following additional sentence in the Results: “We found no homology between the genes within the O. biroi CSD index peak and any of the genes within the putative L. fabarum CSD loci (Supplementary Table S3).”

      The authors used different clonal lines of O. biroi to investigate whether heterozygosity at the mapped CSD locus is required for female development in all clonal lines of O. biroi (L187-196). However, given the described parthenogenesis mechanism in this species conserves heterozygosity, additional females that are heterozygous are not very informative here. Indeed, one would need diploid males in these other clonal lines as well (but such males have not yet been found) to make any inference regarding this locus in other lines.

      We agree that a full mapping study including diploid males from all clonal lines would be preferable, but as stated earlier in that same paragraph, we have only found diploid males from clonal line A. We stand behind our modest claim that “Females from all six clonal lines were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.” In the revised manuscript version, this sentence (in what will be lines 199-201) will be changed slightly in response to a reviewer comment below: “All females from all six clonal lines (including 26 diploid females from clonal line B) were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.”

      Reviewer #2 (Public review):

      The manuscript by Lacy et al. is well written, with a clear and compelling introduction that effectively conveys the significance of the study. The methods are appropriate and well-executed, and the results, both in the main text and supplementary materials, are presented in a clear and detailed manner. The authors interpret their findings with appropriate caution.

      This work makes a valuable contribution to our understanding of the evolution of complementary sex determination (CSD) in ants. In particular, it provides important evidence for the ancient origin of a non-coding locus implicated in sex determination, and shows that, remarkably, this sex locus is conserved even in an ant species with a non-canonical reproductive system that typically does not produce males. I found this to be an excellent and well-rounded study, carefully analyzed and well contextualized.

      That said, I do have a few minor comments, primarily concerning the discussion of the potential 'ghost' CSD locus. While the authors acknowledge (line 367) that they currently have no data to distinguish among the alternative hypotheses, I found the evidence for an additional CSD locus presented in the results (lines 261-302) somewhat limited and at times a bit difficult to follow. I wonder whether further clarification or supporting evidence could already be extracted from the existing data. Specifically:

      We agree with Reviewer 2 that the evidence for a second CSD locus is limited. In fact, we do not intend to advocate for there being a second locus, but we suggest that a second CSD locus is one possible explanation for the absence of diploid males outside of clonal line A. In our initial version, we intentionally conveyed this ambiguity by titling this section “O. biroi may have one or multiple sex determination loci.” However, we now see that this leads to undue emphasis on the possibility of a second locus. In the revised manuscript, we will split this into two separate sections: “Diploid male production differs across O. biroi clonal lines” and “O. biroi lacks a tra-containing CSD locus.”

      (1) Line 268: I doubt the relevance of comparing the proportion of diploid males among all males between lines A and B to infer the presence of additional CSD loci. Since the mechanisms producing these two types of males differ, it might be more appropriate to compare the proportion of diploid males among all diploid offspring. This ratio has been used in previous studies on CSD in Hymenoptera to estimate the number of sex loci (see, for example, Cook 1993, de Boer et al. 2008, 2012, Ma et al. 2013, and Chen et al., 2021). The exact method might not be applicable to clonal raider ants, but I think comparing the percentage of diploid males among the total number of (diploid) offspring produced between the two lineages might be a better argument for a difference in CSD loci number.

      We want to re-emphasize here that we do not wish to advocate for there being two CSD loci in O. biroi. Rather, we want to explain that this is one possible explanation for the apparent absence of diploid males outside of clonal line A. We hope that the modifications to the manuscript described in the previous response help to clarify this.

      Reviewer 2 is correct that comparing the number of diploid males to diploid females does not apply to clonal raider ants. This is because males are vanishingly rare among the vast numbers of females produced. We do not count how many females are produced in laboratory stock colonies, and males are sampled opportunistically. Therefore, we cannot report exact numbers. However, we will add the following sentence to the revised manuscript: “Despite the fact that we maintain more colonies of clonal line B than of clonal line A in the lab, all the diploid males we detected came from clonal line A.”

      (2) If line B indeed carries an additional CSD locus, one would expect that some females could be homozygous at the ANTSR locus but still viable, being heterozygous only at the other locus. Do the authors detect any females in line B that are homozygous at the ANTSR locus? If so, this would support the existence of an additional, functionally independent CSD locus.

      We thank the reviewer for this suggestion, and again we emphasize that we do not want to argue in favor of multiple CSD loci. We just want to introduce it as one possible explanation for the absence of diploid males outside of clonal line A.

      The 26 sequenced diploid females from clonal line B are all heterozygous at the mapped locus, and the revised manuscript will clarify this in what will be lines 199-201. Previously, only six of those diploid females were included in Supplementary Table S2, and that will be modified accordingly.

      (3) Line 281: The description of the two tra-containing CSD loci as "conserved" between Vollenhovia and the honey bee may be misleading. It suggests shared ancestry, whereas the honey bee csd gene is known to have arisen via a relatively recent gene duplication from fem/tra (10.1038/nature07052). It would be more accurate to refer to this similarity as a case of convergent evolution rather than conservation.

      In the sentence that Reviewer 2 refers to, we are representing the assertion made in the (Miyakawa and Mikheyev 2015) paper in which, regarding their mapping of a candidate CSD locus that contains two linked tra homologs, they write in the abstract: “these data support the prediction that the same CSD mechanism has indeed been conserved for over 100 million years.” In that same paper, Miyakawa and Mikheyev write in the discussion section: “As ants and bees diverged more than 100 million years ago, sex determination in honey bees and V. emeryi is probably homologous and has been conserved for at least this long.”

      As noted by Reviewer 2, this appears to conflict with a previously advanced hypothesis: because fem and csd were found in Apis mellifera, Apis cerana, and Apis dorsata, but only fem was found in Melipona compressipes, Bombus terrestris, and Nasonia vitripennis, the csd gene evolved after the honeybee (Apis) lineage diverged from other bees (Hasselmann et al. 2008). However, it remains possible that the csd gene evolved after ants and bees diverged from N. vitripennis, but before the divergence of ants and bees, and then was subsequently lost in B. terrestris and M. compressipes. This view was previously put forward based on bioinformatic identification of putative orthologs of csd and fem in bumblebees and in ants [(Schmieder et al. 2012), see also (Privman et al. 2013)]. However, subsequent work disagreed and argued that the duplications of tra found in ants and in bumblebees represented convergent evolution rather than homology (Koch et al. 2014). Distinguishing between these possibilities will be aided by additional sex determination locus mapping studies and functional dissection of the underlying molecular mechanisms in diverse Aculeata.

      Distinguishing between these competing hypotheses is beyond the scope of our paper, but the revised manuscript will include additional text to incorporate some of this nuance. We will include these modified lines below:

      “A second QTL region identified in V. emeryi (V.emeryiCsdQTL1) contains two closely linked tra homologs, similar to the closely linked honeybee tra homologs, csd and fem (Miyakawa and Mikheyev 2015). This, along with the discovery of duplicated tra homologs that undergo concerted evolution in bumblebees and ants (Schmieder et al. 2012; Privman et al. 2013) has led to the hypothesis that the function of tra homologs as CSD loci is conserved with the csd-containing region of honeybees (Schmieder et al. 2012; Miyakawa and Mikheyev 2015). However, other work has suggested that tra duplications occurred independently in honeybees, bumblebees, and ants (Hasselmann et al. 2008; Koch et al. 2014), and it remains to be demonstrated that either of these tra homologs acts as a primary CSD signal in V. emeryi.”

      (4) Finally, since the authors successfully identified multiple alleles of the first CSD locus using previously sequenced haploid males, I wonder whether they also observed comparable allelic diversity at the candidate second CSD locus. This would provide useful supporting evidence for its functional relevance.

      As is already addressed in the final paragraph of the results and in Supplementary Fig. S4, there is no peak of nucleotide diversity in any of the regions homologous to V.emeryiQTL1, which is the tra-containing candidate sex determination locus (Miyakawa and Mikheyev 2015). In the revised manuscript, the relevant lines will be 307-310. We want to restate that we do not propose that there is a second candidate CSD locus in O. biroi, but we simply raise the possibility that multi-locus CSD *might* explain the absence of diploid males from clonal lines other than clonal line A (as one of several alternative possibilities).

      Overall, these are relatively minor points in the context of a strong manuscript, but I believe addressing them would improve the clarity and robustness of the authors' conclusions.

      Reviewer #3 (Public review):

      Summary:

      The sex determination mechanism governed by the complementary sex determination (CSD) locus is one of the mechanisms that support the haplodiploid sex determination system evolved in hymenopteran insects. While many ant species are believed to possess a CSD locus, it has only been specifically identified in two species. The authors analyzed diploid females and the rarely occurring diploid males of the clonal ant Ooceraea biroi and identified a 46 kb CSD candidate region that is consistently heterozygous in females and predominantly homozygous in males. This region was found to be homologous to the CSD locus reported in distantly related ants. In the Argentine ant, Linepithema humile, the CSD locus overlaps with an lncRNA (ANTSR) that is essential for female development and is associated with the heterozygous region (Pan et al. 2024). Similarly, an lncRNA is encoded near the heterozygous region within the CSD candidate region of O. biroi. Although this lncRNA shares low sequence similarity with ANTSR, its potential functional involvement in sex determination is suggested. Based on these findings, the authors propose that the heterozygous region and the adjacent lncRNA in O. biroi may trigger female development via a mechanism similar to that of L. humile. They further suggest that the molecular mechanisms of sex determination involving the CSD locus in ants have been highly conserved for approximately 112 million years. This study is one of the few to identify a CSD candidate region in ants and is particularly noteworthy as the first to do so in a parthenogenetic species.

      Strengths:

      (1) The CSD candidate region was found to be homologous to the CSD locus reported in distantly related ant species, enhancing the significance of the findings.

      (2) Identifying the CSD candidate region in a parthenogenetic species like O. biroi is a notable achievement and adds novelty to the research.

      Weaknesses:

      (1) Functional validation of the lncRNA's role is lacking, and further investigation through knockout or knockdown experiments is necessary to confirm its involvement in sex determination.

      See response below.

      (2) The claim that the lncRNA is essential for female development appears to reiterate findings already proposed by Pan et al. (2024), which may reduce the novelty of the study.

      We do not claim that the lncRNA is essential for female development in O. biroi, but simply mention the possibility that, as in L. humile, it is somehow involved in sex determination. We do not have any functional evidence for this, so this is purely based on its genomic position immediately adjacent to our mapped candidate region. We agree with the reviewer that the study by Pan et al. (2024) decreases the novelty of our findings. Another way of looking at this is that our study supports and bolsters previous findings by partially replicating the results in a different species.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript presents an interesting new framework (VARX) for simultaneously quantifying effective connectivity in brain activity during sensory stimulation and how that brain activity is being driven by that sensory stimulation. The core idea is to combine the Vector Autoregressive model that is often used to infer Granger-causal connectivity in brain data with an encoding model that maps the features of a sensory stimulus to that brain data. The authors do a nice job of explaining the framework. And then they demonstrate its utility through some simulations and some analysis of real intracranial EEG data recorded from subjects as they watched movies. They infer from their analyses that the functional connectivity in these brain recordings is essentially unaltered during movie watching, that accounting for the driving movie stimulus can protect one against misidentifying brain responses to the stimulus as functional connectivity, and that recurrent brain activity enhances and prolongs the putative neural responses to a stimulus.

      Overall, I thought this was an interesting manuscript with some rich and intriguing ideas. That said, I had some concerns also - one potentially major - with the inferences drawn by the authors on the analyses that they carried out.

      Main comments:

      (1) My primary concern with the way the manuscript is written right now relates to the inferences that can be drawn from the framework. In particular, the authors want to assert that, by incorporating an encoding model into their framework, they can do a better job of accounting for correlated stimulus-driven activity in different brain regions, allowing them to get a clearer view of the underlying innate functional connectivity of the brain. Indeed, the authors say that they want to ask "whether, after removing stimulus-induced correlations, the intrinsic dynamic itself is preserved". This seems a very attractive idea indeed. However, it seems to hinge critically on the idea of fitting an encoding model that fully explains all of the stimulus-driven activity. In other words, if one fits an encoding model that only explains some of the stimulus-driven response, then the rest of the stimulus-driven response still remains in the data and will be correlated across brain regions and will appear as functional connectivity in the ongoing brain dynamics - according to this framework. This residual activity would thus be misinterpreted. In the present work, the authors parameterize their stimulus using fixation onsets, film cuts, and the audio envelope. All of these features seem reasonable and valid. However, they surely do not come close to capturing the full richness of the stimuli, and, as such, there is surely a substantial amount of stimulus-driven brain activity that is not being accounted for by their "B" model and that is being absorbed into their "A" model and misinterpreted as intrinsic connectivity. This seems to me to be a major limitation of the framework. Indeed, the authors flag this concern themselves by (briefly) raising the issue in the first paragraph of their caveats section. But I think it warrants much more attention and discussion.

      We agree. One can never be sure that all stimulus induced correlation is accounted for. We now formulate our question more cautiously: 

      “We will ask here whether, after removing some of the stimulus-induced correlations, the intrinsic dynamic is similar between stimulus and rest conditions.”

      We also highlight that one may expect the opposite result of what we found: 

      “A general observation of these studies is that a portion of the functional connectivity is preserved between rest and stimulus conditions, while some aspects are altered by the perceptual task [12,16], sometimes showing increased connectivity during the stimulus [15].”

      We have added a number of additional features (acoustic edges, fixation novelty, and motion) and more carefully characterize how much “connectivity” each one explains in the neural data: 

      “Removing any of the input features increased the effect size of recurrent connections compared to a model with all features (Fig. S4). We then cumulatively added each feature to the VARX model. Effect size monotonically decreases with each feature added (Fig. 3F). Decreases of effect size are significant when adding film cuts (ΔR=-3.6*10<sup>-6</sup>, p<0.0001, N=26, FDR correction, α=0.05) and the sound envelope (ΔR=-3.59*10<sup>-6</sup>, p=0.002, N=26, FDR correction, α=0.05). Thus, adding more input features progressively reduces the strength of recurrent “connections”.”

      We also added more data to the analysis comparing movies vs rest. We now use 4 different movie segments instead of 1 and find reduced recurrent connectivity during movies: 

      “The number of significant recurrent connections in A was significantly reduced during movie watching compared to rest (Fig. 4C, fixed effect of stimulus: beta = -3.8*10<sup>-3</sup>, t(17) = -3.9, p<0.001), as was the effect size R (Fig. 4D, fixed effect of stimulus: beta = -2.5*10<sup>-4</sup>, t(17) = -4.1, p<0.001).”

      The additional analysis is described in the Methods section:

      “To compare recurrent connectivity between movies and the resting state, we compute VARX models in four different movie segments of 5 minutes length to match the length of the resting-state recording. We use the first and second half of ‘Despicable Me English’, the first half of ‘Inscapes’ and one of the ‘Monkey’ movies. Recordings for all of these segments are available in 18 patients. For each recording in each patient we compute the fraction of significant channels (p<0.001) and average the effect size R across all channel pairs, excluding the diagonal. We test the difference between movies and resting state with linear mixed-effect models with stimulus as fixed effect (movie vs rest) and patient as random effect, using Matlab’s fitlme() routine.”

      We had already seen this trend of decreasing connectivity during movie watching before, and reported on it cautiously as “largely unaltered”. We updated the Abstract correspondingly from “largely unaltered” to “reduced”: 

      “We also find that the recurrent connectivity during rest is reduced during movie watching.”

      We mentioned this possibility in the Discussion before, namely, that additional input features may reduce recurrent connectivity in the model, and therefore show a difference. We discuss this result now as follows: 

      “The stimulus features we included in our model capture mostly low-level visual and auditory input. It is possible that regressing out a richer stimulus characterization would have removed additional stimulus-induced correlation. While we do not expect that this would change the overall effect of a reduced number of “connections” during movie watching compared to resting state, the interpretation of changes in specific connections will be affected by the choice of features. For example, in sensory cortices, higher recurrent connectivity in the LFP during rest would be consistent with the more synchronized state we saw in rest, as reflected by larger oscillatory activity. Synchronization in higher-order cortices, however, is expected to be more strongly influenced by semantic content of external input.”

      In the Discussion we expand on what might happen if additional stimulus features were to be included into the model:  

      “Previous literature often does not distinguish between intrinsic dynamics and extrinsic effects. By factoring out some of the linear effects of the external input we conclude here that recurrent connectivity is reduced on average. From our prior work [49], we know that the stimulus features we included here capture a substantial amount of variance across the brain in intracranial EEG. Arguably, however, the video stimuli had rich semantic information that was not captured by the low-level features used here. Adding such semantic features could have further reduced shared variance, and consequently further reduced average recurrent connectivity in the model.”

      “Similarities and differences between rest and movie watching conditions reported previously do not support a firm conclusion as to whether overall “functional connectivity” is increased or reduced. Results seem to depend on the time scale of neural activity analyzed, and the specific brain networks [12,16,63]. However, in fMRI, the conclusion seems to be that functional connectivity during movies is stronger than during rest [15], which likely results from stimulus-induced correlations. The VARX model can remove some of the effects of these stimuli, revealing that average recurrent connectivity may be reduced rather than increased during stimulus processing.”

      And in the conclusion we now write: 

      “The model revealed a small but significant decrease of recurrent connectivity when watching movies.”

      (2) Related to the previous comment, the authors make what seems to me to be a complex and important point on page 6 (of the pdf). Specifically, they say "Note that the extrinsic effects captured with filters B are specific (every stimulus dimension has a specific effect on each brain area), whereas the endogenous dynamic propagates this initial effect to all connected brain areas via matrix A, effectively mixing and adding the responses of all stimulus dimensions. Therefore, this factorization separates stimulus-specific effects from the shared endogenous dynamic." It seems to me that the interpretation of the filter B (which is analogous to the "TRF") for the envelope, say, will be affected by the fact that the matrix A is likely going to be influenced by all sorts of other stimulus features that are not included in the model. In other words, residual stimulus-driven correlations that are captured in A might also distort what is going on in B, perhaps. So, again, I worry about interpreting the framework unless one can guarantee a near-perfect encoding model that can fully account for the stimulus-driven activity. I'd love to hear the authors' thoughts on this. (On this issue - the word "dominates" on page 12 seems very strong.)

      This is an interesting point we had not thought about. After some theoretical considerations and some empirical testing we conclude that the effect of missing inputs is relevant, but can be easily anticipated. 

      We have added the following to the Results section, explaining and empirically demonstrating the effects of adding features and signals to the model:

      “As with conventional linear regression, the estimate in B for a particular input and output channel is not affected by which other signals are included in x or y, provided those other inputs are uncorrelated. We confirmed this here empirically by removing dimensions from y (Fig. S11A), and by adding uncorrelated input to x (Fig. S11B, adding fixation onset does not affect the estimate for auditory envelope responses). In other words, to estimate B, we do not require all possible stimulus features and all brain activity to be measured and included in the model. In contrast, B does vary when correlated inputs are added to x (Fig. S11C, adding acoustic edges changes the auditory envelope response). Evidently the auditory envelope and acoustic edges are tightly coupled in time, whereas fixation onset is not. When a correlated input is missing (acoustic edges) then the other input (auditory envelope) absorbs the correlated variance, thus capturing the combined response of both.”
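
      The regression argument in this reply can be illustrated with a toy example. The sketch below is not the authors' analysis: it uses plain least squares without any recurrent term, and the three regressors are synthetic stand-ins whose names (envelope, edges, fixations) and weights are chosen here purely for illustration. It shows that dropping an uncorrelated input leaves the envelope weight essentially unchanged, while dropping a correlated input inflates it.

```python
import numpy as np

def fit_weights(inputs, output):
    """Ordinary least-squares weights for output ~ inputs (no intercept)."""
    weights, *_ = np.linalg.lstsq(inputs, output, rcond=None)
    return weights

rng = np.random.default_rng(0)
n_samples = 20000

# Synthetic stand-ins for three stimulus features (illustrative assumption).
envelope = rng.standard_normal(n_samples)
edges = 0.8 * envelope + 0.6 * rng.standard_normal(n_samples)  # correlated with envelope
fixations = rng.standard_normal(n_samples)                      # uncorrelated with both

# Simulated response with known weights 1.0 (envelope), 0.5 (edges), 0.7 (fixations).
response = (1.0 * envelope + 0.5 * edges + 0.7 * fixations
            + 0.5 * rng.standard_normal(n_samples))

full = fit_weights(np.column_stack([envelope, edges, fixations]), response)
no_fixations = fit_weights(np.column_stack([envelope, edges]), response)
no_edges = fit_weights(np.column_stack([envelope, fixations]), response)

# The first entry of each result is the envelope weight.
print("envelope weight, all inputs:       ", round(float(full[0]), 2))
print("envelope weight, fixations omitted:", round(float(no_fixations[0]), 2))
print("envelope weight, edges omitted:    ", round(float(no_edges[0]), 2))
```

      With edges omitted, the envelope weight is expected to approach 1.0 + 0.5 × 0.8 = 1.4, because the envelope absorbs the part of the edge response that it predicts. That mirrors the "combined response" described in the quoted text.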

      (3) Regarding the interpretation of the analysis of connectivity between movies and rest... that concludes that the intrinsic connectivity pattern doesn't really differ. This is interesting. But it seems worth flagging that this analysis doesn't really account for the specific dynamics in the network that could differ quite substantially between movie watching and rest, right? At the moment, it is all correlational. But the dynamics within the network could be very different between stimulation and rest I would have thought.

      As discussed above, with more data and additional stimulus features we now see detectable changes in the connectivity. The example in Figure 4G also shows that specific connections may change in different directions, while overall the strength of connections slightly decreases during movie watching compared to rest. We added the following to the results:

      “While the effect size decreases on average, there is some variation across different brain areas (Fig. 4E-G).”

      But even if the connectivity were unchanged, the activity on this network can be different with varying inputs. We actually also saw that there were changes in the variability of activity (Figs. 6 and S13) that may point to non-linear effects. It seems that injecting the input will cause an overall change in power, which can be explained by a relatively simple non-linear gain adaptation. These effects are already discussed at some length in the paper. 

      (4) I didn't really understand the point of comparing the VARX connectivity estimate with the sparse-inverse covariance method (Figure 2D). What was the point of this? What is a reader supposed to appreciate from it about the validity or otherwise of the VARX approach?

      We added the following motivation and clarification on this topic: 

      “To test the descriptive validity [43] of the VARX model we follow the approach of recovering structural connectivity from functional activity in simulation [44]. Specifically, we will compare the recurrent connectivity A derived from brain activity simulated assuming a given structural connectivity, i.e. we ask, can the VARX model recover the underlying structural connectivity, at least in a simulated whole-brain model with known connectivity? … For comparison, we also used the sparse-inverse covariance method to recover connectivity from the correlation matrix (functional connectivity). This method is considered state-of-the-art as it is more sensitive than other methods in detecting structural connections [48].”

      (5) I think the VARX model section could have benefitted a bit from putting some dimensions on some of the variables. In particular, I struggled a little to appreciate the dimensionality of A. I am assuming it has to involve both time lags AND electrode channels so that you can infer Granger causality (by including time) between channels. Including a bit more detail on the dimensionality and shape of A might be helpful for others who want to implement the VARX model.

      Your assumption is correct. We added the following to make this easier for readers: 

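      “Therefore, A has dimensions d<sub>y</sub> × d<sub>y</sub> × n<sub>a</sub> and B has dimensions d<sub>y</sub> × d<sub>x</sub> × n<sub>b</sub>, where d<sub>y</sub> and d<sub>x</sub> are the dimensions of y(t) and x(t), respectively.”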
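
      For readers who want to see these shapes in practice, here is a minimal sketch of simulating and fitting a VARX-type model with NumPy. It is not the authors' toolbox: the function names simulate_varx and fit_varx are hypothetical, the axis order (output channel, source channel, lag) is an assumption, and the fit is plain least squares without the regularization or significance testing described elsewhere in the response.

```python
import numpy as np

def simulate_varx(A, B, x, noise_std=0.1, seed=0):
    """Simulate y(t) = sum_k A[:,:,k] y(t-1-k) + sum_k B[:,:,k] x(t-k) + e(t)."""
    d_y, _, n_a = A.shape
    _, d_x, n_b = B.shape
    T = x.shape[0]
    rng = np.random.default_rng(seed)
    y = np.zeros((T, d_y))
    for t in range(T):
        y[t] = noise_std * rng.standard_normal(d_y)   # innovation e(t)
        for k in range(n_a):                          # recurrent (intrinsic) part
            if t - 1 - k >= 0:
                y[t] += A[:, :, k] @ y[t - 1 - k]
        for k in range(n_b):                          # input (extrinsic) part
            if t - k >= 0:
                y[t] += B[:, :, k] @ x[t - k]
    return y

def fit_varx(y, x, n_a, n_b):
    """Least-squares estimate of A and B from lagged copies of y and x."""
    T, d_y = y.shape
    d_x = x.shape[1]
    start = max(n_a, n_b - 1)                          # first sample with all lags available
    lagged_y = [y[start - 1 - k:T - 1 - k] for k in range(n_a)]
    lagged_x = [x[start - k:T - k] for k in range(n_b)]
    design = np.hstack(lagged_y + lagged_x)            # (T - start, n_a*d_y + n_b*d_x)
    coef, *_ = np.linalg.lstsq(design, y[start:], rcond=None)
    coef = coef.T                                      # rows index the output channel
    A = coef[:, :n_a * d_y].reshape(d_y, n_a, d_y).transpose(0, 2, 1)
    B = coef[:, n_a * d_y:].reshape(d_y, n_b, d_x).transpose(0, 2, 1)
    return A, B

# Toy example: 3 recorded channels, 2 stimulus features, 2 recurrent lags, 4 input lags.
d_y, d_x, n_a, n_b, T = 3, 2, 2, 4, 20000
rng = np.random.default_rng(1)
A_true = np.zeros((d_y, d_y, n_a))
A_true[:, :, 0] = 0.4 * np.eye(d_y)                    # each channel depends on its own past
A_true[1, 0, 1] = 0.3                                  # channel 0 drives channel 1 two samples later
B_true = 0.5 * rng.standard_normal((d_y, d_x, n_b))
x = rng.standard_normal((T, d_x))

y = simulate_varx(A_true, B_true, x)
A_hat, B_hat = fit_varx(y, x, n_a, n_b)
print(A_hat.shape, B_hat.shape)                        # (3, 3, 2) (3, 2, 4)
```

      In this layout A_hat[i, j, k] is the weight with which channel j, k+1 samples in the past, contributes to channel i, which is the quantity a Granger-style test would examine across all lags k.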

      (6) A second issue I had with the inferences drawn by the authors was a difficulty in reconciling certain statements in the manuscript. For example, in the abstract, the authors write "We find that the recurrent connectivity during rest is largely unaltered during movie watching." And they also write that "Failing to account for ... exogenous inputs, leads to spurious connections in the intrinsic "connectivity".

      Perhaps this segment of the abstract needed more explanation. To enhance clarity we have also changed the ordering of the findings. Hopefully this is more clear now: 

      “This model captures the extrinsic effect of the stimulus and separates that from the intrinsic effect of the recurrent brain dynamic. We find that the intrinsic dynamic enhances and prolongs the neural responses to scene cuts, eye movements, and sounds. Failing to account for these extrinsic inputs leads to spurious recurrent connections that govern the intrinsic dynamic. We also find that the recurrent connectivity during rest is reduced during movie watching.”

      Reviewer #2 (Public review):

      Summary:

      The authors apply the recently developed VARX model, which explicitly models intrinsic dynamics and the effect of extrinsic inputs, to simulated data and intracranial EEG recordings. This method provides a directed method of 'intrinsic connectivity'. They argue this model is better suited to the analysis of task neuroimaging data because it separates the intrinsic and extrinsic activity. They show: that intrinsic connectivity is largely unaltered during a movie-watching task compared to eyes open rest; intrinsic noise is reduced in the task; and there is intrinsic directed connectivity from sensory to higher-order brain areas.

      Strengths:

      (1) The paper tackles an important issue with an appropriate method.

      (2) The authors validated their method on data simulated with a neural mass model.

      (3) They use intracranial EEG, which provides a direct measure of neuronal activity.

      (4) Code is made publicly available and the paper is written well.

      Weaknesses:

      It is unclear whether a linear model is adequate to describe brain data. To the authors' credit, they discuss this in the manuscript. Also, the model presented still provides a useful and computationally efficient method for studying brain data - no model is 'the truth'.

      We fully agree and have nothing much to add to this, except to highlight the benefit of a linear model even as an explanation for non-linear phenomena:

      “The [noise-quenching] effect we found here can be explained by a VARX model with the addition of a divisive gain adaptation mechanism … The noise-quenching result and its explanation via gain adaptation shows the benefit of using a parsimonious linear model, which can suggest nonlinear mechanisms as simple corrections from linearity.”

      Appraisal of whether the authors achieve their aims:

      As a methodological advancement highlighting a limitation of existing approaches and presenting a new model to overcome it, the authors achieve their aim. Generally, the claims/conclusions are supported by the results.

      The wider neuroscience claims regarding the role of intrinsic dynamics and external inputs in affecting brain data could benefit from further replication with another independent dataset and in a variety of tasks - but I understand if the authors wanted to focus on the method rather than the neuroscientific claims in this manuscript.

      We fully agree. We added the following to the Discussion section:

      “Future studies should test whether our findings replicate in independent iEEG datasets, including active tasks, and whether they generalize to other neuroimaging modalities.”

      Impact:

      The authors propose a useful new approach that solves an important problem in the analysis of task neuroimaging data. I believe the work can have a significant impact on the field.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      Minor comments:

      (1) Did you mean "less" or "fewer" in the following sentence "..larger values lead to overfitting, i.e. less significant connections..."?

      We mean fewer. Thanks for catching this. 

      (2) I didn't see any equations showing how the regularization parameter lambda is incorporated into the framework.

      We prefer to defer the math and details of the algorithm to an earlier paper that has now been published. Instead we added the following clarification:

      “The VARX models were fitted to data with the Matlab version of the code [31] using conventional L2-norm regularization. The corresponding regularization parameter was set to 𝜆=0.3.”
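
      Since the reviewer asked how the regularization parameter enters, the textbook form of L2-norm (ridge) regularization for a linear model is written out below. This is a generic sketch, not the authors' implementation: the symbols Z (matrix of lagged y and x regressors), Y (matrix of targets) and Θ (stacked A and B coefficients) are introduced here for illustration, and the response does not say whether the published code rescales 𝜆 (for example by the data covariance) before applying it, so 𝜆=0.3 may not map directly onto this formula.

```latex
\hat{\Theta}
  = \arg\min_{\Theta} \; \lVert Y - Z\Theta \rVert_F^2 + \lambda \lVert \Theta \rVert_F^2
  = \left( Z^\top Z + \lambda I \right)^{-1} Z^\top Y
```

      Setting 𝜆=0 recovers ordinary least squares; larger values shrink all coefficients toward zero.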

      (3) I think some readers of this might struggle to understand the paragraph beginning "Connectivity plots are created with nilearn's plot_connectome() function...". It's all quite opaque for the uninitiated.

      Agreed. We now write more simply: 

      “Connectivity plots in Fig. 4 were created with routines from the nilearn toolbox [51].”

      (4) The paragraph beginning "The length of responses for Figure 5..." is also very opaque and could do with being explained more fully. Or this text could be removed from the methods and incorporated into the relevant results section where you actually discuss this analysis.

      Thank you for flagging this. We expand on the details in the Methods as follows: 

      “The length of responses for each channel in B and H to external inputs in Fig. 5 is computed with Matlab's findpeaks() function. This function returns the full-width at half of the peak maximum minus baseline. Power in each channel is computed as the squares of the responses averaged over the time window that was analyzed (0-0.6s).”
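
      The two quantities in this description can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not a port of Matlab's findpeaks(): it considers only the single largest peak, takes the baseline as an explicit argument, and finds the half-height crossings by linear interpolation. The function names half_max_width and mean_power are hypothetical.

```python
import math

def half_max_width(response, dt, baseline=0.0):
    """Full width (in seconds) of the largest peak at half of (peak - baseline)."""
    peak_index = max(range(len(response)), key=lambda i: response[i])
    half_level = baseline + 0.5 * (response[peak_index] - baseline)

    def crossing(start, step):
        """Walk away from the peak until the curve drops below half_level."""
        i = start
        while 0 <= i + step < len(response) and response[i + step] >= half_level:
            i += step
        j = i + step
        if not 0 <= j < len(response):
            return float(i)          # curve never dropped below half level in the window
        # Linear interpolation between the last sample above and the first below.
        frac = (response[i] - half_level) / (response[i] - response[j])
        return i + step * frac

    left = crossing(peak_index, -1)
    right = crossing(peak_index, +1)
    return (right - left) * dt

def mean_power(response):
    """Mean of the squared response over the analysis window."""
    return sum(v * v for v in response) / len(response)

# Toy response: Gaussian bump (sigma = 50 ms) centred at 0.2 s, sampled at 100 Hz over 0 to 0.6 s.
dt, sigma, centre = 0.01, 0.05, 0.2
toy_response = [math.exp(-0.5 * ((i * dt - centre) / sigma) ** 2) for i in range(61)]

width = half_max_width(toy_response, dt)
power = mean_power(toy_response)
print(f"width = {width:.4f} s, power = {power:.4f}")
```

      For a Gaussian the analytic full width at half maximum is 2·sqrt(2·ln 2)·sigma, about 0.118 s for this toy response, which gives a quick sanity check on the interpolation.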

      (5) I think adding some comments to the text or caption related to Figures 3C and 3D would be helpful so readers can understand these numbers a bit better. One seems to be the delta log p value and the other is the delta ratio. What does positive or negative mean? Readers might appreciate a little more help.

      We expanded it as follows, hopefully this helps: 

      “C) Difference of log p-values for the VARX model without minus with inputs (panel A - B). Both models are fit to the same data. D) Thresholding panels A and B at p<0.0001 gives a fraction of significant connections. Here we show the fraction of significant channels for models with and without input. Each line is a patient, with color indicating increase or decrease. E) Mean over all channels for VARX models with and without inputs. Each line is a patient.”

      (6) It is not clear what the colors mean in Figures 4 E, F, G.

      We updated the color scheme for those figure panels and carefully explained it in the caption. Please see the manuscript for the updated Figure 4.

      (7) It might be nice to slightly unpack what you mean by the "variability of the internal dynamic" and why it can be equated with the power of the innovation process.

      In the methods we added the following clarification right after defining the VARX model: 

      “The innovation process captures the internal variability of the model. Without it, repeating the same input would always result in a fixed deterministic output.”

      In the results section we added the following: 

      “As a metric of internal variability we measured the power of the intrinsic innovation process, which captures the unobserved “random” brain activity which leads to variations in the responses.”

      (8) Typos etc.

      a) "... has been attributed to variability of ongoing dynamic"

      b) The manuscript refers to a Figure 3G, but there is no Figure 3G.

      c) n_a = n_a = 1. Is that a typo?

      d) fiction

      Thank you for catching these. We fixed them. 

      Reviewer #2 (Recommendations for the authors):

      (1) I'm curious about the authors' opinions on the conditions studied. Naively, eyes open rest and passive movie watching seem like similar conditions - were the authors expecting to see a difference with VARX? Do the authors expect that they would see bigger differences when there is a larger difference in sensory input, e.g. eyes closed rest vs movie watching? Given the authors are arguing the need to explicitly model external inputs, a real data example contrasting two very different external inputs might better demonstrate the model's utility.

      Thank you for this suggestion. We added an analysis of eyes-closed rest recordings, available in 8 patients (Fig. S8). The difference between movie and rest is indeed more pronounced than for eyes open rest. The result is described in the methods:

      “In a subset of patients with eyes-closed resting state we find the same effect, which is qualitatively more pronounced (Fig. S8).”

      This complements our updated finding of a difference between movie and eyes-open rest, which becomes significant after adding more data to this analysis. The results have been updated as follows:

      “The number of significant recurrent connections in A was significantly reduced during movie watching compared to rest (Fig. 4C, fixed effect of stimulus: beta = -3.8*10<sup>-3</sup>, t(17) = -3.9, p<0.001), as was the effect size R (Fig. 4D, fixed effect of stimulus: beta = -2.5*10<sup>-4</sup>, t(17) = -4.1, p<0.001).”

      The abstract has been updated accordingly:

      “We also find that the recurrent connectivity during rest is reduced during movie watching.”

      (2) It would also have been interesting to see how the proposed model compares to DCM - however, I understand if the authors wanted to focus on their model rather than a comparison with other models.

      We did not try DCM for a number of reasons: 1) It does not allow for delays in the model dynamic (i.e. the entire time course of the response has to be captured by the recurrent dynamic of a single time step A). 2) It is computationally prohibitive and would not allow us to analyze large channel counts. 3) The available code is custom made for fMRI or EEG analysis with very specific signal generation models that do not obviously apply to iEEG. We added the following to the Discussion of DCM:

      “Similar to the VARX model, DCM includes intrinsic and extrinsic effects A and B. However, the modeling is limited to first-order dynamics (i.e. n<sub>a</sub>=n<sub>b</sub>=1). Thus, prolonged responses have to be entirely captured with a first-order recurrent A. … In contrast, here we have analyzed up to 300 channels per subject across the brain, which would be prohibitive with DCM. By analyzing a large number of recordings we were able to draw more general conclusions about whole-brain activity.”

      (3) I believe improving the consistency of the terminology used would improve the manuscript:

      a) Intrinsic dynamics vs intrinsic connectivity vs recurrent connectivity:

      - The term 'intrinsic dynamic' is first introduced in paragraph 3 of the introduction. An explicit definition of what is meant by this term would benefit the manuscript.

      - Sometimes the terminology changes to 'intrinsic connectivity' or 'recurrent connectivity'. An explicit definition of these terms (if they refer to different things) would also benefit the manuscript.

      We had used the terms “intrinsic” and “recurrent” interchangeably. We now try to mostly say “intrinsic dynamic” when we talk about the more general phenomenon of recurrent brain dynamics, while using “recurrent connectivity” when we refer to the model parameters A.

      We now provide a definition already at the start of the Abstract:

      “Sensory stimulation of the brain reverberates in its recurrent neural networks. However, current computational models of brain activity do not separate immediate sensory responses from this intrinsic dynamic. We apply a vector-autoregressive model with external input (VARX), combining the concepts of “functional connectivity” and “encoding models”, to intracranial recordings in humans. This model captures the extrinsic effect of the stimulus and separates that from the intrinsic effect of the recurrent brain dynamic.”

      And at the start of the introduction: 

      “The primate brain is highly interconnected between and within brain areas. … We will refer to the dynamic driven by this recurrent architecture as the intrinsic dynamic of the brain.”

      b) Intrinsic vs Endogenous and Extrinsic vs Exogenous:

      - Footnote 1 defines the 'intrinsic' and 'extrinsic' terminology.

      - However, there are instances where the authors switch back to endogenous/exogenous.

      - Methods section: "Overall system response", paragraph 2.

      - Results section: "Recurrent dynamic enhances and prolongs stimulus responses".

      - Conclusions section.

      With a foot in both neuroscience and systems identification, it’s a hard habit to break. Thanks for catching it. We searched and replaced all instances of endogenous and exogenous.  

      (4) Methods:

      a) The model equation would be clearer if the convolution was written out fully. (I had to read reference 1 to understand the model.).

      We now spell out the full equation and hope it's not too cumbersome to read:  

      “For the i-th signal channel the recurrence of the VARX model is given by:
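
      The displayed equation is not reproduced in this copy of the response. A written-out form consistent with the description elsewhere in the letter (recurrent filters A with n<sub>a</sub> lags, input filters B with n<sub>b</sub> lags, and an innovation term) would look like the following. The index names, the symbols d<sub>y</sub> and d<sub>x</sub> for the number of recorded channels and input features, and the exact lag ranges (recurrent lags starting at 1, input lags starting at 0) are assumptions made here, not taken from the manuscript.

```latex
y_i(t) = \sum_{j=1}^{d_y} \sum_{k=1}^{n_a} A_{ij}(k)\, y_j(t-k)
       + \sum_{j=1}^{d_x} \sum_{k=0}^{n_b-1} B_{ij}(k)\, x_j(t-k)
       + e_i(t)
```

      Each inner sum over k is the convolution that the compact notation A*y and B*x abbreviates.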

      b) How is an individual dimension omitted in the reduced model, are the values in the y, x set to zero?

      No, it is actually removed from the linear prediction. We added: 

      “… omitted from the prediction …”

      c) "The p-value quantifies the probability that a specific connection in A or B is zero" - for each of n_a/n_b filters?

      d) It should be clarified that D is a vector.

      We hope the following clarification addresses both these questions: 

      “The p-value quantifies the probability that a specific connection in either A or B is zero. Therefore, D, P and R<sup>2</sup> all have dimensions d<sub>y</sub> × d<sub>y</sub> or d<sub>y</sub> × d<sub>x</sub> for A or B, respectively.”

      (5) Results:

      a) Stimulus-induced reduction of noise in the intrinsic activity: would be good to define the frequency range for theta and beta in paragraph 2.

      Added. 

      b) Neural mass model simulation:

      - A brief description of what was simulated is needed.

      We basically ran the sample code of the neurolib library. With that in mind maybe the description we already provide is sufficient:  

      “We used the default model simulation of the neurolib python library (using their sample code for the “ALNModel”), which is a mean-field approximation of adaptive exponential integrate-and-fire neurons. This model can generate simulated mean firing rates in 80 brain areas based on connectivity and delay matrices determined with diffusion tensor imaging (DTI). We used 5 min of “resting state” activity (no added stimulus, simulated at 0.1ms resolution, subsequently downsampled to 100Hz).”

      - It's not clear to me why the A matrix should match the structural connectivity.

      We added the following introduction to make the purpose of this simulation clear:

      “To test the descriptive validity [43] of the VARX model we follow the approach of recovering structural connectivity from functional activity in simulation [44]. Specifically, we will compare the “connectivity” A derived from brain activity simulated assuming a given structural connectivity, i.e. we ask, can the VARX model recover the underlying structural connectivity, at least in a simulated whole-brain model with known connectivity?”

      - It would be interesting to see the inferred A matrix.

      We added a Supplement figure for this and the following: 

      “The VARX model was estimated with n<sub>a</sub>=2, and no input. The resulting estimate for A is dominated by the diagonal elements that capture the autocorrelation within brain areas (Fig. S1).”

      - How many filters were used here?

      No input filters were used for this simulation:

      We used 5 min of “resting state” activity (no added stimulus, simulated at 0.1ms resolution, subsequently downsampled to 100Hz). 

      c) Intracranial EEG:

      - It's not clear how overfitting was measured and how the selection of the number of filters (n_a and n_b) was done.

      We have removed the statement about overfitting. Mostly the word is used in the context of testing on a separate dataset, which we did not do here. So this “overfitting” can be confusing. Instead we used the analytic p-value as indication that a larger model order is not supported by the data. We write this now as follows: 

      “Increasing the number of delays n<sub>a</sub>, increases estimated effect size R (Fig. S3A,B), however, larger values lead to fewer significant connections (Fig. S3C). Significance (p-value) is computed analytically, i.e. non-parametrically, based on deviance. Values around n<sub>a</sub>=6 time delays appear to be the largest model order supported by this statistical analysis.”

      d) Figure 1:

      - Typo: "auto-regressive"

      Fixed. Thanks for catching that. 

      - LFP and BHA in C are defined much later in the text, would be useful to define these in the caption.

      - Shouldn't B (the VARX model parameter) be a 2x3 matrix for different time lags?

      Hopefully the following clarifications address both these points: 

      “C) Example of neural signal y(t) recorded at a single location in the brain. We will analyze local field potentials (LFP) and broad-band high frequency activity (BHA) in separate analyses.  D) Examples of filters B for individual feed-forward connections between an extrinsic input and a specific recording location in the brain.”

      (6) Discussion:

      I could not find Muller et al 2016 listed in the references.

      Added. Thanks for catching that omission. 

      Additional edits prompted by reviewers, but not in the context of any particular comment.

      While reviewers did not raise the following point, we felt the need to clarify the terminology in the Methods to make sure there is no misunderstanding in the proposed interpretation of the model: 

      “We will refer to the filters in matrices A and B as recurrent and feed-forward “connections”, but avoid the use of the word “causal” which can be misleading.”

      In addressing questions about Figure 4, we noticed that there is quite a bit of variability across patients, so the analyses for Figures 4 and 7, which combine data across patients, now account for a random effect of patient (previously we used mean values for repeated measures). We added the following to the Methods to explain this:

      “To compare recurrent connectivity between movies and the resting-state (in Fig. 4), we compute VARX models in four different movie segments of 5 minutes length to match the length of the resting state recording. We use the first and second half of ‘Despicable Me English’, the first half of ‘Inscapes’ and one of the ‘Monkey’ movies. 18 patients include each of these recordings. For each recording in each patient we compute the fraction of significant channels (p<0.001) and average the effect size R across all channel pairs, excluding the diagonal. We test the difference between movies and resting-state with linear mixed-effect models with stimulus as fixed effect (movie vs rest), and patient as random effect (to account for the repeated measures for the different video segments), using matlab’s fitlme() routine. For the analysis of asymmetry of recurrent connectivity (in Fig. 4) we also used a mixed-effect model with T1w/T2w ratio as fixed effect and patients as random effect (to account for the repeated measures in multiple brain locations).”

      All analyses were rerun with more data (eyes closed resting) and 2 additional patients that have become available since the first submission. Therefore all figures and statistics have been updated throughout the paper. Other than the difference between movies and resting state which was trending before and is now significant, no results changed.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Comment 0: Summary: This work presents an Interpretable protein-DNA Energy Associative (IDEA) model for predicting binding sites and affinities of DNA-binding proteins. Experimental results demonstrate that such an energy model can predict DNA recognition sites and their binding strengths across various protein families and can capture the absolute protein-DNA binding free energies.

      We appreciate the reviewer’s careful assessment of the paper, and we thank the reviewer for the insightful suggestions and comments.

      Comment 1: Strengths: (1) The IDEA model integrates both structural and sequence information, although such an integration is not completely original. (2) The IDEA predictions seem to have agreement with experimental data such as ChIP-seq measurements.

      We appreciate the reviewer’s positive comments on the strength of the paper.

      Comment 2: Weaknesses: (1) The authors claim that the binding free energy calculated by IDEA, trained using one MAX-DNA complex, correlates well with experimentally measured MAX-DNA binding free energy (Figure 2) based on the reported Pearson Correlation of 0.67. However, the scatter plot in Figure 2A exhibits distinct clustering of the points and thus the linear fit to the data (red line) may not be ideal. As such, the use of the Pearson correlation coefficient that measures linear correlation between two sets of data may not be appropriate and may provide misleading results for non-linear relationships.

      We thank the reviewer for the insightful comments and agree that a linear fit between our predictions and the experimental data may not be the best measure of performance. The primary utility of the IDEA model is to predict high-affinity DNA-binding sequences for a given DNA-binding protein by assessing the relative binding affinities across different DNA sequences. In this regard, the ranked order of predicted sequence binding affinities serves as a better metric for evaluating the success of this model. To evaluate this, we calculated both Spearman’s rank correlation coefficient, which does not rely on linear correlation, and the Pearson correlation coefficient between our predictions and the experimental results. As shown in Figure 2, our computation shows a Spearman’s rank correlation coefficient of 0.65 for the MAX-based predictions using one MAX-DNA complex (PDB ID: 1HLO), supporting the model’s capability to effectively distinguish strong from weak binders.
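
      To illustrate why the two coefficients can disagree, the following is a minimal sketch in plain Python. The data are hypothetical and are not the MAX-DNA measurements. The function names are chosen here for illustration and do not come from the IDEA code base.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: monotonic but strongly non-linear.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [1.0, 1.1, 1.2, 1.3, 50.0]

r_pearson = pearson(predicted, measured)
r_spearman = spearman(predicted, measured)
```

      Because the ordering of the two lists agrees perfectly, the rank-based coefficient is at its maximum, while the linear coefficient is pulled down by the single outlying value.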

      Although our model generally captures the relative binding affinities across different DNA sequences, its predictive accuracy diminishes for low-affinity sequences (Figure 2).

      This could be due to two limitations of the current modeling framework: (1) The model is residue-based and estimates binding free energy as the additive sum of contributions from individual contacting amino-acid-nucleotide pairs. This assumption does not account for cooperative effects caused by simultaneous changes at multiple nucleotide positions. One potential direction to further improve the model would be to use a finer-grained representation by incorporating more atom types within contacting residues, and to use a many-body potential to better capture cooperative effects from multiple mutations. (2) The model assumes that the target DNA adopts the same binding interface as in the reference crystal structure. However, sequence-dependent DNA shape has been shown to be important in determining protein-DNA binding affinity [1]. To address this limitation, a future direction is to use deep-learning-based methods to incorporate predicted DNA shape or protein-DNA complex structures based on their sequences [2, 3] into our model prediction.

      To fully evaluate the predictive power of IDEA, we have included Spearman’s rank correlation coefficient for every correlation plot in this manuscript and have updated the relevant texts. Across all our analyses, the Spearman’s rank correlation coefficients reveal similar predictive performance as the Pearson correlation coefficients. Additionally, we have included in our discussion the current limitations of our model and potential directions for future improvement.

      We have edited our Discussion Section to include a discussion on the limitations of the current model. Specifically, the added texts are:

      “Although IDEA has proved successful in many examples, it can be improved in several aspects. The model currently assumes the training and testing sequences share the same protein-DNA structure. While double-stranded DNA is generally rigid, recent studies have shown that sequence-dependent DNA shape contributes to their binding specificity [1, 2, 4]. To improve predictive accuracy, one could incorporate predicted DNA shapes or structures into the IDEA training protocol. In addition, the model is residue-based and evaluates the binding free energy as the additive sum of contributions from individual amino-acid-nucleotide contacts. This assumption does not account for cooperative effects that may arise from multiple nucleotide changes. A potential refinement could utilize a finer-grained model that includes more atom types within contacting residues and employs a many-body potential to account for such cooperative effects.”

      Comment 3: (2) In the same vein, the linear Pearson Correlation analysis performed in Figure 5A and the conclusion drawn may be misleading.

      We thank the reviewer for the insightful comments. As noted in our response to the previous comment, we have added Spearman’s rank correlation coefficient in addition to the Pearson correlation coefficient to all correlation plots, including Figure 5A.

      Comment 4: (3) The authors included the sequences of the protein and DNA residues that form close contacts in the structure in the training dataset, whereas a series of synthetic decoy sequences were generated by randomizing the contacting residues in both the protein and DNA sequences. In particular, synthetic decoy binders were generated by randomizing either the DNA (1000 sequences) or protein sequences (10,000 sequences) from the strong binders. However, the justification for such randomization and how it might impact the model’s generalizability and transferability remain unclear.

      We thank the reviewer for the insightful comments. The number of randomized sequences was chosen to strike a balance between sufficient sequence coverage and computational feasibility. Because there are more amino acid types in proteins than the four nucleotide types in DNA, we used more protein decoy sequences than DNA decoys. To examine the robustness of our choice to the number of decoy sequences, we repeated the transferability analysis within the bHLH superfamily (Figure 3A) and the generalizability analysis across 12 protein families (Figure 2E) using two additional decoy sequence combinations: (1) 1000 DNA sequences and 1000 protein sequences; (2) 100 DNA sequences and 1000 protein sequences. As shown in Figure S15, we achieved results similar to those reported using the original decoy set, demonstrating the robustness of our model predictions to variations in the number of decoys. We have included this figure as Figure S15.

      Comment 5: (4) The authors performed Receiver Operating Characteristic (ROC) analysis and reported the Area Under the Curve (AUC) scores in order to quantitate the successful identification of the strong binders by IDEA. It would be beneficial to analyze the precision-recall (PR) curve and report the PRAUC metric which could be more robust.

      We agree with the reviewer that more robust statistical metrics should be used to evaluate our model’s performance. We have included the PRAUC score as an additional evaluation metric of the model’s performance. Due to a significant imbalance in the number of strong and weak binders from the experimental data [5], where the experimentally identified strong binders are far fewer than the weak binders, we reweighted the sample to achieve a balanced evaluation [6], using 0.5 as the baseline for randomized prediction. As shown in Figure S5, IDEA achieves successful predictions in 18 out of 22 cases, demonstrating its predictive accuracy.

      The updated PRAUC result has been included as Figure S5 in the manuscript. We have also included the detailed precision-recall curves for each case in Figure S4.

      In addition, we have provided PRAUC scores for comparing the performance of IDEA with other models, and have summarized these results in Table S2.

      Reviewer #2:

      Comment 0: Summary: Zhang et al. present a methodology to model protein-DNA interactions via learning an optimizable energy model, taking into account a representative bound structure for the system and binding data. The methodology is sound and interesting. They apply this model for predicting binding affinity data and binding sites in vivo. However, the manuscript lacks discussion of/comparison with state-of-the-art and evidence of broad applicability. The interpretability aspect is weak, yet over-emphasized.

      We appreciate the reviewer’s excellent summary of the paper, and we thank the reviewer for the insightful suggestions and comments.

      Comment 1: Strengths: The manuscript is well organized with good visualizations and is easy to follow. The methodology is discussed in detail. The IDEA energy model seems like an interesting way to study a protein-DNA system in the context of a given structure and binding data. The authors show that an IDEA model trained on one system can be transferred to other structurally similar systems. The authors show good performance in discriminating between binding-vs-decoy sequences for various systems, and binding affinity prediction. The authors also show evidence of the ability to predict genome-wide binding sites.

      We appreciate the reviewer’s strong assessment of the strengths of this paper. We have further refined our Methods Section to ensure all modeling details are clearly presented.

      Comment 2: Weaknesses: An energy-based model that needs to be optimized for specific systems is inherently an uncomfortable idea. Is this kind of energy model superior to something like Rosetta-based energy models, which are generally applicable? Or is it superior to family-specific knowledge-based models? It is not clear.

      We thank the reviewer for the insightful comments. The protein-DNA energy model facilitates the calculation of protein-DNA binding free energy based on protein-DNA structures and sequences. Because this model is optimized using the structure-sequence relationship of given protein-DNA complexes, it features specificity based on the conserved structural interface characteristic of each protein family. Because of that, its predictive accuracy depends on the degree of protein-DNA interface similarity between the training and target protein-DNA pairs, and is distinct from a general protein-DNA energy model, such as a Rosetta-based energy model. The model has some connections to the family-specific energy model. As shown in Author response image 1, systems belonging to the same protein superfamily (MAX and PHO4) exhibit similar patterns in their learned energy models, in contrast to those from a different superfamily (PDX1).

      Author response image 1:

      Comparison of learned energy models for different protein-DNA complexes: MAX (A), PHO4 (B), and PDX1 (C). MAX and PHO4 are members of the Helix-loop-helix (HLH) CATH protein superfamily (4.10.280.100), while PDX1 belongs to another Homeodomain-like CATH protein superfamily (1.10.10.60).

      To compare our approach with both general and family-specific knowledge-based energy models, we conducted two studies. First, we incorporated a knowledge-based generic protein-DNA energy model (DBD-Hunter) learned from the protein-DNA database, reported by Skolnick and coworkers [7], into our prediction protocol. This model assigns interaction energies to different functional groups within each DNA nucleotide (e.g., phosphate (PP), sugar (SU), pyrimidine (PY), and imidazole (IM) groups). For our comparison, we averaged the energy contributions of these groups within each nucleotide and replaced the IDEA-learned energy model with this generic one to test its ability to differentiate strong binders from weak binders in the HT-SELEX dataset [5]. As shown in Figure S6, the IDEA model generally achieves better performance than the generic energy model.

      Additionally, we compared IDEA with rCLAMPS, a family-specific energy model developed to predict protein-DNA binding specificity in the C2H2 and homeodomain families.

      As shown in Table S1 and Table S2, IDEA also shows better performance than rCLAMPS in most cases across the C2H2 and homeodomain families, demonstrating that it has better predictive accuracy than both state-of-the-art family-specific and generic knowledge-based models.

      We have included relevant texts in Appendix Section Comparison of IDEA predictive performance Using HT-SELEX data to clarify this point. The added texts are:

      “In addition, we compared the performance of IDEA with both general and family-specific knowledge-based energy models. First, we incorporated a knowledge-based generic protein-DNA energy model (DBD-Hunter) learned from the protein-DNA database, reported by Skolnick and coworkers [7], into our prediction protocol. This model assigns interaction energies to different functional groups within each DNA nucleotide, including phosphate (PP), sugar (SU), pyrimidine (PY), and imidazole (IM) groups. For our comparison, we averaged the energy contributions of these groups within each nucleotide and replaced the IDEA-learned energy model with the DBD-Hunter model to assess its ability to differentiate strong binders from weak binders in the HT-SELEX dataset [5]. Additionally, we compared IDEA with rCLAMPS, a family-specific energy model developed to predict protein-DNA binding specificity in the C2H2 and homeodomain families. rCLAMPS learns a position-dependent amino-acid-nucleotide interaction energy model. To incorporate this model into the binding free energy calculation, we averaged the energy contributions across all occurrences of each amino-acid-nucleotide pair, which resulted in a 20-by-4 residue-type-specific energy matrix. This matrix is structurally analogous to the IDEA-trained energy model and can be directly integrated into the binding free energy calculations. As shown in Figure S6, Table S1, and Table S2, the IDEA model generally outperforms DBD-Hunter and rCLAMPS, demonstrating that it can achieve better predictive accuracy than both generic and family-specific knowledge-based models.”
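
      The averaging step described above, which collapses position-dependent energies into a 20-by-4 residue-type matrix that is then used in an additive score, can be sketched as follows. The input format, the function names, and the choice to assign 0.0 to pairs that never occur are assumptions made for illustration. They are not taken from the rCLAMPS or IDEA implementations.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residue types
NUCLEOTIDES = "ACGT"


def average_by_residue_type(position_energies):
    """Collapse position-dependent energies into a residue-type matrix.

    position_energies: iterable of (amino_acid, nucleotide, energy), one entry
    per occurrence of that pair at some position (hypothetical input format).
    Returns {(amino_acid, nucleotide): mean energy} with 20 x 4 = 80 entries;
    pairs that never occur are set to 0.0 (an assumption).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for aa, nt, energy in position_energies:
        totals[(aa, nt)] += energy
        counts[(aa, nt)] += 1
    return {
        (aa, nt): (totals[(aa, nt)] / counts[(aa, nt)] if counts[(aa, nt)] else 0.0)
        for aa in AMINO_ACIDS
        for nt in NUCLEOTIDES
    }


def binding_energy(contacts, matrix):
    """Additive score: sum of matrix entries over contacting residue pairs."""
    return sum(matrix[(aa, nt)] for aa, nt in contacts)
```

      Because the resulting matrix has the same shape as a residue-type energy model, the same additive scoring function can be reused with either matrix, which is what makes the comparison direct.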

      Comment 3: Prediction of binding affinity is a well-studied domain and many competitors exist, some of which are well-used. However, no quantitative comparison to such methods is presented. To understand the scope of the presented method, IDEA, the authors should discuss/compare with such methods (e.g. PMID 35606422).

      We thank the reviewer for the insightful comments. As detailed in our response to Comment 5, we previously misused the term “binding specificity”, and would like to clarify that our model is designed to predict protein-DNA binding affinity. To compare the performance of IDEA with state-of-the-art protein-DNA predictive models, we examined the predictive accuracies of two additional popular computational models: ProBound [8] and DeepBind [9]. ProBound has been shown to have a better performance than several earlier predictive protein-DNA models, including JASPAR 2018 [11], HOCOMOCO [12], Jolma et al. [13], and DeepSELEX [14]. To benchmark these models’ performance, we examine each method’s capability to identify strong binders with the HT-SELEX datasets covering 22 proteins from 12 protein families [5]. As suggested by Reviewer 1, we also calculated the PRAUC score, reweighted to account for data imbalance [6], as a complementary metric for evaluating the model performance.

      As shown in Figure S6, Table S1, and Table S2, IDEA ranked second among the three predictive methods. It is important to note that both ProBound and DeepBind were trained on a curated version of the HT-SELEX data [13], which overlaps with the testing data [5]. Compared with them, IDEA was trained only on the given structural and sequence information from a single protein-DNA complex, thus independent of the testing data. In order to assess how IDEA performs when incorporating knowledge from HT-SELEX data, we augmented the training by randomly including half of the HT-SELEX data (see the Methods Section Enhanced Modeling Prediction with SELEX Data). The augmented IDEA model achieved the best performance among all the models. Overall, IDEA can be used to predict protein-DNA affinities in the absence of known binding sequence data, thereby filling a critical gap when such experimental datasets are unavailable.

      Additionally, we have conducted a 10-fold cross-validation using the same HT-SELEX data [5] and found that IDEA outperformed a recent regression model that considers the shape of DNA with different sequences [5].

      We have revised our text to include the comparison between IDEA and other predictive models. Specifically, we revised the text in Section: IDEA Generalizes across Various Protein Families.

      The revised text reads:

      “To examine IDEA’s predictive accuracy across different DNA-binding protein families, we applied it to calculate protein-DNA binding affinities using a comprehensive HT-SELEX dataset [5]. We focused on evaluating the capability of IDEA to distinguish strong binders from weak binders for each protein with an experimentally determined structure. We calculated the probability density distribution of the top and bottom binders identified in the SELEX experiment. A well-separated distribution indicates the successful identification of strong binders by IDEA (Figure 2D and S4). Receiver Operating Characteristic (ROC) and precision-recall analyses were performed to calculate the Area Under the Curve (AUC) and the precision-recall AUC (PRAUC) scores for these predictions. Further details are provided in the Methods Section Evaluation of IDEA Prediction Using HT-SELEX Data. Our analysis shows that IDEA successfully differentiates strong from weak binders for 80% of the 22 proteins across 12 protein families, achieving AUC and balanced PRAUC scores greater than 0.5 (Figure 2D and S5). To benchmark IDEA’s performance against other leading methods, we compared its predictions with several popular models, including the sequence-based predictive models ProBound [8] and DeepBind [9], the family-based energy model rCLAMPS [10], and the knowledge-based energy model DBD-Hunter [7]. IDEA demonstrates performance comparable to these state-of-the-art approaches, and incorporating sequence features further improves its prediction accuracy (Figure S6, Table S1, and Table S2). We also performed 10-fold cross-validation on the binding affinities of protein–DNA pairs in this dataset and found that IDEA outperforms a recent regression model that considers the shape of DNA with different sequences [5] (Figure S7). Details are provided in Section: Comparison of IDEA predictive performance Using HT-SELEX data.”

      We also added one section Comparison of IDEA predictive performance Using HT-SELEX data in the Appendix to fully explain the comparison between IDEA and other popular models. The added texts are:

      “To benchmark the performance of IDEA against state-of-the-art protein-DNA predictive models, we evaluated its ability to recognize strong binders with the HT-SELEX datasets across 22 proteins from 12 families [5]. Specifically, we compare IDEA with two widely used sequence-based models: ProBound [8] and DeepBind [9]. ProBound has demonstrated superior performance over many other predictive protein-DNA models, including JASPAR 2018 [11], HOCOMOCO [12], Jolma et al. [13], and DeepSELEX [14]. To use ProBound, we retrieved the trained binding model for each protein from motifcentral.org and used the GitHub implementation of ProBoundTools to infer the binding scores between protein and target DNA sequences. Except for POU3F1, binding models are available for all proteins. Therefore, we excluded POU3F1 and evaluated the protein-DNA binding affinities for the remaining 21 proteins. To use DeepBind, sequence-specific binding affinities were predicted directly with its web server. The Area Under the Curve (AUC) and the Precision-Recall AUC (PRAUC) scores were used as metrics for comparison. An AUC score of 1.0 indicates a perfect separation between the strong- and weak-binder distributions, while an AUC score of 0.5 indicates no separation. Because there is a significant imbalance in the number of strong and weak binders from the experimental data [5], where the strong binders are far fewer than the weak binders, we reweighted the samples to achieve a balanced evaluation, using 0.5 as the baseline for randomized prediction [6]. As summarized in Figure S6, Table S1, and Table S2, IDEA ranked second among the three predictive models. In order to assess the performance of IDEA when augmented with additional protein-DNA binding data, we augmented IDEA using a randomly selected half of the HT-SELEX data (see the Methods Section Enhanced Modeling Prediction with SELEX Data). The augmented IDEA model achieved the best performance among all the models.”

      “We also performed 10-fold cross-validation using the same HT-SELEX datasets, following the protocol described in the Methods Section Enhanced Modeling Prediction with SELEX Data. For each protein, we divided the entire dataset into 10 equal, randomly assigned folds. In each iteration, we used 9 of the 10 folds as the training dataset and the remaining fold as the testing dataset. This process was repeated 10 times so that each fold served as the test set once. We then reported the average R<sup>2</sup> scores across these iterations to evaluate IDEA’s predictive performance. Our results are compared with the 1mer and 1mer+shape methods from [5], the latest regression model that considers the shape of DNA with different sequences (Figure S7). This comparative analysis shows IDEA achieved higher predictive accuracy than the state-of-the-art sequence-based protein-DNA binding predictors for protein-DNA complexes that have available experimentally resolved structures.”

      “Overall, these results demonstrate that IDEA can be used to predict the binding affinities of protein-DNA pairs in the absence of known binding sequence data, thus filling an important gap in protein-DNA predictions when experimental binding sequence data are unavailable.”
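
      The 10-fold cross-validation protocol quoted above can be sketched generically as follows. This is a minimal illustration and not the authors' implementation. The `fit` and `predict` callables are hypothetical hooks standing in for model training and binding-affinity prediction, and the fixed random seed is an assumption added for reproducibility.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # near-equal, randomly assigned folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def r_squared(y_true, y_pred):
    """Coefficient of determination on one test fold."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validated_r2(xs, ys, fit, predict, k=10, seed=0):
    """Average test-fold R^2 over k folds; `fit`/`predict` are caller-supplied."""
    scores = []
    for train, test in k_fold_indices(len(xs), k, seed):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [predict(model, xs[i]) for i in test]
        scores.append(r_squared([ys[i] for i in test], preds))
    return sum(scores) / len(scores)
```

      Keeping the fold assignment separate from the model makes it straightforward to evaluate different predictors on identical splits, which is what a fair comparison between methods requires.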

      Comment 4: The term “interpretable” has been used lavishly in the manuscript while providing little evidence on the matter. The only evidence shown is the family-specific residue-nucleotide interaction/energy matrix and speculations on how these values are biologically sensible. Recent works already present more biophysical, fine-grained, and sometimes family-independent interpretability (e.g. PMID 39103447, 36656856, 38352411, etc.). The authors should put into context the scope of the interpretability of IDEA among such works.

      We thank the reviewer for the insightful comment and agree that “interpretability” should be discussed in a relevant context. In our work, interpretability refers to the family-specific amino-acid-nucleotide interaction energies identified from the model training, which reveal interaction preferences within protein-DNA binding interfaces. As detailed in our response to Comment 6, we performed principal component analysis (PCA) on the learned energy models and observed clustering of learned energy models corresponding to protein families. Therefore, the IDEA-learned energy models can be used as a signature to capture the energetic preferences of amino-acid-nucleotide interactions within a given protein family. This preference can be used to infer preferred sequence binding motifs, similar to those identified by other computational tools [10, 4, 15, 16].

      We have revised the text to clarify the “interpretability” as the family-specific amino-acid-nucleotide interactions that govern sequence-dependent protein-DNA binding, and to discuss IDEA’s interpretability within the context of recent works, including those suggested by the reviewers.

      We have revised the text in Introduction. The new text reads:

      “Here, we introduce the Interpretable protein-DNA Energy Associative (IDEA) model, a predictive model that learns protein-DNA physicochemical interactions by fusing available biophysical structures and their associated sequences into an optimized energy model (Figure 1). We show that the model can be used to accurately predict the sequence-specific DNA binding affinities of DNA-binding proteins and is transferrable across the same protein superfamily. Moreover, the model can be enhanced by incorporating experimental binding data and can be generalized to enable base-pair resolution predictions of genomic DNA-binding sites. Notably, IDEA learns a family-specific interaction matrix that quantifies energetic interactions between each amino acid and nucleotide, allowing for a direct interpretation of the “molecular grammar” governing sequence-specific protein-DNA binding affinities. This interpretable energy model is further integrated into a simulation framework, facilitating mechanistic studies of various biomolecular functions involving protein-DNA dynamics.”

      We have revised the text in Results. The new text reads:

      “IDEA is a coarse-grained biophysical model at the residue resolution for investigating protein-DNA binding interactions (Figure 1). It integrates both structures and corresponding sequences of known protein-DNA complexes to learn an interpretable energy model based on the interacting amino acids and nucleotides at the protein-DNA binding interface. The model is trained using available protein-DNA complexes curated from existing databases [17, 18].

      Unlike existing deep-learning-based protein-DNA binding prediction models, IDEA aims to learn a physicochemical-based energy model that quantitatively characterizes sequence-specific interactions between amino acids and nucleotides, thereby interpreting the “molecular grammar” driving the binding energetics of protein-DNA interactions. The optimized energy model can be used to predict the binding affinity of any given protein-DNA pair based on its structures and sequences. Additionally, it enables the prediction of genomic DNA binding sites by a given protein, such as a transcription factor. Finally, the learned energy model can be incorporated into a simulation framework to study the dynamics of DNA-binding processes, revealing mechanistic insights into various DNA-templated processes. Further details of the optimization protocol are provided in Methods Section Energy Model Optimization.”

      The revised text in Section: Discussion now reads:

      “Another highlight of IDEA is its ability to present an interpretable, familyspecific amino acid-nucleotide interaction energy model for given proteinDNA complexes. The optimized IDEA energy model can not only predict sequence-specific binding affinities of protein-DNA pairs but also provide a residue-specific interaction matrix that dictates the preferences of amino acidnucleotide interactions within specific protein families (Figure S11). This interpretable energy matrix would facilitate the discovery of sequence binding motifs for target DNA-binding proteins, complementing both sequencebased [24, 16, 25] and structure-based approaches [10, 26, 4, 15]. Additionally, we integrated this physicochemical-based energy model into a simulation framework, thereby improving the characterization of protein-DNA binding dynamics. IDEA-based simulation enables the investigation into dynamic interactions between various proteins and DNA, facilitating molecular-level understanding of the physical mechanisms underlying many DNA-binding processes, such as transcription, epigenetic regulations, and their modulation by sequence variations, such as single-nucleotide polymorphisms (SNPs) [22, 23].”

      Comment 5: The manuscript disregards subtle yet important differences in commonly used terminology in the field. For example, the authors use the terms “specificity” and “affinity” almost interchangeably (for example, the caption for Figure 3A uses “specificity” although the Methods text describes the prediction as about “affinity”). If the authors are looking to predict specificity, IDEA needs to be put in the context of the corresponding state-of-the-art (PMID 36123148, 39103447, 38867914, 36124796, etc).

      We thank the reviewer for pointing out the conflation of “specificity” and “affinity” in our manuscript. To clarify, the primary function of IDEA is to predict the binding affinities of protein-DNA pairs in a sequence-specific manner. We have revised the text to clarify the distinction between affinity and specificity and to acknowledge prior works, including those provided by the reviewers, that focus on predicting protein-DNA binding specificity.

      We have revised the section title “IDEA Accurately Predicts Protein-DNA Binding Specificity” to “IDEA Accurately Predicts Sequence-Specific Protein-DNA Binding Affinity”, and the section title “Residue-Level Protein-DNA Energy Model for Predicting Protein-DNA Recognition Specificities” to “Predictive Protein-DNA Energy Model at Residue Resolution”.

      We have revised the text in Introduction. The revised text reads:

      “Computational methods complement experimental efforts by providing the initial filter for assessing sequence-specific protein-DNA binding affinity. Numerous methods have emerged to enable predictions of binding sites and affinities of DNA-binding proteins [27, 9, 1, 5, 28, 29, 30, 31, 8]. These methods often utilized machine-learning-based training to extract sequence preference information from DNA or protein by utilizing experimental high-throughput (HT) assays [27, 9, 1, 5, 28, 8], which rely on the availability and quality of experimental binding assays. Additionally, many approaches employ deep neural networks [29, 30, 31], which could obscure the interpretation of interaction patterns governing protein-DNA binding specificities. Understanding these patterns, however, is crucial for elucidating the molecular mechanisms underlying various DNA-recognition processes, such as those seen in TFs [32].”

      We have revised the text in Section: IDEA Demonstrates Transferability across Proteins in the Same CATH Superfamily.

      The revised text reads:

      “Since IDEA relies on the sequence-structure relationship of given protein-DNA complexes to reach predictive accuracy, we inquired whether the trained energy model from one protein-DNA complex could be generalized to predict the sequence-specific binding affinities of other complexes. To test this, we assessed the transferability of IDEA predictions across all 11 structurally available protein-DNA complexes within the MAX TF-associated CATH superfamily (CATH ID: 4.10.280.10, Helix-loop-helix DNA-binding domain). We trained IDEA based on each of these 11 complexes and then used the trained model to predict the MAX-based MITOMI binding affinity. Our results show that IDEA generally makes correct predictions of the binding affinity when trained on proteins that are homologous to MAX, with Pearson and Spearman Correlation coefficients larger than 0.5 (Figure 3A and Figure S10).”

      We have revised the caption of Figure 3: The revised text reads:

      “IDEA prediction shows transferability within the same CATH superfamily. (A) The predicted MAX binding affinity, trained on other protein-DNA complexes within the same protein CATH superfamily, correlates well with experimental measurement. The proteins are ordered by their probability of being homologous to the MAX protein, determined using HHpred [33]. Training with a homologous protein (determined as a hit by HHpred) usually leads to better predictive performance (Pearson Correlation coefficient > 0.5) compared to non-homologous proteins. (B) Structural alignment between 1HLO (white) and 1A0A (blue), two protein-DNA complexes within the same CATH Helix-loop-helix superfamily. The alignment was performed based on the Ebox region of the DNA [34]. (C) The optimized energy model for 1A0A, a protein-DNA complex structure of the transcription factor PHO4 and DNA, with 33.41% probability of being homologous to the MAX protein. The optimized energy model is presented in reduced units, as explained in the Methods Section: Training Protocol.”

      We have revised the text in Section Discussion: The revised text now reads:

      “The protein-DNA interaction landscape has evolved to facilitate precise targeting of proteins towards their functional binding sites, which underlie essential processes in controlling gene expression. These interaction specifics are determined by physicochemical interactions between amino acids and nucleotides. By integrating sequences and structural data from available protein-DNA complexes into an interaction matrix, we introduce IDEA, a data-driven method that optimizes a system-specific energy model. This model enables high-throughput in silico predictions of protein-DNA binding specificities and can be scaled up to predict genomic binding sites of DNA-binding proteins, such as TFs. IDEA achieves accurate de novo predictions using only protein-DNA complex structures and their associated sequences, but its accuracy can be further enhanced by incorporating available experimental data from other binding assay measurements, such as the SELEX data [35, 36, 37], achieving accuracy comparable to or better than state-of-the-art methods (Figures S2 and S7, Table S1 and S2). Despite significant progress in genome-wide sequencing techniques [38, 39, 40, 41], determining sequence-specific binding affinities of DNA-binding biomolecules remains time-consuming and expensive. Therefore, IDEA presents a cost-effective alternative for generating the initial predictions before pursuing further experimental refinement.”

      We have revised the text in Discussion to clarify that the acquired binding affinities of target DNA sequences can be used to help existing models to infer specific DNA binding motifs.

      The revised text now reads:

      “Another highlight of IDEA is its ability to present an interpretable, family-specific amino acid-nucleotide interaction energy model for given protein-DNA complexes. The optimized IDEA energy model can not only predict sequence-specific binding affinities of protein-DNA pairs but also provide a residue-specific interaction matrix that dictates the preferences of amino acid-nucleotide interactions within specific protein families (Figure S11). This interpretable energy matrix would facilitate the discovery of sequence binding motifs for target DNA-binding proteins, complementing both sequence-based [24, 16, 25] and structure-based approaches [10, 26, 4, 15]. Additionally, we integrated this physicochemical-based energy model into a simulation framework, thereby improving the characterization of protein-DNA binding dynamics. IDEA-based simulation enables the investigation into dynamic interactions between various proteins and DNA, facilitating molecular-level understanding of the physical mechanisms underlying many DNA-binding processes, such as transcription, epigenetic regulations, and their modulation by sequence variations, such as single-nucleotide polymorphisms (SNPs) [22, 23].”

      Comment 6: It is not clear how much the learned energy model is dependent on the structural model used for a specific system/family. It would be interesting to see the differences in learned model based on different representative PDB structures used. Similarly, the supplementary figures show a lack of discriminative power for proteins like PDX1 (homeodomain family), POU, etc. Can the authors shed some light on why such different performances?

      We thank the reviewer for the insightful comments and agree that the trained energy model should be presented in the context of protein families. To further analyze the dependence of the energy model on protein family, we visualized the trained energy models for 24 proteins, including all proteins from the HT-SELEX dataset as well as PHO4 (PDB ID: 1A0A) and CTCF (PDB ID: 8SSQ), spanning 12 distinct protein families. To quantitatively assess similarities and differences among these energy models, we flattened each normalized energy model into an 80-dimensional vector and performed principal component analysis (PCA). As shown in Author response image 1 and Figure S11, energy models optimized from the same protein family fall within the same cluster, while those from different protein families exhibit distinct patterns. Moreover, the relative distance between energy models in PCA space reflects the degree of transferability. For example, PHO4 (PDB ID: 1A0A) is positioned close to MAX (PDB ID: 1HLO), whereas USF1 (PDB ID: 1AN4) and TCF4 (PDB ID: 6OD3) are farther away. This is consistent with the results shown in Figure 3A, where the energy model trained from PHO4 has better transferability than those from the other two systems.
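
      The flatten-then-PCA step described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes each energy model is a 20 × 4 (amino acid × nucleotide) matrix, which is one natural reading of the 80-dimensional vector, and it assumes normalization to unit Frobenius norm because the response does not specify the normalization. The function name `pca_project` and the synthetic "families" are hypothetical.

```python
import numpy as np

def pca_project(energy_models, n_components=2):
    """Project flattened energy models onto their leading principal components."""
    # Assumption: scale each matrix to unit Frobenius norm, then flatten 20 x 4 -> 80.
    X = np.stack([
        (np.asarray(m, dtype=float) / np.linalg.norm(m)).ravel()
        for m in energy_models
    ])
    # Center each of the 80 features across the models.
    Xc = X - X.mean(axis=0)
    # PCA via singular value decomposition of the centered data matrix.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return coords, explained[:n_components]

# Synthetic stand-ins: two "families", three noisy members each (not real data).
rng = np.random.default_rng(0)
family_a = rng.normal(size=(20, 4))
family_b = rng.normal(size=(20, 4))
models = [family_a + 0.05 * rng.normal(size=(20, 4)) for _ in range(3)]
models += [family_b + 0.05 * rng.normal(size=(20, 4)) for _ in range(3)]

coords, explained = pca_project(models)
print(coords)      # one (PC1, PC2) row per energy model
print(explained)   # fraction of variance carried by PC1 and PC2
```

      In this toy setting, members of the same synthetic family land on the same side of the first principal component, which mirrors the clustering by protein family reported for Figure S11.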

      We also greatly appreciate the reviewer’s suggestion to examine cases where IDEA failed to demonstrate strong discriminative power. When evaluating the model’s ability to distinguish between strong and weak binders, we used the available experimental structure most similar to the protein employed in the HT-SELEX experiments. In some instances, only the structure of the same protein from a different organism is available. For example, the HT-SELEX data for PDX1-DNA used the human PDX1 protein, but no human PDX1–DNA complex structure is available. Therefore, we used the mouse PDX1–DNA complex (PDB ID: 2H1K) for model training. The differences between species may limit the predictive accuracy of the model. A similar limitation applies to POU3F1, where an available mouse complex (PDB ID: 4Y60) was used to predict human protein–DNA interactions. Notably, DeepBind [9], a sequence-based prediction tool, also failed to distinguish strong from weak binders when using the mouse POU3F1 protein (AUC score: 0.457), but this was corrected with the human POU3F1 protein (AUC score: 0.956).

      We also examined the remaining cases where IDEA did not show a clear distinction between strong and weak binders: USF1, Egr1, and PROX1. For PROX1, we initially used the structure of a protein-DNA complex (PDB ID: 4Y60) in training. However, upon closer inspection, we discovered that this structure does not include the PROX1 protein, but SOX-18, a different transcription factor. This explains the inaccurate prediction made by IDEA. Since no experimental PROX1-DNA complex structure is currently available, we have removed this case from our HT-SELEX evaluation.

      IDEA also fails to fully resolve the binding preference of USF1. A closer examination of the HT-SELEX data reveals a lack of distinction among the sequences, as most sequences, including those with the lowest M-word (binding affinity) scores, contain the DNA-binding E-box sequence CACGTG. Therefore, USF1 represents a challenging example where the experimental data only consists of strong binders with limited variations in binding affinity, which likely results from differences in flanking sequences of the E-box motif.

      Egr1 stands as a peculiar example. Whereas IDEA does not effectively distinguish between the strong and weak binders in the current HT-SELEX dataset, its predictions are consistent with other experimental datasets, including binding affinities measured by k-MITOMI [42] (Figure S8A, B), preferred binding sequences from protein-binding microarray, an earlier HT-SELEX experiment, and bacterial one-hybrid data [43]. Therefore, further investigation of the current HT-SELEX data is needed to reconcile these differences.

      We have included additional text in Section: IDEA Demonstrates Transferability across Proteins in the Same CATH Superfamily to discuss the PCA analysis and the dependence of the model’s transferability on the similarity among the learned energy models.

      The revised text now reads:

      “The transferability of IDEA within the same CATH superfamily can be understood from the similarities in protein-DNA binding interfaces, which determine similar learned energy models. For example, the PHO4 protein (PDB ID: 1A0A) shares a highly similar DNA-binding interface with the MAX protein (PDB ID: 1HLO) (Figure 3B), despite sharing only a 33.41% probability of being homologous. Consequently, the energy model derived from the PHO4-DNA complex (Figure 3C) exhibits a similar amino-acid-nucleotide interactive pattern as that learned from the MAX-DNA complex (Figure 2B). To further evaluate the similarity between the learned energy models and their connection to protein families, we performed principal component analysis (PCA) on the normalized energy models across 24 proteins from 12 protein families [5]. Our analysis (Figure S11) reveals that most of the energy models from the same protein family fall within the same cluster, while those from different protein families exhibit distinct patterns. Moreover, the relative distance between energy models in PCA space reflects the degree of transferability between them. For example, PHO4 (PDB ID: 1A0A) is positioned close to MAX (PDB ID: 1HLO), whereas USF1 (PDB ID: 1AN4) and TCF4 (PDB ID: 6OD3) are farther away. This is consistent with the results in Figure 3A, where the energy model trained on PHO4 has better transferability than those trained on USF1 or TCF4.”

      We have also added an Appendix section titled Analysis of examples where IDEA fails to recognize strong DNA binders to discuss the examples in which IDEA did not perform well:

      “We examine IDEA’s capability in identifying strong binders from the HT-SELEX dataset across 12 protein families [5]. The model successfully predicts 18 out of 22 protein-DNA systems, but the performance is reduced in 4 cases. Closer investigations revealed the source of these limitations. In some instances, only the protein from a different organism is available. For example, the PDX1 HT-SELEX data utilized the human PDX1 protein, but no human PDX1–DNA complex structure is available. Therefore, the mouse PDX1–DNA complex structure (PDB ID: 2H1K) was used for model training. Differences between model organisms may reduce predictive accuracy. A similar limitation applies to POU3F1, where an available mouse complex (PDB ID: 4Y60) was used to predict human protein–DNA interactions. Notably, DeepBind [9], a sequence-based prediction tool, also failed to distinguish strong from weak binders when using the mouse POU3F1 protein (AUC score: 0.457), but this was corrected with the human POU3F1 protein (AUC score: 0.956).

      IDEA also fails to fully resolve the binding preference of USF1. A closer examination of the HT-SELEX data reveals a lack of distinction among the sequences, as most sequences, including those with the lowest M-word (binding affinity) scores, contain the DNA-binding E-box sequence CACGTG. Therefore, USF1 represents a challenging example where the experimental data only consists of strong binders with limited variations in binding affinity, which likely results from differences in flanking sequences of the E-box motif.

      Egr1 stands as a peculiar example. Whereas IDEA does not effectively distinguish between the strong and weak binders in the current HT-SELEX dataset, its predictions are consistent with other experimental datasets, including binding affinities measured by k-MITOMI [42] (Figure S8A, B), preferred binding sequences from protein-binding microarray, an earlier HT-SELEX experiment, and bacterial one-hybrid data [43]. Therefore, further investigation of the current HT-SELEX data is needed to reconcile these differences.”

      Comment 7: It is also not clear if IDEA’s prediction for reverse complement sequences is the same for a given sequence. If so, how is this property being modelled? Either this description is lacking or I missed it.

      We thank the reviewer for the insightful comments. Given a target protein-DNA sequence, the IDEA protocol substitutes it into a known protein-DNA complex structure to evaluate the binding free energy, which can be converted into binding affinity. IDEA uses sequence identity to determine whether the forward or reverse strand of the DNA should be replaced. Only the strand most similar to the target sequence is substituted. As a result, the model treats reverse-complement sequences differently. As the orientations of test sequences are specified from 5’ to 3’ in all datasets used in this study (e.g., processed MITOMI, HT-SELEX, and ChIP-seq data), this approach ensures that the target sequences are replaced and evaluated correctly. In cases where sequence orientation is not provided (though this was not an issue in this study), we recommend replacing both the forward and reverse strands with the target sequence separately and evaluating the corresponding protein–DNA binding free energies. Since strong binders are likely to dominate the experimental signals, the higher predicted binding affinity, with stronger binding free energies, should be taken as the model’s final prediction.

      We have added one section to the Methods Section titled Treatment of Complementary DNA Sequences to clarify these modeling details.

      The specific text reads:

      “To replace the DNA sequence in the protein-DNA complex structure with a target sequence, IDEA uses sequence identity to determine whether the target sequence belongs to the forward or reverse strand of the DNA in the protein-DNA structure. The more similar strand is selected and replaced with the target sequence. As the orientations of test sequences are specified from 5’ to 3’ in all datasets used in this study (e.g., processed MITOMI, HT-SELEX, and ChIP-seq data), this approach ensures that the target sequences are replaced and evaluated correctly. In cases where sequence orientation is not provided (though this was not an issue in this study), we recommend replacing both the forward and reverse strands with the target sequence separately and evaluating the corresponding protein–DNA binding free energies. Since strong binders are likely to dominate the experimental signals, the higher predicted binding affinity, with stronger binding free energy, should be taken as the model’s final prediction.”
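
      The strand-selection rule described in this response can be illustrated with a short sketch. It is not taken from the IDEA code base: the function names are hypothetical, sequences are assumed to have equal length and be written 5’ to 3’, ties are resolved in favor of the forward strand, and `energy_fn` stands in for whatever routine returns a predicted binding free energy, with more negative values assumed to mean stronger binding.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def choose_strand(target, forward):
    """Pick the template strand (forward or reverse) most similar to the target."""
    reverse = reverse_complement(forward)
    # Assumption: ties go to the forward strand.
    if identity(target, forward) >= identity(target, reverse):
        return "forward"
    return "reverse"

def predict_unoriented(target, energy_fn):
    """When orientation is unknown, score both orientations and keep the
    stronger (here: more negative) predicted binding free energy."""
    return min(energy_fn(target), energy_fn(reverse_complement(target)))

template_forward = "GGCCACGTGACC"
print(reverse_complement(template_forward))
print(choose_strand("GGTCACGTGGCA", template_forward))
```

      Because only the more similar strand is substituted, a sequence and its reverse complement are generally scored differently, which is why the fallback for unoriented data evaluates both orientations.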

      Comment 8: Page 21 line 403, the E-box core should be CACGTG instead of CACGTC.

      We apologize for our oversight and have corrected the relevant text.

      Comment 9: The citation for DNAproDB is outdated and should be updated (PMID 39494533).

      We thank the reviewer for pointing this out and have updated our citation accordingly.

      Reviewer #3:

      Comment 0: Summary: Protein-DNA interactions and sequence readout represent a challenging and rapidly evolving field of study. Recognizing the complexity of this task, the authors have developed a compact and elegant model. They have applied well-established approaches to address a difficult problem, effectively enhancing the information extracted from sparse contact maps by integrating artificial sequences decoy set and available experimental data. This has resulted in the creation of a practical tool that can be adapted for use with other proteins.

      We appreciate the reviewer’s excellent summary of the paper, and we thank the reviewer for the insightful suggestions and comments.

      Comment 1: Strengths: (1) The authors integrate sparse information with available experimental data to construct a model whose utility extends beyond the limited set of structures used for training. (2) A comprehensive methods section is included, ensuring that the work can be reproduced. Additionally, the authors have shared their model as a GitHub project, reflecting their commitment to transparency of research.

      We appreciate the reviewer’s strong assessment of the strengths of this paper. In addition to sharing our model on GitHub, we have also uploaded the original data and the essential scripts required to reproduce the results presented in the manuscript. We hope this further demonstrates our commitment to transparency and reproducibility.

      Comment 2: Weaknesses: (1) The coarse-graining procedure appears artificial, if not confusing, given that full-atom crystal structures provide more detailed information about residue-residue contacts. While the selection procedure for distance threshold values is explained, the overall motivation for adopting this approach remains unclear. Furthermore, since this model is later employed as an empirical potential for molecular modeling, the use of P and C5 atoms raises concerns, as the interactions in 3SPN are modeled between Cα and the nucleic base, represented by its center of mass rather than P or C5 atoms.

      We appreciate the reviewer’s insightful comments. The selection of P and C5 atoms was based on different relative positions of protein and DNA across various complex structures, each with distinctive protein-DNA structural interfaces. To illustrate this, we selected two representative structures where our algorithm selected C5 and P atoms, respectively: MAX-DNA (PDB ID: 1HLO) and FOXP3 (PDB ID: 7TDW). As shown in Author response image 2, in the case of 1HLO, more C5 atoms are within the cutoff distance of 10 Å from the protein Cα atoms, thus capturing essential contacting interactions. In contrast, 7TDW has more P atoms within this cutoff. Importantly, several P atoms are distributed on the minor groove of the DNA, which were not captured by the C5 atoms. To maximize the inclusion of relevant structural contacts, we employed a filtering scheme that selectively chooses either P or C5 atoms based on their proximity to the protein to enhance the model prediction. We note that while this scheme is helpful, the IDEA predictions remain robust across different atom selections. To assess this robustness, we performed binding affinity predictions using only P atoms on the HT-SELEX dataset across 12 protein families [5]. Our predictions (Author response table 1) show comparable performance to that achieved using our filtering scheme.

      Author response image 2.

      Comparison between P and C5 atoms in proximity to the protein. 3D structures of the MAX–DNA (A) and FOXP3–DNA (B) complexes, where P atoms (red spheres) and C5 atoms (blue spheres) that are within 10 Å of Cα atoms are highlighted.
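
      One way to picture the P-versus-C5 filtering is to count how many DNA atoms of each type fall within the 10 Å cutoff of any protein Cα atom and keep the type with more contacts. The sketch below makes that assumption explicit; the exact selection criterion used by IDEA is not spelled out in this response, so the rule, the tie-break toward C5, the function names, and the toy coordinates are all hypothetical.

```python
import numpy as np

def count_contacts(ca_xyz, dna_xyz, cutoff=10.0):
    """Number of DNA atoms with at least one C-alpha atom within `cutoff` angstroms."""
    ca_xyz = np.asarray(ca_xyz, dtype=float)
    dna_xyz = np.asarray(dna_xyz, dtype=float)
    # Pairwise distances, shape (n_dna_atoms, n_ca_atoms).
    dist = np.linalg.norm(dna_xyz[:, None, :] - ca_xyz[None, :, :], axis=-1)
    return int(np.sum(dist.min(axis=1) <= cutoff))

def select_dna_atom(ca_xyz, p_xyz, c5_xyz, cutoff=10.0):
    """Choose the DNA representative atom type with more protein contacts."""
    n_p = count_contacts(ca_xyz, p_xyz, cutoff)
    n_c5 = count_contacts(ca_xyz, c5_xyz, cutoff)
    # Assumed tie-break: prefer C5 when the counts are equal.
    return ("P", n_p) if n_p > n_c5 else ("C5", n_c5)

# Toy coordinates in angstroms (not taken from any PDB entry).
ca = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
p_atoms = [[0.0, 0.0, 8.0], [0.0, 0.0, 12.0], [0.0, 0.0, 20.0]]
c5_atoms = [[0.0, 0.0, 6.0], [0.0, 0.0, 9.0], [0.0, 0.0, 30.0]]

print(select_dna_atom(ca, p_atoms, c5_atoms))
```

      With real structures, the coordinate arrays would come from the Cα, P, and C5 atom records of the complex; parsing is left out to keep the sketch self-contained.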

      When incorporating the trained IDEA energy model into a simulation model, we acknowledge a potential mismatch between the resolution of the data-driven model (one coarse-grained site per nucleotide) and the 3SPN simulation model (three coarse-grained sites per nucleotide). The selection of nucleic base sites for molecular interactions in the 3SPN model follows our previous work [44] and its associated code implementation. While revisiting this part of the manuscript, we identified an inconsistency in the reported results in Figure 5A of our initial version: Specifically, we previously used the protein side-chain atoms, rather than only the Cα atoms, in model training. Retraining the data using the Cα atoms results in reduced prediction performance for the IDEA model (Figure 5A). Nonetheless, incorporating this updated energy model into simulations still yielded high accuracy in the predicted absolute binding free energies (Author response image 3A), demonstrating the robustness of our simulation framework in predicting absolute binding free energies against variations in atom selection during the IDEA model training. Following the reviewer’s suggestion, we also incorporated the IDEA-trained energy model as short-range van der Waals interactions between protein Cα atoms and DNA P atoms. As shown in Author response image 3B, our simulation reveals a slightly improved performance over our original implementation, with higher Pearson and Spearman correlation coefficients and a fitted slope closer to 1.0. This result suggests that a more consistent atom selection scheme between the data-driven and simulation models can improve the overall predictions. Accordingly, we have updated Figure 5 with this improved setup, using the simulation model with short-range vdW interactions implemented between protein Cα atoms and DNA P atoms (Figure 5C), ensuring consistency between the IDEA model and simulation framework.

      Author response table 1.

      Comparison of IDEA performance using two DNA atom selection schemes: the filtering scheme presented in the manuscript (C5 and P atoms) versus using only P atoms. Cases where the two schemes result in different atom selections are highlighted in bold.

      We acknowledge that a gap still exists between the resolution of the data-driven and simulation models. To ensure a completely consistent coarse-grained level between these two models, we will work on implementing the IDEA model output for 1-bead-per-nucleotide DNA simulation models in the future.

      Comment 3: (2) Although the authors use a standard set of metrics to assess model quality and predictive power, some ∆∆G predictions compared to MITOMI-derived ∆∆G values appear nonlinear, which casts doubt on the interpretation of the correlation coefficient.

      Author response image 3.

      Comparison of simulations using different representative atoms. (A) Protein-DNA binding simulation with the IDEA model incorporated as short-range van der Waals interactions between protein Cα atoms and nucleic base sites. (B) Protein-DNA binding simulation with the IDEA model incorporated as short-range van der Waals interactions between protein Cα atoms and DNA P atoms. The predicted free energies are robust to the choice of DNA representative atoms. The predicted binding free energies are presented in physical units, and error bars represent the standard deviation of the mean.

      We thank the reviewer for the insightful comments and agree that the linear fit between our model’s prediction and the experimental data may not be the best measure of performance. The primary utility of the IDEA model is to predict high-affinity DNA-binding sequences for a given DNA-binding protein by assessing the relative binding affinities across different DNA sequences. In this regard, the ranked order of predicted sequence binding affinities serves as a better metric for evaluating the success of this model. To evaluate this, we calculated both Spearman’s rank correlation coefficient, which does not rely on linear correlation, and the Pearson correlation coefficient between our predictions and the experimental results. As shown in Figure 2, our computation shows a Spearman’s rank correlation coefficient of 0.65 for the MAX-based predictions using one MAX-DNA complex (PDB ID: 1HLO), supporting the model’s capability to effectively distinguish strong from weak binders.
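
      The difference between the two coefficients can be seen on a toy example in which predictions are a monotonic but nonlinear function of the measurements. The sketch below uses plain NumPy and synthetic values; it does not use the MITOMI data, and the helper names are illustrative.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def average_ranks(v):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(1, len(v) + 1)
    for value in np.unique(v):
        tied = v == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Synthetic "measurements" and a monotonic, nonlinear "prediction".
experimental = np.linspace(0.0, 3.0, 10)
predicted = np.exp(experimental)

print(pearson(experimental, predicted))   # below 1: the relation is not linear
print(spearman(experimental, predicted))  # the ranking is perfectly preserved
```

      A model that ranks sequences correctly but responds nonlinearly is penalized by the Pearson coefficient and not by the Spearman coefficient, which is why the rank-based metric suits the goal of identifying high-affinity sequences.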

      As reflected in Figure 2 of the main text, although our model generally captures the relative binding affinities across different DNA sequences, its predictive accuracy diminishes for low-affinity sequences. This could be due to two limitations of the current modeling framework: (1) The model is residue-based and estimates binding free energy as the additive sum of contributions from individual contacting amino-acid-nucleotide pairs. This assumption does not account for cooperative effects caused by simultaneous changes at multiple nucleotide positions. One potential direction to further improve the model would be to use a finer-grained representation by incorporating more atom types within contacting residues, and to use a many-body potential to better capture cooperative effects from multiple mutations. (2) The model assumes that the target DNA adopts the same binding interface as in the reference crystal structure. However, sequence-dependent DNA shape has been shown to be important in determining protein-DNA binding affinity [1]. To address this limitation, a future direction is to use deep-learning-based methods to incorporate predicted DNA shape or protein-DNA complex structures based on their sequences [2, 3] into our model prediction.
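
      The additivity assumption in limitation (1) can be made concrete with a small sketch. It assumes the energy model is a 20 × 4 matrix `gamma` indexed by amino acid and nucleotide, and that the binding energy is the sum of `gamma` entries over a fixed list of contacting (residue, nucleotide) index pairs. The matrix values, sequences, and contact list below are invented for illustration and are not from the manuscript.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NUCLEOTIDES = "ACGT"

def binding_energy(gamma, protein_seq, dna_seq, contacts):
    """Additive energy: sum of pairwise terms over contacting residue-nucleotide pairs."""
    return float(sum(
        gamma[AMINO_ACIDS.index(protein_seq[i]), NUCLEOTIDES.index(dna_seq[j])]
        for i, j in contacts
    ))

rng = np.random.default_rng(1)
gamma = rng.normal(size=(20, 4))                       # hypothetical energy model
protein = "RKHE"                                       # hypothetical interface residues
contacts = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]    # (protein index, DNA index)

wild_type = "CACGTG"
single_1 = "CTCGTG"    # substitution at DNA position 1
single_2 = "CACGAG"    # substitution at DNA position 4
double = "CTCGAG"      # both substitutions together

e_wt = binding_energy(gamma, protein, wild_type, contacts)
dd_1 = binding_energy(gamma, protein, single_1, contacts) - e_wt
dd_2 = binding_energy(gamma, protein, single_2, contacts) - e_wt
dd_double = binding_energy(gamma, protein, double, contacts) - e_wt

# In a purely additive model the double-mutant change equals the sum of the singles.
print(dd_double, dd_1 + dd_2)
```

      Any experimentally observed deviation from this equality reflects cooperativity that such a pairwise model cannot represent, which is the motivation given for a many-body potential.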

      To fully evaluate the predictive power of IDEA, we have included Spearman’s rank correlation coefficient for every correlation plot in this manuscript. Across all our analyses, the Spearman’s rank correlation coefficients reveal similar predictive performance as the Pearson correlation coefficients. Additionally, we have included in our discussion the current limitations of our model and potential directions for future improvement.

      We have edited our Discussion Section to include a discussion of the limitations of the current model. Specifically, the added text is:

      “Although IDEA has proved successful in many examples, it can be improved in several aspects. The model currently assumes the training and testing sequences share the same protein-DNA structure. While double-stranded DNA is generally rigid, recent studies have shown that sequence-dependent DNA shape contributes to their binding specificity [1, 2, 4]. To improve predictive accuracy, one could incorporate predicted DNA shapes or structures into the IDEA training protocol. In addition, the model is residue-based and evaluates the binding free energy as the additive sum of contributions from individual amino-acid-nucleotide contacts. This assumption does not account for cooperative effects that may arise from multiple nucleotide changes. A potential refinement could utilize a finer-grained model that includes more atom types within contacting residues and employs a many-body potential to account for such cooperative effects.”

      Comment 4: (3) The discussion section lacks information about the model’s limitations and a comprehensive comparison with other models. Additionally, differences in model performance across various proteins and their respective predictive powers are not addressed.

      We thank the reviewer for the insightful comments. As discussed in the response to Comment 3, the current structural model has several limitations, which may reduce predictive accuracy for weak DNA binders. We have noted these limitations in the Discussion section.

      To compare the performance of IDEA with state-of-the-art protein-DNA predictive models, we examined the predictive accuracies of two additional popular computational models: ProBound [8] and DeepBind [9]. ProBound has been shown to have a better performance than several earlier predictive protein-DNA models, including JASPAR 2018 [11], HOCOMOCO [12], Jolma et al. [13], and DeepSELEX [14]. To benchmark these models’ performance, we examine each method’s capability to identify strong binders with the HT-SELEX datasets covering 22 proteins from 12 protein families [5]. As suggested by Reviewer 1, we also calculated the PRAUC score, reweighted to account for data imbalance [6], as a complementary metric for evaluating the model performance.

      As shown in Figure S6, Table S1, and Table S2, IDEA ranked second among the three predictive methods. It is important to note that both ProBound and DeepBind were trained on a curated version of the HT-SELEX data [13], which overlaps with the testing data [5]. Compared with them, IDEA was trained only on the given structural and sequence information from a single protein-DNA complex, thus independent of the testing data. In order to assess how IDEA performs when incorporating knowledge from HT-SELEX data, we augmented the training by randomly including half of the HT-SELEX data (see the Methods Section Enhanced Modeling Prediction with SELEX Data). The augmented IDEA model achieved the best performance among all the models. We further benchmarked IDEA using a 10-fold cross-validation on the same HT-SELEX data [5] and found that IDEA outperformed a recent regression model that considers the shape of DNA with different sequences [5]. Overall, IDEA can be used to predict protein-DNA affinities in the absence of known binding sequence data, thereby filling a critical gap when such experimental datasets are unavailable.

      In addition, we compared the performance of IDEA with both general and family-specific knowledge-based energy models. First, we incorporated a knowledge-based generic protein-DNA energy model (DBD-Hunter) learned from the protein-DNA database, reported by Skolnick and coworkers [7], into our prediction protocol. This model assigns interaction energies to different functional groups within each DNA nucleotide (e.g., phosphate (PP), sugar (SU), pyrimidine (PY), and imidazole (IM) groups). For our comparison, we averaged the energy contributions of these groups within each nucleotide and replaced the IDEA-learned energy model with this generic one to test its ability to differentiate strong binders from weak binders in the HT-SELEX dataset [5]. As shown in Figure S6, the IDEA model generally achieves better performance than the generic energy model. Additionally, we compared IDEA with rCLAMPS, a family-specific energy model developed to predict protein-DNA binding specificity in the C2H2 and homeodomain families. As shown in Table S1 and Table S2, IDEA also shows better performance than rCLAMPS in most cases across the C2H2 and homeodomain families, demonstrating that it has better predictive accuracy than both family-specific and generic knowledge-based models.

      We have revised our text to include the comparison between IDEA and other predictive models. Specifically, we revised the text in Section: IDEA Generalizes across Various Protein Families.

      The revised text reads:

      “To examine IDEA’s predictive accuracy across different DNA-binding protein families, we applied it to calculate protein-DNA binding affinities using a comprehensive HT-SELEX dataset [5]. We focused on evaluating the capability of IDEA to distinguish strong binders from weak binders for each protein with an experimentally determined structure. We calculated the probability density distribution of the top and bottom binders identified in the SELEX experiment. A well-separated distribution indicates the successful identification of strong binders by IDEA (Figure 2D and S4). Receiver Operating Characteristic (ROC) analysis was performed to calculate the Area Under the Curve (AUC) and the precision-recall curve (PRAUC) scores for these predictions. Further details are provided in the Methods Section Evaluation of IDEA Prediction Using HT-SELEX Data. Our analysis shows that IDEA successfully differentiates strong from weak binders for 80% of the 22 proteins across 12 protein families, achieving AUC and balanced PRAUC scores greater than 0.5 (Figure 2E and S5). To benchmark IDEA’s performance against other leading methods, we compared its predictions with several popular models, including the sequence-based predictive models ProBound [8] and DeepBind [9], the family-based energy model rCLAMPS [10], and the knowledge-based energy model DBD-Hunter [7]. IDEA demonstrates performance comparable to these state-of-the-art approaches (Figure S6, Table S1, and Table S2), and incorporating sequence features further improves its prediction accuracy. We also performed 10-fold cross-validation on the binding affinities of protein–DNA pairs in this dataset and found that IDEA outperforms a recent regression model that considers the shape of DNA with different sequences [5] (Figure S7). Details are provided in Section: Comparison of IDEA predictive performance Using HT-SELEX data.”

      We also added one section Comparison of IDEA predictive performance Using HT-SELEX data in the Appendix to fully explain the comparison between IDEA and other popular models.

      The added texts are:

      “To benchmark the performance of IDEA against state-of-the-art protein-DNA predictive models, we evaluated its ability to recognize strong binders with the HT-SELEX datasets across 22 proteins from 12 families [5]. Specifically, we compare IDEA with two widely used sequence-based models: ProBound [8] and DeepBind [9]. ProBound has demonstrated superior performance over many other predictive protein-DNA models, including JASPAR 2018 [11], HOCOMOCO [12], Jolma et al. [13], and DeepSELEX [14]. To use ProBound, we retrieved the trained binding model for each protein from motifcentral.org and used the GitHub implementation of ProBoundTools to infer the binding scores between protein and target DNA sequences. Except for POU3F1, binding models are available for all proteins. Therefore, we excluded POU3F1 and evaluated the protein-DNA binding affinities for the remaining 21 proteins. To use DeepBind, sequence-specific binding affinities were predicted directly with its web server. The Area Under the Curve (AUC) and the Precision-Recall AUC (PRAUC) scores were used as metrics for comparison. An AUC score of 1.0 indicates a perfect separation between the strong- and weak-binder distributions, while an AUC score of 0.5 indicates no separation. Because there is a significant imbalance in the number of strong and weak binders from the experimental data [5], where the strong binders are far fewer than the weak binders, we reweighted the samples to achieve a balanced evaluation, using 0.5 as the baseline for randomized prediction [6]. As summarized in Figure S6, Table S1, and Table S2, IDEA ranked second among the three predictive models. To assess the performance of IDEA when augmented with additional protein-DNA binding data, we augmented IDEA using a randomly selected half of the HT-SELEX data (see the Methods Section Enhanced Modeling Prediction with SELEX Data). The augmented IDEA model achieved the best performance among all the models.”
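      The two metrics described above can be illustrated with a minimal sketch. It assumes that a lower predicted binding free energy means a stronger binder (so the negated energy is used as the ranking score) and that "reweighting" means giving the strong and weak classes equal total weight, which places the random-prediction baseline at 0.5. The function names are hypothetical, and the exact reweighting procedure used in the manuscript may differ.

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC AUC: probability that a random positive outscores a random negative."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Compare every positive with every negative; ties count as half a win.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

def balanced_pr_auc(scores, labels):
    """Average precision with class-balanced sample weights (assumed scheme)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    # Each class receives a total weight of 0.5, regardless of its size.
    weights = np.where(labels, 0.5 / labels.sum(), 0.5 / (~labels).sum())
    order = np.argsort(-scores, kind="stable")  # highest score first
    labels, weights = labels[order], weights[order]
    tp = np.cumsum(weights * labels)    # weighted true positives so far
    fp = np.cumsum(weights * ~labels)   # weighted false positives so far
    precision = tp / (tp + fp)
    recall = tp / tp[-1]
    # Step-wise integration of precision over recall increments.
    delta_recall = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(precision * delta_recall))

# Toy data: 2 strong binders (label 1) and 6 weak binders (label 0).
energies = np.array([-5.0, -4.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
labels = np.array([1, 1, 0, 0, 0, 0, 0, 0])
print(roc_auc(-energies, labels), balanced_pr_auc(-energies, labels))
```

      The sketch does not group tied scores when building the precision-recall curve, so heavily tied predictions would need additional handling.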

      “In addition, we compared the performance of IDEA with both general and family-specific knowledge-based energy models. First, we incorporated a knowledge-based generic protein-DNA energy model (DBD-Hunter) learned from the protein-DNA database, reported by Skolnick and coworkers [7], into our prediction protocol. This model assigns interaction energies to different functional groups within each DNA nucleotide, including phosphate (PP), sugar (SU), pyrimidine (PY), and imidazole (IM) groups. For our comparison, we averaged the energy contributions of these groups within each nucleotide and replaced the IDEA-learned energy model with the DBD-Hunter model to assess its ability to differentiate strong binders from weak binders in the HT-SELEX dataset [5]. Additionally, we compared IDEA with rCLAMPS, a family-specific energy model developed to predict protein-DNA binding specificity in the C2H2 and homeodomain families. rCLAMPS learns a position-dependent amino-acid-nucleotide interaction energy model. To incorporate this model into the binding free energy calculation, we averaged the energy contributions across all occurrences of each amino-acid-nucleotide pair, which resulted in a 20-by-4 residue-type-specific energy matrix. This matrix is structurally analogous to the IDEA-trained energy model and can be directly integrated into the binding free energy calculations. As shown in Figure S6, Table S1, and Table S2, the IDEA model generally outperforms DBD-Hunter and rCLAMPS, demonstrating that it can achieve better predictive accuracy than both generic and family-specific knowledge-based models.”
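      The averaging step that collapses a position-dependent model into a 20-by-4 residue-type matrix can be sketched as follows. The input format (a list of amino acid, nucleotide, energy triples), the function name, and the choice to leave unobserved pairs at zero are assumptions made for illustration; they are not taken from rCLAMPS or from the IDEA code.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard residues, one-letter codes
NUCLEOTIDES = "ACGT"

def average_pair_energies(contacts):
    """Average energies over all occurrences of each amino-acid-nucleotide pair.

    contacts: iterable of (amino_acid, nucleotide, energy) triples, one per
    position-specific term of the source model (hypothetical input format).
    Returns a 20-by-4 matrix indexed by AMINO_ACIDS and NUCLEOTIDES.
    """
    total = np.zeros((len(AMINO_ACIDS), len(NUCLEOTIDES)))
    count = np.zeros_like(total)
    for aa, nt, energy in contacts:
        i, j = AMINO_ACIDS.index(aa), NUCLEOTIDES.index(nt)
        total[i, j] += energy
        count[i, j] += 1
    # Mean per pair; pairs that never occur are left at 0 (an assumption).
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)

# Two Arg-G terms at different positions are merged into a single entry.
contacts = [("R", "G", -1.0), ("R", "G", -3.0), ("K", "A", 0.5)]
matrix = average_pair_energies(contacts)
print(matrix[AMINO_ACIDS.index("R"), NUCLEOTIDES.index("G")])
```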

      “We also performed 10-fold cross-validation using the same HT-SELEX datasets, following the protocol described in the Methods Section Enhanced Modeling Prediction with SELEX Data. For each protein, we divided the entire dataset into 10 equal, randomly assigned folds. In each iteration, we used 9 of the 10 folds as the training dataset and the remaining fold as the testing dataset. This process was repeated 10 times so that each fold served as the test set once. We then reported the average R² scores across these iterations to evaluate IDEA’s predictive performance. Our results are compared with the 1mer and 1mer+shape methods from [5], the latest regression model that considers the shape of DNA with different sequences (Figure S7). This comparative analysis shows IDEA achieved higher predictive accuracy than the state-of-the-art sequence-based protein-DNA binding predictors for protein-DNA complexes that have available experimentally resolved structures.”
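      The cross-validation loop described above can be sketched in a few lines. The model fitted inside the loop is an ordinary least-squares regression used purely as a stand-in, since the IDEA training step is not reproduced here; the function names and the fixed random seed are likewise assumptions.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def k_fold_r2(X, y, k=10, seed=0):
    """Average test-set R^2 over k randomly assigned, near-equal folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on the other k-1 folds; each fold is the test set exactly once.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Stand-in model: least-squares linear regression with an intercept.
        A_train = np.column_stack([X[train_idx], np.ones(len(train_idx))])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        A_test = np.column_stack([X[test_idx], np.ones(len(test_idx))])
        scores.append(r2_score(y[test_idx], A_test @ coef))
    return float(np.mean(scores))

# Synthetic features and affinities, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=200)
print(k_fold_r2(X, y))
```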

      “Overall, these results demonstrate that IDEA can be used to predict the protein-DNA pairs in the absence of known binding sequence data, thus filling an important gap in protein-DNA predictions when experimental binding sequence data are unavailable.”

      We also greatly appreciate the reviewer’s suggestion to examine the model’s performance across different proteins. To do this, we first evaluated the dependence of IDEA prediction on the availability of experimental structures similar to the target protein-DNA complexes. To quantitatively assess similarities and differences among the IDEA-derived energy models, we flattened each normalized energy model into an 80-dimensional vector and performed principal component analysis (PCA). As shown in Author response image 1 and Figure S11, energy models optimized from the same protein family fall within the same cluster, while those from different protein families exhibit distinct patterns. Moreover, the relative distance between energy models in PCA space reflects the degree of transferability. For example, PHO4 (PDB ID: 1A0A) is positioned close to MAX (PDB ID: 1HLO), whereas USF1 (PDB ID: 1AN4) and TCF4 (PDB ID: 6OD3) are farther away. This is consistent with the results shown in Figure 3A, where the energy model trained from PHO4 has better transferability than those from the other two systems. Therefore, the availability of experimental structures from protein-DNA complexes more similar to the target can lead to better predictive performance.
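      The flatten-then-PCA procedure can be sketched as follows. The sketch assumes each energy model is a 20-by-4 matrix, that "normalized" means scaling each flattened vector to unit length, and that PCA is computed by singular value decomposition of the mean-centered data; the function name and the synthetic "families" are hypothetical.

```python
import numpy as np

def pca_project(energy_models, n_components=2):
    """Project flattened 20x4 energy models onto their leading principal components."""
    # Flatten each 20x4 matrix into an 80-dimensional row vector.
    X = np.stack([np.asarray(m, dtype=float).ravel() for m in energy_models])
    # Assumed normalization: unit Euclidean norm per model.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    X = X - X.mean(axis=0)  # center each of the 80 features
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    coords = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return coords, explained[:n_components]

# Two synthetic "families": small perturbations around two base models.
rng = np.random.default_rng(0)
base_a, base_b = rng.normal(size=(20, 4)), rng.normal(size=(20, 4))
models = [base_a + 0.01 * rng.normal(size=(20, 4)) for _ in range(3)]
models += [base_b + 0.01 * rng.normal(size=(20, 4)) for _ in range(3)]
coords, explained = pca_project(models)
print(coords.shape, explained)
```

      In this toy setting, models built from the same base matrix land close together in the projected space, mirroring the family-level clustering described above.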

      We also examine cases in which the IDEA model failed to show strong discriminative power for protein-DNA complexes in the HT-SELEX datasets [5] (Figures 2E and S5). When evaluating the model’s ability to distinguish between strong and weak binders, we used the available experimental structure most similar to the protein employed in the HT-SELEX experiments. In some instances, only the structure of the same protein from a different organism is available. For example, the HT-SELEX data for PDX1-DNA used the human PDX1 protein, but no human PDX1–DNA complex structure is available. Therefore, we used the mouse PDX1–DNA complex (PDB ID: 2H1K) for model training. The differences between species may limit the predictive accuracy of the model. A similar limitation applies to POU3F1, where an available mouse complex (PDB ID: 4Y60) was used to predict human protein–DNA interactions. Notably, DeepBind [9], a sequence-based prediction tool, also failed to distinguish strong from weak binders when using the mouse POU3F1 protein (AUC score: 0.457), but this was corrected with the human POU3F1 protein (AUC score: 0.956).

      We also examined the remaining cases where IDEA did not show a clear distinction between strong and weak binders: USF1, Egr1, and PROX1. For PROX1, we initially used the structure of a protein-DNA complex (PDB ID: 4Y60) in training. However, upon closer inspection, we discovered that this structure does not include the PROX1 protein, but SOX-18, a different transcription factor. This explains the inaccurate prediction made by IDEA. Since no experimental PROX1-DNA complex structure is currently available, we have removed this case from our HT-SELEX evaluation.

      IDEA also fails to fully resolve the binding preference of USF1. A closer examination of the HT-SELEX data reveals a lack of distinction among the sequences, as most sequences, including those with the lowest M-word (binding affinity) scores, contain the DNA-binding E-box sequence CACGTG. Therefore, USF1 represents a challenging example where the experimental data only consists of strong binders with limited variations in binding affinity, which likely results from differences in flanking sequences of the E-box motif.

      Egr1 stands as a peculiar example. Whereas IDEA does not effectively distinguish between the strong and weak binders in the current HT-SELEX dataset, its predictions are consistent with other experimental datasets, including binding affinities measured by k-MITOMI [42] (Figure S8A, B), preferred binding sequences from protein-binding microarray, an earlier HT-SELEX experiment, and bacterial one-hybrid data [43]. Therefore, further investigation of the current HT-SELEX data is needed to reconcile these differences.

      In summary, IDEA’s predictive performance depends on the availability of experimental structures closely related to the target protein-DNA complexes, both in terms of protein sequences and model organisms.

      We have included additional text in Section: IDEA Demonstrates Transferability across Proteins in the Same CATH Superfamily to discuss the PCA analysis and the dependence of the model’s transferability on the similarity among the learned energy models.

      The revised text now reads:

      “The transferability of IDEA within the same CATH superfamily can be understood from the similarities in protein-DNA binding interfaces, which determine similar learned energy models. For example, the PHO4 protein (PDB ID: 1A0A) shares a highly similar DNA-binding interface with the MAX protein (PDB ID: 1HLO) (Figure 3B), despite sharing only a 33.41% probability of being homologous. Consequently, the energy model derived from the PHO4-DNA complex (Figure 3C) exhibits a similar amino-acid-nucleotide interactive pattern as that learned from the MAX-DNA complex (Figure 2B). To further evaluate the similarity between the learned energy models and their connection to protein families, we performed principal component analysis (PCA) on the normalized energy models across 24 proteins from 12 protein families [5]. Our analysis (Figure S11) reveals that most of the energy models from the same protein family fall within the same cluster, while those from different protein families exhibit distinct patterns. Moreover, the relative distance between energy models in PCA space reflects the degree of transferability between them. For example, PHO4 (PDB ID: 1A0A) is positioned close to MAX (PDB ID: 1HLO), whereas USF1 (PDB ID: 1AN4) and TCF4 (PDB ID: 6OD3) are farther away. This is consistent with the results in Figure 3A, where the energy model trained on PHO4 has better transferability than those trained on USF1 or TCF4.”

      We have also added an Appendix section titled Analysis of examples where IDEA fails to recognize strong DNA binders to discuss the examples in which IDEA did not perform well:

      “We examine IDEA’s capability in identifying strong binders from the HT-SELEX dataset across 12 protein families [5]. The model successfully predicts 18 out of 22 protein-DNA systems, but the performance is reduced in 4 cases. Closer investigations revealed the source of these limitations. In some instances, only the protein from a different organism is available. For example, the PDX1 HT-SELEX data utilized the human PDX1 protein, but no human PDX1–DNA complex structure is available. Therefore, the mouse PDX1–DNA complex structure (PDB ID: 2H1K) was used for model training. Differences between model organisms may reduce predictive accuracy. A similar limitation applies to POU3F1, where an available mouse complex (PDB ID: 4Y60) was used to predict human protein–DNA interactions. Notably, DeepBind [9], a sequence-based prediction tool, also failed to distinguish strong from weak binders when using the mouse POU3F1 protein (AUC score: 0.457), but this was corrected with the human POU3F1 protein (AUC score: 0.956).

      IDEA also fails to fully resolve the binding preference of USF1. A closer examination of the HT-SELEX data reveals a lack of distinction among the sequences, as most sequences, including those with the lowest M-word (binding affinity) scores, contain the DNA-binding E-box sequence CACGTG. Therefore, USF1 represents a challenging example where the experimental data only consists of strong binders with limited variations in binding affinity, which likely results from differences in flanking sequences of the E-box motif.

      Egr1 stands as a peculiar example. Whereas IDEA does not effectively distinguish between the strong and weak binders in the current HT-SELEX dataset, its predictions are consistent with other experimental datasets, including binding affinities measured by k-MITOMI [42] (Figure S8A, B), preferred binding sequences from protein-binding microarray, an earlier HT-SELEX experiment, and bacterial one-hybrid data [43]. Therefore, further investigation of the current HT-SELEX data is needed to reconcile these differences.”

      Comment 5: The authors provide an implementation of their model via GitHub, which is commendable. However, it unexpectedly requires the Modeller suite, despite no details about homology modeling being included in the methods section.

      We thank the reviewer for the helpful comments. We did not use the homology modeling module of Modeller. Instead, we only used a single Python script, buildseq.py, from the Modeller package to extract the protein and DNA sequences from the given PDB structure. We have clarified this in the README file on our GitHub repository.
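      For readers who want to see what such a sequence-extraction step involves, the following is a minimal standalone sketch. It is not the Modeller script mentioned above and does not use Modeller; it simply reads the fixed-column ATOM records of a PDB file and emits one letter per residue for each chain. The function name, the handling of unknown residues as "X", and the restriction to ATOM records are assumptions.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
DNA_TO_ONE = {"DA": "A", "DT": "T", "DC": "C", "DG": "G"}

def chain_sequences(pdb_lines):
    """Return {chain_id: sequence} built from the ATOM records of a PDB file."""
    sequences, seen = {}, set()
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        res_name = line[17:20].strip()      # residue name, PDB columns 18-20
        chain_id = line[21]                 # chain identifier, column 22
        res_key = (chain_id, line[22:27])   # residue number plus insertion code
        if res_key in seen:                 # count each residue only once
            continue
        seen.add(res_key)
        # Unrecognized residue names are written as "X" (an assumption).
        letter = THREE_TO_ONE.get(res_name) or DNA_TO_ONE.get(res_name, "X")
        sequences[chain_id] = sequences.get(chain_id, "") + letter
    return sequences
```

      With a local structure file, the function could be called as `chain_sequences(open("complex.pdb"))`, where the file name is a placeholder.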

      Comment 6: While the manuscript is written in clear and accessible English, some sentences are quite long and could benefit from rephrasing (e.g., lines 49-52).

      Thank you for the helpful suggestion. We agree that the original sentence was overly long and have revised it by splitting it into two for improved clarity and readability.

      The revised version reads:

      “The very robustness of evolution [46, 47, 48, 49] provides an opportunity to extract the sequence-structure relationships embedded in existing complexes. Guided by this principle, we can learn an interpretable binding energy landscape that governs the recognition processes of DNA-binding proteins.”

      Comment 7: In line 82, the citations appear out of place, as the context seems to suggest the use of the newly developed model.

      Thank you for this insightful suggestion. We have rephrased the sentence to better connect with the context of this section.

      The revised text now reads:

      “Finally, the learned energy model can be incorporated into a simulation framework to explore the dynamics of DNA-binding processes, revealing mechanistic insights into various DNA-templated processes.”

      Comment 8: Line 143 “different structure from the bHLH TFs and thus requires a different atom” This is the first instance in the manuscript where the atom selection for distance thresholding is mentioned, making the text somewhat confusing.

      We thank the reviewer for the insightful comment and agree that the atom selection scheme appears abruptly in this section. To improve clarity, we have moved the detailed atom selection scheme and its rationale to the Methods Section titled Structural Modeling of Protein and DNA.

      Comment 9: Figures: Overall, the figures are visually appealing but could be further improved.

      We appreciate the positive feedback regarding the visual presentation of our figures. Following the reviewer’s suggestions and to further enhance clarity, we have revised several figures to improve labeling, layout, and annotations.

      Comment 10: Figure 1: Consider changing the description “highlighted in blue” to “highlighted in blue on the structure.”

      We have revised the text based on your suggestion.

      Comment 11: Figure 2: Panel B is missing a color bar legend and units, as is the case in Figure 3C. Additionally, the placement of Panel C is unconventional - it appears it should be Panel D. The color scheme for the spheres is not fully described. Panel E: There are too many colors used; consider employing different markers to improve clarity.

      Thank you for the helpful suggestions.

      For Figure 2B and Figure 3C, we would like to clarify that the predicted energies are presented in reduced units due to an undetermined prefactor introduced during the model optimization. This point has now been clarified in the figure captions and is also explained in the Methods section titled Training Protocol.

      Additionally, we have rearranged Panels C and D to improve the figure layout and have fully described the color coding used in the structural representations.

      We have updated it to read:

      “Results for MAX-based predictions. (A) The binding free energies calculated by IDEA, trained using a single MAX–DNA complex (PDB ID: 1HLO), correlate well with experimentally measured MAX–DNA binding free energies [50]. ∆∆G represents the changes in binding free energy relative to that of the wild-type protein–DNA complex. (B) The heatmap, derived from the optimized energy model, illustrates key amino acid–nucleotide interactions governing MAX–DNA recognition, showing pairwise interaction energies between 20 amino acids and the four DNA bases—DA (deoxyadenosine), DT (deoxythymidine), DC (deoxycytidine), and DG (deoxyguanosine). Both the predicted binding free energies and the optimized energy model are expressed in reduced units, as explained in the Methods Section Training Protocol. Each cell represents the optimized energy contribution, where blue indicates more favorable (lower) energy values, and red indicates less favorable (higher) values. (C) The 3D structure of the MAX–DNA complex (zoomed in with different views) highlights key amino acid–nucleotide contacts at the protein–DNA interface. Notably, several DNA deoxycytidines (red spheres) form close contacts with arginines (blue spheres). Additional nucleotide color coding: adenine (yellow spheres), guanine (green spheres), thymine (pink spheres). (D) Probability density distributions of predicted binding free energies for strong (blue) and weak (red) binders of the protein ZBTB7A. The mean of each distribution is marked with a dashed line. (E) Summary of AUC scores for protein–DNA pairs across 12 protein families, calculated based on the predicted probability distributions of binding free energies.”

      We fully agree that Panel E was visually overwhelming. We have revised the plot by using a combination of color and marker shapes to more clearly distinguish between different protein families, as suggested.

      Comment 12: Typos:

      Line 18: Gene expressions → Gene expression?

      Line 28: performed → utilized ?

      We really appreciate the suggestions and have corrected the text accordingly.

      References

      (1) Tianyin Zhou, Ning Shen, Lin Yang, Namiko Abe, John Horton, Richard S Mann, Harmen J Bussemaker, Raluca Gordân, and Remo Rohs. Quantitative modeling of transcription factor binding specificities using DNA shape. Proceedings of the National Academy of Sciences, 112(15):4654–4659, 2015.

      (2) Jinsen Li, Tsu-Pei Chiu, and Remo Rohs. Predicting DNA structure using a deep learning method. Nat Commun, 15(1):1243, February 2024.

      (3) Josh Abramson, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, Lindsay Willmore, Andrew J. Ballard, Joshua Bambrick, Sebastian W. Bodenstein, David A. Evans, Chia-Chun Hung, Michael O’Neill, David Reiman, Kathryn Tunyasuvunakool, Zachary Wu, Akvilė Žemgulytė, Eirini Arvaniti, Charles Beattie, Ottavia Bertolli, Alex Bridgland, Alexey Cherepanov, Miles Congreve, Alexander I. Cowen-Rivers, Andrew Cowie, Michael Figurnov, Fabian B. Fuchs, Hannah Gladman, Rishub Jain, Yousuf A. Khan, Caroline M. R. Low, Kuba Perlin, Anna Potapenko, Pascal Savy, Sukhdeep Singh, Adrian Stecula, Ashok Thillaisundaram, Catherine Tong, Sergei Yakneen, Ellen D. Zhong, Michal Zielinski, Augustin Žídek, Victor Bapst, Pushmeet Kohli, Max Jaderberg, Demis Hassabis, and John M. Jumper. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, pages 1–3, May 2024.

      (4) Raktim Mitra, Jinsen Li, Jared M. Sagendorf, Yibei Jiang, Ari S. Cohen, Tsu-Pei Chiu, Cameron J. Glasscock, and Remo Rohs. Geometric deep learning of protein–DNA binding specificity. Nat Methods, 21(9):1674–1683, September 2024.

      (5) Lin Yang, Yaron Orenstein, Arttu Jolma, Yimeng Yin, Jussi Taipale, Ron Shamir, and Remo Rohs. Transcription factor family-specific DNA shape readout revealed by quantitative specificity models. Mol Syst Biol, 13(2):910, February 2017.

      (6) Takaya Saito and Marc Rehmsmeier. The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLoS ONE, 10(3):e0118432, March 2015.

      (7) Mu Gao and Jeffrey Skolnick. DBD-Hunter: a knowledge-based method for the prediction of DNA-protein interactions. Nucleic Acids Res, 36(12):3978–3992, July 2008.

      (8) H. Tomas Rube, Chaitanya Rastogi, Siqian Feng, Judith F. Kribelbauer, Allyson Li, Basheer Becerra, Lucas A. N. Melo, Bach Viet Do, Xiaoting Li, Hammaad H. Adam, Neel H. Shah, Richard S. Mann, and Harmen J. Bussemaker. Prediction of protein–ligand binding affinity from sequencing data with interpretable machine learning. Nat Biotechnol, 40(10):1520–1527, October 2022.

      (9) Babak Alipanahi, Andrew Delong, Matthew T Weirauch, and Brendan J Frey. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol, 33(8):831–838, August 2015.

      (10) Joshua L. Wetzel, Kaiqian Zhang, and Mona Singh. Learning probabilistic protein-DNA recognition codes from DNA-binding specificities using structural mappings. Genome Res, 32(9):1776–1786, September 2022.

      (11) Aziz Khan, Oriol Fornes, Arnaud Stigliani, Marius Gheorghe, Jaime A Castro-Mondragon, Robin van der Lee, Adrien Bessy, Jeanne Chèneby, Shubhada R Kulkarni, Ge Tan, Damir Baranasic, David J Arenillas, Albin Sandelin, Klaas Vandepoele, Boris Lenhard, Benoît Ballester, Wyeth W Wasserman, François Parcy, and Anthony Mathelier. JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Research, 46(D1):D260–D266, January 2018.

      (12) Ivan V. Kulakovskiy, Ilya E. Vorontsov, Ivan S. Yevshin, Ruslan N. Sharipov, Alla D. Fedorova, Eugene I. Rumynskiy, Yulia A. Medvedeva, Arturo Magana-Mora, Vladimir B. Bajic, Dmitry A. Papatsenko, Fedor A. Kolpakov, and Vsevolod J. Makeev. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis. Nucleic Acids Res, 46(D1):D252–D259, January 2018.

      (13) Arttu Jolma, Jian Yan, Thomas Whitington, Jarkko Toivonen, Kazuhiro R. Nitta, Pasi Rastas, Ekaterina Morgunova, Martin Enge, Mikko Taipale, Gonghong Wei, Kimmo Palin, Juan M. Vaquerizas, Renaud Vincentelli, Nicholas M. Luscombe, Timothy R. Hughes, Patrick Lemaire, Esko Ukkonen, Teemu Kivioja, and Jussi Taipale. DNA-Binding Specificities of Human Transcription Factors. Cell, 152(1-2):327–339, January 2013.

      (14) Maor Asif and Yaron Orenstein. DeepSELEX: inferring DNA-binding preferences from HT-SELEX data using multi-class CNNs. Bioinformatics, 36(Supplement 2):i634–i642, December 2020.

      (15) Oriol Fornes, Alberto Meseguer, Joachim Aguirre-Plans, Patrick Gohl, Patricia M Bota, Ruben Molina-Fernandez, Jaume Bonet, Altair Chinchilla-Hernández, Ferran Pegenaute, Oriol Gallego, Narcis Fernandez-Fuentes, and Baldo Oliva. Structure-based learning to predict and model protein–DNA interactions and transcription-factor co-operativity in cis-regulatory elements. NAR Genomics and Bioinformatics, 6(2):lqae068, April 2024.

      (16) Sofia Aizenshtein-Gazit and Yaron Orenstein. DeepZF: improved DNA-binding prediction of C2H2-zinc-finger proteins by deep transfer learning. Bioinformatics, 38(Suppl 2):ii62–ii67, September 2022.

      (17) Stephen K Burley, Charmi Bhikadiya, Chunxiao Bi, Sebastian Bittrich, Henry Chao, Li Chen, Paul A Craig, Gregg V Crichlow, Kenneth Dalenberg, Jose M Duarte, Shuchismita Dutta, Maryam Fayazi, Zukang Feng, Justin W Flatt, Sai Ganesan, Sutapa Ghosh, David S Goodsell, Rachel Kramer Green, Vladimir Guranovic, Jeremy Henry, Brian P Hudson, Igor Khokhriakov, Catherine L Lawson, Yuhe Liang, Robert Lowe, Ezra Peisach, Irina Persikova, Dennis W Piehl, Yana Rose, Andrej Sali, Joan Segura, Monica Sekharan, Chenghua Shao, Brinda Vallat, Maria Voigt, Ben Webb, John D Westbrook, Shamara Whetstone, Jasmine Y Young, Arthur Zalevsky, and Christine Zardecki. RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning. Nucleic Acids Research, 51(D1):D488–D508, November 2022.

      (18) Raktim Mitra, Ari S. Cohen, Jared M. Sagendorf, Helen M. Berman, and Remo Rohs. DNAproDB: an updated database for the automated and interactive analysis of protein-DNA complexes. Nucleic Acids Res, 53(D1):D396–D402, January 2025.

      (19) Natalia Petrenko, Yi Jin, Liguo Dong, Koon Ho Wong, and Kevin Struhl. Requirements for RNA polymerase II preinitiation complex formation in vivo. eLife, 8:e43654, January 2019.

      (20) Rudolf Jaenisch and Adrian Bird. Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat Genet, 33(3):245–254, March 2003.

      (21) Claire Marchal, Jiao Sima, and David M. Gilbert. Control of DNA replication timing in the 3D genome. Nat Rev Mol Cell Biol, 20(12):721–737, December 2019.

      (22) Lucia A. Hindorff, Praveen Sethupathy, Heather A. Junkins, Erin M. Ramos, Jayashri P. Mehta, Francis S. Collins, and Teri A. Manolio. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences, 106(23):9362–9367, June 2009.

      (23) Tuuli Lappalainen, Alexandra J Scott, Margot Brandt, and Ira M Hall. Genomic analysis in the age of human genome sequencing. Cell, 177(1):70–84, 2019.

      (24) Sonali Mukherjee, Michael F. Berger, Ghil Jona, Xun S. Wang, Dale Muzzey, Michael Snyder, Richard A. Young, and Martha L. Bulyk. Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat Genet, 36(12):1331–1339, December 2004.

      (25) Shaoxun Liu, Pilar Gomez-Alcala, Christ Leemans, William J. Glassford, Lucas A. N. Melo, Xiang-Jun Lu, Richard S. Mann, and Harmen J. Bussemaker. Predicting the DNA binding specificity of transcription factor mutants using family-level biophysically interpretable machine learning. bioRxiv, page 2024.01.24.577115, April 2025.

      (26) Tsu-Pei Chiu, Satyanarayan Rao, and Remo Rohs. Physicochemical models of protein–DNA binding with standard and modified base pairs. Proc. Natl. Acad. Sci. U.S.A., 120(4):e2205796120, January 2023.

      (27) Matthew T Weirauch, Atina Cote, Raquel Norel, Matti Annala, Yue Zhao, Todd R Riley, Julio Saez-Rodriguez, Thomas Cokelaer, Anastasia Vedenko, Shaheynoor Talukder, and others. Evaluation of methods for modeling transcription factor sequence specificity. Nature biotechnology, 31(2):126–134, 2013.

      (28) Chaitanya Rastogi, H. Tomas Rube, Judith F. Kribelbauer, Justin Crocker, Ryan E. Loker, Gabriella D. Martini, Oleg Laptenko, William A. Freed-Pastor, Carol Prives, David L. Stern, Richard S. Mann, and Harmen J. Bussemaker. Accurate and sensitive quantification of protein-DNA binding affinity. Proc. Natl. Acad. Sci. U.S.A., 115(16), April 2018.

      (29) Rahmatullah Roche, Bernard Moussad, Md Hossain Shuvo, Sumit Tarafder, and Debswapna Bhattacharya. EquiPNAS: improved protein–nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks. Nucleic Acids Research, 52(5):e27–e27, March 2024.

      (30) Yufan Liu and Boxue Tian. Protein–DNA binding sites prediction based on pretrained protein language model and contrastive learning. Briefings in Bioinformatics, 25(1):bbad488, November 2023.

      (31) Binh P. Nguyen, Quang H. Nguyen, Giang-Nam Doan-Ngoc, Thanh-Hoang Nguyen-Vo, and Susanto Rahardja. iProDNA-CapsNet: identifying protein-DNA binding residues using capsule neural networks. BMC Bioinformatics, 20(S23):634, December 2019.

      (32) Trevor Siggers and Raluca Gordân. Protein–DNA binding: complexities and multi-protein codes. Nucleic Acids Research, 42(4):2099–2111, February 2014.

      (33) Johannes Söding, Andreas Biegert, and Andrei N. Lupas. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Research, 33(suppl 2):W244–W248, July 2005.

      (34) William Humphrey, Andrew Dalke, and Klaus Schulten. VMD – Visual Molecular Dynamics. Journal of Molecular Graphics, 14:33–38, 1996.

      (35) Arttu Jolma, Teemu Kivioja, Jarkko Toivonen, Lu Cheng, Gonghong Wei, Martin Enge, Mikko Taipale, Juan M Vaquerizas, Jian Yan, Mikko J Sillanpää, and others. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome research, 20(6):861–873, 2010.

      (36) Nobuo Ogawa and Mark D Biggin. High-throughput SELEX determination of DNA sequences bound by transcription factors in vitro. Gene Regulatory Networks: Methods and Protocols, pages 51–63, 2012.

      (37) Alina Isakova, Romain Groux, Michael Imbeault, Pernille Rainer, Daniel Alpern, Riccardo Dainese, Giovanna Ambrosini, Didier Trono, Philipp Bucher, and Bart Deplancke. SMiLE-seq identifies binding motifs of single and dimeric transcription factors. Nature methods, 14(3):316–322, 2017.

      (38) Paul G. Giresi, Jonghwan Kim, Ryan M. McDaniell, Vishwanath R. Iyer, and Jason D. Lieb. FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res., 17(6):877–885, January 2007.

      (39) Peter J Park. ChIP–seq: advantages and challenges of a maturing technology. Nature reviews genetics, 10(10):669–680, 2009.

      (40) Terrence S. Furey. ChIP–seq and beyond: new and improved methodologies to detect and characterize protein–DNA interactions. Nat Rev Genet, 13(12):840–852, December 2012.

      (41) Anna Bartlett, Ronan C. O’Malley, Shao-shan Carol Huang, Mary Galli, Joseph R. Nery, Andrea Gallavotti, and Joseph R. Ecker. Mapping genome-wide transcription-factor binding sites using DAP-seq. Nat Protoc, 12(8):1659–1672, August 2017.

      (42) Marcel Geertz, David Shore, and Sebastian J Maerkl. Massively parallel measurements of molecular interaction kinetics on a microfluidic platform. Proceedings of the National Academy of Sciences, 109(41):16540–16545, 2012.

      (43) Gary D. Stormo and Yue Zhao. Determining the specificity of protein–DNA interactions. Nat Rev Genet, 11(11):751–760, November 2010.

      (44) Xingcheng Lin, Rachel Leicher, Shixin Liu, and Bin Zhang. Cooperative DNA looping by PRC2 complexes. Nucleic Acids Research, 49(11):6238–6248, June 2021.

      (45) P. L. Privalov, A. I. Dragan, and C. Crane-Robinson. Interpreting protein/DNA interactions: distinguishing specific from non-specific and electrostatic from non-electrostatic components. Nucleic Acids Research, 39(7):2483–2491, April 2011.

      (46) J D Bryngelson and P G Wolynes. Spin glasses and the statistical mechanics of protein folding. Proc. Natl. Acad. Sci. U.S.A., 84(21):7524–7528, November 1987.

      (47) J. N. Onuchic, Z. Luthey-Schulten, and P. G. Wolynes. Theory of protein folding: the energy landscape perspective. Annu Rev Phys Chem, 48:545–600, 1997.

      (48) N. P. Schafer, B. L. Kim, W. Zheng, and P. G. Wolynes. Learning To Fold Proteins Using Energy Landscape Theory. Isr J Chem, 54(8-9):1311–1337, August 2014.

      (49) Wen-Ting Chu, Zhiqiang Yan, Xiakun Chu, Xiliang Zheng, Zuojia Liu, Li Xu, Kun Zhang, and Jin Wang. Physics of biomolecular recognition and conformational dynamics. Rep. Prog. Phys., 84(12):126601, December 2021.

      (50) Sebastian J. Maerkl and Stephen R. Quake. A Systems Approach to Measuring the Binding Energy Landscapes of Transcription Factors. Science, 315(5809):233–237, January 2007.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors have used full-length single-cell sequencing on a sorted population of human fetal retina to delineate expression patterns associated with the progression of progenitors to rod and cone photoreceptors. They find that rod and cone precursors contain a mix of rod/cone determinants, with a bias in both amounts and isoform balance likely deciding the ultimate cell fate. Markers of early rod/cone hybrids are clarified, and a gradient of lncRNAs is uncovered in maturing cones. Comparison of early rods and cones exposes an enriched MYCN regulon, as well as expression of SYK, which may contribute to tumor initiation in RB1 deficient cone precursors.

      Strengths:

      (1) The insight into how cone and rod transcripts are mixed together at first is important and clarifies a long-standing notion in the field.

      (2) The discovery of distinct active vs inactive mRNA isoforms for rod and cone determinants is crucial to understanding how cells make the decision to form one or the other cell type. This is only really possible with full-length scRNAseq analysis.

      (3) New markers of subpopulations are also uncovered, such as CHRNA1 in rod/cone hybrids that seem to give rise to either rods or cones.

      (4) Regulon analyses provide insight into key transcription factor programs linked to rod or cone fates.

      (5) The gradient of lncRNAs in maturing cones is novel, and while the functional significance is unclear, it opens up a new line of questioning around photoreceptor maturation.

      (6) The finding that SYK mRNA is naturally expressed in cone precursors is novel, as previously it was assumed that SYK expression required epigenetic rewiring in tumors.

      We thank the reviewer for describing the study’s strengths, reflecting the major conclusions of the initially submitted manuscript. However, based on new analyses, including the requested analyses of other scRNA-seq datasets, our revision clarifies that:

      - related to point (1), cone and rod transcripts do not appear to be mixed together at first (i.e., in immediately post-mitotic immature cone and rod precursors) but appear to be co-expressed in subsequent cone and rod precursor stages; and

      - related to point (3), CHRNA1 appears to mark immature cone precursors that are distinct from the maturing cone and rod precursors that co-express cone- and rod-related RNAs (despite the similar UMAP positions of the two populations in our dataset).

      Weaknesses:

      (1) The writing is very difficult to follow. The nomenclature is confusing and there are contradictory statements that need to be clarified.

      (2) The drug data is not enough to conclude that SYK inhibition is sufficient to prevent the division of RB1 null cone precursors. Drugs are never completely specific so validation is critical to make the conclusion drawn in the paper.

      We thank the reviewer for noting these important issues. Accordingly, in the revised manuscript:

      (1) We improve the writing and clarify the nomenclature and contradictory statements, particularly those noted in the Reviewer’s Recommendations for Authors. 

      (2) We scale back claims related to the role of SYK in the cone precursor response to RB1 loss, with wording changes in the Abstract, Results, and Discussion, which now recognize that the inhibitor studies only support the possibility that cone-intrinsic SYK expression contributes to retinoblastoma initiation, as detailed in our responses to Reviewer’s Recommendations for Authors. We agree and now mention that genetic perturbation of SYK is required to prove its role.  

      Reviewer #2 (Public review):

      Summary:

      The authors used deep full-length single-cell sequencing to study human photoreceptor development, with a particular emphasis on the characteristics of photoreceptors that may contribute to retinoblastoma.

      Strengths:

      This single-cell study captures gene regulation in photoreceptors across different developmental stages, defining post-mitotic cone and rod populations by highlighting their unique gene expression profiles through analyses such as RNA velocity and SCENIC. By leveraging full-length sequencing data, the study identifies differentially expressed isoforms of NRL and THRB in L/M cone and rod precursors, illustrating the dynamic gene regulation involved in photoreceptor fate commitment. Additionally, the authors performed high-resolution clustering to explore markers defining developing photoreceptors across the fovea and peripheral retina, particularly characterizing SYK's role in the proliferative response of cones in the RB loss background. The study provides an in-depth analysis of developing human photoreceptors, with the authors conducting thorough analyses using full-length single-cell RNA sequencing. The strength of the study lies in its design, which integrates single-cell full-length RNA-seq, long-read RNA-seq, and follow-up histological and functional experiments to provide compelling evidence supporting their conclusions. The model of cell type-dependent splicing for NRL and THRB is particularly intriguing. Moreover, the potential involvement of the SYK and MYC pathways with RB in cone progenitor cells aligns with previous literature, offering additional insights into RB development.

      We thank the reviewer for summarizing the main findings and noting the compelling support for the conclusions, the intriguing cell type-dependent splicing of rod and cone lineage factors, and the insights into retinoblastoma development.  

      Weaknesses:

      The manuscript feels somewhat unfocused, with a lack of a strong connection between the analysis of developing photoreceptors, which constitutes the bulk of the manuscript, and the discussion on retinoblastoma. Additionally, given the recent publication of several single-cell studies on the developing human retina, it is important for the authors to cross-validate their findings and adjust their statements where appropriate.

      We agree that the manuscript covers a range of topics resulting from the full-length scRNA-seq analyses and concur that some studies of developing photoreceptors were not well connected to retinoblastoma. However, we also note that the connection to retinoblastoma is emphasized in several places in the Introduction and throughout the manuscript and was a significant motivation for pursuing the analyses. We suggest that it was valuable to highlight how deep, full-length scRNA-seq of developing retina provides insights into retinoblastoma, including i) the similar biased expression of NRL transcript isoforms in cone precursors and RB tumors, ii) the cone precursors’ co-expression of rod- and cone-related genes such as NR2E3 and GNAT2, which may explain similar co-expression in RB cells, and iii) the expression of SYK in early cones and RB cells. While the earlier version had mainly highlighted point (iii), the revised Discussion also refers to points (i) and (ii), as described further in the response to the Reviewer’s Recommendations for Authors.

      We address the Reviewer’s request to cross-validate our findings with those of other single-cell studies of developing human retina by relating the different photoreceptor-related cell populations identified in our study to those characterized by Zuo et al (PMID 39117640), which was specifically highlighted by the reviewer and is especially useful for such cross-validation given the extraordinarily large ~ 220,000 cell dataset covering a wide range of retinal ages (pcw 8–23) and spatiotemporally stratified by macular or peripheral retina location. Relevant analyses of the Zuo et al dataset are shown in Supplementary Figures S3G-H, S10B, S11A-F, and S13A,B. 

      Reviewer #3 (Public review):

      Summary:

      The authors use high-depth, full-length scRNA-Seq analysis of fetal human retina to identify novel regulators of photoreceptor specification and retinoblastoma progression.

      Strengths:

      The use of high-depth, full-length scRNA-Seq to identify functionally important alternatively spliced variants of transcription factors controlling photoreceptor subtype specification, and identification of SYK as a potential mediator of RB1-dependent cell cycle reentry in immature cone photoreceptors.

      Human developing fetal retinal tissue samples were collected between 13-19 gestational weeks and this provides a substantially higher depth of sequencing coverage, thereby identifying both rare transcripts and alternative splice forms, and thereby representing an important advance over previous droplet-based scRNA-Seq studies of human retinal development.

      Weaknesses:

      The weaknesses identified are relatively minor. This is a technically strong and thorough study, that is broadly useful to investigators studying retinal development and retinoblastoma.

      We thank the reviewer for describing the strengths of the study. Our revision addresses the concerns raised separately in the Reviewer’s Recommendations for Authors, as detailed in the responses below.  

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers have completed their reviews. Generally, they note that your work is important and that the evidence is generally convincing. The reviewers are in general agreement that the paper adds to the field. The findings of rod/cone fate determination at a very early stage are intriguing. Generally, the paper would benefit from clarifications in the writing and figures. Experimentally, the paper would benefit from validation of the drug data, for example using RNAi or another assay. Alternatively, the authors could note the caveats of the drug experiments and describe how they could be improved. In terms of analysis, the paper would be improved by additional comparisons of the authors' data to previously published datasets.

      We thank the reviewing editor for this summary. As described in the individual reviewer responses, we clarify the writing and figures and provide comparisons to previously published datasets (in particular, the large snRNA-seq dataset of Zuo et al., 2024; PMID 39117640). With regard to the drug (i.e., SYK inhibitor) studies, we opted to provide caveats and describe the need for genetic approaches to validate the role of SYK, owing to the infeasibility of completing genetic perturbation experiments in the appropriate timeframe. We are grateful for the opportunity to present our findings with appropriate caveats.

      Reviewer #1 (Recommendations for the authors):

      Shayler et al. cell-sort human progenitor/rod/cone populations and then perform full-length single-cell RNA-seq to expose features that distinguish paths towards rods or cones. They initially distinguish progenitors (RPCs), immature photoreceptor precursors (iPRPs), long/medium wavelength (LM) cones, late-LM cones, short wavelength (S) cones, early rods (ER) and late rods (LR), which exhibit distinct transcription factor regulons (Figures 1, 2). These data expose expected and novel enriched genes, and support the notion that S cones are a default state lacking expression of rod (NRL) or cone (THRB) determinants but retaining expression of generic photoreceptor drivers (CRX/OTX2/NEUROD1 regulons). They identify changes in regulon activity, such as increasing NRL activity from iPRP to ER to LR, but decreasing from iPRP to cones, or increasing RAX/ISL2/THRB regulon activity from iPRP to LM cones, but decreasing from iPRP to S cones or rods.

      They report co-expression of rod/cone determinants in LM and ER clusters, and the ratios are in the expected directions (e.g., NRL > THRB or RXRG in ER). A novel insight from the FL seq is that there are differing variants generated in each cell population. Full-length NRL (FL-NRL) predominates in the rod path, whereas truncated NRL (Tr-NRL) does so in the cone path, then similar (but opposite) findings are presented for THRB (Fig 3, 4), whereas isoforms are not a feature of RXRG expression, just the higher expression in cones.

      The authors then further subcluster and perform RNA velocity to uncover decision points in the tree (Figure 5). They identify two photoreceptor precursor streams, the Transitional Rods (TRs) that provide one source for rod maturation and (reusing the name from the initial clustering) iPRPs that form cones, but also provide a second route to rods. TR cells closest to RPCs (immediately post-mitotic) have higher levels of the rod determinant NR2E3 and NRL, whereas the higher resolution iPRPs near RPCs lack NR2E3 and have higher levels of ONECUT1, THRB, and GNAT2, a cone bias. These distinct rod-biased TR and cone-biased high-resolution iPRPs were not evident in published scRNAseq with 3′ end-counting (i.e. not FL seq). Regulon analysis confirmed higher NRL activity in TR cells, with higher THRB activity in high-resolution iPRP cells.

      Many of the more mature high-resolution iPRPs show combinations of rod (GNAT1, NR2E3) and cone (GNAT2, THRB) paths as well as both NRL and THRB regulons, but with a bias towards cone-ness (Figure 6). Combined FISH/immunofluorescence in fetal retina uncovers cone-biased RXRG-protein-high/NR2E3-protein-absent cone-fated cells that nevertheless expressed NR2E3 mRNA. Thus early cone-biased iPRP cells express rod gene mRNA, implying a rod-cone hybrid in early photoreceptor development. The authors refer to these as "bridge region iPRP cells".

      In Figure 7, they identify CHRNA1 as the most specific marker of these bridge cells (overlapping with ATOH7 and DLL3, previously linked to cone-biased precursors), and FISH shows it is expressed in rod-biased NRL protein-positive and cone-biased RXRG protein-positive cones at fetal week 12.

      Figure 8 outlines the graded expression of various lncRNAs during cone maturation, a novel pattern.

      Finally (Figure 9), the authors identify differential genes expressed in early rods (ER cluster from Figure 1) vs early cones (LM cluster, excluding the most mature opsin+ cells), revealing high levels of MYCN targets in cones. They also find SYK expression in cones. SYK was previously linked to retinoblastoma, so intrinsic expression may predispose cone precursors to transformation upon RB loss. They finish by showing that a SYK inhibitor blocks the proliferation of dividing RB1 knockdown cone precursors in the human fetal retina.

      Overall, the authors have uncovered interesting patterns of biased expression in cone/rod developmental paths, especially relating to the isoform differences for NRL and THRB which add a new layer to our understanding of this fate choice. The analyses also imply that very soon after RPCs exit the cell cycle, they generate post-mitotic precursors biased towards a rod or cone fate, that carry varying proportions of mixed rod/cone determinants and other rod/cone marker genes. They also introduce new markers that may tag key populations of cells that precede the final rod/cone choice (e.g. CHRNA1), catalogue a new lncRNA gradient in cone maturation, and provide insight into potential genes that may contribute to retinoblastoma initiation, like SYK, due to intrinsic expression in cone precursors. However, as detailed below, the text needs to be improved considerably, and overinterpretations need to be moderated, removed, or tested more rigorously with extra data.

      Major Comments

      The manuscript is very difficult to follow. The nomenclature is at times torturous, and the description of rod/cone hybrid cells is confusing in many aspects.

      (1) A single term, iPRP, is used to refer to an initial low-resolution cluster, and then to a subset of that cluster later in the paper.

      We agree that using immature photoreceptor precursor (iPRP) for both high-resolution and low-resolution clusters was confusing. We kept this name for the low-resolution cluster (which includes both immature cone and immature rod precursors), renamed the high-resolution iPRP cluster immature cone precursors (iCPs), and renamed their transitional rod (TR) counterparts immature rod precursors (iRPs). These designations are based on

      - the biased expression of THRB, ONECUT1, and the THRB regulon in iCPs (Fig. 5D,E);

      - the biased expression of NRL, NR2E3, and the NRL regulon in iRPs (Fig. 5D,E);

      - the partially distinct iCP and iRP UMAP positions (Figure 5C); and 

      - the evidence of similar immature cone versus rod precursor populations in the Zuo et al 3’ snRNA-seq dataset, as noted below and described in two new paragraphs starting at the bottom of p. 12.

      (2) To complicate matters further, the reader needs to understand the subset within the iPRP referred to as bridge cells, and we are told at one point that the earliest iPRPs lack NR2E3, then that they later co-express NR2E3, and while the authors may be referring to protein and RNA, it serves to further confuse an already difficult to follow distinction. I had to read and re-read the iPRP data many times, but it never really became totally clear.

      We agree that the description of the high-resolution iPRP (now “iCP”) subsets was unclear, although our further analyses of a large 3’ snRNA-seq dataset in Figure S11 support the impression given in the original manuscript that the earliest iCPs lack NR2E3 and then later co-express NR2E3 while the earliest iRPs lack THRB and then later express THRB. As described in new text in the Two post-mitotic immature photoreceptor precursor populations section (starting on line 7 of p. 13):

      When considering only the main cone and rod precursor UMAP regions, early (pcw 8 – 13) cone precursors expressed THRB and lacked NR2E3 (Figure S11D,E, blue arrows), while early (pcw 10 – 15) rod precursors expressed NR2E3 and lacked THRB (Figure S11D,E, red arrows), similar to RPC-localized iCPs and iRPs in our study (Figure 5D).

      Next, as summarized in new text in the Early cone and rod precursors with rod- and cone-related RNA co-expression section (new paragraph at top of p. 16):

      Thus, a 3’ snRNA-seq analysis confirmed the initial production of immature photoreceptor precursors with either L/M cone-precursor-specific THRB or rod-precursor-specific NR2E3 expression, followed by lower-level co-expression of their counterparts, NR2E3 in cone precursors and THRB in rod precursors. However, in the Zuo et al. analyses, the co-expression was first observed in well-separated UMAP regions, as opposed to a region that bridges the early cone and early rod populations in our UMAP plots. These findings are consistent with the notion that cone- and rod-related RNA co-expression begins in already fate-determined cone and rod precursors, and that such precursors aberrantly intermixed in our UMAP bridge region due to their insufficient representation in our dataset.  

      Importantly, and as noted in our ‘Public response’ to Reviewer 1, “CHRNA1 appears to mark immature cone precursors that are distinct from the maturing cone and rod precursors that co-express cone- and rod-related RNAs (despite the similar UMAP positions of the two populations in our dataset).” In support of this notion, the immature cone precursors expressing CHRNA1 and other populations did not overlap in UMAP space in the Zuo et al dataset. We hope the new text cited above along with other changes will significantly clarify the observations.

      (3) The term "cone/rod precursor" shows up late in the paper (page 12), but it was clear (was it not?) much earlier in this manuscript that cone and rod genes are co-expressed because of the co-expressed NRL and THRB isoforms in Figures 3/4.

      We thank the reviewer for noting that the differential NRL and THRB isoform expression already implies that cone and rod genes are co-expressed. However, as we now state, the co-expression of RNAs encoding an additional cone marker (GNAT2) and rod markers (GNAT1, NR2E3) was

      “suggestive of a proposed hybrid cone/rod precursor state more extensive than implied by the co-expression of different THRB and NRL isoforms” (first paragraph of “Early cone and rod …” section on p. 14; new text underlined).

      (4) The (incorrect) impression given later in the manuscript is that the rod/cone transcript mixture applies to just a subset of the iPRP cells, or maybe just the bridge cells (writing is not clear), but actually, neither of those is correct as the more abundant and more mature LM and ER populations analyzed earlier co-express NRL and THRB mRNAs (Figures 2, 3). Overall, the authors need to vastly improve the writing, simplify/clarify the nomenclature, and better label figures to match the text and help the reader follow more easily and clearly. As it stands, it is, at best, obtuse, and at worst, totally confusing.

      We thank the reviewer for bringing the extent of the confusing terminology and wording to our attention. We revised the terminology (as in our response to point 1) and extensively revised the text.  We also performed similar analyses of the Zuo et al. data (as described in more detail in our response to Reviewer 2), which clarifies the distinct status of cells with the “rod/cone transcript mixture” and cells co-expressing early cone and rod precursor markers.  

      To more clearly describe data related to cells with rod- and cone-related RNA co-expression, we divided the former Figure 6 into two figures, with Figure 6 now showing the cone- and rod-related RNA co-expression inferred from scRNA-seq and Figure 7 showing GNAT2 and NR2E3 co-expression in FISH analyses of human retina plus a new schematic in the new panel 7E.

      To separate the conceptually distinct analyses of cone- and rod-related RNA co-expression and the expression of early photoreceptor precursor markers (which were both found in the so-called bridge region, yet are now recognized to be different subpopulations), we separated the analyses of the early photoreceptor precursor markers to form a new section, “Developmental expression of photoreceptor precursor markers and fate determinants,” starting on p. 16.

      Additionally, we further review the findings and their implications in four revised Discussion paragraphs (starting at the bottom of p. 23).

      (5) The data showing that overexpressing Tr-NRL in murine NIH3T3 fibroblasts blocks FL-NRL function is presented at the end of page 7 and in Figure 3G. Subsequent analysis two paragraphs and two figures later (end page 8, Figure 5C + supp figs) reveals that Tr-NRL protein is not detectable in retinoblastoma cells, which derive from cone precursor cells and express Tr-NRL mRNA, and the protein is also not detected upon lentiviral expression of Tr-NRL in human fetal retinal explants, suggesting it is unstable or not translated. It would be preferable to have the 3T3 data and retinoblastoma/explant data juxtaposed. E.g. they could present the latter, then show the 3T3 that even if it were expressed (e.g. briefly) it would interfere with FL-NRL. The current order and spacing are somewhat confusing.

      We thank the reviewer for this suggestion and moved the description of the luciferase assays to follow the retinoblastoma and explant data and switched the order of Figure panels 3G and 3H.  

      (6) On page 15, regarding early rod vs early cone gene expression, the authors state: "although MYCN mRNA was not detected....", yet on the volcano plot in Figure S14A MYCN is one of the marked genes that is higher in cones than rods, meaning it was detected, and a couple of sentences later: "Concordantly, the LM cluster had increased MYCN RNA". The text is thus confusing.

      With respect, we note that the original text read, “although MYC RNA was not detected,” which related to a statement in the previous sentence that the gene ontology analysis identified “MYC targets.” However, given that this distinction is subtle and may be difficult for readers to recognize, we revised the text (now on p. 19) to more clearly describe expression of MYCN (but not MYC) as follows:

      “The upregulation of MYC target genes was of interest given that many MYC target genes are also targets of MYCN, that MYCN protein is highly expressed in maturing (ARR3+) cone precursors but not in NRL+ rods (Figure 10A), and that MYCN is critical to the cone precursor proliferative response to pRB loss (refs. 8–10). Indeed, whereas MYC RNA was not detected, the LM cone cluster had increased MYCN RNA …”

      (7) The authors state that the SYK drug is "highly specific". They provide no evidence, but no drug is 100% specific, and it is possible that off-target hits are important for the drug phenotype. This data should be removed or validated by co-targeting the SYK gene along with RB1.

      We agree that our data only show the potential for SYK to contribute to the cone proliferative response; however, we believe the inhibitor study retains value in that a negative result (no effect of the SYK inhibitor) would disprove its potential involvement. To reflect this, we changed wording related to this experiment as follows:

      In the Abstract, we changed:

      (1) “SYK, which contributed to the early cone precursors’ proliferative response to RB1 loss” To: “SYK, which was implicated in the early cone precursors’ proliferative response to RB1 loss.”  

      (2) “These findings reveal … and a role for early cone-precursor-intrinsic SYK expression.” To:  “These findings reveal … and suggest a role for early cone-precursor-intrinsic SYK expression.”

      In the last paragraph of the Results, we changed:

      (1) “To determine if SYK contributes…” To:  “To determine if SYK might contribute…”

      (2) “the highly specific SYK inhibitor” To:  “the selective SYK inhibitor”  

      (3)  “indicating that cone precursor intrinsic SYK activity is critical to the proliferative response” To: “consistent with the notion that cone precursor intrinsic SYK activity contributes to the proliferative response.”

      In the Results, we added a final sentence: 

      “However, given potential SYK inhibitor off-target effects, validation of the role of SYK in retinoblastoma initiation will require genetic ablation studies.”

      In the Discussion (2nd-to-last paragraph), we changed: 

      “SYK inhibition impaired pRB-depleted cone precursor cell cycle entry, implying that native SYK expression rather than de novo induction contributes to the cone precursors’ initial proliferation.” To: “…the pRB-depleted cone precursors’ sensitivity to a SYK inhibitor suggests that native SYK expression rather than de novo induction contributes to the cone precursors’ initial proliferation, although genetic ablation of SYK is needed to confirm this notion.”

      In the last sentence of the Discussion, we changed:

      “enabled the identification of developmental stage-specific cone precursor features that underlie retinoblastoma predisposition.” To: “enabled the identification of developmental stage-specific cone precursor features that are associated with the cone precursors’ predisposition to form retinoblastoma tumors.”

      Minor/Typos

      Figure 7 legend, H should be D.

      We corrected the figure legend (now related to Figure 8).

      Reviewer #2 (Recommendations for the authors):

      (1) The author should take advantage of recently published human fetal retina data, such as PMID:39117640, which includes a larger dataset of cells that could help validate the findings. Consequently, statements like "To our knowledge, this is the first indication of two immediately post-mitotic photoreceptor precursor populations with cone versus rod-biased gene expression" may need to be revised.

      We thank the reviewer for noting the evidence of distinct immediately post-mitotic rod and cone populations published by others after we submitted our manuscript. In response, we omitted the sentence mentioned and extensively cross-checked our results including:

      - comparison of our early versus late cone and rod maturation states to the cone and rod precursor versus cone and rod states identified by Zuo et al (new paragraph on the top half of p. 6 and new figure panels S3G,H);

      - detection of distinct immediately post-mitotic versus later cone and rod precursor populations (two new paragraphs on pp. 12-13 and new Figures S10B and S11A-E); 

      - identification of cone and rod precursor populations that co-express cone and rod marker genes (two new paragraphs starting at the bottom of p. 15 and new Figures S11D-F);

      - comparison of expression patterns of immature cone precursor (iCP) marker genes in our and the Zuo et al dataset (new paragraph on top half of p. 17 and new Figure S13).

      We also compare the cell states discerned in our study and the Zuo et al. study in a new Discussion paragraph (bottom of p. 23) and new Figure S17.

      (2) The data generated comes from dissociated cells, which inherently lack spatial context. Additionally, it is unclear whether the dataset represents a pool of retinas from multiple developmental stages, and if so, whether the developmental stage is known for each cell profiled. If this information is available, the authors should examine the distribution of developmental stages on the UMAP and trajectory analysis as part of the quality control process. 

      We thank the reviewer for highlighting the importance of spatial context and developmental stage. 

      Related to whether the dataset represents a pool of retinae from multiple developmental stages, the different cell numbers examined at each time point are indicated in Figure S1A. To draw the readers’ attention to this detail, Figure S1A is now cited in the first sentence of the Results. 

      Related to the age-related cell distributions in UMAP plots, the distribution of cells from each retina and age was (and is) shown in Fig. S1F. In addition, we now highlight the age distributions by segregating the FW13, FW15-17, and FW17-18-19 UMAP positions in the new Figure 1C. We describe the rod temporal changes in a new sentence at the top of  p. 5:

      “Few rods were detected at FW13, whereas both early and late rods were detected from FW15-19 (Figure 1C), corroborating prior reports [15,20].”  

      We describe the cone temporal changes and note the likely greater discrimination of cell state changes that would be afforded by separately analyzing macula versus peripheral retina at each age in a new sentence at the bottom of p. 5:

      “L/M cone precursors from different age retinae occupied different UMAP regions, suggesting age-related differences in L/M cone precursor maturation (Figure 1C).”

      Moreover, they should assess whether different developmental stages impact gene expression and isoform ratios. It is well established that cone and rod progenitors typically emerge at different developmental times and in distinct regions of the retina, with minimal physical overlap. Grouping progenitor cells based solely on their UMAP positioning may lead to an oversimplified interpretation of the data.

      (2a) We agree that different developmental stages may impact gene expression and isoform ratios, and evaluated stages primarily based on established Louvain clustering rather than UMAP position. However, we also used UMAP position to segregate so-called RPC-localized and nonRPC-localized iCPs and iRPs, as well as to characterize the bridge region iCP sub-populations. In the revision, we examine whether cell groups defined by UMAP positions helped to identify transcriptomically distinct populations and further examine the spatiotemporal gene expression patterns of the same genes in the Zuo et al. 3’ snRNA-seq dataset. 

      (2b) Related to analyses of immediately post-mitotic iRPs and iCPs, the new Figure S10A expanded the violin plots first shown in Figure 5D to compare gene expression in RPC-localized versus non-RPC-localized iCPs and iRPs and subsequent cone and rod precursor clusters (also presented in response to Reviewer 3). The new Figure S10C shows a similar analysis of UMAP region-specific regulon activities. These figures support the idea that there are only subtle UMAP region-related differences in the expression of the selected genes and regulons.

      To further evaluate early cone and rod precursors, we compared expression patterns in our cluster- and UMAP-defined cell groups to those of the spatiotemporally defined cell groups in the Zuo et al. 3’ snRNA-seq study. The results revealed similar expression timing of the genes examined, although the cluster assignments of a subset of cells were brought into question, especially the assigned rod precursors at pcw 10 and 13, as shown in new Figures S10B (grey columns) and S11, and as described in two new paragraphs starting near the bottom of p.12. 

      (2c) Related to analyses of iCPs in the so-called bridge region, our analyses of the Zuo et al dataset helped distinguish early cone and rod precursor populations (expressing early markers such as ATOH7 and CHRNA1) from the later stages exhibiting rod- and cone-related gene coexpression, which had intermixed in the UMAP bridge region in our dataset. Further parsing of early cone precursor marker spatiotemporal expression revealed intriguing differences as now described in the second half of a new paragraph at the top of p. 17, as follows:

      “Also, different iCP markers had different spatiotemporal expression: CHRNA1 and ATOH7 were most prominent in peripheral retina with ATOH7 strongest at pcw 10 and CHRNA1 strongest at pcw 13; CTC-378H22.2 was prominently expressed from pcw 10-13 in both the macula and the periphery; and DLL3 and ONECUT1 showed the earliest, strongest, and broadest expression (Figure S13B). The distinct patterns suggest spatiotemporally distinct roles for these factors in cone precursor differentiation.”

      (3) I would commend the authors for performing a validation experiment via RNA in situ to validate some of the findings. However, drawing conclusions from analyzing a small number of cells can still be dangerous. Furthermore, it is not entirely clear how the subclustering is done. Some cells change cell type identities in the high-resolution plot. For example, some iPRP cells from the low-resolution plots in Figure 1 are assigned as TR in high-resolution plots in Figure 5.

      The authors should provide justification on the identities of RPC localized iPRP and TR.

      Comparison of their data with other publicly available data should strengthen their annotation.

      We agree that drawing conclusions from scRNA-seq or in situ hybridization analysis of a small number of cells can be dangerous and have followed the reviewer’s suggestion to compare our data with other publicly available data, focusing on the 3’ snRNA-seq of Zuo et al. given its large size and extensive annotation. Our analysis of  the Zuo et al. dataset helped clarify cell identities by segregating cone and rod precursors with similar gene expression properties in distinct UMAP regions. However, we noted that the clustering of early cone and rod precursors likely gave numerous mis-assigned cells (as noted in response 2b above and shown in the new Figure S11). It would appear that insights may be derived from the combination of relatively shallow sequencing of a high number of cells and deep sequencing of substantially fewer cells. 

      Related to how subclustering was done, the Methods state, “A nearest-neighbors graph was constructed from the PCA embedding and clusters were identified using a Louvain algorithm at low and high resolutions (0.4 and 1.6)[70],” citing the Blondel et al reference for the Louvain clustering algorithm used in the Seurat package. To clarify this, the results text was revised such that it now indicates the levels used to cluster at low resolution (0.4, p. 4, 2nd paragraph) and at high resolution (1.6, top of p. 11).

      Related to the assignment of some iPRP cells from the low-resolution plots in Figure 1 to the TR cluster (now called the ‘iRP’ cluster) in the high-resolution plots in Figure 5, we suggest that this is consistent with Louvain clustering, which does not follow a single dendrogram hierarchy.
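
      The role of the resolution values quoted above (0.4 and 1.6) can be illustrated with a minimal sketch. It assumes the standard resolution-scaled modularity objective that Louvain-type methods optimize; it is not the Seurat implementation, and the toy graph, the two candidate partitions, and the function name `modularity` are hypothetical. On this graph the coarse grouping scores higher at resolution 0.4 and the fine grouping scores higher at 1.6.

```python
from collections import defaultdict

def modularity(edges, membership, resolution=1.0):
    """Resolution-scaled modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)  # edges with both ends in the same community
    degree = defaultdict(int)    # summed node degree per community
    for u, v in edges:
        degree[membership[u]] += 1
        degree[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(internal[c] / m - resolution * (degree[c] / (2 * m)) ** 2
               for c in degree)

# Hypothetical toy graph: four triangles (nodes 3t, 3t+1, 3t+2) joined in a ring.
edges = []
for t in range(4):
    a, b, c = 3 * t, 3 * t + 1, 3 * t + 2
    edges += [(a, b), (b, c), (a, c)]    # the triangle itself
    edges.append((c, (3 * t + 3) % 12))  # one link to the next triangle

fine = {n: n // 3 for n in range(12)}    # four communities, one per triangle
coarse = {n: n // 6 for n in range(12)}  # two communities, two triangles each

for gamma in (0.4, 1.6):
    q_fine = modularity(edges, fine, gamma)
    q_coarse = modularity(edges, coarse, gamma)
    print(gamma, "fine" if q_fine > q_coarse else "coarse")
# prints "0.4 coarse" then "1.6 fine"
```

      In this toy case the fine partition happens to nest inside the coarse one. Because each resolution is optimized separately, such nesting is not guaranteed on real data, which is in line with the point that cells may change cluster membership between the low- and high-resolution runs.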

      The justification for referring to these groups as RPC-localized iCPs and iRPs relates to their biased gene and regulon expression in Fig. 5D and 5E, as stated on p. 12: 

      “In the RPC-localized region, iCPs had higher ONECUT1, THRB, and GNAT2, whereas iRPs trended towards higher NRL and NR2E3 (p= 0.19, p=0.054, respectively).”

      (4) Late-stage LM5 cluster Figure 9 is not defined anywhere in previous figures, in which LM clusters only range from 1 to 4. The inconsistency in cluster identification should be addressed.

      We revised the text related to this as follows: 

      “Indeed, our scRNA-seq analyses revealed that SYK RNA expression increased from the iCP stage through cluster LM4, in contrast to its minimal expression in rods (Figure 10E).  Moreover, SYK expression was abolished in the five-cell group with properties of late maturing cones (characterized in Figure 1E), here displayed separately from the other LM4 cells and designated LM5 (Figure 10E).”  (p. 19-20)

      (5) Syk inhibitor has been shown to be involved in RB cell survival in previous studies. The manuscript seems to abruptly make the connection between the single-cell data to RB in the last figure. The title and abstract should not distract from the bulk of the manuscript focusing on the rod and cone development, or the manuscript should make more connection to retinoblastoma.

      We appreciate the reviewer’s concern that the title may seem to over-emphasize the connection to retinoblastoma based solely on the SYK inhibitor studies. However, we suggest the title also emphasizes the identification and characterization of early human photoreceptor states, per se, and that there are a number of important connections beyond the SYK studies that could warrant the mention of cell-state-specific retinoblastoma-related features in the title.

      Most importantly, a prior concern with the cone cell-of-origin theory was that retinoblastoma cells express RNAs thought to mark retinal cell types other than cones, especially rods. The evidence presented here, that cone precursors also express rod-related genes, helps resolve this issue. The issue is noted numerous times in the manuscript, as follows:

      In the Introduction, we write:

      “However, retinoblastoma cells also express rod lineage factor NRL RNAs, which – along with other evidence – suggested a heretofore unexplained connection between rod gene expression and retinoblastoma development[12,13]. Improved discrimination of early photoreceptor states is needed to determine if co-expression of rod- and cone-related genes is adopted during tumorigenesis or reflects the co-expression of such genes in the retinoblastoma cell of origin.” (bottom, p. 2) And: 

      “In this study, we sought to further define the transcriptomic underpinnings of human  photoreceptor development and their relationship to retinoblastoma tumorigenesis.” (last paragraph, p. 3)

      The Discussion also alluded to this issue and in the revised Discussion, we aimed to make the connection clearer.  We previously ended the 3rd-to-last paragraph with,  

      “iPRP [now iCP] and early LM cone precursors’ expression of NR2E3 and NRL RNAs suggest that their presence in retinoblastomas[12,13] reflects their normal expression in the L/M cone precursor cells of origin.” 

      We now separate and elaborate on this point in a new paragraph as follows: 

      “Our characterization of cone and rod-related RNA co-expression may help resolve questions about the retinoblastoma cell of origin. Past studies suggested that retinoblastoma cells co-express RNAs associated with rods, cones, or other retinal cells due to a loss of lineage fidelity[12]. However, the early L/M cone precursors’ expression of NR2E3 and NRL RNAs suggest that their presence in retinoblastomas[12,13] reflects their normal expression in the L/M cone precursor cells of origin. This idea is further supported by the retinoblastoma cells’ preferential expression of cone-enriched NRL transcript isoforms (Figure S5B).” (middle of p. 24)

      Based on the above, we elected to retain the title.

      Minor comments:

      (1) It is difficult to see the orange and magenta colors in the Fig 3E RNA-FISH image. The colors should be changed, or the contrast threshold needs to be adjusted to make the puncta stand out more.

      We re-assigned colors, with red for FL-NRL puncta and green for Tr-NRL puncta. 

      (2) Figure 5C on page 8 should be corrected to Supplementary Figure 5C.

      We thank the reviewer for noting this error and changed the figure citation.

      Reviewer #3 (Recommendations for the authors):

      (1) Minor concerns

      a. Abbreviation of some words needs to be included, example: FW. 

      We now provide abbreviation definitions for FW and others throughout the manuscript.  

      b. Cat # does not match the 'key resource table' for many reagents/kits. Some examples are: CD133-PE mentioned on Page # 22 on # 71, SMART-Seq V4 Ultra Low Input RNA Kit and SMARTer Ultra Low RNA Kit for the Fluidigm C1 System on Page # 22 on # 77, Nextera XT DNA Library preparation kit on Page # 23 on # 77.

      We thank the reviewer for noting these discrepancies. We have now checked all catalog numbers and made corrections as needed.

      c. Cat # and brand name of a few reagents & kits are missing and not mentioned either in methods or in key resource table or both. Eg: FBS, Insulin, Glutamine, Penicillin, Streptomycin, HBSS, Quant-iT PicoGreen dsDNA assay, Nextera XT DNA Library Preparation Kit, 5' PCR Primer II A with CloneAmp HiFi PCR Premix.

      Catalog numbers and brand names are now provided for the tissue culture and related reagents within the methods text and for kits in the Key Resources Table. Additional descriptions of the primers used for re-amplification and RACE were added to the Methods (p. 28-29).

      d. Spell and grammar check is needed throughout the manuscript. Example. In Page # 46 RXRγlo is misspelled as RXRlo.

      Spelling and grammar were checked throughout the manuscript.

      (2) Methods & Key Resource table.

      a. In Page # 21, IRB# needs to be stated.      

      The IRB protocols have been added, now at top of p. 26.

      b. In Page # 21, Did the authors dissociate retinae in ice-cold phosphate-buffered saline or papain?   

      The relevant sentence was corrected to “dissected while submerged in ice-cold phosphate-buffered saline (PBS) and dissociated as described[10].” (p. 26)

      c. In Page # 21, How did the authors count or enumerate the cell count? Provide the details.

      We now state, “… a 10 µl volume was combined with 10 µl trypan blue and counted using a hemocytometer” (top of p. 27)
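
      The arithmetic behind such a count can be sketched as follows. This assumes a standard Neubauer-type hemocytometer, in which each large square holds 0.1 µl (so 10,000 squares per ml); the per-square counts and the function name `cells_per_ml` are hypothetical, and only the 10 µl + 10 µl mixing volumes come from the text.

```python
SQUARES_PER_ML = 10_000  # assumption: each large square holds 0.1 µl

def cells_per_ml(counts_per_square, sample_ul, dye_ul):
    """Estimate cell concentration from hemocytometer counts of a dye-diluted sample."""
    dilution = (sample_ul + dye_ul) / sample_ul  # 10 µl + 10 µl gives a factor of 2
    mean_count = sum(counts_per_square) / len(counts_per_square)
    return mean_count * dilution * SQUARES_PER_ML

counts = [52, 48, 50, 50]  # hypothetical counts from four large squares
print(cells_per_ml(counts, sample_ul=10, dye_ul=10))  # 1000000.0 cells per ml
```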

      d. Why did the authors choose to specifically use only 8 cells for cDNA preparation in Page # 22? State the reason and provide the details.

      The reasons for using 8 cells (to prevent evaporation and to manually transfer one slide-worth of droplets to one strip of PCR tubes) and additional single cell collection details are now provided as follows (new text underlined): 

      “Single cells were sorted on a BD FACSAria I at 4°C using 100 µm nozzle in single-cell mode into each of eight 1.2 µl lysis buffer droplets on parafilm-covered glass slides, with droplets positioned over pre-defined marks … . Upon collection of eight cells per slide, droplets were transferred to individual low-retention PCR tubes (eight tubes per strip) (Bioplastics K69901, B57801) pre-cooled on ice to minimize evaporation. The process was repeated with a fresh piece of parafilm for up to 12 rounds to collect 96 cells.” (p. 27, new text underlined)

      e. Key resource table does not include several resources used in this study. Example - NR2E3 antibody.

      We added the NR2E3 antibody and checked for other omissions.

      (3) Results & Figures & Figure Legends

      a. Regulon-defined RPC and photoreceptor precursor states

      i. On page # 4, 1st paragraph - Clarify the sentence 'Exclusion of all cells with <100,000 cells read and 18 cells.........Ensembl transcripts inferred'. Did the authors use 18 cells or 18FW retinae?

      The sentence was changed to:

      “After sequencing, we excluded all cells with <100,000 read counts and 18 cells expressing one or more markers of retinal ganglion, amacrine, and/or horizontal cells (POU4F1, POU4F2, POU4F3, TFAP2A, TFAP2B, ISL1) and concurrently lacking photoreceptor lineage marker OTX2. This yielded 794 single cells with averages of 3,750,417 uniquely aligned reads, 8,278 genes detected, and 20,343 Ensembl transcripts inferred (Figure S1A-C).” (p. 4, new words underlined)

      To clarify that 18 retinae were used, the first sentence of the Results was revised as follows:

      “To interrogate transcriptomic changes during human photoreceptor development, dissociated RPCs and photoreceptor precursors were FACS-enriched from 18 retinae, ages FW13-19 …” (p. 4).

      Why did the authors 'exclude cells lacking photoreceptor lineage marker OTX2' from analysis especially when the purpose here was to choose photoreceptor precursor states & further results in the next paragraph clearly state that 5 clusters were comprised of cells with OTX2 and CRX expression. This is confusing.

      We apologize for the imprecise diction. We divided the evidently confusing sentence into two sentences to more clearly indicate that we removed cells that did not express OTX2, as in the first response to the previous question.
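
      The exclusion criteria in the revised sentence quoted above can be expressed as a short filter. This is a minimal sketch, not the authors' pipeline: the cell records, the field names `reads` and `expr`, and the use of any value above zero as "expressing" are assumptions made for illustration. Only the read-count threshold and the marker gene names come from the text.

```python
OFF_TARGET_MARKERS = ("POU4F1", "POU4F2", "POU4F3", "TFAP2A", "TFAP2B", "ISL1")
MIN_READS = 100_000

def passes_qc(cell):
    """Keep a cell unless it is shallowly sequenced, or it expresses an
    off-target marker while lacking OTX2."""
    if cell["reads"] < MIN_READS:
        return False
    expr = cell["expr"]
    has_off_target = any(expr.get(gene, 0) > 0 for gene in OFF_TARGET_MARKERS)
    lacks_otx2 = expr.get("OTX2", 0) == 0
    return not (has_off_target and lacks_otx2)

# Hypothetical cells for illustration only.
cells = [
    {"id": "c1", "reads": 3_500_000, "expr": {"OTX2": 12.0, "CRX": 8.0}},
    {"id": "c2", "reads": 80_000, "expr": {"OTX2": 5.0}},        # too few reads
    {"id": "c3", "reads": 2_000_000, "expr": {"POU4F2": 9.0}},   # marker, no OTX2
    {"id": "c4", "reads": 1_200_000, "expr": {"OTX2": 4.0, "ISL1": 2.0}},
]
kept = [c["id"] for c in cells if passes_qc(c)]
print(kept)  # ['c1', 'c4']
```

      Note that under this reading a cell such as the hypothetical `c4`, which expresses an off-target marker together with OTX2, is retained.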

      ii. In Page # 5, the authors reported the number of cell populations (363 large and 5 distal) identified in the THRB+ L/M-cone cluster. What were the # of cell populations identified in the remaining 5 clusters of the UMAP space?

      We added the cell numbers in each group to Fig. 1B. We corrected the large LM group to 366 cells (p. 5) and note 371 LM cells, which includes the five distal cells, in Figure 1B.

      b. Differential expression of NRL and THRB isoforms in rod and cone precursors

      i. In Figure 3B, the authors compare and show the presence of 5 different NRL isoforms for all the 6 clusters that were defined in 3A. However, in the results, the ENST# of just 2 highly assigned transcript isoforms is given. What are the annotated names of the three other isoforms which are shown in 3B? Please explain in the Results.

      As requested, we now annotate the remaining isoforms as encoding full-length or truncated NRL in Fig. 3B and show isoform structures in new Supplementary Figure S4B.  We also refer to each transcript isoform in the Results (p. 7, last paragraph) and similarly evaluate all isoforms in RB31 cells (Fig. S5B).

      ii. What does the Mean FPM in the y-axis of Fig 3C refer to?

      Mean FPM represents mean read counts (fragments per million, FPM) for each position across Ensembl NRL exons for each cluster, as now stated in the 6th line of the Fig. 3 legend.
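
      One way to compute such a value is sketched below. It assumes that per-position fragment counts are scaled to fragments per million within each cell and then averaged across the cells of a cluster; whether the authors normalized per cell or on pooled counts is not stated here. The coverage values, field names, and function names are hypothetical.

```python
def fpm(position_counts, total_fragments):
    """Scale raw per-position fragment counts to fragments per million."""
    return [count * 1_000_000 / total_fragments for count in position_counts]

def mean_fpm(cells):
    """Average per-position FPM across all cells of one cluster."""
    per_cell = [fpm(c["coverage"], c["total_fragments"]) for c in cells]
    return [sum(values) / len(per_cell) for values in zip(*per_cell)]

# Hypothetical cluster of two cells with coverage at three exon positions.
cluster = [
    {"coverage": [4, 8, 0], "total_fragments": 2_000_000},
    {"coverage": [3, 3, 6], "total_fragments": 3_000_000},
]
print(mean_fpm(cluster))  # [1.5, 2.5, 1.0]
```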

      iii. A clear explanation of the results for Figures 3E-3F is missing.

      We revised the text to more clearly describe the experiment as follows:

      “The cone cells’ higher proportional expression of Tr-NRL first exon sequences was validated by RNA fluorescence in situ hybridization (FISH) of FW16 fetal retina in which NRL immunofluorescence was used to identify rod precursors, RXRg immunofluorescence was used to identify cone precursors, and FISH probes specific to truncated Tr-NRL exon 1T or FL-NRL exons 1 and 2 were used to assess Tr-NRL and FL-NRL expression (Figure 3E,F).” (p. 8, new text underlined).

      c. Two post-mitotic photoreceptor precursor populations

      i. Although deep-sequencing and SCENIC analysis clarified the identities of four RPC-localized clusters as MG, RPC, and iPRP indicative of cone-bias and TR indicative of rod-bias, it would be interesting to see the discriminating determinant between the TR and ER by SCENIC and deep-sequencing gene expression violin/box plots.

      We agree it is of interest to see the discriminating determinant between the TR [now termed iRP] and ER clusters by SCENIC and deep-sequencing gene expression violin/box plots. We now provide this information for selected genes and regulons of interest in the new Supplementary Figures S10A and S10C, along with a similar comparison between the prior high-resolution iPRP (now termed iCP) cluster and the first high-resolution LM cluster, LM1, as described for gene expression on p. 12:

      “Notably, THRB and GNAT2 expression did not significantly change while ONECUT1 declined in the subsequent non-RPC-localized iCP and LM1 stages, whereas NR2E3 and NRL dramatically increased on transitioning to the ER state (Figure S10A).”

      And as described for regulon activities on pp. 13-14:

      “Finally, activities of the cone-specific THRB and ISL2 regulons, the rod-specific NRL regulon, and the pan-photoreceptor LHX3, OTX2, CRX, and NEUROD1 regulons increased to varying extents on transitioning from the immature iCP or iRP states to the early-maturing LM1 or ER states (Figure S10C).”

      We also show expression of the same genes for spatiotemporally grouped cells from the Zuo et al. dataset in the new Figure S10B, which displays a similar pattern (apart from the possibly mixed pcw 10 and pcw 13 designated rod precursors).

      d. Early cone precursors with cone- and rod-related RNA expression

      i. On page #12, the last paragraph where the authors explain the multiplex RNA FISH results of RXRγ and NR2E3 by citing Figure S8E. However, in Fig S8E, the authors used NRL to identify the rods. Please clarify which one of the rod markers was used to perform RNA FISH?

      Figure S8E (where NRL was used as a rod marker) was cited to remind readers that RXRg has low expression in rods and high expression in cones, rather than to describe the results of this multiplex FISH section. To avoid confusion on this point, Figure S8E is now cited using “(as earlier shown in Figure S8E).” With this issue clarified, we expect the markers used in the FISH + IF analysis will be clear from the revised explanation, 

      “… we examined GNAT2 and NR2E3 RNA co-expression in RXRg+ cone precursors in the outermost NBL and in RXRg+ rod precursors in the middle NBL … .” (p. 14-15).

      To provide further clarity, we provide a diagram of the FISH probes, protein markers, and expression patterns in the new Figure 7E.

      ii. The Y-axis of Fig 6G-6H needs to be labelled.

      The axes have been re-labeled from “Nb of cells” to “Number of RXRg+ outermost NBL cells in each region” (original Fig. 6G, now Fig. 7C) and “Number of RXRg+ middle NBL cells in each region” (original Fig. 6H, now Fig. 7D).

      iii. The legends of Figures 6G and 6H are unclear. In the Figure 6G legend, the authors indicate 'all cells are NR2E3 protein-'. Does that imply the yellow and green bars alone? Similarly, clarify the Figure 6H legend, what does the dark and light magenta refer to? What does the light magenta color referring to NR2E3+/ NR2E3- and the dark magenta color referring to NR2E3+/ NR2E3+ indicate? 

      We regret the insufficient clarity. We revised the Fig. 6G (now Fig. 7C) key, which now reads

      “All outermost NBL cells are NR2E3 protein-negative.” We added to the figure legend for panel 7C,D “(n.b., italics are used for RNAs, non-italics for proteins).” The new scheme in Figure 7E shows the RNAs in italics and proteins in non-italics. We hope these changes will clarify when RNA or protein is represented in each histogram category.

      Overall, the results (on page # 13) reflecting Figures 6E-6H & Figure S11 are confusing and difficult to understand. Clear descriptions and explanations are needed.

      We revised this results section described in the paragraph now spanning p. 14:

      -  We now refer to the bar colors in Figures 7C and 7D that support each statement. 

      -  We provide an illustration of the findings in Figure 7E.

      iv. Previously published literature has shown that cells of the inner NBL are RXRγ+ ganglion cells. So, how were these RXRγ+ ganglion cells in the inner NBL discriminated during multiplex RNA FISH (in Fig 6E-6H and in Fig S11)?

      We thank the reviewer for requesting this clarification. We agree that “inner NBL” is the incorrect term for the region in which we examined RXRg+ photoreceptor precursors, as this could include RXRγ+ nascent RGCs. We now clarify that 

      “we examined GNAT2 and NR2E3 RNA co-expression in RXRg+ cone precursors in the outermost NBL and in RXRg+ rod precursors in the middle NBL … .”  (p. 14-15) We further state, 

      “Limiting our analysis to the outer and middle NBL allowed us to disregard RXRγ+ retinal ganglion cells in the retinal ganglion cell layer or inner NBL.” (top of p. 15)

      Figure 7E is provided to further aid the reader in understanding the positions examined, and the legend states “RXRg+ retinal ganglion cells in the inner NBL and ganglion cell layer not shown.”

      v. In Figure 6E, what marker does each color cell correspond to?

      In this figure (now panel 7A), we declined to provide the color key since the image is not sufficiently enlarged to visualize the IF and FISH signals. The figure is provided solely to document the regions analyzed and readers are now referred to “see Figure S12 for IF + FISH images” (2nd line, p. 15), where the marker colors are indicated.

      vi. In Figure S11 & 6E, Protein and RNA transcript color of NR2E3, GNAT2 are hard to distinguish. Usage of other colors is recommended.  

      We appreciate the reviewer’s concern related to the colors (in the now redesignated Figure S12 and 7A); however, we feel this issue is largely mitigated by our use of arrows to point to the cells needed to illustrate the proposed concepts in Figure S12B. All quantitation was performed by examining each color channel separately to ensure correct attribution, which is now mentioned in the Methods (2nd-to-last line of Quantitation of FISH section, p. 35).

      vii. 

      With due respect, we suggest that labeling each box (now in Figure 8B) makes the figure rather busy and difficult to infer the main point, which is that boxed regions were examined at various distances from the center (denoted by the “C” and “0 mm”) with distances periodically indicated. We suggest the addition of such markers would not improve and might worsen the figure for most readers.

      e. An early L/M cone trajectory marked by successive lncRNA expression

      i. In Figure 8C - color-coded labelling of LM1-4 clusters is recommended.

      We note Fig. 8C (now 9C) is intended to use color to display the pseudotemporal positions of each cell. We recognize that an additional plot with the pseudotime line imposed on LM subcluster colors could provide some insights, yet we are unaware of available software for this and are unable to develop such software at present. To enable readers to obtain a visual impression of the pseudotime vs subcluster positions, we now refer the reader to Figure 5A in the revised figure legend, as follows:  (“The pseudotime trajectory may be related to LM1-LM4 subcluster distributions in Figure 5A.”).

      ii. In Figure 8G - what does the horizontal color-coded bar below the lncRNAs name refer to? These bars are similar in all four graphs of the 8G figure.

      As stated in the Fig. 8G (now 9G) legend, “Colored bars mark lncRNA expression regions as described in the text.”  We revised the text to more clearly identify the color code. (p. 18-19)   

      f. Cone intrinsic SYK contributions to the proliferative response to pRB loss

      i. In Fig 9F - The expression of ARR3+ cells (indicated by the green arrow in FW18) is poorly or rarely seen in the peripheral retina.

      We thank the reviewer for finding this oversight. In panel 9F (now 10F), we removed the green arrows from the cells in the periphery, which are ARR3- due to the immaturity of cones in this region. 

      ii. In Figure 9F - Did the authors stain the FW16 retina with ARR3?

      Unfortunately, we did not stain the FW16 retina for ARR3 in this instance.

      iii. Inclusion of DAPI staining for Fig 9F is recommended to justify the ONL & INL in the images.

      We regret that we are unable to merge the DAPI in this instance due to the way in which the original staining was imaged.  A more detailed analysis corroborating and extending the current results is in progress. 

      iv. Immunostaining images for Figure 9G are missing & are required to be included. What does shSCR in Fig 9G refer to?

      We now provide representative immunostaining images below the panel (now 10G). The legend was updated: “Bottom: Example of Ki67, YFP, and RXRg co-immunostaining with DAPI+ nuclei (yellow outlines). Arrows: Ki67+, YFP+, RXRg+ nuclei.”  The revised legend now notes that shSCR refers to the scrambled control shRNA.

      v. For Figure 9H - Is the presence and loss of SYK activity consistent with all the subpopulations (S & LM) of early maturing and matured cones?

      We appreciate the reviewer’s question and interest (relating to the redesignated Figure 10H); however, we have not yet completed a comprehensive evaluation of SYK expression in all the subpopulations (S & LM) of early maturing and matured cones and will reserve such data for a subsequent study. We suggest that this information is not critical to the study’s major conclusions.

      vi. Figure 9A is not explained in the results. Why were MYCN proteins assessed along with ARR3 and NRL? What does this imply?

      We thank the reviewer for noting that this figure (now Figure 10A) was not clearly described. 

      As per the response to Reviewer 1, point 6, the text now states,

      “The upregulation of MYC target genes was of interest given that many MYC target genes are also MYCN targets, that MYCN protein is highly expressed in maturing (ARR3+) cone precursors but not in NRL+ rods (Figure 10A), and that MYCN is critical to the cone precursor proliferative response to pRB loss [8–10].” (middle, p. 19, new text underlined).

      Hence, the figure demonstrates the cone cell specificity of high MYCN protein. This is further noted in the Fig. 10A legend: “A. Immunofluorescent staining shows high MYCN in ARR3+ cones but not in NRL+ rods in FW18 retina.”

    1. Author response:

      Reviewer #1 (Public review):

      Functional lateralization between the right and left hemispheres is reported widely in animal taxa, including humans. However, it remains largely speculative as to whether the lateralized brains have a cognitive gain or a sort of fitness advantage. In the present study, by making use of the advantages of domestic chicks as a model, the authors are successful in revealing that the lateralized brain is advantageous in the number sense, in which numerosity is associated with spatial arrangements of items. Behavioral evidence is strong enough to support their arguments. Brain lateralization was manipulated by light exposure during the terminal phase of incubation, and the left-to-right numerical representation appeared when the distance between items gave a reliable spatial cue. The light-exposure induced lateralization, though quite unique in avian species, together with the lack of intense inter-hemispheric direct connections (such as the corpus callosum in the mammalian cerebrum), was critical for the successful analysis in this study. Specification of the responsible neural substrates in the presumed right hemisphere is expected in future research. Comparable experimental manipulations in the mammalian brain should also be developed to address this general question (the functional significance of brain laterality).

      We sincerely appreciate the Reviewer's insightful feedback and his/her recognition of the key contributions of our study.

      Reviewer #2 (Public review):

      Summary:

      This is the first study to show how a L-R bias in the relationship between numerical magnitude and space depends on brain lateralisation, and moreover, how it is modulated by in ovo conditions.

      Strengths:

      Novel methodology for investigating the innateness and neural basis of an L-R bias in the relationship between number and space.

      We would like to thank the Reviewer for their valuable feedback and for highlighting the key contributions of our study.

      Weaknesses:

      I would query the way the experiment was contextualised. They ask whether culture or innate pre-wiring determines the 'left-to-right orientation of the MNL [mental number line]'.

      We thank the Reviewer for raising this point, which has allowed us to provide a more detailed explanation of this aspect. Rather than framing the left-to-right orientation of the mental number line (MNL) as exclusively determined by either cultural influences or innate pre-wiring, our study highlights the role of environmental stimulation. Specifically, prenatal light exposure can shape hemispheric specialization, which in turn contributes to spatial biases in numerical processing. Please see lines 115-118.

      The term, 'Mental Number Line' is an inference from experimental tasks. One of the first experimental demonstrations of a preference or bias for small numbers in the left of space and larger numbers in the right of space, was more carefully described as the spatial-numerical association of response codes - the SNARC effect (Dehaene, S., Bossini, S., & Giraux, P. (1993). The mental representation of parity and numerical magnitude. Journal of Experimental Psychology: General, 122, 371-396).

      We have refined our description of the MNL and SNARC effect to ensure conceptual accuracy in the revised manuscript; please see lines 53-59.

      This has meant that the background to the study is confusing. First, the authors note, correctly, that many other creatures, including insects, can show this bias, though in none of these has neural lateralisation been shown to be a cause. Second, their clever experiment shows that an experimental manipulation creates the bias. If it were innate and common to other species, the experimental manipulation shouldn't matter. There would always be an L-R bias. Third, they seem to be asserting that humans have a left-to-right (L-R) MNL. This is highly contentious, and in some studies, reading direction affects it, as the original study by Dehaene et al showed; and in others, task affects direction (e.g. Bachtold, D., Baumüller, M., & Brugger, P. (1998). Stimulus-response compatibility in representational space. Neuropsychologia, 36, 731-735, not cited). Moreover, a very careful study of adult humans found no L-R bias (Karolis, V., Iuculano, T., & Butterworth, B. (2011), not cited, Mapping numerical magnitudes along the right lines: Differentiating between scale and bias. Journal of Experimental Psychology: General, 140(4), 693-706). Indeed, Rugani et al claim, incorrectly, that the L-R bias was first reported by Galton in 1880. There are two errors here: first, Galton was reporting what he called 'visualised numerals', which are typically referred to now as 'number forms' - spontaneous and habitual conscious visual representations - not an inference from a number line task. Second, Galton reported right-to-left, circular, and vertical visualised numerals, and no simple left-to-right examples (Galton, F. (1880). Visualised numerals. Nature, 21, 252-256.). So in fact did Bertillon, J. (1880). De la vision des nombres. La Nature, 378, 196-198, and more recently Seron, X., Pesenti, M., Noël, M.-P., Deloche, G., & Cornet, J.-A. (1992). Images of numbers, or "When 98 is upper left and 6 sky blue". Cognition, 44, 159-196, and Tang, J., Ward, J., & Butterworth, B. (2008). Number forms in the brain. Journal of Cognitive Neuroscience, 20(9), 1547-1556.

      We sincerely appreciate the opportunity to discuss numerical spatialization in greater detail. We have clarified that an innate predisposition to spatialize numerosity does not necessarily exclude the influence of environmental stimulation and experience. We have proposed an integrative perspective, incorporating both cultural and innate factors, suggesting that numerical spatialization originates from neural foundations while remaining flexible and modifiable by experience and contextual influences. Please see lines 69–75.

      We have incorporated the Reviewer’s suggestions and cited all the recommended papers; please see lines 47–75.

      If the authors are committed to chicks' MN Line they should test a series of numbers showing that the bias to the left is greater for 2 and 3 than for 4, etc. 

      What does all this mean? I think that the paper should be shorn of its misleading contextualisation, including the term 'Mental Number Line'. The authors also speculate, usefully, on why chicks and other species might have a L-R bias. I don't think the speculations are convincing, but at least if there is an evolutionary basis for the bias, it should at least be discussed.

      In the revised version of the manuscript, we have adopted the term Spatial Numerical Association (SNA). We thank the Reviewer for this valuable comment.

      We appreciated the Reviewer’s suggestion regarding the evolutionary basis of lateralization and have included considerations of its relevance in chicks and other species; please see lines 143-151 and 381-386.

      This paper is very interesting with its focus on why the L-R bias exists, and where and why it does not.

      We wish to thank the Reviewer again for his/her work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      “EGFRvIII is mainly associated with the classical subtype, so the mesenchymal subtype might be unexpected here. This could be commented on.” 

      We acknowledge that EGFRvIII is most often associated with the classical subtype of glioblastoma and agree that mesenchymal subtype classification may be unexpected given the use of her4.1:EGFRvIII as a driver in our model. We would like to highlight the fact that our brain tumors do also express certain markers associated with the classical subtype including neural precursor and neural stem cell markers like sox2, ascl1b, and gli2 (Supplementary Fig 4, 5; Supplementary Table 1-3). However, our transcriptomic data was not found to significantly enrich for classical subtype gene expression, compared to normal brains. This could be due to a significant contribution of normal brain tissue to our analyses (bulk tumor burdened brains were harvested for RNA sequencing), as well as the significant contribution of mesenchymal subtype signatures and/or inflammatory gene expression in our brain tumor-positive samples. Because signatures associated with inflammation consist of some of the most highly upregulated genes in our samples, this could potentially dilute out and/or lessen alternative subtype and/or signature gene expression. Importantly, it is now widely appreciated that patient tumors simultaneously consist of heterogeneous tumor cells reflecting multiple molecular subtypes (Couturier et al., 2020; Darmanis et al., 2017; Neftel et al., 2019), providing glioblastoma with a high level of phenotypic plasticity. We also demonstrate that the contribution of additional drivers not always present with EGFRvIII in patient glioblastoma enhances primary brain tumors in vivo. This result is consistent with more aggressive glioblastomas seen in patients with EGFRvIII variants and TP53 loss-of-function mutations (Ruano et al., 2009). It will therefore be interesting in the future to consider how single or multiple driver mutations contribute to subtype-specific gene expression in our model, as well as histopathology, relative to patients. We have included some of these discussion points in our revised manuscript.

      “Some more histologic characterization of the tumors would be helpful. Are they invasive, do larger tumors show necrosis and microvascular proliferation? This would help with understanding the full potential of the new model.”

      We have updated our manuscript to include more histopathological characterization and images (Supplementary Fig 2).

      “Current thinking in established glioblastoma is that the M1/M2 designations for macrophages are not relevant, with microglia macrophage populations showing a mixture of pro- and anti-inflammatory features. Ideally, there would be a much more detailed characterization of the intratumoral microglia/macrophage population here, as single markers can’t be relied upon.”

      We performed additional gene set enrichment analyses (GSEA) using our sequencing datasets and compared p53EPS gene expression to M1/M2 macrophage expression signatures and expression signatures from MCSF-stimulated macrophages at early and late (M2 polarized) time-points. From this analysis, we detected enrichment for markers of both pro- and anti-inflammatory features, however, with stronger and significant enrichment for gene expression signatures associated with classical pro-inflammatory M1 macrophages. We have included these GSEA plots and gene set enrichment lists as supplementary materials (Supplementary Fig 6, Supplementary Table 6). We also performed GSEA against a broad curated set of immunologic gene sets (C7: immunologic signature gene sets, Molecular Signatures Database, (Liberzon et al., 2011)) and have included the list of signatures and enrichment scores as a supplementary table (Supplementary Table 6).

      “Phagocytosis could have anti-tumor effects through removal of live cancer cells or could be cancer-promoting if apoptotic cells are being rapidly cleared with concomitant activation of an immunosuppressive phenotype in the phagocytes (ie. efferocytosis).” 

      We looked at efferocytosis-associated gene expression in our sequencing dataset (124 “efferocytosis” genes, GeneCards), and while we detected upregulation of certain genes associated with efferocytosis in p53EPS brains, we did not detect significant enrichment for the entire gene set. Furthermore, we did not detect up-regulation of key efferocytosis receptors including Axl and Tyro3 (Supplementary Table 1, 2), compared to normal brains. While efferocytosis may contribute to tumor growth and evolution, this GSEA combined with our functional data supporting an inhibitory role for phagocytes in p53EPS tumor initiation and engraftment following transplantation (Fig 4, Fig 5, Supplementary Fig 7), suggests that efferocytosis is not a major driver of tumor formation in our model. However, how efferocytosis affects tumor progression in our model and/or relapse following therapy will be an interesting feature to explore in the future using temporal manipulations of phagocytes and/or treatments with chemical inhibitors.

      Author response image 1.

      Gene Set Enrichment Analysis (GSEA) for efferocytosis-associated gene expression (124 “efferocytosis” genes in GeneCards) in tp53EPS tumor brains, compared to normal zebrafish brains. Normalized enrichment score (NES) and p-value are indicated. 

      “Do the irf7/8 and clodronate experiments distinguish between effects on microglia/macrophages and dendritic cells?”

      In addition to microglia/macrophages, the IRF8 transcription factor has been shown to control survival and function of dendritic cells (Sichien et al., 2016). Clodronate treatments are also used to deplete both macrophages and dendritic cells in vivo. Therefore, we cannot distinguish the effects of these manipulations in our experiments and have updated our manuscript throughout to reflect this.

      Reviewer #2:

      “The authors state that oncogenic MAPK/AKT pathway activation drives glial-derived tumor formation. It would be important to include a wild-type or uninjected control for the pERK and pAKT staining shown in Fig1 I-K to aid in the interpretation of these results. Likewise, quantification of the pERK and pAKT staining would be useful to demonstrate the increase over WT, and would also serve to facilitate comparison with the similar staining in the KPG model (Supp Fig 2D).”

      We have updated Fig 1 and Supplementary Fig 3D (formerly Fig 2D), to include histology from tumor-free uninjected control animals, as well as quantifications of p-ERK and p-AKT staining to highlight increased MAPK/AKT signaling pathway activation in our tumor model.  

      “The authors use a transplantation assay to further test the tumorigenic potential of dissociated cells from glial-derived tumors. Listing the percentage of transplants that generate fluorescent tumor would be helpful to fully interpret these data. Additionally, it was not clear based on the description in the results section that the transplantation assay was an “experimental surrogate” to model the relapse potential of the tumor cell. This is first mentioned in the discussion. The authors may consider adding a sentence for clarity earlier in the manuscript as it helps the reader better understand the logic of the assay.” 

      We have clarified in the text the percentage of transplants that generated fluorescent tumor (16-25%, n=3 independent screens). This is also represented in Fig 5C,D. We also added text when introducing the transplantation assay, explaining that transplantation is frequently used as an experimental surrogate to assess relapse potential, and that our objective was to assess tumor cell propagation in the context of specific manipulations within the TME.

      “The authors nicely show high levels of immune cell infiltration and associations between microglia/macrophages and tumor cells. However, a quantification of the emergence of macrophages over time in relation to tumor initiation and growth would provide significant support to the observations of tumor suppressive activity of the phagocytes. Along these lines, the inclusion of a statement about when leukocytes emerge during normal development would be informative for those not familiar with the zebrafish model.”

      In zebrafish, microglia colonize the neural retina by 48 hpf, and the optic tectum by 84 hpf (Herbomel et al., 2001), prior to when we typically observe lesions in our p53EPS brains. To validate the emergence of microglia prior to tumor formation in p53EPS, we have now used live confocal imaging through the brains of uninjected control and p53EPS injected zebrafish at 5, 7 and 9 dpf. As expected, microglia were present throughout the cephalic region and in the brain at 5 dpf (120 hpf). At this stage, p53EPS injected zebrafish brains displayed mosaic cellular expression of her4.1:mScarlet; however, cells were sparse and diffuse, and no large intensely fluorescent tumor-like clusters were detected at this stage (n=12/12 tumor negative). At 7 dpf, microglia were observed in the brains of control and p53EPS zebrafish; however, at this stage we detected clusters of her4.1:mScarlet+ cells (n=5/9), indicative of tumor formation. Lesions were found to be surrounded and/or infiltrated by mpeg:EGFP+ microglia. Finally, at 9 dpf her4.1:mScarlet+ expression became highly specific to tumor lesions, and these lesions were associated with mpeg:EGFP+ microglia/macrophages (n=8/8 of tumor-positive zebrafish). These descriptions along with representative images have been added to Figure 3.

      “From the data provided in Figure 4G and Supp Fig 7b, the authors suggest that “increased p53EPS tumor initiation following Irf gene knock-down is a consequence of irf7 and irf8 loss-of-function in the TME.” Given the importance of the local microenvironment highlighted in this study, spatial information on the form of in situ hybridization to identify the relevant location of the expression change would be important to support this conclusion.”

      We performed fluorescent in situ hybridization (using HCR RNA-FISH, Molecular Instruments) on whole mount control and irf7 CRISPR-injected p53EPG animals (her4.1:EGFRvIII +her4.1:PI3KCAH1047R + her4.1:GFP, GFP was used in this case because of probe availability).

      Representative confocal projections through tumors, as well as single optical sections are presented and discussed in Figure 4, highlighting the location of irf7 expression change following gene knock-down. We found significant irf7 signal in and surrounding p53EPS tumors at early stages of tumor formation. This expression was reduced and/or lost following irf7 CRISPR gene targeting, consistent with RT-PCR data (Supplementary Fig 7).

      “The authors used neutral red staining that labels lysosomal-rich phagocytes to assess enrichment at the early stages of tumor initiation. The images in Figure 3 panel A should be labeled to denote the uninjected controls to aid in the interpretation of the data. In Supplemental Figure 6, the neutral red staining in the irf8 CRISPR-injected larvae looks to be increased, counter to the quantification. Can the authors comment if the image is perhaps not representative?”

      We have updated Figure 3 and Supplementary Figure 6 to aid in the interpretation of our results. In Fig 3A, we used tumor-negative controls from our injected cohorts. This was done to control for exogenous transgene presence and/or over-expression prior to (or in the absence of) malignant transformation. In Supplementary Fig 6, our images are representative, but we have now used unprocessed images with arrowheads to highlight neutral-red positive foci for clarity. In our original manuscript the images contained software-generated markers, which could have obscured and/or confused the neutral red staining we were trying to highlight.

      Recommendations For the Authors:

      Reviewer #1: 

      “The PI 3-kinase does a lot more than just activating mTOR and Akt – I would suggest modifying that sentence in the introduction.”

      We have adjusted text in the introduction to reflect the broad role for PI3K signaling.

      Reviewer #2:

      “In Supplemental Fig 1, it would be helpful for the authors to provide a co-stain, such as DAPI to label all nuclei, which would allow the reader to assess the morphology of the cells in the context of the surrounding tissue.”

      We have included brightfield images in Supplementary Fig 1, that together with her4.1:mScarlet fluorescence, should help readers assess tumor location and morphology in the context of surrounding tissue. Tumor cell morphology at high-resolution can be visualized in Fig 3, Movie 1 and Movie 2.

      “The authors state that oncogenic MAPK/AKT pathway activation drives glial-derived tumor formation. The authors may consider testing if the addition of an inhibitor of MAPK signaling may prevent or decrease the formation of glial-derived tumors in this context to further support their results.” 

      To further assess the role for MAPK activation, we decided to test the effect of 50 µM AZD6244 MAPK inhibitor following transplantation of dissociated primary p53EPS cells into syngeneic CG1 strain zebrafish embryos, similar to what was previously described (Modzelewska et al., 2016). Following 5 days of drug treatments, we did not detect significant differences in tumor engraftment or in tumor size between DMSO control and AZD6244-treated cohorts, suggesting that MAPK inhibition is not sufficient to prevent p53EPS engraftment and growth in our model. In the future, assessments of on-target drug effects, possible resistance mechanisms, and/or testing MAPK inhibitors in combination with other targeted agents including Akt and/or mTOR inhibitors (Edwards et al., 2006; McNeill et al., 2017; Schreck et al., 2020) will enhance our understanding of potential therapeutic strategies.

      Author response image 2.

      Dorsal views of 8 dpf zebrafish larvae engrafted with her4.1:mScarlet+ p53EPS tumor cells following treatment from 3-8 dpf with 0.1% DMSO (control) or 50 µM AZD6244. Tumor cell injections were performed at 2 dpf into syngeneic CG1 strain embryos. The percentage of total animals with persisting engraftment following drug treatments, as well as tumor size (microns squared, quantified using Carl Zeiss ZEN software) are shown for control and AZD6244 treated larvae.

      “Have the authors tested if EGFR and PI3KCA driven by other neural promoters produce similar results, or not? This would help support the specificity of her4.1 neural progenitors and glia as the cell of origin in this model.”

      At this time, we have not tested other neural promoters. However, previous reports describe a zebrafish zic4-driven glioblastoma model with mesenchymal-like gene expression (Mayrhofer et al., 2017), supporting neural progenitors as a cell of origin. In the future it will be interesting to test sox2, nestin, and gfap promoters to further define and support her4.1-expressing neural progenitors and glia as the cell of origin in our model.

      “Other leukocyte populations, such as neutrophils, can also respond to inflammatory cues. Can the authors comment if neutrophils are also observed in the TME?”

      We performed initial assessments of neutrophils in the TME using our expression datasets as well as her4.1:EGFRvIII + her4.1:PI3KCAH1047R co-injection into Tg(mpx:EGFP) strain zebrafish. We observed tumor formation without significant infiltration of mpx:EGFP+ neutrophils. Future investigations will be important to assess differences in the contributions of different myeloid-derived lineages in the TME of p53EPS, as well as how heterogeneity may be altered depending on different oncogenic drivers and/or stage of tumor progression, as seen in human glioblastoma (Friedmann-Morvinski and Hambardzumyan, 2023). We have added text in the discussion section of our manuscript to indicate the possibility of neutrophils and/or other immune cell types contributing to p53EPS tumor biology.

      Author response image 3.

      Control-injected tumor-negative and tumor-positive Tg(mpx:EGFP) zebrafish at 10 dpf. Tg(mpx:EGFP) strain embryos were injected at the one-cell stage with her4.1:EGFRvIII + her4.1:PI3KCAH1047R + her4.1:mScarlet.

      “It is not clear if the transcriptomics data has been deposited in a publicly available database, such as the Gene Expression Omnibus (GEO). Sharing of these data would be a benefit to the field and facilitate use in other studies.”

      We have uploaded all transcriptomic data to GEO under accession GSE246295.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors investigated the effect of chronic activation of dopamine neurons using chemogenetics. Using Gq-DREADDs, the authors chronically activated midbrain dopamine neurons and observed that these neurons, particularly their axons, exhibit increased vulnerability and degeneration, resembling the pathological symptoms of Parkinson's disease. Baseline calcium levels in midbrain dopamine neurons were also significantly elevated following the chronic activation. Lastly, to identify cellular and circuit-level changes in response to dopaminergic neuronal degeneration caused by chronic activation, the authors employed spatial genomics (Visium) and revealed comprehensive changes in gene expression in the mouse model subjected to chronic activation. In conclusion, this study presents novel data on the consequences of chronic hyperactivation of midbrain dopamine neurons.

      Strengths:

      This study provides direct evidence that the chronic activation of dopamine neurons is toxic and gives rise to neurodegeneration. In addition, the authors achieved the chronic activation of dopamine neurons using water application of clozapine-N-oxide (CNO), a method not commonly employed by researchers. This approach may offer new insights into pathophysiological alterations of dopamine neurons in Parkinson's disease. The authors also utilized state-of-the-art spatial gene expression analysis, which can provide valuable information for other researchers studying dopamine neurons. Although the authors did not elucidate the mechanisms underlying dopaminergic neuronal and axonal death, they presented a substantial number of intriguing ideas in their discussion, which are worth further investigation.

      We thank the reviewer for these positive comments.

      Weaknesses:

      Many claims raised in this paper are only partially supported by the experimental results. So, additional data are necessary to strengthen the claims. The effects of chronic activation of dopamine neurons are intriguing; however, this paper does not go beyond reporting phenomena. It lacks a comprehensive explanation for the degeneration of dopamine neurons and their axons. While the authors proposed possible mechanisms for the degeneration in their discussion, such as differentially expressed genes, these remain experimentally unexplored.

      We thank the reviewer for this review. We do believe that the manuscript has a mechanistic component, as the central experiments involve direct manipulation of neuronal activity, and we show an increase in calcium levels and gene expression changes in dopamine neurons that coincide with the degeneration. However, we agree that deeper mechanistic investigation would strengthen the conclusions of the paper. We have planned several important revisions, including the addition of CNO behavioral controls, manipulation of intracellular calcium using isradipine, additional transcriptomics experiments and further validation of findings. We anticipate that these additions will significantly bolster the conclusions of the paper.

      Reviewer #2 (Public Review):

      Summary:

      Rademacher et al. present a paper showing that chronic chemogenetic excitation of dopaminergic neurons in the mouse midbrain results in differential degeneration of axons and somas across distinct regions (SNc vs VTA). These findings are important. This mouse model also has the advantage of showing an axon-first degeneration over an experimentally useful time course (2-4 weeks). The finding that direct excitation of dopaminergic neurons causes differential degeneration sheds light on the mechanisms of dopaminergic neuron selective vulnerability. The evidence that activation of dopaminergic neurons causes degeneration and alters mRNA expression is convincing, as the authors use both vehicle and CNO control groups, but the evidence that chronic dopaminergic activation alters circadian rhythm and motor behavior is incomplete as the authors did not run a CNO-control condition in these experiments.

      Strengths:

      This is an exciting and important paper.

      The paper compares mouse transcriptomics with human patient data.

      It shows that selective degeneration can occur across the midbrain dopaminergic neurons even in the absence of a genetic, prion, or toxin neurodegeneration mechanism.

      We thank the reviewer for these insightful comments.

      Weaknesses:

      Major concerns:

      (1) The lack of a CNO-positive, DREADD-negative control group in the behavioral experiments is the main limitation in interpreting the behavioral data. Without knowing whether CNO on its own has an impact on circadian rhythm or motor activity, the certainty that dopaminergic hyperactivity is causing these effects is lacking.

      This is an important point. Although we show that CNO does not produce degeneration of DA neuron terminals, we do not exclude a contribution to the behavioral changes. We agree that this behavioral control is necessary, and will address it in revision with a CNO-only running wheel cohort.

      (2) One of the most exciting things about this paper is that the SNc degenerates more strongly than the VTA when both regions are, in theory, excited to the same extent. However, it is not perfectly clear that both regions respond to CNO to the same extent. The electrophysiological data showing CNO responsiveness is only conducted in the SNc. If the VTA response is significantly reduced vs the SNc response, then the selectivity of the SNc degeneration could just be because the SNc was more hyperactive than the VTA. Electrophysiology experiments comparing the VTA and SNc response to CNO could support the idea that the SNc has substantial intrinsic vulnerability factors compared to the VTA.

      We agree that additional electrophysiology conducted in the VTA dopamine neurons would meaningfully add to our understanding of the selective vulnerability in this model, and will complete these experiments in revision.

      (3) The mice have access to a running wheel for the circadian rhythm experiments. Running has been shown to alter the dopaminergic system (Bastioli et al., 2022) and so the authors should clarify whether the histology, electrophysiology, fiber photometry, and transcriptomics data are conducted on mice that have been running or sedentary.

      We will explicitly clarify which mice had access to a running wheel in our revision. Briefly, mice for histology, electrophysiology, and transcriptomics all had access to a running wheel during their treatment. The mice used for photometry underwent about 7 days of running wheel access approximately 3 weeks prior to the beginning of the experiment. The photometry headcaps sterically prevented mice from having access to a running wheel in their home cage.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Rademacher and colleagues examined the effect on the integrity of the dopamine system in mice of chronically stimulating dopamine neurons using a chemogenetic approach. They find that one to two weeks of constant exposure to the chemogenetic activator CNO leads to a decrease in the density of tyrosine hydroxylase staining in striatal brain sections and to a small reduction of the global population of tyrosine hydroxylase positive neurons in the ventral midbrain. They also report alterations in gene expression in both regions using a spatial transcriptomics approach. Globally, the work is well done and valuable and some of the conclusions are interesting. However, the conceptual advance is perhaps a bit limited in the sense that there is extensive previous work in the literature showing that excessive depolarization of multiple types of neurons associated with intracellular calcium elevations promotes neuronal degeneration. The present work adds to this by showing evidence of a similar phenomenon in dopamine neurons.

      We thank the reviewer for the careful and thoughtful review of our manuscript.

      While extensive depolarization and associated intracellular calcium elevations promote degeneration generally, we emphasize that the process we describe is novel. Indeed, prior studies delivering chronic DREADDs to vulnerable neurons in models of Alzheimer’s disease did not report an increase in neurodegeneration, despite seeing changes in protein aggregation (e.g. Yuan and Grutzendler, J Neurosci 2016, PMID: 26758850; Hussaini et al., PLOS Bio 2020, PMID: 32822389). Further, a critical finding from our study is that in our paradigm, this stressor does not impact all dopamine neurons equally, as the SNc DA neurons are more vulnerable than the VTA, mirroring selective vulnerability characteristic of Parkinson’s disease. This is consistent with a large body of literature showing that SNc dopamine neurons are less capable of handling large energetic and calcium loads compared to neighboring VTA neurons, and the finding that chronically altered activity is sufficient to drive this preferential loss is novel.

      In addition, we are not aware of prior studies that have chronically activated DREADDs to produce neurodegeneration. Other studies have shown that acute excitotoxic stressors can produce neuronal degeneration, but the chronic increase in activity is central to our approach.

      In terms of the mechanisms explaining the neuronal loss observed after 2 to 4 weeks of chemogenetic activation, it would be important to consider that dopamine neurons are known from a lot of previous literature to undergo a decrease in firing through a depolarization-block mechanism when chronically depolarized. Is it possible that such a phenomenon explains much of the results observed in the present study? It would be important to consider this in the manuscript.

      As discussed in greater detail in the results section below, our data suggest this may not be a prominent feature in our model. However, we cannot rule out a contribution of depolarization block, and will expand on the discussion of this possibility in the revised manuscript.

      The relevance to Parkinson's disease (PD) is also not totally clear because there is not a lot of previous solid evidence showing that the firing of dopamine neurons is increased in PD, either in human subjects or in mouse models of the disease. As such, it is not clear if the present work is really modelling something that could happen in PD in humans.

      We completely agree that evidence of increased dopamine neuron activity from human PD patients is lacking and the existing data are difficult to interpret without human controls. However, as we outline in the manuscript, multiple lines of evidence suggest that the activity level of dopamine neurons almost certainly does change in PD. Therefore, it is very important that we understand how changes in the level of neural activity influence the degeneration of DA neurons. In this paper we examine the impact of increased activity. Increased activity may be compensatory after initial dopamine neuron loss, or may be an initial driver of death (Rademacher & Nakamura, Exp Neurol 2024, PMID: 38092187). Beyond what is already discussed in the manuscript, additional support for increased activity in PD models includes:

      - Elevated firing rates in asymptomatic MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488)

      - Increased frequency of spontaneous firing in patient-derived iPSC dopamine neurons and primary mouse dopamine neurons that overexpress synuclein (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060)

      - Increased spontaneous firing in dopamine neurons of rats injected with synuclein preformed fibrils compared to sham (Tozzi et al., Brain 2021, PMID: 34297092)

      We will include and further discuss these important examples in our revision.

      Similarly, in future studies, it will also be important to study the impact of decreasing DA neuron activity. There will be additional levels of complexity to accurately model changes in PD, which may differ between subtypes of the disease, the disease stage, and the subtype of dopamine neuron. Our study models the possibility of chronically increased pacemaking, and interpretation of our results will be informed as we learn more about how the activity of DA neurons changes in humans in PD. We will discuss and elaborate on these important points in the revision.

      Comments on the introduction:

      The introduction cites a 1990 paper from the lab of Anthony Grace as support of the fact that DA neurons increase their firing rate in PD models. However, in this 1990 paper, the authors stated that: "With respect to DA cell activity, depletions of up to 96% of striatal DA did not result in substantial alterations in the proportion of DA neurons active, their mean firing rate, or their firing pattern. Increases in these parameters only occurred when striatal DA depletions exceeded 96%." Such results argue that an increase in firing rate is most likely to be a consequence of the almost complete loss of dopamine neurons rather than an initial driver of neuronal loss. The present introduction would thus benefit from being revised to clarify the overriding hypothesis and rationale in relation to PD and better represent the findings of the paper by Hollerman and Grace.

      We agree that the findings of Hollerman and Grace support compensatory changes in dopamine neuron activity in response to loss of dopamine neurons, rather than informing whether dopamine neuron loss can also be an initial driver of activity. We will clarify this point in our revision. In addition, the results of other studies on this point are mixed: a 50% reduction in dopamine neurons didn’t alter firing rate or bursting (Harden and Grace, J Neurosci 1995, PMID: 7666198; Bilbao et al, Brain Res 2006, PMID: 16574080), while a 40% loss was found to increase firing rate and bursting (Chen et al, Brain Res 2009, PMID: 19545547) and larger reductions alter burst firing (Hollerman & Grace, Brain Res 1990, PMID: 2126975; Stachowiak et al, J Neurosci 1987, PMID: 3110381). Importantly, even if compensatory, such late-stage increases in dopamine neuron activity may contribute to disease progression and drive a vicious cycle of degeneration in surviving neurons. In addition, we also don’t know how the threshold of dopamine neuron loss and altered activity may differ between mice and humans, and PD patients do not present with clinical symptoms until ~30-60% of nigral neurons are lost (Burke & O’Malley, Exp Neurol 2013, PMID: 22285449; Shulman et al, Annu Rev Pathol 2011, PMID: 21034221).

      Other lines of evidence support the potential role of hyperactivity in disease initiation, including increased activity before dopamine neuron loss in MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488), increased spontaneous firing in patient-derived iPSC dopamine neurons (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060), and increased activity observed in genetic models of PD (Bishop et al., J Neurophysiol 2010, PMID: 20926611; Regoni et al., Cell Death Dis 2020, PMID: 33173027).

      It would be good that the introduction refers to some of the literature on the links between excessive neuronal activity, calcium, and neurodegeneration. There is a large literature on this and referring to it would help frame the work and its novelty in a broader context.

      We agree that a discussion of hyperactivity, calcium, and neurodegeneration would benefit the introduction. While we briefly discuss calcium and neurodegeneration in the discussion, we will expand on this literature in both the introduction and discussion sections. We will carefully review and contextualize our work within existing frameworks of calcium and neurodegeneration (e.g. Surmeier & Schumacker, J Biol Chem 2013, PMID: 23086948; Verma et al., Transl Neurodegener 2022, PMID: 35078537). We believe that the novelty of our study lies in 1) a chronic chemogenetic activation paradigm via drinking water, 2) demonstrating selective vulnerability of dopamine neurons as a result of altering their activity/excitability alone, and 3) comparing mouse and human spatial transcriptomics.

      Comments on the results section:

      The running wheel results of Figure 1 suggest that the CNO treatment caused a brief increase in running on the first day after which there was a strong decrease during the subsequent days in the active phase. This observation is also in line with the appearance of a depolarization block.

      The authors examined many basic electrophysiological parameters of recorded dopamine neurons in acute brain slices. However, it is surprising that they did not report the resting membrane potential, or the input resistance. It would be important that this be added because these two parameters provide key information on the basal excitability of the recorded neurons. They would also allow us to obtain insight into the possibility that the neurons are chronically depolarized and thus in depolarization block.

      We do report the input resistance in Supplemental Figure 1C, which was unchanged in CNO-treated animals compared to controls. We did not report the resting membrane potential because many of the DA neurons were spontaneously firing. However, we will report the initial membrane potential on first breaking into the cell for the whole cell recordings in the revision, which did not vary between groups. This is still influenced by action potential activity, but is the timepoint in the recording least impacted by dialysis of the neuron by the internal solution. We observed increased spontaneous action potential activity ex vivo in slices from CNO-treated mice (Figure 1D), thus at least under these conditions these dopamine neurons are not in depolarization block. We also did not see strong evidence of changes in other intrinsic properties of the neurons with whole cell recordings (e.g. Figure S1C). Overall, our electrophysiology experiments are not consistent with the depolarization block model, at least not due to changes in the intrinsic properties of the neurons. Although our ex vivo findings cannot exclude a contribution of depolarization block in vivo, we do show that CNO-treated mice removed from their cages for open field testing continue to have a strong trend for increased activity for approximately 10 days (Figure S1E). This finding is also consistent with increased activity of the DA neurons. We will add discussion of these important considerations in the revision.

      It is great that the authors quantified not only TH levels but also the levels of mCherry, co-expressed with the chemogenetic receptor. This could in principle help to distinguish between TH downregulation and true loss of dopamine neuron cell bodies. However, the approach used here has a major caveat in that the number of mCherry-positive dopamine neurons depends on the proportion of dopamine neurons that were infected and expressed the DREADD and this could very well vary between different mice. It is very unlikely that the virus injection infected 100% of the neurons in the VTA and SNc. This could for example explain in part the mismatch between the number of VTA dopamine neurons counted in panel 2G when comparing TH and mCherry counts. Also, I see that the mCherry counts were not provided at the 2-week time point. If the mCherry had been expressed genetically by crossing the DAT-Cre mice with floxed fluorescent reporter mice, the interpretation would have been simpler. In this context, I am not convinced of the benefit of the mCherry quantifications. The authors should consider either removing these results from the final manuscript or discussing this important limitation.

      We thank the reviewer for this insightful comment, and we agree that this is a caveat of our mCherry quantification. Quantitation of the number of mCherry+ DA neurons specifically informs the impact on transduced DA neurons, and mCherry appears to be less susceptible to downregulation versus TH. As the reviewer points out, it carries the caveat that there is some variability between injections. Nonetheless, we believe that it conveys useful complementary data. As suggested, we will discuss this caveat in our revision. Note that mCherry was not quantified at the two-week timepoint because there is no loss of TH+ cells at that time.

      Although the authors conclude that there is a global decrease in the number of dopamine neurons after 4 weeks of CNO treatment, the post-hoc tests failed to confirm that the decrease in dopamine number was significant in the SNc, the region most relevant to Parkinson's. This could be due to the fact that only a small number of mice were tested. A "n" of just 4 or 5 mice is very small for a stereological counting experiment. As such, this experiment was clearly underpowered at the statistical level. Also, the choice of the image used to illustrate this in panel 2G should be reconsidered: the image suggests that a very large loss of dopamine neurons occurred in the SNc and this is not what the numbers show. A more representative image should be used.

      We agree that the stereology experiments were performed on relatively small numbers of animals. Combined with the small effect size, this may have contributed to the post-hoc tests showing a trend of p=0.1 for both the TH and mCherry dopamine cell counts in the SN at 4 weeks. As part of the planned experiments for our revision, we will perform an additional stereologic analysis to further assess the loss of SNc dopamine neurons. We will also review and ensure the images are representative.

      In Figure 3, the authors attempt to compare intracellular calcium levels in dopamine neurons using GCaMP6 fluorescence. Because this calcium indicator is not quantitative (unlike ratiometric sensors such as Fura2), it is usually used to quantify relative changes in intracellular calcium. The present use of this probe to compare absolute values is unusual and the validity of this approach is unclear. This limitation needs to be discussed. The authors also need to refer in the text to the difference between panels D and E of this figure. It is surprising that the fluctuations in calcium levels were not quantified. I guess the hypothesis was that there should be more or larger fluctuations in the mice treated with CNO if the CNO treatment led to increased firing. This needs to be clarified.

      We thank the reviewer for this comment. We understand that this method of comparing absolute values is unconventional. However, these animals were tested concurrently on the same system, and a clear effect on the absolute baseline was observed. We will include a caveat of this in our discussion. Panel D of this figure shows the raw, uncorrected photometry traces, whereas panel E shows the isosbestic corrected traces for the same recording. In panel E, the traces follow time in ascending order. We will also include frequency and amplitude data for these recordings.   
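
      To illustrate the distinction between the raw and isosbestic-corrected traces, the following is a minimal sketch of one common form of isosbestic correction, assuming a least-squares linear fit of the isosbestic channel to the calcium-dependent channel. It is not necessarily the pipeline used for Figure 3; the function name `isosbestic_correct` and all synthetic values are hypothetical.

```python
import numpy as np

def isosbestic_correct(signal, isosbestic):
    """Return dF/F after regressing the isosbestic channel onto the signal channel."""
    signal = np.asarray(signal, dtype=float)
    isosbestic = np.asarray(isosbestic, dtype=float)
    # Least-squares fit: signal ~ slope * isosbestic + intercept
    slope, intercept = np.polyfit(isosbestic, signal, 1)
    fitted = slope * isosbestic + intercept
    # Components shared by both channels (e.g. bleaching) are removed here
    return (signal - fitted) / fitted

# Synthetic example (hypothetical values): a shared slow decay plus one transient
t = np.linspace(0.0, 60.0, 600)
bleach = 1.0 + 0.2 * np.exp(-t / 30.0)
transient = 0.05 * np.exp(-((t - 30.0) ** 2) / 2.0)
iso = 0.8 * bleach               # calcium-independent channel
sig = 1.5 * bleach + transient   # calcium-dependent channel
dff = isosbestic_correct(sig, iso)
peak_time = t[np.argmax(dff)]    # time of the largest corrected deflection
```

      Because the corrected trace in this sketch is expressed relative to the fitted baseline, it reports relative changes only, so any comparison of absolute baseline would have to rely on the raw traces.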

      Although the spatial transcriptomic results are intriguing and certainly a great way to start thinking about how the CNO treatment could lead to the loss of dopamine neurons, the presented results, which focus on some broad classes of differentially expressed genes and on some specific examples, do not really suggest any clear mechanism of neurodegeneration. It would perhaps be useful for the authors to use the obtained data to validate that a state of chronic depolarization was indeed induced by the chronic CNO treatment. Were genes classically linked to increased activity like cfos or bdnf elevated in the SNc or VTA dopamine neurons? In the striatum, the authors report that the levels of DARPP-32, a gene whose levels are linked to dopamine levels, are unchanged. Does this mean that there were no major changes in dopamine levels in the striatum of these mice?

      We will review the expression of activity-related genes in our dataset, although we must keep in mind that these genes may behave differently in the context of chronic activation as opposed to acutely increased activity. We will also include experiments assessing striatal dopamine levels by HPLC in the revision.

      The usefulness of comparing the transcriptome of human PD SNc or VTA sections to that of the present mouse model should be better explained. In the human tissues, the transcriptome reflects the state of the tissue many years after extensive loss of dopamine neurons. It is expected that there will be few if any SNc neurons left in such sections. In comparison, the mice after 7 days of CNO treatment do not appear to have lost any dopamine neurons. As such, how can the two extremely different conditions be reasonably compared?

      Our mouse model and human PD progress over distinct timescales, as is the case with essentially all mouse models of neurodegenerative diseases. Nonetheless, in our view there is still great value in comparing gene expression changes in mouse models with those in human disease. It seems very likely that the same pathologic processes that drive degeneration early in the disease continue to drive degeneration later in the disease. Note that we have tried to address the discrepancy in time scales in part by comparing to early PD samples when there is more limited SNc DA neuron loss. Please note the numbers of DA neurons within the areas we have selected for sampling (Author response image 1). Therefore, we can indeed use spatial transcriptomics to compare dopamine neurons from mice with initial degeneration and patients where degeneration is ongoing during their disease.

      Author response image 1.

      Violin plot of DA neuron proportions sampled within the vulnerable SNV (deconvoluted RCTD method used in unmasked tissue sections of the SNV). Control and early PD subjects.

      Comments on the discussion:

      In the discussion, the authors state that their calcium photometry results support a central role of calcium in activity-induced neurodegeneration. This conclusion, although plausible because of the very broad pre-existing literature linking calcium elevation (such as in excitotoxicity) to neuronal loss, should be toned down a bit as no causal relationship was established in the experiments that were carried out in the present study.

      Our model utilizes hM3Dq-DREADDs that function by increasing intracellular calcium to increase neuronal excitability, and our results show increased Ca2+ by fiber photometry and changes to Ca2+-related genes, strongly suggesting a causal relation and crucial role of calcium in the mechanism of degeneration. However, we agree that we have not experimentally proven this point, as we acknowledged in the text. Additionally, we have planned revision experiments involving chronic isradipine treatment to further test the role of calcium in the mechanism of degeneration in this model.

      In the discussion, the authors discuss some of the parallel changes in gene expression detected in the mouse model and in the human tissues. Because few if any dopamine neurons are expected to remain in the SNc of the human tissues used, this sort of comparison has important conceptual limitations and these need to be clearly addressed.

      As discussed, we can sample SN DA neurons in early PD (see figure above), and in our view there is great value for such comparisons. We agree that discussion of appropriate caveats is warranted and this will be clearly addressed in the revision.

      A major limitation of the present discussion is that it does not discuss the possibility that the observed phenotypes are caused by the induction of a chronic state of depolarization block by the chronic CNO treatment. I encourage the authors to consider and discuss this hypothesis.

      As discussed above, our analyses of DA neuron firing in slices and open field testing to date do not support a prominent contribution of depolarization block with chronic CNO treatment. However, we cannot rule out this hypothesis, therefore we will include additional electrophysiology experiments and add discussion of this important consideration.  

      Also, the authors need to discuss the fact that previous work was only able to detect an increase in the firing rate of dopamine neurons after more than 95% loss of dopamine neurons. As such, the authors need to clearly discuss the relevance of the present model to PD. Are changes in firing rate a driver of neuronal loss in PD, as the authors try to make the case here, or are such changes only a secondary consequence of extensive neuronal loss (for example because a major loss of dopamine would lead to reduced D2 autoreceptor activation in the remaining neurons, and to reduced autoreceptor-mediated negative feedback on firing). This needs to be discussed.

      As discussed above, while increases in dopamine neuron activity may be compensatory after loss of neurons, the precise percentage required to induce such compensatory changes is not defined in mice and varies between paradigms, and the threshold level is not known in humans. We also reiterate that a compensatory increase in activity could still promote the degeneration of critical surviving DA neurons, whose loss underlies the substantial decline in motor function that typically occurs over the course of PD. Moreover, there are also multiple lines of evidence to suggest that changes in activity can initiate and drive dopamine neuron degeneration (Rademacher & Nakamura, Exp Neurol 2024). For example, overexpression of synuclein can increase firing in cultured dopamine neurons (Dagra et al., NPJ Parkinsons Dis 2021, PMID: 34408150) while mice expressing mutant Parkin have higher mean firing rates (Regoni et al., Cell Death Dis 2020, PMID: 33173027). Similarly, an increased firing rate has been reported in the MitoPark mouse model of PD at a time preceding DA neuron degeneration (Good et al., FASEB J 2011, PMID: 21233488). We also acknowledge that alterations to dopamine neuron activity are likely complex in PD, and that dopamine neuron health and function can be impacted not just by simple increases in activity, but also by changes in activity patterns and regularity. We will amend our discussion to include the important caveat of changes in activity occurring as compensation, as well as further evidence of changes in activity preceding dopamine neuron death.

      There is a very large, multi-decade literature on calcium elevation and its effects on neuronal loss in many different types of neurons. The authors should discuss their findings in this context and refer to some of this previous work. In a nutshell, the observations of the present manuscript could be summarized by stating that the chronic membrane depolarization induced by the CNO treatment is likely to induce a chronic elevation of intracellular calcium and this is then likely to activate some of the well-known calcium-dependent cell death mechanisms. Whether such cell death is linked in any way to PD is not really demonstrated by the present results. The authors are encouraged to perform a thorough revision of the discussion to address all of these issues, discuss the major limitations of the present model, and refer to the broad pre-existing literature linking membrane depolarization, calcium, and neuronal loss in many neuronal cell types.

      While our model demonstrates classic excitotoxic cell death pathways, we would like to emphasize both the chronic nature of our manipulation and the progressive changes observed, with increasing degeneration seen at 1, 2, and 4 weeks of hyperactivity in an axon-first manner. This is a unique aspect of our study, in contrast to much of the previous literature which has focused on shorter timescales. Thus, while we will revise the discussion to more comprehensively acknowledge previous studies of calcium-dependent neuron cell death, we believe we have made several new contributions that are not predicted by existing literature. We have shown that this chronic manipulation is specifically toxic to nigral dopamine neurons, and the data that VTA dopamine neurons continue to be resilient even at 4 weeks is interesting and disease-relevant. We therefore do not want to use findings from other neuron types to draw assumptions about DA neurons, which are a unique and very diverse population. We acknowledge that as with all preclinical models of PD, we cannot draw definitive conclusions about PD with this data. However, we reiterate that we strongly believe that drawing connections to human disease is important, as dopamine neuron activity is very likely altered in PD and a clearer understanding of how dopamine neuron survival is impacted by activity will provide insight into the mechanisms of PD.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors engineer the endogenous left boundary of the Drosophila eve TAD, replacing the endogenous Nhomie boundary by either a neutral DNA, a wildtype Nhomie boundary, an inverted Nhomie boundary, or a second copy of the Homie boundary. They perform Micro-C on young embryos and conclude that endogenous Nhomie and Homie boundaries flanking eve pair with head-to-tail directionality to form a chromosomal stem loop. Abrogating the Nhomie boundary leads to ectopic activation of genes in the former neighboring TAD by eve embryonic stripe enhancers. Replacing Nhomie by an inverted version or by Homie (which pairs with itself head-to-head) transformed the stem loop into a circle loop. An important finding was that stem and circle loops differentially impact endogenous gene regulation both within the eve TAD and in the TADs bracketing eve. Intriguingly, an eve TAD with a circle loop configuration leads to ectopic activation of flanking genes by eve enhancers - indicating compromised regulatory boundary activity despite the presence of an eve TAD with intact left and right boundaries.

      Strengths:

      Overall, the results obtained are of high-quality and are meticulously discussed. This work advances our fundamental understanding of how 3D genome topologies affect enhancer-promoter communication.

      Weaknesses:

      Though convincingly demonstrated at eve, the generalizability of TAD formation by directional boundary pairing remains unclear, though the authors propose this mechanism could underlie the formation of all TADs in Drosophila and possibly even in mammals. Strong and ample evidence has been obtained to date that cohesin-mediated chromosomal loop extrusion explains the formation of a large fraction of TADs in mammals.

      (1.1) The difficulty with almost all of the studies on mammalian TADs, cohesin and CTCF roadblocks is that the sequencing depth is not sufficient, and large bin sizes (>1 kb) are needed to visualize chromosome architecture. The resulting contact profiles show TAD neighborhoods, not actual TADs.

      The problem with these studies is illustrated by comparing the contact profiles of mammalian MicroC data sets at different bin sizes in Author response image 1. In this figure, the darkness of the “pixels” in panels E, F, G and H was enhanced by reducing brightness in Photoshop.

      Author response image 1.

      Mammalian MicroC profiles at different bin sizes

      Panels A and C show “TADs” using bin sizes typical of most mammalian studies (see Krietenstein et al. 2020). At this level of resolution, TADs, the “trees” that are the building blocks of chromosomes, are not visible. Instead, what is seen are TAD neighborhoods or “forests”. Each neighborhood consists of several dozen individual TADs. The large bins in these panels also artificially accentuate TAD:TAD interactions, generating a series of “stripes” and “dots” that correspond to TADs bumping into each other and sequences getting crosslinked. For example, in panel A there is a prominent stripe on the edge of a “TAD” (blue arrow). In panel C, this stripe resolves into a series of dots arranged as parallel, but interrupted “stripes” (green and blue arrows). At the next level of resolution, it can be seen that the stripe marked by the blue arrow and magenta asterisk is generated by contacts between the left boundary of the TAD indicated by the magenta bar with sequences in a TAD (blue bar) ~180 kb away. While dots and stripes are prominent features in contact profiles visualized with larger bin sizes (A and C), the actual TADs that are observed with a bin size of 200 bp (examples are underlined by black bars in panel G) are not bordered by stripes, nor are they topped by obvious dots. The one possible exception is the dot that appears at the top of the volcano triangle underlined with magenta.
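
      The effect of bin size described above can be illustrated with a toy example, assuming a contact map stored as a square matrix of counts. The function name `rebin_contacts` and the four-domain toy map are hypothetical and are not derived from the cited data sets.

```python
import numpy as np

def rebin_contacts(matrix, factor):
    """Sum-pool a square contact matrix into bins `factor` times larger."""
    matrix = np.asarray(matrix, dtype=float)
    n = (matrix.shape[0] // factor) * factor  # drop any ragged edge bins
    trimmed = matrix[:n, :n]
    m = n // factor
    # Axes are (coarse row, offset in row bin, coarse col, offset in col bin)
    return trimmed.reshape(m, factor, m, factor).sum(axis=(1, 3))

# Toy map (hypothetical): four 5-bin domains along the diagonal at 200 bp per bin
fine = np.zeros((20, 20))
for start in range(0, 20, 5):
    fine[start:start + 5, start:start + 5] = 1.0

coarse = rebin_contacts(fine, 10)  # 200 bp bins -> 2 kb bins
```

      Here `coarse` is a 2 × 2 matrix in which each diagonal entry pools two separate domains, so the individual domains can no longer be distinguished at the larger bin size.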

      The chromosome 1 DNA segment from the MicroC data of Hsieh et al. (Hsieh et al. 2020) shows a putative volcano triangle with a plume (indicated by a V in Author response image 1 panels D, F and H). Sequences in the V TAD don’t crosslink with their immediate neighbors, and this gives a “plume” above the volcano triangle, as indicated by the light blue asterisk in panels D, F and H. Interestingly, the V TAD does contact two distant TADs, U on the left and W on the right. The U TAD is ~550 kb from V, and the region of contact is indicated by the black arrow. The W TAD is ~585 kb from V, and the region of contact is indicated by the magenta arrow. While the plume still seems to be visible with a bin size of 400 bp (light blue asterisk), it is hard to discern when the bin size is 200 bp, as there are not enough reads.

      The evidence demonstrating that cohesin is required for TAD formation/maintenance is based on low resolution Hi-C data, and the effects that are observed are on TAD neighborhoods (forests) and not TADs (trees). In fact, there is published evidence that cohesin is not required in mammals for TAD formation/maintenance. In an experiment from Goel et al. 2023 the authors depleted the cohesin component Rad21 and then visualized the effects on TAD organization using the high resolution region capture MicroC (RCMC) protocol. The MicroC contact map in this figure visualizes a ~250 kb DNA segment around the Ppm1g locus at 250 bp resolution. On the right side of the diagonal is the untreated control, while the left side shows the MicroC profile of the same region after Rad21 depletion. The authors indicated that there was a 97% depletion of Rad21 in their experiment. However, as is evident from a comparison of the experimental and control, loss of Rad21 has no apparent effect on the TAD organization of this mammalian DNA segment.

      Several other features are worth noting.  First, unlike the MicroC experiments shown in Author response image 1, there are dots at the apex of the TADs in this chromosomal segment.  In the MicroC protocol, fixed chromatin is digested to mononucleosomes by extensive MNase digestion.  The resulting DNA fragments are then ligated, and dinucleosome-length fragments are isolated and sequenced. 

      DNA sequences that are nucleosome free in chromatin (which would be promoters, enhancers, silencers and boundary elements) are typically digested to oligonucleotides in this procedure and won’t be recovered. This means that the dots shown here must correspond to mononucleosome-length elements that are MNase resistant. This is also true for the dots in the MicroC contact profiles of the Drosophila Abd-B regulatory domain (see Fig. 2B in the paper). Second, the TADs are connected to each other by 45° stripes (see blue and green arrowheads). While it is not clear from this experiment whether the stripes are generated by an active mechanism (enzyme) or by some “passive” mechanism (e.g., sliding), the stripes in this chromosomal segment are not generated by cohesin, as they are unperturbed by Rad21 depletion. Third, there are no volcano triangles with plumes in this chromosomal DNA segment. Instead, the contact patterns (purple and green asterisks) between neighboring TADs closely resemble those seen for the Abd-B regulatory domains (compare Goel et al. 2023 with Fig. 2B in the paper). This similarity suggests that the TADs in and around Ppm1g may be circle-loops, not stem-loops. As volcano triangles with plumes also seem to be rare in the MicroC data sets of Krietenstein et al. (Krietenstein et al. 2020) and Hsieh et al. (Hsieh et al. 2020) (with the caveat that these data sets are low resolution: see Author response image 1), it is possible that much of the mammalian genome is assembled into circle-loop TADs, a topology that can’t be generated by the cohesin loop extrusion (bolo tie clip)/CTCF roadblock model.

      While Rad21 depletion has no apparent effect on TADs, it does appear to impact TAD neighborhoods.  This is in a supplemental figure in Goel et al. (Goel et al. 2023).  In this figure, TADs in the Ppm1g region of chromosome 5 are visualized with bin sizes of 5 kb and 1 kb.  A 1.2 Mb DNA segment is shown for the 5 kb bin size, while an 800 kb DNA segment is shown for the 1 kb bin size.  As can be seen from comparing the MicroC profiles in Author response image 2 with that in Goel et al. 2023, individual TADs are not visible.  Instead, the individual TADs are binned into large TAD “neighborhoods” that consist of several dozen or more TADs.

      Unlike the individual TADs shown in Goel et al. 2023, the TAD neighborhoods in Author response image 2 are sensitive to Rad21 depletion.  The effects of Rad21 depletion can be seen by comparing the relative pixel density inside the blue lines before (above the diagonal) and after (below the diagonal) auxin-induced Rad21 degradation.  The reduction in pixel density is greatest for more distant TAD:TAD contacts (farthest from the diagonal).  By contrast, the TADs themselves are unaffected (Goel et al. 2023), as are contacts between individual TADs and their immediate neighbors.  In addition, contacts between partially overlapping TAD neighborhoods are also lost.  At this point it isn’t clear why contacts between distant TADs in the same neighborhood are lost when Rad21 is depleted; however, a plausible speculation is that it is related to the functioning of cohesin in holding newly replicated DNAs together until mitosis and whatever other role it might have in chromosome condensation.

      Author response image 2.

      Ppm1g full locus chr5

      Moreover, given the unique specificity with which Nhomie and Homie are known to pair (and exhibit "homing" activity), it is conceivable that formation of the eve TAD by boundary pairing represents a phenomenon observed at exceptional loci rather than a universal rule of TAD formation. Indeed, characteristic Micro-C features of the eve TAD are only observed at a restricted number of loci in the fly genome…..

      (1.2) The available evidence does not support the claim that nhomie and homie are “exceptional.”  To begin with, nhomie and homie rely on precisely the same set of factors that have been implicated in the functioning of other boundaries in the fly genome.  For example, homie requires (among other factors) the generic boundary protein Su(Hw) for insulation and long-distance interactions (Fujioka et al. 2024).  (This is also true of nhomie: unpublished data.)  The Su(Hw) protein (like other fly polydactyl zinc finger proteins) can engage in distant interactions.  This was first shown by Sigrist and Pirrotta (Sigrist and Pirrotta 1997), who found that the su(Hw) element from the gypsy transposon can mediate long-distance regulatory interactions (PRE dependent silencing) between transgenes inserted at different sites on homologous chromosomes (trans interactions) and at sites on different chromosomes.

      The ability to mediate long-distance interactions is not unique to the su(Hw) element, or homie and nhomie.  Muller et al. (Muller et al. 1999) found that the Mcp boundary from the Drosophila BX-C is also able to engage in long-distance regulatory interactions, both PRE-dependent silencing of mini-white and enhancer activation of mini-white and yellow.  The functioning of the Mcp boundary depends upon two other generic insulator proteins, Pita and the fly CTCF homolog (Kyrchanova et al. 2017).  Like Su(Hw), both are polydactyl zinc finger proteins, and they resemble the mammalian CTCF protein in that their N-terminal domain mediates multimerization (Bonchuk et al. 2020; Zolotarev et al. 2016).  Figure 6 from Muller et al. 1999 shows PRE-dependent “pairing sensitive silencing” interactions between transgenes carrying a mini-white reporter, the Mcp and scs’ (Beaf dependent) (Hart et al. 1997) boundary elements, and a PRE closely linked to Mcp.  In this experiment flies homozygous for different transgene inserts were mated and the eye color was examined in their trans-heterozygous progeny.  As indicated in the figure, the strongest trans-silencing interactions were observed for inserts on the same chromosomal arm; however, transgenes inserted on the left arm of chromosome 3 can interact across the centromere with transgenes inserted on the right arm of chromosome 3.

      Figure 5C (left) from Muller et al. 1999 shows a trans-silencing interaction between w#11.102 at 84D and w#11.16 approximately 5.8 Mb away, at 87D.  Figure 5C (right) shows a trans-silencing interaction across the centromere between w#14.29 on the left arm of chromosome 3 at 78F and w#11.102 on the right arm of chromosome 3 at 84D. The eye color phenotype of mini-white-containing transgenes is usually additive: homozygous inserts have twice as dark eye color as the corresponding hemizygous inserts.  Likewise, in flies trans-heterozygous for mini-white transgenes inserted at different sites, the eye color is equivalent to the sum of the two transgenes.  This is not true when mini-white transgenes are silenced by PREs.  In the combination shown in panel A, the trans-heterozygous fly has a lighter eye color than either of the parents.  In the combination in panel B, the trans-heterozygous fly is slightly lighter than either parent.

      As evident from the diagram in Figure 6 from Muller et al. 1999, all of the transgenes inserted on the 3rd chromosome that were tested were able to participate in long-distance (>Mbs) regulatory interactions.  On the other hand, not all possible pairwise interactions are observed.  This would suggest that potential interactions depend upon the large-scale (Mb) 3D folding of the 3rd chromosome.

      When the scs boundary (Zw5 dependent) (Gaszner et al. 1999) was added to the transgene to give sMws’, it further enhanced the ability of distant transgenes to find each other and pair.  All eight of the sMws’ inserts that were tested were able to interact with at least one other sMws’ insert on a different chromosome and silence mini-white.  Vazquez et al. (Vazquez et al. 2006) subsequently tagged the sMws’ transgene with LacO sequences (ps0Mws’) and visualized pairing interactions in imaginal discs.  Trans-heterozygous combinations on the same chromosome were found paired in 94-99% of the disc nuclei, while a trans-heterozygous combination on different chromosomes was found paired in 96% of the nuclei (Table 3 from Vazquez et al. 2006).  Vazquez et al. also examined a combination of four transgenes inserted on the same chromosome (two at the same insertion site, and two at different insertion sites).  In this case, all four transgenes were clustered together in 94% of the nuclei (Table 3 from Vazquez et al. 2006).  Their studies also suggest that the distant transgenes remain paired for at least several hours.  A similar experiment was done by Li et al. (Li et al. 2011), except that the transgene contained only a single boundary, Mcp or Fab-7.  While pairing was still observed in trans-heterozygotes, the frequency was reduced without scs and scs’.

      It is worth pointing out that there is no plausible mechanism in which cohesin could extrude a loop through hundreds of intervening TADs, across the centromere (ff#13.101↔w#11.102: Figure 6 from Muller et al. 1999; w#14.29↔w#11.02: Figures 5 and 6 from Muller et al. 1999) and come to a halt when it “encounters” Mcp-containing transgenes on different homologs.  The same is true for Mcp-dependent pairing interactions in cis (Fig. 7 in Muller et al. (Muller et al. 1999)) or Mcp-dependent pairing interactions between transgenes inserted on different chromosomes (Fig. 8 in Muller et al. (Muller et al. 1999); Line 8 in Table 3 from Vazquez et al. 2006).

      These are not the only boundaries that can engage in long-distance pairing.  Mohana et al. (Mohana et al. 2023) identified nearly 60 meta-loops, many of which appear to be formed by the pairing of TAD boundary elements.  Two examples (at 200 bp resolution from 12-16 hr embryos) are shown in Author response image 3.

      Author response image 3.

      Metaloops on the 2nd and 3rd chromosomes: circle-loops and multiple stem-loops

      One of these meta-loops (panel A) is generated by the pairing of two TAD boundaries on the 2nd chromosome.  The first boundary, blue (indicated by blue arrow), is located at ~2,006,500 bp between a small TAD containing the Nplp4 and CG15353 genes and a larger TAD containing 3 genes, CG33543, Obp22a and Npc2a.  Nplp4 encodes a neuropeptide.  The functions of CG15353 and CG33543 are unknown.  Obp22a encodes an odorant binding protein, while Npc2a encodes the Niemann-Pick type C-2a protein, which is involved in sterol homeostasis.  The other boundary (purple: indicated by purple arrow) is located between two TADs 2.8 Mb away at 4,794,250 bp.  The upstream TAD contains the fipi gene (CG15630), which has neuronal functions in male courtship, while the downstream TAD contains CG3294, which is thought to be a spliceosome component, and schlaff (slf), which encodes a chitin binding protein.  As illustrated in the accompanying diagram, the blue boundary pairs with the purple boundary in a head-to-head orientation, generating a ~2.8 Mb loop with a circle-loop topology.  As a result of this pairing, the multi-gene (CG33543, Obp22a and Npc2a) TAD upstream of the blue boundary interacts with the CG15630 TAD upstream of the purple boundary.  Conversely, the small Nplp4:CG15353 TAD downstream of the blue boundary interacts with the CG3294:slf TAD downstream of the purple boundary.  Even if one imagined that the cohesin bolo tie clip was somehow able to extrude 2.8 Mb of chromatin and then know to stop when it encountered the blue and purple boundaries, it would’ve generated a stem-loop, not a circle-loop.

      The second meta-loop (panel B) is more complicated as it is generated by pairing interactions between four boundary elements.  The blue boundary (blue arrow) located at ~4,801,800 bp (3L) separates a large TAD containing the RhoGEF64C gene from a small TAD containing CG7509, which encodes a predicted subunit of an extracellular carboxypeptidase.  As can be seen in the MicroC contact profile and the accompanying diagram, the blue boundary pairs with the purple boundary (purple arrow), which is located at ~7,013,500 bp (3L) just upstream of the 2nd internal promoter (indicated by black arrowhead) of the Mp (Multiplexin) gene.  This pairing interaction is head-to-tail and generates a large stem-loop that spans ~2.2 Mb.  The stem-loop brings sequences upstream of the blue boundary and downstream of the purple boundary into contact (the strings below a bolo tie clip), just as was observed in the boundary bypass experiments of Muravyova et al. (Muravyova et al. 2001) and Kyrchanova et al. (Kyrchanova et al. 2008).  The physical interactions result in a box of contacts (right top) between sequences in the large RhoGEF64C TAD and sequences in a large TAD that contains an internal Mp promoter.  The second pairing interaction is between the brown boundary (brown arrow) and the green boundary (green arrow).  The brown boundary is located at ~4,805,600 bp (3L) and separates the TAD containing CG7590 from a large TAD containing CG1808 (predicted to encode an oxidoreductase) and the Dhc64C (Dynein heavy chain 64C) gene.  The green boundary is located at ~6,995,500 bp (3L), and it separates a TAD containing CG32388 and the biniou (bin) transcription factor from a TAD that contains the most distal promoter of the Mp (Multiplexin) gene (blue arrowhead).  As indicated in the diagram, the brown and green boundaries pair with each other head-to-tail, and this generates a small internal loop (and the final configuration would resemble a bolo tie with two tie clips).
This small internal loop brings the CG7590 TAD into contact with the TAD that extends from the distal Mp promoter to the 2nd internal Mp promoter.  The resulting contact profile is a rectangular box with diagonal endpoints corresponding to the paired blue:purple and brown:green boundaries.  The pairing of the brown:green boundaries also brings the TADs immediately downstream of the brown boundary and upstream of the green boundary into contact with each other, and this gives a rectangular box of interactions between the Dhc64C TAD, and sequences in the bin/CG3238 TAD.  This box is located on the lower left side of the contact map.

      Since the bin and Mp meta-loops in Author response image 3B are stem-loops, they could have been generated by “sequential” cohesin loop extrusion events.  Besides the fact that cohesin extrusion of 2 Mb of chromatin and breaking through multiple intervening TAD boundaries challenges the imagination, there is no mechanism in the cohesin loop extrusion/CTCF roadblock model to explain why cohesin complex 1 would come to a halt at the purple boundary on one side and the blue boundary on the other, while cohesin complex 2 would instead stop when it hits the brown and green boundaries.  This highlights another problem with the cohesin loop extrusion/CTCF roadblock model, namely that the roadblocks are functionally autonomous: they have an intrinsic ability to block cohesin that is entirely independent of the intrinsic ability of other roadblocks in the neighborhood.  As a result, there is no mechanism for generating specificity in loop formation.  By contrast, boundary pairing interactions are by definition non-autonomous and depend on the ability of individual boundaries to pair with other boundaries: specificity is built into the model. The mechanism for pairing, and accordingly the basis for partner preferences/specificity, are reasonably well understood.  Probably the most common mechanism in flies is based on shared binding sites for architectural proteins that can form dimers or multimers (Bonchuk et al. 2021; Fedotova et al. 2017).  Flies have a large family of polydactyl zinc finger DNA binding proteins, and as noted above, many of these form dimers or multimers and also function as TAD boundary proteins.  This pairing principle was first discovered by Kyrchanova et al. (Kyrchanova et al. 2008).  This paper also showed that orientation-dependent pairing interactions are a common feature of endogenous fly boundaries.  Another mechanism for pairing is specific protein:protein interactions between different DNA binding factors (Blanton et al. 2003).
Yet a third mechanism would be proteins that bridge different DNA binding proteins together.  The boundaries that use these different mechanisms (BX-C boundaries, scs, scs’) depend upon the same sorts of proteins that are used by homie and nhomie.  Likewise, this same set of factors reappears in one combination or another in most other TAD boundaries.  As for the orientation of pairing interactions, this is most likely determined by the order of binding sites for chromosome architectural proteins in the partner boundaries.

      …and many TADs lack focal 3D interactions between their boundaries.

      (1.3) The idea that flies differ from mammals in that they “lack” focal 3D interactions is simply mistaken.  One of the problems with drawing this distinction is that almost all of the “focal 3D interactions” seen in mammalian Hi-C experiments are a consequence of binning large DNA segments in low-resolution restriction enzyme-dependent experiments.  This is even true in the two “high” resolution MicroC experiments that have been published (Hsieh et al. 2020; Krietenstein et al. 2020).  As illustrated above in Author response image 1, most of the “focal 3D interactions” (the dots at the apex of TAD triangles) seen with large bin sizes (1 kb and greater) disappear when the bin size is 200 bp and TADs rather than TAD neighborhoods are being visualized.
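
      The arithmetic behind this binning effect can be sketched with a toy example.  The Python snippet below is only an illustration under stated assumptions: the contact list is synthetic (not MicroC data), and the function name coarsen is introduced here for the example.  It sums a contact matrix with 200 bp bins into 1 kb bins and shows how contacts that are dispersed over different 200 bp bin pairs pile up into a single bright pixel at the larger bin size.

```python
def coarsen(matrix, factor):
    """Sum a square contact matrix into bins `factor` times larger."""
    n = len(matrix)
    if n % factor != 0:
        raise ValueError("matrix size must be a multiple of factor")
    m = n // factor
    coarse = [[0] * m for _ in range(m)]
    for i in range(n):
        for j in range(n):
            coarse[i // factor][j // factor] += matrix[i][j]
    return coarse


# Synthetic map: 10 bins of 200 bp each (2 kb in total).
fine = [[0] * 10 for _ in range(10)]

# Five single contacts, each between a different pair of 200 bp bins
# (hypothetical values chosen only for illustration).
for i, j in [(0, 5), (1, 7), (2, 9), (3, 6), (4, 8)]:
    fine[i][j] = fine[j][i] = 1

# Merge 5 x 200 bp bins into one 1 kb bin.
coarse = coarsen(fine, 5)

print(max(max(row) for row in fine))    # brightest pixel at 200 bp
print(max(max(row) for row in coarse))  # brightest pixel at 1 kb
```

      In this toy case no pair of 200 bp bins has more than one contact, yet the 1 kb map contains a pixel with five.  Whether a given dot in a real data set arises in this way can only be judged from the high-resolution contact map itself.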

      As described in point #1.1, in the MicroC protocol, fixed chromatin is first digested to mononucleosomes by extensive MNase digestion, processed/biotinylated, and ligated to give dinucleosome-length fragments, which are then sequenced.  Regions of chromatin that are nucleosome free (promoters, enhancers, silencers, boundary elements) will typically be reduced to oligonucleotides in this procedure and will not be recovered when dinucleosome-length fragments are sequenced.  The loss of sequences from typical paired boundary elements is illustrated by the Lar meta-loop shown in Author response image 4 (at 200 bp resolution).  Panels A and B show the contact profiles generated when the blue boundary (which separates two TADs that span the Lar (Leukocyte-antigen-related-like) transcription unit) interacts with the purple boundary (which separates two TADs in a gene-poor region ~620 kb away).  The blue and purple boundaries pair with each other head-to-head, and this pairing orientation generates yet another circle-loop.  In the circle-loop topology, sequences in the TADs upstream of both boundaries come into contact with each other, and this gives the small dark rectangular box to the upper left of the paired boundaries (Author response image 4A).  (Note that this small box corresponds to the two small TADs upstream of the blue and purple boundaries, respectively. See panel B.)  Sequences in the TADs downstream of the two boundaries also come into contact with each other, and this gives the large box to the lower right of the paired boundaries.  While this meta-loop is clearly generated by pairing interactions between the blue and purple boundaries, the interacting sequences are degraded in the MicroC protocol, and sequences corresponding to the blue and purple boundaries aren’t recovered.  This can be seen in panel B (red arrow and red arrowheads).
When a different Hi-C procedure is used (dHS-C) that captures nucleosome-free regions of chromatin that are physically linked to each other (Author response image 4C & D), the sequences in the interacting blue and purple boundaries are recovered and generate a prominent “dot” at their physical intersection (blue arrow in panel D).

      Author response image 4.

      Lar meta-loop. Panels A & B: MicroC. Panels C & D: dHS-C

      While sequences corresponding to the blue and purple boundaries are lost in the MicroC procedure, there is at least one class of elements that engage in physical pairing interactions whose sequences are (comparatively) resistant to MNase digestion.  This class of elements includes many PREs (Kyrchanova et al. 2018; unpublished data), the boundary bypass elements in the Abd-B region of BX-C (Kyrchanova et al. 2023; Kyrchanova et al. 2019a; Kyrchanova et al. 2019b; Postika et al. 2018), and “tethering” elements (Batut et al. 2022; Li et al. 2023).  In all of the cases tested, these elements are bound in nuclear extracts by a large (>1000 kD) GAGA factor-containing multiprotein complex called LBC.  LBC also binds to the hsp70 and eve promoters (unpublished data).  Indirect end-labeling experiments (Galloni et al. 1993; Samal et al. 1981; Udvardy and Schedl 1984) indicate that the LBC protects a ~120-180 bp DNA segment from MNase digestion.  It is likely that this is the reason why LBC-bound sequences can be recovered in MicroC experiments as dots when they are physically linked to each other.  One such example (based on the ChIP signatures of the paired elements) is indicated by the green arrow in panels B and D of Author response image 4.  Note that there are no dots corresponding to these two LBC elements within either of the TADs immediately downstream of the blue and purple boundaries.  Instead, the sequences corresponding to the two LBC elements are only recovered when the two elements pair with each other over a distance of ~620 kb.  The fact that these two elements pair with each other is consistent with other findings which indicate that, like classical boundaries, LBC elements exhibit partner preferences.  In fact, LBC elements can sometimes function as TAD boundaries.  For example, the Fab-7 boundary has two LBC elements, and full Fab-7 boundary function can be reconstituted with just these two elements (Kyrchanova et al. 2018).

      Reviewer #2 (Public Review):

      "Chromatin Structure II: Stem-loops and circle-loops" by Ke*, Fujioka*, Schedl, and Jaynes reports a set of experiments and subsequent analyses focusing on the role of Drosophila boundary elements in shaping 3D genome structure and regulating gene expression. The authors primarily focus on the region of the fly genome containing the even skipped (eve) gene; eve is expressed in a canonical spatial pattern in fly embryos and its locus is flanked by the well-characterized neighbor of homie (nhomie) and homie boundary elements. The main focus of investigation is the orientation dependence of these boundary elements, which had been observed previously using reporter assays. In this study, the authors use Crispr/Cas9 editing followed by recombination-mediated cassette exchange to create a series of recombinant fly lines in which the nhomie boundary element is either replaced with exongenous sequence from phage 𝝀, an inversion of nhomie, or a copy of homie that has the same orientation as the endogenous homie sequence. The nhomie sequence is also regenerated in its native orientation to control for effects introduced by the transgenesis process.

      The authors then perform high-resolution Micro-C to analyze 3D structure and couple this with fluorescent and colorimetric RNA in situ hybridization experiments to measure the expression of eve and nearby genes during different stages of fly development. The major findings of these experiments are that total loss of boundary sequence (replacement with 𝝀 DNA) results in major 3D structure changes and the most prominent observed gene changes, while inversion of the nhomie boundary or replacement with homie resulted in more modest effects in terms of 3D structure and gene expression changes and a distinct pattern of gene expression change from the 𝝀 DNA replacement. As the samples in which the nhomie boundary is inverted or replaced with homie have similar Micro-C profiles at the eve locus and show similar patterns of a spurious gene activation relative to the control, the observed effects appear to be driven by the relative orientation of the nhomie and homie boundary elements to one another.

      Collectively, the findings reported in the manuscript are of broad interest to the 3D genome field. Although extensive work has gone into characterizing the patterns of 3D genome organization in a whole host of species, the underlying mechanisms that structure genomes and their functional consequences are still poorly understood. The perhaps best understood system, mechanistically, is the coordinated action of CTCF with the cohesin complex, which in vertebrates appears to shape 3D contact maps through a loop extrusion-pausing mechanism that relies on orientation-dependent sequence elements found at the boundaries of interacting chromatin loops.

      (2.1) The notion that the mammalian genome is shaped in 3D by the coordinated action of cohesin and CTCF has achieved the status of dogma in the field of chromosome structure in vertebrates.  However, as we have pointed out in #1.1, the evidence supporting this dogma is far from convincing.  To begin with, it is based on low-resolution Hi-C experiments that rely on large bin sizes to visualize so-called “TADs.”  In fact, the notion that cohesin/CTCF are responsible on their own for shaping the mammalian 3D genome appears to be a result of mistaking a series of forests for the actual trees that populate each of the forests.

      As illustrated in Author response image 1 above, the “TADs” that are visualized in these low resolution data sets are not TADs at all, but rather TAD neighborhoods consisting of several dozen or more individual TADs.  Moreover, the “interesting” features that are evident at low resolution (>1 kb)—the dots and stripes—largely disappear at resolutions appropriate for visualizing individual TADs (~200 bp).

      In #1.1, we presented data from one of the key experiments in Goel et al. (Goel et al. 2023).  In this experiment, the authors used RCMC to generate high-resolution (~250 bp) MicroC contact maps before and after Rad21 depletion.  Contrary to dogma, Rad21 depletion has absolutely no effect on TADs in a ~250 kb DNA segment, and these TADs look very much like the TADs we observe in the Drosophila genome, in particular in the Abd-B region of BX-C that is thought to be assembled into a series of circle-loops (see Fig. 2B).

      While Goel et al. (Goel et al. 2023) observed no effect of Rad21 depletion on TADs, they found that loss of Rad21 disturbs long-distance (but not short-distance) contacts in large TAD neighborhoods when their RCMC data set is visualized using bin sizes of 5 kb and 1 kb.  This is shown in Author response image 2.  The significance of this finding is, however, uncertain.  It could mean that the 3D organization of large TAD neighborhoods has a special requirement for cohesin activity.  On the other hand, since cohesin functions to hold sister chromosomes together after replication until they separate during mitosis (and might also participate in mitotic condensation), it is also possible that the loss of long-range contacts in large TAD neighborhoods when Rad21 is depleted is simply a reflection of this particular activity.  Further studies will be required to address these possibilities.

      As for CTCF: a careful inspection of the ChIP data in Goel et al. 2023 indicates that CTCF is not found at each and every TAD boundary.  In fact, the notion that CTCF is the be-all and end-all of TAD boundaries in mammals is truly hard to fathom.  For one, the demands for specificity in TAD formation (and in regulatory interactions) are likely much greater than those in flies, and specificity can’t be generated by a single DNA binding protein.  For another, several dozen chromosomal architectural proteins have already been identified in flies.  This means that (unlike what is thought to be true in mammals) it is possible to use a combinatorial mechanism to generate specificity in, for example, the long-distance interactions in RFig 6 and 7.  As noted in #2.1 above, many of the known chromosomal architectural proteins in flies are polydactyl zinc finger proteins (just like CTCF).  There are some 200 different polydactyl zinc finger proteins in flies, and the function of only a handful of these is known at present.  However, it seems likely that a reasonable fraction of this class of DNA binding proteins will ultimately turn out to have an architectural function of some type (Bonchuk et al. 2021; Fedotova et al. 2017).  The number of different polydactyl zinc finger protein genes in mammals is nearly 3 times that of flies.  Is it really possible that, of these, only CTCF is involved in shaping the 3D structure of the mammalian genome?

      Despite having a CTCF paralog and cohesin, the Drosophila genome does not appear to be structured by loop extrusion-pausing. The identification of orientation-dependent elements with pronounced structural effects on genome folding thus may shed light on alternative mechanisms used to regulate genome structure, which in turn may yield insights into the significance of particular folding patterns.

      (2.2) Here we would like to draw the reviewer’s and reader’s attention to Author response image 3, which shows that orientation-dependent pairing interactions have a significant impact on physical interactions between different sequences.  We would also refer the reader to two other publications.  One of these is Kyrchanova et al. (Kyrchanova et al. 2008), which was the first to demonstrate that orientation of pairing interactions matters.  The second is Fujioka et al. (Fujioka et al. 2016), which describes experiments indicating that nhomie and homie pair with each other head-to-tail and with themselves head-to-head.

      On the whole, this study is comprehensive and represents a useful contribution to the 3D genome field. The transgenic lines and Micro-C datasets generated in the course of the work will be valuable resources for the research community. Moreover, the manuscript, while dense in places, is generally clearly written and comprehensive in its description of the work. However, I have a number of comments and critiques of the manuscript, mainly centering on the framing of the experiments and presentation of the Micro-C results and on the manner in which the data are analyzed and reported. They are as follows:

      Major Points:

      (1) The authors motivate much of the introduction and results with hypothetical "stem loop" and "circle loop" models of chromosome conformation, which they argue are reflected in the Micro-C data and help to explain the observed ISH patterns. While such structures may possibly form, the support for these specific models vs. the many alternatives is not in any way justified. For instance, no consideration is given to important biophysical properties such as persistence length, packing/scaling, and conformational entropy. As the biophysical properties of chromatin are a very trafficked topic both in terms of experimentation and computational modeling and generally considered in the analysis of chromosome conformation data, the study would be strengthened by acknowledgement of this body of work and more direct integration of its findings.

      (2.3) The reviewer is not correct in claiming that “stem-loops” and “circle-loops” are “hypothetical.”  There is ample evidence that both types of loops are present in eukaryotic genomes, and that loop conformation has significant readouts in terms of not only the physical properties of TADs but also their functional properties.  Here we would draw the reviewer’s attention to Author response image 3 and Author response image 4 for examples of loops formed by the orientation-dependent pairing of yet other TAD boundary elements.  As evident from the MicroC data in these figures, circle-loops and stem-loops have readily distinguishable contact patterns.  The experiments in Fujioka et al. (Fujioka et al. 2016) demonstrate that homie and nhomie pair with each other head-to-tail, while they pair with themselves head-to-head.  The accompanying paper (Bing et al. 2024) also provides evidence that loop topology is reflected both in the pattern of activation of reporters and in the MicroC contact profiles.  We would also mention again Kyrchanova et al. (Kyrchanova et al. 2008), who were the first to report orientation-dependent pairing of endogenous fly boundaries.

      At this juncture it would be premature to try to incorporate computational modeling of chromosome conformation in our studies.  The reason is that the experimental foundations that would be essential for building accurate models are lacking.  As should be evident from RFigs. 1-3 above, studies on mammalian chromosomes are simply not of high enough resolution to draw firm conclusions about chromosome conformation: in most studies only the forests are visible.  While the situation is better in flies, there are still too many unknowns.  As just one example, it would be important to know the orientation of the boundary pairing interactions that generate each TAD.  While it is possible to infer loop topology from how TADs interact with their neighbors (a plume versus clouds), a conclusive identification of stem- and circle-loops will require a method to unambiguously determine whether a TAD boundary pairs with its neighbor head-to-head or head-to-tail.

      (2) Similar to Point 1, while there is a fair amount of discussion of how the observed results are or are not consistent with loop extrusion, there is no discussion of the biophysical forces that are thought to underlie compartmentalization such as block-polymer co-segregation and their potential influence. I found this absence surprising, as it is generally accepted that A/B compartmentalization essentially can explain the contact maps observed in Drosophila and other non-vertebrate eukaryotes (Rowley, ..., Corces 2017; PMID 28826674). The manuscript would be strengthened by consideration of this phenomenon.

      (2.4) Compartments in mammals have typically been identified and characterized using low-resolution data sets, and these studies have relied on visualizing compartments using quite large bin sizes (>>1 kb).  Our experiments have nothing to do with the large-scale compartments seen in these Hi-C experiments.  Instead, we are studying the properties of individual TADs: how TADs are formed, the relationship between TAD topology and boundary:boundary pairing, and the impact of TAD topology on interactions between TADs in the immediate neighborhood.  There is no evidence to date that these large compartments or “block polymer co-segregation” a) have any impact on the properties of individual boundary elements, b) have a role in determining which boundary elements actually come together to form a given TAD, c) impact the orientation of the interactions between boundaries that generate the TAD or d) determine how TADs tend to interact with their immediate neighbors.

      In more recent publications (c.f., Harris et al. 2023) compartments have shrunk in size and instead of being units of several hundred kb, the median length of the “compartmental” unit in mammalian cells is about 12 kb. This is not too much different from the size of fly TADs.  However, the available evidence does not support the idea that block polymer co-segregation/co-repulsion drives the TAD:TAD interactions seen in MicroC experiments.  For example, according to this “micro-compartment” model, the specific patterns of interaction between TADs in the CG3294 meta-loop in Author response image 3 would be driven by block polymer co-segregation and co-repulsion. In this model, the TAD upstream of the blue boundary (which contains CG33543, the odorant binding protein gene Obp22a and the Npc2a gene which encodes a protein involved in sterol homeostasis) would share the same chromatin state/biophysical properties as the TAD upstream of the purple boundary, which has the fipi gene. While it is true that CG33543, Obp22a and also the fipi gene are not expressed in embryos, Npc2a is expressed at high levels during embryogenesis, yet it is part of the TAD that interacts with the fipi TAD.  The TAD downstream of the blue boundary contains CG15353 and Nplp4 and it interacts with the TAD downstream of the purple boundary which contains CG3294 and slf.  CG15353 and Nplp4 are not expressed in the embryo and as such should share a compartment with a TAD that is also silent. However, slf is expressed at a high level in 12-16 hr embryos, while CG3294 is expressed at a low level.  In neither case would one conclude that the TADs upstream and downstream of the blue and purple boundaries, respectively, interact because of shared chromatin/biophysical states that drive block polymer co-segregation/co-repulsion.

      One might also consider several gedanken experiments involving the long-range interactions that generate the CG3294 meta-loop in Author response image 3.  According to the micro-compartment model the patchwork pattern of crosslinking evident in the CG3294 meta-loop arises because the interacting TADs share the same biochemical/biophysical properties, and this drives block polymer co-segregation and co-repulsion.  If this model is correct, then this patchwork pattern of TAD:TAD interactions would remain unchanged if we were to delete the blue or the purple boundary.  However, given what we know about how boundaries can find and pair with distant boundaries (c.f., Figure 6 from Muller et al. 1999 and the discussion in #1.2), the result of these gedanken experiments seems clear: the patchwork pattern shown in Author response image 3A will disappear.  What would happen if we inverted the blue or the purple boundary? Would the TAD containing CG33543, Obp22a and Npc2a still interact with fipi as would be expected from the compartment model?  Or would the pattern of interactions flip so that the CG33543, Obp22a and Npc2a TAD interacts with the TAD containing CG3294 and slf?  Again we can anticipate the results based on previous studies: the interacting TADs will switch when the CG3294 meta-loop is converted into a stem-loop.  If this happened, the only explanation possible in the compartment model is that the chromatin states change when the boundary is inverted so that the TAD upstream of the blue boundary now shares the same chromatin state as the TAD downstream of the purple boundary, while the TAD downstream of the blue boundary shares the same state as the TAD upstream of the purple boundary.  However, there is no evidence that boundary orientation per se can induce a complete switch in “chromatin states” as would be required in the compartment model.

      While we have not done these experimental manipulations with the CG3294 meta-loop, an equivalent experiment was done in Bing et al. (Bing et al. 2024).  However, instead of deleting a boundary element, we inserted a homie boundary element together with two reporters (gfp and LacZ) 142 kb away from the eve TAD.  The result of this gedanken “reverse boundary deletion” experiment is shown in Author response image 5.  Panel A shows the MicroC contact profile in the region spanning the transgene insertion site and the eve TAD in wild type (read “deletion”) NC14 embryos.  Panel B shows the MicroC contact profile from 12-16 hr embryos carrying the homie dual reporter transgene inserted at -142 kb.  Prior to the “deletion”, the homie element in the transgene pairs with nhomie and homie in the eve TAD and this generates a “mini-metaloop.”  In this particular insert, the homie boundary in the transgene (red arrow) is “pointing” in the opposite orientation from the homie boundary in the eve TAD (red arrow).  In this orientation, the pairing of the transgene homie with eve nhomie/homie brings the LacZ reporter into contact with sequences in the eve TAD.  Since a mini-metaloop is formed by homie → nhomie/homie pairing, sequences in TADs upstream and downstream of the transgene insert interact with sequences in TADs close to the eve TAD (Author response image 5B).  Taken together these interactions correspond to the interaction patchwork that is typically seen in “compartments” (see boxed region and inset).  If this patchwork is driven, as per the model, by block polymer co-segregation and co-repulsion, then it should still be present when the transgene is deleted.  However, panel A shows that the interactions linking the transgene and the sequences in TADs next to the transgene to eve and TADs next to eve disappear when the homie boundary (plus transgene) is “deleted” in wild type flies.

      Author response image 5.

      Boundary deletion and compartments

      A second experiment would be to invert the homie boundary so that instead of pointing away from eve it points towards eve.  Again, if the compartmental patchwork is driven by block polymer co-segregation and co-repulsion, inverting the homie boundary in the transgene should have no effect on the compartmental contact profile.  Inspection of Fig. 7 in Bing et al. (Bing et al. 2024) will show that this prediction doesn’t hold either.  When homie is inverted, sequences in the eve TAD interact with the gfp reporter not the LacZ reporter.  In addition, there are corresponding changes in how sequences in TADs to either side of eve interact with sequences to either side of the transgene insert.  

      Yet another “test” of compartments generated by block polymer co-segregation/co-repulsion is provided by the plume above the eve volcano triangle.  According to the compartment model, sequences in TADs flanking the eve locus form the plume above the eve volcano triangle because their chromatin shares properties that drive block polymer co-segregation.  These same properties result in repulsive interactions with chromatin in the eve TAD, and this would explain why the eve TAD doesn’t crosslink with its neighbors.  If the distinctive chromatin properties of eve and the neighboring TADs drive block polymer co-segregation and co-repulsion, then inverting the nhomie boundary or introducing homie in the forward orientation should have absolutely no effect on the physical interactions between chromatin in the eve TAD and chromatin in the neighboring TADs.  However, Figures 4 and 6 in this paper indicate that boundary pairing orientation, not block polymer co-segregation/co-repulsion, is responsible for forming the plume above the eve TAD. Other findings also appear to be inconsistent with the compartment model. (A) The plume topping the eve volcano triangle is present in NC14 embryos when eve is broadly expressed (and potentially active throughout the embryo).  It is also present in 12-16 hr embryos when eve is only expressed in a very small subset of cells and is subject to PcG silencing everywhere else in the embryo.  (B) According to the compartment model the precise patchwork pattern of physical interactions should depend upon the transcriptional program/chromatin state that is characteristic of a particular developmental stage or cell type.  As cell fate decisions are just being made during NC14 one might expect that most nuclei will share similar chromatin states throughout much of the genome.  This would not be true for 12-16 hr embryos.  At this stage the compartmental patchwork would be generated by a complex mixture of interactions in cells that have quite different transcriptional programs and chromatin states.  In this case, the patchwork pattern would be expected to become fuzzy as a given chromosomal segment would be in compartment A in one group of cells and in compartment B in another.  Unlike 12-16 hr embryos, larval wing discs would be much more homogeneous and likely give a distinct and relatively well resolved compartmental pattern. We’ve examined the compartment patchwork of the same chromosomal segments in NC14 embryos, 12-16 hr embryos and larval wing disc cells.  While there are some differences (e.g., changes in some of the BX-C TADs in the wing disc sample) the compartmental patchwork patterns are surprisingly similar in all three cases. Nor is there any “fuzziness” in the compartmental patterns evident in 12-16 hr embryos, despite the fact that there are many different cell types at this stage of development.  (C) TAD interactions with their neighbors and compartmental patchworks are substantially suppressed in salivary gland polytene chromosomes.  This would suggest that features of chromosome structure might be the driving force behind many of the “compartmental” interactions as opposed to distinct biochemical/biophysical properties of small chromosomal segments that drive polymer co-segregation/co-repulsion.

      (3) The contact maps presented in the study represent many cells and distinct cell types. It is clear from single-cell Hi-C and multiplexed FISH experiments that chromosome conformation is highly variable even within populations of the same cell, let alone between cell types, with structures such as TADs being entirely absent at the single cell level and only appearing upon pseudobulking. It is difficult to square these observations with the models of relatively static structures depicted here. The authors should provide commentary on this point.

      (2.5) As should be evident from Author response image 1, single-cell Hi-C experiments would not provide useful information about the physical organization of individual TADs, TAD boundaries or how individual TADs interact with their immediate neighbors.  In addition, since they capture only a very small fraction of the possible contacts within and between TADs, we suspect that these single-cell studies aren’t likely to be useful for making solid conclusions about TAD neighborhoods like those shown in Author response image 1 panels A, B, C and D, or Author response image 2.  While it might be possible to discern relatively stable contacts between pairs of insulators in single cells with the right experimental protocol, the stabilities/dynamics of these interactions may be better judged by the length of time that physical interactions are seen to persist in live imaging studies such as Chen et al. (2018), Vazquez et al. (2006) and Li et al. (2011).

      The in situ FISH data we’ve seen also seem problematic in that probe hybridization results in a significant decondensation of chromatin.  For two probe sets complementary to adjacent ~1.2 kb DNA sequences, the measured center-to-center distance that we’ve seen was ~110 nm.  This is about one third the length that is expected for a 1.2 kb naked DNA fragment, and about 1.7 times larger than that expected for a beads-on-a-string nucleosome array (~60 nm).  However, chromatin is thought to be compacted into a 30 nm fiber, which is estimated to reduce the length of DNA by at least another ~6 fold.  If this estimate is correct, FISH hybridization would appear to result in a ~10 fold decompaction of chromatin.  A decompaction of this magnitude would necessarily be followed by a significant distortion in the actual conformation of chromatin loops.
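
      The length comparison above can be sketched as a short back-of-the-envelope calculation. This is only an illustration under stated assumptions: the rise of 0.34 nm per base pair for naked B-form DNA is a standard textbook value that is not given in the response, the function and constant names are hypothetical, and the other inputs are the rounded figures quoted in the paragraph above, so the ratios are rough.

```python
# Back-of-the-envelope check of the FISH decompaction estimate.
# All names are illustrative; only the numeric inputs marked "quoted"
# come from the response text.

SEGMENT_BP = 1200              # quoted: ~1.2 kb probe target
RISE_PER_BP_NM = 0.34          # assumption: textbook rise per bp for B-form DNA
MEASURED_NM = 110.0            # quoted: measured centre-to-centre distance
BEADS_ON_STRING_NM = 60.0      # quoted: expected length of a nucleosome array
FIBER_EXTRA_COMPACTION = 6.0   # quoted: "at least another ~6 fold" for a 30 nm fiber


def decompaction_estimates():
    """Return the expected lengths (nm) and the ratios to the measured distance."""
    naked_nm = SEGMENT_BP * RISE_PER_BP_NM
    fiber_nm = BEADS_ON_STRING_NM / FIBER_EXTRA_COMPACTION
    return {
        "naked_dna_nm": naked_nm,
        "fiber_nm": fiber_nm,
        # fraction of the naked DNA length that the measured distance represents
        "measured_over_naked": MEASURED_NM / naked_nm,
        # how much longer the measured distance is than a nucleosome array
        "measured_over_beads": MEASURED_NM / BEADS_ON_STRING_NM,
        # apparent decompaction relative to a 30 nm fiber
        "measured_over_fiber": MEASURED_NM / fiber_nm,
    }


if __name__ == "__main__":
    for name, value in decompaction_estimates().items():
        print(f"{name}: {value:.2f}")
```

      With these inputs the last ratio comes out near 11, which is in line with the ~10 fold decompaction estimated above; the other two ratios depend on the assumed rise per base pair and on the rounding of the quoted lengths.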

      (4) The analysis of the Micro-C data appears to be largely qualitative. Key information about the number of reads sequenced, reads mapped, and data quality is not presented. No quantitative framework for identifying features such as the "plumes" is described. The study and its findings would be strengthened by a more rigorous analysis of these rich datasets, including the use of systematic thresholds for calling patterns of organization in the data.

      Additional information on the number of reads and data quality has been included in the methods section.

      (5) Related to Point 4, the lack of quantitative details about the Micro-C data make it difficult to evaluate if the changes observed are due to biological or technical factors. It is essential that the authors provide quantitative means of controlling for factors like sampling depth, normalization, and data quality between the samples.

      In our view the changes in the MicroC contact patterns for the eve locus and its neighbors when the nhomie boundary is manipulated are not only clear cut and unambiguous but are also readily evident in the Figs that are presented in the manuscript.  If the reviewer believes that there aren’t significant differences between the MicroC contact patterns for the four different nhomie replacements, it seems certain that they would also remain unconvinced by a quantitative analysis.

      The reviewer also suggests that biological and/or technical differences between the four samples could account for the observed changes in the MicroC patterns for the eve TAD and its neighbors.  If this were the case, then similar changes in MicroC patterns should be observed elsewhere in the genome.  Since much of the genome is analyzed in these MicroC experiments there is an abundance of internal controls for each experimental manipulation of the nhomie boundary.  For two of the nhomie replacements, nhomie reverse and homie forward, the plume above the eve volcano triangle is replaced by clouds surrounding the eve volcano triangle.  If these changes in the eve MicroC contact patterns are due to significant technical (or biological) factors, we should observe precisely the same sorts of changes in TADs elsewhere in the genome that are volcano triangles with plumes.  Author response image 6 shows the MicroC contact pattern for several genes in the Antennapedia complex.  The deformed gene is included in a TAD which, like eve, is a volcano triangle topped by a plume.  A comparison of the deformed MicroC contact patterns for nhomie forward (panel B) with the MicroC patterns for nhomie reverse (panel C) and homie forward (panel D) indicates that while there are clearly technical differences between the samples, these differences do not result in the conversion of the deformed plume into clouds as is observed for the eve TAD.  The MicroC patterns elsewhere in the Antennapedia complex are also very similar in all four samples.  Likewise, comparisons of regions elsewhere in the fly genome indicate that the basic contact patterns are similar in all four samples.  So while there are technical differences which are reflected in the relative pixel density in the TAD triangles and the LDC domains, these differences do not result in converting plumes into clouds nor do they alter the basic patterns of TAD triangles and LDC domains.  As for biological differences: the embryos in each sample are at roughly the same developmental stage and were collected and processed using the same procedures. Thus, the biological factors that could reasonably be expected to impact the organization of specific TADs (e.g., cell type specific differences) are not going to impact the patterns we see in our experiments.

      Author response image 6.

      (6) The ISH effects reported are modest, especially in the case of the HCR. The details provided for how the imaging data were acquired and analyzed are minimal, which makes evaluating them challenging. It would strengthen the study to provide much more detail about the acquisition and analysis and to include depiction of intermediates in the analysis process, e.g. the showing segmentation of stripes.

      The imaging analysis presented in Fig. 5 is just standard confocal microscopy.  Individual embryos were visualized and scored.  An embryo in which stripes could be readily detected was scored as ‘positive’ while an embryo in which stripes couldn’t be detected was scored as ‘negative.’

      Recommendations for the authors:

      Editor comments:

      It was noted that the Jaynes lab previously published extensive genetic evidence to support the stem loop and circle loop models of Homie-Nhomie interactions (Fujioka 2016 Plos Genetics) that were more convincing than the Micro-C data presented here in proof of their prior model. Maybe the authors could more clearly summarize their prior genetic results to further try to convince the reader about the validity of their model.

      Reviewer #1 (Recommendations For The Authors):

      Below, I list specific comments to further improve the manuscript for publication. Most importantly, I recommend the authors tone down their proposal that boundary pairing is a universal TAD forming mechanism.

      (1) The title is cryptic.

      (2) The second sentence in the abstract is an overstatement: "In flies, TADs are formed by physical interactions between neighboring boundaries". Hi-C and Micro-C studies have not provided evidence that most TADs in Drosophila show focal interactions between their bracketing boundaries. The authors rely too strongly on prior studies that used artificial reporter transgenes to show that multimerized insulator protein binding sites or some endogenous fly boundaries can mediate boundary bypass, as evidence that endogenous boundaries pair.

      Please see responses #1.1 and #1.3 and figures Author response image 1 and Author response image 3.  Note that using dHS-C, most TADs that we’ve looked at so far are topped by a “dot” at their apex.

      (3) Line 64: the references do not cite the stated "studies dating back to the '90's'".

      The papers cited for that sentence are reviews which discussed the earlier findings.  The relevant publications are cited at the appropriate places in the same paragraph.  

      (4) Line 93: "On the other hand, while boundaries have partner preferences, they are also promiscuous in their ability to establish functional interactions with other boundaries." It was unclear what is meant here.

      Boundaries that a) share binding sites for proteins that multimerize, b) have binding sites for proteins that interact with each other, or c) have binding sites for proteins that can be bridged by a third protein can potentially pair with each other.  However, while these mechanisms enable promiscuous pairing interactions, they will also generate partner preferences (through a greater number of a, b and/or c).

      (5) It could be interesting to discuss the fact that it remains unclear whether Nhomie and Homie pair in cis or in trans, given that homologous chromosomes are paired in Drosophila.

      The studies in Fujioka et al. (Fujioka et al. 2016) show that nhomie and homie can pair both in cis and in trans.  Given the results described in #1.2, we imagine that they are paired in both cis and trans in our experiments.

      (6) Line 321: Could the authors further explain why they think that "the nhomie reverse circle-loop also differs from the nhomie deletion (λ DNA) in that there is not such an obvious preference for which eve enhancers activate expression"?

      The likely explanation is that the topology/folding of the altered TADs impacts the probability of interactions between the various eve enhancers and the promoters of the flanking genes.  

      (7) The manuscript would benefit from shortening the long Discussion by avoiding repeating points described previously in the Results.

      (8) Line 495: "If, as seems likely, a significant fraction of the TADs genome-wide are circle loops, this would effectively exclude cohesin-based loop extrusion as a general mechanism for TAD formation in flies". The evidence provided in this manuscript appears insufficient to discard ample evidence from multiple laboratories that TADs form by compartmentalization or loop extrusion. Multiple laboratories have, for example, demonstrated that cohesin depletion disrupts a large fraction of mammalian TADs. 

      Points made here and in #9 have been responded to in #1.1, #2.1 and #2.4 above.  We would suggest that the evidence for loop extrusion falls short of compelling (as it is based on the analysis of TAD neighborhoods, not TADs; that is, forests, not trees) and given the results reported in Goel et al. (in particular Fig. 4 and Sup Fig. 8) is clearly suspect. This is not to mention the fact that cohesin loop-extrusion can’t generate circle-loop TADs, yet circle-loops clearly exist.  Likewise, as discussed in #2.4, it is not clear to us that the shared chromatin states, polymer co-segregation and co-repulsion account for the compartmental patchwork patterns of TAD:TAD interactions. The results from the experimental manipulations in this paper and the accompanying paper, together with studies by others (e.g., Kyrchanova et al. (Kyrchanova et al. 2008), Mohana et al. (Mohana et al. 2023)), would also seem to be at odds with the model for compartments as currently formulated.

      The unique properties of Nhomie and Homie, namely the remarkable specificity with which they physically pair over large distances (Fujioka et al. 2016) may rather suggest that boundary pairing is a phenomenon restricted to special loci. Moreover, it has not yet been demonstrated that Nhomie or Homie are also able to pair with the TAD boundaries on their left or right, respectively.

      Points made here were discussed in detail in #1.2.  As described in detail in #1.2, it is not the case that nhomie and homie are “unique” or “special.”  Other fly boundaries can do the same things.  As for whether nhomie and homie pair with their neighbors:  We haven’t done transgene experiments (e.g., testing by transvection or boundary bypass).  Likewise, in MicroC experiments there are no obvious dots at the apex of the neighboring TADs that would correspond to nhomie pairing with the neighboring boundary to the left and homie pairing with the neighboring boundary to the right. However, this is to be expected. As we discussed in #1.3 above, only MNase resistant elements will generate dots in standard MicroC experiments.  On the other hand, when boundary:boundary interactions are analyzed by dHS-C (c.f., Author response image 4), there are dots at the apex of both neighboring TADs.  This would be direct evidence that nhomie pairs with the neighboring boundary to the left and homie pairs with the neighboring boundary to the right.

      (9) The comment in point 8 also applies to the concluding 2 sentences (lines 519-524) of the Discussion.

      See response to 8 above. Otherwise, the concluding sentences are completely accurate. Validation of the cohesin loop extrusion/CTCF roadblock model will require demonstrating a) that all TADs are either stem-loops or unanchored loops and b) that TAD endpoints are always marked by CTCF.

      The likely presence of circle-loops and the evidence for TAD boundaries that don’t have CTCF (c.f., Goel et al. 2023) already suggest that this model can’t (either fully or at all) account for TAD formation in mammals.

      (10) Figs. 3 and 6: It would be helpful to add the WT screenshot in the same figure, for direct comparison.

      It is easy enough to scroll between Figs, especially since nhomie forward looks just like WT.

      (11) Fig. 6: It would be helpful to show a cartoon view of a circle loop to the right of the Micro-C screenshot, as was done in Fig. 3.

      Good idea.   Added to the Fig.

      (12) Fig. 5: It would be helpful to standardize the labelling of the different genotypes throughout the figures and panels ("inverted" versus "reverse" versus an arrow indicating the direction).

      Fixed.

      Reviewer #2 (Recommendations For The Authors):

      Minor Points:

      (1) The Micro-C data does not appear to be deposited in an appropriate repository. It would be beneficial to the community to make these data available in this way.

      This has been done.

      (2) Readers not familiar with Drosophila development would benefit from a gentle introduction to the stages analyzed and some brief discussion on how the phenomenon of somatic homolog pairing might influence the study, if at all.

      We included a rough description of the stages that were analyzed for both the in situs and MicroC. We thought that an actual description of what is going on at each of the stages wasn’t necessary as the process of development is not a focus of this manuscript.  In other studies, we’ve found that there are only minor differences in MicroC patterns between the blastoderm stage and stage 12-16 embryos.  While these minor differences are clearly interesting, we didn’t discuss them in the text.  In all of the experiments chromosomes are likely to be paired.  In NC14 embryos (the stage for visualizing eve stripes and the MicroC contact profiles in Fig. 2) replication of euchromatic sequences is thought to be quite rapid.  While homolog pairing is incomplete at this stage, sister chromosomes are paired.  In stage 12-16 embryos, homologs will be paired and if the cells are arrested in G2, then sister chromosomes will also be paired.  So in all of the experiments, chromosomes (sisters and/or homologs) are paired. However, since we don’t have examples of unpaired chromosomes, our experiments don’t provide any info on how chromosome pairing might impact MicroC/expression patterns.

      (3) "P > 0.01" appears several times. I believe the authors mean to report "P < 0.01".

      Fixed.  

      References for Response

      Batut PJ, Bing XY, Sisco Z, Raimundo J, Levo M, Levine MS. 2022. Genome organization controls transcriptional dynamics during development. Science. 375(6580):566-570.

      Bing X, Ke W, Fujioka M, Kurbidaeva A, Levitt S, Levine M, Schedl P, Jaynes JB. 2024. Chromosome structure i: Loop extrusion or boundary:Boundary pairing? eLife.

      Blanton J, Gaszner M, Schedl P. 2003. Protein:Protein interactions and the pairing of boundary elements in vivo. Genes Dev. 17(5):664-675.

      Bonchuk A, Boyko K, Fedotova A, Nikolaeva A, Lushchekina S, Khrustaleva A, Popov V, Georgiev P. 2021. Structural basis of diversity and homodimerization specificity of zinc-finger-associated domains in drosophila. Nucleic Acids Res. 49(4):2375-2389.

      Bonchuk A, Kamalyan S, Mariasina S, Boyko K, Popov V, Maksimenko O, Georgiev P. 2020. N-terminal domain of the architectural protein ctcf has similar structural organization and ability to self-association in bilaterian organisms. Sci Rep. 10(1):2677.

      Chen H, Levo M, Barinov L, Fujioka M, Jaynes JB, Gregor T. 2018. Dynamic interplay between enhancer–promoter topology and gene activity. Nat Genet. 50(9):1296.

      Fedotova AA, Bonchuk AN, Mogila VA, Georgiev PG. 2017. C2h2 zinc finger proteins: The largest but poorly explored family of higher eukaryotic transcription factors. Acta Naturae. 9(2):47-58.

      Fujioka M, Ke W, Schedl P, Jaynes JB. 2024. The homie insulator has sub-elements with different insulating and long-range pairing properties. bioRxiv. 2024.02.01.578481.

      Fujioka M, Mistry H, Schedl P, Jaynes JB. 2016. Determinants of chromosome architecture: Insulator pairing in cis and in trans. PLoS Genet. 12(2):e1005889.

      Galloni M, Gyurkovics H, Schedl P, Karch F. 1993. The bluetail transposon: Evidence for independent cis‐regulatory domains and domain boundaries in the bithorax complex. The EMBO Journal. 12(3):1087-1097.

      Gaszner M, Vazquez J, Schedl P. 1999. The zw5 protein, a component of the scs chromatin domain boundary, is able to block enhancer-promoter interaction. Genes Dev. 13(16):2098-2107.

      Goel VY, Huseyin MK, Hansen AS. 2023. Region capture micro-c reveals coalescence of enhancers and promoters into nested microcompartments. Nat Genet. 55(6):1048-1056.

      Harris HL, Gu H, Olshansky M, Wang A, Farabella I, Eliaz Y, Kalluchi A, Krishna A, Jacobs M, Cauer G et al. 2023. Chromatin alternates between a and b compartments at kilobase scale for subgenic organization. Nat Commun. 14(1):3303.

      Hart CM, Zhao K, Laemmli UK. 1997. The scs' boundary element: Characterization of boundary element-associated factors. Mol Cell Biol. 17(2):999-1009.

      Hsieh TS, Cattoglio C, Slobodyanyuk E, Hansen AS, Rando OJ, Tjian R, Darzacq X. 2020. Resolving the 3d landscape of transcription-linked mammalian chromatin folding. Mol Cell. 78(3):539-553.e538.

      Krietenstein N, Abraham S, Venev SV, Abdennur N, Gibcus J, Hsieh TS, Parsi KM, Yang L, Maehr R, Mirny LA et al. 2020. Ultrastructural details of mammalian chromosome architecture. Mol Cell. 78(3):554-565.e557.

      Kyrchanova O, Chetverina D, Maksimenko O, Kullyev A, Georgiev P. 2008. Orientation-dependent interaction between drosophila insulators is a property of this class of regulatory elements. Nucleic Acids Res. 36(22):7019-7028.

      Kyrchanova O, Ibragimov A, Postika N, Georgiev P, Schedl P. 2023. Boundary bypass activity in the abdominal-b region of the drosophila bithorax complex is position dependent and regulated. Open Biol. 13(8):230035.

      Kyrchanova O, Kurbidaeva A, Sabirov M, Postika N, Wolle D, Aoki T, Maksimenko O, Mogila V, Schedl P, Georgiev P. 2018. The bithorax complex iab-7 polycomb response element has a novel role in the functioning of the fab-7 chromatin boundary. PLoS Genet. 14(8):e1007442.

      Kyrchanova O, Sabirov M, Mogila V, Kurbidaeva A, Postika N, Maksimenko O, Schedl P, Georgiev P. 2019a. Complete reconstitution of bypass and blocking functions in a minimal artificial fab-7 insulator from drosophila bithorax complex. Proceedings of the National Academy of Sciences.201907190.

      Kyrchanova O, Wolle D, Sabirov M, Kurbidaeva A, Aoki T, Maksimenko O, Kyrchanova M, Georgiev P, Schedl P. 2019b. Distinct elements confer the blocking and bypass functions of the bithorax fab-8 boundary. Genetics.genetics. 302694.302019.

      Kyrchanova O, Zolotarev N, Mogila V, Maksimenko O, Schedl P, Georgiev P. 2017. Architectural protein pita cooperates with dctcf in organization of functional boundaries in bithorax complex. Development. 144(14):2663-2672.

      Li H-B, Muller M, Bahechar IA, Kyrchanova O, Ohno K, Georgiev P, Pirrotta V. 2011. Insulators, not polycomb response elements, are required for long-range interactions between polycomb targets in drosophila melanogaster. Mol Cell Biol. 31(4):616-625.

      Li X, Tang X, Bing X, Catalano C, Li T, Dolsten G, Wu C, Levine M. 2023. Gaga-associated factor fosters loop formation in the drosophila genome. Mol Cell. 83(9):1519-1526.e1514.

      Mohana G, Dorier J, Li X, Mouginot M, Smith RC, Malek H, Leleu M, Rodriguez D, Khadka J, Rosa P et al. 2023. Chromosome-level organization of the regulatory genome in the drosophila nervous system. Cell. 186(18):3826-3844.e3826.

      Muller M, Hagstrom K, Gyurkovics H, Pirrotta V, Schedl P. 1999. The mcp element from the drosophila melanogaster bithorax complex mediates long-distance regulatory interactions. Genetics. 153(3):1333-1356.

      Muravyova E, Golovnin A, Gracheva E, Parshikov A, Belenkaya T, Pirrotta V, Georgiev P. 2001. Loss of insulator activity by paired su(hw) chromatin insulators. Science. 291(5503):495-498.

      Postika N, Metzler M, Affolter M, Müller M, Schedl P, Georgiev P, Kyrchanova O. 2018. Boundaries mediate long-distance interactions between enhancers and promoters in the drosophila bithorax complex. PLoS Genet. 14(12):e1007702.

      Samal B, Worcel A, Louis C, Schedl P. 1981. Chromatin structure of the histone genes of d. Melanogaster. Cell. 23(2):401-409.

      Sigrist CJ, Pirrotta V. 1997. Chromatin insulator elements block the silencing of a target gene by the drosophila polycomb response element (pre) but allow trans interactions between pres on different chromosomes. Genetics. 147(1):209-221.

      Udvardy A, Schedl P. 1984. Chromatin organization of the 87a7 heat shock locus of drosophila melanogaster. J Mol Biol. 172(4):385-403.

      Vazquez J, Muller M, Pirrotta V, Sedat JW. 2006. The mcp element mediates stable long-range chromosome-chromosome interactions in drosophila. Molecular Biology of the Cell. 17(5):2158-2165.

      Zolotarev N, Fedotova A, Kyrchanova O, Bonchuk A, Penin AA, Lando AS, Eliseeva IA, Kulakovskiy IV, Maksimenko O, Georgiev P. 2016. Architectural proteins pita, zw5, and zipic contain homodimerization domain and support specific long-range interactions in drosophila. Nucleic Acids Res. 44(15):7228-7241.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the manuscript titled "Vangl2 suppresses NF-κB signaling and ameliorates sepsis by targeting p65 for NDP52-mediated autophagic degradation" by Lu et al, the authors show that Vangl2, a planar cell polarity component, plays a direct role in autophagic degradation of NFkB-p65 by facilitating its ubiquitination via PDLIM2 and subsequent recognition and autophagic targeting via the autophagy adaptor protein NDP52. Conceptually it is a wonderful study with excellent execution of experiments and controls. The concerns with the manuscript are mainly on two counts. The first issue is the kinetics of p65 regulation reported here, which does not fit the kinetics of the proposed mechanism, i.e., Vangl2-mediated ubiquitination followed by autophagic degradation of p65. The second issue is more technical: an absolute lack of quantitative analyses. The authors rely mostly on visual qualitative interpretation to assess an increase or decrease in associations between partner molecules throughout the study. While the overall mechanism is interesting, the authors should address these concerns as highlighted below:

      Major points:

      (1) Kinetics of p65 regulation by Vangl2: As mentioned above, the authors report that LPS stimulation leads to higher IKK and p65 activation in the absence of Vangl2. The mechanism of action the authors subsequently work out is that Vangl2 helps recruit the E3 ligase PDLIM2 to p65, which causes K63 ubiquitination, which is recognised by NDP52 for autophagic targeting. Curiously, peak p65 activation is achieved within 30 minutes of LPS stimulation. The time scale of all other assays is much longer. It is not clear that in WT cells, p65 could be targeted for autophagic degradation in a Vangl2-dependent manner within 30 minutes. The HA-Myc-Flag-based overexpression and Co-IP studies do confirm the interactions as proposed. However, they do not prove that this mechanism was responsible for the Vangl2-mediated modulation of p65 activation upon LPS stimulation. Moreover, the Vangl2 KO line also shows increased IKK activation. The authors do not show the cause behind increased IKK activation, which in itself can trigger increased p65 phosphorylation.

      We thank the reviewer for this valuable suggestion.

      Indeed, we agree with the reviewer that peak p65 activation is achieved within 30 minutes of LPS stimulation in vitro, and that p65 could not be targeted for autophagic degradation in a Vangl2-dependent manner within 30 minutes. Given that the protein and mRNA levels of Vangl2 were elevated at 3-6 h of LPS stimulation (Fig. S1 C-E), we extended the stimulation time scale in the revised manuscript. The data (Fig. 2A-D in the revised manuscript) demonstrated that IKK phosphorylation was enhanced in Vangl2 KO myeloid cells during the early phase (within 3 h) of LPS stimulation, but not during prolonged LPS stimulation. The underlying mechanism may be complex. Only p65 phosphorylation was continuously enhanced after long-term LPS stimulation in Vangl2 KO cells, compared to WT cells. Furthermore, the overexpression of Vangl2 in A549 cells also reduced phosphorylated and total endogenous p65 (Fig. 2 I, J in the revised manuscript). These findings were corroborated by overexpression and Co-IP experiments, which collectively indicated that Vangl2 regulates the stability of p65 by promoting its interaction with NDP52 and autophagic degradation. (Page 7; Line 183-185).

      (2) The other major concern is regarding the lack of quantitative assessments. For Co-IP experiments, I can understand it is qualitative observation. However, when the authors infer that there is an increase or decrease in the association through co-IP immunoblots, it should also be quantified, especially since the differences are quite marginal and could be easily misinterpreted.

      We are grateful to the reviewer for this suggestion. The quantitative analysis has been updated in the revised version.

      (3) Figure 4E and F: It is evident that inhibiting Autolysosome (CQ or BafA1) or autophagy (3MA) led to the recovery of p65 levels and inducing autophagy by Rapamycin led to faster decay in p65 levels. Did the authors also note/explore the possibility that Vangl2 itself may be degraded via the autophagy pathway? IB of WCL upon CQ/BAF/3MA or upon Rapa treatment does indicate the same. If true, how would that impact the dynamics of p65 activation?

      We thank the reviewer for this question. Previous studies have shown that Vangl2 is primarily degraded by the proteasome pathway, rather than by the autolysosomal pathway (doi: 10.1126/sciadv.abg2099; doi: 10.1038/s41598-019-39642-z). In our experiments, Vangl2 recruits the E3 ligase PDLIM2 to enhance K63-linked ubiquitination of p65, which serves as a recognition signal for selective autophagic degradation mediated by the cargo receptor NDP52. Vangl2 facilitated the interaction between p65 and NDP52, yet did not itself undergo significant autophagic degradation.

      (4) Autophagic targeting of p65 should also be shown through alternate evidence, like microscopy etc., in the LPS-stimulated WT cells.

      We thank the reviewer for this suggestion. We have added the data (co-localization of p65 and LC3 was detected by immunofluorescence) in the revised version (Fig. S4 H in the revised manuscript). (Page 9, lines 267-268)

      Reviewer #2 (Public Review):

      Vangl2, a core planar cell polarity protein involved in Wnt/PCP signaling, mediates cell proliferation, differentiation, homeostasis, and cell migration. Vangl2 malfunctioning has been linked to various human ailments, including autoimmune and neoplastic disorders. Interestingly, Vangl2 was shown to interact with the autophagy regulator p62, and indeed, autophagic degradation limits the activity of inflammatory mediators such as p65/NF-κB. However, whether Vangl2, per se, contributes to restraining aberrant p65/NF-κB activity remains unclear.

      In this manuscript, Lu et al. describe that Vangl2 expression is upregulated in human sepsis-associated PBMCs and that Vangl2 mitigates experimental sepsis in mice by negatively regulating p65/NF-κB signaling in myeloid cells. Vangl2 recruits the E3 ubiquitin ligase PDLIM2 to promote K63-linked poly-ubiquitination of p65. Vangl2 also facilitates the recognition of ubiquitinated p65 by the cargo receptor NDP52. These molecular processes cause selective autophagic degradation of p65. Indeed, abrogation of PDLIM2 or NDP52 functions rescued p65 from autophagic degradation, leading to extended p65/NF-κB activity.

      As such, the manuscript presents a substantial body of interesting work and a novel mechanism of NF-κB control. If found true, the proposed mechanism may expand therapeutic opportunities for inflammatory diseases. However, the current draft has significant weaknesses that need to be addressed.

      We appreciate the reviewer’s comments on our manuscript, and we have further improved the manuscript as suggested.

      Specific comments

      (1) Vangl2 deficiency did not cause a discernible increase in the cellular level of total endogenous p65 (Fig 2A and Fig 2B) but also led to the accumulation of phosphorylated IKK.

      Even Fig 4D reveals that Vangl2 exerts a rather modest effect on the total p65 level and the figure does not provide any standard error for the quantified data. Therefore, these results do not fully support the proposed model (Figure 7) - this is a significant drawback. Instead, these data provoke an alternate hypothesis that Vangl2 could be specifically mediating autophagic removal of phosphorylated p65 and phosphorylated IKK, leading to exacerbated inflammatory NF-κB response in Vangl2-deficient cells. One may need to use phosphorylation-defective mutants of p65, at least in the over-expression experiments, to dissect between these possibilities.

      We appreciate the reviewer’s comments on our manuscript, and we have further improved the manuscript as suggested.

      (1) Indeed, we agree with the reviewer that Vangl2 deficiency did not cause a discernible increase in the cellular level of total p65 after a short period of LPS stimulation in vitro, and that p65 could not be targeted for autophagic degradation in a Vangl2-dependent manner within 30 minutes. Given that the protein and mRNA levels of Vangl2 were elevated at 3-6 h of LPS stimulation (Fig. S1 C-E), we extended the stimulation time scale in the revised manuscript. The data (Fig. 2A-D in the revised manuscript) demonstrated that IKK phosphorylation was enhanced in Vangl2 KO myeloid cells during the early phase (within 3 h) of LPS stimulation, but not during prolonged LPS stimulation. The underlying mechanism may be complex. Only phosphorylated p65 and total endogenous p65 were continuously elevated after long-term LPS stimulation in Vangl2 KO cells, compared to WT cells. Furthermore, the overexpression of Vangl2 in A549 cells also reduced phosphorylated and total endogenous p65 (Fig. 2 I, J in the revised manuscript). These findings were corroborated by overexpression and Co-IP experiments, which collectively indicated that Vangl2 regulates the stability of p65 by promoting its interaction with NDP52 and autophagic degradation. (Page 7; Line 183-185).

      (2) Similarly, the stimulation time scale in Fig 4D was extended, and it was demonstrated that p65 was more stable in Vangl2-deficient cells.

      (3) Moreover, we constructed a phosphorylation-defective mutant of p65 (S536A), and found that Vangl2 could also promote the degradation of this p65 phosphorylation mutant (Fig. S4 A, B in the revised manuscript). Thus, Vangl2 promotes the degradation of basal/unphosphorylated p65. (Page 8, lines 237-240)

      (2) Fig 1A: The data indicates the presence of two subgroups within the sepsis cohort - one with high Vangl2 expressions and the other with relatively normal Vangl2 expression. Was there any difference with respect to NF-κB target inflammatory gene expressions between these subgroups?

      As suggested, we analyzed NF-kB target inflammatory gene expression in the high and relatively low Vangl2 expression groups of sepsis patients. The results showed that the high Vangl2 expression group exhibited lower levels of IL-6, WBC, and CRP than the low Vangl2 expression group, which suggested an inverse correlation between Vangl2 and the inflammatory response (Fig. S1 A in the revised manuscript) (Page 5, lines 126-128).

      (3) The effect of Vangl2 deficiency was rather modest in the neutrophil. Could it be that Vangl2 mediates its effect mostly in macrophages?

      As shown in Fig. S1C-E, the induction of Vangl2 by LPS stimulation is more rapid in macrophages than in neutrophils. This may contribute to its dominant effect in macrophages. Consequently, we primarily focused our investigation on the role of Vangl2 in macrophages.

      (4) Fig 1D and Figure 1E: Data for unstimulated Vangl2 cells should be provided. Also, the source of the IL-1β primary antibody has not been mentioned.

      Thank you for the suggestion. We have updated the data for unstimulated cells in the revised manuscript (Fig. 1 D, E in the revised manuscript). Also, IL-1β primary antibody was purchased from Cell Signaling Technology and the information has been included in the Materials and Methods section (Table S1).

      (5) The relevance and the requirement of RNA-seq analysis are not clear in the present draft. Figure 1E already reveals upregulation of the signature NF-κB target inflammatory genes upon Vangl2 deficiency.

      We agree with the reviewer that the data presented in Figure 1E demonstrated the upregulation of the signature NF-kB target inflammatory genes upon Vangl2 deficiency in a murine model of LPS-induced sepsis. Subsequently, we proceeded to investigate the mechanism by which Vangl2 regulates NF-kB target inflammatory genes at the cellular level in Figure 2. To this end, we performed RNA-seq analysis to screen signal pathways involved in LPS-induced septic shock by comparing LPS-stimulated BMDMs from Vangl2ΔM and WT mice, and found that the TNF signaling pathway and cytokine-cytokine receptor interaction were significantly enriched in Vangl2ΔM BMDMs upon LPS stimulation. This analysis provides further evidence that Vangl2 plays a role in regulating NF-kB signaling pathways and the release of related inflammatory cytokines.

      (6) Fig 2A reveals an increased accumulation of phosphorylated p65 and IKK in Vangl2-deficient macrophages upon LPS stimulation within 30 minutes. However, Vangl2 accumulates at around 60 minutes post-stimulation in WT cells. Similar results were obtained for neutrophils (Fig 2B). There appears to be a temporal disconnect between Vangl2 and phosphorylated p65 accumulation - this must be clarified.

      This concern has been addressed above (see the response to question 1 from reviewer #2).

      (7) Figure 2E and 2F do not have untreated controls. Presentations in Fig 2E may be improved to more clearly depict IL6 and TNF data, preferably with separate Y-axes.

      Thank you for the suggestion. We have added untreated controls and separated Y-axes for IL-6 and TNF data in the revised manuscript (Fig. 2 E, F in the revised manuscript).

      (8) Line 219: "strongly with IKKα, p65 and MyD88, and weak" - should be revised.

      We have revised the manuscript as suggested (Page 7; Line 203).

      (9) It is not clear why IKKβ was excluded from interaction studies in Fig S3G.

      We added the Co-IP experiment and showed that HA-tagged Vangl2 interacted only with Flag-tagged p65, but not with Flag-tagged IKKβ, in 293T cells (Fig S3H). Furthermore, endogenous co-IP immunoblot analyses showed that Vangl2 did not associate with IKKβ (Fig. S3I).

      (10) Fig 3F- In the text, authors mentioned that Vangl2 strongly associates with p65 upon LPS stimulation in BMDM. However, no controls, including input or another p65-interacting protein, were used.

      As the reviewer suggested, we have added input and a positive control (IκBα) in this experiment (Fig. 3F in the revised manuscript). The results demonstrated that the interaction between p65 and IκBα was attenuated, although total IκBα did not undergo significant degradation over the long-term course of LPS stimulation.

      (11) Figure 4D - Authors claim that Vangl2-deficient BMDMs stabilized the expression of endogenous p65 after LPS treatment. However, p65 levels were particularly constitutively elevated in knockout cells, and LPS signaling did not cause any further upregulation. This again indicates the role of Vangl2 in the basal state. The authors need to explain this and revise the text accordingly.

      We thank the reviewer for the comments. We repeated the experiment to ascertain whether Vangl2 could stabilize the expression of endogenous p65 before and after LPS treatment. It was found that, due to the extremely low expression of Vangl2 in WT cells in the absence of stimulation, there was no observable difference in the basal level of p65 between WT and Vangl2ΔM cells. However, upon prolonged LPS stimulation, Vangl2 expression was induced, resulting in p65 degradation in WT cells. In contrast, p65 protein was more stable in Vangl2-deficient cells after LPS stimulation (Fig. 4D in the revised manuscript).

      Reviewer #3 (Public Review):

      Lu et al. describe Vangl2 as a negative regulator of inflammation in myeloid cells. The primary mechanism appears to be through binding p65 and promoting its degradation, albeit in an unusual autolysosome/autophagy dependent manner. Overall, the findings are novel and the crosstalk of PCP pathway protein Vangl2 with NF-kappaB is of interest. ... Regardless, Vangl2 as a negative regulator of NF-kappaB is an important finding. There are, however, some concerns about methodology and statistics that need to be addressed.

      Thank you for your comments on our manuscript, and we have further improved the manuscript as suggested.

      (1) Whether PCP is in any way relevant or if this is a PCP-independent function of Vangl2 is not directly explored (the latter appears more likely from the manuscript/discussion). PCP pathways intersect often with developmentally important pathways such as WNT, HH/GLI, Fat-Dachsous and even mechanical tension. It might be of importance to investigate whether Vangl2-dependent NF-kappaB is influenced by developmental pathways.

      We thank the reviewer for the insightful comments. Our study revealed that Vangl2 recruits the E3 ubiquitin ligase PDLIM2 to facilitate K63-linked ubiquitination of p65, which is subsequently recognized by the autophagy receptor NDP52, promoting the autophagic degradation of p65. Our experiments using autophagy inhibitors and autophagy-deficient cells indicate that Vangl2 regulates NF-kB signaling through a selective autophagic pathway, rather than by affecting the PCP pathway, WNT, HH/GLI, Fat-Dachsous or even mechanical tension. Moreover, a discussion section has been added to the revised version. (Page 12, lines 377-393)

      (2) Are Vangl2 phosphorylations (S5, S82 and S84) in any way necessary for the observed effects on NF-kappaB or would a phospho-mutant (alanine substitution mutant) Vangl2 phenocopy WT Vangl2 for regulation of NF-kappaB?

      As suggested, we generated phospho-mutants of Vangl2 (S82/84A) and observed that Vangl2 (S82/84A) could still facilitate the degradation of p65 (Fig. S4 B in the revised manuscript), suggesting that Vangl2 regulates the NF-kB pathway independently of its phosphorylation.

      (3) Another area to strengthen might be with regards to specificity of cell types where this phenomenon may be observed. LPS treatment in mice resulted in Vangl2 upregulation in spleen and lymph nodes, but not in lung and liver. What explains the specificity of organ/cell-type Vangl2 upregulation and its consequences observed here? Why is NF-kappaB signaling not more broadly or even ubiquitously affected in all cell types in a Vangl2-dependent manner, rather than being restricted to macrophages, neutrophils and peritoneal macrophages, or, for that matter, in spleen and LN and not liver and lung? After all, one may think that the PCP proteins, as well as NF-kappaB, are ubiquitous.

      We thank the reviewer for the comments.

      (1) LPS is an important mediator that triggers sepsis with excessive immune activation. As is well known, the spleen and lymph nodes are important peripheral immune organs, where immune cells (e.g., macrophages) are abundant and respond sensitively to LPS stimulation. In contrast, immune cells represent a minor fraction of the lungs and liver. Consequently, Vangl2, as a pivotal regulator of immune function, exhibits a more pronounced increase in immune organs and cells.

      (2) Induction of Vangl2 expression by LPS stimulation is cell specific. Given that different cells exhibit varying protein abundances, the molecular events involved may also differ. Moreover, we observed high Vangl2 expression in the liver at the basal state (Author response image 1), whereas it was not induced after 12 h of LPS stimulation. Therefore, Vangl2 shows a significant functional phenotype in macrophages and neutrophils, and in spleen and LN, rather than in liver or lung cells.

      Author response image 1.

      Vangl2 showed no significant changes in the liver after LPS treatment. Mice (n≥3) were treated with LPS (30 mg/kg, i.p.). Livers were collected at 12 h after LPS treatment. Immunoblot analysis of Vangl2.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      General points:

      Figure 4G: panels appear mislabeled. Please correct.

      We have corrected this mislabeling as you suggested.

      The dynamics of Vangl2 interaction with p65 and autophagy adaptors is not clear/apparent. For example, Vangl2 expression destabilises p65 levels (as in Fig. 4), but in Fig. 5, it seems there is no decline in the p65 protein level, and a large fraction of it coprecipitates with NDP52.

      We appreciate the reviewer’s comments. In the co-IP assay, we used the lysosomal inhibitor CQ to inhibit p65 degradation to observe the interaction between p65 and NDP52 or Vangl2.

      Fig 5E- I would expect p65 levels to be lower in WT cells than Vangl2 KO cells. But as such, there is no difference between the two.

      We appreciate the reviewer’s comments. We repeated the experiments and updated the data. Firstly, Vangl2 was not induced in WT cells in the absence of LPS stimulation, thus there was no difference in p65 expression between the two groups at the basal level. Secondly, we used CQ/Baf-A1 to inhibit the degradation of Vangl2 in the co-IP assay to observe the interaction between p65 and other molecules.

      Reviewer #2 (Recommendations For The Authors):

      A few points that can be looked at and revised.

      (1) Quantification of the presented data is needed for Fig 4D and Fig 4E.

      We have added the quantitative analysis as suggested.

      (2) The labeling of Fig 4G should be scrutinized.

      We have corrected this mislabeling as you suggested.

      (3) Fig 6B and Fig 6C should be explained in the result section more elaborately.

      We thank the reviewer for the suggestion, and we have rephrased this sentence to better describe the results. (Page 10, lines 306-313)

      (4) Line 85: "Vangl2 mediated downstream of Toll-like or interleukin (IL)-1" - unclear.

      We appreciate the reviewer’s comments on our manuscript, and we have revised the manuscript as suggested. (Page 3, line 68)

      (5) Line 181: "mice. Differentially expression analysis" - this should be revised.

      We appreciate the reviewer’s comments on our manuscript, and we have revised the manuscript as suggested. (Page 11, line 323)

      (6) Line 261-264- CHX-chase assay showed the degradation rate of p65 in Vangl2-deficient BMDM was slower compared with WT cells. However, Vangl2 is not induced in WT BMDMs upon CHX treatment (Fig. S4B).

      We appreciate the reviewer’s comments on our manuscript, and we have revised the manuscript as suggested (Fig. S4D).

      (7) Finally, some editing to provide data only critical for the conclusions could improve the ease of reading.

      We have further improved the revised manuscript as suggested.

      Reviewer #3 (Recommendations For The Authors):

      Comments (general, please address at least in Discussion. Some experimental data, for example the role, if any, of Vangl2 phosphorylations will be very useful):

      (1) It might be interesting to explore whether there are any potential effects of developmental pathways on the observed effect mediated by Vangl2 or if the effects are entirely a PCP-independent function of Vangl2. Please see above public review.

      We thank the reviewer for the insightful comments. Our study revealed that Vangl2 recruits the E3 ubiquitin ligase PDLIM2 to facilitate K63-linked ubiquitination of p65, which is subsequently recognized by the autophagy receptor NDP52, promoting the autophagic degradation of p65. Our experiments using autophagy inhibitors and autophagy-deficient cells indicate that Vangl2 regulates NF-kB signaling through a selective autophagic pathway, rather than by affecting the PCP pathway, WNT, HH/GLI, Fat-Dachsous or even mechanical tension. Furthermore, we generated phospho-mutants of Vangl2 (S82/84A) and observed that Vangl2 (S82/84A) could still facilitate the degradation of p65 (Fig. S4 B), suggesting that Vangl2 regulates the NF-kB pathway independently of its phosphorylation. In addition, a discussion section has been added to the revised version. (Page 12, lines 377-393)

      (2) What explains the specificity of organ/cell-type Vangl2 upregulation and its consequences observed here? Why is NF-kappaB signaling not more broadly or even ubiquitously affected in all cell types in a Vangl2-dependent manner, rather than being restricted to macrophages, neutrophils and peritoneal macrophages, or, for that matter, in spleen and LN and not liver and lung? After all, one may think that the PCP proteins, as well as NF-kappaB, are ubiquitous.

      We thank the reviewer for the comments. A similar question has been addressed above (see the response to question 3 of reviewer #3).

      (3) Another specificity-related question that comes to mind is whether the Vangl2 function in autolysosomal/autophagic degradation is restricted to p65 as the exclusive substrate? The cytosolic targeting of p65 as opposed to the more well-known nuclear-targeting is interesting.

      Our previous study demonstrated that Vangl2 inhibits antiviral IFN-I signaling by targeting TBK1 for autophagic degradation (doi: 10.1126/sciadv.adg2339), indicating that p65 is not the sole substrate for Vangl2. However, in the NF-kB pathway, p65 is a specific substrate for Vangl2. Moreover, our findings indicate that the interaction between Vangl2 and p65 occurs predominantly in the cytoplasm, rather than in the nucleus (Fig. S4 C).

      (4) Pharmacological approach is used to tease apart autolysosome versus proteasome pathway. What is the physiological importance of autophagic degradation? It is interesting to note that Vangl2 was already previously implicated in degrading LAMP-2A and increasing chaperon-mediated autophagy (CMA)-lysosome numbers (PMID: 34214490).

      Previous literature has demonstrated that Vangl2 can inhibit CMA degradation (PMID: 34214490). However, in our study, we found that Vangl2 can promote the selective autophagic degradation of p65. It is important to note that CMA degradation and selective autophagic degradation are two distinct degradation modes, so these findings are not contradictory.

      (5) Are these phenotypes discernible in heterozygotes or only when ablated in homozygosity? Any phenotypes recapitulated in the looptail heterozygote mice?

      We found that these phenotypes were discernible only in homozygotes.

      (6) What is the conservation of the Vangl2 p65-interaction site between Vangl2 and Vangl1? PDLIM2 recruitment between Vangl2 and Vangl1?

      We appreciate the reviewer’s comments on our manuscript. Previous studies have shown that human Vangl1 and Vangl2 share only 72% identity and have distinct functional properties (doi: 10.1530/ERC-14-0141). Thus, the interaction of Vangl2 with p65 and PDLIM2 recruitment may not necessarily occur in Vangl1.

      Comments (specific to experiments and data analyses. Please address the following):

      (7) The patient population used in Fig 1 is not described in the Methods. This is a critical omission. Were age, sex etc. controlled for between healthy and disease? How was the diagnosis made? What times during sepsis were the samples collected? As presented, this data is impossible to evaluate and interpret.

      We appreciate the reviewer’s comments on our manuscript, and we have added this information to the revised supplementary materials. (Supplementary information, Page 12, lines 146-147)

      (8) In general, the statistical method should be described for each experiment presented in the figures. Comparisons should not be made only at the time point with maximal difference (such as in Fig 1F or Fig 2C, but at all time points using appropriate statistical methods). The sample size should also be included to allow determination of the appropriateness of parametric or non-parametric tests.

      We appreciate the reviewer’s comments on our manuscript, and we have revised the manuscript as suggested (Figures 1F and 2C).

      (9) PCP pathways can activate p62/SQSTM1 or JNK via RhoA. JNK activation should be tested experimentally.

      According to the reviewer's comments, we further examined the effect of Vangl2 on the JNK pathway. The results showed that Vangl2 did not affect the JNK pathway (Author response image 2). This suggests that Vangl2 functions independently of the PCP pathway.

      Author response image 2.

      Vangl2 did not affect the JNK pathway. WT and Vangl2-deficient (n≥3) BMDMs were stimulated with LPS (100 ng/ml) for the indicated times. Immunoblot analysis of total and phosphorylated JNK.

      (10) Why are different cells such as A549, HEK293, CHO, 293T, THP-1 used during the studies for different experiments? Consistency would improve rigor. At least, logical explanation driving the cell type of choice for each experiment should be included in the manuscript. Nonetheless, one aspect of using a panel of cell lines indicates that the effect of Vangl2 on NF-kappa B is pleiotropic.

      We are grateful to the reviewer for the comments on our manuscript. A549, HEK293, CHO, and 293T cells are commonly utilized in protein-protein interaction studies. The selection of cell lines for overexpression (exogenous) experiments depended on their transfection efficiency and their ability to express TLR4 (the receptor for LPS). Additionally, we conducted endogenous experiments using THP-1 cells and BMDMs, which are a human macrophage cell line and murine primary macrophages, respectively. Moreover, we generated Vangl2f/f lyz-cre mice by specifically knocking out Vangl2 in myeloid cells, and investigated the effect of Vangl2 on NF-kB signaling in vivo.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript examines the contribution of the dorsal and intermediate hippocampus to goal-directed navigation in a wide virtual environment where visual cues are provided by the scenery on the periphery of a wide arena. Among a choice of 2 reward zones located near the arena periphery, rats learn to navigate from the center of the arena to the reward zone associated with the highest reward. Navigation performance is largely assessed from the rats' body orientation when they leave the arena center and when they reach the periphery, as well as the angular mismatch between the reward zone and the site where rats reach the periphery. Muscimol inactivation of the dorsal and intermediate hippocampus alters rat navigation to the reward zone, but the effect was more pronounced for the inactivation of the intermediate hippocampus, with some rat trajectories ending in the zone associated with the lowest reward. Based on these results, the authors suggest that the intermediate hippocampus is critical, especially for navigating to the highest reward zone.

      Strengths:

      - The authors developed an effective approach to study goal-directed navigation in a virtual environment where visual cues are provided by the peripheral scenery.

      - In general, the text is clearly written and the figures are well-designed and relatively straightforward to interpret, even without reading the legends.

      - An intriguing result, which would deserve to be better investigated and/or discussed, was that rats tended to rotate always in the counterclockwise direction. Could this be because of a hardware bias making it easier to turn left, some aspect of the peripheral landscape, or a natural preference of rats to turn left that is observable (or reported) in a real environment?

      Thank you for the insightful question. As the reviewer mentioned, the counterclockwise rotation behavior was intriguing and unexpected. To answer the reviewer’s question properly, we examined whether such stereotypical turning behavior appeared before the rats acquired the task rule and reward zones in the pre-surgical training phase of the task. Data from the last day of shaping and the first day of the pre-surgical main task day showed no significant difference in the number of trials in which the first body-turn was either clockwise or counterclockwise, suggesting that the rats did not have a bias toward a specific side (p=0.46 for Shaping; p=0.76 for the Main task, Wilcoxon signed-rank test). These results excluded the possibility that there was something in the apparatus's hardware that made the rats turn only to the left. Also, since we used the same peripheral landscape for the shaping and main task, we could assume that the peripheral landscape did not cause movement bias.

      Author response image 1.

      Although it remains inconclusive, we have noticed that some prior studies alluded to a phenomenon similar to this issue, framed as the topic of lateralization or spatial preference by comparing left and right biases. For example, Whishaw et al. (1992) suggested that there was natural lateralization in rats (“Most of the rats displayed either a strong right limb bias or a strong left limb bias.”) but no dominance of a specific side. Andrade et al. (2001) also claimed that “83% of Wistar rats spontaneously showed a clear preference for left or right arms in the T-maze.” However, to the best of our knowledge, there has been no direct evidence that rats have a dominant natural preference for only one side.

      Therefore, while the left-turning behavior remains an intriguing topic for further investigation, we find it difficult to pinpoint the reason behind the behavior in the current study. However, we would like to emphasize that this behavior did not interrupt testing our hypothesis. Nonetheless, we agree with the reviewer’s point that the counterclockwise rotation needs to be discussed more, so we revised the manuscript as follows:

      “To rule out the potential effect of hardware bias or any particular aspect of peripheral landscape to make rats turn only to one side, we measured the direction of the first body-turn in each trial on the last day of shaping and the first day of the main task (i.e., before rats learned the reward zones). There was no significant difference between the clockwise and counterclockwise turns (p=0.46 for shaping, p=0.76 for main task; Wilcoxon signed-rank test), indicating that the stereotypical pattern of counterclockwise body-turn appeared only after the rats learned the reward locations.” (p.6)
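
      The paired comparison described above (clockwise versus counterclockwise first body-turns within each rat) can be illustrated with a small exact Wilcoxon signed-rank test. This is a minimal sketch, not the authors' analysis code: the function name `wilcoxon_signed_rank_exact` and the per-rat counts are hypothetical, and the exact enumeration is only practical for small samples such as the eight rats used here.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Two-sided exact Wilcoxon signed-rank test for paired samples.

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks. All 2**n sign patterns
    are enumerated, so this is intended for roughly 20 pairs or fewer.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2
    observed = abs(w_plus - center)

    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical per-rat counts of first body-turns (NOT data from the study).
clockwise = [9, 11, 10, 12, 8, 10, 11, 9]
counterclockwise = [11, 9, 12, 9, 10, 11, 8, 12]
w_plus, p_value = wilcoxon_signed_rank_exact(clockwise, counterclockwise)
print(f"W+ = {w_plus}, two-sided exact p = {p_value:.3f}")
```

      A large p-value in such a test is compatible with no directional bias, but with only a handful of animals it cannot by itself demonstrate that a bias is absent.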

      - Another interesting observation, which would also deserve to be addressed in the discussion, is the fact that dHP/iHP inactivations produced to some extent consistent shifts in departing and peripheral crossing directions. This is visible from the distributions in Figures 6 and 7, which still show a peak under muscimol inactivation, but this peak is shifted to earlier angles than the correct ones. Such change is not straightforward to interpret, unlike the shortening of the mean vector length.

      Maybe rats under muscimol could navigate simply by using the association of the reward zone with some visual cues in the peripheral scene, relying on brain areas other than the hippocampus, and therefore stopped their rotation as soon as they saw the cues, a bit before the correct angle. With their hippocampus intact, by contrast, rats could precisely estimate the spatial relationship between the reward zone and the visual cues.

      We agree with the possibility suggested by the reviewer. However, although not described in the original manuscript, we performed several control experiments in a few rats using various visual stimulus manipulations to test how their behavior changed as a result. One of the experiments was the landmark omission test, in which one of the landmarks was omitted. The landmark to be omitted was selected pseudorandomly on a trial-by-trial basis. We observed that the omission of one landmark, regardless of its identity, did not cause a specific behavioral change in finding the reward zones, suggesting that the rats were not relying on a single visual landmark when finding the reward zone.

      Author response image 2.

      Therefore, it is unlikely that rats used the spatial relationship between the reward zone and a specific visual cue to solve the task in our study. However, the result was based on an insufficient sample size (n=3), not permitting any meaningful statistical testing. Thus, we have now updated this information in the manuscript as an anecdotal result as follows:

      “Additionally, to investigate whether the rats used a certain landmark as a beacon to find the reward zones, we conducted the landmark omission test as a part of control experiments. Here, one of the landmarks was omitted, and the landmark to be made disappear was pseudorandomly manipulated on a trial-by-trial basis. The omission of one landmark, regardless of its identity, did not cause a specific behavioral change in finding the reward zones, suggesting that the rats were not relying on a single visual landmark when finding the reward zones. The result can be reported anecdotally only because of an insufficient sample size (n=3), not permitting any meaningful statistical testing.” (p.9)

      Weaknesses:

      - I am not sure that the differential role of dHP and iHP for navigation to high/low reward locations is supported by the data. The current results could be compatible with iHP inactivation producing a stronger impairment on spatial orientation than dHP inactivation, generating more erratic trajectories that crossed by chance the second reward zone.

      To make the point that iHP inactivation affects the disambiguation of high and low reward locations, the authors should show that the fraction of trajectories aiming at the low reward zone is higher than expected by chance. Somehow we would expect to see a significant peak pointing toward the low reward zone in the distribution of Figures 6-7.

      We thank the reviewer for the valuable comments. We agree that it is difficult to rigorously distinguish the loss of value representation from spatial disorientation in our experiment. Since the trial ended once the rat touched either reward zone, it was difficult to specify whether they intended to arrive at the location or just moved randomly and arrived there by chance. Moreover, it is possible that the drug infusion did not completely inactivate the iHP but only partially did so.

      To investigate this issue further, we checked whether the distribution of the departure direction (DD) differed between the trials in which rats initially headed north (NW, N, NE) and south (SE, S, SW) at the start. In the manuscript, we demonstrated that DD aligned with the high-value zone, indicating that the rat remembered the scenes associated with the high-value zone (p.8). Given the rats’ characteristic counterclockwise rotation, the first reward zone a rat would face after starting while heading north would be the high-value zone. On the other hand, the rat would first face the low-value reward zone when starting while heading south. In this case, normal rats would refrain from leaving the start zone and rotate further until they faced the high-value zone before finally departing the start location. If the iHP inactivation caused a more severe impairment in spatial orientation but not in value representation, the iHP-inactivated rats in both north- and south-starting trials would likely behave similarly to the dHP-inactivated rats but produce a larger deviation from the high-value zone. However, if the iHP inactivation affected the disambiguation of high and low reward locations, north- and south-starting trials would show different DD distributions.

      The circular plots shown below are the DD distributions of dMUS and iMUS. We could see that when they started facing north, iHP-inactivated rats still aligned themselves towards the high-value zone and thus remained spatially oriented, similar to the dHP inactivation session. However, in the south-starting trials, the DD distribution was completely different from the north-starting trials; the rats failed in body alignment towards the high-value zone. Instead, they departed the start point while heading south in most trials. This pattern was not seen in dMUS sessions, even in their south-starting trials, illustrating the distinct deficit caused by iHP inactivation. Additionally, most of the rats with iHP inactivation visited the low-value zone more in south-headed starting trials than in the north-headed trials, except for one rat.
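
      The comparison of DD distributions between north- and south-starting trials rests on standard circular summaries: the mean direction and the mean vector length of a set of angles. The following is a minimal sketch of those two quantities, assuming angles in degrees; the function name `circular_summary` and the example angles are hypothetical and are not taken from the study.

```python
import math


def circular_summary(angles_deg):
    """Return (mean_direction_deg, mean_vector_length) for angles in degrees.

    The mean vector length lies in [0, 1]: values near 1 indicate tightly
    clustered directions, values near 0 indicate dispersed directions.
    """
    if not angles_deg:
        raise ValueError("at least one angle is required")
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    length = math.hypot(c, s)
    direction = math.degrees(math.atan2(s, c)) % 360
    return direction, length


# Hypothetical departure directions (degrees), NOT data from the study.
north_start = [40, 50, 45, 55, 35, 48]
south_start = [200, 170, 230, 45, 190, 160]

for label, angles in (("north-start", north_start), ("south-start", south_start)):
    direction, length = circular_summary(angles)
    print(f"{label}: mean direction = {direction:.1f} deg, vector length = {length:.2f}")
```

      A shorter mean vector in one group indicates more dispersed departures, while a shifted mean direction with a preserved vector length indicates a consistent but misaligned heading.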

      Author response image 3.

      Furthermore, we would like to clarify that we do not limit the effect of iHP inactivation to the impairment in distinguishing the high and low reward zones. It is possible that iHP inactivation resulted in the loss of a global value-representing map, leading to an impairment in distinguishing both reward zones from other non-rewarded areas in the environment. Figures 6 and 7 suggest this possibility by showing that the peaks are not restricted to the reward zones. Unfortunately, we cannot rigorously address this in the current study because of the limitations of our experimental design mentioned above.

      Nonetheless, we agree with the reviewer that this limitation needs to be addressed, so we now added how the current study needs further investigation to clarify what causes the behavioral change after the iHP inactivation in the Limitations section (p.21).
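
      The chance-level comparison raised in this exchange can be framed as a one-sided binomial test: given a number of trials and an assumed probability of reaching the low-value zone by chance, how surprising is the observed number of low-value visits? The sketch below is illustrative only. The chance probability is derived from a hypothetical angular width of the reward zone, and both that width and the trial counts are assumptions, not values from the study.

```python
from math import comb


def chance_probability(zone_width_deg):
    """Chance of a uniformly random heading falling within a zone.

    Assumes headings are uniform over 360 degrees, which is itself a
    simplifying assumption about what "random" navigation looks like.
    """
    return zone_width_deg / 360.0


def binomial_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): a one-sided exact p-value."""
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers, NOT data from the study.
p0 = chance_probability(zone_width_deg=45)  # assumed zone width
low_value_visits, total_trials = 12, 40
p_value = binomial_upper_tail(low_value_visits, total_trials, p0)
print(f"chance level = {p0:.3f}, one-sided p = {p_value:.4f}")
```

      Because a trial ends as soon as either zone is touched, the appropriate chance level is debatable, so any such test is only as good as the assumed null model.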

      Reviewer #2 (Public Review):

      Summary:

      The aim of this paper was to elucidate the role of the dorsal HP and intermediate HP (dHP and iHP) in value-based spatial navigation through behavioral and pharmacological experiments using a newly developed VR apparatus. The authors inactivated dHP and iHP by muscimol injection and analyzed the differences in behavior. The results showed that dHP was important for spatial navigation, while iHP was critical for both value judgments and spatial navigation. The present study developed a new sophisticated behavioral experimental apparatus and proposed a behavioral paradigm that is useful for studying value-dependent spatial navigation. In addition, the present study provides important results that support previous findings of differential function along the dorsoventral axis of the hippocampus.

      Strengths:

      The authors developed a VR-based value-based spatial navigation task that allowed separate evaluation of "high-value target selection" and "spatial navigation to the target." They were also able to quantify behavioral parameters, allowing detailed analysis of the rats' behavioral patterns before and after learning or pharmacological inactivation.

      Weaknesses:

      Although differences in function along the dorsoventral axis of the hippocampus are an important topic that has received considerable attention, differences in value coding have been shown in previous studies, including the work of the authors. The present paper is an important study that supports previous studies, but the novelty of the findings is not that high, as the results are from pharmacological and behavioral experiments only.

      We appreciate the reviewer's insightful comments. In response, we would like to emphasize that a very limited number of studies investigated the function of the intermediate hippocampus, especially in spatial memory tasks. We tested the differential functions of the dorsal and intermediate hippocampus using a within-animal design and used reversible inactivation manipulation (i.e., muscimol injection) to prevent potential compensation by other brain regions when using irreversible manipulation techniques (i.e., lesion). Also, very few studies have analyzed the navigation trajectories of animals as closely as in the current study. We emphasize the novelty of our study by comparing it with prior studies, as shown below in Table 1.

      Author response table 1.

      Comparison of our study with those from prior studies

      Moreover, to the best of our knowledge, the current manuscript is the first to investigate the hippocampal subregions along the long axis in a VR environment using a hippocampal-dependent spatial memory task. Nonetheless, we agree that the current study has a limitation as a behavior-only experiment. We have now added a comment on how other techniques, such as electrophysiology, could extend our findings in the Limitations section (p.21).

      Reviewer #3 (Public Review):

      Summary:

      The authors established a new virtual reality place preference task. On the task, rats, which were body-restrained on top of a moveable Styrofoam ball and could move through a circular virtual environment by moving the Styrofoam ball, learned to navigate reliably to a high-reward location over a low-reward location, using allocentric visual cues arranged around the virtual environment.

      The authors also showed that functional inhibition by bilateral microinfusion of the GABA-A receptor agonist muscimol, which targeted the dorsal or intermediate hippocampus, disrupted task performance. The impact of functional inhibition targeting the intermediate hippocampus was more pronounced than that of functional inhibition targeting the dorsal hippocampus.

      Moreover, the authors demonstrated that the same manipulations did not significantly disrupt rats' performance on a virtual reality task that required them to navigate to a spherical landmark to obtain reward, although there were numerical impairments in the main performance measure and the absence of statistically significant impairments may partly reflect a small sample size (see comments below).

      Overall, the study established a new virtual-reality place preference task for rats and established that performance on this task requires the dorsal to intermediate hippocampus. They also established that task performance is more sensitive to the same muscimol infusion (presumably - doses and volumes used were not clearly defined in the manuscript, see comments below) when the infusion was applied to the intermediate hippocampus, compared to the dorsal hippocampus, although this does not offer strong support for the authors' claim that dorsal hippocampus is responsible for accurate spatial navigation and intermediate hippocampus for place-value associations (see comments below).

      Strengths:

      (1) The authors established a new place preference task for body-restrained rats in a virtual environment and, using temporary pharmacological inhibition by intra-cerebral microinfusion of the GABA-A receptor agonist muscimol, showed that task performance requires dorsal to intermediate hippocampus.

      (2) These findings extend our knowledge about place learning tasks that require dorsal to intermediate hippocampus and add to previous evidence that, for some place memory tasks, the intermediate hippocampus may be more important than other parts of the hippocampus, including the dorsal hippocampus, for goal-directed navigation based on allocentric place memory.

      (3) The hippocampus-dependent task may be useful for future recording studies examining how hippocampal neurons support behavioral performance based on place information.

      Weaknesses:

      (1) The new findings do not strongly support the authors' suggestion that the dorsal hippocampus is responsible for accurate spatial navigation and the intermediate hippocampus for place-value associations.

      The authors base this claim on the differential effects of the dorsal and intermediate hippocampal muscimol infusions on different performance measures. More specifically, dorsal hippocampal muscimol infusion significantly increased perimeter crossings and perimeter crossing deviations, whereas dorsal infusion did not significantly change other measures of task performance, including departure direction and visits to the high-value location. However, these statistical outcomes offer only limited evidence that dorsal hippocampal infusion specifically affected the perimeter crossing, without affecting the other measures. Numerically the pattern of infusion effects is quite similar across these various measures: intermediate hippocampal infusions markedly impaired these performance measures compared to vehicle infusions, and the values of these measures after dorsal hippocampal muscimol infusion were between the values in the intermediate hippocampal muscimol and the vehicle condition (Figures 5-7). Moreover, I am not so sure that the perimeter crossing measures really reflect distinct aspects of navigational performance compared to departure direction and hit rate, and, even if they did, which aspects this would be. For example, in line 316, the authors suggest that 'departure direction and PCD [perimeter crossing deviation] [are] indices of the effectiveness and accuracy of navigation, respectively'. However, what do the authors mean by 'effectiveness' and 'accuracy'? Accuracy typically refers to whether or not the navigation is 'correct', i.e. how much it deviates from the goal location, which would be indexed by all performance measures.

      So, overall, I would recommend toning down the claim that the findings suggest that the dorsal hippocampus is responsible for accurate spatial navigation and the intermediate hippocampus for place-value associations.

      The reviewer mentioned that the statistical outcomes offer limited evidence as the dHP inactivation results were always positioned between the results of the iHP inactivation and controls. However, we would like to emphasize that, projecting to each other, the two subregions are not completely segregated anatomically. It is highly likely this is also true functionally and there should be some overlap in their roles. Considering such relationships between the dHP and iHP, it could be natural to see an intermediate effect after inactivating the dHP, and that is why we focused on the “magnitude” of behavioral changes after inactivation instead of complete dissociation between the two subregions in our manuscript. Unfortunately, because of the nature of the drug infusion study, further dissociation would be difficult, requiring further investigation with different experimental techniques, such as physiological examinations of the neural firing patterns between the two regions. We mentioned this caveat of the current study in the Limitations as follows:

      “However, our study includes only behavioral results and further mechanistic explanations as to the processes underlying the behavioral deficits require physiological investigations at the cellular level. Neurophysiological recordings during VR task performance could answer, for example, the questions such as whether the value-associated map in the iHP is built upon the map inherited from the dHP or it is independently developed in the iHP.” (p.21)

      Regarding the reviewer’s comment on the meaning of measuring the perimeter crossing directions, we would like to draw the reviewer’s attention to the individual trajectories during the iMUS sessions described in Figure 5. Particularly when they were not confident with the location of the higher reward, rats changed their heading directions during the navigation, which resulted in a less efficient route to the goal location. Rats showing this type of behavior tended to hit the perimeter of the arena first before correcting their routes toward the goal zone. In contrast, rats showing effective navigation hardly bumped into the wall or perimeter before hitting the goal zone. Thus, their PCDs matched DDs almost always. When considered together with DD, our PCD measure could tell whether rats not hitting the goal zone directly after departure were impaired in either maintaining the correct heading direction to the goal zone at the start location or orienting themselves to the target zone accurately from the start. Our results suggest that the latter is the case. We included the relevant explanation in the Discussion section as follows:

      “Particularly, rats changed their heading directions during the navigation when they were not confident with the location of the higher reward, resulting in a less efficient route to the goal location. Rats showing this type of behavior tended to hit the perimeter of the arena first before correcting their routes. Therefore, when considered together with DD, our PCD measure could tell that the rats not hitting the goal zone directly after departure were impaired in orienting themselves to the target zone accurately from the start, not in maintaining the correct heading direction to the goal zone at the start location.” (p.19)

      Nonetheless, we agree with the reviewer that the term ‘accuracy’ might be confusing with performance accuracy, so we replaced the term with ‘precision’ throughout the manuscript, referring to the precise targeting of the reward zones.
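
      The DD and PCD measures discussed above both reduce to an angular deviation between an observed direction and the direction of the goal zone. The following sketch shows one way such a deviation can be computed with wrap-around at 360 degrees. The function names, the 20-degree tolerance, and the example angles are hypothetical illustrations and are not the authors' definitions.

```python
def angular_deviation(direction_deg, target_deg):
    """Smallest absolute angle between two directions, in [0, 180] degrees."""
    diff = (direction_deg - target_deg) % 360
    return min(diff, 360 - diff)


def headed_straight_to_goal(departure_deg, crossing_deg, tolerance_deg=20):
    """True if departure and perimeter-crossing directions roughly agree.

    The tolerance is an assumed illustrative threshold, not a value
    reported in the study.
    """
    return angular_deviation(departure_deg, crossing_deg) <= tolerance_deg


# Hypothetical single trial, NOT data from the study.
goal_deg = 45
departure_deg, crossing_deg = 50, 110
print("DD deviation from goal:", angular_deviation(departure_deg, goal_deg))
print("PCD deviation from goal:", angular_deviation(crossing_deg, goal_deg))
print("route stayed on heading:", headed_straight_to_goal(departure_deg, crossing_deg))
```

      Under this framing, a trial with a small departure deviation but a large crossing deviation corresponds to a rat that oriented correctly at the start and then drifted, whereas large deviations in both suggest a misaligned heading from the outset.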

      (2) The claim that the different effects of intermediate and dorsal hippocampal muscimol infusions reflect different functions of intermediate and dorsal hippocampus rests on the assumption that both manipulations inhibit similar volumes of hippocampal tissue to a similar extent, but at different levels along the dorso-ventral axis of the hippocampus. However, this is not a foregone conclusion (e.g., drug spread may differ depending on the infusion site or drug effects may differ due to differential expression of GABA-A receptors in the dorsal and intermediate hippocampus), and the authors do not provide direct evidence for this assumption. Therefore, a possible alternative account of the weaker effects of dorsal compared to intermediate hippocampal muscimol infusions on place-preference performance is that the dorsal infusions affect less hippocampal volume or less markedly inhibit neurons within the affected volume than the intermediate infusions. I would recommend that the authors briefly consider this issue in the discussion. Moreover, from the Methods, it is not clear which infusion volume and muscimol concentration were used for the different infusions (see below, 4.a.), and this must be clarified.

      We appreciate these insightful comments from the reviewer and agree that we do not provide direct evidence for the point raised by the reviewer. To the best of our knowledge, most of the behavioral studies on the long axis of the hippocampus did not particularly address the differential expression of GABA-A receptors along the axis. We could not find any literature that specifically introduced and compared the levels of expression of GABA-A receptors or the diffusion range of muscimol in the intermediate hippocampus to the other subregions. However, we found that Sotiriou et al. (2005) made such comparisons with respect to the expression of different GABA-A receptors. They concluded that the dorsal and ventral hippocampi have different levels of the GABA-A receptor subtypes. The α1/β2/γ2 subtype was dominant in the dorsal hippocampus, while the α2/β1/γ2 subtype was prevalent in the ventral hippocampus. Sotiriou and colleagues also mentioned the lower affinity of GABA-A receptor binding in the ventral hippocampus, and this result is consistent with the Papatheodoropoulos et al. (2002) study that showed a weaker synaptic inhibition in the ventral hippocampus compared to the dorsal hippocampus. Papatheodoropoulos et al. speculated that differences in GABA receptors are one of the potential causes underlying the differential synaptic inhibition between the dorsal and ventral hippocampal regions. Based on these findings, the same volume of muscimol is more likely to cause a more severe effect on the ventral hippocampus than the dorsal hippocampus. Therefore, we do not believe that the less significant changes after the dorsal hippocampal inactivation were induced by the expression level of GABA-A receptors. Additionally, we have demonstrated in our previous studies that muscimol injections in the dorsal hippocampus impair performance to the chance level in scene-based behavioral tasks (Lee et al., 2014; Kim et al., 2012).

      Nonetheless, we acknowledge the possibility of differential GABA-A receptor expression between the two target regions. Following the suggestion of the reviewer, we have now included this information in the Discussion as follows:

      “Although there is still a possibility that the levels of expression of GABA-A receptors might be different along the longitudinal axis of the hippocampus, …” (p.20)

      Regarding the drug infusion volume and concentration, we included these details in the Methods. Please see our detailed response to 4.a. below.

      (3) It is good that the authors included a comparison/control study using a spherical beacon-guided navigation task, to examine the specific psychological mechanisms disrupted by the hippocampal manipulations. However, as outlined below (4.b.), the sample size for the comparison study was lower than for the main study, and the data in Figure 8 suggest that the comparison task may be affected by the hippocampal manipulations similarly to the place-preference task, albeit less markedly. This would raise the question as to which mechanisms that are common to the two tasks may be affected by hippocampal functional inhibition, which should be considered in the discussion.

      The sample size for the object-guided navigation task was smaller because we initially did not plan the experiment, but later in the study decided to conduct the control test. Therefore, the object-guided navigation task was added to the study design after finishing the first three rats, resulting in a smaller sample size than the place preference task. We included this detail in the manuscript, as follows:

      “Note the smaller sample size in the object-guided navigation task. This was because the task was later added to the study design.” (p.24)

      Regarding the mechanism behind the two different tasks, we did not perform the same heading direction analysis here as in the place preference task because the two tasks have different characteristics such as task complexity. The object-guided navigation task is somewhat similar to the visually guided (or cued) version of the water maze task, which is widely known as hippocampal-independent (Morris et al., 1986; Packard et al., 1989; also see our descriptions on p.15). Therefore, we would argue that the two tasks (i.e., place preference task and object-guided navigation task) used in the current manuscript do not share neural mechanisms in common. Additionally, we confirmed that several behavioral measurements related to motor capacity, such as travel distance and latency, along with the direct hit proportion provided in Figure 8, did not show any statistically significant changes across drug conditions.

      (4) Several important methodological details require clarification:

      a. Drug infusions (from line 673):

      - '0.3 to 0.5 μl of either phosphate-buffered saline (PBS) or muscimol (MUS) was infused into each hemisphere'; the authors need to clarify when which infusion volume was used and why different infusion volumes were used.

      We thank the reviewer for carefully reading our manuscript. We were cautious about side effects, such as suppressed locomotion or overly aggressive behavior, since the iHP injection site was close to the ventricle. We were keenly aware that the intermediate to ventral hippocampal regions are sensitive to the drug dosage from our previous experiments. Thus, we observed the rat’s behavior for 20 minutes after drug injection in a clean cage. We started from 0.5 μl, based on our previous study, but if the injected rat showed any sign of side effects in the cage, we stopped the experiment for the day and tried with a lower dosage (i.e., 0.4 μl first, then 0.3 μl, etc.) until we found the right dosage under which the rat did not show any side effect. This procedure is necessary because cannula tip positions are slightly different from rat to rat. When undergoing this procedure, five out of eight rats received 0.4 μl, two received 0.3 μl, and one received 0.5 μl. Still, there was no significant difference in performance, including the high-value visit percentage, departing and perimeter crossing directions, across all dosages. This information is now added in the Methods section as follows:

      “If the rat showed any side effect, particularly sluggishness or aggression, we reduced the drug injection amount in the rat by 0.1 μl until we found the dosage with which there was no visible side effect. As a result, five of the rats received 0.4 μl, two received 0.3 μl, and one received 0.5 μl.” (p.25)
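
      The step-down procedure described above can be summarized as a simple loop: start at the highest volume, and lower it by a fixed step whenever side effects are observed, until a tolerated volume is found. The sketch below is only an illustration of that logic. The function name, the `shows_side_effect` callback, and the 0.3 μl floor are assumptions (the response lists 0.5, 0.4, and 0.3 μl but does not state a lower bound), and in practice each step took place on a separate day after a 20-minute observation period.

```python
def titrate_volume(shows_side_effect, start_ul=0.5, step_ul=0.1, floor_ul=0.3):
    """Return the highest tested volume (in microliters) without side effects.

    `shows_side_effect` is a hypothetical callback that takes a volume and
    returns True if the animal showed sluggishness or aggression at that
    volume. Returns None if every tested volume produced side effects.
    """
    volume = start_ul
    while volume >= floor_ul - 1e-9:
        if not shows_side_effect(volume):
            return round(volume, 3)
        # Rounding avoids floating-point drift such as 0.30000000000000004.
        volume = round(volume - step_ul, 3)
    return None


# Hypothetical animal that tolerates volumes up to 0.4 microliters.
tolerated = titrate_volume(lambda volume_ul: volume_ul > 0.45)
print("selected volume (ul):", tolerated)
```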

      - I could not find the concentration of the muscimol solution that was used. The authors must clarify this and also should include a justification of the doses used, e.g. based on previous studies.

      Thank you for the suggestion. We used a drug concentration of 1 mg/ml, which was adapted from our previous muscimol studies (Lee et al., 2014; Kim et al., 2012). The manuscript is now updated, as follows:

      “…or muscimol (MUS; 1 mg/ml, dissolved in saline) was infused into each hemisphere via a 33-gauge injection cannula at an injection speed of 0.167 μl/min, based on our previous study (Lee et al., 2014; Kim et al., 2012).” (p.25)

      -  Please also clarify if the injectors and dummies were flush with the guides or by which distance they protruded from the guides.

      The injection and dummy cannula both protruded from the guide cannula by 1 mm, and this information is now added to the Methods section, as follows:

      “The injection cannula and dummy cannula extended 1 mm below the tip of the guide cannula.” (p.25)

      b. Sample sizes: The authors should include sample size justifications, e.g. based on considerations of statistical power, previous studies, practical considerations, or a combination of these factors. Importantly, the smaller sample size in the control study using the spherical beacon-guided navigation task (n=5 rats) limits comparability with the main study using the place-preference task (n=8). Numerically, the findings on the control task (Figure 8) look quite similar to the findings on the place-preference task, with intermediate hippocampal muscimol infusions causing the most pronounced impairment and dorsal hippocampal muscimol infusions causing a weaker impairment. These effects may have reached statistical significance if the same sample size had been used in the place-preference study.

      We set the current sample size for several reasons. First, based on our previous studies, we assumed that eight, or more than six, animals would be enough to achieve statistical power in a “within-animal design” study. Second, considering our ethical commitments, we tried to keep the number of animals used in the study to a minimum. Last, our paradigm required very long training periods (3 months on average per animal), so we could not increase the sample size for practical reasons. Regarding the reasons for the smaller sample size for the object-guided navigation task, please see our response to (3) above. The manuscript is now revised as follows:

      “Based on our prior studies (Park et al., 2017; Yoo and Lee, 2017; Lee et al., 2014), the sample size of our study was set to the least number to achieve the necessary statistical power in the current within-subject study design for ethical commitments and practical considerations (i.e., relatively long training periods).” (p.22)

      c. Statistical analyses: Why were the data of the intermediate and dorsal hippocampal PBS infusion conditions averaged for some of the analyses (Figure 5; Figure 6B and C; Figure 7B and C; Figure 8B) but not for others (Figure 6A and Figure 7A)?

      The reviewer is correct that we only illustrated the separate dPBS and iPBS data for Figures 6A and 7A. Since the directional analysis is the main focus of the current manuscript, we tried to provide better visualization and more detailed examples of how the drug infusion changed the behavioral patterns between the PBS and MUS conditions in each region. Except for the visualization of DD and PCD, we averaged the PBS sessions to increase statistical power, as described on p.9. We added a detailed description of the reasons for illustrating dPBS and iPBS data separately in the manuscript, as follows:

      “Note that dPBS and iPBS sessions were separately illustrated here for better visualization of changes in the behavioral pattern for each subregion.” (p.12)

      Reviewing Editor (Recommendations For The Authors):

      The strength of evidence rating in the assessment is currently noted as "incomplete." This can be improved following revisions if you amend your conclusions in the paper, including in the title and abstract, such that the paper's major conclusions more closely match what is shown in the Results.

      Following the suggestions of the reviewing editor, we have mentioned the caveats of our study in the Limitations section of our revised manuscript (p.21). In addition, the manuscript has been revised so that the conclusions in the paper match the experimental results more closely, as can be seen in some of the relevant sentences in the abstract and main text, as follows:

      “Inactivation of both dHP and iHP with muscimol altered efficiency and precision of wayfinding behavior, but iHP inactivation induced more severe damage, including impaired place preference. Our findings suggest that the iHP is more critical for value-dependent navigation toward higher-value goal locations.” (Abstract; p.2)

      “Whereas inactivation of the dHP mainly affected the precision of wayfinding, iHP inactivation impaired value-dependent navigation more severely by affecting place preference.” (p.5)

      “The iHP causes more damage to value-dependent spatial navigation than the dHP, which is important for navigational precision” (p.12)

      However, we have not changed the title of the manuscript, as it accurately conveys what we would like to deliver in this study.

      Reviewer #1 (Recommendations For The Authors):

      - What were the dimensions of the environment? What distance did rats typically run to reach the reward zone? A scale bar would be helpful in Figure 1.

      We used the same circular arena from the shaping session, which was 1.6 meters in diameter (p.23), and the shortest path between the start location and either reward zone was 0.62 meters. We revised the manuscript for clarification as follows:

      “For the pre-training session, rats were required to find hidden reward zones…, on the same circular arena from the shaping session.” (p.23)

      “Therefore, the shortest path length between the start position and the reward zone was 0.62 meters.” (p.23)

      We also added a scale bar in Figure 1C for a better understanding.

      - Line 169: "The scene rotation plot covers the period from the start of the trial to when the rat leaves the starting point at the center and the departure circle (Figure 2B)."

      The sentence is unclear. Maybe it should be "... from the start of the trial to when the rat leaves the departure circle”.

      The sentence has been revised following the reviewer's suggestion. (p.7)

      - Line 147: "First, they learned to rotate the spherical treadmill counterclockwise to move around in the virtual environment (presumably to perform energy-efficient navigation)."

      It is not clear from this sentence if rats naturally preferred the counterclockwise direction or if the counterclockwise direction was a task requirement.

      We have now clarified in our revised manuscript that turning counterclockwise was not a task requirement, as follows:

      “First, although it was not required in the task, they learned to rotate the spherical treadmill counterclockwise…” (p.6)

      - Line 149: "Second, once a trial started, but before leaving the starting point at the center, the animal rotated the treadmill to turn the virtual environment immediately to align its starting direction with the visual scene associated with the high-value reward zone."

      The sentence is unclear. Maybe "Second, once a trial started, the animal rotated the treadmill immediately to align its starting direction with the visual scene associated with the high-value reward zone.”

      We have updated the description following the suggestion. (p.6)

      Reviewer #2 (Recommendations For The Authors):

      - There are some misleading descriptions of the conclusion of the results in this paper. In this study, the functions of (a) selection of high-value target and (b) spatial navigation to the target were assessed in the behavioral experiments. The results of the pharmacological experiments showed that dHP inactivation impaired (b) and iHP inactivation impaired both (a) and (b) (Figures 5 B & D). However, the last sentence of the abstract states that dHP is important for the functions of (a) and iHP for (b). There are several other similar statements in the main text. Since the separation of (a) and (b) is an important and original aspect of this study, the description should clearly show the conclusion that dHP is important for (a) and iHP is important for both (a) and (b).

      Related to the above, the paragraph title in the Discussion "The iHP may contain a value-associated cognitive map with reasonable spatial resolution for goal-directed navigation (536-537)" is also somewhat misleading: "with reasonable resolution for goal-directed behavior" seems to reflect the results of an object-guided navigation task (Figure 8). However, the term "goal-directed behavior" is also used for value-dependent spatial navigation (i.e., the main task), which causes confusion. I would like to suggest clarifying the wording on this point.

      First, we need to correct the reviewer’s statement regarding our descriptions of the results. As the reviewer mentioned, our results indicated that the dHP inactivation impaired (b) but not (a), while the iHP inactivation impaired both (a) and (b). Regarding the iHP inactivation result, we focused on the impairment of (a) since our aim was to investigate spatial-value association in the hippocampus. Also, it was more likely that (a) affected (b), but not the other way around, because (a) remained intact when (b) was impaired after dHP inactivation. We emphasized this difference between dHP and iHP inactivation, which was (a). Therefore, we mentioned in the last sentence of the abstract that the dHP is important for (b), which is the precision of spatial navigation to the target location, and the iHP is critical for (a).

      Moreover, we would like to clarify that we were not referring to the object-guided navigation task in Figure 8 in the phrase ‘with a reasonable spatial resolution for goal-directed navigation.’ Please note that the object-guided navigation task did not require fine spatial resolution to find the reward. The phrase instead referred to the dHP inactivation result (Figures 5 and 6), where the rats could find the high-value zone even with dHP inactivation, although the navigational precision decreased. Nonetheless, we agree with the reviewer that the title might cause confusion, so we have now updated the title as follows:

      “The iHP may contain a value-associated cognitive map with reasonable spatial resolution for value-based navigation” (p.19)

      - As an earlier study focusing on the physiology of iHP, Maurer et al, Hippocampus 15:841 (2005) is also a pioneering and important study, and I suggest citing it.

      Thank you for the suggestion. We included the Maurer et al. (2005) study in the Introduction section as follows:

      “…Specifically, there is physiological evidence that the size of a place field becomes larger as recordings of place cells move from the dHP to the vHP (Jung et al., 1994; Maurer et al., 2005; Kjelstrup et al., 2008; Royer et al., 2010).” (p.4)

      - One of the strengths of this paper is that we have developed a new control system for the VR navigation task device, but I cannot get a very detailed description of this system in the Methods section. Also, no information about the system control has been uploaded to GitHub. I would suggest adding a description of the manufacturer, model number, and size of components, such as a rotary encoder and ball, and information about the software of the control system, with enough detail to allow the reader to reconstruct the system.

      We have now added detailed descriptions of the VR system in the Methods section (see “2D VR system”). (p.22)

      Reviewer #3 (Recommendations For The Authors):

      (1) Some comments on specific passages of text:

      Lines 87 to 89: 'Surprisingly, beyond the recognition of anatomical divisions, little is known about the functional differentiation of subregions along the dorsoventral axis of the hippocampus. Moreover, the available literature on the subject is somewhat inconsistent.'

      I would recommend to rephrase these statements. Regarding the first statement, there is substantial evidence for functional differentiation along the dorso-ventral axis of the hippocampus (e.g., see reviews by Moser and Moser, 1998, Hippocampus; Bannerman et al., 2004, Neurosci Biobehav Rev; Bast, 2007, Rev Neurosci; Bast, 2011, Curr Opin Neurobiol; Fanselow and Dong, 2010, Neuron; Strange et al., 2014, Nature Rev Neurosci). Regarding the second statement, the authors may consider being more specific, as the inconsistencies demonstrated seem to relate mainly to the hippocampal representation of value information, instead of functional differentiation along the dorso-ventral hippocampal axis in general.

      We agree with the reviewer that the abovementioned statements need further clarification. The manuscript is now revised as follows:

      “Surprisingly, beyond the recognition of anatomical divisions, the available literature on the functional differentiation of subregions along the dorsoventral axis of the hippocampus, particularly in the context of value representation, is somewhat inconsistent.” (p.4)

      Lines 92 to 93: 'Thus, it has been thought that the dHP is more specialized for precise spatial representation than the iHP and vHP.'

      I think 'fine-grained' may be the more appropriate term here. Also, check throughout the manuscript when referring to the differences of spatial representations along the hippocampal dorso-ventral axis.

      Thank you for the insightful suggestion. We changed the term to ‘fine-grained’ throughout the manuscript, as follows:

      “Thus, it has been thought that the dHP is more specialized for fine-grained spatial representation than the iHP and vHP.” (p.4)

      “Consequently, the fine-grained spatial map present in the dHP…” (p.20)

      Line 217: well-'trained' rats?

      We initially used the term ‘well-learned’ to focus on the effect of learning, not training. Please note that the rats were already adapted to moving freely in the VR environment during the Shaping sessions, but the immediate counterclockwise body alignment only appeared after they acquired the reward locations for the main task. Nonetheless, we agree that the term might cause confusion, so we revised the manuscript as the reviewer suggested, as follows:

      “This implies that well-trained rats aligned their bodies more efficiently…” (p.8)

      Lines 309 to 311: 'Taken together, these results indicate that iHP inactivation severely damages normal goal-directed navigational patterns in our place preference task.'

      Consider to mention that dHP inactivation also causes impairments, albeit weaker ones.

      We thank the reviewer for the suggestion. We revised the manuscript by mentioning dHP inactivation as follows:

      “Taken together, these results indicate that iHP inactivation more severely damages normal goal-directed navigational patterns than dHP inactivation in our place-preference task.” (p.11-12)

      Lines 550 to 552: 'The involvement of the iHP in spatial value association has been reported in several studies. For example, Bast and colleagues reported that rapid place learning is disrupted by removing the iHP and vHP, even when the dHP remains undamaged (Bast et al., 2009).'

      Bast et al. (2009) did not directly show the role of iHP in 'spatial value associations'. They suggested that the importance of iHP for behavioral performance based on rapid, one-trial, place learning may reflect neuroanatomical features of the intermediate region, especially the combination of afferents that could convey the required fine-grained visuo-spatial information with relevant afferent and efferent connections that may be important to translate hippocampal place memory into appropriate behavioral performance (this may include afferents conveying value information). More recent theoretical and empirical research suggests that projections to the (ventral) striatum may be relevant (see Tessereau et al., 2021, BNA and Bauer et al., 2021, BNA).

      We thank the reviewer for this insightful comment. We agree with the reviewer that Bast et al. (2009) did not directly mention spatial value association; however, learning a new platform location requires an update of value information in the spatial environment. Therefore, we thought the study suggested, though indirectly, how the iHP contributes to spatial value associations. Nonetheless, to avoid confusion, we revised the manuscript, as follows:

      “The involvement of the iHP in spatial value association has been reported or implicated in several studies” (p.20)

      (2) Figures and legends:

      Figure 2B: What do the numbers after novice and expert indicate?

      The numbers indicate the rat ID, followed by the session number. We added the details to the Figure legend, as follows:

      “The numbers after ‘Novice’ and ‘Expert’ indicate the rat and session number of the example.” (p.34)

      Figure 2C: Please indicate units of the travel distance and latency measurements.

      The units are now described in the Figure legends, as follows:

      “Mean travel distance in meters and latency in seconds are shown below the VR arena trajectory.” (p.34)

      Figure 3Aii: Here and in other figures - do the vector lengths have a unit (degree?)?

      No, the mean vector length is an averaged value of the resultant vectors and thus has no specific unit.
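
      As an illustration of why this quantity carries no unit, the following minimal sketch computes a mean vector length under the standard circular-statistics definition (the length of the average of unit vectors pointing along each heading). The function name and the example angles are hypothetical and are not taken from the manuscript; the authors' exact computation may differ.

```python
import math


def mean_vector_length(angles_deg):
    """Length of the mean of unit vectors at the given angles (in degrees).

    The result is a dimensionless number between 0 (headings spread evenly
    around the circle) and 1 (all headings identical).
    """
    n = len(angles_deg)
    mean_cos = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    mean_sin = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    return math.hypot(mean_cos, mean_sin)


# Hypothetical headings, for illustration only.
aligned = mean_vector_length([30, 30, 30])        # tightly clustered
scattered = mean_vector_length([0, 90, 180, 270])  # evenly spread
```

      Because each heading is first converted to a unit vector, the degrees cancel out and only a proportion-like value remains.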

      Figure 5A: Please explain what the numbers on top of the individual sample trajectories indicate.

      The numbers are IDs for rats, sessions, and trials of specific examples. We added the explanation to the Figure legends, as follows:

      “Numbers above each trajectory indicate the identification numbers for rat, session, and trial.” (p.35)

      (3) Additional comments on some methodological details:

      a. Why was the non-parametric Wilcoxon signed-rank test used for the planned comparison between intermediate and dorsal hippocampal PBS infusions, whereas parametric ANOVA and post-hoc comparisons were used for other analyses? This probably doesn't make a big difference for the interpretation of the present data (as a parametric pairwise comparison would also not have revealed any significant difference between intermediate and dorsal hippocampal PBS infusions), but it would nevertheless be good to clarify the rationale for this.

      We used non-parametric statistics because our sample size (n=8) was rather small for parametric statistics, although we used the parametric ANOVA for some of the results because it is the most commonly known and widely used statistical test for such comparisons. However, we also checked the statistics with the alternatives (i.e., a parametric paired t-test in place of the non-parametric Wilcoxon signed-rank test, and a non-parametric Friedman’s test with Dunn’s post hoc test in place of the parametric one-way RM ANOVA with Bonferroni post hoc test), and the statistical significance did not change with any of the tests. We have now added the explanation to the manuscript, as follows:

      “Although most of our statistics were based on the non-parametric tests for the relatively small sample size (n=8), we used the parametric RM ANOVA for comparing three groups (i.e., PBS, dMUS, and iMUS) because it is the most commonly known and widely used statistical test in such comparison. However, we also performed statistical tests with the alternatives for reference, and the statistical significances were not changed with any of the results.” (p.26)
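
      For readers unfamiliar with the non-parametric test named above, the following is a minimal sketch of an exact two-sided Wilcoxon signed-rank test for paired data, written with the Python standard library only. The function name and the paired values are hypothetical (synthetic numbers, not data from the study), and the authors' statistical software may handle ties and zero differences differently.

```python
from itertools import product


def exact_wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W+, p), where W+ is the sum of ranks of positive differences.
    Zero differences are dropped; tied absolute differences get average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2
    observed = abs(w_plus - center)
    # Enumerate every possible sign assignment (2**n cases under the null).
    extreme = 0
    for signs in product((False, True), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Synthetic paired scores for eight hypothetical animals (illustration only).
condition_a = [10, 12, 14, 16, 18, 20, 22, 24]
condition_b = [9, 10, 11, 12, 13, 14, 15, 16]
w_plus, p_value = exact_wilcoxon_signed_rank(condition_a, condition_b)
```

      With eight pairs there are only 2^8 = 256 sign assignments, so the smallest two-sided p-value this test can return is 2/256 (about 0.0078), which is reached when all eight differences have the same sign.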

      b. Single housing of rats:

      Why was this chosen? Based on my experience, this is not necessary for studies involving cannula implants and food restriction. Group housing is generally considered to improve the welfare of rats.

      We chose single housing of rats because our training paradigm required precise restrictions on the food consumption of individual rats, which could be difficult in group housing.

      c. Anesthesia:

      Why was pentobarbital used, alongside isoflurane, to anesthetize rats for surgery (line 663)? The use of gaseous anesthesia alone offers very good control of anesthesia and reduces the risk of death from anesthesia compared to the use of pentobarbital.

      Why was anesthesia used for the drug infusions (line 674)? If rats are well-habituated to handling by the experimenter, manual restraint is sufficient for intra-cerebral infusions. Therefore, anesthesia could be omitted, reducing the risk of adverse effects on the experimental rats.

      I do not think that points b. and c. are relevant for the interpretation of the present findings, but the authors may consider these points for future studies to improve further the welfare of the experimental rats.

      We appreciate the reviewer’s careful suggestions. For both the use of pentobarbital during surgery and anesthesia for the drug infusion, we chose to do so to avoid any risk of rats being awake and becoming anxious and to ensure safety during the procedures. They might not be necessary, but they were helpful for the experimenters to proceed with sufficient time to maintain precision. Nonetheless, we agree with the reviewer’s concern, which was the reason why we monitored the rats’ behavior for 20 minutes in the cage after drug infusion to minimize any potential influence on the task performance. We updated the relevant details in the Methods section, as follows:

      “The rat was kept in a clean cage to recover from anesthesia completely and monitored for side effects for 20 minutes, then was moved to the VR apparatus for behavioral testing.” (p.25)

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      fMRI was used to address an important aspect of human cognition - the capacity for structured representations and symbolic processing - in a cross-species comparison with non-human primates (macaques); the experimental design probed implicit symbolic processing through reversal of learned stimulus pairs. The authors present solid evidence in humans that helps elucidate the role of brain networks in symbolic processing, however the evidence from macaques was incomplete (e.g., sample size constraints, potential and hard-to-quantify differences in attention allocation, motivation, and lived experience between species).

      Thank you very much for your assessment. We would like to address the potential issues that you raise point-by-point below.

      We agree that for macaque monkey physiology, sample size is always a constraint, for both financial and ethical reasons. We addressed this concern by combining the results from two different labs, which allowed us to test 4 animals in total, which is twice as many as is common practice in the field of primate physiology. (We discuss this now on lines 473-478.)

      Interspecies differences in motivation, attention allocation, task strategies etc. could also be limiting factors. Note that we did address the potential lack of attention allocation directly in Experiment 2 using implicit reward association, which was successful as evidenced by the activation of attentional control areas in the prefrontal cortex. We cannot guarantee that the strategies that the two species deploy are identical, but we tentatively suggest that this might be a less important factor in the present study than in other interspecies comparisons that use explicit behavioral reports. In the current study, we directly measured surprise responses in the brain in the absence of any explicit instructions in either species, which allowed us to measure the spontaneous reversal of learned associations, which is a very basic element of symbolic representation. Our reasoning is that such spontaneous responses should be less dependent on attention allocation and task strategies. (We discuss this now in more detail on lines 478-485.)

      Finally, lived experience could be a major factor. Indeed, obvious differences include a lifetime of open-field experiences and education in our human adult subjects, which was not available to the monkey subjects, and which includes a strong bias towards explicit learning of symbolic systems (e.g. words, letters, digits, etc). However, we have previously shown, using EEG, that 5-month-old human infants spontaneously generalize learning to the reversed pairs after a short learning period in the lab (Kabdebon et al, PNAS, 2019). This indicates that even with very limited experience, humans spontaneously reverse learned associations. (We discuss this now in more detail on lines 478-485.) It could be very interesting to investigate whether spontaneous reversal could be present in infant macaque monkeys, as there might be a critical period for this effect. Although neurophysiology in awake infant monkeys is highly challenging, it would be very relevant for future work. (We discuss this in more detail on lines 493-498.)

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Kerkoerle and colleagues present a very interesting comparative fMRI study in humans and monkeys, assessing neural responses to surprise reactions at the reversal of a previously learned association. The implicit nature of this task, assessing how this information is represented without requiring explicit decision-making, is an elegant design. The paper reports that both humans and monkeys show neural responses across a range of areas when presented with incongruous stimulus pairs. Monkeys also show a surprise response when the stimuli are presented in a reversed direction. However, humans show no such surprise response based on this reversal, suggesting that they encode the relationship reversibly and bidirectionally, unlike the monkeys. This has been suggested as a hallmark of symbolic representation, that might be absent in nonhuman animals. 

      I find this experiment and the results quite compelling, and the data do support the hypothesis that humans are somewhat unique in their tendency to form reversible, symbolic associations. I think that an important strength of the results is that the critical finding is the presence of an interaction between congruity and canonicity in macaques, which does not appear in humans. These results go a long way to allay concerns I have about the comparison of many human participants to a very small number of macaques. 

      We thank the reviewer for the positive assessment. We also very much appreciate the point about the interaction effect in macaque monkeys – indeed, we do not report just a negative finding. 

      I understand the impossibility of testing 30+ macaques in an fMRI experiment. However, I think it is important to note that differences necessarily arise in the analysis of such datasets. The authors report that they use '...identical training, stimuli, and whole-brain fMRI measures'. However, the monkeys (in experiment 1) actually required 10 times more training. 

      We agree that this description was imprecise. We have changed it to “identical training stimuli” (line 151); indeed, the movies used for training were strictly identical. Furthermore, please note that we do report the fMRI results after the same training duration. In experiment 1, after 3 days of training, the monkeys did not show any significant results, even in the canonical direction. However, in experiment 2, with increased attention and motivation, a significant effect was observed on the first day of scanning after training, as was found in human subjects (see Figure 4 and Table 3).

      More importantly, while the fMRI measures are the same, group analysis over 30+ individuals is inherently different from comparing only 2 macaques (including smoothing and averaging away individual differences that might be more present in the monkeys, due to the much smaller sample size). 

      Thank you for understanding that a limited sample size is intrinsic to macaque monkey physiology. We also agree that data analysis in humans and monkeys is necessarily different. As suggested by the reviewer, we added an analysis to address this; see the corresponding reply to the ‘Recommendations for the authors’ section below.

      Despite this, the results do appear to show that macaques show the predicted interaction effect (even despite the sample size), while humans do not. I think this is quite convincing, although had the results turned out differently (for example an effect in humans that was absent in macaques), I think this difference in sample size would be considerably more concerning. 

      Thank you for noting this. Indeed, the interaction effect is crucial, and the task design was explicitly made to test this precise prediction, described in our manuscript as the “reversibility hypothesis”. The congruity effect in the learned direction served as a control for learning, while the corresponding congruity effect in the reversed direction tested for spontaneous reversal. The reversibility hypothesis stipulates that in humans there should not be a difference between the learned and the reversed direction, while there should be for monkeys. We already wrote about that in the result section of the original manuscript and now also describe this more explicitly in the introduction and beginning of the result section.

      I would also note that while I agree with the authors' conclusions, it is notable to me that the congruity effect observed in humans (red vs blue lines in Fig. 2B) appears to be far more pronounced than any effect observed in the macaques (Fig. 3C-3). Again, this does not challenge the core finding of this paper but does suggest methodological or possibly motivational/attentional differences between the humans and the monkeys (or, for example, that the monkeys had learned the associations less strongly and clearly than the humans). 

      As also explained in response to the eLife assessment above, we expanded the “limitations” section of the discussion, with a deeper description of the possible methodological differences between the two species (see lines 478-485).

      With the same worry in mind, we did increase the attention and motivation of monkeys in experiment 2, and indeed obtained a greater activation to the canonical pairs and their violation, notably in the prefrontal cortex, but crucially still without reversibility.

      In the end, we believe that the striking interspecies difference in size and extent of the violation effect, even for purely canonical stimuli, is an important part of our findings and points to a more efficient species-specific learning system, that our experiment tentatively relates to a symbolic competence.

      This is a strong paper with elegant methods and makes a worthwhile contribution to our understanding of the neural systems supporting symbolic representations in humans, as opposed to other animals. 

      We again thank the reviewer for the positive review.

      Reviewer #2 (Public Review): 

      In their article titled "Brain mechanisms of reversible symbolic reference: a potential singularity of the human brain", van Kerkoerle et al address the timely question of whether non-human primates (rhesus macaques) possess the ability for reverse symbolic inference as observed in humans. Through an fMRI experiment in both humans and monkeys, they analyzed the bold signal in both species while observing audio-visual and visual-visual stimuli pairs that had been previously learned in a particular direction. Remarkably, the findings pertaining to humans revealed that a broad brain network exhibited increased activity in response to surprises occurring in both the learned and reverse directions. Conversely, in monkeys, the study uncovered that the brain activity within sensory areas only responded to the learned direction but failed to exhibit any discernible response to the reverse direction. These compelling results indicate that the capacity for reversible symbolic inference may be unique to humans. 

      In general, the manuscript is skillfully crafted and highly accessible to readers. The experimental design exhibits originality, and the analyses are tailored to effectively address the central question at hand.

      Although the first experiment raised a number of methodological inquiries, the subsequent second experiment thoroughly addresses these concerns and effectively replicates the initial findings, thereby significantly strengthening the overall study. Overall, this article is already of high quality and brings new insight into human cognition. 

      We sincerely thank the reviewer for the positive comments. 

      I identified three weaknesses in the manuscript: 

      - One major issue in the study is the absence of significant results in monkeys. Indeed, authors draw conclusions regarding the lack of significant difference in activity related to surprise in the multidemand network (MDN) in the reverse congruent versus reverse incongruent conditions. Although the results are convincing (especially with the significant interaction between congruency and canonicity), the article could be improved by including additional analyses in a priori ROI for the MDN in monkeys (as well as in humans, for comparison). 

      First, we disagree with the statement about “absence of significant results in monkeys”. We do report a significant interaction which, as noted by the referee, is a crucial positive finding.

      Second, we performed the suggested analysis for experiment 2, using the bilateral ROIs of the putative monkey MDN from previous literature (Mitchell et al., 2016), which are based on the human study by Fedorenko et al. (PNAS, 2013).

      Author response table 1.

      Congruity effect for monkeys in Experiment 2 within the ROIs of the MDN (n=3). Significance was assessed with one-sided one-sample t-tests.

      As can be seen, none of the regions within the monkey MDN showed an FDR-corrected significant difference or interaction. Although the absence of a canonical congruity effect makes it difficult to draw strong conclusions, it did approach significance at an uncorrected level in the lateral frontal posterior region, similar to the large prefrontal effect we report in Figures 4 and 5. Furthermore, for the reversed congruity effect there was never even a trend at the uncorrected level, and the crucial interaction of canonicity and congruity again approached significance in the lateral prefrontal cortex.
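
      To make the distinction between uncorrected and FDR-corrected significance concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. This is an assumption for illustration: the response does not state which FDR method was used, and the p-values below are made up, not values from the author response tables.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which tests survive FDR control.

    Benjamini-Hochberg: sort the m p-values, find the largest rank k with
    p_(k) <= (k / m) * q, and reject the hypotheses with the k smallest
    p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Made-up p-values for five hypothetical ROIs (illustration only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.6]
survives = benjamini_hochberg(example_p)
```

      In this made-up example, the third and fourth p-values fall below 0.05 on their own but do not survive the correction, which is the kind of situation described as significant only "at an uncorrected level".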

      We also performed an ANOVA in the human participants of the VV experiment on the average betas across the 7 fronto-parietal ROIs used by Mitchell et al. to define their equivalent in the monkey brain (Fig. 1a, right, in Mitchell et al. 2016), with congruity, canonicity and hemisphere (except for the anterior cingulate, which is a bilateral ROI) as within-subject factors. We confirmed the results presented in the manuscript (Figure 4C), notably with no significant interaction between congruity and canonicity in any of these ROIs (all F-values <1, except in the insula). A significant main effect of congruity was observed in the posterior middle frontal gyrus (MFG) and inferior precentral sulcus at the FDR-corrected level. Analyses restricted to the canonical trials found a congruity effect in these two regions plus the anterior insula and anterior cingulate/presupplementary motor area, whereas no ROI was significant at an FDR-corrected level for reversed trials. There was a trend in the middle MFG and inferior precentral region for reversed trials. Crucially, there was not even a trend for the interaction between congruity and canonicity at the uncorrected level. The difference in effect size between the canonical and reversed directions can therefore be explained by the larger statistical power due to the larger number of congruent trials (70%, versus 10% for the other trial conditions), not by a significant difference between the canonical and reversed directions.

      Author response table 2.

      Congruity effect for humans in Experiment 2 within the ROIs of the MDN (n=23).

      These results support our contention that the type of learning of the stimulus pairs was very different in the two species. We thank the reviewer for suggesting these relevant additional analyses.

      - While the authors acknowledge in the discussion that the number of monkeys included in the study is considerably lower compared to humans, it would be informative to know the variability of the results among human participants. 

      We agree that this is an interesting question, although it is also very open-ended. For instance, we could report each subject’s individual whole-brain results, but this would take too much space (and the interested reader will be able to examine them using the data that we make available as part of this publication). As a step in this direction, we provide below a figure showing the individual congruity effects, separately for each experiment and for each ROI of Table 5, and for each of the 52 participants for whom an fMRI localizer was available:

      Author response image 1.

      Difference in mean betas between congruent and incongruent conditions in a-priori linguistic and mathematical ROIs (see definition and analyses in Table 5) in both experiments (experiment 1 = AV, left panel; experiment 2 = VV, right panel). Dots correspond to participants (red: canonical trials; green: reversed trials). The boxplot notch is located at the median and the lower and upper box hinges at the 25th and 75th centiles. Whiskers extend to 1.5 inter-quartile ranges on either side of the hinges. ROIs are ranked by the median of the Incongruent-Congruent difference across canonical and reversed order, within a given experiment. For purposes of comparison between the two experiments, we have underlined with colors the top-five common ROIs between the two experiments. N.s.: non-significant congruity effect (p>0.05).

      Several regions show a rather consistent difference across subjects (see, for instance, the posterior STS in experiment 1, left panel). Overall, only 3 of the 52 participants did not show any beta greater than 2 for canonical or reversed trials in any ROI. The consistency is quite striking, given the limited number of test trials (in total only 16 incongruent trials per direction per participant), and the fact that these ROIs were selected for their responses to spoken or written sentences, as part of a subsidiary task quite different from the main task.

      - Some details are missing in the methods.  

      Thank you for these comments; we reply to them point-by-point below.

      Reviewer #3 (Public Review): 

      This study investigates the hypothesis that humans (but not non-human primates) spontaneously learn reversible temporal associations (i.e., learning a B-A association after only being exposed to A-B sequences), which the authors consider to be a foundational property of symbolic cognition. To do so, they expose humans and macaques to 2-item sequences (in a visual-auditory experiment, pairs of images and spoken nonwords, and in a visual-visual experiment, pairs of images and abstract geometric shapes) in a fixed temporal order, then measure the brain response during a test phase to congruent vs. incongruent pairs (relative to the trained associations) in canonical vs. reversed order (relative to the presentation order used in training). The advantage of neuroimaging for this question is that it removes the need for a behavioral test, which non-human primates can fail for reasons unrelated to the cognitive construct being investigated. In humans, the researchers find statistically indistinguishable incongruity effects in both directions (supporting a spontaneous reversible association), whereas in monkeys they only find incongruity effects in the canonical direction (supporting an association but a lack of spontaneous reversal). Although the precise pattern of activation varies by experiment type (visual-auditory vs. visual-visual) in both species, the authors point out that some of the regions involved are also those that are most anatomically different between humans and other primates. The authors interpret their finding to support the hypothesis that reversible associations, and by extension symbolic cognition, is uniquely human. 

      This study is a valuable complement to prior behavioral work on this question. However, I have some concerns about methods and framing. 

      We thank the reviewer for the careful summary of the manuscript, and the positive comments.

      Methods - Design issues: 

      The authors originally planned to use the same training/testing protocol for both species but the monkeys did not learn anything, so they dramatically increased the amount of training and evaluation. By my calculation from the methods section, humans were trained on 96 trials and tested on 176, whereas the monkeys got an additional 3,840 training trials and 1,408 testing trials. The authors are explicit that they continued training the monkeys until they got a congruity effect. On the one hand, it is commendable that they are honest about this in their write-up, given that this detail could easily be framed as deliberate after the fact. On the other hand, it is still a form of p-hacking, given that it's critical for their result that the monkeys learn the canonical association (otherwise, the critical comparison to the non-canonical association is meaningless). 

      Thank you for this comment. 

      Indeed, for experiment 1, the amount of training and testing was not equal for the humans and monkeys, as also mentioned by reviewer 2. We now describe in more detail how many training and imaging days we used for each experiment and each species, as well as the number of blocks per day and the number of trials per block (see lines 572-577). We also added information on the amount of training received to the legends of all the Tables.

      We are sorry for giving the impression that we trained until the monkeys learned this. This was not the case. Based on previous literature, we actually anticipated that the short training would not be sufficient, and therefore planned additional training in advance. Specifically, Meyer & Olson (2011) had observed pair learning in the inferior temporal cortex of macaque monkeys after 816 exposures per pair. This is similar to the additional training we gave, about 80 blocks with 12 trials per pair per block. This is now explained in more detail (lines 577-580).

      Furthermore, we strongly disagree with the pejorative term p-hacking. The aim of the experiment was not to show a congruency effect in the canonical direction in monkeys, but to track and compare their behavior in the same paradigm as that of humans for the reverse direction. It would have been unwise to stop after human-identical training and only show that humans learn better, which is a given. Instead, we looked at brain activations at both times, at the end of human-identical training and when the monkeys had learned the pairs in the canonical direction. 

      Finally, in experiment 2, monkeys were tested after the same 3 days of training as humans. We wrote: “Using this design, we obtained significant canonical congruity effects in monkeys on the first imaging day after the initial training (24 trials per pair), indicating that the animals had learned the associations” (lines 252-253).

      (2) Between-species comparisons are challenging. In addition to having differences in their DNA, human participants have spent many years living in a very different culture than that of NHPs, including years of formal education. As a result, attributing the observed differences to biology is challenging. One approach that has been adopted in some past studies is to examine either young children or adults from cultures that don't have formal educational structures. This is not the approach the authors take. This major confound needs to minimally be explicitly acknowledged up front. 

      Thank you for raising this important point. We already had a section on “limitations” in the manuscript, which we have now extended (lines 478-485). Indeed, this study follows a previous study in 5-month-old infants using EEG, in which we already showed that after learning associations between labels and categories, infants spontaneously generalize learning to the reversed pairs after a short learning period in the lab (Kabdebon et al., PNAS, 2019). We also cited preliminary results of the same paradigm as used in the current study but using EEG in 4-month-old infants (Ekramnia and Dehaene-Lambertz, 2019), where we replicated the results obtained by Kabdebon et al. 2019 showing that preverbal infants spontaneously generalize learning to the reversed pairs.

      Functional MRI in awake infants remains a challenge at this age (but see our own work, Dehaene-Lambertz et al., Science, 2002), especially because the experimental design means only a few trials in the conditions of interest (10%) and thus a long experimental duration that exceeds infants’ quietness and attentional capacities in the noisy MRI environment. (We discuss this on lines 493-496.)

      (3) Humans have big advantages in processing and discriminating spoken stimuli and associating them with visual stimuli (after all, this is what words are in spoken human languages). Experiment 2 ameliorates these concerns to some degree, but still, it is difficult to attribute the failure of NHPs to show reversible associations in Experiment 1 to cognitive differences rather than the relative importance of sound string to meaning associations in the human vs. NHP experiences. 

      As the reviewer wrote, we deliberately performed Experiment 2 with visual shapes to control for various factors that might have explained the monkeys' failure in Experiment 1. 

      (4) More minor: The localizer task (math sentences vs. other sentences) makes sense for math but seems to make less sense for language: why would a language region respond more to sentences that don't describe math vs. ones that do? 

      The referee is correct: our use of the word “reciprocally” was improper (although see Amalric and Dehaene, 2016 for significant differences in both directions when non-mathematical sentences concern specific knowledge). We changed the formulation to clarify this as follows: “In these ROIs, we recovered the subject-specific coordinates of each participant’s 10% best voxels in the following comparisons: sentences vs rest for the 6 language ROIs; reading vs listening for the VWFA; and numerical vs non-numerical sentences for the 8 mathematical ROIs.” (lines 678-680).

      Methods - Analysis issues: 

      (5) The analyses appear to "double dip" by using the same data to define the clusters and to statistically test the average cluster activation (Kriegeskorte et al., 2009). The resulting effect sizes are therefore likely inflated, and the p-values are anticonservative. 

      It is not clear to us which result the reviewer is referring to. In Tables 1-4, we report the values that we found significant in the whole-brain analysis; we do not report additional statistical tests for these data. For Table 5, the subject-specific voxels were identified through a separate localizer experiment, which was designed to pinpoint the precise activation areas for each subject in the domains of oral and written language-processing and math. Subsequently, we compared the activation at these voxel locations across different conditions of the main experiment. Thus, the two datasets were distinct, and there was no double dipping. In both interpretations of the comment, we therefore disagree with the reviewer.
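
      The logic of an independent localizer can be sketched as follows. This is an illustration under stated assumptions rather than the actual analysis pipeline: the data are random numbers, and the function and variable names are hypothetical. The point is only that voxel selection uses one dataset (the localizer) while the effect is measured on another (the main experiment).

```python
import random

# Illustrative sketch of an independent-localizer ROI analysis.
# All names and data are hypothetical; real analyses operate on fMRI beta maps.

def top_fraction_voxels(localizer_contrast, fraction=0.10):
    """Indices of the voxels with the highest localizer contrast values."""
    n_keep = max(1, round(fraction * len(localizer_contrast)))
    ranked = sorted(range(len(localizer_contrast)),
                    key=lambda i: localizer_contrast[i], reverse=True)
    return ranked[:n_keep]

def mean_difference(betas_a, betas_b, voxels):
    """Average of (betas_a - betas_b) over the selected voxels only."""
    diffs = [betas_a[i] - betas_b[i] for i in voxels]
    return sum(diffs) / len(diffs)

rng = random.Random(0)
n_voxels = 200
localizer = [rng.gauss(0, 1) for _ in range(n_voxels)]    # separate localizer run
incongruent = [rng.gauss(0, 1) for _ in range(n_voxels)]  # main experiment
congruent = [rng.gauss(0, 1) for _ in range(n_voxels)]    # main experiment

# Step 1: select voxels using the localizer data only.
voxels = top_fraction_voxels(localizer)
# Step 2: measure the congruity effect at those voxels in the main experiment.
effect = mean_difference(incongruent, congruent, voxels)
```

      Because the selection step never sees the main-experiment betas, the selection itself cannot inflate the measured effect.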

      Framing: 

      (6) The framing ("Brain mechanisms of reversible symbolic reference: A potential singularity of the human brain") is bigger than the finding (monkeys don't spontaneously reverse a temporal association but humans do). The title and discussion are full of buzzy terms ("brain mechanisms", "symbolic", and "singularity") that are only connected to the experiments by a debatable chain of assumptions. 

      First, this study shows relatively little about brain "mechanisms" of reversible symbolic associations, which implies insights into how these associations are learned, recognized, and represented. But we're only given standard fMRI analyses that are quite inconsistent across similar experimental paradigms, with purely suggestive connections between these spatial patterns and prior work on comparative brain anatomy. 

      We agree with the referee that the term “mechanism” is ambiguous and, for systems neuroscientists, may suggest more than we are able to do here with functional MRI. We changed the title to “Brain areas for reversible symbolic reference, a potential singularity of the human brain”. This title better describes our specific contribution: mapping out the areas involved in reversibility in humans, and showing that they do not seem to respond similarly in macaque monkeys.

      Second, it's not clear what the relationship is between symbolic cognition and a propensity to spontaneously reverse a temporal association. Certainly, if there are inter-species differences in learning preferences this is important to know about, but why is this construed as a difference in the presence or absence of symbols? Because the associations aren't used in any downstream computation, there is not even any way for participants to know which is the sign and which is the signified: these are merely labels imposed by the researchers on a sequential task. 

      As explained in the introduction, the reversibility test addressed a very minimal core property of symbolic reference. There cannot be a symbol if its attachment doesn’t operate in both directions. Thus, this property is necessary – but we agree that it is not sufficient. Indeed, more tests are needed to establish whether and how the learned symbols are used in further downstream compositional tasks (as discussed in our recent TICS papers, Dehaene et al. 2022). We added a sentence in the introduction to acknowledge this fact:

      “Such reversibility is a core and necessary property of symbols, although we readily acknowledge that it is not sufficient, since genuine symbols present additional referential and compositional properties that will not be tested in the present work.” (lines 89-92).

      Third, the word "singularity" is both problematically ambiguous and not well supported by the results. "Singularity" is a highly loaded word that the authors are simply using to mean "that which is uniquely human". Rather than picking a term with diverse technical meanings across fields and then trying to restrict the definition, it would be better to use a different term. Furthermore, even under the stated definition, this study performed a single pairwise comparison between humans and one other species (macaques), so it is a stretch to then conclude (or insinuate) that the "singularity" has been found (see also pt. 2 above). 

      We have published an extensive review including a description of our use of the term “singularity” (Dehaene et al., TICS 2022). Here is a short excerpt: “Humans are different even in domains such as drawing and geometry that do not involve communicative language. We refer to this observation using the term “human cognitive singularity”, the word singularity being used here in its standard meaning (the condition of being singular) as well as its mathematical sense (a point of sudden change). Hominization was certainly a singularity in biological evolution, so much so that it opened up a new geological age (the Anthropocene). Even if evolution works by small continuous change (and sometimes it doesn’t [4]), it led to a drastic cognitive change in humans.”

      We find the referee’s use of the pejorative term “insinuate” quite inappropriate. From the title on, we are quite nuanced and refer only to a “potential singularity”. Furthermore, as noted above, we explicitly mention in the discussion the limitations of our study, and in particular the fact that only a single non-human species was tested (see lines 486-493). We are working hard to get chimpanzee data, but this is remarkably difficult for us, and we hope that our paper will incite other groups to collect more evidence on this point.

      (7) Related to pt. 6, there is circularity in the framing whereby the authors say they are setting out to find out what is uniquely human, hypothesizing that the uniquely human thing is symbols, and then selecting a defining trait of symbols (spontaneous reversible association) *because* it seems to be uniquely human (see e.g., "Several studies previously found behavioral evidence for a uniquely human ability to spontaneously reverse a learned association (Imai et al., 2021; Kojima, 1984; Lipkens et al., 1988; Medam et al., 2016; Sidman et al., 1982), and such reversibility was therefore proposed as a defining feature of symbol representation reference (Deacon, 1998; Kabdebon and DehaeneLambertz, 2019; Nieder, 2009).", line 335). They can't have it both ways. Either "symbol" is an independently motivated construct whose presence can be independently tested in humans and other species, or it is by fiat synonymous with the "singularity". This circularity can be broken by a more modest framing that focuses on the core research question (e.g., "What is uniquely human? One possibility is spontaneous reversal of temporal associations.") and then connects (speculatively) to the bigger conceptual landscape in the discussion ("Spontaneous reversal of temporal associations may be a core ability underlying the acquisition of mental symbols").

      We fail to understand the putative circularity that the referee sees in our introduction. We urge him/her to re-read it, and hope that, with the changes that we introduced, it does boil down to his/her summary, i.e. “What is uniquely human? One possibility is spontaneous reversal of temporal associations."

      Reviewer #1 (Recommendations For The Authors): 

      In general, the manuscript was very clear, easy to read, and compelling. I would recommend the authors carefully check the text for consistency and minor typos. For example: 

      The sample size for the monkeys kept changing throughout the paper. E.g., Experiment 1: n = 2 (line 149); n = 3 (line 205).  

      Thank you for catching this error; we have corrected it. The number of animals was indeed 2 for experiment 1 and 3 for experiment 2. (Animals JD and YS participated in experiment 1, and JD, JC and DN in experiment 2, so only JD participated in both experiments.)

      Similarly, the number of stimulus pairs is reported inconsistently (4 on line 149, 5 pairs later in the paper). 

      We’re sorry that this was unclear. We used 5 sets of 4 audio-visual pairs each. We now clarify this, on line 157 and on lines 514-516.

      At least one case of p>0.0001, rather than p < 0.0001 (I assume). 

      Thank you once again; we have now corrected this.

      Reviewer #2 (Recommendations For The Authors): 

      One major issue in the study is the absence of significant results in monkeys. Indeed, the authors draw conclusions regarding the lack of significant difference in activity related to surprise in the multidemand network (MDN) in the reverse congruent versus reverse incongruent conditions. Although the results are convincing (especially with the significant interaction between congruency and canonicity), the article could be improved by including additional analyses in a priori ROI for the MDN in monkeys (as well as in humans, for comparison). In other words: what are the statistics for the MDN regarding congruity, canonicity, and interaction in both species? Since the authors have already performed this type of analysis for language and Math ROIs (table 5), it should be relatively easy for them to extend it to the MDN. Demonstrating that results in monkeys are far from significant could further convince the reader. 

      Furthermore, while the authors acknowledge in the discussion that the number of monkeys included in the study is considerably lower compared to humans, it would be informative to know the variability of the results among human participants. Specifically, it would be valuable to describe the proportion of human participants in which the effects of congruency, canonicity, and their interaction are significant. Additionally, stating the variability of the F-values for each effect would provide reassurance to the reader regarding the distinctiveness of humans in comparison to monkeys. Low variability in the results would serve to mitigate concerns that the observed disparity is merely a consequence of testing a unique subset of monkeys, which may differ from the general population. Indeed, this would be a greater support to the notion that the dissimilarity stems from a genuine distinction between the two species. 

      We responded to both of these points above.

      In terms of methods, details are missing: 

      - How many trials of each condition are there exactly? (10% of 44 trials is 4.4) : 

      We wrote: “In both humans and monkeys, each block started with 4 trials in the learned direction (congruent canonical trials), one trial for each of the 4 pairs (2 O-L and 2 L-O pairs). The rest of the block consisted of 40 trials in which 70% of trials were identical to the training; 10% were incongruent pairs but the direction (O-L or L-O) was correct (incongruent canonical trials), thus testing whether the association was learned; 10% were congruent pairs but the direction within the pairs was reversed relative to the learned pairs (congruent reversed trials) and 10% were incongruent pairs in reverse (incongruent reversed trials).”(See lines 596-600.)

      Thus, each block comprised 4 initial trials, 28 canonical congruent trials, 4 canonical incongruent, 4 reverse congruent and 4 reverse incongruent trials, i.e. 4+28+3x4=44 trials.
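
      The per-block tally can be verified with a few lines of code. This is only an illustrative check based on the proportions quoted above; the dictionary keys and function name are our own labels, not identifiers from the experiment's code.

```python
# Illustrative tally of one test block, using the proportions quoted above.
# All names are hypothetical; this is not the experiment's presentation code.

INITIAL_TRIALS = 4      # congruent canonical trials that open each block
REMAINING_TRIALS = 40   # trials that follow the initial ones

PROPORTIONS = {
    "canonical_congruent": 0.70,
    "canonical_incongruent": 0.10,
    "reversed_congruent": 0.10,
    "reversed_incongruent": 0.10,
}

def block_counts():
    """Number of trials of each type in one block."""
    counts = {name: round(p * REMAINING_TRIALS) for name, p in PROPORTIONS.items()}
    counts["initial_canonical_congruent"] = INITIAL_TRIALS
    return counts

counts = block_counts()
total = sum(counts.values())
print(counts)
print("trials per block:", total)
```

      The 10% figure therefore applies to the 40 trials that follow the 4 initial ones, which yields a whole number of trials (4) per rare condition and the 44 trials per block mentioned by the reviewer.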

      - How long is one trial? 

      As written in the method section: “In each trial, the first stimulus (label or object) was presented during 700ms, followed by an inter-stimulus-interval of 100ms then the second stimulus during 700ms. The pairs were separated by a variable inter-trial-interval of 3-5 seconds” i.e. 700+100+700=1500 ms per pair, plus 3 to 4.75 seconds of blank between trials (see lines 531-533).

      - How are the stimulus presentations jittered? 

      See: “The pairs were separated by a variable inter-trial-interval randomly chosen among eight different durations between 3 and 4.75 seconds (step=250 ms). The series of 8 intervals was randomized again each time it was completed.” (lines 533-535).
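
      The timing and jitter scheme described in these two answers can be sketched as follows. This is a minimal illustration assuming a simple shuffle-without-replacement over the eight intervals; the constant and function names are hypothetical and do not come from the stimulus-presentation code.

```python
import random

# Illustrative sketch of the trial timing and inter-trial jitter described above.
# Names are hypothetical; only the durations come from the quoted methods.

STIM_MS = 700                      # duration of each stimulus
ISI_MS = 100                       # inter-stimulus interval within a pair
PAIR_MS = 2 * STIM_MS + ISI_MS     # duration of one pair

def iti_schedule(n_trials, rng=None):
    """Inter-trial intervals in seconds, one per trial.

    The eight durations (3.00 to 4.75 s in 0.25 s steps) are shuffled,
    used up in order, then reshuffled each time the series is completed.
    """
    rng = rng or random.Random()
    durations = [3.0 + 0.25 * k for k in range(8)]
    schedule = []
    while len(schedule) < n_trials:
        cycle = durations[:]
        rng.shuffle(cycle)
        schedule.extend(cycle)
    return schedule[:n_trials]

example_schedule = iti_schedule(44, random.Random(0))
```

      With this scheme, every run of eight consecutive trials that starts at a cycle boundary contains each interval exactly once, so the mean interval stays close to 3.875 s within a block.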

      - What is the statistical power achieved for humans? And for monkeys? 

      We know of no standard way to define power for fMRI experiments. Power will depend on so many parameters, including the fMRI signal-to-noise ratio, the attention of the subject, the areas being considered, the type of analysis (whole-brain versus ROIs), etc.

      - Videos are mentioned in the methods, is it the image and sound? It is not clear. 

      We’re sorry that it was unclear. Videos were only used for the training of the human subjects. We have now corrected this in the method section (lines 552-554).

      Reviewer #3 (Recommendations For The Authors): 

      The main recommendations are to adjust the framing (making it less bold and more connected to the empirical evidence) and to ensure independence in the statistical analyses of the fMRI data. 

      See our replies to the reviewer’s comments on “Framing” above. In particular, we changed the title of the paper from “Brain mechanisms of reversible symbolic reference” to “Brain areas for reversible symbolic reference”.

      References cited in this response

      Dehaene, S., Al Roumi, F., Lakretz, Y., Planton, S., & Sablé-Meyer, M. (2022). Symbols and mental programs: A hypothesis about human singularity. Trends in Cognitive Sciences, 26(9), 751-766. https://doi.org/10.1016/j.tics.2022.06.010

      Dehaene-Lambertz, G., Dehaene, S., & Hertz-Pannier, L. (2002). Functional Neuroimaging of Speech Perception in Infants. Science, 298(5600), 2013-2015. https://doi.org/10.1126/science.1077066

      Ekramnia M, Dehaene-Lambertz G. 2019. Investigating bidirectionality of associations in young infants as an approach to the symbolic system. Presented at the CogSci. p. 3449.

      Fedorenko E, Duncan J, Kanwisher N (2013) Broad domain generality in focal regions of frontal and parietal cortex. Proc Natl Acad Sci U S A 110:16616-16621.

      Kabdebon, C., & Dehaene-Lambertz, G. (2019). Symbolic Labeling in 5-Month-Old Human Infants. Proceedings of the National Academy of Sciences, 116(12), 5805-5810. https://doi.org/10.1073/pnas.1809144116

      Mitchell, D. J., Bell, A. H., Buckley, M. J., Mitchell, A. S., Sallet, J., & Duncan, J. (2016). A Putative Multiple-Demand System in the Macaque Brain. Journal of Neuroscience, 36(33), 8574‑8585. https://doi.org/10.1523/JNEUROSCI.0810-16.2016

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) In several instances the paper does not address apparent inconsistencies between the prior literature and the findings. For example, the first main finding is that recalled items have more differentiated lateral temporal cortex representations within lists than not recalled items. This seems to be the opposite of the prediction from temporal context models that are used to motivate the paper: context models would predict that greater contextual similarity within a list should lead to greater memory through enhanced temporal clustering in recall. This is what El-Kalliny et al (2019) found, using a highly similar design (free recall, intracranial recordings from the lateral temporal lobe). The authors never address this contradiction in any depth to reconcile it with the previous literature and with the motivating theoretical model.

      Figure 2 supports the findings from El-Kalliny and colleagues because it shows the relationship of each list item relative to the first item (El-Kalliny et al. 2019). Items encoded adjacent to SP1 show the highest spectral similarity supporting the idea of overlapping context predicted by the Temporal Context Model. However, our figure characterizes how increasing inter-item distance affects spectral similarity. It shows that two items successfully recalled from temporally distant serial positions show reduced spectral similarity. These findings align with the predictions of the temporal context model because two temporally distant items would lack significant contextual overlap and therefore would have more distinct spectral representations.

      El-Kalliny and colleagues do use a similar experimental set-up; however, they define drift differently. They identified patients with a tendency to cluster temporally and observed that those patients tend to drift less between temporally clustered items, but they do not specify drift relative to a constant serial position as we do in our analysis. They define drift as the spectral change between two adjacent items, which is a relative measure between any two items rather than a measure in relation to a fixed point such as SP1. Finally, our analysis focuses only on gamma activity, while El-Kalliny and colleagues identified drift across a much broader set of frequency bands.
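
      The difference between the two drift definitions can be made concrete with a toy example. The sketch below assumes cosine similarity between feature vectors as the similarity measure, which is an assumption made for illustration and not necessarily the measure used in either study. The feature vectors are synthetic, and all names are hypothetical.

```python
import math

# Toy illustration of two ways of measuring drift across serial positions.
# Assumption: cosine similarity between feature vectors (for illustration only).

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_to_first(features):
    """Similarity of each later item to the item at serial position 1 (fixed anchor)."""
    first = features[0]
    return [cosine_similarity(first, f) for f in features[1:]]

def similarity_adjacent(features):
    """Similarity between each pair of neighbouring items (relative measure)."""
    return [cosine_similarity(a, b) for a, b in zip(features, features[1:])]

# Synthetic features: a vector that rotates by a constant angle at each item.
features = [[math.cos(0.2 * k), math.sin(0.2 * k)] for k in range(6)]
to_first = similarity_to_first(features)
adjacent = similarity_adjacent(features)
```

      In this toy case, similarity to the first item falls steadily with serial position, whereas adjacent-item similarity stays constant, so the two measures can describe the same slowly changing signal in quite different ways.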

      (2) The way that the authors conduct the analysis of medial parietal neural similarity at boundaries leads to results that cannot be conclusively interpreted. The authors report enhanced similarity across lists for the first item in each list, which they interpret as reflecting a qualitatively distinct boundary signal. However, this finding can readily be explained by contextual drift if one assumes that whatever happens at the start of each list is similar or identical across lists (for example, a get ready prompt or reminder of instructions). The authors do not include analyses to rule this out, which undermines one of the main findings. 

      Extensions of the temporal context model (Lohnas et al. 2015) predict context at the beginning of a list will be most similar to the end of the prior list. The theory assumes a single-context state, consisting of a recency-weighted average of prior items, that is updated, even across different encoding periods.

      However, our results show that a boundary item representation is most similar to the prior list’s first item rather than its last item. Our results conflict with the extension of TCM because the shared similarity of boundary items suggests that the context state for the first item in the list is not a recency-weighted average of the items presented immediately prior. The same boundary-sensitive signal is not present in other regions, namely the hippocampus and lateral temporal cortex. Those regions do not show similarity between items at the beginning of each list.

      Our main conclusion from these data was that the medial parietal lobe activity seems to be specifically sensitive to task boundaries, defined by the first event or the get ready prompt, while other regions are not.

      (3) Although several previous studies have linked hippocampal fMRI and electrophysiological activity at event boundaries with memory performance, the authors do not find similar relationships between hippocampal activity, event boundaries, and memory. There are potential explanations for why this might be the case, including the distinction between item vs. associative memory, which has been a prominent feature of previous work examining this question. However, the authors do not address these potential explanations (or others) to explain their findings' divergence from prior work. This makes it difficult to interpret and to draw conclusions from the data about the hippocampus' mechanistic role in forming event memories.

      The following text was added and revised in the discussion to discuss hippocampal activity shown in our results and its lack of sensitivity to boundaries.  

      “Spectral activity in the medial parietal lobe aligned closely with boundaries. Drift between item pairs seemed to reset at each boundary, leading to renewed similarity after each boundary. This observation aligns with previous work suggesting boundaries reset temporal context. In the temporal cortex, our findings extend prior studies which suggest the temporal lobe may play a role in associating adjacently presented items (Yaffe et al. 2014; El-Kalliny et al. 2019). We found items encoded in distant serial positions, but within the same list, drifted significantly more than items from adjacent serial positions (Figure 2C). Consistent with the predictions of the temporal context model, the reduced similarity between distant items may reflect reduced contextual overlap proportional to the time elapsed between them. However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ben-Yakov et al. 2018; Ezzyat et al. 2014; Griffiths et al. 2020) which shows hippocampal sensitivity to event boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al. 2020). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions.”

      (4) There is a similar absence of interpretation with respect to the previous literature for the data showing enhanced boundary-related similarity in the medial parietal cortex. The authors’ interpretation seems to be that they have identified a boundary-specific signal that reflects a large and abrupt change in context, however, another plausible interpretation is that enhanced similarity in the medial parietal cortex is related to a representation of a schema for the task structure that has been acquired across repeated instances. 

      We agree our results could suggest the MPL creates a generalized situational model or schematic of the task. Unfortunately, our behavioral task does not allow us to differentiate between these ideas and pure boundary representation. However, given boundaries are a component in defining situational models, we chose to interpret our results conservatively as a form of boundary representation.  

      (5) The authors do not directly compare their model to other models that could explain how variability in neural activity predicts memory. One example is the neural fatigue hypothesis, which the authors mention, however there are no analyses or data to suggest that their data is better fit by a boundary/contextual drift mechanism as opposed to neural fatigue. 

      The study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial-position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (6) P2. Line 65 cites Polyn et al (2009b) as an example where ‘random’ boundary insertions improve subsequent memory. However, the boundaries in that study always occurred at the same serial position and were therefore completely predictable and not random.

      The citation was removed from the corresponding sentence.

      (7) P2. Line 74 cites Pu et al. (2022) as an example of medial temporal lobe ‘regional activity’ showing sensitivity to event boundaries; however, this paper reported behavioral and computational modeling results and did not include measurement of neural activity. 

      The citation was removed from the corresponding sentence.

      (8) P.3 Line 117, Hsieh et al (2014) and Hsieh and Ranganath (2015) are cited as evidence that ‘spectral’ relatedness decreases as a function of distance, but neither of these studies examined ‘spectral’ activity (fMRI univariate and multivariate). The manuscript would benefit from a careful review and updating of how the prior literature is cited, which will increase the impact of the findings for readers.

      The text has been updated to reflect this distinction by modifying the statement to:  “Previous work consistent with temporal context models suggests neural pattern similarity reduces as a function of distance between related memories.”

      (9) Several previous studies have found hippocampal activity at event boundaries correlates with memory performance (Ben-Yakov et al 2011, 2018; Baldassano et al 2017), yet here the authors do not find evidence for hippocampal activity at event boundaries related to memory. Does this difference reflect something important about how the hippocampus vs. medial parietal cortex vs. lateral temporal cortex contribute to memory formation? Currently, there is not much discussion about how to interpret the differences between brain regions. Previous work has suggested that hippocampal pattern similarity at event boundaries specifically supports associative memory across events (Ezzyat & Davachi, 2014; Griffiths & Fuentemilla, 2020; Heusser et al., 2016), which may help explain their findings. In any case the authors could increase the impact of their paper by further situating their findings within the previous literature. 

      We would not suggest there is no boundary-related activity in the hippocampus. Similar to an earlier point made by the reviewer, to clarify our interpretation of regional differences, the following text has been added to the discussion.  

      “However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ezzyat and Davachi, 2014; Griffiths and Fuentemilla, 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al. 2020). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions (Baldassano et al. 2017).”

      (10) The authors mention neural fatigue as an alternative theory to explain the primacy effect (Serruya et al., 2014), however there are no analyses or data to suggest that their data is better fit by a boundary mechanism as opposed to neural fatigue. Previous studies have shown that gamma activity in the hippocampus changes with serial position and with encoding history (Serruya et al 2014; Lohnas et al 2020). Here, the authors could compare the reported pattern similarity results to control analyses that replicate this prior work, which would strengthen their argument that there is unique information at boundaries that is distinct from a neural fatigue signal. 

      The serial position effects described by Serruya and colleagues describe decreasing HFA with increasing serial position in the MTL, lateral temporal cortex and prefrontal cortex (Serruya et al. 2014). Despite their findings, we do not observe a strong boundary effect in those regions (see Supp Fig 3 a,b). The lack of boundary effect in regions where HFA is selectively increased for primacy items suggests the global neural fatigue model does not account for our results.

      Notably, the authors do not characterize HFA trends in the MPL. Nevertheless, their findings do not rule out the possibility of a boundary effect driving the HFA. We demonstrate boundary-relevant HFA only in the MPL but not in other regions. In addition, we show a correlation between SP1 recalls and boundary representation strength, as well as a conserved similarity of multiple boundary-adjacent items.  

      Next, the neural fatigue study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial-position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (11) For the analyses that examine cross-list similarity (e.g. the medial parietal analysis in Figure 3), how did the authors choose the number of lists over which similarity was calculated? Was the selection of this free parameter cross-validated to ensure that it is not overfitting the data? Given that there were 25 lists per session, using the three succeeding lists seems arbitrary. Why not use every list across the whole session? 

      Given the volume of data, number of patients, and computational time available at our facility, we extended the analysis as far as we could to characterize the observed trend.

      (12) P4. Line 155 says that Figure 3C shows example subject data, but it looks like it is actually Figure 3D. 

      The text was updated to reference the correct figure.

      (13) The t-tests on P.4 Line 159 have two sets of degrees of freedom but should only have one. 

      The t-tests described by Figure 3B represent the mean parameter estimate of the predictor for boundary proximity contrasted by region for all item pairs. The statistical test in this case was an unpaired t-test between parameter estimates for patients with electrodes in each of the regions. The numbers within parentheses represent the sample size, or number of subjects, contributing electrodes to each region.
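
      As an illustration of how an unpaired comparison between two groups of subjects yields a single degrees-of-freedom value from two sample sizes, the following is a minimal sketch of a pooled-variance (Student) two-sample t-test. The function name `unpaired_t` and the example values are hypothetical, and the response does not state whether a pooled or a Welch test was used, so the equal-variance form here is an assumption.

```python
import math
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's two-sample t-test with pooled variance (equal variances assumed).

    `a` and `b` are per-subject parameter estimates for two regions.
    Returns (t statistic, degrees of freedom).
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2  # one df value, derived from both sample sizes
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical per-subject estimates for two regions (not data from the study).
t_stat, dof = unpaired_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

      Under this assumption the two numbers in parentheses would be the group sizes n1 and n2, and the test itself would have n1 + n2 - 2 degrees of freedom.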

      Reviewer 2:

      (1) Because this is not a traditional event boundary study, the data are not ideally positioned to demonstrate boundary specific effects. In a typical study investigating event boundary effects, a series of stimuli are presented and within that series occurs an event boundary – for instance, a change in background color. The power of this design is that all aspects between stimuli are strictly controlled – in particular, the timing – meaning that the only difference between boundary-bridging items is the boundary itself. The current study was not designed in this manner, thus it is not possible to fully control for effects of time or that multiple boundaries occur between study lists (study to distractor, distractor to recall, recall to study). Each list in a free recall study can be considered its own “mini” experiment such that the same mechanisms should theoretically be recruited across any/all lists. There are multiple possible processes engaged at the start of a free recall study list which may not be specific to event boundaries per se. For example, and as cited by the authors, neural fatigue/attentional decline (and concurrent gamma power decline) may account for serial position effects. Thus, SP1 on all lists will be similar by virtue of the fact that attention/gamma decrease across serial position, which may or may not be a boundary-specific effect. In an extreme example, the analyses currently reported could be performed on an independent dataset with the same design (e.g. 12 word delayed free recall) and such analyses could potentially reveal high similarity between SP1-list1 in the current study and SP1-list1 in the second dataset, effects which could not be specifically attributed to boundaries.

      The neural fatigue study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial-position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (2) Comparisons of recalled "pairs" do not account for the lag between those items during study or recall, which based on retrieved context theory and prior findings (e.g. Manning et al., 2011), should modulate similarity between item representations. Although the GLM will capture a linear trend, it will not reveal serial position specific effects. It appears that the betas reported for the SP12 analyses are driven by the fact that similarity with SP12 generally increases across serial position, rather than a specific effect of "high similarity to SP12 in adjacent lists" (Page 5, excluding perhaps the comparison with list x+1). It is also unclear how the SP12 similarity analyses support the statement that "end-list items are represented more distinctly, or less similarly, to all succeeding items" (Page 5). It is not clear how the authors account for the fact that the same participants do not contribute equally to all ROIs or if the effects are consistent if only participants who have electrodes in all ROIs are included.

      In our study, all pairs are defined by the lag between a reference and target item. The results in Figure 3 show the similarity between each serial position in relation to SP1; Figure 4 shows lag between each serial position relative to SP2 and 3; and Figure 5 shows lag relative to SP12. Each statistical model accounts for the lag by ordering the data by increased inter-item distance. Further, our definition of lag is significantly more rigorous than that used by Manning and colleagues. Our similarity results for Figures 3-5 characterize the change in similarity relative to a constant reference point, such as SP1, rather than a relative reference point, such as +1 lag, which aggregates similarity between pairs such as SP1 to SP2 with SP4 to SP5, which may be recalled via different memory mechanisms.

      In Figure 5, we agree that your characterization that ‘similarity with SP12 generally increases across serial position’ is a more accurate description of the trend. The text has been updated to reflect this by changing the interpretation to “later serial positions in adjacent lists shared a gradually increasing similarity to SP12.”

      Next, we clarify the statement "end-list items are represented more distinctly, or less similarly, to all succeeding items". When recalling SP12, the subsequent items recalled exhibit significantly lower similarity to SP12 (see Figure 5D, pink). Consequently, the spectral representation of successfully recalled end-list items appears more distinct from later items in similar serial positions. This stands in contrast to our observations illustrated in Figures 3 and 4, where successfully recalled start-list items demonstrate greater similarity to later items in similar serial positions.

      (3) The authors use the term "perceptual" boundary which is confusing. First, "perceptual boundary" seems to be a specific subset of the broader term "event boundary," and it is unclear why/how the current study is investigating "perceptual" boundaries specifically. Second and relatedly, the current study does not have a sole "perceptual" boundary (as discussed in point 1 above), it is really a combination of perceptual and conceptual since the task is changing (from recalling the words in the previous list to studying the words in the current list OR studying the words in the current list to solving math problems in the current list) in addition to changes in stimulus presentation. 

      We agree with the statement that ‘perceptual’ as a modifier to the boundaries described here does not add significant information. Therefore, we have removed all reference to perceptual boundaries.

      (4) Although the results show that item-item similarity in the gamma band decreases across serial position, it is unclear how the present findings further describe "how gamma activity facilitates contextual associations" (Page 5). As mentioned in point 1 above, such effects could be driven by attentional declines across serial position -- and a concurrent decline in gamma power -- which may be unrelated to, and actually potentially impair, the formation of contextual associations, given evidence from the literature that increased gamma power facilitates binding processes.

      We agree that our study does not elucidate a mechanistic relationship between gamma power and contextual associations. The referenced sentence has been changed to: “how gamma activity is associated with context”.

      Please see our response to point 1 above. In addition, studies demonstrating decreasing gamma power with increasing serial position focus primarily on the MTL, lateral temporal cortex and prefrontal cortex (Serruya et al. 2014). Despite their findings, we do not observe a strong boundary effect in those regions (see Supp Fig 3 a,b). The lack of boundary effect in regions where HFA is selectively increased for primacy items suggests the global attentional decline or neural fatigue model does not account for our results.

      Notably, HFA trends in the MPL are poorly described. Further, gamma power decline does not rule out the possibility of a boundary effect driving the HFA. We demonstrate boundary-relevant HFA only in the MPL but not in other regions. In addition, we show a correlation between SP1 recalls and boundary representation strength, as well as a conserved similarity of multiple boundary-adjacent items.

      (5) Some of the logic and interpretations are inconsistent with the literature. For example, the authors state that "The temporal context model (TCM) suggests that gradual drift in item similarity provides context information to support recovery of individual items" however, this does not seem like an accurate characterization of TCM. According to TCM, context is a recency-weighted average of previous experience. Context "drifts" insofar as information is added to/removed from context. Context drift thus influences item similarity -- it is not that item similarity itself drifts, but that any change in item-item similarity is due to context drift. 

      The current findings do not appear at odds with the conceptualization of drift and context in the current version of the context maintenance and retrieval model. Furthermore, the context representation is posited to include information beyond basic item representations. Two items, regardless of their temporal distance, can be associated with similar contexts if related information is included in both context representations, as predicted and shown for multiple forms of relatedness including semantic relatedness (Manning & Kahana, 2012) and task relatedness (Polyn et al., 2012).

      We revised the sentence and encompassing paragraph to describe the temporal context model more accurately and emphasize how our findings align with the stated version of CMR. The revised text is below:  

      “Next, we asked how gamma spectral activity reflects contextual association between items. In the medial parietal lobe, we observed recurring similarity between items distant in time but adjacent to boundaries. This pattern suggests spectral activity may carry information about an item's relationship to a boundary. These observations align with the Context Maintenance and Retrieval model which extends the predictions of TCM to encompass broader relationships among items. Our results demonstrate boundaries as an important aspect of context and specify the spectral and regional properties of these boundary-related contextual features.”

      (6) Lohnas et al. (2020) Neural fatigue influences memory encoding in the human hippocampus, Neuropsychologia, should be cited when discussing neural fatigue

      Thank you for your suggestion. The citation has been added to the text.

      (7) A within-list, not an across list, similarity analysis should be used to test the interpretation that end-of-list items are more distinct than other list items.

      We believe this recommendation refers to the following line in our text: “These findings suggest end-list items are represented more distinctly, or less similarly, to all succeeding items.” Our statement compares list x, SP12 to all succeeding items (in list x+1, x+2, etc.). Therefore, this statement refers to items in the next lists, which is why we performed an across-list analysis rather than a within-list one.

      (8) It is unclear why it is necessary to use PCA to estimate similarity between items.

      PCA was used to reduce the dimensionality of the time-frequency matrix for the gamma band. This technique allowed us to compare predominant trends in gamma between items. In addition, we added a figure showing 3 example subjects in Figure 3 – supplementary figure 2D to show unique time-frequency components contribute to signal reconstructed from the PCs for each subject. Therefore, the boundary representation may be represented differently for each patient.

      (9) Lags are listed as -4, 4 (Page 8), however with a list length of 12, possible lags should be -11, 11.

      The listed parenthetical statement ‘(-4 to 4)’ referred to Figure 1 where Lag CRP is shown for transitions from -4 to 4. However, we did calculate lag CRP for all possible transitions. Therefore, the referenced phrase was changed to: “Lagged CRP was calculated for all possible transitions (-11 to 11).”
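
      For readers unfamiliar with the measure, the following is a minimal sketch of a lag conditional response probability (lag-CRP) computed over all possible transitions for a 12-item list. It follows the standard actual-over-possible transition count and is not taken from the authors' code; the function name `lag_crp` is hypothetical, and recall sequences are assumed to contain only serial positions from the studied list, with no repeats or intrusions.

```python
from collections import Counter


def lag_crp(recall_sequences, list_length=12):
    """Lag-CRP: P(transition at a given lag | that lag was still available).

    `recall_sequences` is a list of recall orders, each a list of serial
    positions (1..list_length) with no repeats or intrusions (assumption).
    Returns {lag: probability} for every non-zero lag that was possible
    at least once.
    """
    actual = Counter()    # transitions actually made, by lag
    possible = Counter()  # transitions that could have been made, by lag
    for seq in recall_sequences:
        recalled = set()
        for cur, nxt in zip(seq, seq[1:]):
            recalled.add(cur)
            # Every not-yet-recalled item defines one available lag.
            for pos in range(1, list_length + 1):
                if pos not in recalled:
                    possible[pos - cur] += 1
            actual[nxt - cur] += 1
    lags = range(-(list_length - 1), list_length)  # -11..11 for 12 items
    return {
        lag: actual[lag] / possible[lag]
        for lag in lags
        if lag != 0 and possible[lag] > 0
    }
```

      With 12 items per list, the largest transition is from serial position 1 to 12 or the reverse, which is why the full range runs from -11 to 11.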

      (10) Hsieh et al. 2014 and Hsieh & Ranganath (2015) are fMRI studies and as such, do not support the statement "Previous work consistent with temporal context models suggests spectral relatedness reduces as a function of distance between words" (Page 3). 

      The statement has been revised to: “Previous work consistent with temporal context models suggests neural pattern similarity reduces as a function of distance between related memories.”

      (11) Although statistically one can measure "How item-item similarity is affected by recollection" (Page 3), this is logically backwards, given that similarity during study necessarily precedes performance during free recall. Additionally, it is erroneous to assume that recalled words are "recollected" without additional measurements (e.g. Mickes et al. (2013) Rethinking familiarity: Remember/Know judgments in free recall, JML).

      The statement was changed to “item-item similarity is affected based on successful recall” given recollection cannot be determined in our paradigm.

      Reviewer 3:

      (1) My primary confusion in the current version of this paper is that the analyses don't seem to directly compare the two proposed models illustrated in Fig 1B, i.e. the temporal context model (with smooth drifts between items, including across lists) versus the boundary model (with similarities across all lists for items near boundaries). After examining smooth drift in the within-list analysis (Fig 2), the across-list analyses (Figs 3-5) use a model with two predictors (boundary proximity and list distance), neither of which is a smoothly drifting context. Therefore there does not appear to be a quantitative analysis supporting the conclusion that in lateral temporal cortex "drift exhibits a relationship with elapsed time regardless of the presences of intervening boundaries" (lines 272-3).

      We could not use a smoothly drifting regressor due to its collinearity with any model of boundary similarity. Therefore, we chose our two regressors: boundary proximity, which models intra-list changes in similarity and list distance, which models a stepwise decrease in similarity from adjacent lists.

      However, we agree with the comment that the presented data do not directly support the claim that the lateral temporal cortex drifts independently of intervening boundaries. Therefore, we amended the statement to: “We found successfully recalled items encoded in distant serial positions drifted significantly more than items from adjacent serial positions (Figure 2C). Consistent with the predictions of the temporal context model, the reduced similarity between distant items may reflect reduced contextual overlap proportional to the time elapsed between them.”

      (2) The feature representation used for the neural response to each item is a gamma power time-frequency matrix. This makes it unclear what characteristics of the neural response are driving the observed similarity effects. It appears that a simple overall scaling of the response after boundaries (stronger responses to initial items during the beginning portion of the 1.6s time window) would lead to the increased cosine similarity between initial items, but wouldn't necessarily reflect meaningful differences in the neural representation or context of these items.

      Our study aims to draw the connection between the neural response after boundaries and the neural representation and context of these items. Prior studies (Manning et al. 2011, El Kalliny et al. 2019) have interpreted similarity in neural spectra as a memory-relevant phenomenon. We use very similar methods to perform our analysis.

      In addition, we compare the fit of our boundary similarity model to behavioral performance to show increased boundary representation correlates with improved boundary item recall.

      While our study does not specify which time-frequency components underlie the increased similarity, we do limit our analysis to the gamma band. Traditional analyses include log-scaled, broadband time-frequency data (e.g., 3-100 Hz) from which we specify the relevance of a much narrower spectral band.

      Finally, we tried to study which time–frequency components contributed to the increased similarity, but it varied greatly between patients (see Figure 3 – supplementary figure 2D). Hence, we opted to use principal component analyses to compare the features showing the most variation for each given participant. This added analytical step allows us to detect boundary effects across patients despite individual variability in boundary representation.
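
      To make the similarity measure discussed in this point concrete, the following is a minimal sketch of cosine similarity between two items' gamma time-frequency features. It is only an illustration: the helper names are hypothetical, each item is assumed to be a small frequency-by-time matrix flattened into a vector, and the PCA step described in the response is omitted.

```python
import math


def flatten(tf_matrix):
    """Flatten a frequency-by-time matrix (list of rows) into one vector."""
    return [value for row in tf_matrix for value in row]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 2 x 3 (frequency x time) gamma power matrices for two items.
item_a = flatten([[1.0, 0.5, 0.2], [0.8, 0.4, 0.1]])
item_b = flatten([[0.9, 0.6, 0.3], [0.7, 0.5, 0.2]])
similarity = cosine_similarity(item_a, item_b)
```

      Cosine similarity is unchanged when either vector is multiplied by a positive constant, so it reflects the pattern of power across time and frequency rather than its overall magnitude.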

      (3) The specific form of the boundary proximity models is not well justified. For initial items, a model of e^(1-d) is used (with d being serial position), but it is not stated how the falloff scale of this model was selected (as opposed to e.g. e^((1-d)/2)). For final items, a different model of d/#items is used, which seems to have a somewhat different interpretation (about drift between boundaries, rather than an effect specific to items near a final boundary). The schematic in Fig 1B appears to show a hypothesis which is not tested, with symmetric effects at initial and final boundaries.

      The boundary proximity models were chosen empirically. Our model was intended to quantify a decreasing relationship across many patients. We acknowledge the constants and variables may not definitively describe underlying neural processes.  

      For start- and end-list boundaries, we used different models because primacy and recency effects are unique phenomena. Primacy memory is classically thought to arise from rehearsal during the encoding time (Polyn et al. 2009, Lohnas et al. 2015). Alternatively, recency memory is thought to arise from strong contextual cues of recency items during recall due to their temporal proximity. Therefore, we have a limited basis on which to assume their spectral representation in relation to task boundaries would be symmetric.
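
      The two functional forms named by the reviewer can be written out as a short sketch. The expressions e^(1-d) and d/#items come from the reviewer's description of the models; the function names and the `scale` parameter (which reproduces the reviewer's alternative e^((1-d)/2) when set to 2) are hypothetical additions for illustration.

```python
import math


def start_boundary_proximity(d, scale=1.0):
    """Exponential falloff from the start-of-list boundary.

    d is the serial position (1-based). With scale=1 this is e^(1-d);
    scale=2 gives the slower falloff e^((1-d)/2) raised by the reviewer.
    """
    return math.exp((1 - d) / scale)


def end_boundary_proximity(d, n_items=12):
    """Linear ramp toward the end-of-list boundary: d / #items."""
    return d / n_items


# Regressor values across a 12-item list.
start_values = [start_boundary_proximity(d) for d in range(1, 13)]
end_values = [end_boundary_proximity(d) for d in range(1, 13)]
```

      The start-of-list form equals 1 at the first serial position and falls off steeply, while the end-of-list form rises in equal steps to 1 at the last item, which is the asymmetry the reviewer points out.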

      (4) The main text description of Fig 2 only describes drift effects in lateral temporal cortex, but Fig 2 - supplement 1 shows that there is also drift and a significant subsequent memory effect in the other two ROIs as well. There is not a significant memory x drift slope interaction in these regions; are the authors arguing that the lack of this interaction (different drift rates for remembered versus forgotten items) is critical for interpreting the roles of lateral temporal cortex versus medial parietal and hippocampal regions?

      Yes. Fig 2- Supplement 1 shows that drift occurs in both the HC and MPL. However, the interaction term is not significant, which suggests that the rate of drift between recalled and non-recalled items is not significantly different.  

      In contrast, Fig 2C shows that recalled pairs drift at a higher rate than non-recalled pairs. For the LTC, the interaction term is negative in magnitude and statistically significant. This suggests successfully encoded item pairs encoded far apart share more distinct spectral representations, specifically in the LTC. These findings lead to our interpretation in the discussion that “elevated drift rate might allow the representations of recalled items to remain distinct but ordered in memory.”

      (5) The parameter fits for the "list distance" regressor are not shown or analyzed, though they do appear to be important for the observed similarity structure (e.g. Fig 3E). I would interpret this regressor as also being "boundary-related" in the sense that it assumes discrete changes in similarity at boundaries.

      Parameter fits for the ‘list distance’ regressor are now shown in the supplementary portion of Figures 3 and 5. The difference between regions is non-significant.

      (6) To make strong claims about temporal context versus boundary models as implied by Fig 1B, these two regressors should be fit within the same model to explain across-list similarity. The temporal context model could be based on the number of intervening items (as in Fig 1B) or actual time elapsed between items. The relationship between the smoothly drifting temporal context model and the discretely-jumping list distance models should also be clarified.

      We could not use a smoothly drifting regressor due to its collinearity with any model of boundary similarity. A model which included a ‘temporal context regressor’ would not be able to account for the presence of a boundary effect and would not allow us to demonstrate a boundary representation in the presence of drift. Therefore, we chose our two regressors: boundary proximity, which models intra-list changes in similarity, and list distance, which models a stepwise decrease in similarity from adjacent lists. These regressors allow the model to differentiate between intra-list changes (the boundary regressor) versus inter-list changes (the list distance regressor).
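
      One way to inspect overlap between candidate regressors before fitting is to compute their pairwise correlation. The sketch below does this for a toy layout and is not the authors' analysis: the reference item is assumed to be SP1 of a list, targets are all 12 serial positions in the next three lists, the smooth drift regressor is assumed to be the number of intervening items (ignoring distractor and recall periods), and all names are hypothetical.

```python
import math

LIST_LENGTH = 12  # items per list, as in the task discussed above


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def build_regressors(n_lists_ahead=3):
    """One row per (list offset k, serial position d) target item."""
    boundary, list_distance, drift = [], [], []
    for k in range(1, n_lists_ahead + 1):
        for d in range(1, LIST_LENGTH + 1):
            boundary.append(math.exp(1 - d))       # boundary proximity
            list_distance.append(k)                # stepwise list distance
            drift.append(k * LIST_LENGTH + d - 1)  # assumed smooth drift
    return boundary, list_distance, drift


boundary, list_distance, drift = build_regressors()
r_drift_list = pearson(drift, list_distance)
r_drift_boundary = pearson(drift, boundary)
```

      In this toy layout the assumed drift regressor is strongly correlated with list distance, which illustrates the kind of overlap that makes separate coefficients hard to estimate. The degree of collinearity in the actual dataset is not reported in the response.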

      (7) The features of the time-frequency matrix that are driving similarity between events could be visualized to provide a better understanding of the boundary-related signals. The analysis could also be re-run with reduced versions of the feature space in order to determine the critical components of this signal; for example, responses could be averaged across time to examine only differences across frequencies, or across frequencies to examine purely temporal changes across the 1.6 second window.

      Figure 3 – supplementary figure 2 A-C has been added to show varying the number of principal components (PCs) does not change the trend of boundary sensitivity in the MPL. In addition, we included 3 example subjects in Figure 3 – supplementary figure 2D to show unique time-frequency components contribute to signal reconstructed from the PCs for each subject. Therefore, the boundary representation may be represented differently for each patient.

      (8) If the authors are considering a space of multiple models as "boundary proximity models" (e.g. linear models and exponential models with different scale factors), this should be part of the model-fitting process rather than a single model being selected post hoc.

      We agree with the reviewer’s suggestion that the ideal way to fit a model to the trend would be to use a model-fitting process. However, because of limited computational resources, we were not able to perform it given the size of our dataset.

      (9) The interpretation of region differences in the results in Fig 2 and Fig 2 - supplement 1 should be clarified. 

      In the Discussion, we have added the following text to clarify our interpretation of the regional differences shown in the mentioned figures.

      “However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ezzyat and Davachi, 2014; Griffiths and Fuentemilla, 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al 2018). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions (Baldassano et al. 2017).”

      (10) Whether there are significant fits for the list distance regressor, and whether these fits vary across regions, could be stated. The list distance regressor could also be directly compared (in the same model) to a temporal-context regressor, which predicts graded changes in similarity between items rather than the discrete changes between lists.

      We have added parameter fits for the ‘list distance’ regressor in the supplementary portions of Figure 3 and Figure 5. The difference between regions is non-significant. Therefore, our results show a very similar stepwise decrease in similarity across lists between regions (list distance regressor; Figure 3 – supplementary figure 1B).

      We could not compare these parameters to a separate model that includes a smoothly drifting ‘temporal-context’ regressor because of that regressor’s collinearity with any representation of boundary. See our response to Reviewer 3, comment 6.

      (11) The authors should clarify their interpretation of the results, and whether they are proposing a tweak to the temporal context model or a substantially different organizational system. 

      In the Discussion, we include the following statements to clarify what we suggest regarding the temporal context model.

      “Our findings suggest a broader scope of contextual association than just prior items, where temporal proximity as well as task structure in the form of boundaries, play intertwined roles in contextual construction. Our data therefore have implications for updated iterations of the temporal context model incorporating (perhaps) specific terms for boundary information. This may in turn provide a more systematic prediction of primacy effects in behavioral data.”  

      (12) Minor typos and corrections: 

      52: using -> use 

      108: patients -> patients'

      156: list -> lists

      The list distance plot is described as "pink" in Fig 3 and Fig 5 - supplement 1, but appears gray in the figures.

      Each of these has been corrected in the text.

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      1) The authors should better review what we know of fungal Drosophila microbiota species as well as the ecology of rotting fruit. Are the microbiota species described in this article specific to their location/setting? It would have been interesting to know if similar species can be retrieved in other locations using other decaying fruits. The term 'core' in the title suggests that these species are generally found associated with Drosophila but this is not demonstrated. The paper is written in a way that implies the microbiota members they have found are universal. What is the evidence for this? Have the fungal species described in this paper been found in other studies? Even if this is not the case, the paper is interesting, but there should be a discussion of how generalizable the findings are.

      The reviewer inquires as to whether the microbial species described in this article are ubiquitously associated with Drosophila or not. Indeed, most of the microbes described in this manuscript are generally recognized as species associated with Drosophila spp. For example, species such as Hanseniaspora uvarum, Pichia kluyveri, and Starmerella bacillaris have been detected in or isolated from Drosophila spp. collected in European countries as well as the United States and Oceania (Chandler et al., 2012; Solomon et al., 2019). As for the bacteria, species belonging to the genera Pantoea, Lactobacillus, Leuconostoc, and Acetobacter have also previously been detected in wild Drosophila spp. (Chandler et al., 2011). These elucidations will be incorporated into our revised manuscript.

      Nevertheless, the term “core” in the manuscript title may lead to misunderstanding, as the generality does not ensure the ubiquitous presence of these microbial species in every individual fly. Considering this point, we will replace the term with an expression more appropriate to our context.

      2) Can the authors clearly demonstrate that the microbiota species that develop in the banana trap are derived from flies? Are these species found in flies in the wild? Did the authors check that the flies belong to the D. melanogaster species and not to the sister group D. simulans?

      Can the authors clearly demonstrate that the microbiota species that develop in the banana trap are derived from flies? Are these species found in flies in the wild?

      The reviewer asked whether the microbial species identified in the fermented banana samples were derived from flies. To address this question, additional experiments under more controlled conditions, such as the inoculation of specific species of wild flies onto fresh bananas, would be needed. Nevertheless, the microbes may potentially originate from wild flies, as supported by the literature cited in our response to Weakness 1).

      Alternative sources for microbial provenance also merit consideration. For example, microbial entities may be inherently present in unfermented bananas through the infiltration of peel injuries (lines 1141-1142 of the original manuscript). In addition, they could be introduced by insects other than flies, given that both rove beetles (Staphylinidae) and sap beetles (Nitidulidae) were observed in some of the traps. These possibilities will be incorporated into the 'MATERIALS AND METHODS' and 'DISCUSSION' sections of our revised manuscript.

      Did the authors check that the flies belong to the D. melanogaster species and not to the sister group D. simulans?

      Our sampling strategy was designed to target not only D. melanogaster but also other domestic Drosophila species, such as D. simulans, that inhabit human residential areas. After adult flies were caught in each trap, we identified the species as shown in Table S1, thereby showing the presence of either or both D. melanogaster and D. simulans. We will provide these descriptions in MATERIALS AND METHODS and DISCUSSION.

      3) Did the microarrays highlight a change in immune genes (ex. antibacterial peptide genes)? Whatever the answer, this would be worth mentioning. The authors described their microarray data in terms of fed/starved in relation to the Finke article. They should clarify if they observed significant differences between species (differences between species within bacteria or fungi, and more generally differences between bacteria versus fungi).

      Did the microarrays highlight a change in immune genes (ex. antibacterial peptide genes)? Whatever the answer, this would be worth mentioning.

      Regarding the antimicrobial peptide genes, statistical comparisons of our RNA-seq data across different conditions were impracticable because most of them showed low expression levels (refer to Author response table 1, which exhibits the RNA-seq data of the yeast-fed larvae; similar expression profiles were observed in the bacteria-fed larvae). While a subset of genes exhibited significantly elevated expression in the non-supportive conditions relative to the supportive ones, this can be due to intra-sample variability rather than due to distinct nutritional environments. Therefore, it would be difficult to discuss a change in immune genes in the paper. Additionally, the previous study that conducted larval microarray analysis (Zinke et al., 2002) did not explicitly focus on immune genes.

      Author response table 1.

      Antimicrobial peptide genes are not up-regulated by any of the microbes. Antimicrobial peptide gene expression profiles of whole bodies of first-instar larvae fed on yeasts. TPM values of all samples and comparison results of gene expression levels in the larvae fed on supportive and non-supportive yeasts are shown. Antibacterial peptide genes mentioned in Hanson and Lemaitre, 2020 are listed. NA or na, not available.
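
      For readers unfamiliar with the unit, TPM (transcripts per million) can be sketched as follows. This is the standard definition, not the authors' pipeline, and the read counts and transcript lengths are made-up illustrative values.

```python
def tpm(read_counts, lengths_kb):
    """Convert raw read counts to transcripts per million (TPM).

    read_counts: reads assigned to each gene in one sample.
    lengths_kb:  effective transcript length of each gene, in kilobases.
    """
    # Step 1: length-normalize each gene (reads per kilobase).
    rates = [count / length for count, length in zip(read_counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    # Step 2: rescale so the values in the sample sum to one million.
    return [rate / total * 1e6 for rate in rates]


# Hypothetical sample with three genes; the second gene has no reads.
example_tpm = tpm([10, 0, 300], [1.0, 2.0, 3.0])
print(example_tpm)
```

      Because TPM is a within-sample proportion, genes with very few reads yield small and noisy values, which is consistent with the difficulty of comparing lowly expressed genes across conditions.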

      They should clarify if they observed significant differences between species (differences between species within bacteria or fungi, and more generally differences between bacteria versus fungi).

      We did not observe significant differences between species within bacteria or fungi, or between bacteria and fungi. For example, the gene expression profiles of larvae fed on the various supporting microbes showed striking similarities to each other, as evidenced by the heat map showing the expression of all genes detected in larvae fed either yeast or bacteria (Author response image 1). Similarities were also observed among larvae fed on distinct non-supporting microbes.

      Author response image 1.

      Gene expression profiles of larvae fed on the various supporting microbes show striking similarities to each other. Heat map showing the gene expression of the first-instar larvae that fed on yeasts or bacteria. Freshly hatched germ-free larvae were placed on banana agar inoculated with each microbe and collected after 15 h feeding to examine gene expression of the whole body. Note that data presented in Figures 3A and 4C in the original manuscript, which are obtained independently, are combined to generate this heat map. The labels under the heat map indicate the microbial species fed to the larvae, with three samples analyzed for each condition. The lactic acid bacteria (“LAB”) include Lactiplantibacillus plantarum and Leuconostoc mesenteroides, while the acetic acid bacterium (“AAB”) represents Acetobacter orientalis. “LAB + AAB” signifies mixtures of the AAB and either one of the LAB species. The asterisk in the label highlights a sample in a “LAB” condition (Leuconostoc mesenteroides), which clustered separately from the other “LAB” samples. Brown abbreviations of scientific names are for the yeast-fed conditions. H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; M. asi, Martiniozyma asiatica; S. cra, Saccharomycopsis crataegensis; P. klu, Pichia kluyveri; S. bac, Starmerella bacillaris; S. cer, S. cerevisiae BY4741 strain.

      Only a handful of genes showed different expression patterns between larvae fed on yeast and those fed on bacteria, without any enrichment for specialized gene functions. Thus, it is challenging to discuss the potential differential impacts, if any, of yeast and bacteria on larval growth.

      4) The whole paper - and this is one of its merits - points to a role of the Drosophila larval microbiota in processing the fly food. Are these bacterial and fungal species found in the gut of larvae/adults? Are these species capable of establishing a niche in the cardia of adults as shown recently in the Ludington lab (Dodge et al.,)? Previous studies have suggested that microbiota members stimulate the Imd pathway leading to an increase in digestive proteases (Erkosar/Leulier). Are the microbiota species studied here affecting gut signaling pathways beyond providing branched amino acids?

      The whole paper - and this is one of its merits - points to a role of the Drosophila larval microbiota in processing the fly food. Are these bacterial and fungal species found in the gut of larvae/adults? Are these species capable of establishing a niche in the cardia of adults as shown recently in the Ludington lab (Dodge et al.,)?

      Although we did not investigate the microbiota in the gut of either larvae or adults, we did compare the microbiota within surface-sterilized larvae or adults with those in food samples. We found that adult flies and early-stage food sources, as well as larvae and late-stage food sources, harbor similar microbial species (Figure 1F). Additionally, previous examinations of the gut microbiota in wild adult flies have identified microbial species or taxa congruent with those we isolated from our foods (Chandler et al., 2011; Chandler et al., 2012). We have elaborated on this in our response to Weakness 1).

      While we did not investigate whether these species are capable of establishing a niche in the cardia of adults, we will cite the study by Dodge et al., 2023 in our revised manuscript and discuss the possibility that predominant microbes in adult flies may show a propensity for colonization.

      Previous studies have suggested that microbiota members stimulate the Imd pathway leading to an increase in digestive proteases (Erkosar/Leulier). Are the microbiota species studied here affecting gut signaling pathways beyond providing branched amino acids?

      The reviewer inquires whether the supportive microbes in our study stimulate gut Imd signaling pathways and induce the expression of digestive protease genes, as demonstrated in a previous study (Erkosar et al., 2015). According to our RNA-seq data, it seems unlikely that the supportive microbes stimulate the signaling pathway. The panels in Author response image 2 show statistical comparisons of expression levels for seven protease genes between the supportive and the non-supportive conditions. These genes did not exhibit a consistent upregulation in the presence of the supportive microbes (H. uva or K. hum in Author response image 2A; Le. mes + A. ori in Author response image 2B). Rather, they exhibited a tendency to be upregulated under the non-supportive microbes (St. bac or Pi. klu in Author response image 2A; La. pla in Author response image 2B).

      Author response image 2.

      Most of the peptidase genes reported by Erkosar et al., 2015 are more highly expressed under the non-supportive conditions than the supportive conditions. Comparison of the expression levels of seven peptidase genes derived from the RNA-seq analysis of yeast-fed (A) or bacteria-fed (B) first-instar larvae. A previous report demonstrated that the expression of these genes is upregulated upon association with a strain of Lactiplantibacillus plantarum, and that the PGRP-LE/Imd/Relish signaling pathway, at least partially, mediates the induction (Erkosar et al., 2015). H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; P. klu, Pichia kluyveri; S. bac, Starmerella bacillaris; La. pla, Lactiplantibacillus plantarum; Le. mes, Leuconostoc mesenteroides; A. ori, Acetobacter orientalis; ns, not significant.

      Reviewer #2 (Public Review):

      Weaknesses:

      The experimental setting that, the authors think, reflects host-microbe interactions in nature is one of the key points. However, it is not explicitly mentioned whether isolated microbes are indeed colonized in wild larvae of Drosophila melanogaster who eat bananas. Another matter is that this work is rather descriptive and a few mechanical insights are presented. The evidence that the nutritional role of BCAAs is incomplete, and molecular level explanation is missing in "interspecies interactions" between lactic acid bacteria (or yeast) and acetic acid bacteria that assure their inhabitation. Apart from these matters, the future directions or significance of this work could be discussed more in the manuscript.

      The experimental setting that, the authors think, reflects host-microbe interactions in nature is one of the key points. However, it is not explicitly mentioned whether isolated microbes are indeed colonized in wild larvae of Drosophila melanogaster who eat bananas.

      The reviewer asks whether the isolated microbes were colonized in the larval gut. Previous studies on microbial colonization associated with Drosophila have predominantly focused on adults (Pais et al. PLOS Biology, 2018), rather than larval stages. Developing larvae continually consume substrates which are already subjected to microbial fermentation and abundant in live microbes until the end of the feeding larval stage. Therefore, we consider it difficult to discuss microbial colonization in the larval gut. We will add this point in the DISCUSSION of the revised manuscript.

      Another matter is that this work is rather descriptive and a few mechanical insights are presented. The evidence that the nutritional role of BCAAs is incomplete, and molecular level explanation is missing in "interspecies interactions" between lactic acid bacteria (or yeast) and acetic acid bacteria that assure their inhabitation.

      While we recognize the importance of comprehensive mechanistic analysis, this study includes all experimentally feasible data. Elucidation of more detailed molecular mechanisms lies beyond the scope of this study and will be the subject of future research.

      Regarding the nutritional role of BCAAs, the incorporation of BCAAs enabled larvae fed with the non-supportive yeast to grow to the second instar. This observation suggests that consumption of BCAAs upregulates diverse genes involved in cellular growth processes in larvae. We have discussed the hypothetical interaction between lactic acid bacteria (LAB) and acetic acid bacteria (AAB) in the manuscript (lines 402-405): LAB may facilitate lactate provision to AAB, consequently enhancing the biosynthesis of essential nutrients such as amino acids. To test this hypothesis, future experiments will include the supplementation of AAB culture plates with lactic acid and the co-inoculation of LAB mutant strains defective in lactate production with AAB, to assess both larval growth and continuous larval association with AAB. With respect to AAB-yeast interactions, metabolites released from yeast cells might benefit AAB growth, and this possibility will be investigated through the supplementation of AAB culture plates with candidate metabolites identified in the cell suspension supernatants of the late-stage yeasts.

      Apart from these matters, the future directions or significance of this work could be discussed more in the manuscript.

      We appreciate the reviewer's recommendations and will include additional descriptions regarding these aspects in the DISCUSSION section.

      Reviewer #3 (Public Review):

      Weaknesses:

      Despite describing important findings, I believe that a more thorough explanation of the experimental setup and the steps expected to occur in the exposed diet over time, starting with natural "inoculation" could help the reader, in particular the non-specialist, grasp the rationale and main findings of the manuscript. When exactly was the decision to collect early-stage samples made? Was it when embryos were detected in some of the samples? What are the implications of bacterial presence in the no-fly traps? These samples also harbored complex microbial communities, as revealed by sequencing. Were these samples colonized by microbes deposited with air currents? Were they the result of flies that touched the material but did not lay eggs? Could the traps have been visited by other insects? Another interesting observation that could be better discussed is the fact that adult flies showed a microbiome that more closely resembles that of the early-stage diet, whereas larvae have a more late-stage-like microbiome. It is easy to understand why the microbiome of the larvae would resemble that of the late-stage foods, but what about the adult microbiome? Authors should discuss or at least acknowledge the fact that there must be a microbiome shift once adults leave their food source. Lastly, the authors should provide more details about the metabolomics experiments. For instance, how were peaks assigned to leucine/isoleucine (as well as other compounds)? Were both retention times and MS2 spectra always used? Were standard curves produced? Were internal, deuterated controls used?

      When exactly was the decision to collect early-stage samples made? Was it when embryos were detected in some of the samples?

      We collected traps and early-stage samples 2.5 days after setting up the traps. This time frame was determined by pilot experiments. A shorter collection time resulted in a greater likelihood of obtaining no-fly traps, whereas a longer collection time caused larval overcrowding, as well as adults’ deaths from drowning in the liquid seeping out of fruits. These procedural details will be delineated in the MATERIALS AND METHODS section of the revised manuscript.

      What are the implications of bacterial presence in the no-fly traps? These samples also harbored complex microbial communities, as revealed by sequencing. Were these samples colonized by microbes deposited with air currents? Were they the result of flies that touched the material but did not lay eggs? Could the traps have been visited by other insects?

      We assume that the origins of the microbes detected in the no-fly trap foods vary depending on the species. For instance, Colletotrichum musae, the fungus that causes banana anthracnose, may have been present in fresh bananas before trap placement. The filamentous fungi could have originated from airborne spores, but they could also have been introduced by insects that feed on these fungi. We will include these possibilities in the DISCUSSION section of the revised manuscript.

      Another interesting observation that could be better discussed is the fact that adult flies showed a microbiome that more closely resembles that of the early-stage diet, whereas larvae have a more late-stage-like microbiome. It is easy to understand why the microbiome of the larvae would resemble that of the late-stage foods, but what about the adult microbiome? Authors should discuss or at least acknowledge the fact that there must be a microbiome shift once adults leave their food source.

      We are grateful for the reviewer's insightful suggestions regarding shifts in the adult microbiome. We plan to include in the DISCUSSION section of the revised manuscript the possibility that the microbial composition may change substantially during pupal stages and that microbes obtained after eclosion could potentially form the adult gut microbiota.

      Lastly, the authors should provide more details about the metabolomics experiments. For instance, how were peaks assigned to leucine/isoleucine (as well as other compounds)? Were both retention times and MS2 spectra always used? Were standard curves produced? Were internal, deuterated controls used?

      We appreciate the reviewer's advice. Detailed methods of the metabolomic experiments will be included in our revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The modeling approaches are very sophisticated, and clearly demonstrate the selective nature of acute ketamine to reduce the impact of trial losses on subsequent performance, relative to neutral or gain outcomes. The authors then, not unreasonably, suggest that this effect is important in the context of the negative bias in interpreting events that is prominent in depression, in that if ketamine reduces the ability of negative outcomes to alter behavior, this may be a mechanism for its rapid acting antidepressant effects.

      However, there is a very strong assumption in this regard, as shown by the first sentence of the discussion which implies this is a systematic study of ketamine's acute antidepressant effects. In actuality, this is a study of the acute effects of ketamine on reinforcement learning (RL) modeled parameters. A primary concern here is that an effect presented as a "robust antidepressant-like behavioral effect" should be more enduring than just an alteration during the acute administration. As it is, the link to an "anti-depressant effect" is based solely on the selective effects on losses. This is not to say this is not an interesting observation, worthy of exploration. It is noted that a similar lack of enduring effects on outcome evaluation is observed in humans, as shown in supplemental fig. S4, but there is not accompanying citation for the human work.

      We agree with the reviewer that the way we linked the study results to ketamine’s antidepressant action can be misleading and based on a rather strong assumption which was not systematically tested in the study. We made the following changes to the manuscript:

      (1) These results constitute a rare report of a robust antidepressant-like behavioral effect produced by therapeutic doses of ketamine during acute phase (<1 hour) after injection (Introduction, 3rd paragraph, line 8-9 in the original manuscript).

      Changed to: These results constitute a rare report of an acute effect of therapeutic dose of ketamine on the processing of affectively negative events during dynamic decision-making.

      (2) We clarified in the Discussion that our study is to gain insights into, but not a systematic investigation of ketamine’s antidepressant action as follows:

      (2.1) A sentence was added (1st paragraph of Discussion): Using a token-based decision task and extensive computational modeling, we examined the behavioral modulation induced by therapeutic doses of ketamine to gain insights into possible early signs of ketamine’s antidepressant activity.

      (2.2) Consistent with the findings from humans, ketamine’s effect on outcome evaluation was acute and did not last over subsequent days (Supplemental Figure S4) (Discussion, 2nd paragraph, line 6-7 in the original manuscript).

      Changed to: While ketamine’s antidepressant effect is reported to be sustained over a week of period (5), ketamine’s effect on outcome evaluation was acute and did not last over subsequent days (Supplemental Figure S4). This discrepancy might be attributable to the possible differences in the state of brain network between healthy subjects and those with depression as well as the type of measures taken to assess ketamine’s effect.

      (2.3) A sentence was added (Discussion, last sentence of the 2nd paragraph) : Nevertheless, systematic studies are required to understand whether the reduced aversiveness to loss in our task might share the same mechanisms that underlie ketamine’s antidepressant action.

      One question that comes to mind in terms of the selectivity observed is whether similar work has been done to examine the acute effects of any other drugs. If ketamine is unique in this regard, that would be quite interesting.

      We think this is an interesting idea. However, comparing ketamine’s effect to that of other drugs is beyond the scope of the current study. We hope to answer this question in future studies.

      Reviewer #2 (Public Review):

      Oemisch and Seo set out to examine the effects of low-dose ketamine on reinforcement learning, with the idea that alterations in reinforcement learning and/or motivation might inform our understanding of what alterations co-occur with potential antidepressant effects. Macaques performed a reinforced/punished matching pennies task while under effects of saline or ketamine administration and the data were fit to a series of reinforcement learning models to determine which model described behavior under saline most closely and then what parameters of this best-fitting model were altered by ketamine. They found a mixed effect, with two out of three macaques primarily exhibiting an effect of ketamine on processing of losses and one out of three macaques exhibiting an effect of ketamine on processing of losses and perseveration. They found that these effects of ketamine appeared to be dissociable from the nystagmus effects of the ketamine.

      The findings are novel and the data suggesting that ketamine is primarily having its effects on processing of losses (under the procedures used) are solid. However, it is unclear whether the connection between processing of losses and the antidepressant effects of ketamine is justified and the current findings may be more useful for those studying reinforcement learning than those studying depression and antidepressant effects. In addition, the co-occurrence of different behavioral procedures with different patterns of ketamine effects, with one macaque tested with different parameters than the other two exhibiting effects of ketamine that were best fit with a different model than the other two macaques, suggests that there may be difficulty in generalizing these findings to reinforcement learning more generally.

      (1) First, the authors should be more explicit and careful in the connection they are trying to make about the link between loss processing and depression. The authors call their effect a "robust antidepressant-like behavioral effect" but there are no references to support this or discussion of how the altered loss processing would relate directly to the antidepressant effects.

      We agree with the reviewer’s point about the way we connected the study results to ketamine’s antidepressant action. This concern overlaps with Reviewer #1’s concern. Please refer to our responses (2), (2.1), (2.2), and (2.3) above.

      (2) It appears that the monkey P was given smaller rewards and punishers than the other two monkeys and this monkey had an effect of ketamine on perseveration that was not observed in the other two monkeys. Is this believed to be due to the different task, or was this animal given a different task because of some behavioral differences that preceded the experiment? The authors should also discuss what these differences may mean for the generality of their findings. For example, might there be some set of parameters where ketamine would only alter perseveration and not processing of losses?

      Although the best-fitting ketamine model for monkey P includes an additional element – perseveration, we believe that monkey P’s baseline behavior and ketamine’s effect are not significantly different from the other two monkeys for the following reasons.

      First, monkey P was the first animal that we tested ketamine’s effect, and therefore we aimed to match the other two monkeys’ baseline behavior similar to monkey P’s behavior in order to reduce variability in ketamine’s effect potentially attributable to the difference in baseline behavior before pharmacological manipulation. We had to adjust the payoff matrix for the subsequent animals (Y and B) because these monkeys were more sensitive to loss, and seldom chose “risky” target (yielding loss). In order to make the other two monkeys’ behavior similar to that of monkey P, we adjusted the asymmetry between the risky and the safe target in the way that loss (neutral) outcome occurred from the safe (risky) target as well. Eventually, this adjustment made the baseline behavior similar across all three monkeys. The goal of the study was to reliably measure the ketamine’s effect, and not to study individual differences that can naturally occur with the same task parameters. Therefore, we believe that the adjustment of payoff matrix helped to reliably detect ketamine’s effect starting from the common baseline behavior.

      Second, the best-fitting model for monkey P (K-model 7) and that for the other two monkeys (K-model 4) make very similar predictions both qualitatively and quantitatively, as seen in the revised Figure 4. The parameters for outcome values estimated from these two models in monkey P are very similar, as seen in the revised Table 3. In addition, the difference in BIC between the model that includes only perseveration modulation (K-model 6) and the model that also incorporates outcome value modulation (K-model 7) is 441, whereas the difference in BIC between K-model 7 and the model that includes only outcome value modulation (K-model 4) is as small as 4. These BIC results indicate that, in monkey P, the variability explained by ketamine’s modulation of outcome evaluation is remarkably larger than that explained by its modulation of perseveration.
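
The BIC comparison above can be sketched in a few lines of Python. This is only an illustration, not the fitting code used in the study: the `bic` helper uses the standard definition (k ln n minus twice the log-likelihood), the names `bic`, `REPORTED_DBIC_VS_K7`, and `cost_ratio` are assumptions introduced here, and the only data are the two BIC differences quoted in this response for monkey P.

```python
import math


def bic(log_likelihood: float, n_params: int, n_trials: int) -> float:
    """Standard Bayesian information criterion; lower values indicate a better model."""
    return n_params * math.log(n_trials) - 2.0 * log_likelihood


# BIC differences for monkey P relative to the best-fitting K-model 7,
# as quoted in this response (441 and dBIC = 3.99). No raw BIC values
# are given in the text, so none are assumed here.
REPORTED_DBIC_VS_K7 = {
    "K-model 6": 441.0,  # perseveration modulation only (no outcome-value modulation)
    "K-model 4": 3.99,   # outcome-value modulation only (no perseveration modulation)
}


def cost_ratio(dbic: dict) -> float:
    """BIC cost of dropping outcome-value modulation relative to dropping perseveration."""
    return dbic["K-model 6"] / dbic["K-model 4"]


if __name__ == "__main__":
    for name, delta in REPORTED_DBIC_VS_K7.items():
        print(f"{name}: dBIC vs K-model 7 = {delta:.2f}")
    print(f"cost ratio = {cost_ratio(REPORTED_DBIC_VS_K7):.1f}")
```

Read this way, removing outcome-value modulation from K-model 7 costs roughly two orders of magnitude more BIC than removing perseveration modulation, which is the asymmetry the argument relies on.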

      Therefore, we conclude that ketamine’s effect was not significantly different between monkey P and the other two monkeys. We clarified this in the revised manuscript by adding the following paragraph to the Results section:

      “Unlike monkey Y and B, the best-fitting model for monkey P indicated that ketamine increased overall tendency to switch choice in addition to outcome-dependent modulation of outcome evaluation. However, BIC differed only slightly (dBIC = 3.99) between the best-fitting (K-model 7) and the second-best model (K-model 4) and the model predictions for choice behavior were very similar both qualitatively and quantitatively (Table 3, Figure 4). We conclude that the behavioral effects of ketamine were consistent across all three monkeys.”

      (3) The authors should discuss whether the plasma ketamine levels they observed are similar to those seen with rapid antidepressant ketamine or are higher or lower.

      We added the following sentence, with a reference, to the first paragraph of the Results section.

      “Plasma concentration and its time course over 60 minutes were also comparable to those measured after 0.5mg/kg in human subjects (35).”

      (35) Zarate CA, Brutsche N, Laje G, Luckenbaugh DA, Venkata SLV, Ramamoorthy A, et al (2012): Relationship of ketamine’s plasma metabolites with response, diagnosis, and side effects in major depression. Biol Psychiatry, 72: 331-338.

      (4) For Figure 4 or S3, the authors should show the data fitted to model 7, which was the best for one of the animals.

      We added the parameters and model predictions from both K-model 7 and K-model 4 for monkey P to Table 3 and Figure 4 to facilitate comparison between the two models. The revised Table 3 and Figure 4 are as follows:

      Author response table 1.

      Maximum likelihood parameter estimates of the best models for saline and ketamine sessions.

      In all three animals, the model incorporating a valence-dependent change in outcome evaluation best fit the choice data from ketamine sessions, either with an additional change in the tendency for choice perseveration (K-model 7, shown in parentheses for monkey P) or without it (K-model 4, monkeys P, Y, and B) (Figure 3, Table 3).

      Author response image 1.

      Ketamine-induced behavioral modulation simulated with the differential forgetting model (for saline sessions) and the best-fitting K-model (for ketamine sessions).