  1. Mar 2025
    1. Reviewer #3 (Public review):

      In the report entitled "CXXC-finger protein 1 associates with FOXP3 to stabilize homeostasis and suppressive functions of regulatory T cells", the authors demonstrated that Cxxc1-deletion in Treg cells leads to the development of severe inflammatory disease with impaired suppressive function. Mechanistically, CXXC1 interacts with Foxp3 and regulates the expression of key Treg signature genes by modulating H3K4me3 deposition. Their findings are interesting and significant.

      Comments on revisions:

      In the revised manuscript, the authors have responded well to all the concerns reviewers raised. The manuscript has further improved.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This work investigated the role of CXXC-finger protein 1 (CXXC1) in regulatory T cells. CXXC1-bound genomic regions largely overlap with Foxp3-bound regions and regions with H3K4me3 histone modifications in Treg cells. CXXC1 and Foxp3 interact with each other, as shown by co-immunoprecipitation. Mice with Treg-specific CXXC1 knockout (KO) succumb to lymphoproliferative diseases between 3 and 4 weeks of age, similar to Foxp3 KO mice. Although the immune suppression function of CXXC1 KO Treg is comparable to WT Treg in an in vitro assay, these KO Tregs failed to suppress autoimmune diseases such as EAE and colitis in Treg transfer models in vivo. This is partly due to the diminished survival of the KO Tregs after transfer. CXXC1 KO Tregs do not have an altered DNA methylation pattern; instead, they display weakened H3K4me3 modifications within the broad H3K4me3 domains, which contain a set of Treg signature genes. These results suggest that CXXC1 and Foxp3 collaborate to regulate Treg homeostasis and function by promoting Treg signature gene expression through maintaining H3K4me3 modification.

      Strengths:

      Epigenetic regulation of Treg cells has been a constantly evolving area of research. The current study revealed CXXC1 as a previously unidentified epigenetic regulator of Tregs. The strong phenotype of the knockout mouse supports the critical role CXXC1 plays in Treg cells. Mechanistically, the link between CXXC1 and the maintenance of broad H3K4me3 domains is also a novel finding.

      Weaknesses:

      (1) It is not clear why the authors chose to compare H3K4me3 and H3K27me3 enriched genomic regions. There are other histone modifications associated with transcription activation or repression. Please provide justification.

      Thank you for highlighting this important point. We chose to focus on H3K4me3 and H3K27me3 enriched genomic regions because these histone modifications are well-characterized markers of transcriptional activation and repression, respectively. H3K4me3 is predominantly associated with active promoters, while H3K27me3 marks repressed chromatin states, particularly in the context of gene regulation at promoters. This duality provides a robust framework for investigating the balance between transcriptional activation and repression in Treg cells. While histone acetylation, such as H3K27ac, is linked to enhancer activity and transcriptional elongation, our focus was on promoter-level regulation, where H3K4me3 and H3K27me3 are most relevant. Although other histone modifications could provide additional insights, we chose to focus on these two to maintain clarity and feasibility in our analysis. We have revised the text accordingly; please refer to Page 18, lines 353-356.

      (2) It is not clear what separates Clusters 1 and 3 in Figure 1C. It seems they share the same features.

      We apologize for not describing these clusters clearly. Clusters 1 and 3 are both H3K4me3-only groups; H3K4me3 enrichment and gene expression levels are higher in Cluster 1. We initially set out to classify the promoters into four categories: H3K4me3 only, H3K27me3 only, H3K4me3-H3K27me3 co-occupied, and None. In practice, however, we could not resolve a distinct H3K4me3-H3K27me3 co-occupied group. Instead, the classification yielded two H3K4me3-only categories, with Cluster 1 showing higher H3K4me3 enrichment and higher gene expression.
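      The four intended promoter categories can be illustrated with a minimal sketch. This is not the authors' pipeline: the response does not state how the clusters were derived, so the simple signal cutoffs, the function name `classify_promoter`, the gene names, and all numeric values below are hypothetical and chosen only to show what the four labels mean.

```python
def classify_promoter(h3k4me3, h3k27me3, k4_cutoff=1.0, k27_cutoff=1.0):
    """Assign a promoter to one of four chromatin categories.

    Both cutoffs are hypothetical; real analyses typically use
    peak calls or unsupervised clustering of normalized signal.
    """
    has_k4 = h3k4me3 >= k4_cutoff
    has_k27 = h3k27me3 >= k27_cutoff
    if has_k4 and has_k27:
        return "co-occupied"
    if has_k4:
        return "H3K4me3 only"
    if has_k27:
        return "H3K27me3 only"
    return "none"


# Hypothetical (H3K4me3, H3K27me3) promoter signals for four made-up genes.
promoters = {
    "geneA": (5.2, 0.1),
    "geneB": (0.2, 3.8),
    "geneC": (2.5, 2.1),
    "geneD": (0.3, 0.2),
}

labels = {
    gene: classify_promoter(k4, k27)
    for gene, (k4, k27) in promoters.items()
}

for gene, label in labels.items():
    print(gene, label)
```

      With a data-driven clustering instead of fixed cutoffs, a category that is rare in the data (here, the co-occupied group) may not emerge as its own cluster, which is consistent with the outcome the authors describe.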

      (3) The claim, "These observations support the hypothesis that FOXP3 primarily functions as an activator by promoting H3K4me3 deposition in Treg cells." (line 344), seems to be a bit of an overstatement. Foxp3 certainly can promote transcription in ways other than promoting H3K4me3 deposition, and it also can repress gene transcription without affecting H3K27me3 deposition. Therefore, it is not justified to claim that promoting H3K4me3 deposition is Foxp3's primary function.

      Thank you for your insightful feedback. We agree that the statement in line 344 may have overstated the role of FOXP3 in promoting H3K4me3 deposition as its primary function. As you pointed out, FOXP3 is indeed a multifaceted transcription factor that regulates gene expression through various mechanisms. It can promote transcription independent of H3K4me3 deposition, as well as repress transcription without directly influencing H3K27me3 levels.

      To more accurately reflect the broader regulatory functions of FOXP3, we have revised the manuscript. The updated text (Page 19, lines 385-388) now reads:

      "These findings collectively support the conclusion that FOXP3 contributes to transcriptional activation in Treg cells by promoting H3K4me3 deposition at target loci, while also regulating gene expression directly or indirectly through other epigenetic modifications."

      (4) For the in vitro suppression assay in Figure S4C, and the Treg transfer EAE and colitis experiments in Figure 4, the Tregs should be isolated from Cxxc1 fl/fl x Foxp3 cre/wt female heterozygous mice instead of Cxxc1 fl/fl x Foxp3 cre/cre (or cre/Y) mice. Tregs from the homozygous KO mice are already activated by the lymphoproliferative environment and could have vastly different gene expression patterns and homeostatic features compared to resting Tregs. Therefore, it's not a fair comparison between these activated KO Tregs and resting WT Tregs.

      Thank you for raising this insightful point regarding the potential activation status of Treg cells in homozygous knockout mice. To address this concern, we performed additional experiments using Treg cells isolated from Foxp3<sup>Cre/+</sup>Cxxc1<sup>fl/fl</sup> (hereafter referred to as “het-KO”) female mice and their littermate controls, Foxp3<sup>Cre/+</sup>Cxxc1<sup>fl/+</sup> (referred to as “het-WT”) mice.

      The results of these new experiments are now included in the manuscript (Page 25, lines 507–509, Figure 6E and Figure S6A-E):

      (1) In the in vitro suppression assay, Treg cells from het-KO mice exhibited reduced suppressive function compared to het-WT Treg cells. This finding underscores the intrinsic defect in the suppressive capacity of Treg cells attributable to the loss of one Cxxc1 allele.

      (2) In the experimental autoimmune encephalomyelitis (EAE) model, Treg cells isolated from het-KO mice also demonstrated impaired suppressive function.

      (5) The manuscript didn't provide a potential mechanism for how CXXC1 strengthens broad H3K4me3-modified genomic regions. The authors should perform Foxp3 ChIP-seq or CUT&Tag with WT and Cxxc1 cKO Tregs to determine whether CXXC1 deletion changes Foxp3's binding pattern in Treg cells.

      Thank you for raising this important point. To address your suggestion, we performed CUT&Tag experiments and found that Cxxc1 deletion does not alter FOXP3 binding patterns in Treg cells. Most FOXP3-bound regions in WT Treg cells were similarly enriched in KO Treg cells, indicating that Cxxc1 deficiency does not impair FOXP3’s DNA-binding ability. These results have been added to the revised manuscript (Page 28, lines 567-575, Figure S8A-B) and are further discussed in the Discussion (Pages 28-29, lines 581-587).

      Reviewer #2 (Public review):

      FOXP3 has been known to form diverse complexes with different transcription factors and enzymes responsible for epigenetic modifications, but how extracellular signals regulate FOXP3 complex dynamics in a timely manner remains to be fully understood. Histone H3K4 tri-methylation (H3K4me3) and CXXC finger protein 1 (CXXC1), which is required to regulate H3K4me3, also remain to be fully investigated in Treg cells. Here, Meng et al. performed a comprehensive H3K4me3 CUT&Tag analysis of Treg cells; comparison of this dataset with the FOXP3 ChIP-seq dataset revealed that FOXP3 could facilitate the regulation of target genes by promoting H3K4me3 deposition.

      Moreover, CXXC1-FOXP3 interaction is required for this regulation. They found that Treg-specific deletion of Cxxc1 leads to spontaneous severe multi-organ inflammation in mice and that Cxxc1-deficient Treg cells exhibit enhanced activation and impaired suppression activity. In addition, they have also found that CXXC1 shares several binding sites with FOXP3, especially on Treg signature gene loci, which are necessary for maintaining homeostasis and identity of Treg cells.

      The findings of the current study are pretty intriguing, and it would be great if the authors could fully address the following comments to support these interesting findings.

      Major points:

      (1) There is insufficient evidence in the first part of the Results to support the conclusion that "FOXP3 functions as an activator by promoting H3K4Me3 deposition in Treg cells". The authors should compare the results for H3K4Me3 in FOXP3-negative conventional T cells to demonstrate that at these promoter loci, FOXP3 promotes H3K4Me3 deposition.

      Thank you for this insightful comment. We performed additional experiments comparing H3K4me3 levels between FOXP3-positive Treg cells and FOXP3-negative conventional T cells (Tconv). Please refer to Page 18, lines 361-368, and Figure 1C and Figure S1C for the results. Our results show that H3K4me3 abundance is higher at many Treg-specific gene loci in Treg cells compared to Tconv cells. This supports our conclusion that FOXP3 promotes H3K4me3 deposition at these loci.

      (2) In Figure 3 F&G, the activation status and IFNγ production should be analyzed in Treg cells and Tconv cells separately rather than in total CD4+ T cells. Moreover, are there changes in autoantibodies and IgG and IgE levels in the serum of cKO mice?

      Thank you for your valuable suggestions. In response to your comment, we reanalyzed the data in Figures 3F and 3G to assess the activation status and IFN-γ production in Tconv cells. The updated analysis revealed that Cxxc1 deletion in Treg cells leads to increased activation and IFN-γ production in Tconv cells. Additionally, we corrected the analysis of IL-17A and IL-4 expression, which were upregulated in Tconv cells. These updated results are now included in the revised manuscript (Page 21, lines 429-431, Figure 3I and Figure S3E-F).

      Additionally, we examined autoantibodies and immunoglobulin levels in the serum of Cxxc1 cKO mice. Our data show a significant increase in serum IgG levels, accompanied by elevated IgG autoantibodies, indicating heightened autoimmune responses. In contrast, serum IgE levels remained largely unchanged. The results are detailed in the revised manuscript (Page 21, lines 421-423, Figure 3E and Figure S3B).

      (3) Why did Cxxc1-deficient Treg cells not show impaired suppression relative to WT Treg cells in the in vitro suppression assay, despite the reduced expression of Treg cell suppression-associated markers at the transcriptional level demonstrated in both scRNA-seq and bulk RNA-seq?

      Thank you for your thoughtful comment. The absence of impaired suppression in Cxxc1-deficient Treg cells from homozygous knockout (KO) mice during the in vitro suppression assay, despite the reduced expression of Treg-associated markers at the transcriptional level (as demonstrated by scRNA-seq), can likely be explained by the activated state of these Treg cells. In homozygous KO mice, Treg cells are already activated due to the lymphoproliferative environment, resulting in gene expression patterns that differ from those of resting Treg cells. This pre-activation may obscure the effect of Cxxc1 deletion on their suppressive function in vitro.

      To address this limitation, we used heterozygous Foxp3<sup>Cre/+</sup>Cxxc1<sup>fl/fl</sup> (het-KO) female mice, along with their littermate controls, Foxp3<sup>Cre/+</sup>Cxxc1<sup>fl/+</sup> (het-WT) mice. In these heterozygous mice, we observed an impairment in Treg cell suppressive function in vitro, which was accompanied by the downregulation of several key Treg-associated genes, as confirmed by RNA-Seq analysis.

      These updated findings, based on the use of het-KO mice, are now incorporated into the revised manuscript (Page 25, lines 507–509, Figure 6E).

      (4) Is there a disease in which Cxxc1 is expressed at low levels or absent in Treg cells? Is the same immunodeficiency phenotype present in patients as in mice?

      This is indeed a very meaningful and intriguing question, and we are equally interested in understanding whether low or absent Cxxc1 expression in Treg cells is associated with any human diseases. However, despite an extensive review of the literature and available data, we found no reports linking Cxxc1 deficiency in Treg cells to immunodeficiency phenotypes in patients comparable to those observed in mice.

      Reviewer #3 (Public review):

      In the report entitled "CXXC-finger protein 1 associates with FOXP3 to stabilize homeostasis and suppressive functions of regulatory T cells", the authors demonstrated that Cxxc1-deletion in Treg cells leads to the development of severe inflammatory disease with impaired suppressive function. Mechanistically, CXXC1 interacts with Foxp3 and regulates the expression of key Treg signature genes by modulating H3K4me3 deposition. Their findings are interesting and significant. However, there are several concerns regarding their analysis and conclusions.

      Major concerns:

      (1) Despite cKO mice showing an increase in Treg cells in the lymph nodes and Cxxc1-deficient Treg cells having normal suppressive function, the majority of cKO mice died within a month. What causes cKO mice to die from severe inflammation?

      Considering the results of Figures 4 and 5, a decrease in the Treg cell population due to their reduced proliferative capacity may be one of the causes. It would be informative to analyze the population of tissue Treg cells.

      Thank you for your insightful observation regarding the mortality of cKO mice despite increased Treg cells in lymph nodes and the normal suppressive function of Cxxc1-deficient Treg cells.

      As suggested, we hypothesized that the reduction of tissue-resident Treg cells could be a key factor. Additional experiments revealed a significant decrease in Treg cell populations in the small intestine lamina propria (LPL), liver, and lung of cKO mice. These findings highlight the critical role of tissue-resident Treg cells in preventing systemic inflammation.

      This reduction aligns with Figures 4 and 5, which demonstrate impaired proliferation and survival of Cxxc1-deficient Treg cells. Together, these defects lead to insufficient Treg populations in peripheral tissues, escalating localized inflammation into systemic immune dysregulation and early mortality.

      These additional results have been incorporated into the revised manuscript (Page 21, lines 424-427, Figure 3G and Figure S3C).

      (2) In Figure 5B, scRNA-seq analysis indicated that the Mki67+ Treg subset is comparable between WT and Cxxc1-deficient Treg cells. On the other hand, FACS analysis demonstrated that Cxxc1-deficient Treg shows less Ki-67 expression compared to WT in Figure 5I. The authors should explain this discrepancy.

      Thank you for pointing out the apparent discrepancy between the scRNA-seq and FACS analyses regarding Ki-67 expression in Cxxc1-deficient Treg cells.

      In Figure 5B, the scRNA-seq analysis identified the Mki67+ Treg subset as comparable between WT and Cxxc1-deficient Treg cells. This finding reflects the overall proportion of cells expressing Mki67 transcripts within the Treg population. In contrast, the FACS analysis in Figure 5I specifically measures Ki-67 protein levels, revealing reduced expression in Cxxc1-deficient Treg cells compared to WT.

      To resolve this discrepancy, we performed additional analyses of the scRNA-seq data to directly compare the expression levels of Mki67 mRNA between WT and Cxxc1-deficient Treg cells. The results revealed a consistent reduction in Mki67 transcript levels in Cxxc1-deficient Treg cells, aligning with the reduced Ki-67 protein levels observed by FACS.

      These new analyses have been included in the revised manuscript (Author response image 1) to clarify this point and demonstrate consistency between the scRNA-seq and FACS data.

      Author response image 1.

      Violin plots displaying the expression levels of Mki67 in T<sub>reg</sub> cells from Foxp3<sup>cre</sup> and Foxp3<sup>cre</sup>Cxxc1<sup>fl/fl</sup> mice.

      In addition, the authors concluded on line 441 that CXXC1 plays a crucial role in maintaining Treg cell stability. However, there appears to be no data on Treg stability. Which data represent the Treg stability?

      Thank you for your valuable comment. We agree that our wording in line 441 may have been too conclusive. Our data focus on the impact of Cxxc1 deficiency on Treg cell homeostasis and transcriptional regulation, rather than directly measuring Treg cell stability. Specifically, the downregulation of Treg-specific suppressive genes and upregulation of pro-inflammatory markers suggest a shift in Treg cell function, which points to disrupted homeostasis rather than stability.

      We have revised the manuscript to clarify that CXXC1 plays a crucial role in maintaining Treg cell function and homeostasis, rather than stability (Page 24, lines 489-491).

      (3) The authors found that Cxxc1-deficient Treg cells exhibit weaker H3K4me3 signals compared to WT in Figure 7. This result suggests that Cxxc1 regulates H3K4me3 modification via H3K4 methyltransferases in Treg cells. The authors should clarify which H3K4 methyltransferases contribute to the modulation of H3K4me3 deposition by Cxxc1 in Treg cells.

      We appreciate the reviewer’s insightful comment regarding the role of H3K4 methyltransferases in regulating H3K4me3 deposition by CXXC1 in Treg cells.

      CXXC1 has been reported to function as a non-catalytic component of the Set1/COMPASS complex, which includes the H3K4 methyltransferases SETD1A and SETD1B, key enzymes responsible for H3K4 trimethylation (1-4). Based on these findings, we propose that CXXC1 modulates H3K4me3 levels in Treg cells by interacting with and stabilizing the activity of the Set1/COMPASS complex.

      These points are further addressed in the Discussion (Pages 30-31, lines 624-632).

      Furthermore, it would be important to investigate whether Cxxc1-deletion alters Foxp3 binding to target genes.

      Thank you for raising this important point. To address your suggestion, we performed CUT&Tag experiments and found that Cxxc1 deletion does not alter FOXP3 binding patterns in Treg cells. Most FOXP3-bound regions in WT Treg cells were similarly enriched in KO Treg cells, indicating that Cxxc1 deficiency does not impair FOXP3’s DNA-binding ability. These results have been added to the revised manuscript (Page 28, lines 567-575, Figure S8A-B) and are further discussed in the Discussion (Pages 28-29, lines 581-587).

      (4) In Figure 7, the authors concluded that CXXC1 promotes Treg cell homeostasis and function by preserving the H3K4me3 modification since Cxxc1-deficient Treg cells show lower H3K4me3 densities at the key Treg signature genes. Are these Cxxc1-deficient Treg cells derived from mosaic mice? If Cxxc1-deficient Treg cells are derived from cKO mice, the gene expression and H3K4me3 modification status are inconsistent because scRNA-seq analysis indicated that expression of these Treg signature genes was increased in Cxxc1-deficient Treg cells compared to WT (Figure 5F and G).

      Thank you for your insightful comment. To clarify, the Cxxc1-deficient Treg cells analyzed for H3K4me3 modifications in Figure 7 were derived from Cxxc1 conditional knockout (cKO) mice, not mosaic mice.

      Regarding the apparent inconsistency between reduced H3K4me3 levels and the increased expression of Treg signature genes observed in scRNA-seq analysis (Figure 5F and G), we believe this discrepancy can be attributed to distinct mechanisms regulating gene expression. H3K4me3 is an epigenetic mark that facilitates chromatin accessibility and transcriptional regulation, reflecting upstream chromatin dynamics. However, gene expression levels are influenced by a combination of factors, including transcriptional activators, downstream compensatory mechanisms, and the inflammatory environment in cKO mice.

      The upregulation of Treg signature genes in scRNA-seq data likely reflects an activated or pro-inflammatory state of Cxxc1-deficient Treg cells in response to systemic inflammation, as previously described in the manuscript. This contrasts with the intrinsic reduction in H3K4me3 levels at these loci, indicating a loss of epigenetic regulation by CXXC1.

      To further support this interpretation, RNA-seq analysis of Treg cells from Foxp3<sup>Cre/+</sup> Cxxc1<sup>fl/fl</sup> (“het-KO”) and their littermate Foxp3<sup>Cre/+</sup> Cxxc1<sup>fl/+</sup> (“het-WT”) female mice (Figure S6C) revealed a significant reduction in key Treg signature genes such as Icos, Ctla4, Tnfrsf18, and Nt5e in het-KO Treg cells. These results align with the diminished H3K4me3 modifications observed in cKO Treg cells, further underscoring the role of CXXC1 as an epigenetic regulator.

      In summary, while the gene expression changes observed in scRNA-seq may reflect adaptive responses to inflammation, the reduced H3K4me3 modifications directly highlight the critical role of CXXC1 in maintaining the epigenetic landscape essential for Treg cell homeostasis and function.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In Figure 7E, the y-axis scale for H3K4me3 peaks at the Ctla4 locus should be consistent between WT and cKO samples.

      We thank the reviewer for pointing out the inconsistency in the y-axis scale for the H3K4me3 peaks at the Ctla4 locus in Figure 7E. We have carefully revised the figure to ensure that the y-axis scale is now consistent between the WT and cKO samples.

      We appreciate the reviewer’s attention to this detail, as it enhances the rigor of the data presentation. Please find the updated Figure 7E in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      In lines 455 and 466, the names of the Treg signature markers validated by flow cytometry should be written as protein names and capitalized.

      Thank you for pointing this out. We have carefully reviewed lines 455 and 466 and have revised the text to ensure that the Treg signature markers validated by flow cytometry are referred to using their protein names, with proper capitalization.

      Reviewer #3 (Recommendations for the authors):

      (1) On line 431, "Cxxc1-deficient cells" should be "Cxxc1-deficient Treg cells".

      We thank the reviewer for highlighting this oversight. On line 431, we have revised "Cxxc1-deficient cells" to "Cxxc1-deficient Treg cells" to provide a more accurate and specific description. We appreciate the reviewer's attention to detail, as this correction improves the precision of our manuscript.

      (2) In Figure 4H, negative values should be removed from the y-axis.

      Thank you for your observation. We have revised Figure 4H to remove the negative values from the y-axis, as requested. This adjustment ensures a more accurate and meaningful representation of the data.

      (3) It is better to provide the lists of overlapping genes in Figure 7C.

      Thank you for your suggestion. We agree that providing the lists of overlapping genes in Figure 7C would enhance the clarity and reproducibility of the results. We have now included the gene lists as supplementary information (Supplementary Table 3) accompanying Figure 7C.

      (1) Lee, J. H. & Skalnik, D. G. CpG-binding protein (CXXC finger protein 1) is a component of the mammalian set1 histone H3-Lys4 methyltransferase complex, the analogue of the yeast Set1/COMPASS complex. Journal of Biological Chemistry 280, 41725-41731, doi:10.1074/jbc.M508312200 (2005).

      (2) Thomson, J. P., Skene, P. J., Selfridge, J., Clouaire, T., Guy, J., Webb, S., Kerr, A. R. W., Deaton, A., Andrews, R., James, K. D., Turner, D. J., Illingworth, R. & Bird, A. CpG islands influence chromatin structure via the CpG-binding protein Cfp1. Nature 464, 1082-1086, doi:10.1038/nature08924 (2010).

      (3) Shilatifard, A. in Annual Review of Biochemistry, Vol. 81 (ed R. D. Kornberg) 65-95 (2012).

      (4) Brown, D. A., Di Cerbo, V., Feldmann, A., Ahn, J., Ito, S., Blackledge, N. P., Nakayama, M., McClellan, M., Dimitrova, E., Turberfield, A. H., Long, H. K., King, H. W., Kriaucionis, S., Schermelleh, L., Kutateladze, T. G., Koseki, H. & Klose, R. J. The SET1 Complex Selects Actively Transcribed Target Genes via Multivalent Interaction with CpG Island Chromatin. Cell Reports 20, 2313-2327, doi:10.1016/j.celrep.2017.08.030 (2017).

    1. eLife Assessment

      The authors use single-molecule imaging and in vivo loop-capture genomic approaches to investigate estrogen-mediated enhancer-target gene activation in human cancer cells. These potentially important results suggest that ER-alpha can, after a temporal delay, activate a non-target gene, TFF3, which is in proximity to the main target gene TFF1, even though the estrogen-responsive enhancer does not loop with the TFF3 promoter. To explain these results, the authors invoke a transcriptional condensate model. The claim of a temporal delay and effects of the target gene transcription on the non-target gene expression are supported by solid evidence, but there is no direct evidence of the role of a condensate in mediating this effect. The reviewers appreciate that the authors have done a lot of work to strengthen the study. This work will be of interest to those studying transcriptional gene regulation and hormone-aggravated cancers.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Bohra et al. describes the indirect effects of ligand-dependent gene activation on neighboring non-target genes. The authors utilized single-molecule RNA-FISH (targeting both mature and intronic regions), 4C-seq, and enhancer deletions to demonstrate that the non-enhancer-targeted gene TFF3, located in the same TAD as the target gene TFF1, alters its expression when TFF1 expression declines at the end of the estrogen signaling peak. Since the enhancer does not loop with TFF3, the authors conclude that mechanisms other than estrogen receptor or enhancer-driven induction are responsible for TFF3 expression. Moreover, ERα intensity correlations show that both high and low levels of ERα are unfavorable for TFF1 expression. The ERα level correlations are further supported by overexpression of GFP-ERα. The authors conclude that transcriptional machinery used by TFF1 for its acute activation can negatively impact TFF3 at the peak of signaling, but once the condensate dissolves, TFF3 benefits from it for its low expression.

      Strengths:

      The findings are indeed intriguing. The authors have maintained appropriate experimental controls, and their conclusions are well-supported by the data.

      Weaknesses:

      There are some major and minor concerns related to the approach, data presentation, and discussion, but the authors have greatly improved the manuscript during the revision.

      Comments on latest version:

      The authors have done a lot of work for the revision. The manuscript has been greatly improved.

    3. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Bohra et al. measure the effects of estrogen-responsive gene expression upon induction on nearby target genes using a TAD containing the genes TFF1 and TFF3 as a model. The authors propose that there is a sort of competition for transcriptional machinery between TFF1 (estrogen responsive) and TFF3 (not responsive) such that when TFF1 is activated and machinery is recruited, TFF3 is activated after a time delay. The authors attribute this time delay to transcriptional machinery that was sequestered at TFF1 becoming available to the proximal TFF3 locus. The authors demonstrate through enhancer deletion that this activation is not dependent on contact with the TFF1 enhancer; instead, they conclude that it is dependent on a phase-separated condensate which can sequester transcriptional machinery. Although the manuscript reports an interesting observation that there is a dose dependence and time delay on the expression of TFF1 relative to TFF3, there is much room for improvement in the analysis and reporting of the data. Most importantly, there is no direct test of condensate formation at the locus in the context of this study, i.e., dissolution upon the enhancer deletion, decay in a temporal manner, and dependence of TFF1 expression on condensate formation. Using 1,6-hexanediol is not adequate to draw conclusions on the effect of condensates on a specific gene's activity, given current knowledge of its non-specificity and multitude of indirect effects. Thus, in my opinion, the major claim that the time-delayed expression of TFF3 is dependent on condensates is not supported by the current data.

      Strengths:

      The dependence of TFF1 expression on a single enhancer and the temporal delay in TFF3 expression is a very interesting finding.

      The non-linear dependence of TFF1 and TFF3 expression on ER concentration is very interesting with potentially broader implications.

      The combined use of smFISH, enhancer deletion, and 4C to build a coherent model is a good approach.

      Weaknesses:

      There is no direct observation of a condensate at the TFF1 and TFF3 locus, nor of how this condensate changes over time after E2 treatment or upon enhancer deletion, whether transcriptional machinery is indeed concentrated within it, or of the other claims on condensate function and formation made in the manuscript. The use of 1,6-hexanediol (1,6-HD) is not appropriate to test this idea given how broadly it acts.

      Comments on latest version:

      I don't think the response to Reviewer 2's comment on LLPS condensates at TFF1 is adequate, and given that this point is essential to the claims of the manuscript, it must be addressed. Namely, the data from Saravanan et al., 2020 actually suggest that condensate formation at the locus is not very predictive and barely enriched over random spots. The claims in the manuscript that the condensate is responsible for sequestering transcriptional machinery are quite strong and are the crux of the current model. To continue to make this claim (which I don't think is necessary since there are other possible models), the authors must test whether the condensate at this locus (1) shows time-dependent behavior, (2) is not present or is weakened at the locus in cells that show high TFF3 expression, and (3) is indeed enriched for transcriptional machinery when TFF1 peaks. The use of 1,6-hexanediol is not appropriate, as pointed out by Reviewer 2, and is no longer considered an appropriate experiment by many, as the whole notion of LLPS forming nuclear condensates is now under question. Such condensates can form through a variety of mechanisms, as reviewed, for example, by Mittag and Pappu (A conceptual framework for understanding phase separation and addressing open questions and challenges, Molecular Cell, 2022). Furthermore, given the distance between TFF1 and TFF3, it is hard to imagine how a condensate that concentrates machinery in a non-stoichiometric manner would not boost expression of both genes rather than being specific to one. There must be another mechanism, in my opinion.

      I would recommend that the authors remove this aspect of their manuscript/model and simply report their interesting findings that are actually supported by data: the temporal delay of TFF3 expression, the dependence on ER concentration, and the enhancer dependence.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      The manuscript by Bohra et al. describes the indirect effects of ligand-dependent gene activation on neighboring non-target genes. The authors utilized single-molecule RNA-FISH (targeting both mature and intronic regions), 4C-seq, and enhancer deletions to demonstrate that the non-enhancer-targeted gene TFF3, located in the same TAD as the target gene TFF1, alters its expression when TFF1 expression declines at the end of the estrogen signaling peak. Since the enhancer does not loop with TFF3, the authors conclude that mechanisms other than estrogen receptor or enhancer-driven induction are responsible for TFF3 expression. Moreover, ERα intensity correlations show that both high and low levels of ERα are unfavorable for TFF1 expression. The ERα level correlations are further supported by overexpression of GFP-ERα. The authors conclude that transcriptional machinery used by TFF1 for its acute activation can negatively impact TFF3 at the peak of signaling, but once the condensate dissolves, TFF3 benefits from it for its low expression.

      Strengths:

      The findings are indeed intriguing. The authors have maintained appropriate experimental controls, and their conclusions are well-supported by the data.

      Weaknesses:

      There are some major and minor concerns related to the approach, data presentation, and discussion, but I think they can be fixed with more effort.

      We thank the reviewer for their positive comments on the paper. We have addressed all their specific recommendations below.  

      The deletion of enhancer reveals the absolute reliance of TFF1 on its enhancers for its expression. Authors should elaborate more on this as this is an important finding.

      We thank the reviewer for the comment. We have now added a more detailed discussion of the requirement of the enhancer for TFF1 expression in the revised manuscript (lines 368-385).

      In Fig. 1, TFF3 expression is shown to be induced upon E2 signaling through qRT-PCR, while smFISH does not display a similar pattern. The authors attribute this discrepancy to the overall low expression of TFF3. In my opinion, this argument could be further supported by relevant literature, if available. Additionally, does GRO-seq data reveal any changes in TFF3 expression following estrogen stimulation? The GRO-seq track shown in Fig.1 should be adjusted to TFF3 expression to appreciate its expression changes.

      We have now included a browser shot of the TFF3 region showing the GRO-seq signal across the E2 time course (Fig. S1C). We observed increased transcription towards the 3’ end of the TFF3 gene body at 3h, which corroborates the smFISH data. The relative changes in TFF3 expression measured by qRT-PCR and by smFISH for intronic transcripts are somewhat different. We speculate that biases in measurements that depend on PCR amplification could be larger for genes expressed at low levels, and that smFISH using intronic probes may be a more sensitive assay to detect such changes.

      Since the mutually exclusive relationship between TFF1 and TFF3 is based on snap shots in fixed cells, can authors comment on whether the same cell that expresses TFF1 at 1h, expresses TFF3 at 3h? Perhaps, the calculations taking total number of cells that express these genes at 1 and 3h would be useful.

      As pointed out by the reviewer, since these are fixed cells, we cannot comment on the fate of the same cell at two time points. To address this limitation, future work could employ cells with endogenous tags for TFF1 and TFF3 and utilize live-cell imaging techniques. In a fixed-cell assay, as the reviewer suggests, one can investigate whether the fraction of cells showing high TFF3 expression at 3h is similar to the fraction showing high TFF1 expression at 1h. To quantify this, we plotted the fraction of cells showing high TFF1 and TFF3 expression at 1h and 3h. We identify truly high-expressing cells by taking the mean plus one standard deviation (for single-cell-level data) at E2-1h as the threshold for TFF1 (80 and above transcript counts) and the mean plus one standard deviation (for single-cell-level data) at E2-3h as the threshold for TFF3 (36 and above transcript counts). The fraction with high TFF1 expression at 1h (12.06 ± 2.1) is indeed comparable to that with high TFF3 expression at 3h (12.50 ± 2.0) (Fig. 2C and Author response image 1). We should note that if the transcript counts were normally distributed, a predetermined fraction would be expected to be above these thresholds, and comparable fractions could arise just from the underlying statistics. In our experiments, however, this is unlikely to be the case, given the many outliers that affect both the mean and the standard deviation, and the lack of normality and high dispersion in the single-cell distributions. Of course, despite the fractions being comparable, we cannot be certain that it is the same set of cells that goes from high expression of TFF1 to high expression of TFF3, but that is a possibility. We thank the reviewer for pointing out this comparison.

      Author response image 1.

      The graph represents the percent of cells that show high expression of TFF1 and TFF3 at 1h and 3h post E2 signaling. The threshold was determined by pooling absolute RNA counts from 650 analyzed cells (as in Fig. 2C). The mean and standard deviation over single-cell data were calculated. The mean plus one standard deviation was used as the threshold for identifying high-expressing cells. For TFF1, as it maximally expresses at 1h, the threshold used was 80. For TFF3, as it maximally expresses at 3h, the threshold used was 36. The fractions of cells expressing above 80 and 36 for TFF1 and TFF3, respectively, were calculated from three different repeats. The mean of means and standard deviations from the three experiments are plotted here.
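      The thresholding described above can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the use of the sample standard deviation, the "at or above" comparison, and the example counts are all assumptions made for illustration. It computes a mean plus one standard deviation threshold from per-cell transcript counts, reports the fraction of cells at or above it, and compares that with the fraction expected if the counts were normally distributed.

```python
import statistics
from statistics import NormalDist


def high_expressing_fraction(counts, n_sd=1.0):
    """Return (threshold, fraction of cells with counts at or above threshold).

    The threshold is mean + n_sd * SD of the per-cell counts. Using the
    sample SD (statistics.stdev) is an assumption of this sketch.
    """
    mean = statistics.fmean(counts)
    sd = statistics.stdev(counts)
    threshold = mean + n_sd * sd
    fraction = sum(c >= threshold for c in counts) / len(counts)
    return threshold, fraction


def normal_tail_fraction(n_sd=1.0):
    """Fraction expected above mean + n_sd * SD for a normal distribution."""
    return 1.0 - NormalDist().cdf(n_sd)


# Hypothetical per-cell transcript counts with one strong outlier;
# these values are illustrative only and are not data from the study.
example_counts = [0, 2, 3, 5, 5, 8, 10, 12, 40, 115]

threshold, fraction = high_expressing_fraction(example_counts)
expected_if_normal = normal_tail_fraction()
```

      For a normal distribution, roughly 16% of cells would lie above the mean plus one standard deviation regardless of the gene. Skewed, outlier-heavy counts such as the hypothetical ones here inflate both the mean and the standard deviation, so the observed fraction need not match that value.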

      Authors conclude that TFF3 is not directly regulated by enhancer or estrogen receptor. Does ERa bind on TFF3 promoter? 

      The ERα ChIP-seq performed at 1h and 3h of signaling suggests that the TFF3 promoter is not bound by ERα, as shown in supplementary Fig. 1B and S1B. However, one peak upstream of the TFF1 promoter is visible, and it is lost at 3h.

      Minor comments:

      The figures would benefit from resizing of panels. There is very little space between the panels.

      We have now resized the figures in the revised manuscript.

      The discussion section could include an extrapolation on the relationship between ERα concentration and transcriptional regulation. Given that ERα levels have been shown to play a critical role in breast cancer, exploring how varying concentrations of ERα affect gene expression, including the differential regulation of target and non-target genes, would provide valuable insights into the broader implications of this study.

      This is a very important point that was missing from the manuscript. We have included this in the discussion in the revised manuscript (lines 426-430).

      Reviewer #2:

      Summary:

      In this manuscript by Bohra et al., the authors use the well-established estrogen response in MCF7 cells to interrogate the role of genome architecture, enhancers, and estrogen receptor concentration in transcriptional regulation. They propose there is competition between the genes TFF1 and TFF3 which is mediated by transcriptional condensates. This reviewer does not find these claims persuasive as presented. Moreover, the results are not placed in the context of current knowledge.

      Strengths:

      A high level of ERalpha expression seems to diminish the transcriptional response. Thus, the results in Fig. 4 offer potential insight into ER-mediated transcription. Yet this observation is not pursued in great depth, for example with mutagenesis of ERalpha. Moreover, this phenomenon - which falls under the general description of non-monotonic dose response - is treated at great depth in the literature (i.e. PMID: 22419778). For example, the result the authors describe in Fig. 4 has been reported and in fact mathematically modeled in PMID 23134774. One possible avenue for improving this paper would be to dig into this result at the single-cell level using deletion mutants of ERalpha or by perturbing co-activators.

      We thank the reviewer for pointing us to the relevant literature on our observation, which will enhance the manuscript. We have discussed these findings in relation to ours in the discussion section (lines 400-413). We thank the reviewer for the insight on non-monotonic behavior.

      Weaknesses:

      There are concerns with the sm-RNA FISH experiments. It is highly unusual to see so much intronic signal away from the site of transcription (Fig. 2) (PMID: 27932455, 30554876), which suggests to me the authors are carrying out incorrect thresholding or have a substantial amount of labelling background. The Cote paper cited in the manuscript is likewise inconsistent with their findings and is cited in a misleading manner: they see splicing within a very small region away from the site of transcription. 

      We thank the reviewer for this comment, and apologize if they feel we misrepresented the argument from Coté et al. This has now been rectified in the manuscript. However, we do not agree that the intronic signals away from the site of transcription are an artefact. First, the images presented here are just representative 2D projections of 3D Z-stacks, whereas the full 3D stack is used for spot counting using a widely used algorithm that reports spot counts that are constant over a wide range of thresholds (Raj et al., 2008). The veracity of the automated counts was first verified by comparison to manual counts. Even for the 2D representations, the extragenic intronic signals show up at similar thresholds to the transcription sites.

      The signal is not non-specific signal arising from background labeling, for the following reasons:

      • To further support the time-course smFISH data and its interpretation without depending on the dispersed intronic signal, we have analyzed the number of alleles firing (sites of transcription) at a given time in a cell under the three conditions. We counted the sites of transcription in a given cell and calculated the percentage of cells showing 1, 2, 3, 4 or >4 sites. We see that the percentage of cells showing a single site of transcription for TFF1 is very high in uninduced cells and decreases at 1h. At 1h, the percentage of cells showing 2, 3 and 4 sites of transcription increases, and it goes down again at 3h (Author response image 2A). This agrees with the interpretation made from mean intronic counts away from the site of transcription. Similarly, for TFF3, the number of cells showing 2, 3 and 4 sites of transcription increases slightly at 3h compared to uninduced and 1h (Author response image 2B). We can also see that several cells have no alleles firing at a given time, as quantified in the graphs on the right showing the total fraction of cells with zero versus non-zero alleles firing (Author response image 2A-B). A non-specific signal would be present in all cells.

      • There is literature on post-transcriptional splicing of RNA beyond our work, which suggests that intronic signal can be found at relatively large distances from the site of transcription. Waks et al. showed that some fraction of unspliced RNA could be observed up to 6-10 microns away from the site of transcription, suggesting that there can be a delay between transcription and (alternative) splicing (Waks et al., 2011). Pan-nuclear dispersed intronic signals can arise because more than one allele can fire at a time in different nuclear locations. The spread of intronic transcripts in our images is also limited in cells in which only 1 allele is firing at E2-1h (Author response image 2C) or in uninduced cells (Author response image 2D). Furthermore, Coté et al. discuss that “Of note, we see that increased transcription level correlates with intron dispersal, suggesting that the percentage of splicing occurring away from the transcription site is regulated by transcription level for at least some introns. This may explain why we observe posttranscriptional splicing of all genes we measured, as all were highly expressed.” This is in line with our interpretation that intron signal dispersal can occur in the case of post-transcriptional splicing (Coté et al., 2023). Additionally, other studies have suggested that transcripts in cells do not necessarily undergo co-transcriptional splicing, which leads us to conclude that intronic signal can be found farther away from the site of transcription. Coulon et al. showed that splicing can occur after transcript release from the site and suggested that no strict checkpoint exists to ensure intron removal before release, which results in splicing and release being kinetically uncoupled from each other (Coulon et al., 2014). Similarly, using live-cell imaging, it was shown that splicing is not always coupled with transcription, and that this could depend on the nature and structural features of the transcript (such as blockage of the polypyrimidine tract, which results in delayed recognition) (Vargas et al., 2011). Drexler et al. showed that, in contrast to the shorter Drosophila transcripts, in mammalian cells splicing of the terminal intron can occur post-transcriptionally (Drexler et al., 2020). Using RNA polymerase II ChIP-seq time-course data from ERα activation in MCF-7 cells, Honkela et al. showed that a large number of genes can show significant delays between the completion of transcription and mRNA production (Honkela et al., 2015). This was attributed to faster transcription of shorter genes, suggesting that rapid completion of transcription on shorter genes can lead to splicing-associated delays (Honkela et al., 2015). More recently, comparisons of nascent and mature RNA levels suggested a time lapse between transcription and splicing for genes that are early responders during signaling (Zambrano et al., 2020). The presence of significant numbers of TFF1 nascent RNA in the nucleus in our data is consistent with the above observations.

      • Uniform intensities across many transcripts suggest that these are true signals arising from RNA molecules, which would not be the case for non-specific background signal (Author response image 2E).

      • Splicing occurs in the nucleus, and intron-containing pre-transcripts should be nuclear localized. Thus, intronic signals should remain localized to the nucleus, unlike the mature mRNA, which translocates to the cytoplasm after processing, so that exonic signals can be found both in the nucleus and the cytoplasm. In keeping with this, we observe no signal in the cytoplasm for the intronic probes; it remains localized within the nucleus as expected, as can be seen in Author response image 2F, while exonic signals are observed in both compartments. This suggests to us that the signal is coming from true pre-transcripts. There is no reason for non-specific background labelling to remain restricted to the nucleus.

      • We observe that the mean intronic label counts for both genes, TFF1 and TFF3, increase upon E2 induction compared to the uninduced condition (Fig. 2B). Similarly, the mean intronic counts for both genes are drastically reduced in the TFF1-enhancer-deleted cells (Fig. 3C, D). This change in the number of intronic signals specifically on induction and enhancer deletion suggests that the signal is not an artefact and arises from true nascent transcripts that are sensitive to stimulus or enhancer deletion.

      • We expect colocalization of intronic signal with exonic signals in the nucleus, while there can be exonic signals that do not colocalize with intronic, representing more mature mRNA. Indeed, we observe a clear colocalization between the intronic and exonic signals in the nucleus, while exonic signals can occur independent of intronic both in the nucleus and the cytoplasm. This clearly demonstrates that the intronic signals in our experiments are specific and not simply background labelling (Author response image 2G).

      These studies and the arguments above lead us to conclude that the presence of intronic transcripts in the nucleus, away from the site of transcription, is not an artefact. We hope the reviewer will agree with us. These analyses have now been included in the manuscript as Supplementary Figure 6 and have been added in the manuscript at lines 106-111, 201-204, 215-217 and 231-235. We thank the reviewer for raising this important point.

      Author response image 2.

      Dynamic induction and RNA localization of TFF1 and TFF3 transcription across cell populations using smRNA FISH. A. Bar graph depicting the percentage of cells with 1, 2, 3, 4, or greater than 4 sites of transcription for TFF1 (left) is shown. The graph shows the mean of means from different repeats of the experiment, and error bars denote SEM (n>200, N=3). Only the cells with at least one allele firing were counted, and cells with no alleles firing were not included in this. The graph on the right shows the number of cells with zero or non-zero number of alleles firing. B. Bar graph depicting the percentage of cells with 1, 2, 3, 4, or greater than 4 sites of transcription for TFF3 (left) is shown. The graph shows the mean of means from different repeats of the experiment, and error bars denote SEM (n>200, N=3). Only the cells with at least one allele firing were counted, and cells with no alleles firing were not included in this. The graph in the middle shows the number of cells with 2, 3, 4, or greater than 4 sites of transcription for TFF3. The graph on the right shows the number of cells with zero or non-zero number of alleles firing. C. Images from a single-molecule RNA FISH experiment showing transcripts for InTFF1 in cells induced for 1 hour with E2. The image shows that when a single allele of TFF1 is firing, the transcripts show a more spatially restricted localisation. The scale bar is 5 microns. D. Images from a single-molecule RNA FISH experiment showing transcripts for InTFF1 in uninduced cells. The image shows that when a single allele of TFF1 is firing and transcription is low, the transcripts show a more spatially restricted localisation. The scale bar is 5 microns. E. Line profiles through several transcripts in the nucleus show uniform and similar intensities, indicating that these are true signals. F. 60X representative images from a single-molecule RNA FISH experiment showing transcripts for InTFF1 and ExTFF1 (top) and InTFF3 and ExTFF3 (bottom). The image shows that there is no intronic signal in the cytoplasm, while exonic signals can be found both in the nucleus and the cytoplasm. The scale bar is 5 microns. G. 60X representative images from a single-molecule RNA FISH experiment showing transcripts for InTFF1 and ExTFF1. The image shows that all intronic signals are colocalized with exonic signals, whereas, as expected, not all exonic signals are colocalized with intronic signals, representing more mature mRNA. The scale bar is 5 microns.

      One substantial way to improve the manuscript is to take a careful look at previous single cell analysis of the estrogen response, which in some cases has been done on the exact same genes (PMID: 29476006, 35081348, 30554876, 31930333). In some of these cases, the authors reach different conclusions than those presented in the present manuscript. Likewise, there have been more than a few studies that have characterized these enhancers (the first one I know of is: PMID 18728018). Also, Oh et al. 2021 (cited in the manuscript) did show an interaction between TFF1e and TFF3, which seems to contradict the conclusion from Fig. 3. In summary, the results of this paper are not in dialogue with the field, which is a major shortcoming. 

      We thank the reviewer for pointing out these important studies. The studies from Prof. Larson's group are particularly insightful (Rodriguez et al., 2019). We have now included this in the discussion (lines 106-111 and 420-424), where we describe the differences and similarities between our findings and those of Larson’s and Mancini’s groups (Patange et al., 2022; Stossi et al., 2020).

      The 4C-seq data from Oh et al. 2021 are consistent with our observation in Fig. 3, as they also observed little to no interaction between TFF1e and TFF3p in WT cells; only upon TFF1p deletion did TFF1e become engaged with TFF3p. In agreement with this, we also observe little to no interaction between TFF1e and TFF3p in WT cells (Fig. 3A). This is also consistent with our model of competition for resources between these two genes. Oh et al. show an interaction between TFF1e and TFF3 when the TFF1 promoter is deleted, showing that when the primary promoter is not available the enhancer is retargeted to the next available gene (Oh et al., 2021). They do not show that TFF1e and TFF3 interact in WT cells or at any time point of E2 signalling.

      In the opinion of this reviewer, there are few - if any - experiments to interrogate the existence of LLPS for diffraction-limited spots such as those associated with transcription. This difficulty is a general problem with the field and not specific to the present manuscript. For example, transient binding will also appear as a dynamic 'spot' in the nucleus, independently of any higher-order interactions. As for Fig. 5, I don't think treating cells with 1,6 hexanediol is any longer considered a credible experiment. For example, there are profound effects on chromatin independent of changes in LLPS (PMID: 33536240).  

      We are cognizant of and appreciate the limitations pointed out by the reviewer. We and others have previously shown that ERα forms condensates on the TFF1 chromatin region using an ImmunoFISH assay (Saravanan et al., 2020). The data below show the relative mean ERα intensity on TFF1 FISH spots and random regions, showing the appearance of the condensate at the TFF1 site. Further, the deletion of TFF1e causes a reduction in the size of this condensate. Thus, we expect that these ERα condensates are characterized by higher-order interactions and become disrupted on treatment with 1,6-hexanediol. These condensates are sub-micron in size, as mentioned by the reviewer, but most TF condensates are of similar size. We agree with the reviewer that 1,6-hexanediol treatment is a brute-force experiment with several irreversible changes to the chromatin, although we used it at a low concentration for a short period of time, and it has been used in several papers (Chen et al., 2023; Gamliel et al., 2022). The opposite pattern of TFF1 vs. TFF3 expression upon 1,6-hexanediol treatment suggests that there is specificity. Further, to perturb condensates, mutants of ERα can be used (N-terminal IDR truncations); however, the transcriptional response of these mutants is also altered due to perturbed recruitment of coactivators that recognize the N-terminus of ER, which makes it difficult to distinguish ERα functions from condensate formation.

      References:

      Chen, L., Zhang, Z., Han, Q., Maity, B. K., Rodrigues, L., Zboril, E., Adhikari, R., Ko, S.-H., Li, X., Yoshida, S. R., Xue, P., Smith, E., Xu, K., Wang, Q., Huang, T. H.-M., Chong, S., & Liu, Z. (2023). Hormone-induced enhancer assembly requires an optimal level of hormone receptor multivalent interactions. Molecular Cell, 83(19), 3438-3456.e12. https://doi.org/10.1016/j.molcel.2023.08.027

      Coté, A., O’Farrell, A., Dardani, I., Dunagin, M., Coté, C., Wan, Y., Bayatpour, S., Drexler, H. L., Alexander, K. A., Chen, F., Wassie, A. T., Patel, R., Pham, K., Boyden, E. S., Berger, S., Phillips-Cremins, J., Churchman, L. S., & Raj, A. (2023). Post-transcriptional splicing can occur in a slow-moving zone around the gene. eLife, 12. https://doi.org/10.7554/eLife.91357.2

      Coulon, A., Ferguson, M. L., de Turris, V., Palangat, M., Chow, C. C., & Larson, D. R. (2014). Kinetic competition during the transcription cycle results in stochastic RNA processing. eLife, 3, e03939. https://doi.org/10.7554/eLife.03939

      Drexler, H. L., Choquet, K., & Churchman, L. S. (2020). Splicing Kinetics and Coordination Revealed by Direct Nascent RNA Sequencing through Nanopores. Molecular Cell, 77(5), 985-998.e8. https://doi.org/10.1016/j.molcel.2019.11.017

      Gamliel, A., Meluzzi, D., Oh, S., Jiang, N., Destici, E., Rosenfeld, M. G., & Nair, S. J. (2022). Long-distance association of topological boundaries through nuclear condensates. Proceedings of the National Academy of Sciences of the United States of America, 119(32), e2206216119. https://doi.org/10.1073/pnas.2206216119

      Honkela, A., Peltonen, J., Topa, H., Charapitsa, I., Matarese, F., Grote, K., Stunnenberg, H. G., Reid, G., Lawrence, N. D., & Rattray, M. (2015). Genome-wide modeling of transcription kinetics reveals patterns of RNA production delays. Proceedings of the National Academy of Sciences of the United States of America, 112(42), 13115. https://doi.org/10.1073/pnas.1420404112

      Oh, S., Shao, J., Mitra, J., Xiong, F., D’Antonio, M., Wang, R., Garcia-Bassets, I., Ma, Q., Zhu, X., Lee, J.-H., Nair, S. J., Yang, F., Ohgi, K., Frazer, K. A., Zhang, Z. D., Li, W., & Rosenfeld, M. G. (2021). Enhancer release and retargeting activates disease-susceptibility genes. Nature, 595(7869), Article 7869. https://doi.org/10.1038/s41586-021-03577-1

      Patange, S., Ball, D. A., Wan, Y., Karpova, T. S., Girvan, M., Levens, D., & Larson, D. R. (2022). MYC amplifies gene expression through global changes in transcription factor dynamics. Cell Reports, 38(4). https://doi.org/10.1016/j.celrep.2021.110292

      Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A., & Tyagi, S. (2008). Imaging individual mRNA molecules using multiple singly labeled probes. Nature Methods, 5(10), Article 10. https://doi.org/10.1038/nmeth.1253

      Rodriguez, J., Ren, G., Day, C. R., Zhao, K., Chow, C. C., & Larson, D. R. (2019). Intrinsic Dynamics of a Human Gene Reveal the Basis of Expression Heterogeneity. Cell, 176(1–2), 213-226.e18. https://doi.org/10.1016/j.cell.2018.11.026

      Saravanan, B., Soota, D., Islam, Z., Majumdar, S., Mann, R., Meel, S., Farooq, U., Walavalkar, K., Gayen, S., Singh, A. K., Hannenhalli, S., & Notani, D. (2020). Ligand dependent gene regulation by transient ERα clustered enhancers. PLOS Genetics, 16(1), e1008516. https://doi.org/10.1371/journal.pgen.1008516

      Stossi, F., Dandekar, R. D., Mancini, M. G., Gu, G., Fuqua, S. A. W., Nardone, A., De Angelis, C., Fu, X., Schiff, R., Bedford, M. T., Xu, W., Johansson, H. E., Stephan, C. C., & Mancini, M. A. (2020). Estrogen-induced transcription at individual alleles is independent of receptor level and active conformation but can be modulated by coactivators activity. Nucleic Acids Research, 48(4), 1800. https://doi.org/10.1093/nar/gkz1172

      Vargas, D. Y., Shah, K., Batish, M., Levandoski, M., Sinha, S., Marras, S. A. E., Schedl, P., & Tyagi, S. (2011). Single-Molecule Imaging of Transcriptionally Coupled and Uncoupled Splicing. Cell, 147(5), 1054–1065. https://doi.org/10.1016/j.cell.2011.10.024

      Waks, Z., Klein, A. M., & Silver, P. A. (2011). Cell-to-cell variability of alternative RNA splicing. Molecular Systems Biology, 7(1), 506. https://doi.org/10.1038/msb.2011.32

      Zambrano, S., Loffreda, A., Carelli, E., Stefanelli, G., Colombo, F., Bertrand, E., Tacchetti, C., Agresti, A., Bianchi, M. E., Molina, N., & Mazza, D. (2020). First Responders Shape a Prompt and Sharp NF-κB-Mediated Transcriptional Response to TNF-α. iScience, 23(9), 101529. https://doi.org/10.1016/j.isci.2020.101529

    1. eLife Assessment

      This important study developed a mathematical model to predict biological age by leveraging physiological traits across multiple organ systems. The results presented are convincing, utilizing comprehensive data-driven approaches. However, additional external validation could further strengthen its generalizability. The model provides a way to identify environmental and genetic factors impacting aging and lifespan, revealing new factors potentially affecting aging. It also shows promise for evaluating therapeutics aimed at prolonging a healthy lifespan.

    2. Reviewer #1 (Public review):

      In this study, the authors developed a mathematical model to predict human biological ages using physiological traits. This model provides a way to identify environmental and genetic factors that impact aging and lifespan.

      Strength:

      (1) The topic addressed by the authors - human age prediction using physiological traits - is an extremely interesting, important, and challenging question in the aging field. One of the biggest challenges is the lack of well-controlled data from a large number of humans. However, the authors took this challenge and tried their best to extract useful information from available data.

      (2) Some of the findings can provide valuable guidelines for future experimental design for human and animal studies. For example, it was found that this mathematical model can best predict age when all different organ and physiological systems are sampled. This finding makes sense in general, but can be, and has been, neglected when people use molecular markers to predict age. Most of those studies have used only one molecular trait or different traits from one tissue.

      Weakness:

      (1) As I mentioned above, the Biobank data used here are not designed for this current study, so there are many limitations for model development using these data, e.g., missing data points and irrelevant measurements for aging. This is a common caveat for human studies and has been discussed by the authors.

      (2) There is no validation dataset to verify the proposed model. The authors suggested that human biological age can be predicted with a high accuracy using 12 simple physiological measurements. It will be super useful and convincing if another biobank dataset containing those 12 traits can be applied to the current model.

      Comments on revisions:

      In this revision, the authors improved the manuscript by adding discussion of two main weaknesses about human data limitation and model validation. My several other specific concerns and suggestions are all properly resolved.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors developed a mathematical model to predict human biological ages using physiological traits. This model provides a way to identify environmental and genetic factors that impact aging and lifespan.

      Strengths:

      (1) The topic addressed by the authors - human age prediction using physiological traits - is an extremely interesting, important, and challenging question in the aging field. One of the biggest challenges is the lack of well-controlled data from a large number of humans. However, the authors took this challenge and tried their best to extract useful information from available data.

      The authors thank the anonymous reviewer for agreeing that building and analyzing a physiological clock is an interesting and important, even though challenging, task.

      (2) Some of the findings can provide valuable guidelines for future experimental design for human and animal studies. For example, it was found that this mathematical model can best predict age when all different organ and physiological systems are sampled. This finding makes sense in general but can be, and has been, neglected when people use molecular markers to predict age. Most of those studies have used only one molecular trait or different traits from one tissue.

      The authors thank the anonymous reviewer for highlighting the importance of our approach of sampling traits from multiple organs and systems for biological age prediction, which ultimately provides more holistic information.

      Weaknesses:

      (1) As I mentioned above, the Biobank data used here are not designed for this current study, so there are many limitations for model development using these data, e.g., missing data points and irrelevant measurements for aging. This is a common caveat for human studies and has been discussed by the authors.

      Thank you for pointing out the caveats. Indeed, most databases and datasets, including the UKBB that we use here, have missing or inaccurate entries. We discuss this in the text, and we suggest and employ strategies to mitigate these caveats. We have now updated the text to highlight these issues even further. Specifically, in the second paragraph of the “Results” section, we added the following text: “Most large human databases and datasets, including UKBB, have certain limitations, such as incomplete or missing data points. Therefore, before proceeding to modelling aging, we needed to address the following three issues:”

      (2) There is no validation dataset to verify the proposed model. The authors suggested that human biological age can be predicted with high accuracy using 12 simple physiological measurements. It will be super useful and convincing if another biobank dataset containing those 12 traits can be applied to the current model.

      Thank you for this comment. Indeed, having a replication cohort would be quite valuable. As of today, there is no comparable dataset to verify performance of the clock model or to attempt to validate GWAS results. The closest possible is the NIH-led research program “All Of Us”, which aims to collect data on 1 million people, which unfortunately is not available to for-profit companies. It is theoretically possible to rebuild a clock using only the small number of phenotypes present in both datasets, with the goal of training it on one dataset and test-applying it to another, but this won’t ultimately address the accuracy of the holistic physiological clock presented here. We hope academic labs will utilize our clock-modeling approach, apply it to datasets currently unavailable to us, and publish their findings.

      To strengthen the credentials of our biological clock, we would like to remind the reviewer that we performed 10 rounds of validation, where, in each round, 10% of the data were left out from the model training such that the clock was created using the remaining 90%. The model was subsequently tested on the 10% that was left out. Over the 10 rounds, a different 10% of the data was left out each time, and statistics for this 10-fold cross-validation are available in the supplementary materials. We have now updated the text to make this validation more apparent.

      Specifically, we added to the "Results” section, “A mathematical model to predict age” subsection, third paragraph, the following text: “Specifically, we performed 10 rounds of cross-validation, where 10% of data were held out and the remaining 90% used for training. Over 10 rounds, different 10% were held out for validation. In each case, the findings were validated in the test set. Full statistics and approach are described in supplementary computational methods.”

      The details of this cross-validation are described in the supplementary methods.
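      The hold-out scheme described above (10 rounds, 10% held out each time, error reported per round) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the single "trait" is made up, and a one-predictor least-squares fit stands in for the PLS model used in the study. The function names `fit_line` and `k_fold_rmsep` are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def k_fold_rmsep(xs, ys, k=10, seed=0):
    """Return one held-out RMSEP per fold; every sample is held out exactly once."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in held_out]
        errors.append(math.sqrt(sum(squared) / len(squared)))
    return errors


# Synthetic data (assumption): one made-up trait that declines with age, plus noise.
rng = random.Random(42)
ages = [rng.uniform(40, 70) for _ in range(1000)]
trait = [100 - 0.8 * a + rng.gauss(0, 4) for a in ages]

# Predict age from the trait and report the held-out error of each round.
rmsep_per_fold = k_fold_rmsep(trait, ages, k=10)
print([round(e, 2) for e in rmsep_per_fold])
```

      A small spread of the per-fold errors, as the authors report for their RMSEP values, is what one would expect when the model is not overfitting.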

      Additionally, we compared published GWAS results obtained for human aging clocks using modalities that were different yet relevant to human health. Specifically, we looked at GWAS for “Epigenetic Blood Age Acceleration” (Lu et al., 2018), ML-imaging-based human retinal aging clock (Ahadi et al., 2023), PhenoAgeAcceleration and BioAgeAcceleration (Kuo et al., 2021), and the ∆Age GWAS that we presented in our manuscript. We now describe the results of this comparison in our manuscript. Briefly, there is no overlap between GWAS results for any two of these published clocks built via different modalities – retina, DNA methylation, or physiological functions (between each other or with our model). However, there is a significant genetic overlap (p < 10<sup>-8</sup>) between clocks built using human phenotypic measures in a cohort of the National Health and Nutrition Examination Survey (NHANES) III in the United States (7 variables) and ∆Age from the physiological clock from UKBB that we describe here (121 variables), further validating our approach. It is interesting to consider why genetic associations for human aging built using different modalities do not appear to have common genetic correlates, something we also now discuss in our manuscript.

      Specifically, we added to the "Results” section, “Genetic loci associated with biological age” subsection, third paragraph, the following text: “Additionally, we compared our ∆Age GWAS association results with similar GWAS studies that were performed for other biological clocks. For example, (McCartney et al., 2021) used DNA methylation data on 40,000 individuals to compute biological age called GrimAge. After that they calculated an intrinsic epigenetic age acceleration (IEAA, a value similar to ∆Age, which measured a deviation of biological age from chronological age) and performed GWAS.” Additionally, we added to the “Discussion” section, “Broader implications of the model for physiological aging” subsection, fourth paragraph, the following text: “To further analyze the meaning of genetic associations with ∆Age that we described above, we compared several published GWAS results obtained for human aging clocks using different health modalities. Specifically, we looked at GWAS for “Epigenetic Blood Age Acceleration” (Lu et al., 2018), ML-imaging-based human retinal aging clock (Ahadi et al., 2023), PhenoAgeAcceleration and BioAgeAcceleration (Kuo et al., 2021), and the ∆Age GWAS we presented in our manuscript. Surprisingly, we discovered that there is no overlap between GWAS results for any two of these clocks built via different modalities – retina, DNA methylation, or physiological functions. However, there is a significant genetic overlap between clocks built using human phenotypic measures and our ∆Age model we describe. For example, the Biological Age Clock Acceleration calculated using HbA1c, Albumin, Cholesterol, FEV, Urea nitrogen, SBP, and Creatinine (Levine, 2013) in a US cohort [from National Health and Nutrition Examination Survey (NHANES)] yielded 16 significant hits in the GWAS analysis, five of which were also significant in our GWAS for UKBB based ∆Age. 
      These five common loci were close to the following genes - APOB, PIK3CG, TRIB1, SMARCA4, and APOE. The significance of this overlap is p < 10<sup>-8</sup>, suggesting that the ∆Age model we propose might be translatable to other cohorts of people.

      An interesting question to consider is why GWAS results from other clock modalities, such as DNA methylation and retinal imaging do not yield any genetic similarities to each other or to physiological and biological clocks. It is possible that these modalities of age assessment depend on completely genetically independent biological processes. For example, in a simplified manner - blood composition might be heavily weighted for DNA methylation, vascular structure for retinal scans, and muscle/bone/kidney health for physiological clocks. Data from model organisms suggest the master regulators of aging exist, and APOE is the best genetic variant known to influence human aging. Interestingly, only the biological and physiological clock models that we propose here pick it up as a hit. Alternatively, it is also possible that the true master regulators of aging rate are under stringent purifying selection; for example, due to an important role in development, and therefore, do not have genetic variability in human populations examined. As such, they could not be identified as hits in any GWAS studies.”
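      One common way to attach a p-value to an overlap between two GWAS hit lists is a hypergeometric tail probability. The sketch below is not necessarily the test the authors used. The 16 hits and 5 shared loci come from the text above; the number of independent loci in the genome and the number of ∆Age hits are hypothetical placeholders, so the printed value should not be compared with the reported significance.

```python
from math import comb


def overlap_p_value(n_loci, hits_a, hits_b, shared):
    """P(overlap >= shared) when hits_b loci are drawn at random from n_loci,
    of which hits_a are hits in the other GWAS (hypergeometric upper tail)."""
    upper = min(hits_a, hits_b)
    total = comb(n_loci, hits_b)
    tail = sum(
        comb(hits_a, k) * comb(n_loci - hits_a, hits_b - k)
        for k in range(shared, upper + 1)
    )
    return tail / total


# From the text: 16 hits in the NHANES-based clock GWAS, 5 shared with ∆Age.
# Hypothetical placeholders: 2000 independent loci, 50 ∆Age hits.
p = overlap_p_value(n_loci=2000, hits_a=16, hits_b=50, shared=5)
print(f"{p:.2e}")
```

      The result depends strongly on how loci are counted and on how many hits each study reports, so both placeholders would need to be replaced with the real values before drawing any conclusion.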

      Reviewer #2 (Public Review):

      In this manuscript, Libert et al. develop a model to predict an individual's age using physiological traits from multiple organ systems. The difference between the predicted biological age and the chronological age -- ∆Age, has an effect equivalent to that of a chronological year on Gompertz mortality risk. By conducting GWAS on ∆Age, the authors identify genetic factors that affect aging and distinguish those associated with age-related diseases. The study also uncovers environmental factors and employs dropout analysis to identify potential biomarkers and drivers for ∆Age. This research not only reveals new factors potentially affecting aging but also shows promise for evaluating therapeutics aimed at prolonging a healthy lifespan. This work represents a significant advancement in data-driven understanding of aging and provides new insights into human aging. Addressing the points raised would enhance its scientific validity and broaden its implications.

      Thank you!

      Major points:

      (1) Enhance the description and clarity of model evaluation.

      The manuscript requires additional details regarding the model's evaluation. The authors have stated "To develop a model that predicts age, we experimented with several algorithms, including simple linear regression, Gradient Boosting Machine (GBM) and Partial Least Squares regression (PLS). The outcomes of these approaches were almost identical". It is currently unclear whether the 'almost identical outcomes' mentioned refer to the similarity in top contribution phenotypes, the accuracy of age prediction, or both. To resolve this ambiguity, it would be beneficial to include specific results and comparisons from each of these models.

      Thank you for this comment. We now describe details of the model selection and provide data on outcome comparisons. Briefly, different approaches have different advantages and limitations; however, we chose one approach, and did not develop and analyze several independent models in parallel in order to not artificially inflate our False Discovery Rate (FDR). However, we now provide the rationale for, and the comparative performance of, these three approaches. Specifically, we added to the “Results” section, “A mathematical model to predict age” subsection, first paragraph, the following text: “Different approaches have different advantages and limitations; however, we decided to choose one approach, and not develop and analyze several independent models in parallel in order to not artificially inflate the False Discovery Rate (FDR). We ultimately selected PLS regression because it enabled us to determine the number and composition of components required to predict age optimally from the data, which provides additional insights into the biology of human aging. But before making this selection, we compared the performance of the three approaches. The outcomes of PLS and linear regression were almost identical (R-squared between ∆Age values derived by these two methods was 0.99, meaning that if one model were to predict an individual was 62 years old, the other model would have the same prediction). This similarity is likely due to the small number of predictors (121 phenotypes) and comparatively large number of participants (over 400,000). The correlation between GBM model outcomes and PLS (and linear regression) was slightly smaller (R-squared = 0.87). The reason for the lower correlation is likely the need for imputation in PLS and linear regression models. The GBM model tolerates missing data, whereas linear regression and PLS methods require imputation or removal of individuals with too many datapoints missing, an approach we describe in more detail below.”
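      The agreement statistic quoted above, R-squared between the ∆Age values of two models, is the squared Pearson correlation of the two per-individual ∆Age vectors. A minimal sketch follows, with synthetic ∆Age values standing in for the outputs of two models; the noise levels are arbitrary assumptions and `r_squared` is a hypothetical helper name.

```python
import random


def r_squared(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov * cov / (var_a * var_b)


# Synthetic stand-ins: model 2 reproduces model 1 up to a small disagreement.
rng = random.Random(1)
delta_age_model_1 = [rng.gauss(0, 5) for _ in range(5000)]
delta_age_model_2 = [d + rng.gauss(0, 0.5) for d in delta_age_model_1]

agreement = r_squared(delta_age_model_1, delta_age_model_2)
print(round(agreement, 3))
```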

      Additionally, after we obtained associations of ∆Age values with genetic loci, which formed the candidate base for gene targets to influence human aging (Figure 5b), we verified the top associations obtained via the PLS model in the linear and GBM models. All the top candidates that we verified had statistically significant associations in all the models of ∆Age (CST3, APOE, HLA locus, CPS1, PIK3CG, IGF1). The precise strengths of the associations were different, but that is to be expected given that the linear models had some data imputed, whereas the GBM model was built with missing values. We believe that, due to the small number of predictors (121) compared to a vastly larger number of individuals (over 400,000), the differences the three models introduced to final outcomes were quite small.

      To convey this message, we added to the "Discussion” section, “Broader implications of the model for physiological aging” subsection, 7th paragraph, the following text: “It is interesting to note that the three approaches we used to generate age prediction model (PLS, GBM, and linear regression) yielded very similar or identical results in performance. We chose to settle on one approach (PLS) to not artificially inflate the False Discovery Rate (FDR); however, we verified that the top genetic loci associations obtained via the PLS model were also obtained in the GBM and linear models. Specifically, the top candidates (CST3, APOE, HLA locus, CPS1, PIK3CG, IGF1) identified in the PLS approach had statistically significant associations in all the models of ∆Age. It is likely that due to the small number of predictors (121) compared to a vastly larger number of individuals (over 400,000), the differences that these models introduce to final outcomes are quite small, which increases our confidence in the results.”

      Furthermore, the authors mention "to test for overfitting, a PLS model had been generated on randomly selected 90% of individuals and tested on the remaining 10% with similar results". To comprehensively assess the model's performance, it is crucial to provide detailed results for both the test and validation datasets. This should at least include metrics such as correlation coefficients and mean squared error for both training and test datasets.

      Thank you for bringing up this point. The cross-validation procedure and its statistics are described in detail in the supplementary computational methods. Briefly, across 10 rounds of validation, the Root Mean Square Error of Prediction (RMSEP) did not exceed 4.81 for females when all 9 PLS components were considered, and the RMSEP for males was 5.1 when all 11 components were considered. The variation of RMSEP between different datasets was less than 0.1. We have now updated the text to make this validation more apparent. Specifically, we added to the “Results” section, “A mathematical model to predict age” subsection, third paragraph, the following text: “Specifically, we performed 10 rounds of cross-validation, where 10% of data were held out and the remaining 90% used for training. Over 10 rounds, different 10% were held out for validation. In each case, the findings were validated in the test set. Full statistics and approach are described in supplementary computational methods.”

      (2) External validation and generalization of results

      To enhance the robustness and generalizability of the study's findings, it is crucial to perform external validation using an independent population. Specifically, conducting validation with the participants of the 'All of Us' research program offers a unique opportunity. This diverse and extensive cohort, distinct from the initial study group, will serve as an independent validation set, providing insights into the applicability of the study's conclusions across varied demographics.

      Thank you for this comment. As we mentioned above, we agree that having a replication cohort would be very valuable for this study, as well as for many other studies that stem from the UKBB dataset. However, there is as yet no comparable dataset to verify performance of the clock or to attempt to validate GWAS results. The closest possible is the NIH-led research program “All Of Us”, which aims to collect data on 1 million people, which unfortunately is not available to for-profit companies. It is theoretically possible to rebuild a clock using only the small number of phenotypes present in both datasets, with the goal of training it on one dataset and test-applying it to another, but that approach would not ultimately be informative about the accuracy of the complete physiological clock presented here. We hope academic labs will utilize our clock approach, apply it to datasets currently unavailable to us, and publish their findings. For the detailed response on this issue, please see the response to the second comment of the first reviewer above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific questions/suggestions:

      - It looks like the ages of participants are enriched around 60 years (Fig. 1, Fig 3b). Can authors clarify whether age distribution affects the correlation tests (e.g. correlation in Fig 2)?

      Indeed, the distribution of people by age is enriched for 60–65-year-olds and is depleted at younger and older ages. Such a distribution influences the uncertainty of the correlations that we compute, with error bars being larger for 40- and 70-year-olds and smaller for 50- and 60-year-olds. An example of this can be seen in Figure 1F. Figures 2a,b,g,h mostly deal with the correlation of phenotypes with each other and thus are not influenced by age. For other computations, such as age prediction, it is theoretically possible that if age determinants among 65-year-olds differ from those for 40- or 80-year-olds, the calculated contributions would be skewed to increase accuracy in the middle of the distribution at the expense of the ends. ∆Age, however, was explicitly normalized for each age cohort (Fig. 3a) to avoid “birth cohort” bias, therefore minimizing the effect of uneven distribution on further analysis, such as GWAS. We now acknowledge and describe this feature of the UKBB dataset in the first paragraph of the “Results” section.
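      The per-cohort normalization mentioned above can be illustrated with a small sketch. The response does not spell out the exact procedure, so this assumes the simplest form: ∆Age is computed as predicted minus chronological age and then mean-centred within each chronological-age year. The toy numbers and the name `center_by_age` are hypothetical.

```python
from collections import defaultdict


def center_by_age(chronological_ages, predicted_ages):
    """Return ∆Age = predicted - chronological, mean-centred within each age year."""
    raw = [p - c for c, p in zip(chronological_ages, predicted_ages)]
    by_age = defaultdict(list)
    for age, delta in zip(chronological_ages, raw):
        by_age[age].append(delta)
    cohort_mean = {age: sum(v) / len(v) for age, v in by_age.items()}
    return [delta - cohort_mean[age] for age, delta in zip(chronological_ages, raw)]


# Toy cohort: the youngest are predicted too old and the oldest too young,
# the usual pull toward the middle of an unevenly populated age range.
ages = [40, 40, 60, 60, 60, 70, 70]
predicted = [45, 47, 59, 61, 63, 64, 66]

delta_age = center_by_age(ages, predicted)
print(delta_age)
```

      After centring, the mean ∆Age is zero in every age year, so ∆Age no longer carries a systematic dependence on chronological age into downstream analyses such as GWAS.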

      - Phenotypic variation usually increases during aging. However, the authors showed that delta-age and age are not correlated (Figure 3a), suggesting that biological variation does not increase during aging in their analysis. Can authors provide more evidence supporting their findings? Is this phenomenon affected by their normalization method?

      Thank you for this comment. We find that there is no strict rule for how phenotypic variation changes with age. Certain phenotypes, such as blood pressure (Fig. 1a) or SHBG (Fig. 1d), indeed increase in variation with advanced age; however, many others, such as grip strength (Fig. 1b) and BMI, do not change in variation, and certain phenotypes even decrease their variation with age. As we stated above, in order to minimize the possible effect of “birth cohort” bias on subsequent analysis, as well as the uneven distribution of people across ages, ∆Age was normalized per age cohort. Additionally, purifying selection likely also limits how far most physiological factors can deviate. For example, people with too high or too low blood pressure would simply perish, which would limit a continuous increase in variation.

      - Authors correlate GWAS data with delta-age (Figure 4). It would be important to show whether the delta-age from young and old participants correlates with GWAS patterns in a similar manner. If not, the authors have to consider how age differences affect delta-age and the GWAS correlation. For example, the authors mentioned that APOE genotype influences age-delta even in the 40-year-old group (Figure 4f). If the APOE genotype already shows high delta-age in the 40-year-old group, how does aging affect the delta-age distribution?

      Thank you for this comment. It is an interesting question to understand how age influences GWAS hits identified through ∆Age. At the same time, one must remember that our dataset is cross-sectional in nature and “different age” in reality is a subset of different people, who lived in different times with different exposures to environments and different standards of medical care (which are evolving over time). We specifically attempted to factor age and this “cohort effect” out of our analysis and presented Figure 4f simply as an illustration that APOE variants seem to influence human aging at any age, which challenges the theory proposed by previous studies that APOE is implicated in aging simply because APOE4 carriers likely die from Alzheimer’s disease and are thus excluded from the oldest cohorts. To investigate the question raised by the reviewer, it is possible to do GWAS on age; however, one must keep in mind the limitations associated with interpreting those results, as “age” in reality (in this cross-sectional cohort) also represents changes in population composition, changes in the environment, food quality, early life care, medical care, social habits, and other parameters associated with changing society.

      - For the discussion part, it would be great if the authors could add one section to provide guidelines for future human and lab animal studies based on observations from the current study. For example, what physiological traits are most useful, and what can be further added when collecting human data?

      Thank you for the great suggestion. We now propose and discuss certain experiments that can be performed in humans and animals to better differentiate between drivers and markers of aging.

      - In line 479, I found the statement "It is possible that synapse function accounts for the association of computer gaming with ΔAge" came from nowhere, and suggest removing it.

      Done—thank you.

      - Minor. Line 155. Is it a wrong citation of table S2c, 2d as there are only 2a and 2b?

      Thank you, corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) Between lines 300-305, there is a missing reference to Figure 3e.

      Thank you, corrected.

      (2) For Figures 4a and 4c, please add the lambda statistic to the QQ plots.

      Thank you, we have added lambda inflation factors to the QQ plots.
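      For readers unfamiliar with the statistic, the genomic inflation factor lambda is conventionally the median of the observed 1-degree-of-freedom chi-square statistics divided by the median expected under the null (about 0.455). The sketch below converts p-values back to chi-square statistics using only the Python standard library. It illustrates the usual definition and is not the authors' pipeline; `genomic_inflation` is a hypothetical name and the p-values are synthetic.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.25) ** 2


def genomic_inflation(p_values):
    """Lambda = median observed 1-df chi-square statistic / expected median.

    Each two-sided p-value p (0 < p <= 1) maps to the chi-square statistic z**2,
    where z is the standard-normal quantile at p / 2.
    """
    std_normal = NormalDist()
    chi2 = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a well-calibrated test, so lambda is close to 1.
n = 1001
uniform_p = [(i + 0.5) / n for i in range(n)]
lambda_null = genomic_inflation(uniform_p)

# Systematically smaller p-values (here p squared) give lambda above 1.
lambda_inflated = genomic_inflation([p ** 2 for p in uniform_p])
print(round(lambda_null, 3), round(lambda_inflated, 3))
```

      Values of lambda well above 1 on a QQ plot usually prompt a check for population stratification or cryptic relatedness, although highly polygenic traits in large cohorts can also raise it.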

      (3) In line 384, the p-value cut-off is mentioned as 10-9. However, this does not seem to be consistently represented in Figures 4b and 4d, where the gray lines do not align with this threshold. Please adjust these figures to accurately reflect the mentioned p-value cut-off.

      Thank you, corrected.

      (4) Clarification for Figure 5a. Add titles and correlation coefficients to Figure 5a to clearly define what the clusters represent. Please also add a discussion to explain why the cluster 10 (general health) dropout model can affect ∆Age compared to the full model, with some individuals showing a 5-year difference. Furthermore, despite the substantial effect of removing cluster 10 on ΔAge, all the top loci remain unchanged in terms of effect sizes and p-values compared to the full model.

      We have added the titles and correlation coefficients to Figure 5a. Thank you for these suggestions; they make the presentation of data much clearer. It is an interesting observation that whereas dropping out cluster 10 resulted in quite significant changes of ∆Age distribution, the genetic signature as determined by GWAS did not change much. The most obvious explanation is that many parameters in this category are influenced by environment more than by genetics; therefore, the genetic signature did not change much after the cluster removal. We now mention this observation in the text. Specifically, in the subsection “Cluster-dropout analysis enriches for GWAS hits that influence aging globally”, we added the following text: “Another interesting observation is that degree by which certain cluster contributes to the model does not necessarily correlate with how much this cluster contributes to genetic signature of human aging. For example, while dropping out cluster 10 (General Health) resulted in quite significant changes of ∆Age distribution (R<sup>2</sup>=0.88), the genetic signature as determined by GWAS did not change substantially. The most likely explanation is that many parameters in this category are influenced by environment more strongly than by genetics; for example, not as much as caused by cluster 1 (muscle-related) removal.”

      (5) Discussion on drivers and markers. Given the theoretical nature of the study, it would be beneficial to propose potential experimental validations for your findings. Even if these validations have not been performed, suggesting them would greatly enhance the value of the discussion.

      Thank you, it is a great idea. We now propose and discuss certain experiments that can be performed in humans and animals to better differentiate between drivers and markers of aging. Specifically, in the subsection “Cluster-dropout analysis enriches for GWAS hits that influence aging globally”, we added the following text: “To definitively distinguish whether a gene is a driver or a marker of aging, an experiment would need to be performed. It is possible that certain gene activities are influenced by existing FDA-approved medications, and retrospective analyses of human cohorts who take certain medications can be performed. More likely, however, an animal model would need to be employed, where animals with candidate genes modified via genetic means are investigated for lifespan and onset and progression of age-associated conditions. For example, one can engineer a mouse with a conditional allele of Cystatin-C and evaluate how changes in dosage of this protein influence various phenotypes of aging.”

    1. eLife Assessment

      This potentially useful study introduces an orthogonal approach for detecting RNA modification, without chemical modification of RNA, which often results in RNA degradation and therefore loss of information. Compared to previous versions, the most recent one is improved and sufficiently aligned with the standards of the field to merit consideration by the research community, making the evidence solid according to said standards. Nevertheless, uncertainty regarding false positive and false negative rates remains, as it does for some of the alternative approaches. With more rigorous validation, the approach might be of particular interest for sites in RNA molecules where modifications are rare.

    2. Reviewer #2 (Public review):

      The fledgling field of epitranscriptomics has encountered various technical roadblocks with implications as to the validity of early epitranscriptomics mapping data. As a prime example, the low specificity of (supposedly) modification-specific antibodies for the enrichment of modified RNAs, has been ignored for quite some time and is only now recognized for its dismal reproducibility (between different labs), which necessitates the development of alternative methods for modification detection. Furthermore, early attempts to map individual epitranscriptomes using sequencing-based techniques are largely characterized by the deliberate avoidance of orthogonal approaches aimed at confirming the existence of RNA modifications that have been originally identified.

      Improved methodology, the inclusion of various controls, and better mapping algorithms as well as the application of robust statistics for the identification of false-positive RNA modification calls have allowed revisiting original (seminal) publications whose early mapping data allowed making hyperbolic claims about the number, localization and importance of RNA modifications, especially in mRNA. Besides the existence of m6A in mRNA, the detectable incidence of RNA modifications in mRNAs has drastically dropped.

      As for m5C, the subject of the manuscript submitted by Zhou et al., its identification in mRNA goes back to Squires et al., 2012 reporting on >10,000 sites in mRNA of a human cancer cell line, followed by intermittent findings reporting on pretty much every number between 0 and >100,000 m5C sites in different human cell-derived mRNA transcriptomes. The reason for such discrepancy is most likely of a technical nature. Importantly, all studies reporting on actual transcript numbers that were m5C-modified relied on RNA bisulfite sequencing, an NGS-based method that can discriminate between methylated and non-methylated Cs after chemical deamination of C but not m5C. RNA bisulfite sequencing has a notoriously high background due to deamination artifacts, which occur largely due to incomplete denaturation of double-stranded regions (denaturing-resistant) of RNA molecules. Furthermore, m5C sites in mRNAs have now been mapped to regions that have not only sequence identity but also structural features of tRNAs. Various studies revealed that the highly conserved m5C RNA methyltransferases NSUN2 and NSUN6 do not only accept tRNAs but also other RNAs (including mRNAs) as methylation substrates, which in combination account for most of the RNA bisulfite-mapped m5C sites in human mRNA transcriptomes. Is m5C in mRNA only a result of the star activity of tRNA or rRNA modification enzymes, or is their low stoichiometry biologically relevant?

      In light of the shortcomings of existing tools to robustly determine m5C in transcriptomes, other methods, like DRAM-seq, aiming to map m5C independently of ex situ RNA treatment with chemicals, are needed to arrive at a more solid "ground state", from which it will be possible to state and test various hypotheses as to the biological function of m5C, especially in lowly abundant RNAs such as mRNA.

      Importantly, the identification of >10,000 sites containing m5C through DRAM-Seq increases the number of potential m5C marks in human cancer cells from a couple of hundred (after rigorous post-hoc analysis of RNA bisulfite sequencing data) by orders of magnitude. This raises the question of whether the application of these editing tools results in editing artefacts that overstate the number of actual m5C sites in the human cancer transcriptome.

      [Editors' note: earlier reviews have been provided here: https://doi.org/10.7554/eLife.98166.3.sa1; https://doi.org/10.7554/eLife.98166.2.sa1; https://doi.org/10.7554/eLife.98166.1.sa1]

    3. Author response:

      The following is the authors’ response to the original reviews.

      Responses to Reviewer’s Comments:  

      To Reviewer #2:

      (1) The use of two m<sup>5</sup>C reader proteins is likely a reason for the high number of edits introduced by the DRAM-Seq method. Both ALYREF and YBX1 are ubiquitous proteins with multiple roles in RNA metabolism including splicing and mRNA export. It is reasonable to assume that both ALYREF and YBX1 bind to many mRNAs that do not contain m<sup>5</sup>C. 

      To substantiate the author's claim that ALYREF or YBX1 binds m<sup>5</sup>C-modified RNAs to an extent that would allow distinguishing its binding to non-modified RNAs from binding to m<sup>5</sup>C-modified RNAs, it would be recommended to provide data on the affinity of these, supposedly proven, m<sup>5</sup>C readers to non-modified versus m<sup>5</sup>C-modified RNAs. To do so, this reviewer suggests performing experiments as described in Slama et al., 2020 (doi: 10.1016/j.ymeth.2018.10.020). However, using dot blots like in so many published studies to show modification of a specific antibody or protein binding, is insufficient as an argument because no antibody, nor protein, encounters nanograms to micrograms of a specific RNA identity in a cell. This issue remains a major caveat in all studies using so-called RNA modification reader proteins as bait for detecting RNA modifications in epitranscriptomics research. It becomes a pertinent problem if used as a platform for base editing similar to the work presented in this manuscript.

      The authors have tried to address the point made by this reviewer. However, rather than performing an experiment with recombinant ALYREF-fusions and m<sup>5</sup>C-modified versus unmodified RNA oligos for testing the enrichment factor of ALYREF in vitro, the authors resorted to citing two manuscripts. One manuscript is cited by everybody when it comes to ALYREF as m<sup>5</sup>C reader; however, none of the experiments have been repeated by another laboratory. The other manuscript reports on YBX1 binding to m<sup>5</sup>C-containing RNA and mentions PAR-CLIP experiments with ALYREF, the details of which are nowhere to be found in doi: 10.1038/s41556-019-0361-y.

      Furthermore, the authors have added RNA pull-down assays that should substitute for the requested experiments. Interestingly, Figure S1E shows that ALYREF binds equally well to unmodified and m<sup>5</sup>C-modified RNA oligos, which contradicts doi:10.1038/cr.2017.55 and supports the conclusion that wild-type ALYREF is not a specific m<sup>5</sup>C binder. The necessity of always including overexpression of ALYREF-mut in parallel DRAM experiments makes the developed method better controlled but not easy to handle (expression differences of the plasmid-driven proteins, etc.).

      Thank you for pointing this out. First, we would like to correct our previous response: the binding ability of ALYREF to m<sup>5</sup>C-modified RNA was initially reported in doi: 10.1038/cr.2017.55 (and not in doi: 10.1038/s41556-019-0361-y), where it was observed through PAR-CLIP analysis that the K171 mutation weakens its binding affinity to m<sup>5</sup>C-modified RNA.

      Our previous experimental approach was not optimal: the protein concentration in the INPUT group was too high, leading to overexposure in the experimental group. Additionally, we did not conduct a quantitative analysis of the results at that time. In response to your suggestion, we performed RNA pull-down experiments with YBX1 and ALYREF, rather than with the pan-DRAM protein, to better validate and reproduce the previously reported findings. Our quantitative analysis revealed that both ALYREF and YBX1 exhibit a stronger affinity for m<sup>5</sup>C-modified RNAs. Furthermore, mutating the key amino acids involved in m<sup>5</sup>C recognition significantly reduced the binding affinity of both readers. These results align with previous studies (doi: 10.1038/cr.2017.55 and doi: 10.1038/s41556-019-0361-y), confirming that ALYREF and YBX1 are specific readers of m<sup>5</sup>C-modified RNAs. However, our detection system has certain limitations. Despite mutation of the critical amino acids, both readers retained a weak binding affinity for m<sup>5</sup>C, suggesting that while the mutation helps reduce false positives, it is still challenging to precisely map the distribution of m<sup>5</sup>C modifications. To address this, we plan to further investigate the protein structure and function to obtain more accurate m<sup>5</sup>C sequencing of the transcriptome in future studies. Accordingly, we have updated our results and conclusions in lines 294-299 and discuss these limitations in lines 109-114.

      In addition, while the m<sup>5</sup>C assay can be performed using the DRAM system alone, comparing it with the DRAM<sup>mut</sup> control enhances the accuracy of m<sup>5</sup>C region detection. To minimize variations in transfection efficiency across experimental groups, it is recommended to use the same batch of transfections. This approach not only ensures more consistent results but also improves the standardization of the DRAM assay, as discussed in the section added in lines 308-312.

      (2) Using sodium arsenite treatment of cells as a means to change the m<sup>5</sup>C status of transcripts through the downregulation of the two major m<sup>5</sup>C writer proteins NSUN2 and NSUN6 is problematic and the conclusions from these experiments are not warranted. Sodium arsenite is a chemical that poisons every protein containing thiol groups. Not only do NSUN proteins contain cysteines but also the base editor fusion proteins. Arsenite will inactivate these proteins, hence the editing frequency will drop, as observed in the experiments shown in Figure 5, which the authors explain with fewer m<sup>5</sup>C sites to be detected by the fusion proteins.

      The authors have not addressed the point made by this reviewer. Instead the authors state that they have not addressed that possibility. They claim that they have revised the results section, but this reviewer can only see the point raised in the conclusions. An experiment would have been to purify base editors via the HA tag and then perform some kind of binding/editing assay in vitro before and after arsenite treatment of cells.

      We appreciate the reviewer’s insightful comment. We fully agree with the concern raised. In the original manuscript, our intention was to use sodium arsenite treatment to downregulate NSUN-mediated m<sup>5</sup>C levels and subsequently decrease DRAM editing efficiency, with the aim of monitoring m<sup>5</sup>C dynamics through the DRAM system. However, as the reviewer pointed out, sodium arsenite may inactivate both NSUN proteins and the base editor fusion proteins, and any such inactivation would likely result in reduced DRAM editing.

      This confounds the interpretation of our experimental data.

      As demonstrated in Author response image 1A, western blot analysis confirmed that sodium arsenite indeed decreased the expression of fusion proteins. In addition, we attempted in vitro fusion protein purification using multiple fusion tags (HIS, GST, HA, MBP) for DRAM fusion protein expression, but unfortunately, we were unable to obtain purified proteins. However, using the Promega TNT T7 Rapid Coupled In Vitro Transcription/Translation Kit, we successfully purified the DRAM protein (Author response image 1B). Despite this success, subsequent in vitro deamination experiments did not yield the expected mutation results (Author response image 1C), indicating that further optimization is required. This issue is further discussed in lines 314-315.

      Taken together, the above evidence indicates that the sodium arsenite experiment was confounded, and we have decided to remove the corresponding results from the main text of the revised manuscript.

      Author response image 1.

      (3) The authors should move high-confidence editing site data contained in Supplementary Tables 2 and 3 into one of the main Figures to substantiate what is discussed in Figure 4A. However, the data needs to be visualized in another way than Excel format. Furthermore, Supplementary Table 2 does not contain a description of the columns, while Supplementary Table 3 contains a single row with letters and numbers.

      The authors have not addressed the point made by this reviewer. Figure 3F shows the screening process for DRAM-seq assays and principles for screening high-confidence genes rather than the data contained in Supplementary Tables 2 and 3 of the former version of this manuscript.

      Thank you for your valuable suggestion. We have visualized the data from Supplementary Tables 2 and 3 in Figure 4A as a circlize diagram (described in lines 213-216), illustrating the distribution of mutation sites detected by the DRAM system across each chromosome. Additionally, to improve the presentation and clarity of the data, we have revised Supplementary Tables 2 and 3 by adding column descriptions, merging the DRAM-ABE and DRAM-CBE sites, and including overlapping m<sup>5</sup>C genes from previous datasets.

      Responses to Reviewer’s Comments:  

      To Reviewer #3:

      The authors have again tried to address the former concern by this reviewer who questioned the specificity of both m<sup>5</sup>C reader proteins towards modified RNA rather than unmodified RNA. The authors chose to do RNA pull-down experiments, which serve as a proxy for proving the specificity of ALYREF and YBX1 for m<sup>5</sup>C-modified RNAs. Even though this reviewer asked for determining the enrichment factor of the reader-base editor fusion proteins (as wildtype or mutant for the identified m<sup>5</sup>C specificity motif) when presented with m<sup>5</sup>C-modified RNAs, the authors chose to use both reader proteins alone (without the fusion to an editor) as wildtype and as respective m<sup>5</sup>C-binding mutant in RNA in vitro pull-down experiments along with unmodified and m<sup>5</sup>C-modified RNA oligomers as binding substrates. The quantification of these pull-down experiments (n=2) has now been added and reveals that (according to SFigure 1 E and G) YBX1 enriches an RNA containing a single m<sup>5</sup>C by a factor of 1.3 over its unmodified counterpart, while ALYREF enriches by a factor of 4. This is an acceptable approach for educated readers to question the specificity of the reader proteins, even though the quantification should be performed differently (see below).

      Given that there is no specific sequence motif embedding those cytosines identified in the vicinity of the DRAM-edits (Figure 3J and K), even though it has been accepted by now that most of the m<sup>5</sup>C sites in mRNA are mediated by NSUN2 and NSUN6 proteins, which target tRNA-like substrate structures with a particular sequence enrichment, one can conclude that DRAM-Seq is uncovering a huge number of false positives. This must be so not only because of the RNA bisulfite seq data that have been extensively studied by others, but also by the following calculations: Given that the m<sup>5</sup>C/C ratio in human mRNA is 0.02-0.09% (measured by mass spec) and assuming that 1/4 of the nucleotides in an average mRNA are cytosines, an mRNA of 1,000 nucleotides would contain 250 Cs. 0.02-0.09% m<sup>5</sup>C/C would then translate into 0.05-0.225 methylated cytosines per 250 Cs in a 1,000 nt mRNA. YBX1 would bind every C in such an mRNA since there is no m<sup>5</sup>C to be expected, which it could bind with 1.3x higher affinity. Even if the mRNAs were 10,000 nt long, YBX1 would bind to half a methylated cytosine or 2.25 methylated cytosines with 1.3x higher affinity than to all the remaining cytosines (2,499.5 to 2,497.75 of 2,500 cytosines in 10,000 nt, respectively). These numbers indicate a 4999x to 1110x excess of cytosine over m<sup>5</sup>C in any substrate RNA, which the "reader" can bind as shown in the RNA pull-downs on unmodified RNAs. This reviewer spares the reader of this review the calculations for ALYREF specificity, which is slightly higher than that of YBX1. Hence, it is up to the capable reader of these calculations to follow the claim that this minor affinity difference allows the unambiguous detection of the few m<sup>5</sup>C sites in mRNA, be it in the endogenous scenario of a cell or as a fusion protein with a base editor attached.
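
      The arithmetic in the preceding comment can be reproduced with a short sketch. It is illustrative only and belongs to neither the review nor the authors' analysis; the function names are hypothetical, and the 25% cytosine content is the reviewer's stated assumption.

```python
# Back-of-the-envelope check of the m5C abundance estimate quoted above.
# Assumption (from the review): one quarter of all nucleotides are cytosines.

def expected_m5c(length_nt, m5c_per_c, c_fraction=0.25):
    """Return (total cytosines, expected methylated cytosines) for one mRNA."""
    cytosines = length_nt * c_fraction
    return cytosines, cytosines * m5c_per_c


def excess_unmodified(length_nt, m5c_per_c, c_fraction=0.25):
    """Ratio of unmodified to methylated cytosines in one mRNA."""
    cytosines, methylated = expected_m5c(length_nt, m5c_per_c, c_fraction)
    return (cytosines - methylated) / methylated


if __name__ == "__main__":
    for ratio in (0.0002, 0.0009):        # 0.02% and 0.09% m5C/C
        for length in (1_000, 10_000):    # mRNA length in nucleotides
            total, methylated = expected_m5c(length, ratio)
            print(length, ratio, total, round(methylated, 3),
                  round(excess_unmodified(length, ratio)))
```

      The excess of unmodified over methylated cytosines depends only on the m<sup>5</sup>C/C ratio, not on transcript length, which is why the same two values (about 4999 and 1110) result for any mRNA.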

      We sincerely appreciate the reviewer’s rigorous analysis. We would like to clarify that in our RNA pulldown assays, we indeed utilized the full DRAM system (reader protein fused to the base editor) to reflect the specificity of m<sup>5</sup>C recognition. As previously suggested by the reviewer, to independently validate the m<sup>5</sup>C-binding specificity of ALYREF and YBX1, we performed separate pulldown experiments with wild-type and mutant reader proteins (without the base editor fusion) using both unmodified and m<sup>5</sup>C-modified RNA substrates. This approach aligns with established methodologies in the field (doi:10.1038/cr.2017.55 and doi: 10.1038/s41556-019-0361-y). We have revised the Methods section (line 230) to explicitly describe this experimental design.

      Although the m<sup>5</sup>C/C ratios in LC/MS-assayed mRNA are relatively low (ranging from 0.02% to 0.09%), as noted by the reviewer, both our data and previous studies have demonstrated that ALYREF and YBX1 preferentially bind to m<sup>5</sup>C-modified RNAs over unmodified RNAs, exhibiting 4-fold and 1.3-fold enrichment, respectively (Supplementary Figure 1E–1G). Importantly, this specificity is further enhanced in the DRAM system through two key mechanisms: first, the fusion of reader proteins to the deaminase restricts editing to regions near m<sup>5</sup>C sites, thereby minimizing off-target effects; second, background editing observed in reader-mutant or deaminase controls (e.g., DRAM<sup>mut</sup>-CBE in Figure 2D) is systematically corrected for during data analysis.

      We agree that the vast excess of unmodified cytosines poses a theoretical challenge. However, our approach includes stringent controls to alleviate this issue. Specifically, sites identified in NSUN2/NSUN6 knockout cells or reader-mutant controls are excluded (Figure 3F), which significantly reduces the number of false-positive detections. Additionally, we have observed deamination changes near high-confidence m<sup>5</sup>C methylation sites detected by RNA bisulfite sequencing, both in first-generation and high-throughput sequencing data. This observation further substantiates the validity of DRAM-Seq in accurately identifying m<sup>5</sup>C sites.

      We fully acknowledge that residual false positives may persist due to the inherent limitations of reader protein specificity, as discussed in lines 299-301 of our manuscript. To address this, we plan to optimize reader domains for enhanced m<sup>5</sup>C binding (e.g., through structure-guided engineering), as already noted in the Discussion of the manuscript.

      The reviewer supports the attempt to visualize the data. However, the usefulness of this Figure addition as a readable presentation of the data included in the supplement is up to debate.

      Thank you for your kind suggestion. We understand the reviewer's concern regarding data visualization. However, due to the large volume of DRAM-seq data, it is challenging to present each mutation site and its characteristics clearly in a single figure. Therefore, we chose to categorize the data by chromosome, which not only allows for a more organized presentation of the DRAM-seq data but also facilitates comparison with other database entries. Additionally, we have updated Supplementary Tables 2 and 3 to provide comprehensive information on the mutation sites. We hope that both the reviewer and editors will understand this approach. We will, of course, continue to carefully consider the reviewer's suggestions and explore better ways to present these results in the future.

      (3) A set of private Recommendations for the Authors that outline how you think the science and its presentation could be strengthened

      NEW COMMENTS to TEXT:

      Abstract:

      "5-Methylcytosine (m<sup>5</sup>C) is one of the major post-transcriptional modifications in mRNA and is highly involved in the pathogenesis of various diseases."

      In light of the increasing use of AI-based writing, and the proof that neither DeepSeek nor ChatGPT writes truthful statements if they collect metadata from scientific abstracts, this sentence is utterly misleading.

      m<sup>5</sup>C is not one of the major post-transcriptional modifications in mRNA as it is only present with a m<sup>5</sup>C/C ratio of 0.02- 0.09% as measured by mass-spec. Also, if m<sup>5</sup>C is involved in the pathogenesis of various diseases, it is not through mRNA but tRNA. No single published work has shown that a single m<sup>5</sup>C on an mRNA has anything to do with disease. Every conclusion that is perpetuated by copying the false statements given in the many reviews on the subject is based on knock-out phenotypes of the involved writer proteins. This reviewer wishes that the authors would abstain from the common practice that is currently flooding any scientific field through relentless repetitions in the increasing volume of literature which perpetuate alternative facts.

      We sincerely appreciate the reviewer’s insightful comments. While we acknowledge that m<sup>5</sup>C is not the most abundant post-transcriptional modification in mRNA, we believe that research into m<sup>5</sup>C modification holds considerable value. Numerous studies have highlighted its role in regulating gene expression and its potential contribution to disease progression. For example, recent publications have demonstrated that m<sup>5</sup>C modifications in mRNA can influence cancer progression, lipid metabolism, and other pathological processes (e.g., PMID: 37845385; 39013911; 39924557; 38042059; 37870216).

      We fully agree with the reviewer on the importance of maintaining scientific rigor in academic writing. While m<sup>5</sup>C is not the most abundant RNA modification, we cannot simply draw a conclusion that the level of modification should be the sole criterion for assessing its biological significance. However, to avoid potential confusion, we have removed the word “major”.

      COMMENTS ON FIGURE PRESENTATION:

      Figure 2D:

      The main text states: "DRAM-CBE induced C to U editing in the vicinity of the m<sup>5</sup>C site in AP5Z1 mRNA, with 13.6% C-to-U editing, while this effect was significantly reduced with APOBEC1 or DRAM<sup>mut</sup>-CBE (Fig.2D)." The Figure does not fit this statement. The seq trace shows a U signal of about 1/3 of that of C (about 30%), while the quantification shows 20+ percent

      Thank you for your kind suggestion. Upon visual evaluation, the sequencing trace in the figure appears to suggest a mutation rate closer to 30% rather than 22%. However, relying solely on the visual interpretation of sequencing peaks is not a rigorous approach. The trace on the left represents the visualization of Sanger sequencing results using SnapGene, while the quantification on the right is derived from EditR 1.0.10 software analysis of three independent biological replicates. The C-to-U mutation rates calculated were 22.91667%, 23.23232%, and 21.05263%, respectively. To further validate this, we have included the original EditR analysis of the Sanger sequencing results for the DRAM-CBE group used in the left panel of Figure 2D (see Author response image 2). This analysis confirms an m<sup>5</sup>C fraction (%) of 22/(22+74) = 22.91667, and the sequencing trace aligns well with the mutation rate we reported in Figure 2D. In conclusion, the data and conclusions presented in Figure 2D are consistent and supported by the quantitative analysis.
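
      The quoted percentages can be checked with a minimal sketch. It assumes that 22 and 74 are the signal values assigned to the edited and unedited base at the site in question, as the expression 22/(22+74) suggests; the function name is hypothetical and this is not EditR's own code.

```python
from statistics import mean


def edit_fraction_percent(edited_signal, unedited_signal):
    """Percentage of total signal at one position that comes from the edited base."""
    return 100.0 * edited_signal / (edited_signal + unedited_signal)


# Single trace shown in Author response image 2 (values quoted in the response).
single_trace = edit_fraction_percent(22, 74)

# Three biological replicates as reported in the response.
replicates = [22.91667, 23.23232, 21.05263]

if __name__ == "__main__":
    print(round(single_trace, 5))
    print(round(mean(replicates), 5))
```

      The single-trace value matches the first replicate, and the replicate mean lies near 22.4%, consistent with the "20+ percent" the reviewer read from the bar chart.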

      Author response image 2.

      Figure 4B: shows now different numbers in Venn-diagrams than in the same depiction, formerly Figure 4A

      We sincerely thank the reviewer for pointing out this issue, and we apologize for not clearly indicating the changes in the previous version of the manuscript. In response to the initial round of reviewer comments, we implemented a more stringent data filtering process (as described in Figure 3F and the Methods section): "For high-confidence filtering, we further adjusted the parameters of Find_edit_site.pl to include an edit ratio of 10%–60%, a requirement that the edit ratio in control samples be at least 2-fold higher than in NSUN2- or NSUN6-knockout samples, and at least 4 editing events at a given site." As a result, we made minor adjustments to the Venn diagram data in Figure 4A, reducing the total number of DRAM-edited mRNAs from 11,977 to 10,835. These changes were consistently applied throughout the manuscript, and the modifications have been highlighted for clarity. Importantly, these adjustments do not affect any of the conclusions presented in the manuscript.
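
      The three filtering criteria quoted above can be expressed as a single predicate. The sketch below is a hypothetical Python restatement of those criteria, not the authors' Find_edit_site.pl script; the function and parameter names are assumptions, and the handling of boundary values (inclusive thresholds) is a guess.

```python
def is_high_confidence(control_ratio, knockout_ratio, n_edit_events,
                       min_ratio=0.10, max_ratio=0.60,
                       min_fold=2.0, min_events=4):
    """Apply the three high-confidence criteria to one candidate site.

    control_ratio   edit ratio at the site in control cells (0 to 1)
    knockout_ratio  edit ratio at the same site in NSUN2- or NSUN6-knockout cells
    n_edit_events   number of editing events observed at the site
    """
    # Criterion 1: edit ratio between 10% and 60%.
    if not (min_ratio <= control_ratio <= max_ratio):
        return False
    # Criterion 2: at least 4 editing events at the site.
    if n_edit_events < min_events:
        return False
    # Criterion 3: control edit ratio at least 2-fold that of the knockout.
    return control_ratio >= min_fold * knockout_ratio


if __name__ == "__main__":
    print(is_high_confidence(0.25, 0.05, 6))   # passes all three criteria
    print(is_high_confidence(0.25, 0.20, 6))   # fails the 2-fold criterion
```

      Writing the fold-change test as a multiplication avoids a division by zero when no editing is detected in the knockout sample.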

      Figure 4B and D: while the overlap of the DRAM-Seq data with RNA bisulfite data might be 80% or 92%, it is obvious that the remaining DRAM-Seq data suggest the detection of additional sites amounting to around 97% or 81.83%. It would be advisable to mention this large number of additional sites as potential false positives, unless these data were normalized to the sites that can be allocated to NSUN2 and NSUN6 activity (NSUN mutant data sets could be subtracted).

      Thank you for pointing this out. The Venn diagrams presented in Figure 4B and D already reflect the exclusion of potential false-positive sites identified in methyltransferase-deficient datasets, as described in our experimental filtering process, and they represent the remaining sites after this stringent filtering. However, we acknowledge that YBX1 and ALYREF, while preferentially binding to m<sup>5</sup>C-modified RNA, also exhibit some affinity for unmodified RNA. Although we employed rigorous controls, including DRAM<sup>mut</sup> and deaminase groups, to minimize false positives, the possibility of residual false positives cannot be entirely ruled out. Addressing this limitation would require even more stringent filtering methods, as discussed in lines 299–301 of the manuscript. We are committed to further optimizing the DRAM system to enhance the accuracy of transcriptome-wide m<sup>5</sup>C analysis in future studies.

      SFigure 1: It is clear that the wild-type versions of both reader proteins bind robustly to RNA that does not contain m<sup>5</sup>C. As for the calculations of x-fold affinity loss of RNA binding using both ALYREF-mut or YBX1-mut, this reviewer asks the authors to determine how much less the mutated versions of the proteins bind to m<sup>5</sup>C-modified RNAs. Hence, a comparison of YBX1 versus YBX1-mut (ALYREF versus ALYREF-mut) on the same substrate RNA with the same m<sup>5</sup>C-modified position would allow determining the contribution of the so-called modification binding pocket in the respective proteins to their RNA binding. The way the authors chose to show the data presently is misleading because what is compared is the binding of either the wild-type or the mutant protein to different RNAs.

      We appreciate the reviewer’s valuable feedback and apologize for any confusion caused by the presentation of our data. We would like to clarify the rationale behind our approach. The decision to present the wild-type and mutant reader proteins in separate panels, rather than together, was made in response to comments from Reviewer 2. Below, we provide a detailed explanation of our experimental design and its justification.

      First, we confirmed that YBX1 and ALYREF exhibit stronger binding affinity to m<sup>5</sup>C-modified RNA compared to unmodified RNA, establishing their role as m<sup>5</sup>C reader proteins. Next, to validate the functional significance of the DRAM<sup>mut</sup> group, we demonstrated that mutating key amino acids in the m<sup>5</sup>C-binding pocket significantly reduces the binding affinity of YBX1<sup>mut</sup> and ALYREF<sup>mut</sup> to m<sup>5</sup>C-modified RNA. This confirms that the DRAM<sup>mut</sup> group effectively minimizes false-positive results by disrupting specific m<sup>5</sup>C interactions.

      Crucially, in our pull-down experiments, both the wild-type and mutant proteins (YBX1/YBX1<sup>mut</sup> and ALYREF/ALYREF<sup>mut</sup>) were incubated with the same RNA sequences. To avoid any ambiguity, we have included the specific RNA sequence information in the Methods section (lines 463–468). This allows the reduced binding affinity of the mutant versions to be assessed relative to the wild-type proteins, even though they are presented in separate panels.

      We hope this explanation clarifies our approach and demonstrates the robustness of our findings. We sincerely appreciate the reviewer’s understanding and hope this addresses their concerns.

      SFigure 2C: first two panels are duplicates of the same image.

      Thank you for pointing this out. We sincerely apologize for incorrectly duplicating the images. We have now updated Supplementary Figure 2C with the correct panels and have provided the original flow cytometry data for the first two images. It is important to note that, as demonstrated by the original data analysis, the EGFP-positive quantification values (59.78% and 59.74%) remain accurate. Therefore, this correction does not affect the conclusions of our study. Thank you again for bringing this to our attention.

      Author response image 3.

      SFigure 4B: how would the PCR product for NSUN6 be indicative of a mutation? The used primers seem to amplify the wildtype sequence.

      Thank you for your kind suggestion. In our NSUN6<sup>-/-</sup> cell line, the NSUN6 gene is missing only a single base pair (1 bp) compared to the wildtype, which results in a frameshift mutation and a reduction in NSUN6 protein expression. We fully agree with the reviewer that the current PCR gel electrophoresis does not clearly resolve this 1 bp deletion. To better illustrate our experimental design, we have included a schematic representation of the knockout sequence in SFigure 4B. Additionally, we have provided the original sequencing data, and the corresponding details have been added to lines 151-153 of the manuscript for further clarification.

      Author response image 4.

      SFigure 4C: the Figure legend is insufficient to understand the subfigure.

      Thank you for your valuable suggestion. To improve clarity, we have revised the figure legend for SFigure 4C, as well as the corresponding text in lines 178-179. We have additionally updated the title of SFigure 4 for better clarity. The updated SFigure 4C now demonstrates that the DRAM-edited mRNAs exhibit a high degree of overlap across the three biological replicates.

      SFigure 4D: the Figure legend is insufficient to understand the subfigure.

      Thank you for your kind suggestion. We have revised the figure legend to provide a clearer explanation of the subfigure. Specifically, this figure illustrates the motif analysis derived from sequences spanning 10 nucleotides upstream and downstream of DRAM-edited sites mediated by loci associated with NSUN2 or NSUN6. To enhance clarity, we have also rephrased the relevant results section (lines 169-175) and the corresponding discussion (lines 304-307).

      SFigure 7: There is something off with all 6 panels. This reviewer can find data points in each panel that do not show up on the other two panels even though this is a pairwise comparison of three data sets (file was sent to the Editor) Available at https://elife-rp.msubmit.net/elife-rp_files/2025/01/22/00130809/02/130809_2_attach_27_15153.pdf

      Response: We thank the reviewer for pointing this out. We would like to clarify the methodology behind this analysis. In this study, we conducted pairwise comparisons of the number of DRAM-edited sites per gene across three biological replicates of DRAM-ABE or DRAM-CBE, visualized as scatterplots. Each data point in the plots corresponds to a gene, and while the same gene is represented in all three panels, its position may vary vertically or horizontally across the panels. This variation arises because the number of mutation sites typically differs between replicates, making it unlikely for a data point to occupy the exact same position in all panels. A similar analytical approach has been used in previous studies on m6A (PMID: 31548708). To address the reviewer’s concern, we have annotated the corresponding positions of the questioned data points with arrows in Author response image 5.
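
      The point about a gene occupying different positions in different panels can be illustrated with a toy example. The gene names and counts below are invented for illustration and are not taken from the DRAM-seq data; the function name is likewise hypothetical.

```python
from itertools import combinations

# Hypothetical numbers of edited sites per gene in three biological replicates.
replicate_counts = {
    "rep1": {"GENE_A": 3, "GENE_B": 1, "GENE_C": 5},
    "rep2": {"GENE_A": 4, "GENE_B": 1, "GENE_C": 2},
    "rep3": {"GENE_A": 3, "GENE_B": 2, "GENE_C": 6},
}


def pairwise_points(counts):
    """Build one scatterplot panel per replicate pair.

    Each panel maps a gene to its (x, y) coordinate, i.e. the number of
    edited sites in the first and second replicate of the pair.
    """
    panels = {}
    for rep_x, rep_y in combinations(sorted(counts), 2):
        shared = sorted(set(counts[rep_x]) & set(counts[rep_y]))
        panels[(rep_x, rep_y)] = {
            gene: (counts[rep_x][gene], counts[rep_y][gene]) for gene in shared
        }
    return panels


panels = pairwise_points(replicate_counts)

if __name__ == "__main__":
    for pair, points in panels.items():
        print(pair, points["GENE_C"])
```

      In this toy data set GENE_C sits at (5, 2), (5, 6) and (2, 6) in the three panels, so a point visible in one panel need not reappear at the same coordinates in the others.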

      Author response image 5.

    1. eLife Assessment

      The research presents valuable findings on the impact of FRMD8 loss on tumor progression and resistance to tamoxifen therapy. Through a series of convincing and systematic experiments, the author thoroughly investigates the role of FRMD8 in breast cancer and its underlying regulatory mechanisms. The study confirms that FRMD8 holds potential as a therapeutic target for reversing tamoxifen resistance, offering helpful insights for future treatment strategies.

    2. Reviewer #1 (Public review):

      Summary:

      Tamoxifen resistance is a common problem in partially ER-positive patients undergoing endocrine therapy, and this manuscript has important research significance as it is based on clinical practical issues. The manuscript discovered that the absence of FRMD8 in breast epithelial cells can promote the progression of breast cancer, thus proposing the hypothesis that FRMD8 affects tamoxifen resistance and validated this hypothesis through a series of experiments. The manuscript has certain theoretical reference value.

      Strengths:

      At present, research on the role of FRMD8 in breast cancer is very limited. This manuscript leverages the MMTV-Cre+;Frmd8fl/fl;PyMT mouse model to study the role of FRMD8 in tamoxifen resistance, and single-cell sequencing technology discovered the interaction between FRMD8 and ESR1. At the mechanistic level, this manuscript has demonstrated two ways in which FRMD8 affects ERα, providing some new insights into the development of ER-positive breast cancer in patients who are resistant to tamoxifen.

      Limitations:

      Whether FRMD8 can become a biomarker should be verified in large clinical samples or clinical data.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript presents a valuable finding on the impact of FRMD8 loss on tumor progression and the resistance to tamoxifen therapy. The author conducted systematic experiments to explore the role of FRMD8 in breast cancer and its potential regulatory mechanisms, confirming that FRMD8 could serve as a potential target to reverse tamoxifen resistance.

      The research is logically coherent and persuasive. The results support their conclusions and have achieved the research objectives.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Tamoxifen resistance is a common problem in partially ER-positive patients undergoing endocrine therapy, and this manuscript has important research significance as it is based on clinical practical issues. The manuscript discovered that the absence of FRMD8 in breast epithelial cells can promote the progression of breast cancer, thus proposing the hypothesis that FRMD8 affects tamoxifen resistance and validating this hypothesis through a series of experiments. The manuscript has a certain theoretical reference value.

      Strengths:

      At present, research on the role of FRMD8 in breast cancer is very limited. This manuscript leverages the MMTV-Cre+;Frmd8fl/fl;PyMT mouse model to study the role of FRMD8 in tamoxifen resistance, and single-cell sequencing technology discovered the interaction between FRMD8 and ESR1. At the mechanistic level, this manuscript has demonstrated two ways in which FRMD8 affects ERα, providing some new insights into the development of ER-positive breast cancer in patients who are resistant to tamoxifen.

      Weaknesses:

      This manuscript repeatedly emphasizes the role of FRMD8/FOXO3A in tamoxifen resistance in ER-positive breast cancer, but the specific mechanisms have not yet been fully elucidated. Whether FRMD8 can become a biomarker should be verified in large clinical samples or clinical data.

      We appreciate your recognition and valuable suggestions. The proliferation of ERα-positive breast cancer cells is contingent upon the expression of ERα. Tamoxifen, a selective estrogen receptor modulator, competitively binds to ERα, thereby inhibiting the activation of the proliferation signaling pathway. Previous studies have demonstrated that the downregulation of ERα expression results in a reduction in the sensitivity of breast cancer cells to tamoxifen (PMID: 15894097; PMID: 922747). Our study revealed the molecular mechanism by which FRMD8 regulates ERα expression through FOXO3A and UBE3A, and thus FRMD8 deficiency is a cause of tamoxifen treatment resistance. 

      In this study, our results showed that low expression of FRMD8 predicts poor prognosis in breast cancer patients. We agree with this reviewer and will validate the role of FRMD8 in more patient samples and expand its application in different cancer types.

      Reviewer #2 (Public review):

      Summary:

      The manuscript presents a valuable finding on the impact of FRMD8 loss on tumor progression and the resistance to tamoxifen therapy. The author conducted systematic experiments to explore the role of FRMD8 in breast cancer and its potential regulatory mechanisms, confirming that FRMD8 could serve as a potential target to reverse tamoxifen resistance.

      Strengths:

      The majority of the research is logically clear, smooth, and persuasive.

      Weaknesses:

      Some research in the article lacks depth and some sentences are poorly organized.

      Thank you for your helpful suggestion. We have carefully revised the manuscript again. 

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      This manuscript suggests that the resistance of tamoxifen in breast cancer is linked to the loss of function of FRMD8. This is a relatively good and valuable contribution. However, there are several points that confused me.

      (1) The subfigures with important conclusions should include quantitative analysis, for example, Figure 4D, 4E, and 6A. In Figure 6F, which subtypes of normal and tumor tissues were investigated?

      Thank you for your helpful suggestions. We have quantified the bands in Figure 4D, 4E, and 6A and labelled them in the figures. 

      We have also provided details of the tumor samples in Table S3 and the “Materials and Methods” section. The majority of tumor tissues are invasive ductal carcinomas.

      (2) In the luminal epithelium-specific Frmd8 knockout mice (MMTV-Cre+; Frmd8fl/fl), the authors demonstrated that the loss of FRMD8 promotes the growth of breast tumors. In Figure 3A, the expression of ERα and PR in tumors is nearly negative. However, why was the validation of the mechanism performed in breast tumor cell lines and not in epithelial cells?

      Thanks for the question. Early-stage mammary tumors in MMTV-PyMT mice express ERα, while ERα is negative in advanced tumors of MMTV-PyMT mice. Figure 3A shows the results of tumors from four-month-old mice. Meanwhile, our supplementary results showed that loss of Frmd8 also decreased ERα expression in normal and atypical hyperplasia mammary tissues from 7-week-old MMTV-PyMT mice, when the mice had no palpable tumors and ERα is positive (Fig. S3E). We believe that the absence of FRMD8 contributes to the acceleration of malignant progression during the dynamic evolution of breast cancer. Limited by the difficulty of transfection in the normal breast epithelial cell line (MCF10A), we explored the subsequent mechanisms mainly in breast cancer cells and HEK293, a human embryonic kidney cell line. Besides, Figure S3E also showed the regulation of ERα expression by Frmd8 in mouse mammary epithelial cells.

      (3) To explore the mechanism by which FRMD8 inhibits ERα degradation, what is the reason for choosing HEK293A?

      Thank you for the good question. HEK293 cell line is commonly used in mechanistic studies. We also employed the breast cancer cell line T47D to verify the observations in HEK293 cells. Furthermore, the mass spectrometry result of HEK293A cells presented in Figure 5E was an additional experiment performed when we were exploring the regulation of the cell cycle by FRMD8, which is published in Cell Reports (PMID: 37527040). Based on the mass spectrometry result, we assumed that FRMD8 may influence ERα degradation mediated by UBE3A.

      Reviewer #2 (Recommendations for the authors):

      Introduction

      (1) In order for the reader to better understand the content of the article, it is better to briefly describe the role of ERα in the progression of breast cancer.

      Thank you for your suggestion. We have provided a brief description of the role of ERα in the introduction of revised manuscript:

      “ERα is a ligand-activated transcription factor that is activated by oestrogen, and promotes cell proliferation during breast cancer development (Harbeck et al., 2019).”

      (2) As ESR1 is mentioned in the second paragraph, a brief description of the relationship between ESR1 and ERα can make the article more logical.

      Thank you for the suggestion. We have added the description in the introduction:

      “Multiple transcription factors, such as AP-2γ, FOXO3, FOXM1, and GATA3, have been reported to bind to the promoter region of ESR1, the gene encoding ERα, and participate in transcriptional regulation of ESR1 (Jia et al., 2019; Koš et al., 2001).”

      (3) In the text, there are two variations of the term FRMD8: 'FRMD8' and 'Frmd8'. It is best to standardize on one form throughout the document.

      We apologize for any confusion. The terms "FRMD8" and "Frmd8" are used to indicate proteins derived from human and mouse, respectively.

      Results

      (4) In Figure 2L, there is no noticeable difference in the expression levels of Pgr and Esr1 between the Cre+ tumor and Cre- tumor groups. Figure S2E is more suitable for inclusion in the main text compared to Figure 2L.

      Thank you for this suggestion. ERα and PR are positive in early-stage mammary tumors of MMTV-PyMT mice, while ERα and PR are gradually lost as the tumor progresses. In Figure 2, mammary tumors from 4-month-old MMTV-PyMT mice were subjected to scRNA-seq analysis. Since the expression of ERα was very low in tumor cells at this time, there appears to be no difference between the two groups. We have exchanged Figure 2L and Figure S2E in the manuscript.

      (5) The CNV score can be used to assess the malignancy of cells, it would be better to compare the malignancy levels between the two groups.

      This is a very good suggestion. However, copy number variations usually occur randomly and have a high degree of heterogeneity. Due to the limited sample size in our study, we did not compare the difference between the two groups.

      (6) Enrichment analysis is crucial for single-cell sequencing studies. It is recommended to perform differential gene analysis and enrichment analysis between the Cre+ and Cre- groups to further explore the impact of FRMD8 deficiency on the functions of malignant cells.

      Thank you for your suggestion. We have performed differential gene analysis and biological process enrichment analysis on the scRNA-seq results using the Gene Ontology (GO) database. Our results showed that upregulated genes in luminal progenitor (Lp) epithelial cells were enriched in epithelial cell proliferation and transmembrane receptor protein serine/threonine kinase signaling pathways, suggesting that Frmd8 deficiency significantly promotes epithelial cell proliferation in MMTV-PyMT mice.

      Author response image 1.

      (7) The coherent logic in lines 300 to 308 should be that FRMD8 is expressed at higher levels in normal Hsd epithelial cells in mice, hence further verification was conducted to examine the expression levels of FRMD8 in various human breast cancer cell lines.

      We have revised the figures and text as suggested.  

      Discussion

      (8) In lines 352 to 360, the background narrative in the first half seems to have little connection with the research findings in the second half; it is suggested to reorganize the language of this section.

      Thank you for the advice. We have rewritten this paragraph in the manuscript:

      “In MMTV-PyMT mice, early-stage mammary tumors express ERα and PR, but these receptors are gradually lost as the tumor progresses (Lapidus et al., 1998). Our scRNA-seq results revealed that mammary tumor epithelial cells in MMTV-PyMT mice fall into four clusters, with only Hsd epithelial cells showing ERα and PR expression. Additionally, Hsd epithelial cells exhibited the lowest CNV score, indicating a closer resemblance to normal epithelial cells. The loss of Frmd8 reduced the proportion of Hsd epithelial cells and led to a downregulation of ERα and PR expression, implying that Frmd8 deficiency promotes the loss of luminal features in the mammary gland and accelerates mammary tumor progression.”

      (9) As stated in the result section, the depletion of FRMD8 may lead to the decrease of the Hsd epithelial cells proportion, it might be beneficial to discuss the significance of this finding.

      We have added a discussion of the Hsd epithelial cell proportion in the third paragraph of this section (please refer to the above question (8) ).

      Figures

      (10) The structural layout of Figure 4 should be reorganized to make it more aesthetically pleasing.

      Thank you for this suggestion. We have rearranged Figure 4 as suggested.

    1. eLife Assessment

      This study presents valuable findings on the control of survival and maintenance of a specific set of brain resident immune cells. The authors generate a new animal model to enable sophisticated analysis of cell function in vivo. The sophisticated knock-in/knock-out alleles are compelling, although the work would ultimately be strengthened with further mechanistic analyses.

    2. Reviewer #1 (Public review):

      Summary:

      The article entitled "Pu.1/Spi1 dosage controls the turnover and maintenance of microglia in zebrafish and mammals" by Wu et al., identifies a role for the master myeloid developmental regulator Pu.1 in the maintenance of microglial populations in the adult. Using a non-homologous end joining knock-in strategy, the authors generated a pu.1 conditional allele in zebrafish, which reports wildtype expression of pu.1 with EGFP and truncated expression of pu.1 with DsRed after Cre-mediated recombination. When crossed to existing pu.1 and spi-b mutants, this approach allowed the authors to target a single allele for recombination and induce homozygous loss-of-function microglia in adults. This identified that although there is no short-term consequence to loss of pu.1, microglia lacking any functional copy of pu.1 are depleted over the course of months, even when spi-b is fully functional. The authors go on to identify reduced proliferation, increased cell death, and higher expression of tp53 in the pu.1 deficient microglia, as compared to the wild-type EGFP+ microglia. To extend these findings to mammals, the authors generated a conditional Pu.1 allele in mice and performed similar analyses, finding that loss of a single copy of Pu.1 resulted in similar long-term loss of Pu.1-deficient microglia. The conclusions of this paper are overall well supported by the data.

      Strengths:

      The genetic approaches here for visualizing the recombination status of an endogenous allele are very clever, and by comparing the turnover of wildtype and mutant cells in the same animal the authors can make very convincing arguments about the effect of chronic loss of pu.1. Likely this phenotype would be either very subtle or nonexistent without the point of comparison and competition with the wildtype cells.

      Using multiple species allows for more generalizable results, and shows conservation of the phenomena at play.

      The demonstration of changes to proliferation and cell death in concert with higher expression of tp53 is compelling evidence for the authors' argument.

      Weaknesses:

      This paper is very strong. It would benefit from further investigating the specific relationship between pu.1 and tp53 specifically. Does pu.1 interact with the tp53 locus? Specific molecular analysis of this interaction would strengthen the mechanistic findings.

    3. Reviewer #2 (Public review):

      Summary:

      In the presented work by Wu et al, the authors investigate the role of the transcription factor Pu.1 in the survival and maintenance of microglia, the tissue-resident macrophage population in the brain. To this end, they generated a sophisticated new conditional pu.1 allele in zebrafish using CRISPR-mediated genome editing which allows visual detection of expression of the mutant allele through a switch from GFP to dsRed after Cre-mediated recombination. Using EdU pulse-chase labelling, they first estimated the daily turnover rate of microglia in the adult zebrafish brain which was found to be higher than rates previously estimated for mice and humans. After conditional deletion of pu.1 in coro1a positive cells, they do not find a difference in microglia number at 2 and 8 days or 1-month post-injection of Tamoxifen. However, at 3 months post-injection, a strong decrease in mutant microglia could be detected. While no change in microglia number was detected at 1mpi, an increase in apoptotic cells and decreased proliferation was observed. RNA-seq analysis of WT and mutant microglia revealed an upregulation of tp53, which was shown to play a role in the depletion of pu.1 mutant microglia as deletion in tp53-/- mutants did not lead to a decrease in microglia number at 3mpi. Through analysis of microglia number in pu.1 mutants, the authors further show that the depletion of microglia in the conditional mutants is dependent on the presence of WT microglia. To show that the phenomenon is conserved between species, similar experiments were also performed in mice.

      This work expands on previous in vitro studies using primary human microglia. The majority of conclusions are well supported by the data, but the addition of controls and experimental details would strengthen the conclusions and rigor of the paper.

      Strengths:

      Generation of an elegantly designed conditional pu.1 allele in zebrafish that allows for the visual detection of expression of the knockout allele.

      The combination of analysis of pu.1 function in two model systems, zebrafish and mouse, strengthens the conclusions of the paper.

      Confirmation of the functional significance of the observed upregulation of tp53 in mutant microglia through double mutant analysis provides some mechanistic insight.

      Weaknesses:

      (1) The presented RNA-Seq analysis of mutant microglia is underpowered and details on how the data was analyzed are missing. Only 9-15 cells were analyzed in total (3 pools of 3-5 cells each). Further, the variability in relative gene expression of ccl35b.1, which was used as a quality control and inclusion criterion to define pools consisting of microglia, is extremely high (between ~4 and ~1600, Figure S7A).

      (2) The authors conclude that the reduction of microglia observed in the adult brain after cKO of pu.1 in the spi-b mutant background is due to apoptosis (Lines 213-215). However, they only provide evidence of apoptosis in 3-5 dpf embryos, a stage at which loss of pu.1 alone does lead to a complete loss of microglia (Figure 2E). A control of pu.1 KI/d839 mutants treated with 4-OHT should be added to show that this effect is indeed dependent on the loss of spi-b. In addition, experiments should be performed to show apoptosis in the adult brain after cKO of pu.1 in spi-b mutants as there seems to be a difference in the requirement of pu.1 in embryonic and adult stages.

      (3) The number of microglia after pu.1 knockout in zebrafish did only show a significant decrease 3 months after 4-OHT injection, whereas microglia were almost completely depleted already 7 days after injection in mice. This major difference is not discussed in the paper.

      (4) Data is represented as mean ± SEM. Instead of SEM, standard deviation should be shown in all graphs to show the variability of the data. This is especially important for all graphs where individual data points are not shown. It should also be stated in the figure legend if SEM or SD is shown.

    4. Author response:

      Reviewer #1 (Public review):

      Strengths:

      The genetic approaches here for visualizing the recombination status of an endogenous allele are very clever, and by comparing the turnover of wildtype and mutant cells in the same animal the authors can make very convincing arguments about the effect of chronic loss of pu.1. Likely this phenotype would be either very subtle or nonexistent without the point of comparison and competition with the wildtype cells.

      Using multiple species allows for more generalizable results, and shows conservation of the phenomena at play.

      The demonstration of changes to proliferation and cell death in concert with higher expression of tp53 is compelling evidence for the authors' argument.

      Weaknesses:

      This paper is very strong. It would benefit from further investigating the specific relationship between pu.1 and tp53 specifically. Does pu.1 interact with the tp53 locus? Specific molecular analysis of this interaction would strengthen the mechanistic findings.

      We agree with the reviewer’s assessment regarding the significance of the relationship between PU.1 and TP53. A previous study by Tschan et al. (1) has shown that PU.1 attenuates the transcriptional activity of the p53 tumor suppressor family through direct binding to the DNA-binding and/or the oligomerization domains of p53/p73 proteins. We will discuss this point in the revised manuscript and cite this paper accordingly. Moreover, to further investigate the interaction between Pu.1 and Tp53 in zebrafish, we intend to analyze the tp53 promoter region using bioinformatic prediction tools. This approach aims to identify potential Pu.1 binding sites, thereby providing insights into the direct regulatory interactions between Pu.1 and the tp53 promoter in zebrafish.

      Reviewer #2 (Public review):

      Strengths:

      Generation of an elegantly designed conditional pu.1 allele in zebrafish that allows for the visual detection of expression of the knockout allele.

      The combination of analysis of pu.1 function in two model systems, zebrafish and mouse, strengthens the conclusions of the paper.

      Confirmation of the functional significance of the observed upregulation of tp53 in mutant microglia through double mutant analysis provides some mechanistic insight.

      Weaknesses:

      (1) The presented RNA-Seq analysis of mutant microglia is underpowered and details on how the data was analyzed are missing. Only 9-15 cells were analyzed in total (3 pools of 3-5 cells each). Further, the variability in relative gene expression of ccl35b.1, which was used as a quality control and inclusion criterion to define pools consisting of microglia, is extremely high (between ~4 and ~1600, Figure S7A).

      In the revised manuscript, we will elaborate on the methodological details of the RNA analysis. Owing to the technical challenge of unambiguously distinguishing microglia from dendritic cells (DCs) in brain cell suspensions, we employed a strategy of isolating 3-5 cells per pool and quantifying the relative expression of the microglia-specific marker ccl34b.1 normalized to the DC-specific marker ccl19a.1. This approach aimed to reduce DC contamination in downstream analyses. Across all experimental groups subjected to RNA-seq analysis, the ccl34b.1/ccl19a.1 expression ratios exceeded 5, confirming microglia as the dominant cell population. Nonetheless, residual DC contamination in the RNA-seq data cannot be entirely ruled out. We will explicitly acknowledge this technical constraint in the revised manuscript to ensure methodological transparency.
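      The inclusion criterion described above can be illustrated with a minimal sketch. It assumes each pool has one expression value for ccl34b.1 and one for ccl19a.1; the function name, the handling of a zero denominator, and the example values are hypothetical and are not taken from the manuscript.

```python
def passes_microglia_filter(ccl34b1_expr, ccl19a1_expr, min_ratio=5.0):
    """Return True if a pool's ccl34b.1/ccl19a.1 ratio exceeds min_ratio.

    The threshold of 5 follows the response text; treating a pool with no
    detectable ccl19a.1 as passing (when ccl34b.1 is detected) is an
    assumption made only for this illustration.
    """
    if ccl19a1_expr <= 0:
        return ccl34b1_expr > 0
    return ccl34b1_expr / ccl19a1_expr > min_ratio


# Hypothetical pools: (ccl34b.1 expression, ccl19a.1 expression)
pools = {"pool_A": (1600.0, 1.0), "pool_B": (4.0, 1.0), "pool_C": (30.0, 5.0)}
kept = [name for name, (mg, dc) in pools.items() if passes_microglia_filter(mg, dc)]
print(kept)
```

      A ratio filter of this kind reduces, but does not exclude, dendritic cell contamination, which is consistent with the caveat stated in the response.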

      (2) The authors conclude that the reduction of microglia observed in the adult brain after cKO of pu.1 in the spi-b mutant background is due to apoptosis (Lines 213-215). However, they only provide evidence of apoptosis in 3-5 dpf embryos, a stage at which loss of pu.1 alone does lead to a complete loss of microglia (Figure 2E). A control of pu.1 KI/d839 mutants treated with 4-OHT should be added to show that this effect is indeed dependent on the loss of spi-b. In addition, experiments should be performed to show apoptosis in the adult brain after cKO of pu.1 in spi-b mutants as there seems to be a difference in the requirement of pu.1 in embryonic and adult stages.

      We apologize for the omission of data regarding conditional pu.1 knockout alone in embryos, which may have led to ambiguity. We would like to clarify that conditional pu.1 knockout alone at the embryonic stage does not induce microglial death (Author response image 1). Microglial death occurs only when Pu.1 is disrupted in the spi-b mutant background, in both embryonic and adult brains. The blebbing morphology of some microglia after pu.1 conditional knockout in the spi-b mutant background indicates that microglia undergo apoptosis at both embryonic (Figure S4) and adult stages (Author response image 2). The reviewer’s concern likely arises from the distinct outcomes of global pu.1 knockout (Figure 2) versus conditional pu.1 ablation. Global knockout eliminates microglia during early development due to Pu.1’s essential role in myeloid lineage specification. We plan to include this clarification in the revised manuscript.

      Author response image 1.

      Conditional depletion of Pu.1 in embryonic microglia had no effect on their short-term survival. (A) Schematics of 4-OHT treatment for pu.1<sup>KI/WT</sup> Tg(coro1a:CreER) and pu.1<sup>KI/Δ839</sup> Tg(coro1a:CreER) at embryonic stage. (B) Representative images of DsRed<sup>+</sup> microglia in pu.1<sup>KI/WT</sup> and pu.1<sup>KI/Δ839</sup> at 5 dpf. (C) Quantification of DsRed<sup>+</sup> microglia in pu.1<sup>KI/WT</sup> and pu.1<sup>KI/Δ839</sup> at 3 dpf and 5 dpf. Values represent means ± SD, n.s., P > 0.05.

      Author response image 2. Simultaneous inactivation of Pu.1 and Spi-b leads to microglia death in adult zebrafish. (A) The experimental setup for pu.1 conditional knockout in adult spi-b<sup>Δ232/Δ232</sup> mutants. (B) Representative images of the midbrain cross section of adult pu.1<sup>KI/+</sup>;spi-b<sup>Δ232/Δ232</sup>;Tg(coro1a:CreER) and pu.1<sup>KI/WT</sup>;spi-b<sup>Δ232/Δ232</sup>;Tg(coro1a:CreER) fish at 2 dpi. The white arrow indicates microglia with blebbing morphology.

      (3) The number of microglia after pu.1 knockout in zebrafish did only show a significant decrease 3 months after 4-OHT injection, whereas microglia were almost completely depleted already 7 days after injection in mice. This major difference is not discussed in the paper.

      We propose that zebrafish Pu.1 and Spi-b function cooperatively to regulate microglial maintenance, analogous to the role of PU.1 alone in mice. This cooperative mechanism likely explains the observed difference in microglial depletion kinetics between zebrafish and mice following pu.1 conditional knockout. Specifically, the compensatory activity of Spi-b in zebrafish may buffer the immediate loss of Pu.1, whereas in mice, the absence of SPI-B expression in microglia eliminates this redundancy, resulting in rapid microglial depletion. Furthermore, during evolution, SPI-B appears to have acquired lineage-specific roles, becoming absent in microglia. We will expand on this evolutionary divergence and its implications for microglial regulation in the revised manuscript.

      (4) Data is represented as mean ± SEM. Instead of SEM, standard deviation should be shown in all graphs to show the variability of the data. This is especially important for all graphs where individual data points are not shown. It should also be stated in the figure legend if SEM or SD is shown.

      We plan to represent our data as mean ± SD in the revised manuscript.
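      For readers comparing the two conventions, the sketch below illustrates why the choice matters: the SEM is the sample SD divided by the square root of n, so it shrinks as n grows and understates the spread of individual measurements. The function name and the example counts are hypothetical and are not data from the study.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of measurements."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD, uses the n - 1 denominator
    sem = sd / math.sqrt(n)        # SEM = SD / sqrt(n)
    return sd, sem


# Hypothetical microglia counts from eight animals (illustration only)
counts = [2, 4, 4, 4, 5, 5, 7, 9]
sd, sem = sd_and_sem(counts)
print(f"mean = {statistics.mean(counts)}, SD = {sd:.2f}, SEM = {sem:.2f}")
```

      With more than one observation the SEM is always smaller than the SD, which is why error bars should be labelled in the figure legend.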

      Reference:

      (1) Tschan MP, Reddy VA, Ress A, Arvidsson G, Fey MF, Torbett BE. PU.1 binding to the p53 family of tumor suppressors impairs their transcriptional activity. Oncogene. 2008 May 29;27(24):3489-93.

    1. eLife Assessment

      Du et al. present a valuable study on neural activation in medial prefrontal cortex (mPFC) subpopulations projecting to the basolateral amygdala (BLA) and nucleus accumbens (NAc) during behavioral tasks assessing anxiety, social preference, and social dominance. The study has innovative approaches and solid in vivo calcium imaging data, but the evidence linking neural physiology to behavioral outcomes is incomplete. Addressing these gaps would significantly enhance the understanding of how distinct mPFC→BLA and mPFC→NAc pathways influence anxiety, exploration, and social behaviors.

    2. Reviewer #1 (Public review):

      Summary:

      It is well known that neurons in the medial prefrontal cortex (mPFC) are involved in higher cognitive functions such as executive planning, motivational processing, and internal state-mediated decision-making. These internal states often correlate with the emotional states of the brain. While several studies point to the role of mPFC in regulating behavior based on such emotional states, the diversity of information processing in its sub-populations remains a less explored territory. In this study, the authors try to address this gap by identifying and characterizing some of these sub-populations in mice using a combination of projection-specific imaging, function-based tagging of neurons, multiple behavioral assays, and ex-vivo patch clamp recordings.

      Strengths:

      The authors targeted mPFC projections to the nucleus accumbens (NAc) and basolateral amygdala (BLA). Using the open field task (OFT), the authors identified four relevant behavioral states as well as neurons active while the animal was in the center region ("center-ON neurons"). By characterizing single-unit activity and using dimensionality reduction, the authors show differentiated coding of behavioral events at both the projection and functional levels. They further substantiate this effect by showing higher sensitivity of mPFC-BLA center-ON neurons during time spent in the open arms of the elevated plus maze (EPM). The authors then pivoted to the three-chamber social interaction (SI) assay to show that different subsets of neurons encode preference for a social stimulus over a non-social one. This reveals an interesting diversity in the function of these sub-populations on multiple levels. Lastly, the authors used the tube test as a manipulation of the anxiety state of mice and compared behavioral differences before/after the OFT and social interaction tasks. This experiment revealed that "losers" of the tube test spend less time in the center of the open field while "winners" show a stronger preference for the familiar mouse over the object. Using patch-clamp experiments, the authors also found that "winners" exhibit stronger synaptic transmission in the mPFC-NAc projection while "losers" exhibit stronger synaptic transmission in the mPFC-BLA projection. Given the popularity of the tube test assay in rank determination, this provides useful insights into possible effects on anxiety levels and synaptic plasticity. Overall, the many experiments performed by the authors reveal interesting differences in mPFC neurons relative to their involvement in high or low anxiety behaviors, social preference, and social rank.

      Weaknesses:

      The authors focused primarily on female mice without commenting on the effect that sex differences would have on their results. While the authors have identified relevant behavioral states across the various behavioral tasks, there is still a missing link between them and "emotional states" - the phrase used by them emphatically throughout the manuscript. The authors have neither provided adequate references to satisfy this gap nor shared any data pertaining to relevant readouts such as cortisol levels. Both the projection-specific recordings and patch-clamp experiments, including histology reports in the manuscript, would provide essential information for anyone trying to replicate the results, especially since it's known that sub-populations in the BLA and NAc can have vastly different functions. The population-level analysis in the manuscript requires more rigor to reduce bias and statistical controls for establishing the significance of their results. Lastly, the tube test is used as a manipulation of the "emotional state" in several of the experiments. While the tube test can cause a temporary spike in anxiety of the participating mice, it is not known to produce a sustained effect - unless there are additional interventions such as forced social defeat. Thus, additional controls for these experiments are essential to support claims based on changes in the emotional state of mice. Apart from the methodology, the manuscript could also be improved with the addition of clear scatter points in all the plots along with detailed measures of the statistical tests such as exact p values and size of groups being compared.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of this proposal was to understand how two separate projection neurons from the medial prefrontal cortex, those innervating the basolateral amygdala (BLA) and nucleus accumbens (NAc), contribute to the encoding of emotional behaviors. The authors record the activity of these different neuron classes across three different behavioral environments. They propose that, although both populations are involved in emotional behavior, the two populations have diverging activity patterns in certain contexts. A subset of projections to the NAc appears particularly important for social behavior. They then attempt to link these changes to the emotional state of the animal and changes in synaptic connectivity.

      Strengths:

      The behavioral data build on previous studies of these projection neurons supporting distinct roles in behavior and extend previous work by looking at the heterogeneity within different projection neurons across contexts.

      Weaknesses:

      The diversity of neurons mediating these projections and their targeting within the BLA and NAc is not explored. These are not homogeneous structures and so one possibility is that some of the diversity within their findings may relate to targeting of different sub-structures within each region. The electrophysiological data have significant experimental confounds and more methodological information is required to support other conclusions related to these data.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the distinct contributions of mPFC→BLA and mPFC→NAc pathways in emotional regulation, with implications for understanding anxiety, exploration, and social preference behaviors. Using Ca2+ imaging, optogenetics, and patch-clamp recording, the authors demonstrate pathway-specific roles in encoding emotional states of opposite valence. They further identify subsets of neurons ("center-ON") with heightened activity under anxiety-inducing conditions. These findings challenge the traditional view of functional similarity between these pathways and provide valuable insights into neural circuit dynamics relevant to emotional disorders.

      The study is well-designed and addresses an important topic, but several methodological and interpretational issues require clarification to strengthen the conclusions.

      Weaknesses:

      Major Weaknesses:

      (1) The manuscript does not clearly and consistently specify the sex of the mice used for behavioral and imaging experiments. Given the known influence of sex on emotional behaviors and neural activity, this omission raises concerns about the generalizability of the findings. The authors should make clear throughout the manuscript whether male, female, or mixed-sex cohorts were used and provide a rationale for their choice. If only one sex was used, the potential limitations of this approach should be explicitly discussed.

      (2) Mice lacking "center-ON" neurons were excluded from analysis, yet the manuscript draws broad conclusions about the encoding of emotional states by mPFC pathways. It is critical to justify this exclusion and discuss how it may limit the generalizability of the findings. The inclusion of data or contextualization for animals without center-ON neurons would strengthen the interpretation.

      (3) The manuscript lacks baseline activity comparisons for mPFC→BLA and mPFC→NAc pathways across subjects. Providing baseline data would contextualize the observed activity changes during behavior testing and help rule out inter-individual variability as a confounding factor.

      (4) Extensive behavioral testing across multiple paradigms may introduce stress and fatigue in the animals, which could confound the induction of emotional states. The authors should describe the measures taken to minimize these effects (e.g., recovery periods, randomized testing order) and discuss their potential impact on the results.

      (5) Grooming is described as a "non-anxiety" behavior, which conflicts with its established role as a stress-relieving behavior that may indicate anxiety. This discrepancy requires clarification, as the distinction is central to the conclusions about the mPFC→BLA pathway's role in differentiating anxiety-related and non-anxiety behaviors.

      (6) While the study highlights pathway-specific neural activity, it lacks a cohesive integration of these findings with the behavioral data. Quantifying the overlap or decorrelation of neuronal activity patterns across tasks would solidify claims about the specialization of mPFC→NAc and mPFC→BLA pathways. Likewise, the discussion should be expanded to place these findings in light of prior studies that have probed the roles of these pathways in social/emotion/valence-related behaviors.

      Minor Weaknesses:

      (1) The manuscript does not explicitly state whether the same mice were used across all behavioral assays. This information is critical for evaluating the validity of group comparisons. Additionally, more detail on sample sizes per assay would improve the manuscript's transparency.

      (2) In Figure 2G, the difference between BLA and NAc activity during exploratory behaviors (sniffing) is difficult to discern. Adjusting the scale or reformatting the figure would better illustrate the findings.

      (3) While the characteristics of the first social stimulus (M1) are specified, there is no information about the second social stimulus (M2). This omission makes it difficult to fully interpret the findings from the three-chamber test.

      (4) The methods section lacks detailed information about statistical approaches and animal selection criteria. Explicitly outlining these procedures would improve reproducibility and clarity.

    5. Author response:

      Reviewing editor comments:

      Overall, the reviewers found the imaging data to be strong but identified the physiology experiments as the weakest aspect of the study. Please consider either removing Figures 7 and 8 from the manuscript or significantly revising the data. If you choose to revise these figures, refer to the specific reviewer comments addressing them. Additionally, several reviewers noted that the prior literature was not adequately cited, so please consider addressing this concern.

      As noted below, we will work to strengthen the physiological side of the study and ensure that we are more scrupulous in citing the prior literature. Below we summarize the major concerns of each reviewer and outline our proposed response.

      Reviewer #1:

      (1) Sex differences and generalizability

      Various studies have shown sex differences in emotional responses and neural activity in mice. Studying both sexes, however, would have required far more mice than we could accommodate for practical reasons, so we chose to use only female mice to lay a solid foundation for future studies that compare (and perhaps contrast) males and females.

      We will:

      Make clear in the main text that we used only females.

      Cite literature on sex-specific mPFC-BLA/NAc functions in the Discussion.

      (2) Missing link between behavioral states and "emotional states"...relevant readouts such as cortisol

      We appreciate the reviewer pointing out this inadvertent conceptual slippage. We will:

      Measure corticosterone by ELISA in archived plasma samples (collected before and after the OFT/EPM tests) and correlate the levels with behavioral and neural activity (following the approach of Panczyszyn-Trzewik et al., Steroids, 2024).

      Be more precise in our language to differentiate behavioral correlates from inferred emotional states.

      Carefully review the literature on OFT center time, EPM open-arm exploration, and tube test outcomes as anxiety/social hierarchy indicators and decide the best interpretation for our findings.

      (3) Improve methodological detail and rigor of population-level analysis

      We will:

      Expand the methods section with electrophysiology parameters (e.g., access resistance criteria, stimulus protocols).

      Add detailed histology figures (viral targeting, electrode placements) for mPFC-BLA/NAc projections.

      Include raw data points in all plots and report exact p-values, effect sizes, and group sizes (e.g., n = 12 cells from 4 mice).

      To enhance statistical rigor, we will provide clearer scatter plots with individual data points, report exact p-values, and specify group sizes in all figures.

      (4) Acute vs. sustained effects after tube test and additional controls

      We would like to clarify that we used repeated tube tests (3 times a day and continuing for 7 days) for assessing sustained rank effects. To address concerns about sustained emotional state changes post-tube test, we will:

      Assess corticosterone levels pre/post-tube test (following the approach of Panczyszyn-Trzewik et al., Steroids, 2024).

      Discuss the transient nature of hierarchy effects and cite studies using repeated tube tests for sustained rank effects.

      Reviewer #2:

      (1) Sub-region targeting in BLA/NAc

      Although different subregions within the BLA and NAc receive distinct inputs and exhibit diverse functions, comparing neuronal activity across these subregions is beyond the scope of this paper. Our primary focus is on mPFC projections, emphasizing presynaptic activity rather than postsynaptic activity within the NAc and BLA. We focused on the PL-NAc shell and PL-BLA (BA) regions because PL-to-NAc shell projections in mice are well-documented, particularly in studies utilizing viral tracers and optogenetic tools (Britt et al., Neuron, 2012; Bossert et al., J. Neurosci., 2012). These projections regulate aversive behaviors, stress responses, and motivational states and are implicated in drug-seeking behaviors and emotional valence encoding (Jocelyn & Berridge, Biol. Psychiatry, 2013; Fetcho et al., Nat. Commun., 2023; Capuzzo & Floresco, J. Neurosci., 2020; Xie et al., BioRxiv., 2025; Domingues et al., Nat Commun., 2025). The PL-BLA projection in turn sends topographically organized projections to BLA subregions, primarily targeting the basal (BA) nuclei of the BLA (McGarry & Carter, J. Neurosci., 2016; Hoover & Vertes, Brain Struct. Funct., 2007). Both the recorded NAc shell and BLA subregions are involved in emotional valence encoding.

      A detailed comparison of neuronal activity across different NAc shell and BLA subregions, or across different cell types such as NAc shell D1- and D2-medium spiny neurons, could each be the subject of a separate study. Nevertheless, we will discuss in the Discussion how sub-region connectivity could contribute to the observed heterogeneity, citing relevant studies, and we will clarify the rationale for our experimental design.

      (2) Electrophysiological confounds

      To strengthen the rationale for our patch-clamp recordings, we will:

      Clarify in methods that recordings were performed in acute slices from behaviorally naive mice (post-tube test) to isolate synaptic changes.

      Include access resistance and cell health criteria (e.g., resting membrane potential, input resistance ranges), along with precise optogenetic stimulus protocols.

      Add example traces of mEPSCs/mIPSCs and quantify exclusion rates.

      Reviewer #3:

      (1) Specify the sexes used throughout the manuscript.

      We will make this clear throughout the paper.

      (2) Exclusion of mice lacking "center-ON" neurons

      We will:

      Explain the exclusion of mice that lacked center-ON neurons. We will also discuss the potential interpretations (e.g., floor effects in anxiety tasks) in the limitations section.

      (3) Baseline activity comparisons

      We will:

      Add baseline neuronal activity comparison between mPFC-BLA and mPFC-NAc neurons.

      (4) Stress from repeated behavioral testing

      We will:

      Clarify our experimental design to state how we tried to minimize the stress caused by multiple behavioral assays.

      Include pre-test habituation protocols in methods.

      Discuss potential cumulative stress effects in limitations.

      (5) Grooming classification

      While the reviewer is correct that grooming can be a stress-relieving behavior, it also has many other functions, from the pragmatic to the social. In our study grooming occurred primarily in the periphery of the open field test, where it corresponded to neural activity patterns that differed from those observed in the center. As we classify the behavior in the center zone of the open field test as anxiety-like, we interpreted the peripheral grooming as indicative of the animal's adjustment to a novel environment, as suggested by previous work (Estanislau et al., Neurosci. Res., 2013; Rojas-Carvajal et al., Animal Behaviour, 2018). The nature of the grooming was primarily rostral body-licking, which accords with what Rojas-Carvajal et al. call a “de-arousal inhibition system” that subserves novelty habituation. The duration and nature of this behavior are, interestingly enough, influenced by whether the mouse or rat lived in an enriched environment prior to the OFT (enriched environments made them quicker to explore a new environment but also quicker to get bored - no surprise, really).

      We did not explain any of this in the manuscript, however, so in our revision, we will make sure to discuss these nuances and cite the relevant literature.

      (6) Integrate neuronal activity and behavioral data

      We will:

      Include additional analyses quantifying neuronal activity overlap across tasks and refine our Discussion to better integrate these findings with prior literature.

      Perform cross-correlation analyses to quantify activity overlap between OFT, EPM, and SI tasks.
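      As a rough illustration of how overlap across tasks could be quantified, the sketch below computes a zero-lag Pearson correlation between per-neuron response indices for each pair of tasks. This is a simpler stand-in for the cross-correlation analysis mentioned above, not the authors' method. It assumes that the same neurons were tracked across OFT, EPM, and SI, and that each neuron has already been reduced to one response index per task; the values in `responses` and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Zero-lag Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def pairwise_overlap(responses):
    """Correlate per-neuron response indices for every pair of tasks."""
    tasks = sorted(responses)
    return {
        (a, b): pearson(responses[a], responses[b])
        for i, a in enumerate(tasks)
        for b in tasks[i + 1:]
    }


# Hypothetical response indices: one value per neuron, same neuron order in each task.
responses = {
    "OFT": [0.8, 0.1, -0.3, 0.5, 0.0],
    "EPM": [0.7, 0.2, -0.2, 0.4, 0.1],
    "SI": [-0.1, 0.6, 0.3, -0.4, 0.2],
}

overlap = pairwise_overlap(responses)
for pair, r in overlap.items():
    print(pair, round(r, 3))
```

      A high positive value for a task pair would suggest that the same neurons respond similarly in both tasks, whereas values near zero or below would point to decorrelated or opposing patterns. With several neurons per mouse, the correlation would more appropriately be computed per animal and then summarized across animals.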

      Minor weaknesses

      - Clarify the cohorts of mice that were used for each behavioral assay.

      - Adjust Figure 2G scale and add insets to highlight sniffing differences.

      - Specify that M1/M2 were age-/sex-matched unfamiliar mice in the three-chamber test.

      - Detail statistical tests (e.g., mixed-effects models) and animal selection criteria in methods.

      We believe these revisions will address the reviewers’ major concerns and significantly improve the manuscript. We welcome further feedback on these plans and will provide updated figures/data for the resubmission.

    1. eLife Assessment

      The authors of this important study investigate how telomere length regulates hTERT expression via non-telomeric binding of the telomere-associated protein TRF2. They conclusively show that TRF2 binding to long telomeres results in a reduction in its binding to the hTERT promoter, while short telomeres restore TRF2 binding in the hTERT promoter, recruiting repressor complexes like PRC2, and suppressing hTERT expression. There is convincing support for the claims and the findings should be of broad interest for cell biologists and those working in fields where telomeres alter function, such as cancer and aging.

    2. Reviewer #1 (Public review):

      Summary:

      The authors in this study extensively investigate how telomere length (TL) regulates hTERT expression via non-telomeric binding of the telomere-associated protein TRF2. They conclusively show that TRF2 binding to long telomeres results in a reduction in its binding to the hTERT promoter. In contrast, short telomeres restore TRF2 binding in the hTERT promoter, recruiting repressor complexes like PRC2, and suppressing hTERT expression. The study presents several significant findings revealing a previously unknown mechanism of hTERT regulation by TRF2 in a TL-dependent manner.

      Strengths:

      (1) A previously unknown mechanism linking telomere length and hTERT regulation through the non-telomeric TRF2 protein has been established, strengthening our understanding of telomere biology.

      (2) The authors used both cancer cell lines and iPSCs to showcase their hypothesis and multiple parameters to validate the role of TRF2 in hTERT regulation.

      (3) Comprehensive integration of the recent literature findings and implementation in the current study.

      (4) In vivo validation of the findings.

      (5) Rigorous controls and well-designed assays have been used.

      Weaknesses:

      (1) The authors should comment on the cell proliferation and morphology of the engineered cell lines with ST or LT.

      (2) Also, the entire study uses engineered cell lines with artificially elongated or shortened telomeres, which conclusively demonstrate the role of hTERT regulation by TRF2 in a telomere length-dependent manner, but using ALT-negative cell lines with naturally short telomeres versus those with long telomeres would give a better perspective. Primary cells can also be used in this context.

      (3) The authors set up time-dependent telomere length changes by dox induction, which may differ from the gradual telomere attrition or elongation that occurs naturally during aging, disease progression, or therapy. This aspect should be explored.

      (4) How does the hTERT regulation by TRF2 in a TL-dependent manner affect the ETS binding on hTERT mutant promoter sites?

      (5) Stabilization of the G-quadruplex structures in ST and LT conditions along with the G4 disruption experimentation (demonstrated by the authors) will strengthen the hypothesis.

      (6) The telomere length and the telomerase activity are not very consistent (Figure 2A, and S1A, Figure 4B and S3). Please comment.

      (7) Please comment on the other telomere-associated proteins or regulatory pathways that might contribute to hTERT expression based on telomere length.

    3. Reviewer #2 (Public review):

      Summary:

      Telomeres are key genomic structures linked to everything from aging to cancer. These key structures at the end of chromosomes protect them from degradation during replication and rely on a complex made up of human telomerase RNA gene (hTERC) and human telomerase reverse transcriptase (hTERT). While hTERC is expressed in all cells, the amount of hTERT is tightly controlled. The main hypothesis being tested is whether telomere length itself could regulate the hTERT enzyme. The authors conducted several experiments with different methods to alter telomere length and measured the binding of key regulatory proteins to this gene. It was generally observed that the shortening of telomere length leads to the recruitment of factors that reduce hTERT expression and lengthening of telomeres has the opposite effect. To rule out direct chromatin looping between telomeres and hTERT as driving this effect, artificial constructs were designed and inserted a significant distance away, and similar results were obtained.

      Overall, the claims of telomere length-dependent regulation of hTERT are supported throughout the manuscript.

      Strengths:

      The paper has several important strengths. Firstly, it uses several methods and cell lines that consistently demonstrate the same directionality of the findings. Secondly, it builds on established findings in the field but still demonstrates how this mechanism is separate from that which has been observed. Specifically, designing and implementing luciferase assays in the CCR5 locus supports that direct chromatin looping isn't necessary to drive this effect with TRF2 binding. Another strength of this paper is that it has been built on a variety of other studies that have established principles such as G4-DNA in the hTERT locus and TRF2 binding to these G4 sites.

      Weaknesses:

      The largest technical weakness of the paper is that minimal replicates are used for each experiment. I understand that these kinds of experiments are quite costly, and many of the effects are quite large; however, experiments such as the flow cytometry or the iPSC telomere length and activity assays appear to be based on a single sample, and several are based upon two or at most three biological replicates. If samples were added, the main effects would likely hold, and many of the assays using GAPDH as a control would result in significant differences between the groups. This unnecessarily weakens the strength of the claims.

      Another detail that weakens the confidence in the claims is that throughout the manuscript there are several examples of the control group with zero variance between any of the samples: e.g. Figure 2K, Figure 3N, and Figure 6G. It is my understanding that a delta delta method has been used for calculation (though no exact formula is reported and would assist in understanding). If this is the case, then an average of the control group would be used to calculate that fold change and variance would exist in the group. The only way I could understand those control group samples always set to 1 is if a tube of cells was divided into conditions and therefore normalized to the control group in each case. A clearer description in the figure legend and methods would be required if this is what was done and repeated measures ANOVA and other statistics should accompany this.
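      To make the point about control-group variance concrete, the following minimal sketch contrasts two ways of computing fold change. It assumes the standard 2^-ΔΔCt calculation, which the manuscript is not confirmed to use since no formula is reported. All Ct values and function names are hypothetical. With the mean control ΔCt as the calibrator, individual control samples scatter around 1; with a paired design, in which each treated sample is calibrated to its own matched control, every control value is exactly 1 by construction.

```python
from statistics import mean, stdev

# Hypothetical Ct values (not from the manuscript): three biological replicates.
ctrl_target = [24.0, 24.5, 23.8]  # target gene, control group
ctrl_ref = [18.0, 18.2, 18.1]     # reference gene, control group
trt_target = [26.0, 26.4, 25.9]   # target gene, treated group
trt_ref = [18.1, 18.0, 18.2]      # reference gene, treated group


def delta_ct(target, ref):
    """Per-sample dCt = Ct(target) - Ct(reference)."""
    return [t - r for t, r in zip(target, ref)]


def fold_vs_control_mean(dct_ctrl, dct_trt):
    """2^-ddCt using the mean control dCt as the calibrator."""
    calibrator = mean(dct_ctrl)
    ctrl_fc = [2 ** -(d - calibrator) for d in dct_ctrl]
    trt_fc = [2 ** -(d - calibrator) for d in dct_trt]
    return ctrl_fc, trt_fc


def fold_paired(dct_ctrl, dct_trt):
    """2^-ddCt with each treated sample calibrated to its own paired control."""
    ctrl_fc = [1.0 for _ in dct_ctrl]  # exactly 1 by construction
    trt_fc = [2 ** -(t - c) for t, c in zip(dct_trt, dct_ctrl)]
    return ctrl_fc, trt_fc


dct_ctrl = delta_ct(ctrl_target, ctrl_ref)
dct_trt = delta_ct(trt_target, trt_ref)

mean_ctrl_fc, mean_trt_fc = fold_vs_control_mean(dct_ctrl, dct_trt)
pair_ctrl_fc, pair_trt_fc = fold_paired(dct_ctrl, dct_trt)

print("control SD, mean calibrator:", stdev(mean_ctrl_fc))
print("control SD, paired design:", stdev(pair_ctrl_fc))
```

      If the paired form is what was done, the zero-variance control bars follow directly from the normalization, and statistics that respect the pairing (for example a repeated-measures ANOVA, as suggested above) would be the natural accompaniment.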

      A final technical weakness of the paper is the data in Figure 5 where the modified hTERT promoter was inserted upstream of the luciferase gene. Specifically, it is unclear why data was not directly compared between the constructs that could and could not form G4s to make this point. For this reason, the large variance in several samples, and minimal biological replicates, this data was the least convincing in the manuscript (though other papers from this laboratory and others support the claim, it is not convincing standalone data).

      The second largest weakness of the paper is formatting.

      When I initially read the paper without a careful reading of the methods, I thought that the authors did not have appropriate controls, meaning that if a method is applied to lengthen, there should be one that is not lengthened, and when a method is applied to shorten, one which is not shortened should be analysed as well. In fact, this is what the authors have done with isogenic controls. However, describing all samples as either telomere short or telomere long, while it simplifies the writing and the colour scheme, makes it less clear that each experiment is performed relative to an unmodified control. I would suggest putting the isogenic control in one colour, the artificially shortened in another, and the artificially lengthened in another.

      Similarly, the graphs, in general, should be consistent with labelling. Figure 2 was the most confusing. I would suggest one dotted line with cell lines above it, and then the method of either elongation or shortening below it, i.e. HT1080 above, hTERC overexpression below; MDAMB-231 above, guanine terminal repeats below, as was done on the right. Figure 2 readability would also be improved by putting hTERT promoter GAPDH (-ve control) under each graph that uses this (Panel B and Panel C, not just Panel C). All information is contained in the manuscript, but one must currently flip between figure legends, methods, and figures to understand what was done, and this reduces clarity for the reader.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors in this study extensively investigate how telomere length (TL) regulates hTERT expression via non-telomeric binding of the telomere-associated protein TRF2. They conclusively show that TRF2 binding to long telomeres results in a reduction in its binding to the hTERT promoter. In contrast, short telomeres restore TRF2 binding in the hTERT promoter, recruiting repressor complexes like PRC2, and suppressing hTERT expression. The study presents several significant findings revealing a previously unknown mechanism of hTERT regulation by TRF2 in a TL-dependent manner.

      Strengths:

      (1) A previously unknown mechanism linking telomere length and hTERT regulation through the non-telomeric TRF2 protein has been established, strengthening our understanding of telomere biology.

      (2) The authors used both cancer cell lines and iPSCs to showcase their hypothesis and multiple parameters to validate the role of TRF2 in hTERT regulation.

      (3) Comprehensive integration of the recent literature findings and implementation in the current study.

      (4) In vivo validation of the findings.

      (5) Rigorous controls and well-designed assays have been used.

      Weaknesses:

      (1) The authors should comment on the cell proliferation and morphology of the engineered cell lines with ST or LT.

      The cell proliferation and morphology of the engineered cells were monitored during experiments. With a doubling time within 16-18 hours, all the cancer cell line pairs used in the study were counted and seeded equally before experiments.

      No significant difference in morphology or cell count (before harvesting for experiments) was noted for the stable cell lines, namely, HT1080 ST-HT1080 LT, HCT116 p53 null scrambled control-HCT116 p53 null hTERC knockdown.

      MDAMB 231 cells were treated with guanine-rich telomere repeats (GTR) over a period of 12 days, as per the protocol described in Methods. Because GTR treatment in serum-free media on alternate days was followed by replenishment with serum-supplemented media, the cells underwent a periodic delay in proliferation (or transient arrest) that aligned with the GTR oligo-feeding cycles, and they appeared somewhat larger than their untreated parental cells.

      Next, the cells with Cas9-telomeric sgRNA-mediated telomere trimming were maintained transiently (until 3 days after transfection). During this time, no significant change in morphology or cell proliferation was observed in any of the cell lines, namely HCT116 or HEK293T Gaussia Luciferase reporter cells. iPSCs were also monitored; no change in morphology or cellular proliferation was observed during the 5 days post-transfection and antibiotic selection.

      (2) Also, the entire study uses engineered cell lines with artificially elongated or shortened telomeres, which conclusively demonstrate the role of hTERT regulation by TRF2 in a telomere length-dependent manner, but using ALT-negative cell lines with naturally short telomeres versus those with long telomeres would give a better perspective. Primary cells can also be used in this context.

      The reviewer correctly highlights (as we also acknowledge in the Discussion) that our study primarily utilizes engineered cell lines with artificially elongated or shortened telomeres. We agree that using ALT-negative cells with naturally short versus long telomeres would provide additional perspective in testing our hypothesis. However, a key challenge in this experimental setup is the inherent variation in TRF2 protein levels among these cell types—a parameter central to our hypothesis. Comparing observations across such non-isogenic cell line pairs would require extensive normalization for multiple factors and could introduce additional complexities, potentially raising more questions among scientific readers.

      We had also explored primary cells, specifically foreskin fibroblasts and MRC5 lung fibroblasts, as suggested by the reviewer. However, we encountered two significant challenges. To achieve a notable telomere length difference of at least 20%, these primary cells had to undergo a minimum of 25 passages. During this period, we observed a substantial decline in their proliferation capacity and an increased tendency toward replicative senescence. Additionally, we noted a significant reduction in TRF2 protein levels as the primary cells aged, consistent with findings from Fujita K et al., 2010 (Nat Cell Biol.), which reported p53-induced, Siah-1-mediated proteasomal degradation of TRF2. Due to these practical limitations, we focused on cancerous cell lines with an isogenic background, ensuring a controlled experimental framework. This, in turn, opens new avenues for future research to explore broader implications. Investigating other primary cell types that may not present these challenges could be a valuable direction for future studies.

      (3) The authors set up time-dependent telomere length changes by dox induction, which may differ from the gradual telomere attrition or elongation that occurs naturally during aging, disease progression, or therapy. This aspect should be explored.

      In this study, we utilized a Doxycycline-inducible hTERT expression system to modulate telomere length in cancer cells, aiming to capture any gradual changes that might occur upon steady telomerase induction or overexpression—an event frequently observed in cancer progression. We monitored telomere length and telomerase activity at regular intervals (Supplementary Figure 2), noting a gradual increase until a characteristic threshold was reached, followed by a reversal to the initial telomere length.

      While this model provides interesting insights in context of cancer cells, it does not replicate the conditions of aging or therapeutic intervention. We agree that exploring telomere length-dependent regulation of hTERT in normal aging cells is an important avenue for future research. Investigating TRF2 occupancy on the hTERT promoter in response to telomere length alterations through therapeutic interventions—such as telomestatin or imetelstat (telomerase inhibitors) and 6-thio-2’-deoxyguanosine (telomere damage inducer)—would provide valuable insights and warrants further exploration.

      (4) How does the hTERT regulation by TRF2 in a TL-dependent manner affect the ETS binding on hTERT mutant promoter sites?

      In our previous study (Sharma et al., 2021, Cell Reports), we have experimentally demonstrated that GABPA and TRF2 do not compete for binding at the mutant hTERT promoter (Figure 4M-R). Silencing GABPA in various mutant hTERT promoter cells did not increase TRF2 binding. While GABPA has been reported to show increased binding at the mutant promoter compared to the wild-type (Bell et al., 2015, Science), no telomere length (TL) sensitivity has been noted yet. This manuscript shows that telomere alterations in hTERT mutant cells do not significantly increase TRF2 occupancy at the promoter, reinforcing our earlier findings that G-quadruplex formation is crucial for TRF2 recruitment. Since TRF2 binding does not increase significantly at the mutant promoter and does not compete with GABPA, TL-sensitive TRF2 binding is unlikely to directly influence ETS binding by GABPA. Hence, increased GABPA binding to the mutant promoter as reported in the literature, remains independent of TL-sensitive TRF2 binding. However, an experimental demonstration of the above observation-based speculation would be ideal to answer the query in the future.

      (5) Stabilization of the G-quadruplex structures in ST and LT conditions along with the G4 disruption experimentation (demonstrated by the authors) will strengthen the hypothesis.

      We agree with the reviewer’s suggestion that stabilizing G-quadruplex (G4) structures in mutant promoter cells under ST and LT conditions would further strengthen our hypothesis. From our ChIP experiments on hTERT promoter mutant cells following G4 stabilization with ligands, as reported in Sharma et al. 2021 (Figure 5G), we observed that TRF2 occupancy was regained in the telomere-length unaltered versions of -124G>A and -146G>A HEK293T Gaussia luciferase cells (referred to as LT cells in the current manuscript).

      Based on these published findings, we anticipate a similar restoration of TRF2 binding in the short telomere (ST) versions, given the increased availability of TRF2 protein molecules, as proposed in our Telomere Sequestration Partitioning model.

      (6) The telomere length and the telomerase activity are not very consistent (Figure 2A, and S1A, Figure 4B and S3). Please comment.

      In this study, we employed both telomerase-dependent and independent methods for telomere elongation.

      HT1080 model: Telomere elongation resulted from constitutive overexpression of hTERC and hTERT, leading to a direct correlation with telomerase activity.

      HCT116 (p53-null) model: Silencing of hTERC, a known limiting factor for telomerase activity, in ST cells resulted in significantly lower telomerase activity and a 1.5-fold telomere length difference.

      MDAMB231 model: Guanine-rich telomeric repeat (GTR) feeding induced telomere elongation through recombinatorial mechanisms (Wright et al., 1996), leading to significant telomere length gain but no notable change in telomerase activity.

      HCT116 Cas9-telomeric sgRNA model: Telomere shortening occurred without modifying telomerase components, resulting in a minor, insignificant increase in telomerase activity (Figure 2A, S1).

      Regarding xenograft-derived HT1080 ST and LT cells (Figure 4B, S3), the observed variability in telomere length and telomerase activity may stem from infiltrating mouse cells, which naturally have longer telomeres and higher telomerase activity than human cells. In the reported assay, tumour masses were not sorted to exclude mouse cells; using species-specific markers or fluorescently labelled HT1080 cells in future experiments would minimize this bias. However, even though the telomere length and telomerase activity assays cannot distinguish between human and mouse cells, the mRNA analysis of hTERT and hTERC levels and the ChIP experiments for TRF2 occupancy and H3K27me3 enrichment on the hTERT promoter (Figure 4B–E) strongly support our conclusions.

      (7) Please comment on the other telomere-associated proteins or regulatory pathways that might contribute to hTERT expression based on telomere length.

      The current study provides experimental evidence that TRF2, a well-characterized telomere-binding protein, mediates crosstalk between telomeres and the regulatory region of the hTERT gene in a telomere length-dependent manner. Given the observed link between hTERT expression and telomere length, it is likely that additional telomere-associated proteins and regulatory pathways contribute to this regulation.

      The remaining shelterin complex components—POT1, hRap1, TRF1, TIN2, and TPP1—may play crucial roles in this context, as they are integral to telomere maintenance and protection. Additionally, several DNA damage response (DDR) proteins, which interact with telomere-binding factors and help preserve telomere integrity, could potentially influence hTERT regulation in a telomere length-dependent manner. However, direct interactions or regulatory roles would require further experimental validation. Another group of proteins with potential relevance in this mechanism are the sirtuins, which directly associate with telomeres and are known to positively regulate telomere length, undergoing repression upon telomere shortening. Notably, SIRT1 has been reported to interact with telomerase (Lee SE et al., 2024, Biochem Biophys Res Commun.), while SIRT6 has been implicated in TRF2 degradation and telomerase activation. Given their roles in telomere homeostasis, sirtuins may serve as key mediators of telomere length-dependent hTERT regulation.

      Beyond protein-mediated mechanisms like the Telomere Sequestration Partitioning model, telomere length-dependent regulation of hTERT may also involve chromatin architecture. The Telomere Position Effect—Over Long Distances (TPE-OLD), a phenomenon whereby telomere conformation influences gene expression at distant loci, has been reviewed extensively (Kim et al., 2018, Differentiation).

      Reviewer #2 (Public review):

      Summary:

      Telomeres are key genomic structures linked to everything from aging to cancer. These key structures at the end of chromosomes protect them from degradation during replication and rely on a complex made up of human telomerase RNA gene (hTERC) and human telomerase reverse transcriptase (hTERT). While hTERC is expressed in all cells, the amount of hTERT is tightly controlled. The main hypothesis being tested is whether telomere length itself could regulate the hTERT enzyme. The authors conducted several experiments with different methods to alter telomere length and measured the binding of key regulatory proteins to this gene. It was generally observed that the shortening of telomere length leads to the recruitment of factors that reduce hTERT expression and lengthening of telomeres has the opposite effect. To rule out direct chromatin looping between telomeres and hTERT as driving this effect, artificial constructs were designed and inserted a significant distance away, and similar results were obtained.

      Overall, the claims of telomere length-dependent regulation of hTERT are supported throughout the manuscript.

      Strengths:

      The paper has several important strengths. Firstly, it uses several methods and cell lines that consistently demonstrate the same directionality of the findings. Secondly, it builds on established findings in the field but still demonstrates how this mechanism is separate from that which has been observed. Specifically, designing and implementing luciferase assays in the CCR5 locus supports that direct chromatin looping isn't necessary to drive this effect with TRF2 binding. Another strength of this paper is that it has been built on a variety of other studies that have established principles such as G4-DNA in the hTERT locus and TRF2 binding to these G4 sites.

      Weaknesses:

      The largest technical weakness of the paper is that minimal replicates are used for each experiment. I understand that these kinds of experiments are quite costly, and many of the effects are quite large; however, experiments such as the flow cytometry or the iPSC telomere length and activity assays appear to be based on a single sample, and several are based upon two or at most three biological replicates. If samples were added the main effects would likely hold, and many of the assays using GAPDH as a control would result in significant differences between the groups. This unnecessarily weakens the strength of the claims.

      We appreciate the reviewer’s recognition of the resource-intensive nature of our experiments, and we are confident in the robustness of the observed results. Due to the project’s timeline constraints and the need for consistency across experiments, we have reported findings based on 3 biological replicates with appropriate statistical analysis.

      Regarding the fibroblast-iPSC model, we would like to clarify that we have presented data from two independent biological replicates, each consisting of a fibroblast and its derived iPS cell pair, rather than a single sample. Additionally, the Tel-FACS assay involved analyzing at least 10,000 events, ensuring statistical significance in all cases. Alongside this, we also conducted qRT-PCR-based telomere length determination assays. While both assays were performed, we chose to report the more sensitive Tel-FACS data in the manuscript to provide a clearer representation of the results.

      Another detail that weakens the confidence in the claims is that throughout the manuscript there are several examples of the control group with zero variance between any of the samples: e.g. Figure 2K, Figure 3N, and Figure 6G. It is my understanding that a delta delta method has been used for calculation (though no exact formula is reported and would assist in understanding). If this is the case, then an average of the control group would be used to calculate that fold change and variance would exist in the group. The only way I could understand those control group samples always set to 1 is if a tube of cells was divided into conditions and therefore normalized to the control group in each case. A clearer description in the figure legend and methods would be required if this is what was done and repeated measures ANOVA and other statistics should accompany this.

      We thank the reviewer for their valuable feedback. In response to the comment about the control group and error calculation, we would like to clarify our approach. In our previous analysis, we set the control group (Day 0) as 1 to calculate the fold change and did not include error bars, as there was no variation in the control group (since all values were normalized to 1). However, as per the reviewer’s suggestion, we will now include error bars on the Day 0 control group. These error bars will be calculated based on the standard deviation (SD) of the Ct values across the biological replicates for the control group. For the Day 10 and Day 24 time points, we retain the error bars that reflect the variance in fold change across replicates, as originally reported.

      This adjustment would allow for a clearer representation of the data and variance in the control group. We believe this addresses the reviewer’s concerns about the error calculation, and we shall update the figure legend and methods to reflect these changes. Statistical analysis, including ANOVA, was already applied as indicated in the figure.
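
      The point about control-group variance can be illustrated with a minimal sketch. It assumes the standard 2^-ΔΔCt calculation, which the reviewer notes is not spelled out in the manuscript; every Ct value, variable name, and the helper `fold_changes` below is hypothetical. Calibrating against the mean control ΔCt leaves spread among the control replicates, whereas pairing each control sample with itself forces every control value to exactly 1.

```python
import statistics


def fold_changes(target_ct, reference_ct, calibrator_dct):
    """Return 2^-ddCt per replicate, relative to one calibrator dCt value."""
    return [
        2 ** -((t - r) - calibrator_dct)
        for t, r in zip(target_ct, reference_ct)
    ]


# Hypothetical Ct values for three biological replicates (illustration only).
control_target = [24.1, 24.6, 23.8]   # gene of interest, control group
control_ref = [18.0, 18.2, 17.9]      # reference gene, control group
treated_target = [22.0, 22.7, 21.9]   # gene of interest, treated group
treated_ref = [18.1, 18.0, 18.0]      # reference gene, treated group

# Calibrator = mean dCt of the control group.
control_dct = [t - r for t, r in zip(control_target, control_ref)]
calibrator = statistics.mean(control_dct)

# Control replicates scatter around 1 because each differs from the mean.
control_fc = fold_changes(control_target, control_ref, calibrator)
treated_fc = fold_changes(treated_target, treated_ref, calibrator)

# Per-replicate pairing: each control is its own calibrator, so ddCt is 0
# and every control fold change is exactly 1 (no variance to plot).
paired_control_fc = [2 ** -(d - d) for d in control_dct]

print("control fold changes:", control_fc)
print("control SD:", statistics.stdev(control_fc))
print("paired control fold changes:", paired_control_fc)
```

      Under these assumptions the control group only shows zero variance in the paired design, which is the case in which the reviewer asks for a repeated-measures analysis.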

      A final technical weakness of the paper is the data in Figure 5 where the modified hTERT promoter was inserted upstream of the luciferase gene. Specifically, it is unclear why data was not directly compared between the constructs that could and could not form G4s to make this point. Because of this, the large variance in several samples, and the minimal biological replicates, this data was the least convincing in the manuscript (though other papers from this laboratory and others support the claim, it is not convincing standalone data).

      We appreciate the reviewer's thoughtful feedback on the presentation of the luciferase assay data in Figure 5. The data for the wild-type hTERT promoter (capable of forming G4 structures) was previously reported in Figure 2G-K. To avoid redundancy in data presentation, we initially chose to report the results of the mutated promoter separately. However, we recognize that directly comparing the wild-type and mutated promoter constructs within the same figure would provide clearer context and strengthen the interpretation of the results. In light of this, we will revise Figure 5 in the updated manuscript to include the data for both constructs, ensuring a more comprehensive and informative comparison.

      The second largest weakness of the paper is formatting.

      When I initially read the paper without a careful reading of the methods, I thought that the authors did not have appropriate controls, meaning that if a method is applied to lengthen, there should be one that is not lengthened, and when a method is applied to shorten, one which is not shortened should be analysed as well. In fact, this is what the authors have done with isogenic controls. However, describing all samples as either telomere short or telomere long, while it simplifies the writing and the colour scheme, makes it less clear that each experiment is performed relative to an unmodified control. I would suggest putting the isogenic control in one colour, the artificially shortened in another, and the artificially lengthened in another.

      Similarly, the graphs, in general, should be consistent with labelling. Figure 2 was the most confusing. I would suggest one dotted line with cell lines above it, and then the method of either elongation or shortening below it, i.e. HT1080 above, hTERC overexpression below; MDA-MB-231 above, guanine terminal repeats below, as was done on the right. Figure 2 readability would also be improved by putting hTERT promoter GAPDH (-ve control) under each graph that uses this (Panel B and Panel C, not just Panel C). All information is contained in the manuscript, but one must currently flip between figure legends, methods, and figures to understand what was done, and this reduces clarity for the reader.

      We sincerely thank the reviewer for their constructive feedback on the formatting and clarity of the figures. We appreciate the time and effort taken to suggest ways to enhance the visual presentation and readability of the manuscript. We agree that clearer differentiation of the experimental groups would help avoid confusion, and we will consider ways to improve the visual organization, as much as possible. Additionally, we will work on restructuring the graphs for greater consistency in labeling and alignment, especially in Figure 2, to improve readability and reduce the need for cross-referencing between the figures, figure legends, and methods section. We will also ensure the hTERT promoter GAPDH (-ve control) label appears under all relevant graphs for consistency. We will make revisions to the figures in line with these suggestions to improve the overall clarity and flow of the manuscript, as much as possible.

    1. eLife Assessment

      This study provides an important method to model the statistical biases of hypermutations during the affinity maturation of antibodies. The authors show convincingly that their model outperforms previous methods with fewer parameters; this is made possible by the use of machine learning to expand the context dependence of the mutation bias. They also show that models learned from nonsynonymous mutations and from out-of-frame sequences are different, prompting new questions about germinal center function. Strengths of the study include an open-access tool for using the model, a careful curation of existing datasets, and a rigorous benchmark; it is also shown that machine-learning methods are currently limited by the availability of data, which explains the only modest gain in model performance afforded by modern machine learning.

    2. Reviewer #1 (Public review):

      Summary:

      This paper introduces a new class of machine learning models for capturing how likely a specific nucleotide in a rearranged IG gene is to undergo somatic hypermutation. These models modestly outperform existing state-of-the-art efforts, despite having fewer free parameters. A surprising finding is that models trained on all mutations from non-functional rearrangements give divergent results from those trained on only silent mutations from functional rearrangements.

      Strengths:

      (1) The new model structure is quite clever and will provide a powerful way to explore larger models.

      (2) Careful attention is paid to curating and processing large existing data sets.

      (3) The authors are to be commended for their efforts to communicate with the developers of previous models and use the strongest possible versions of those in their current evaluation.

      Weaknesses:

      (1) 10x/single cell data has a fairly different error profile compared to bulk data. A synonymous model should be built from the same `briney` dataset as the base model to validate the difference between the two types of training data.

      (3) The decision to test only kernels of 7, 9, and 11 is not described. The selection/optimization of embedding size is not explained. The filters listed in Table 1 are not defined.

    3. Reviewer #2 (Public review):

      This work offers an insightful contribution for researchers in computational biology, immunology, and machine learning. By employing a 3-mer embedding and CNN architecture, the authors demonstrate that it is possible to extend sequence context without exponentially increasing the model's complexity.

      Key findings include:

      (1) Efficiency and Performance: Thrifty CNNs outperform traditional 5-mer models and match the performance of significantly larger models like DeepSHM.

      (2) Neutral Mutation Data: A distinction is made between using synonymous mutations and out-of-frame sequences for model training, with evidence suggesting these methods capture different aspects of SHM, or different biases in the type of data.

      (3) Open Source Contributions: The release of a Python package and pre-trained models adds practical value for the community.

      However, readers should be aware of the limitations. The improvements over existing models are modest, and the work is constrained by the availability of high-quality out-of-frame sequence data. The study also highlights that more complex modeling techniques, like transformers, did not enhance predictive performance, which underscores the role of data availability in such studies.

    4. Reviewer #3 (Public review):

      Summary:

      Modeling and estimating sequence context biases during B cell somatic hypermutation is important for accurately modeling B cell evolution to better understand responses to infection and vaccination. Sung et al. introduce new statistical models that capture a wider sequence context of somatic hypermutation with a comparatively small number of additional parameters. They demonstrate their model's performance with rigorous testing across multiple subjects and datasets. Prior work has captured the mutation biases of fixed 3-, 5-, and 7-mers, but each of these expansions has significantly more parameters. The authors developed a machine-learning-based approach to learn these biases using wider contexts with comparatively few parameters.
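
      The parameter argument above can be made concrete with a rough count. This is a sketch under stated assumptions, not the authors' architecture: the embedding size, filter count, single convolutional layer, and linear readout are hypothetical choices, and the function names are invented for illustration. The only figures taken from the reviews are the fixed k-mer sizes and the kernel widths of 7, 9, and 11.

```python
def kmer_table_params(k: int) -> int:
    """A fixed k-mer model stores one rate per motif: 4**k parameters."""
    return 4 ** k


def embedding_cnn_params(embed_dim: int, kernel_size: int, n_filters: int) -> int:
    """Count parameters of a hypothetical 3-mer embedding + 1D CNN model."""
    embedding = 4 ** 3 * embed_dim                          # one vector per 3-mer
    conv = kernel_size * embed_dim * n_filters + n_filters  # weights + biases
    readout = n_filters + 1                                 # linear map to a rate
    return embedding + conv + readout


def context_width(kernel_size: int) -> int:
    """Nucleotides seen by a kernel over overlapping, stride-1 3-mers."""
    return kernel_size + 2


for k in (3, 5, 7):
    print(f"{k}-mer table: {kmer_table_params(k)} parameters")

# Hypothetical hyperparameters, chosen only to show the scaling.
for kernel in (7, 9, 11):
    n = embedding_cnn_params(embed_dim=7, kernel_size=kernel, n_filters=16)
    print(f"kernel {kernel}: {n} parameters, {context_width(kernel)} nt context")
```

      With these made-up settings the count grows linearly in the kernel width, while a lookup table over the same context would grow as a power of four.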

      Strengths:

      Well-motivated and defined problem. Clever solution to expand nucleotide context. Complete separation of training and test data by using different subjects for training vs testing. Release of open-source tools and scripts for reproducibility.

      Weaknesses:

      This study could be improved with better descriptions of dataset sequencing technology, sequencing depth, etc., but this is a minor weakness.

    1. eLife Assessment

      Using an unbiased approach, this important study discovered a role of Ezh2 in the differentiation of granule neuron precursors, the cell of origin for the Shh group of medulloblastoma. Furthermore, the authors also provided solid evidence that combined inhibition of Ezh2 and CDK4/6 likely represents a promising strategy for the treatment of this subgroup of MB. Validation of these findings using the FDA-approved Ezh2 inhibitor is needed to further strengthen this preclinical study.

    2. Reviewer #1 (Public review):

      In this manuscript, Purzner and colleagues examine the role of Ezh2 in cerebellar development and tumorigenesis using animal models of SHH medulloblastoma (MB). While Ezh2 plays a relatively minor role in granule neuron development and SHH MB, the authors demonstrate that Ezh2 inhibition, when combined with enforced cell cycle exit, promotes MB cell differentiation and potentially reduces malignancy. Overall, this study is solid and provides valuable insights into Ezh2 regulation in cerebellar development and SHH-MB tumorigenesis.

      Strengths:

      The authors investigate the role of Ezh2 in granule neuronal differentiation during cerebellar development and medulloblastoma (MB) progression, integrating multi-omics for a comprehensive epigenetic analysis. The use of Ezh2 conditional knockout (cKO) mice and combination therapy with Ezh2 and CDK4/6 inhibitors shows a promising strategy to induce terminal differentiation in MB cells, with potential therapeutic implications. Additionally, analysis of human SHH-MB samples reveals that higher EZH2 expression correlates with worse survival, indicating the clinical relevance.

      Weaknesses:

      The study does not fully explore compensatory mechanisms of PRC2 given that the phenotype of Ezh2 conditional knockout (cKO) in GNP development and MB tumor formation is relatively mild.

    3. Reviewer #2 (Public review):

      Summary:

      This study used an unbiased approach to evaluate epigenetic dynamics during the differentiation of granule neuron precursors, the cell of origin for Shh-MB. These profiling findings led to the focus on H3K27me3 dynamics, which correlate with the remodeling of epigenetic landscape associated with neuronal differentiation gene activation.

      Strengths:

      Depletion of EZH2, an enzymatic subunit of PRC2, resulted in premature neuronal differentiation in the developing cerebellum.

      Weaknesses:

      Little information is shown about the specific genetic programs disrupted by EZH2 depletion. This is a crucial weakness as existing PRC2 inhibitors do not effectively cross the blood-brain barrier. Further studies are necessary to identify downstream targets of PRC2 that could be targeted to induce neuronal differentiation in MB cells.

    1. eLife Assessment

      This serostudy of blood donors in Bolivia (a country with very high COVID death rates in 2020-21) provides useful insights on the successive viral variants of SARS-CoV-2 over 2021 and 2022. Using compelling antibody and neutralization assays, the authors describe variant-specific distributions in the different parts of Bolivia. The main methodological advance is to use serology to understand variant diversity, which in turn helps deepen understanding of "hybrid" immunity from widespread infection (and vaccination).

    2. Reviewer #1 (Public review):

      Summary:

      This study provides valuable and comprehensive information about the SARS-CoV-2 seroprevalence during 2021 and 2022 in different regions of Bolivia. Moreover, data on immune responses against the SARS-CoV-2 variants based on neutralization tests denote the presence of several virus variants circulating in the Bolivian population. Evidence for seroprevalence data provided by the authors is solid across the study period, while data regarding variant circulation is limited to the early stages of the pandemic.

      Strengths:

      The major strength of this study is that it provided nationwide seroprevalence estimates from infection and/or vaccination based on antibodies against both spike and the nucleocapsid protein in a large representative sample of sera collected at two time points from all departments of Bolivia, gaining insight into COVID-19 epidemiology. On the other hand, data from virus neutralization assays inferred the circulation during the study period of four SARS-CoV-2 variants in the population. Overall, the study results provide an overview of the level of viral transmission and vaccination and insights into the spread across the country of SARS-CoV-2 variants.

      Weaknesses:

      The assessment of a Lambda variant that circulated in several neighboring countries (Peru, Chile, and Argentina), which had a significant impact on the COVID-19 pandemic in the region, may have strengthened the study to contrast Gamma spread. In addition, even though neutralizing antibodies can certainly reveal previous infections of SARS-CoV-2 variants in the population, it is of limited value to infer from this information some potential timing estimates of specific variant circulation, considering the heterogeneous effects that past infections, vaccinations, or a combination of both could have on the level of variant-specific neutralizing antibodies and/or their cross-neutralization capacity.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      The conclusions of this paper are well supported by data, particularly regarding seroprevalence that reliably reflects the epidemiology of COVID-19 in Bolivia, and seroprevalence trends in other low- and middle-income countries.

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community.

      Since this is the first study that has been conducted to assess indicators of immunity against SARS-CoV-2 in the population of Bolivia at a nationwide scale, seroprevalence data provided by geographic regions at two time points can be useful as a reference for potential retrospective global meta-analysis and to further explore and compare the risk factors for infection, variant distribution, and the impact on infection and vaccination, gaining deeper insights into understanding the evolution of the COVID-19 pandemic in Bolivia and in the region.

    3. Reviewer #3 (Public review):

      Summary:

      This study attempts to reconstruct the history of the COVID-19 epidemic, with its successive waves of viral variants, from SARS-CoV-2 seroprevalence during 2021 and 2022 among blood donors in different regions of Bolivia. By using serological tests "specific" for the various variants, the authors try to achieve a "colour" vision that is not provided by standard "black-and-white" serology.

      Strengths and Weaknesses:

      I am not an expert on the performance of SARS-CoV-2 serological tests, so may overlook certain weaknesses. Instead I tried to assess whether the authors, in this manuscript, have managed to substantiate their claims that "seroprevalence studies are a valuable adjunct to active surveillance because they allow analysis of the level of immunity of a population to a specific pathogen without the need for prospective testing", and that "genomic surveillance and serology offer distinct yet complementary insights thus far." I think they succeeded, as they paint a credible and interesting history of the epidemic in Bolivia using (to me) novel methodology that certainly will stimulate extensive discussion, controversies, and follow-up studies (for which the authors might make some suggestions).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study provides valuable and comprehensive information about the SARS-CoV-2 seroprevalence during 2021 and 2022 in different regions of Bolivia. Moreover, data on immune responses against the SARS-CoV-2 variants based on neutralization tests denote the presence of several virus variants circulating in the Bolivian population. Evidence for seroprevalence data provided by the authors is solid across the study period, while data regarding variant circulation is limited to the early stages of the pandemic.

      Strengths:

      The major strength of this study is that it provided nationwide seroprevalence estimates from infection and/or vaccination based on antibodies against both spike and the nucleocapsid protein in a large representative sample of sera collected at two time-points from all departments of Bolivia, gaining insight into COVID-19 epidemiology. On the other hand, data from virus neutralization assays inferred the circulation during the study period of four SARS-CoV-2 variants in the population. Overall, the study results provide an overview of the level of viral transmission and vaccination and insights into the spread across the country of SARS-CoV-2 variants.

      Weaknesses:

      The assessment of a Lambda variant that circulated in several neighboring countries (Peru, Chile, and Argentina), which had a significant impact on the COVID-19 pandemic in the region, may have strengthened the study to contrast Gamma spread. In addition, even though neutralizing antibodies can certainly reveal previous infections of SARS-CoV-2 variants in the population, it is of limited value to infer from this information some potential timing estimates of specific variant circulation, considering the heterogeneous effects that past infections, vaccinations, or a combination of both could have on the level of variant-specific neutralizing antibodies and/or their cross-neutralization capacity.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The conclusions of this paper are well supported by data, particularly regarding seroprevalence that reliably reflects the epidemiology of COVID-19 in Bolivia, and seroprevalence trends in other low- and middle-income countries.

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community:

      Since this is the first study that has been conducted to assess indicators of immunity against SARS-CoV-2 in the population of Bolivia at a nationwide scale, seroprevalence data provided by geographic regions at two time-points can be useful as a reference for potential retrospective global meta-analysis and to further explore and compare the risk factors for infection, variant distribution, and the impact on infection and vaccination, gaining deeper insights into understanding the evolution of the COVID-19 pandemic in Bolivia and in the region.

      Reviewer #2 (Public Review):

      Significance of the findings:

      In this study, blood donors were assessed using serology and viral neutralization assays to determine the prevalence of SARS-CoV-2 antibodies. S1 and NCP antibodies were used to distinguish between vaccination and natural infection and virus-specific neut titers were used to determine which variants the antibodies respond to. The study reports almost universal antibody prevalence and increases in antibodies against specific variants at different points corresponding to circulating variants identified phylogenetically in neighbouring countries. The authors propose this approach for settings like Bolivia where genetic sequencing is not readily available. Unfortunately, there are significant limitations to this approach that limit its utility - serological data are available after the fact in a fast-moving pandemic and so are a poor alternative to phylogenetic data. Rather, serological information can supplement phylogenetic data and is most useful in estimating population-level immunity.

      (1) Considerations in interpreting the results:

      We appreciate the reviewer's valuable feedback, which will certainly enhance the quality of our manuscript. As a result, we have revised the text to address their suggestions as thoroughly as possible.

      a. Serology provides different information to phylogenetic sequencing of the viruses and so both are important. Viral sequencing provides real-time information on circulating variants and indicates the proportion of each variant in circulation at any point as there are almost always multiple variants spreading but it is the fastest spreading variant that comes to dominate. Importantly serology measures asymptomatic infections as well, providing population estimates of infection that are not available through viral gene sequencing.

      We underscored this point in the introduction by incorporating the following sentences:

      “Seroprevalence studies are a valuable adjunct to active surveillance because they allow analysis of the level of immunity of a population to a specific pathogen without the need for prospective testing, and also provide information on the frequency of cases that do not attract medical attention (asymptomatic infections)(4).” and “To date, the circulation of SARS-CoV-2 variants has mainly been studied through molecular surveillance, giving the proportion of circulating variants in real time. Therefore, genomic surveillance and serology offer distinct yet complementary insights thus far.”

      b. A major concern in the interpretation of serology is that antibody titers vary markedly over time with rapid declines in the first year post-infection or post-vaccination. However, these declines vary depending on whether hybrid immunity is present. Disentangling this retrospectively is a challenge. A low antibody titer could reflect an infection that occurred a few months ago but may be below the threshold for positivity at the time of testing. There is also substantial individual variability in antibody responses.

      This limitation merits emphasis and has consequently been elaborated upon in the discussion section:

      “Secondly, our results are based on serological data and may not be strictly identical to the genomic data from a quantitative point of view, although they are likely to reflect similar trends and distributions (see below). The results could also be influenced by various factors, including significant individual variation in antibody responses, as well as the decline in antibody titers during the first months following infection or vaccination(31-34), and could therefore be slightly underestimated. As the complexity of SARS-CoV-2 antigen exposure histories increased among tested individuals, we observed a tendency for serological data to start diverging from genomic data. This suggests, as expected, that the effectiveness of this method would be greater if implemented early in an epidemic when the occurrence of multiple infections with different variants or the administration of varying doses of vaccine in the analyzed population before or after infection (resulting in hybrid immunity) is still limited. However, to mitigate the potential challenges arising from complex antigen exposure, we employed straightforward criteria to identify the variant among the four tested in VNT that exhibited the highest value (cf. Methods), thereby likely indicating the main or most recent infection and minimizing the influence of cross-neutralization on the final outcomes. In addition, several approaches were used to analyze the results, including quantification of circulating antigenic groups and individual variants, yielding results that were comparable and closely aligned with the genomic data.”

      c. Serology becomes increasingly difficult to untangle when an individual has had doses of vaccine and multiple natural infections with different variants. Due to the importance of hybrid immunity in population risk to new variants, it would be useful for estimates of hybrid immunity to be generated based on anti-S1 and anti-NCP antibodies. From a population immunity perspective, this could be important in guiding future protection and boosting strategies.

      We estimated the hybrid immunity for each department in 2021 and 2022 based on the prevalence of anti-S1 and anti-NCP antibodies and added a new Supplementary Table 1. We also added a description of this table in the result section: “The estimated hybrid immunity, based on the prevalence of anti-S1 and anti-NCP antibodies, ranged from 51.4% in Pando to 73.6% in Potosí in 2021. By 2022, this increased to between 83.3% in Santa Cruz and 90.6% in Tarija (Supplementary Table 1).”

      d. Since there is cross-neutralization by the antibodies stimulated by each variant, it is important to establish the sensitivity and specificity of each of the neutralization assays in a panel comprising multiple variants. An assessment of the accuracy of the neut assay for each variant is needed to be confident that it is able to distinguish between variants.

      Assessing the performance of the VNT for each SARS-CoV-2 variant is a highly complex task. This evaluation requires samples with comprehensive data on vaccination and infection specific to each variant to determine the specificity of each VNT for each variant. However, access to such samples for every newly emerging variant remains challenging. In order to circumvent this issue, we evaluated the circulation level of γ, δ, and ο variants under increasingly stringent conditions, by calculating the proportion of the population with log2-ratio values of ≤0 (variant titer equal to or greater than D614G), ≤-1 (variant titer at least twice that of D614G), and ≤-2 (variant titer at least four times that of D614G).
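
      The stringency thresholds described above can be sketched as follows. The direction of the ratio, log2(D614G titer / variant titer), is inferred from the parenthetical definitions in the response rather than stated as a formula, and the titers and the function name `proportion_at_or_below` are hypothetical.

```python
import math


def proportion_at_or_below(d614g_titers, variant_titers, threshold):
    """Share of samples whose log2(D614G / variant) ratio is <= threshold."""
    ratios = [
        math.log2(d / v) for d, v in zip(d614g_titers, variant_titers)
    ]
    return sum(r <= threshold for r in ratios) / len(ratios)


# Hypothetical neutralization titers for five donors (illustration only).
d614g = [80, 40, 160, 20, 80]
variant = [40, 40, 320, 160, 80]

# 0: variant titer >= D614G; -1: at least 2x D614G; -2: at least 4x D614G.
for threshold in (0, -1, -2):
    share = proportion_at_or_below(d614g, variant, threshold)
    print(f"log2-ratio <= {threshold}: {share:.0%} of samples")
```

      Each tighter threshold keeps only samples in which the variant titer exceeds the D614G titer by a larger factor, so the estimated proportion can only stay equal or fall as the criterion becomes more stringent.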

      e. Blood donors are notoriously poor representations of the general population in many countries, driven partly by whether donation is financially rewarded. For example, in the USA, drug addicts are disproportionately over-represented in blood donor populations as they use it as a source of money. The authors provide no information on whether the blood donor population in Bolivia is representative of the entire population. Comparison of the prevalence of specific disease markers in the general population and in blood donors could provide a signal of their comparability.

      This is a significant aspect addressed in point 3.

      (2) Please provide the sensitivity and specificity of each of the assays so that the reader can assess the degree of accuracy in the assay that claims that the prevalent antibodies are due to, for example, omicron.

      The sensitivity and specificity of the in vitro assays are now referenced in a previous study: “The sensitivity and specificity of the in vitro assays were described previously(23).”

      Neutralization assays are considered the gold standard for measuring neutralizing antibodies against SARS-CoV-2 and its variants, and they are widely used in seroprevalence studies. However, until now, no one has successfully evaluated the specificity and sensitivity of this assay for SARS-CoV-2 variants, as it requires sera from individuals exposed to a single variant, which are increasingly difficult to collect for each newly emerging variant. Nevertheless, using sera from laboratory-infected animals (primarily hamsters) with a single variant exposure has enabled the antigenic characterization of SARS-CoV-2 variants through viral neutralization. This approach has shown that it is possible to distinguish between sera from individuals infected with different variants, even among the Omicron subvariants (Anna Z. Mykytyn et al. Antigenic cartography of SARS-CoV-2 reveals that Omicron BA.1 and BA.2 are antigenically distinct. Sci. Immunol. 7, eabq4450 (2022); Samuel H. Wilks et al. Mapping SARS-CoV-2 antigenic relationships and serological responses. Science 382, eadj0070 (2023)).

      (3) Please provide an assessment of the representativeness of the blood donor population, e.g., is the prevalence of hepatitis B serological markers in the blood donor population comparable with the prevalence of hepatitis B serological markers in the general population from community-based studies?

      A new sentence was included in the discussion to offer support for considering the blood donor population as a representative sample of the general population: “In addition, in Bolivia, blood donation is unrewarded, and blood donors appear to be quite representative of the general population. Indeed, routine screening for several infection markers (such as HIV or HBV) is conducted in all donors, and the prevalences of these markers do not differ from those observed in the general population. For example, UNAIDS data highlights a 0.4% HIV prevalence within the Bolivian general population, with significantly higher rates exceeding 25% observed in high-risk groups such as men who have sex with men(29). Moreover, Sheena et al. estimated a 0.6% prevalence of HBsAg in Bolivia in 2019(30). Bolivian national statistics of National Blood Program of the Ministry of Health and Sports, indicate that between 2019 and 2023, the proportion of HIV- and HBV-reactive units among screened blood donors ranged from 0.26% to 0.41% and 0.16% to 0.25%, respectively (Dr. Lissete Bautista’s personal communication).”

    1. eLife Assessment

      This study presents a valuable finding on the role of secretory leukocyte protease inhibitor (SLPI) in the development of Lyme disease in mice infected with Borrelia burgdorferi. The evidence supporting the claims of the authors is solid. This paper will be of interest to scientists in the infectious inflammatory disease field.

    2. Reviewer #1 (Public review):

      Summary:

      This study demonstrates the significant role of secretory leukocyte protease inhibitor (SLPI) in regulating B. burgdorferi-induced periarticular inflammation in mice. They found that SLPI-deficient mice showed significantly higher B. burgdorferi infection burden in ankle joints compared to wild-type controls. This increased infection was accompanied by infiltration of neutrophils and macrophages in periarticular tissues, suggesting SLPI's role in immune regulation. The authors strengthened their findings by demonstrating a direct interaction between SLPI and B. burgdorferi through BASEHIT library screening and FACS analysis. Further investigation of SLPI as a target could lead to valuable clinical applications.

      The conclusions of this paper are mostly well supported by data, and the authors were responsive to the reviewers' comments.

      Comments on revised version:

      The authors have thoroughly addressed the previous concerns and improved the manuscript. The revisions have strengthened the conclusions. I have no additional suggestions for improvement and recommend this manuscript for publication.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Yu and coworkers investigates the potential role of secretory leukocyte protease inhibitor (SLPI) in Lyme arthritis. They show that, after needle inoculation of the Lyme disease (LD) agent, B. burgdorferi, compared to wild type mice, a SLPI-deficient mouse suffers elevated bacterial burden, joint swelling and inflammation, pro-inflammatory cytokines in the joint, and levels of serum neutrophil elastase (NE). They suggest that SLPI levels of Lyme disease patients are diminished relative to healthy controls. Finally, they find that SLPI may interact directly with B. burgdorferi.

      Strengths:

      Many of these observations are interesting and the use of SLPI-deficient mice is useful (and has not previously been done).

      Weaknesses:

      (a) The known role of SLPI in dampening inflammation and inflammatory damage by inhibition of NE makes the enhanced inflammation in the joint of B. burgdorferi-infected mice a predicted result; (b) The potential contribution of the greater bacterial burden to the enhanced inflammation is acknowledged but not experimentally addressed; (c) The relationship of SLPI binding by B. burgdorferi to the enhanced disease of SLPI-deficient mice is not addressed in this study, making the inclusion of this observation in this manuscript incomplete; and (d) assessment of SLPI levels in healthy controls vs. Lyme disease patients is inadequate.

      Comments on revised version:

      Several of the points were addressed in the revised manuscript, but the following issues remain:

      Previous point that the relationship of SLPI binding to B. burgdorferi to the enhanced disease of SLPI-deficient mice is not investigated: The authors indicate that such investigations are ongoing. In the absence of any findings, I recommend that their interesting BASEHIT and subsequent studies be presented in a future study, which would have high impact.

      Previous recommendation 1: (The authors added lines 267-68, not 287-68). This ambiguity is acknowledged but remains. In addition, in the revised manuscript, the authors state "However, these data also emphasize the importance of SLPI in controlling the development of inflammation in periarticular tissues of B. burgdorferi-infected mice." Given acknowledged limitations of interpretation, "suggest" would be more appropriate than "emphasize".

      Previous recommendation 5: The lack of clinical samples can be a challenge. Nevertheless, 4 of the 7 samples from LD patients are from individuals suffering from EM rather than arthritis (i.e., the manifestation that is the topic of the study), and some subjects were sampled multiple times, which makes an objective statistical comparison difficult. I don't have a suggestion as to how to address the difference in number of samples from a given subject. However, the authors could consider segregating EM vs. LA in their analysis (although it appears that limiting the comparison to HC and LA patients would not reveal a statistical difference).

      Previous recommendation 6: Given that binding of SLPI to the bacterial surface is an essential aspect of the authors' model, and that the ELISA assay to indicate SLPI binding used cell lysates rather than intact bacteria, a control PI staining to validate the integrity of bacteria seems reasonable.

      Previous recommendation 8: The inclusion of a no serum control (that presumably shows 100% viability) would validate the authors' assertion that 20% serum has bactericidal activity.

    4. Reviewer #3 (Public review):

      Summary:

      The authors investigated the role of secretory leukocyte protease inhibitor (SLPI) in the development of Lyme disease in mice infected with Borrelia burgdorferi. Using a combination of histological, gene expression, and flow cytometry analyses, they demonstrated significantly higher bacterial burden and elevated neutrophil and macrophage infiltration in SLPI-deficient mouse ankle joints. Furthermore, they also showed direct interaction of SLPI with B. burgdorferi, which likely depletes the local environment of SLPI and causes excessive protease activity. These results overall suggest ankle tissue inflammation in B. burgdorferi-infected mice is driven by unchecked protease activity.

      Strengths:

      Utilizing a comprehensive suite of techniques, this is the first study showing the importance of anti-protease-protease balance in the development of periarticular joint inflammation in Lyme disease.

      Weaknesses:

      Due to the limited sample availability, the authors investigated the serum level of SLPI in both Lyme arthritis patients and patients with earlier disease manifestations. This limitation is thoroughly discussed in the manuscript.

      Comments on revised version:

      I thank the authors for considering my comments carefully.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study demonstrates the significant role of secretory leukocyte protease inhibitor (SLPI) in regulating B. burgdorferi-induced periarticular inflammation in mice. They found that SLPI-deficient mice showed significantly higher B. burgdorferi infection burden in ankle joints compared to wild-type controls. This increased infection was accompanied by infiltration of neutrophils and macrophages in periarticular tissues, suggesting SLPI's role in immune regulation. The authors strengthened their findings by demonstrating a direct interaction between SLPI and B. burgdorferi through BASEHIT library screening and FACS analysis. Further investigation of SLPI as a target could lead to valuable clinical applications.

      The conclusions of this paper are mostly well supported by data, but two aspects need attention:

      (1) Cytokine Analysis:

      The serum cytokine/chemokine profile analysis appears without TNF-alpha data. Given TNF-alpha's established role in inflammatory responses, comparing its levels between wild-type and infected B. burgdorferi conditions would provide valuable insight into the inflammatory mechanism.

      (2) Sample Size Concerns:

      While the authors note limitations in obtaining Lyme disease patient samples, the control group is notably smaller than the patient group. This imbalance should either be addressed by including additional healthy controls or explicitly justified in the methodology section.

      We thank the reviewer for the careful review and positive comments.

      (1) We did look into the level of TNF-alpha in both WT and SLPI-/- mice with and without B. burgdorferi infection. At the serum level, using ELISA, we did not observe any significant difference among the four groups. At the gene expression level, using RT-qPCR on the tibiotarsal tissue, we also did not observe any significant differences. Our RT-qPCR result is consistent with a previous microarray study using whole murine joint tissue (DOI: 10.4049/jimmunol.177.11.7930), which did not show significant changes in TNF-alpha level in C57BL/6 mice following B. burgdorferi infection. A brief discussion has been added, and the above data are provided as Supplemental figure 4 in the revised manuscript, lines 334-339 and 756-763.

      (2) We agree with the reviewer that the control group is smaller than the patient group. Among the archived samples that are available, the number of adult healthy controls is limited. It has been shown that the serum level of SLPI in healthy volunteers is on average about 40 ng/ml (DOI: 10.3389/fimmu.2019.00664 and 10.1097/00003246-200005000-00003). The median level in the healthy controls in our data was 38.92 ng/ml, which is comparable to the previous results. A brief discussion has been added in the revised manuscript, lines 364-369.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Yu and coworkers investigates the potential role of secretory leukocyte protease inhibitor (SLPI) in Lyme arthritis. They show that, after needle inoculation of the Lyme disease (LD) agent, B. burgdorferi, compared to wild type mice, a SLPI-deficient mouse suffers elevated bacterial burden, joint swelling and inflammation, pro-inflammatory cytokines in the joint, and levels of serum neutrophil elastase (NE). They suggest that SLPI levels of Lyme disease patients are diminished relative to healthy controls. Finally, they find that SLPI may interact directly with B. burgdorferi.

      Strengths:

      Many of these observations are interesting and the use of SLPI-deficient mice is useful (and has not previously been done).

      We appreciate the reviewer’s careful reading and positive comments.

      Weaknesses:

      (a) The known role of SLPI in dampening inflammation and inflammatory damage by inhibition of NE makes the enhanced inflammation in the joint of B. burgdorferi-infected mice a predicted result;

      We agree that the observation of the elevated NE level and the enhanced inflammation is theoretically likely. Indeed, that was the hypothesis that we explored, and often what is theoretically possible does not turn out to occur. In addition, despite the known contribution of neutrophils to the severity of murine Lyme arthritis, the importance of the neutrophil serine proteases and anti-protease has not been specifically studied, and neutrophils secrete many factors. Therefore, our data fill an important gap in the knowledge of murine Lyme arthritis development – and set the stage for the further exploration of this hypothesis in the genesis of human Lyme arthritis.

      (b) The potential contribution of the greater bacterial burden to the enhanced inflammation is not addressed;

      We agree with the reviewer’s viewpoint that the increased infection burden in the tibiotarsal tissue of the infected SLPI-/- mice could contribute to the enhanced inflammation. A brief discussion of this possibility has been added in the revised manuscript, line 287-288.

      (c) The relationship of SLPI binding by B. burgdorferi to the enhanced disease of SLPI-deficient mice is not clear; and

      We agree with the reviewer that we have not shown the importance of the SLPI-B. burgdorferi binding in the development of periarticular inflammation. It is an ongoing project in our lab to identify the SLPI binding partner in B. burgdorferi. Our hypothesis is that SLPI could bind and inhibit an unknown B. burgdorferi virulence factor that contributes to murine Lyme arthritis. A brief discussion has been added in the revised manuscript, line 401-407.

      (d) Several methodological aspects of the study are unclear.

      We appreciate the critique. We have expanded the methods section with greater detail in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      The authors investigated the role of secretory leukocyte protease inhibitor (SLPI) in the development of Lyme disease in mice infected with Borrelia burgdorferi. Using a combination of histological, gene expression, and flow cytometry analyses, they demonstrated significantly higher bacterial burden and elevated neutrophil and macrophage infiltration in SLPI-deficient mouse ankle joints. Furthermore, they also showed direct interaction of SLPI with B. burgdorferi, which likely depletes the local environment of SLPI and causes excessive protease activity. These results overall suggest ankle tissue inflammation in B. burgdorferi-infected mice is driven by unchecked protease activity.

      Strengths:

      Utilizing a comprehensive suite of techniques, this is the first study showing the importance of anti-protease-protease balance in the development of periarticular joint inflammation in Lyme disease.

      We greatly appreciate the reviewer’s careful reading and positive comments.

      Weaknesses:

      Due to the limited sample availability, the authors investigated the serum level of SLPI in both Lyme arthritis patients and patients with earlier disease manifestations.

      We agree with the reviewer that it would be ideal to have more samples from Lyme arthritis patients. However, among the available archived samples, samples from Lyme arthritis patients are limited. For the samples from patients with single EM, the symptoms persisted 3-4 months after diagnosis, the same timeframe in which acute arthritis develops. A brief discussion has been added in the revised manuscript, lines 364-369.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 2, for histological scoring, do they have similar n numbers?

      In panel B, 20 infected WT mice and 19 infected SLPI-/- mice were examined. In panel D, 13 infected WT and SLPI-/- mice were examined. Without infection, WT and SLPI-/- mice do not develop spontaneous arthritis. Due to the slow breeding of the SLPI-/- mice, a small number of uninfected control animals were used. All the supporting data values are provided in the supplemental Excel file.

      (2) In Figure 3, for macrophage population analysis, perhaps consider implementing a Ly6G-negative gating strategy to prevent neutrophil contamination of the macrophage population?

      We appreciate the reviewer’s suggestion. We have analyzed the data using the Ly6G-negative gating strategy and provided the result in Supplemental figure 1. The two gating strategies gave consistent results: a significantly higher percentage of infiltrating macrophages in the tibiotarsal tissue from infected SLPI-/- mice (lines 154-158 and 726-729).

      Reviewer #2 (Recommendations for the authors):

      (1) The investigators should address the possibility that much of the enhanced inflammatory features of infected SLPI-deficient mice are simply due to the higher bacterial load in the joint.

      We agree with the reviewer’s viewpoint that the increased infection burden in the tibiotarsal tissue of the infected SLPI-/- mice could contribute to the enhanced inflammation. A brief discussion of this possibility has been added in the revised manuscript, line 287-288.

      (2) Fig. 1. (A) There is no statistically significant difference in the bacterial load in the heart or skin, in contrast to the tibiotarsal joint. It would be of interest to know whether other tissues that are routinely sampled to assess the bacterial load, such as injection site, knee, and bladder, also harbored increased bacterial load in SLPI-deficient mice. (B) Heart and joint burden were measured at "21-28" days. The two time points should be analyzed separately rather than pooled.

      (A) We appreciate the reviewer’s suggestion. We agree that looking into the infection load in other tissues is helpful. However, studies of murine Lyme arthritis have predominantly focused on tibiotarsal tissue, which displays the most consistent and prominent swelling and is easy to observe and measure. Thus, we focused on the tibiotarsal joint in our study. (B) We collected the heart and joint tissue approximately 3 weeks post infection within a 3-day window based on the feasibility and logistics of the laboratory. By “21-28 d”, we meant between 21 and 24 days post infection. We apologize for the mislabeling, which has been corrected in the revised manuscript. In the methods, we defined the timeframe as “Mice were euthanized approximately 3-week post infection within a 3-day window (between 21 to 24 dpi) based on the feasibility and logistics of the laboratory”, lines 464-466. In the results and figure legend, we corrected it to “between 21 to 24 dpi”.

      (3) Fig. 2. (A) The same ambiguity as to the days post-infection as cited above in Point 2B exists in this figure. (B) Panel B: Caliper measurements to assess joint swelling should be utilized rather than visual scoring. (In addition, the legend should make clear that the black circles represent mock-infected mice.)

      (A) The histology scoring and histopathology examination were performed at the same time as heart and joint tissue collection, approximately 3 weeks post infection within a 3-day window based on the feasibility and logistics of the laboratory. We apologize for the mislabeling, which has been corrected in the revised manuscript. (B) We appreciate the reviewer’s suggestion. However, our extensive experience is that caliper measurement can alter the assessment of swelling by placing pressure on the joints and does not produce consistent results. Double-blinded scoring was thus performed. Histopathology examination was performed by an independent pathologist; it confirmed the histology score and provided additional measurements.

      (4) Fig. 3. (A) See Point 2B. (B) For Panels C-E, uninfected controls are lacking.

      We apologize for this omission. Uninfected controls have been provided in Figure 3 in the revised manuscript.

      (5) Fig. 4. Some LD subjects were sampled multiple times (5 samples from 3 subjects with Lyme arthritis; 13 samples from 4 subjects with EM), and samples from the same individuals apparently are treated as biological replicates in the statistical analysis. In contrast, the 5 healthy controls were each sampled only once.

      We agree with the reviewer that the control group is smaller than the patient group. Among the archived samples that are available, the number of adult healthy controls is limited, and each control was sampled once. We used these samples to establish the baseline level of SLPI in the serum. It has been shown that the serum level of SLPI in healthy volunteers is on average about 40 ng/ml (DOI: 10.3389/fimmu.2019.00664 and 10.1097/00003246-200005000-00003). The median level in the healthy controls in our data was 38.92 ng/ml, which is comparable to the previous results. A brief discussion has been added in the revised manuscript, lines 364-369.
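
      One common way to keep repeated samples from the same individual from being counted as independent replicates is to collapse them to a single value per subject before comparing groups. The sketch below is an illustration only and is not the analysis performed in the study; the function name `per_subject_medians`, the subject labels, and all numeric values are hypothetical placeholders.

```python
from collections import defaultdict
from statistics import median


def per_subject_medians(samples):
    """Collapse repeated samples to one median value per subject.

    `samples` is an iterable of (subject_id, value) pairs; a subject may
    appear any number of times.
    """
    by_subject = defaultdict(list)
    for subject_id, value in samples:
        by_subject[subject_id].append(value)
    # One summary value per subject, so each subject counts once downstream.
    return {subject: median(values) for subject, values in by_subject.items()}


# Hypothetical placeholder measurements, NOT data from the study.
example = [
    ("S1", 30.0), ("S1", 34.0),
    ("S2", 41.0),
    ("S3", 25.0), ("S3", 27.0), ("S3", 35.0),
]
summary = per_subject_medians(example)
```

      The per-subject values can then be passed to whatever two-group comparison is appropriate; with so few subjects per group, any such test would have limited power.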

      (6) Fig. 5. (A) Panel A: does binding occur when intact bacteria are used? (B) Panels B, C: Were bacteria probed with PI to indicate binding likely to occur to surface? How many biological replicates were performed for each panel? Is "antibody control" a no SLPI control? What is the blue line?

      Actively growing B. burgdorferi were collected and used for binding assays. We do not permeabilize the bacteria for flow cytometry. Thus, all the binding detected occurs at the bacterial surface. Three biological replicates were performed for each panel. The antibody control is a no-SLPI control. For panel D, the bacteria were stained with Hoechst, which shows the morphology of the bacteria. We apologize for the missing information. A complete and detailed description of Figure 5 has been provided in both the methods and the figure legend in the revised manuscript.

      (7) Sup Fig. 1. (A) Panel A: Was this experiment performed multiple times? I.e., how many biological replicates? (B) Panel B: Strain should be specified.

      The binding assay to B. burgdorferi B31A was performed two times. In panel B, B. burgdorferi B31A3 was used. We apologize for the missing information. A complete and detailed description has been provided in the figure legend in the revised manuscript. 

      (8) Fig. S2. It is not clear that the condition (20% serum) has any bactericidal activity, so the potential protective activity of SLPI cannot be determined. (Typical serum killing assays in the absence of specific antibody utilized 40% serum.)

      In Fig. S2, panel B, the first two bars (without SLPI, with 20% WT antiserum) showed around 40% viability. This indicates that the 20% WT antiserum has bactericidal activity. Serum was collected from B. burgdorferi-infected WT mice at 21 dpi, and should contain polyclonal antibody against B. burgdorferi.

      Reviewer #3 (Recommendations for the authors):

      It was a pleasure to review! I congratulate the authors on this elegant study. I think the manuscript is very well-written and clearly conveys the research outcomes. I only have minor suggestions to improve the readability of the text.

      We greatly appreciate the reviewer’s recognition of our work.

      Line 92: Please briefly summarize the key results of the study at the end of the introduction section.

      We appreciate the reviewer’s suggestion. A brief summary has been added in the revised manuscript, line 93-103.

      Line 108: Why did significant inflammation occur only in the ankle joints of SLPI-/- mice? Could you please provide a brief explanation?

      Inflammation may also occur in other joints of the B. burgdorferi-infected SLPI-/- mice, but this has not been studied. Studies of murine Lyme arthritis have predominantly been done in the tibiotarsal tissue, which displays the most prominent swelling and is easy to observe and measure. Thus, we focused on the tibiotarsal joint in our study.

      Line 136: Please also include the gene names in Figure 3.

      We apologize for the omission. Gene names have been included in the figure legend in the revised manuscript.

      Line 181: Please briefly introduce BASEHIT. Why did you use this tool? What are the benefits?

      We appreciate the reviewer’s suggestion. We have provided a brief introduction on BASEHIT in the revised manuscript, line 216-218.

    1. eLife Assessment

      This study presents valuable findings with practical and theoretical implications for drug discovery, particularly in the context of repurposing cipargamin (CIP) for the treatment of Babesia spp. The evidence is solid, with the methods, data and analyses broadly supporting the claims. The paper will be of great interest to scientists in drug discovery, computational biology, and microbiology.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors have tried to repurpose cipargamin (CIP), a known drug against Plasmodium and Toxoplasma, against Babesia. They demonstrated the efficacy of CIP against Babesia in the nanomolar range. In silico analyses revealed the drug resistance mechanism through a single amino acid mutation at amino acid position 921 of the ATP4 gene of Babesia. Overall, the conclusions drawn by the authors are well justified by their data. I believe this study opens up a novel therapeutic strategy against babesiosis.

      Strengths:

      The authors have carried out a comprehensive study. All the experiments were performed methodically and logically.

    3. Reviewer #3 (Public review):

      Summary:

      The authors aim to establish that cipargamin can be used for the treatment of infection caused by Babesia organisms.

      Strengths:

      The study provides strong evidence that cipargamin is effective against various Babesia species. In vitro growth assays were used to establish that cipargamin is effective against Babesia bovis and Babesia gibsoni. Infection of mice with Babesia microti demonstrated that cipargamin is as effective as the combination of atovaquone plus azithromycin. Cipargamin protected mice from lethal infection with Babesia rodhaini. Mutations that confer resistance to cipargamin were identified in the gene encoding ATP4, a P-type Na ATPase that is found in other apicomplexan parasites, thereby validating ATP4 as the target of cipargamin. A 7-day treatment with cipargamin, when combined with a single dose of tafenoquine, was sufficient to eradicate Babesia microti in a mouse model of severe babesiosis caused by lack of adaptive immunity.

      Weaknesses:

      Cipargamin was tested in vivo at a single dose administered daily for 7 days. Despite the prospect of using cipargamin for the treatment of human babesiosis, there was no attempt to identify the lowest dose of cipargamin that protects mice from Babesia microti infection. In the SCID mouse model, cipargamin was tested in combination with tafenoquine but not with atovaquone and/or azithromycin, although the latter combination is often used as first-line therapy for human babesiosis caused by Babesia microti.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors address an important issue in Babesia research by repurposing cipargamin (CIP) as a potential therapeutic against selected Babesia spp. In this study, CIP demonstrated potent in vitro inhibition of B. bovis and B. gibsoni with IC<sub>50</sub> values of 20.2 ± 1.4 nM and 69.4 ± 2.2 nM, respectively, and showed in vivo efficacy against Babesia spp. in a mouse model. The authors identified two key resistance mutations in the BgATP4 gene (BgATP4<sup>L921I</sup> and BgATP4<sup>L921V</sup>) and explored their implications through phenotypic characterization of the parasite using cell biological experiments, complemented by in silico analysis. Overall, the findings are promising and could significantly advance Babesia treatment strategies.

      Strengths:

      In this manuscript, the authors effectively repurpose cipargamin (CIP) as a potential treatment for Babesia spp. They provide compelling in vitro and in vivo data showing strong efficacy. Key resistance mutations in the BgATP4 gene are identified and analyzed through both phenotypic and in silico methods, offering valuable insights for advancing treatment strategies.

      Thank you for your insightful comments and for taking the time to review our manuscript.

      Weaknesses:

      The manuscript explores important aspects of drug repurposing and rational drug design using cipargamin (CIP) against Babesia. However, several weaknesses should be addressed. The study lacks novelty as similar research on cipargamin has been conducted, and the experimental design could be improved. The rationale for choosing CIP over other ATP4-targeting compounds is not well-explained. Validation of mutations relies heavily on in silico predictions without sufficient experimental support. The Ion Transport Assay has limitations and would benefit from additional assays like Radiolabeled Ion Flux and Electrophysiological Assays. Also, the study lacks appropriate control drugs and detailed functional characterization. Further clarity on mutation percentages, additional safety testing, and exploration of cross-resistance would strengthen the findings.

      We appreciate your feedback and the chance to improve our paper. Below, we describe how we addressed each comment in turn. We hope these revisions address your concerns.

      Comment 1: It is commendable to explore drug repurposing, drug deprescribing, drug repositioning, and rational drug design, especially using established ATP4 inhibitors that are well-studied in Plasmodium and other protozoan parasites. While the study provides some interesting findings, it appears to lack novelty, as similar investigations of cipargamin on other protozoan parasites have been conducted. The study does not introduce new concepts, and the experimental design could benefit from refinement to strengthen the results. Additionally, the rationale for choosing CIP over other MMV compounds targeting ATP4 is not clearly articulated. Clarifying the specific advantages CIP may offer against Babesia would be beneficial. Finally, the validation of the identified mutations might be strengthened by additional experimental support, as reliance on in silico predictions alone may not fully address the functional impact, particularly given the potential ambiguity of the mutations (BgATP4 L to V and I).

      Thank you for your thoughtful feedback. We have addressed the concerns as follows: (1) Introduction of new concepts and experimental design: While our study primarily builds on existing frameworks, it provides novel insights into the interaction of CIP with Babesia parasites, which we believe contribute to the field. Regarding the experimental design, we acknowledge its limitations and have revised the manuscript to include additional experiments to strengthen the robustness of our findings. Specifically, we have added experiments on the detection of BgATP4-associated ATPase activity (Figure 3H), the evaluation of cross-resistance to antibabesial agents (Figures 5A and 5B), and the efficacy of CIP plus TQ combination in eliminating B. microti infection with no recrudescence in SCID mice (Figure 5C).

      (2) Rationale for choosing CIP over other MMV compounds targeting ATP4: We appreciate this point and have expanded the introduction section to articulate our rationale for selecting CIP (Lines 94-97). Specifically, CIP was chosen due to its previously demonstrated efficacy against Plasmodium and other protozoan parasites.

      (3) Validation of identified mutations: We agree that additional experimental data would strengthen the validation of the identified mutations. In response, we have indicated the ratio of wild-type to mutant parasites by Illumina NovaSeq6000 to validate the impact of the BgATP4 C-to-G and A mutations (Figure 2D).

      Comment 2: Conducting an Ion Transport Assay is useful but has limitations. Non-specific binding or transport by other cellular components can lead to inaccurate results, causing false positives or negatives and making data interpretation difficult. Indirect measurements, like changes in fluorescence or electrical potential, can introduce artifacts. To improve accuracy, consider additional assays such as:

      a. Radiolabeled Ion Flux Assay: tracks the movement of Na<sup>+</sup> using radiolabeled ions, providing direct evidence of ion transport.

      b. Electrophysiological Assay: measures ionic currents in real-time with patch-clamp techniques, offering detailed information about ATP4 activity.

      Thank you for highlighting the limitations of the ion transport assay and suggesting alternative approaches to improve accuracy. However, these approaches require specialized equipment and expertise not currently available in our laboratory. We have acknowledged these limitations and included the alternative methods among the study's future directions. Thank you for your suggestions, which will undoubtedly enhance the rigor and depth of our research.

      Comment 3: In-silico predictions can provide plausible outcomes, but it is essential to evaluate how the recombinant purified protein and ligand interact and function at physiological levels. This aspect is currently missing and should be included. For example, incorporating immunoprecipitation and ATPase activity assays with both wild-type and mutant proteins, as well as detailed kinetic studies with Cipargamin, would be recommended to validate the findings of the study.

      Thank you for your insightful suggestions regarding the validation of in-silico predictions. We recognize the importance of evaluating the interaction and function of recombinant purified proteins and ligands at physiological levels to strengthen the study's findings.

      (1) Incorporating experimental validation:

      a. Immunoprecipitation assays: We agree that immunoprecipitation could provide valuable evidence of protein-ligand interactions. While this was not included in the current study due to limitations in sample availability, we plan to incorporate this assay in follow-up experiments.

      b. ATPase activity assays: Assessing ATPase activity in both wild-type and mutant proteins is a crucial step in validating the functional impact of the identified mutations. We included the results in the revised manuscript (Figure 3H).

      (2) Detailed kinetic studies with cipargamin: We appreciate the recommendation to conduct detailed kinetic analyses. These studies would provide deeper insights into the binding affinity and inhibition dynamics of cipargamin. We have included the results of these experiments in the current study (Figure 3I).

      Comment 4: The study lacks specific suitable control drugs tested both in vitro and in vivo. For accurate drug assessment, especially when evaluating drugs based on a specific phenotype, such as enlarged parasites, it is important to use ATP4 gene-specific inhibitors. Including similar classes of drugs, such as Aminopyrazoles, Dihydroisoquinolines, Pyrazoleamides, Pantothenamides, Imidazolopiperazines (e.g., GNF179), and Bicyclic Azetidine Compounds, would provide more comprehensive validation.

      Thank you for emphasizing the importance of including suitable control drugs. We acknowledge the absence of specific control drugs in the previous version of the manuscript. To date, no drug targeting ATP4 proteins in Babesia has been definitively identified. The suggested drugs could potentially disrupt the parasite's ability to regulate sodium levels by inhibiting PfATP4, a protein essential for its survival. This highlights PfATP4 as an attractive target for antimalarial drug development. However, further studies are required to evaluate whether these drugs exhibit similar activity against ATP4 homologs in Babesia.

      Comment 5: Functional characterization of CIP through microscopic examination and quantification for assessing parasite size enlargement is not entirely reliable. A Flow Cytometry-Based Assay is recommended instead (along with suitable control antiparasitic drugs). To effectively monitor Cipargamin's action, conducting time-course experiments with 6-hour intervals is advisable rather than relying solely on endpoint measurements. Additionally, for accurate assessment of parasite morphology, obtaining representative qualitative images using Scanning Electron Microscopy (SEM) or Transmission Electron Microscopy (TEM) for treated versus untreated samples is recommended for precise measurements.

      Thank you for your constructive feedback regarding the methods for functional characterization of CIP and the evaluation of parasite morphology.

      (1) Flow Cytometry-Based Assay: We agree that a flow cytometry-based assay would enhance the accuracy of detecting changes in parasite size and morphology. We will implement this method in future studies as our laboratory currently does not have the capability to conduct such experiments.

      (2) Microscopy for Morphology Assessment: We acknowledge the importance of obtaining high-resolution, representative images of treated and untreated samples. Utilizing Scanning Electron Microscopy (SEM) or Transmission Electron Microscopy (TEM) for qualitative analysis will significantly improve the precision of our morphological assessments. However, both methods have limitations.

      a. SEM: This technique can only scan the erythrocytes' surface; it cannot scan the parasite itself because it is inside the erythrocytes.

      b. TEM: Since the parasite is fixed, observations from various angles may reveal longitudinal or cross-sectional portions, making it impossible to precisely view the parasite's dimensions. As a result, we employed TEM to precisely observe the parasite's internal structure alterations both before and after treatment, as seen in Figure 3C.

      Comment 6: A notable contradiction observed is that mutant cells displayed reduced efficacy and affinity but more pronounced phenotypic effects. The BgATP4<sup>L921I</sup> mutation shows a 2x lower susceptibility (IC<sub>50</sub> of 887.9 ± 61.97 nM) and a predicted binding affinity of -6.26 kcal/mol with CIP. However, the phenotype exhibits significantly lower Na<sup>+</sup> concentration in BgATP4<sup>L921I</sup> (P = 0.0087) (Figure 3E).

      The seemingly contradictory observation of reduced CIP binding and efficacy in the BgATP4<sup>L921I</sup> mutant together with a significant decrease in intracellular Na<sup>+</sup> concentration may be explained by factors other than the direct CIP interaction. We consider that CIP binds less effectively to its target in the BgATP4<sup>L921I</sup> mutant, but the observed phenotype may be attributed to the functional consequences of the mutation. The BgATP4<sup>L921I</sup> mutation probably directly impacts the function of BgATP4's ion transport mechanism, which likely disrupts Na<sup>+</sup> homeostasis independently. Thus, we hypothesize that the dysregulated Na<sup>+</sup> homeostasis is driven by the mutation itself rather than the already weakened inhibitory effect of CIP.

      Comment 7: The manuscript does not clarify the percentage of mutations, and the number of sequence iterations performed on the ATP4 gene. It is also unclear whether clonal selection was carried out on the resistant population. If mutations are not present in 100% of the resistant parasites, please indicate the ratio of wild-type to mutant parasites and represent this information in the figure, along with the chromatograms.

      Thank you for your valuable comments. We appreciate your detailed observations and the opportunity to clarify these points. During the long-term culture process, subculturing was performed every three days. Although clonal selection was not conducted, mutant strains were effectively selected during this process. Using the Illumina NovaSeq6000 sequencing platform, high-throughput next-generation sequencing was performed to determine the ratio of wild-type to mutant parasites. The results showed that for BgATP4<sup>L921V</sup>, 99.97% of 7,960 reads were G, and for BgATP4<sup>L921I</sup>, 99.92% of 7,862 reads were A. To enhance clarity, we have included a new figure (Figure 2D) illustrating the sequencing results. We believe this addition will give readers a clearer understanding.
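
      The wild-type to mutant ratio described above comes down to counting the reads that support each base at the mutated position. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `allele_fractions` and the read counts are hypothetical and chosen only for illustration.

```python
def allele_fractions(base_counts):
    """Return the fraction of reads supporting each base at one position."""
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no reads cover this position")
    return {base: count / total for base, count in base_counts.items()}


# Hypothetical read counts at the mutated position (not taken from the study).
counts = {"C": 3, "G": 9997}
fractions = allele_fractions(counts)
print(f"mutant G fraction: {fractions['G']:.2%}")
```

      A mutant fraction close to 100% would be consistent with a population dominated by the mutant allele even without clonal selection.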

      Comment 8: While the compound's toxicity data is well-established, it is advisable to include additional testing in epithelial cells and liver-specific cell lines (e.g., HeLa, HCT, HepG2) if feasible for the authors. This would provide a more comprehensive assessment of the compound's safety profile.

      Thank you for your thoughtful suggestion. We included toxicity testing in human foreskin fibroblasts (HFF) as supplemental toxicity data to provide a more comprehensive evaluation of the compound's safety profile (Figure supplement 1B).

      Comment 9: In the in vivo efficacy study, recrudescent parasites emerged after 8 days of treatment. Did these parasites harbor the same mutation in the ATP4 gene? The authors did not investigate this aspect, which is crucial for understanding the basis of recrudescence.

      Thank you for raising this important point. We acknowledge that understanding the genetic basis of recrudescence is critical for elucidating mechanisms of resistance and treatment failure. Although our current study did not include an analysis of the BrATP4 gene in recrudescent parasites due to limitations in sample availability, we evaluated CIP efficacy in SCID mice and performed sequencing analysis of the BmATP4 gene in recrudescent samples. No mutations were identified in these samples (Lines 211-212). We believe that if a relapse occurs after the 7-day treatment, it is unlikely that the parasites would easily acquire mutations.

      Comment 10: The authors should explain their choice of BALB/c mice for evaluating CIP efficacy, as these mice clear the infection and may not fully represent the compound's effectiveness. Investigating CIP efficacy in SCID mice would be valuable, as they provide a more reliable model and eliminate the influence of the immune system. The rationale for not using SCID mice should be clarified.

      We appreciate the reviewer's suggestion regarding the use of SCID mice to evaluate the efficacy of CIP. In response to your suggestion, we have now included an experiment using SCID mice to evaluate the efficacy of CIP and to eliminate the confounding influence of the immune system. We further investigated the potential of combined administration of CIP plus TQ to eliminate parasites, as we are concerned that the long-term use of CIP as a monotherapy may be limited due to its potential for developing resistance. The results are shown in Figure 5C.

      Comment 11: Do the in vitro-resistant parasites show any potential for cross-resistance with commonly used antiparasitic drugs? Have the authors considered this possibility, and what are their expectations regarding cross-resistance?

      Thank you for your insightful question regarding the potential for cross-resistance between in vitro-resistant parasites and commonly used antiparasitic drugs. In response to your suggestion, we have now included experiments to assess whether B. gibsoni parasites that are resistant to CIP exhibit any cross-resistance to other commonly used antiparasitic drugs, such as atovaquone (ATO) and tafenoquine (TQ). The IC<sub>50</sub> values for both ATO and TQ in the resistant strains showed only slight changes compared to the wild-type strain, with less than a onefold difference (Figure 5A, 5B). This minimal variation suggests that the resistant strain has a mild alteration in susceptibility to ATO and TQ, but not enough to indicate strong resistance or significant cross-resistance. This suggests that CIP could be used in combination with TQ to treat babesiosis.
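
      One common way to express a cross-resistance comparison of this kind is the ratio of the IC<sub>50</sub> in the resistant strain to the IC<sub>50</sub> in the wild-type strain. The sketch below assumes that convention; the function name `ic50_fold_change` and the numbers are hypothetical and are not values from the study.

```python
def ic50_fold_change(ic50_resistant, ic50_wild_type):
    """Return the resistant/wild-type IC50 ratio (1.0 means no shift)."""
    if ic50_resistant <= 0 or ic50_wild_type <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_resistant / ic50_wild_type


# Hypothetical IC50 values in nM, for illustration only.
fold = ic50_fold_change(ic50_resistant=150.0, ic50_wild_type=100.0)
print(f"fold change: {fold:.2f}")
```

      Under this convention, a ratio near 1 indicates little change in susceptibility, whereas a large ratio would point to cross-resistance.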

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors have tried to repurpose cipargamin (CIP), a known drug against Plasmodium and Toxoplasma, for use against Babesia. They demonstrated the efficacy of CIP against Babesia in the nanomolar range. In silico analyses revealed the drug resistance mechanism through a single amino acid mutation at amino acid position 921 of the ATP4 gene of Babesia. Overall, the conclusions drawn by the authors are well justified by their data. I believe this study opens up a novel therapeutic strategy against babesiosis.

      Strengths:

      The authors have carried out a comprehensive study. All the experiments performed were carried out methodically and logically.

      Thank you for your comments and for taking the time to review our manuscript.

      Weaknesses:

      The introduction section needs to be more informative. The authors are investigating the binding of CIP to the ATP4 gene, but they did not give any information about the gene or how the ATP4 inhibitors work in general. The resolution of the figures is not good and the font size is too small to read properly. I also have several minor concerns which have been addressed in the "Recommendations for the authors" section.

      We thank the reviewer for their valuable comments. In response, we have revised the introduction to include a more detailed explanation of the ATP4 gene, its biological significance, and the mechanism of ATP4 inhibitors to provide better context for the study (Lines 86-93). Additionally, we have reformatted the figures to enhance resolution and increased the font size to improve readability. We also appreciate the reviewer's careful assessment of the manuscript and have addressed all minor concerns outlined in the "Recommendations for the Authors" section. A detailed, point-by-point response to each concern is provided in the response letter, and the corresponding revisions have been incorporated into the manuscript.

      Reviewer #3 (Public review):

      Summary:

      The authors aim to establish that cipargamin can be used for the treatment of infection caused by Babesia organisms.

      Strengths:

      The study provides strong evidence that cipargamin is effective against various Babesia species. In vitro, growth assays were used to establish that cipargamin is effective against Babesia bovis and Babesia gibsoni. Infection of mice with Babesia microti demonstrated that cipargamin is as effective as the combination of atovaquone plus azithromycin. Cipargamin protected mice from lethal infection with Babesia rodhaini. Mutations that confer resistance to cipargamin were identified in the gene encoding ATP4, a P-type Na<sup>+</sup> ATPase that was found in other apicomplexan parasites, thereby validating ATP4 as the target of cipargamin.

      We thank the reviewer for taking the time to review our manuscript.

      Weaknesses:

      Cipargamin was tested in vivo at a single dose administered daily for 7 days. Despite the prospect of using cipargamin for the treatment of human babesiosis, there was no attempt to identify the lowest dose of cipargamin that protects mice from Babesia microti infection. Exposure to cipargamin can induce resistance, indicating that cipargamin should not be used alone but in combination with other drugs. There was no attempt at testing cipargamin in combination with other drugs, particularly atovaquone, in the mouse model of Babesia microti infection. Given the difficulty in treating immunocompromised patients infected with Babesia microti, it would have been informative to test cipargamin in a mouse model of severe immunosuppression (SCID or rag-deficient mice).

      We thank the reviewer for raising these important comments. We address each concern as follows:

      (1) Identifying the lowest protective dose of CIP:

      Although our current study was designed to assess the efficacy of CIP at a single therapeutic dose over a 7-day period, we acknowledge that identifying the lowest effective dose would provide valuable information for optimizing treatment regimens. We plan to address this in future studies by conducting a dose-response experiment to identify the minimal protective dose of CIP.

      (2) Testing CIP in combination with other drugs:

      In the current study, we have tested the efficacy of tafenoquine (TQ) combined with CIP, as well as CIP or TQ administered individually, in a mouse model of B. microti infection. Our results demonstrated that, compared with monotherapy, the combination of CIP and TQ completely eliminated the parasites within 90 days of observation (Figure 5C).

      (3) Testing in an immunocompromised mouse model:

      We agree with the reviewer that evaluating CIP in immunocompromised models is critical for understanding its potential in treating immunocompromised patients. To address this, we have conducted experiments using SCID mice infected with B. microti. Our results indicated that the combination therapy of CIP plus TQ was effective in eliminating parasites in the severely immunocompromised model (Figure 5D).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comment 1: Table: Include the in-silico binding energies for each mutation and ligand.

      We have added binding energies for each mutation and ligand in Table supplement 3.

      Comment 2: Did the authors investigate the potential of combination therapies involving CIP?

      We have tested the efficacy of TQ combined with CIP in a mouse model of B. microti infection.

      Comment 3: Does this mutation affect the transmission of the parasite?

      Based on our observations, the growth and generation rates of the mutant strain are comparable to those of the wild-type strain. These findings suggest that the mutation does not significantly affect the spread or transmission of the parasite. We have included this observation in the revised manuscript (Lines 243-244).

      Comment 4: 60: Use abbreviations CLN for clindamycin and QUI for quinine.

      We have revised them accordingly (Lines 59-60).

      Comment 5: 86: The hypothesis is not strong or convincing; it needs to be modified to be more specific and convincing.

      We have revised the hypothesis to reflect the rationale behind the study better and to support our claim more strongly (Lines 94-97).

      Comment 6: 93: Change to: "In vitro efficacy of CIP against B. bovis and B. gibsoni.".

      We have changed the suggested content in the manuscript (Line 104).

      Comment 7: 96: Define CC<sub>50</sub>.

      We have added the definition of CC<sub>50</sub> (Line 106).

      Comment 8: 102: Change to: "...Balb/c mice increased dramatically in the...".

      We have changed the word following your recommendation (Line 114).

      Comment 9: 108: "...significant decrease at 12 DPI...".

      We have revised it according to your suggestion (Line 120).

      Comment 10: 110: "This indicates that the administration...".

      We have revised it according to your suggestion (Line 122).

      Comment 11: Figure 1:

      (1) Panels A and B should clearly indicate parasite species within the graph for better self-explanation.

      We have indicated parasite species within the graph.

      (2) For panels C, D, and E, if mice were eliminated or euthanized in the study, include a symbol in the graph to indicate this.

      For panels C and D, no mice were eliminated during the study; therefore, no symbol was added to these graphs. Panel F already provides information about the number of eliminated mice, which corresponds to the data in Panel E.

      (3) In panels C, D, and E, use a continuation arrow for drug treatment rather than a straight line, to cover the duration of the treatment.

      We have updated the figures to use continuation arrows instead of straight lines to represent the duration of drug treatment.

      Comment 12: Figure 2: The color combination for the WT and mutant curves is hard to read; consider using regular, less fluorescent, and more distinguishable colors.

      We have adjusted the color scheme to use more distinguishable and less fluorescent colors, ensuring better readability and clarity. The revised figure with the updated color scheme has been included in the updated manuscript, and we hope this resolves the readability concern.

      Comment 13: Figure 3:

      (1) Panel A: Represent a single infected iRBC rather than a field for better visualization.

      We have updated Panel A to display a single infected iRBC instead of a field.

      (2) Panels E and F: Change the color patterns, as the current colors, especially the green variants (WT and mutant L921V), are difficult to read.

      To improve readability, we have updated the color patterns for these panels by selecting more distinguishable colors with higher contrast (Figure 3F, 3G).

      Comment 14: Figure 4: Panels B, C, and D: The text is too small to read; increase the font size or change the resolution.

      We have increased the font size and replaced the panels with high-resolution versions (Figure 4B, 4C, 4D).

      Reviewer #2 (Recommendations for the authors):

      Comment 1: In the last paragraph of the introduction, the authors mentioned determining the activity of CIP in vitro in B. bovis and B. gibsoni while in vivo in B. microti and B. rodhaini. It is not explained why they are testing the in vitro and in vivo effects on different Babesia species. Could you please add some logic there? Also, why did they mention measuring the inhibitory activity of CIP by monitoring the Na<sup>+</sup> and H<sup>+</sup> balance? This part needs to be rewritten with more information. The ATP4 gene is not properly introduced in the manuscript.

      We thank the reviewer for raising these important points. Below, we address each aspect of the comment in detail:

      (1) Rationale for testing different Babesia spp. in vitro and in vivo:

      B. bovis and B. gibsoni are well-established Babesia models for in vitro culture systems, allowing evaluation of CIP's inhibitory activity under controlled laboratory conditions. B. microti and B. rodhaini, on the other hand, are commonly used rodent models for the in vivo studies of babesiosis, enabling the assessment of drug efficacy in a mammalian host system. This multi-species approach provides a comprehensive evaluation of CIP's efficacy across Babesia spp. with different biological characteristics.

      (2) Measuring CIP's inhibitory activity via Na<sup>+</sup> and H<sup>+</sup> balance:

      We acknowledge that this section of the introduction requires more context. The revised manuscript now includes additional information explaining that the ATP4 gene, which encodes a Na<sup>+</sup>/H<sup>+</sup> transporter, is the proposed target of CIP (Lines 86-93). CIP disrupts the ion homeostasis maintained by ATP4, leading to an imbalance in Na<sup>+</sup> and H<sup>+</sup> concentrations. Monitoring these ionic changes provides a mechanistic understanding of CIP's mode of action and its impact on parasite viability. This rationale has been expanded in the introduction to clarify its significance.

      Comment 2: The figure fonts are too small. The resolution for the images is also poor.

      We have increased the font size in all figures to improve readability. Additionally, we have replaced the figures with high-resolution versions to ensure clarity and visual quality.

      Comment 3: Figures 1A and 1B: one of the error bars merged to the X-axis legend. Please modify these panels. Which curve was used to determine the IC<sub>50</sub> values (although it's mentioned in the methods section, would it be better to have the information in the figure legends as well)?

      We thank the reviewer for their comments regarding Figures 1A and 1B.

      (1) Error bars overlapping the X-axis legend:

      The error bars in the figures were automatically generated using GraphPad Prism9 based on the data and are determined by the values themselves. Unfortunately, this overlap cannot be avoided without altering the data representation.

      (2) IC<sub>50</sub> curve information:

      To clarify the determination of IC<sub>50</sub> values, we have already included gray dashed lines in the graphs to indicate where the IC<sub>50</sub> values were derived from the curves. This visual representation provides clear information about the IC<sub>50</sub> points.
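
      Reading an IC<sub>50</sub> off a curve at a dashed 50% line can be approximated numerically by interpolating between the two tested concentrations that bracket 50% growth. The sketch below uses simple log-linear interpolation, which is a simplification and not the curve-fitting procedure used in GraphPad Prism or necessarily the one used by the authors; the function name `interpolate_ic50` and the data points are hypothetical.

```python
import math


def interpolate_ic50(concentrations, percent_growth):
    """Estimate IC50 by log-linear interpolation across the 50% crossing."""
    points = sorted(zip(concentrations, percent_growth))
    for (c_low, g_low), (c_high, g_high) in zip(points, points[1:]):
        # Look for the pair of neighbouring doses where growth falls through 50%.
        if g_low >= 50.0 >= g_high and g_low != g_high:
            # Position of the 50% crossing between the two doses (0 to 1).
            t = (g_low - 50.0) / (g_low - g_high)
            log_ic50 = math.log10(c_low) + t * (math.log10(c_high) - math.log10(c_low))
            return 10 ** log_ic50
    raise ValueError("growth does not cross 50% in the tested range")


# Hypothetical dose-response data: concentration in nM, growth as % of control.
doses = [1.0, 10.0, 100.0, 1000.0]
growth = [95.0, 75.0, 25.0, 5.0]
ic50 = interpolate_ic50(doses, growth)
print(f"estimated IC50: {ic50:.1f} nM")
```

      A fitted sigmoidal model generally gives a more robust estimate than interpolation, particularly when the data are noisy near the 50% level.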

      Comment 4: Supplementary Figure 1: what are MDCK cells? What is CC<sub>50</sub>? Please mention their full forms in the text and figure legends (they should be described here because the methods section comes later). What is meant by a predicted selectivity index? There should be an explanation of why and how they did it. Which curve was used to determine the IC<sub>50</sub> values?

      We thank the reviewer for pointing out the need to clarify terms and provide additional context in the supplementary figure and text. We have updated the figure legend and text to include the full forms of MDCK (Madin-Darby canine kidney) cells and CC<sub>50</sub> (50% cytotoxic concentration), ensuring clarity for readers encountering these terms for the first time. In the text, we have now included a brief explanation of the selectivity index as a measure of a drug's safety and specificity (Lines 108-110). The selectivity index is calculated as the ratio between the half maximal inhibitory concentration (IC<sub>50</sub>) and the 50% cytotoxic concentration (CC<sub>50</sub>) values (Lines 333-335). We have also included gray dashed lines in the graphs to indicate where the IC<sub>50</sub> values were derived from the curves (Figure supplement 1).
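
      The response describes the selectivity index as a ratio of the IC<sub>50</sub> and CC<sub>50</sub> values without stating which is the numerator. The sketch below assumes the widely used convention of CC<sub>50</sub> divided by IC<sub>50</sub>, so that a larger value means the compound is more selective for the parasite than for host cells; the function name `selectivity_index` and the numbers are hypothetical.

```python
def selectivity_index(cc50, ic50):
    """Return CC50 / IC50 (assumed convention); both in the same units."""
    if cc50 <= 0 or ic50 <= 0:
        raise ValueError("CC50 and IC50 must be positive")
    return cc50 / ic50


# Hypothetical values in nM, for illustration only.
si = selectivity_index(cc50=50000.0, ic50=100.0)
print(f"selectivity index: {si:.0f}")
```

      Both concentrations must be expressed in the same units before the ratio is taken.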

      Comment 5: Figures 1C-F: It feels unnecessary to write down n=6 for each panel and each group. Since "n" is equal for all, it would be nice to just mention it in the figure legend only.

      We appreciate the reviewer's suggestion regarding the notation of "n=6" in Figures 1C-F. To improve clarity and reduce redundancy, we have removed the "n=6" notation from the individual panels and included it in the figure legend instead.

      Comment 6: Figure 2A: was never mentioned in the text.

      We have described the sequencing results for the wild-type B. gibsoni ATP4 gene with a reference to Figure 2A in the revised manuscript (Lines 134-135).

      Comment 7: Figure 2D: some of the error bars merged to the X-axis legend. Please modify. Again, which curve was used to determine the IC<sub>50</sub> values? Can the authors explain why the pH declined after 4 minutes?

      We thank the reviewer for this insightful question.

      (1) Error bars overlapping the X-axis legend:

      The error bars in Figure 2E were automatically generated using GraphPad Prism9 and are determined by the underlying data values. Unfortunately, this overlap cannot be avoided without altering the data representation.

      (2) IC<sub>50</sub> curve information:

      Since Figure 2E contains three separate curves, adding dashed lines to indicate the IC<sub>50</sub> for each curve would make the figure overly cluttered and reduce readability. To address this, we have clearly indicated the IC<sub>50</sub> values in Figures 1A and 1B and described the methodology for determining IC<sub>50</sub> values in the Methods section. We believe this approach provides sufficient clarity without compromising the visual experience of Figure 2E.

      (3) The pH decline observed after 4 minutes (Figure 3E) may be attributed to the following factors:

      a. Ion transport dynamics:

      The initial rise in pH likely reflects the rapid inhibition of Na<sup>+</sup>/H<sup>+</sup> exchange mediated by CIP, which temporarily alkalinizes the intracellular environment. However, after this initial phase, compensatory mechanisms, such as proton influx or metabolic acid production, may lead to a subsequent decline in pH.

      b. Drug kinetics and target interaction:

      The decline could also result from the time-dependent effects of CIP on ATP4-mediated ion transport. As the drug action stabilizes, the parasite may partially restore ionic balance, leading to a decrease in intracellular pH.

      Comment 8: Supplementary Figure 2: It's difficult to distinguish between red and pink colors, so it would be wise to use two contrasting colors to distinguish between Pf and Tg CIP resistant sites.

      We have updated the figure to enhance clarity. Purple squares and arrows now represent sites linked to P. falciparum CIP resistance, replacing the previous red squares. Similarly, gray squares and arrows have replaced the green squares to denote sites associated with T. gondii (Figure supplement 2).

      Comment 9: Line 65: Is it possible to add a reference here?

      We have added a reference in line 65.

      Comment 10: Line 69: Please spell the full form of G6PD as it was never mentioned before.

      We have added the full form of G6PD in lines 69-70.

      Comment 11: Line 103: mention what DPI is (irrespective of the methods section which comes later).

      We have spelled out DPI (days postinfection) in line 115.

      Comment 12: Line 120: It's not explained why B. gibsoni ATP4 gene was investigated? There should be more explanation and references to previous work.

      We thank the reviewer for pointing out the need to provide more context for investigating the B. gibsoni ATP4 gene. To address this, we have added more information to the introduction, explaining that the ATP4 gene, which encodes a Na<sup>+</sup>/H<sup>+</sup> transporter, is the proposed target of CIP (Lines 86-93).

      Comment 13: Line 203-219: line spacing seems different from the rest of the manuscript.

      We have corrected the incorrect format (Lines 262-278).

      Reviewer #3 (Recommendations for the authors):

      Comment 1: Lines 66-68: The report by Marcos et al. 2022 did not demonstrate that tafenoquine was effective in curing relapsing babesiosis. In the discussion of that article, the authors state that "it is impossible to conclude that the drug tafenoquine provided any clinical benefit." The first demonstration of tafenoquine efficacy against relapsing babesiosis was reported by Rogers et al. 2023 and confirmed by Krause et al. 2024. Please rephrase the statement and use relevant citations.

      We thank the reviewer for pointing out this issue and we have rephrased the statement and used relevant citations (Lines 66-68).

      Comment 2: Line 103: mean parasitemia at 10 DPI is reported to be 35.88% but Figure 1C appears to indicate otherwise.

      We apologize for the error. The correct mean parasitemia at 10 DPI is 38.55%, and this has been updated in line 115 of the revised manuscript to reflect the data shown in Figure 1C.

      Comment 3: Line 116: parasitemia is said to recur on day 14 post-infection but Figure 1E indicates that recurrence was already noted on day 12 post-infection.

      We thank the reviewer for pointing out this inconsistency. We have corrected the relapse day to reflect that recurrence was noted on day 12 post-infection, as shown in Figure 1E. This correction has been made in the revised manuscript (Line 128).

      Comment 4: Line 120: Replace "wells" with "strains". Also, start the paragraph with one brief sentence to state how resistant parasites were generated.

      We have replaced "wells" with "strains" and added one brief sentence to explain how resistant parasites were generated (Lines 132-134).

      Comment 5: Line 169: is Ji et al, 2022b truly the appropriate reference to support a statement on tafenoquine?

      We thank the reviewer for highlighting this point. We have added an additional reference to support the statement on tafenoquine. The IC<sub>50</sub> value of TQ was 20.0 ± 2.4 μM against B. gibsoni (Ji et al., 2022b), and 31 μM against B. bovis (Carvalho et al., 2020) (Lines 223-225).

      Comment 6: Lines 184-185: given that exposure to CIP induces mutations in the ATP4 gene and therefore resistance to CIP, what is the prospect of using CIP for the treatment of babesiosis? Can the authors speculate on whether CIP should not be used alone but rather in combination with other drugs currently used for the treatment of human babesiosis?

      We thank the reviewer for raising this important question. Given that exposure to CIP induces mutations in the ATP4 gene, leading to resistance, we acknowledge that the long-term use of CIP as a monotherapy may be limited due to the potential for resistance development. To address this concern, we investigated the combination therapy of TQ and CIP to achieve the complete elimination of B. microti in infected mice (a model for human babesiosis). The results of this study are presented in Figure 5C.

      Comment 7: Lines 258-259: it is stated that drug treatment was initiated on day 4 post-infection when mean parasitemia was 1% and that drug treatment was continued for 7 days. This is not the case for B. rodhaini infection. As reported in Figure 1E, treatment was initiated on day 2 post-infection.

      We apologize for the oversight and any confusion caused. We have corrected the statement to reflect that drug treatment for B. rodhaini-infected mice was initiated at 2 DPI, as reported in Figure 1E (Lines 347-349).

      Comment 8: Lines 282-285: RBCs are said to be exposed to CIP for 3 days but parasite size is said to be measured on day 4. Which is correct?

      We thank the reviewer for pointing out this discrepancy. To clarify, the infected erythrocytes were exposed to CIP for three consecutive days (72 hours). Blood smears were then prepared at the 73<sup>rd</sup> hour, corresponding to the fourth day.

      Comment 9: Lines 35-37: this sentence can be omitted from the abstract as it does not summarize additional insight or additional data.

      We have omitted this sentence from the abstract.

      Comment 10: Line 55: replace Drews et al. 2023 with Gray and Ogden 2021 (doi: 10.3390/pathogens10111430). This excellent article directly supports the statement made by the authors.

      We appreciate the reviewer's suggestion and have replaced the reference with Gray and Ogden, 2021 (doi: 10.3390/pathogens10111430) (Line 54).

      Comment 11: Line 55: modify the start of sentence to read "The disease is known as babesiosis ...".

      We have modified the sentence (Line 54).

      Comment 12: Line 56: rephrase to read ".... but chronic infections can be asymptomatic".

      We have modified the sentence (Line 55).

      Comment 13: Line 57: rephrase to read "The fatality rate ranges from 1% among all cases to 3% among hospitalized cases but has been as high as 20% in immunocompromised patients."

      We have rephrased the sentence (Lines 55-57).

      Comment 14: Line 61: replace Holbrook et al. 2023 with Krause et al. 2021 (doi: 10.1093/cid/ciaa1216).

      We have replaced Holbrook et al. 2023 with Krause et al. 2021 (doi: 10.1093/cid/ciaa1216) (Line 60).

      Comment 15: Line 62: rephrase to read "... cytochrome b, which is targeted by atovaquone, were identified in patients with relapsing babesiosis." Here, also cite Lemieux et al., 2016; Simon et al., 2017; Rosenblatt et al, 2021, Marcos et al., 2022; Rogers et al., 2023; Krause et al., 2024.

      We have rephrased the sentence and cited the suggested references (Lines 61-64).

      Comment 16: Line 65: rephrase "Despite its efficacy, this combination can elicit adverse drug reactions (Vannier and Krause, 2012)."

      We have rephrased the sentence (Lines 65-66).

      Comment 17: Lines 75-77: rephrase to read "... of the drug indicated that CIP taken orally had good absorption, a long half-life, and ...".

      We have rephrased the sentence (Lines 76-77).

      Comment 18: Line 79: remove "the".

      We have removed "the" (Lines 79-80).

      Comment 19: Lines 83-85: rephrase to read "Mice infected with T. gondii that were treated with CIP on the day of infection and the following day had 90% fewer parasites 5 days post-infection (Zhou et al., 2014).".

      We have rephrased the sentence (Lines 83-85).

      Comment 20: Line 90: shorten the sentence to end as follows "... of CIP on Babesia parasites.".

      We have shortened the sentence as suggested (Line 100).

      Comment 21: Line 96: spell out CC<sub>50</sub>.

      We have spelled out the full form of CC<sub>50</sub> (Line 106).

      Comment 22: Line 104: remove "of body weight".

      We have removed "of body weight" (Line 116).

      Comment 23: Line 108: delete "from 8 DPI to 24 DPI, with statistically significant decreases".

      We have deleted "from 8 DPI to 24 DPI, with statistically significant decreases" (Line 120).

      Comment 24: Line 111: start a new paragraph with the sentence "BALB/c mice infected ...".

      We have started a new paragraph with the sentence "BALB/c mice infected ..." (Line 124).

      Comment 25: Line 123: replace "showed" with "occurred".

      We have replaced "showed" with "occurred" (Line 138).

      Comment 26: Line 127: rephrase to read "... sensitivity of the resistant parasite lines ...".

      We have rephrased the sentence (Line 144).

      Comment 27: Lines 137-140: rephrase to read ".... lines were lower when compared with ..." .

      We have rephrased the sentence (Line 158).

      Comment 28: Line 149: replace "BgATP4" with "B. gibsoni ATP4".

      We have replaced "BgATP4" with "B. gibsoni ATP4" (Line 183).

      Comment 29: Line 154: spell out "pLDDT" prior to pLDDT.

      We have provided the full form of pLDDT in the revised manuscript (Line 188).

      Comment 30: Lines 165-166: rephrase to read "CIP is a novel compound that inhibits Plasmodium development by targeting ATP4 and has been ...".

      We have rephrased the sentence (Lines 219-220).

      Comment 31: Lines 171-172: rephrase to read "...AZI, the combination recommended by the CDC in the United States."

      We have rephrased the sentence (Lines 226-227).

      Comment 32: Line 173: rephrase to read "... B. rodhaini infection, with survival up to 67%.".

      We have rephrased the sentence (Line 228).

      Comment 33: Lines 175-178: rephrase to read "In a previous study, a P. falciparum Dd2 strain that acquired resistance to CIP carried the G358S mutation in the ...".

      We have rephrased the sentence (Lines 230-231).

      Comment 34: Lines 179-180: rephrase to read "ATP4 is found in the parasite plasma membrane and is specific to the subclass of apicomplexan parasites.".

      We have rephrased the sentence (Lines 232-233).

      Comment 35: Lines 182-184: rephrase to read "In another study of Toxoplasma gondii, a cell line that carried the mutation G419S in the TgATP4 gene was 34 times ...".

      We have rephrased the sentence (Lines 235-237).

      Comment 36: Lines 201-202: delete the last sentence of this paragraph.

      We have deleted the last sentence of the paragraph (Line 261).

      Comment 37: Line 228: rephrase to read "... that CIP had a weaker binding to BgATP4<sup>L921I</sup> than to BgATP4<sup>L921V</sup>.".

      We have rephrased the sentence (Lines 294-295).

      Comment 38: Lines 261-262: please state that drugs were prepared in sesame oil. Add "20 mg/kg" in front of AZI.

      We have stated that drugs were prepared in sesame oil and added "20 mg/kg" in front of AZI (Lines 350-352).

      Comment 39: Line 265: replace "care" with "treatments".

      We have replaced "care" with "treatments" (Line 355).

      Comment 40: Line 267: replace "observe" with "assess".

      We have replaced "observe" with "assess" (Line 357).

      Comment 41: Lines 269-271: please provide the absolute numbers of B. gibsoni infected RBCs and the absolute numbers of uninfected RBCs that were added to the culture medium.

      We thank the reviewer for this suggestion. In the revised manuscript, we have included the absolute numbers of B. gibsoni-infected RBCs and uninfected RBCs added to the culture medium. Specifically, the culture medium contained 10 μL (5×10<sup>6</sup>) B. gibsoni iRBCs mixed with 40 μL (4×10<sup>8</sup>) uninfected RBCs (Lines 360-361).
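
      As a rough cross-check of these figures, the starting parasitemia implied by the stated cell counts can be computed. This is a minimal sketch that assumes parasitemia is taken as infected RBCs divided by total RBCs; that definition, the function name, and the variable names are illustrative assumptions and are not taken from the manuscript.

```python
# Cell counts quoted in the author response above.
infected_rbcs = 5e6      # B. gibsoni iRBCs in 10 uL
uninfected_rbcs = 4e8    # uninfected RBCs in 40 uL


def parasitemia_percent(infected: float, uninfected: float) -> float:
    """Percent of all RBCs that are infected (assumed definition)."""
    total = infected + uninfected
    if total <= 0:
        raise ValueError("total RBC count must be positive")
    return 100.0 * infected / total


starting_parasitemia = parasitemia_percent(infected_rbcs, uninfected_rbcs)
print(f"Implied starting parasitemia: {starting_parasitemia:.2f}%")
```

      Under that assumption, the quoted counts correspond to a starting parasitemia of roughly 1.2%; the manuscript may define or report this value differently.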

      Comment 42: Line 279: replace "confirmed" with "identified".

      We have replaced "confirmed" with "identified" (Line 370).

      Comment 43: Figure Supplement 2: the squares are not readily visible. Could the entire column corresponding to the mutation position be highlighted?

      We thank the reviewer for this suggestion. To improve visibility, we have changed the color of the squares and added arrows to make the mutation sites as prominent as possible. Unfortunately, due to software limitations, we were unable to highlight the entire column corresponding to the mutation position.

      Comment 44: Figure Supplement 4: for the parasite that carries a mutation in BgATP4, please delete the arrows that are next to BgATP4. These arrows send the message that the mutated ATP4 has an active role in pumping Na<sup>+</sup> and H<sup>+</sup> back into their compartment, which is not the case.

      We thank the reviewer for their observation. The dotted arrows next to BgATP4 are intended to indicate the recovery of H<sup>+</sup> and Na<sup>+</sup> balance facilitated by the mutated ATP4, which reduces susceptibility to ATP4 inhibitors. To avoid potential confusion, we have revised the figure legend to clearly explain the role of the arrows, ensuring the intended message is accurately conveyed.

    1. eLife Assessment

      This important study utilizes humanized mice, in which human immune cells are introduced into immune-deficient mice, to provide convincing evidence that two helper CD4 T-cell subsets, T-follicular helper (Tfh) and T-peripheral helper (Tph) cells, are able to drive both autoantibody production and induction of autoimmunity. The work will be of broad interest to medical scientists engaged in deciphering how human immune cells mediate immune responses and contribute to the development of autoimmune diseases.

    2. Reviewer #1 (Public review):

      Summary:

      As our understanding of the immune system increases it becomes clear that murine models of immunity cannot always prove an accurate model system for human immunity. However, mechanistic studies in humans are necessarily limited. To bridge this gap many groups have worked on developing humanised mouse models in which human immune cells are introduced into mice allowing their fine manipulation. However, since human immune cells will attack murine tissues, it has proven complex to establish a human-like immune system in mice. To help address this, Vecchione et al. have previously developed several models using human cell transfer into mice with or without human thymic fragments that allow negative selection of autoreactive cells. In this report they focus on the examination of the function of the B-helper CD4 T-cell subsets T-follicular helper (Tfh) and T-peripheral helper (Tph) cells. They demonstrate that these cells are able to drive both autoantibody production and can also induce B-cell independent autoimmunity.

      Strengths:

      A strength of this paper is that currently there is no well-established model for Tfh or Tph in HIS mice and that currently there is no clear murine Tph equivalent making new models for the study of this cell type of value. Equally, since many HIS mice struggle to maintain effective follicular structures Tfh models in HIS mice are not well established giving additional value to this model.

      Weaknesses:

      A weakness of the paper is that the models seem to lack a clear ability to generate germinal centres in which Tfh may exert some of their key functions. In some cases, the definition of Tph-like does not seem to differentiate well between Tph and highly activated CD4 T-cells in general, partly since the literature around these cells has not fully resolved this point.

    3. Reviewer #2 (Public review):

      Summary:

      Humanized mice, developed by transplanting human cells into immunodeficient NSG mice to recapitulate the human immune system, are utilized in basic life science research and preclinical trials of pharmaceuticals in fields such as oncology, immunology, and regenerative medicine. However, there are limitations to using humanized mice for mechanistic analysis as models of autoimmune diseases due to the unnatural T cell selection, antigen presentation/recognition process, and immune system disruption due to xenogeneic GVHD onset.

      In the present study, Vecchione et al. detailed the mechanisms of autoimmune disease-like pathologies observed in a humanized mouse (Human immune system; HIS mouse) model, demonstrating the importance of CD4+ Tfh and Tph cells for the disease onset. They clarified the conditions under which these T cells become reactive using techniques involving the human thymus engraftment and mouse thymectomy, showing their ability to trigger B cell responses, although this was not a major factor in the mouse pathology. These valuable findings provide an essential basis for interpreting past and future autoimmune disease research conducted using HIS mice.

      Strengths:

      (1) Mice transplanted with human thymus and HSCs were repeatedly executed with sufficient reproducibility, with each experiment sometimes taking over 30 weeks and requiring desperate efforts. While the interpretation of the results is still debatable, this description is valuable knowledge for this field of research.

      (2) Mechanistic analysis of T-B interaction in humanized mice, which has not been extensively addressed before, suggests part of the activation mechanism of autoreactive B cells. Additionally, the differences in pathogenicity due to T cell selection by either the mouse or human thymus are emphasized, which encompasses the essential mechanisms of immune tolerance and activation in both central and peripheral systems.

      Weaknesses:

      (1) In this manuscript, such as Fig. 2, the proportion of suppressive cells like regulatory T cells is not clarified, making it unclear to what extent the percentages of Tph or Tfh cells reflect immune activation. It would have been preferable to distinguish follicular regulatory T cells, at least. While Figure 3 shows Tregs are gated out using CD25- cells, it is unclear how the presence of Treg cells affects the overall cell population immunogenic functionally.

      The authors added the data about FOXP3 expression among Tfh/Tph cells in the revised manuscript. This improved our data interpretation.

      (2) The definition of "Disease" discussed after Fig. 6 should be explicitly described in the Methods section. It seems to follow Khosravi-Maharlooei et al. 2021. If the disease onset determination aligns with GVHD scoring, generally an indicator of T cell response, it is unsurprising that B cell contribution is negligible. The accelerated disease onset by B cell depletion likely results from lymphopenia-induced T cell activation. However, this result does not prove that these mice avoid organ-specific autoimmune diseases mediated by auto-antibodies and the current conclusion by the authors may overlook significant changes. For instance, would defining Disease Onset by the appearance of circulating autoantibodies alter the result of Disease-Free curve? Are there possibly histological findings at the endpoint of the experiment suggesting tissue damage by autoantibodies?

      The authors appropriately modified the manuscript and provided sufficient information about the definition of diseases.

      (3) Helper functions, such as differentiating B cells into CXCR5+, were demonstrated for both Hu/Hu and Mu/Hu-derived T cells. This function seemed higher in Hu/Hu than in Mu/Hu. From the results in Fig. 7-8, Hu/Hu Tph/Tfh cells have a stronger T cell identity and higher activation capacity in vivo on a per-cell basis than Mu/Hu's ones. However, Hu/Hu-T cells lacked an ability to induce class-switching in contrast to Mu/Hu's. The mechanisms causing these functional differences were not fully discussed. Discussions touching on possible changes in TCR repertoire diversity between Mu/Hu- and Hu/Hu- T cells would have been beneficial.

      The authors correctly cited their previous findings about the TCR repertoire variation. This strengthened the discussion of this study.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      As our understanding of the immune system increases it becomes clear that murine models of immunity cannot always prove an accurate model system for human immunity. However, mechanistic studies in humans are necessarily limited. To bridge this gap many groups have worked on developing humanised mouse models in which human immune cells are introduced into mice allowing their fine manipulation. However, since human immune cells will attack murine tissues, it has proven complex to establish a human-like immune system in mice. To help address this, Vecchione et al have previously developed several models using human cell transfer into mice with or without human thymic fragments that allow negative selection of autoreactive cells. In this report they focus on the examination of the function of the B-helper CD4 T-cell subsets T-follicular helper (Tfh) and T-peripheral helper (Tph) cells. They demonstrate that these cells are able to drive both autoantibody production and can also induce B-cell independent autoimmunity.

      Strengths:

      A strength of this paper is that currently there is no well-established model for Tfh or Tph in HIS mice and that currently there is no clear murine Tph equivalent making new models for the study of this cell type of value. Equally, since many HIS mice struggle to maintain effective follicular structures Tfh models in HIS mice are not well established giving additional value to this model.

      Weaknesses:

      A weakness of the paper is that the models seem to lack a clear ability to generate germinal centres. For Tfh it is unclear how we can interpret their function without the structure where they have the greatest influence. In some cases, the definition of Tph does not seem to differentiate well between Tph and highly activated CD4 T-cells in general.

      The limited ability of HIS mice to generate well-defined lymphoid tissue structures is well noted. While the emergence of T cells in HIS mice increases the size of lymphoid tissues, the structure remains suboptimal and vaccination responses are limited. We believe this is mainly due to the common gamma chain knockout, which results in a lack of murine lymphoid tissue inducer (LTi) cells, which require IL-7 signaling to interact with murine mesenchymal cells for normal lymphoid tissue development. Ongoing efforts by our group and others aim to address this challenge by providing the necessary signals. Despite this challenge, these mice do develop Tfh cells, allowing us to study this cell subset.

      We agree with the reviewer that the distinction between Tph and highly activated CD4 T cells is incomplete.

      However, we have provided several distinctions in our manuscript that support the presence of Tph in HIS mice: 1) Tph cells exhibit very high levels of PD-1 expression, whereas other activated CD4 cells have varying levels of PD-1 expression. 2) Tph cells express IL-21. 3) Tph cells promote B cell differentiation and antibody production. 

      Reviewer #2 (Public Review):

      Summary:

      Humanized mice, developed by transplanting human cells into immunodeficient NSG mice to recapitulate the human immune system, are utilized in basic life science research and preclinical trials of pharmaceuticals in fields such as oncology, immunology, and regenerative medicine. However, there are limitations to using humanized mice for mechanistic analysis as models of autoimmune diseases due to the unnatural T cell selection, antigen presentation/recognition process, and immune system disruption due to xenogeneic GVHD onset.

      In the present study, Vecchione et al. detailed the mechanisms of autoimmune disease-like pathologies observed in a humanized mouse (Human immune system; HIS mouse) model, demonstrating the importance of CD4+ Tfh and Tph cells for the disease onset. They clarified the conditions under which these T cells become reactive using techniques involving the human thymus engraftment and mouse thymectomy, showing their ability to trigger B cell responses, although this was not a major factor in the mouse pathology. These valuable findings provide an essential basis for interpreting past and future autoimmune disease research conducted using HIS mice.

      Strengths:

      (1) Mice transplanted with human thymus and HSCs were repeatedly executed with sufficient reproducibility, with each experiment sometimes taking over 30 weeks and requiring desperate efforts. While the interpretation of the results is still debatable, this description is valuable knowledge for this field of research.

      (2) Mechanistic analysis of T-B interaction in humanized mice, which has not been extensively addressed before, suggests part of the activation mechanism of autoreactive B cells. Additionally, the differences in pathogenicity due to T cell selection by either the mouse or human thymus are emphasized, which encompasses the essential mechanisms of immune tolerance and activation in both central and peripheral systems.

      Weaknesses:

      (1) In this manuscript, for example in Figure 2, the proportion of suppressive cells like regulatory T cells is not clarified, making it unclear to what extent the percentages of Tph or Tfh cells reflect immune activation. It would have been preferable to distinguish follicular regulatory T cells, at least. While Figure 3 shows Tregs are gated out using CD25- cells, it is unclear how the presence of Treg cells affects the overall cell population immunogenic functionally.

      We analyzed the % FOXP3+ cells and the % of ICOS+ cells within the Tfh and Tph cells in the spleen of Hu/Hu and Mu/Hu mice at 20 weeks post-transplantation. Importantly, we see no difference in FOXP3 expression between Tfh of Mu/Hu and Hu/Hu mice. The results have been added to panels J and K of Figure 2. 

      (2) The definition of "Disease" discussed after Figure 6 should be explicitly described in the Methods section. It seems to follow Khosravi-Maharlooei et al. 2021. If the disease onset determination aligns with GVHD scoring, generally an indicator of T cell response, it is unsurprising that B cell contribution is negligible. The accelerated disease onset by B cell depletion likely results from lymphopenia-induced T cell activation. However, this result does not prove that these mice avoid organ-specific autoimmune diseases mediated by auto-antibodies and the current conclusion by the authors may overlook significant changes. For instance, would defining Disease Onset by the appearance of circulating autoantibodies alter the result of Disease-Free curve? Are there possibly histological findings at the endpoint of the experiment suggesting tissue damage by autoantibodies?

      We have added a definition of disease to the Methods section as requested. Regarding the possibility of antibody-mediated disease that may be missed by this definition, we acknowledge this point in the Discussion section. However, we also discuss the point that the deficient complement pathway in NSG mice is likely to have protected the HIS mice from autoantibody-mediated organ damage.

      (3) Helper functions, such as differentiating B cells into CXCR5+, were demonstrated for both Hu/Hu and Mu/Hu-derived T cells. This function seemed higher in Hu/Hu than in Mu/Hu. From the results in Figures 7-8, Hu/Hu Tph/Tfh cells have a stronger T cell identity and higher activation capacity in vivo on a per-cell basis than Mu/Hu's ones. However, Hu/Hu-T cells lacked an ability to induce class-switching in contrast to Mu/Hu's. The mechanisms causing these functional differences were not fully discussed. Discussions touching on possible changes in TCR repertoire diversity between Mu/Hu- and Hu/Hu- T cells would have been beneficial.

      Consistent with the reviewer’s suggestion, we have previously shown that the TCR repertoire in Mu/Hu mice is less diverse than that in Hu/Hu mice (Khosravi-Maharlooei M, et al., J Autoimmun., 2021). We believe that the narrowed TCR repertoire in the periphery of Mu/Hu mice, combined with the inadequate negative selection in the murine thymus reported in the paper cited above, results in selective peripheral expansion primarily of the few T cell clones that are cross-reactive with HLA/murine self peptide complexes presented by human APCs in the periphery. We have discussed the reasons why these cells, when transferred to secondary recipients containing the same APCs, might not be as active as the more diverse, HLA-selected T cell repertoire transferred from Hu/Hu mice. These possible reasons include exhaustion of the T cells in Mu/Hu mice, limited expression of the few targeted HLA-peptide complexes recognized by the narrow cross-reactive TCR repertoire of Mu/Hu T cells, and the consequent relatively impaired T-B cell collaboration in these mice.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      The authors note that they removed an outlier result from Figures 1 B & C. With only 4 mice it seems difficult to see exactly how they determined the result was an outlier. Presumably, it was quite different from the others but in such a small dataset removing data without a very clear statistical rationale seems likely to strongly influence the results.

      We have revised Fig 1 to include the previously-deleted outlier mouse.   

      Figure 4. The authors describe the follicular area. Were they able to observe any GC-like structures in their data?

      From the examples, I can see that the PNA staining is sometimes diffuse but even if the authors felt they could not observe a distinct GC this should be stated and discussed in the text.

      We now describe the three-color IF staining in more detail in accordance with this comment. We characterized 4 Hu/Hu and 3 Mu/Hu spleens earlier than 20 weeks post-transplant. In all of these mice, distinct B cell areas (CD20+) were obvious and PNA+ cells were more concentrated in the B cell zones. We stained 4 Hu/Hu and 3 Mu/Hu spleens from mice between 20-30 weeks post-transplant and found that B cell areas were smaller in all these spleens compared to those taken before 20 weeks post-transplant. PNA+ areas were also more diffusely distributed and were not enriched in the B cell areas. Only 2 Mu/Hu mice showed clear B cell zones with some enriched PNA+ areas in the B cell zones. Additionally, we stained 2 Hu/Hu and 2 Mu/Hu mice later than week 30 post-transplant. No distinct B cell areas were observed in any of the spleens of these mice and PNA+ cells were diffusely distributed.

      In Figure 3E the authors sort CD25-CXCR5-CD45RA- CD4 T-cells as Tph. This does seem a very loose definition including essentially all non-naïve CD4 cells that are not Tregs or Tfh.

      We agree with the reviewer that the distinction between Tph and highly activated CD4 T cells is incomplete.

      However, we have provided several distinctions in our manuscript that support the presence of Tph in HIS mice: 1) Tph cells exhibit very high levels of PD-1, whereas other activated CD4 cells have varying levels of PD-1 expression. 2) Tph cells express IL-21. 3) Tph cells promote B cell differentiation and antibody production. 

      Tph is sometimes a hard cell type to separate from more general highly activated CD4 T-cells. The broad CXCR5-PD1+ phenotype they have used is common in the literature and the authors have confirmed some enrichment of IL21 production by these cells. However, they should consider if there are ways of further confirming this by examination of other markers such as CCR2 and CCR5 or elimination of other effector identities such as Th1 and Th17 or PD1+ exhaustion phenotypes.

      For this study, we chose to follow the commonly used definitions in the literature for Tph and Tfh cells. For this reason, we are careful to refer to “Tph-like” cells rather than Tph cells in this manuscript. Distinguishing Tph cells from other subsets of activated CD4 cells would require further studies such as single cell RNA seq, which we hope to be able to perform in the future with additional funding.  

      Figure 8. The authors perform some analysis of B-cell phenotypes looking at markers such as CD27, IgD in 8B, and CD11c in 8C. Why is CD11c considered in isolation? The level of expression of the other markers would change how this data would be interpreted e.g. IgD-CD27-CD11c+ = DN2/Atypical cells, IgD-CD27+CD11c+ = Activated or age-associated, etc.

      In response to this comment, we reanalyzed the splenic samples of the donor Mu/Hu and Hu/Hu mice and their adoptive recipients. Interestingly, in the T cell donors, the Mu/Hu B cells included greater proportions of activated/age-associated B cells (IgD-CD27+CD11c+) and atypical cells (IgD-CD27-CD11c+), compared to the Hu/Hu B cells. This is consistent with the increased disease, increased Tph/Tfh and increased IgG antibody findings in the primary Mu/Hu compared to Hu/Hu mice. These results have been added to Figure 5G. We performed a similar analysis in the blood (week 9) and spleen of adoptive recipient mice. These studies showed that activated/age-associated B cells (IgD-CD27+CD11c+) and atypical cells (IgD-CD27-CD11c+) were significantly increased in the adoptive recipients of Hu/Hu Tph and Tfh cells compared to the adoptive recipients of Mu/Hu Tph and Tfh cells (Fig. 8C). These results are consistent with the disease, T cell expansion and antibody results in the adoptive recipients.

      Data not shown occurs often in this manuscript. In some cases what is not shown is potentially important. The authors note in the text relating to Figure 7 that the "purity of the cell populations as assessed by FCM ranged from 56-60% (data not shown)". Those numbers are a little alarming. Are they referring to the purity of the FACS-sorted Tfh and Tph prior to transfer? Currently, some of the discussion of this paper is about the possibility of plasticity, with Tfh switching into a Tph phenotype. If the transferred cell populations are 56-60% pure I don't think it is possible to make any interpretation of plasticity.

      We looked into this further and realized that the purity figure cited in the original manuscript was erroneous: the first author had misunderstood a question from the senior author. Unfortunately, data on the purity of the FACS-sorted population were not saved. However, we have added panel B to Figure 7 to show the sorting strategy for Tfh and Tph cells. We agree that any discussion of plasticity between these cell types is speculative, as outgrowth of a minor population is possible even from well-purified sorted cells.

      Minor points:

      Some graphs have issues with presentation; Figures 5D and 5E, split scale clips data points. 5F the color representing time would be better replaced with direct labels. 6C and 6C some distortion of text clipping other elements.

      We changed the y-axis scales of Figures 5D and 5E to avoid clipping the data points, and we changed the labels in Figure 5F. The text distortion and clipping of other elements in Figures 6E and 6A have been corrected.

      The abbreviation LIP is used in the abstract without a clear definition until later in the text.

      This abbreviation has been defined again in the text.

      Generally, the discussion section is quite long.

      We agree that the discussion is quite long, but the results are quite complex and require considerable discussion.  We have attempted to be as concise as possible.

      Reviewer #2 (Recommendations For The Authors):

      Suggestion

      Can Supplementary Figures be merged into the mains for the convenience of readers? There is enough extra margin.

      We prefer to keep the order of main and supplementary figures as they are. 

      There are some confusing results which I would recommend to make the additional explanation for readers. For example, about 10% of Hu/Hu CD3+ T cells reacted to Auto-DC in Figure 1B, but neither CD4+ nor CD8+ cells did in Figure 1C.

      We have re-analyzed the data in Fig 1 and included the previously deleted outlier mouse.

      Minor

      Figure 3C

      The figure legend does not explain the figure. Hu/Mu or Mu/Mu?

      Both groups were combined in the figure, as the results were similar for both.  The N per group is given in the figure legend.  The same applies to figure 3D.

      Figure 4B, 4C

      Why were Hu/Hu and Mu/Hu data merged only in 4B? They should be discussed in the context of parallel comparison. Both y-axis labels are the same between B and C despite the legend saying differently.

      We switched the order of Figure 4B and 4C, each of which serves a different purpose. Figure 4B aims to demonstrate the similarity between the two groups at each timepoint.  Figure 4C combines the two groups in order to provide sufficient animal numbers to demonstrate the statistically significant changes over time. 

      Figure 5D

      The axis label was missing and the uncertain bar emerged. The authors should replace it with the corrected one.

      The axis and the bar in 5D have been corrected.

      Figure 5F

      The legend does not explain the figure. What are these numbers? Also, it is better if the authors add a detailed explanation to the manuscript about the reason why the sum of antibody titer represents the poly-reactivity of IgM in these mice.

      The numbers in the previous version of the figure were eartag numbers, which we have now renumbered as animals 1, 2, 3, etc. in each group. Please refer to the final paragraph of the "Autoreactivity of IgM and IgG in HIS Mice" section in the Results section for an explanation of IgM polyreactivity.

      Fig. 7D-E etc.

      The definition of Asterisk is insufficient. Between what to what in the multiple comparisons?

      The green asterisks show significant differences between the Tph in Hu/Hu vs Mu/Hu mice, while the orange asterisks show significant differences between the Tfh in Hu/Hu vs Mu/Hu mice. This has been added to the figure legend.

      Figure 7 ~ Figure 8

      The legends on the figure are confusing due to the different order of figures. The scales are inappropriate in some figures. The readers cannot interpret the data from the unfairly compressed plots.

      We made the plots bigger to make them readable and changed the order.

      Methods

      In the description of B cell depletion Experiments, the authors should directly mention the figure number instead of "In the second Experiment ..."

      We have corrected this in the Methods section.

      There is no definition of how to define the "disease" onset.

      This definition has been added to the Methods section.

      Several undefined abbreviations: "LIP", "BLT" ...

      We defined these in the text.

    1. eLife Assessment

      This important paper on measuring molecular connectivity using combined serotonin PET and resting-state fMRI provides both novel methods for studying the brain as well as insights into the effects of ecstasy administration. The methods are convincing, with the high anaesthetic dose used likely limiting network activity.

    2. Reviewer #1 (Public review):

      This paper by Ionescu et al. applies novel brain connectivity measures based on fMRI and serotonin PET both at baseline and following ecstasy use in rats. There are multiple strengths to this manuscript. First, the use of connectivity measures using temporal correlations of 11C-DASB PET, especially when combined with resting state fMRI, is highly novel and powerful. The effects of ecstasy on molecular connectivity of the serotonin network and salience network are also quite intriguing.

      The authors discussed their use of high-dose (1.3%) isoflurane in the context of a recent consensus paper on rat fMRI (Grandjean et al., "A Consensus Protocol for Functional Connectivity Analysis in the Rat Brain.") which found that medetomidine combined with low-dose isoflurane provided optimal control of physiology and fMRI signal. The authors acknowledge their suboptimal anaesthetic regimen, which was chosen before the publication of the consensus paper. This likely explains, in part, why fMRI ICs in Figure 2A appear fairly restricted.

      The PET ICs appear less bilateral than the fMRI ICs, which the authors attribute to lower SNR.

    3. Reviewer #2 (Public review):

      Summary:

      The article aims to describe a novel methodology for the study of brain organization, in comparison to fMRI functional connectivity, under rest vs. controlled pharmacological stimulation.

      Strengths:

      Solid study design with pharmacological stimulation applied to assess the biological significance of functional and (novel) molecular connectivity estimates.

      Provides relevant information on the multivariate organization of the serotoninergic system in the brain.

      Provides relevant information on the sensitivity of traditional (univariate PET analysis, fMRI functional connectivity) and novel (molecular connectivity) methods in measuring pharmacological effects on brain function.

      Comments on revisions:

      I thank the authors for carefully addressing my comments and in particular for the interesting insights added to the discussion.

      I have just one last remark pertaining to the point of the sample size: rats undergoing the MDMA acute challenge constitute a relatively small sample (N=11); I feel there is a certain risk the results presented might not be particularly replicable. Could the authors prove the stability of their (main) results by randomly iterating the individuals included in their sample (e.g. via permutation tests)? Alternatively, including at least a justification of the sample size in the context of the available evidence would be valuable.
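      As a minimal sketch of the kind of check suggested here, and assuming a paired design in which each animal contributes one difference value (for example, post-challenge minus baseline connectivity), an exact sign-flip permutation test and a leave-one-out stability check could look as follows. The function names and the example values are hypothetical illustrations, not the authors' analysis or data.

```python
import itertools
import numpy as np


def sign_flip_p_value(diffs):
    """Exact two-sided sign-flip permutation test for paired differences.

    diffs holds one value per animal (e.g. post-challenge minus baseline).
    Under the null hypothesis the sign of each difference is arbitrary,
    so all 2**n sign patterns are enumerated (2048 patterns for n = 11).
    """
    diffs = np.asarray(diffs, dtype=float)
    observed = abs(diffs.mean())
    count = 0
    total = 0
    for signs in itertools.product((1.0, -1.0), repeat=diffs.size):
        total += 1
        # Small tolerance so the observed pattern always counts itself.
        if abs((diffs * np.array(signs)).mean()) >= observed - 1e-12:
            count += 1
    return count / total


def leave_one_out_means(diffs):
    """Mean effect recomputed with each animal left out in turn."""
    diffs = np.asarray(diffs, dtype=float)
    return np.array([np.delete(diffs, i).mean() for i in range(diffs.size)])


# Hypothetical per-animal differences for N = 11 (illustrative values only).
example = [-0.21, -0.35, -0.10, -0.28, 0.05, -0.19, -0.40, -0.12, -0.30, -0.08, -0.25]
print("sign-flip p-value:", sign_flip_p_value(example))
print("leave-one-out means:", leave_one_out_means(example))
```

      If the leave-one-out means all keep the same sign and similar magnitude, the group effect is not driven by a single animal; this complements, but does not replace, a sample-size justification.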

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1 - I would like the authors to discuss and justify their use of high-dose (1.3%) isoflurane. A recent consensus paper on rat fMRI (Grandjean et al., "A Consensus Protocol for Functional Connectivity Analysis in the Rat Brain.") found that medetomidine combined with low-dose isoflurane provided optimal control of physiology and fMRI signal. To overcome any doubts about the effects of the high-dose anaesthetic I'd encourage the authors to show the results of their functional connectivity specificity using the same or similar image processing protocol as described in that consensus paper. This is especially true since the fMRI ICs in Figure 2A appear fairly restricted.

      We thank the reviewer for their insightful comments. We agree that the combination of medetomidine and isoflurane, as recommended by Grandjean et al. in their consensus paper, provides superior physiological stability and fMRI signal quality, and should indeed be considered the preferred protocol for future studies. In fact, we have adopted this combination in our subsequent research [1]. However, the data in the present study were acquired prior to the publication of the consensus recommendations and have been previously published [2, 3]. While isoflurane is not the ideal anesthetic for functional connectivity studies, we have demonstrated in earlier work [4] that using isoflurane at 1.3% maintains stable physiological parameters and avoids burst suppression, a key issue with higher isoflurane doses.

      Regarding preprocessing, we acknowledge the importance of standardized approaches as outlined in the consensus paper. However, to maintain methodological consistency with our prior work, we retained the original preprocessing pipeline for this study. This decision ensures comparability with our previous analyses. To address the reviewer’s concerns and encourage further verification, we have uploaded the full dataset to a public repository (as suggested in Comment 4). This will enable other researchers to reanalyze the data using updated preprocessing pipelines or explore additional analyses.

      We have updated the manuscript discussion (page 19) to clearly acknowledge these points:

      “One limitation of our study is that our experimental protocols predate the recently published consensus recommendations for rat fMRI [42], particularly concerning anesthesia and preprocessing pipelines. The use of isoflurane anesthesia, although common at the time of data acquisition, introduces a potential confound due to its known effects on neuronal activity. However, we previously demonstrated that isoflurane at 1.3% maintains stable physiological parameters and avoids burst suppression [43], a concern at higher doses. Furthermore, other studies have reported that low-dose isoflurane remains feasible for resting-state functional connectivity studies [44]. While isoflurane, as a GABA-A agonist, could theoretically interact with the mechanisms of MDMA in the brain, we found no evidence in the literature suggesting significant cross-talk between these substances. Future studies employing medetomidine-based protocols may help minimize this potential confound.

      Regarding data preprocessing, we chose to retain the same pipeline used in our prior publications [13, 14] to maintain methodological consistency. While we recognize the advantages of adopting standardized preprocessing as outlined in the consensus guidelines, this approach ensures comparability with our previous analyses. To facilitate further investigation, we have made the full dataset publicly available (see Data Availability Statement), enabling reanalysis with updated pipelines or additional explorations of this dataset.”

      Comment 2 - I'd also be interested to read more about why the cerebellum was chosen as a reference region, given that serotonin is highly expressed in the cerebellum, and what effects the choice of reference region has on their quantification.

      We have examined this ourselves in a paper dedicated to determining the most suitable reference region for [11C]DASB. While the reviewer is correct that there is also serotonin in the cerebellum, we found the lowest binding for this tracer in the cerebellar gray matter, which recommends this region as a valid reference area (“Displaceable binding of (11)C-DASB was found in all brain regions of both rats and mice, with the highest binding being in the thalamus and the lowest in the cerebellum. In rats, displaceable binding was largely reduced in the cerebellar cortex”; please refer to [5]).

      We amended our materials and methods part to specify that we had shown in this previous publication that the cerebellar gray matter is appropriate as a reference region (page 6):

      “Binding potentials were calculated frame-wise for all dynamic PET scans using the DVR-1 (equation 1) to generate regional BPND values with the cerebellar gray matter as a reference region, which our earlier studies have demonstrated to be the most appropriate for this tracer in rats [5, 6]:”
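      For orientation, the DVR-1 quantity named in this passage is conventionally defined as shown below. Equation 1 of the manuscript is not reproduced in this response, so the symbols are assumptions taken from standard PET nomenclature (V_T is the total distribution volume; "target" is a region of interest and "ref" is the cerebellar gray matter). How the ratio is estimated for each frame is not specified here.

```latex
BP_{ND} = DVR - 1 = \frac{V_T^{\mathrm{target}}}{V_T^{\mathrm{ref}}} - 1
```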

      Comment 3 - The PET ICs appear less bilateral than the fMRI ICs. Is that simply a thresholding artefact or is it a real signal?

      We thank the reviewer for this observation. The reduced bilaterality of PET ICs compared to fMRI ICs is likely due to the inherent limitation in the temporal resolution of PET, which provides significantly fewer frames (100 frames compared to 3000 frames for fMRI). This lower temporal resolution leads to a reduced signal-to-noise ratio, which can affect the stability and symmetry of the ICs during ICA computation, particularly at higher IC numbers. While thresholding may also play a minor role, we believe the primary factor is the poorer SNR associated with the PET data. We have clarified this point in the discussion section (page 17) as follows:

      “In our analysis, PET ICs appeared less bilateral than fMRI ICs. This is likely due to the lower temporal resolution of PET (100 frames) compared to fMRI (3000 frames), resulting in reduced signal-to-noise ratio (SNR) and potentially affecting the stability and symmetry of the independent components.”

      Comment 4 - "The data will be made available upon reasonable request" is not sufficient - please deposit the data in an open repository and link to its location.

      We agree with the request of the reviewer and uploaded the data to a Dryad repository. We amended our Data Availability Statement accordingly.

      Comment 5 (recommendation) - Please add the age and sex of the rats in lines 92-97.

      Amended.

      Comment 6 (recommendation) - There are multiple typos throughout the manuscript - for example, "z-vlaue" on line 164, "negligable" on line 194, etc.. Sometimes the 11 in 11C is superscripted, sometimes it isn't. This paper would benefit from a careful proofread.

      Thank you for pointing this out. We sent the manuscript for language and grammar editing to AJE (see certificate).

      Reviewer 2:

      Comment 1 - While the study protocol is referenced in the paper, it would be useful to at least report whether the study uses bolus, constant infusion, or a combination of the two and the duration of the frames chosen for reconstruction. Minimal details on anesthesia should also be reported, clarifying whether an interaction between the pharmacological agent for anesthesia and MDMA can be expected (whole-brain or in specific regions).

      We fully agree that this would improve the readability of our manuscript and added the information to the materials and methods and discussion accordingly. Please refer to page 4/5.

      Comment 2 - Some terminology is used in a bit unclear way. E.g. "seed-based" usually refers to seed-to-voxel and not ROI-to-ROI analysis, or e.g. it is a bit confusing to have IC1 called SERT network when in fact all ICs derived from DASB data are SERT networks. Perhaps a different wording could be used (IC1 = SERT xxxxx network; IC2= SERT salience network).

      Based on the reviewer's suggestion, we propose renaming IC1 and IC2 according to their anatomical and functional characteristics (page 13):

      “IC1 = SERT Salience Network: This name highlights the involvement of the regions typically associated with the salience network (e.g., CPu, Cg, NAc, Amyg, Ins, mPFC), which play key roles in emotional and cognitive processing.”

      “IC2 = SERT Subcortical Network: This name reflects the involvement of subcortical regions which play a role in arousal, stress response, and autonomic regulation, which are heavily modulated by serotonin in areas like the hypothalamus, PAG, and thalamus.”

      Comment 3 - The limited sample size for the rats undergoing pharmacological stimulation might make the study (potentially) not particularly powerful. This would not be a problem if the MDMA effect observed were particularly consistent across rats. Information on inter-individual variability of FC, MC, and BPND could be provided in this regard.

      We thank the reviewer for raising this point. To address the concern about limited sample size and inter-individual variability, we have added this information to Figures 5B and 5D. Regarding the BPND variability, the dotted lines in Figure 3 indicate the standard deviation of the regional BPNDs; however, this was not clearly stated in the original figure description. We have now amended the figure legend to explicitly clarify this point.

      Comment 4 (recommendation) - "Our research employs a novel approach named "molecular connectivity" (MC), which merges the strengths of various imaging methods to offer a comprehensive view of how molecules interact within the brain and affect its function." I'd recommend rephrasing to "..how molecules interact across different areas within the brain..". Molecular connectivity is a potentially ambiguous term (used to study interactions across different molecules (in the same compartment/environment) vs. to study interactions across the same molecules in different areas). I'd add a couple of references to help the reader disambiguate too (e.g. https://pubmed.ncbi.nlm.nih.gov/30544240/ , https://pubmed.ncbi.nlm.nih.gov/36621368/)

      We appreciate the reviewer’s suggestion and agree that the term "Molecular Connectivity" could be ambiguous. To clarify, we rephrased the description to emphasize that our approach specifically examines interactions of the same molecule (i.e., serotonin transporter) across different brain regions, rather than interactions between different molecules within the same environment. We propose the following revised text (page 2):

      “Our research employs a novel approach termed molecular connectivity (MC), which combines the strengths of various imaging methods to provide a comprehensive view of how specific molecules, such as the serotonin transporter, interact across different brain regions and influence brain function.”

      Additionally, we will incorporate the suggested references to help the reader further contextualize the use of this term.

      Comment 5 - In the methods, it is not clear if for MC the authors also compute ROI-to-ROI correlations or only ICA.

      Thank you for highlighting this point. To clarify, our MC analysis includes both ROI-to-ROI correlations and ICA. Specifically, as described at the end of the “Molecular Connectivity Analysis” subchapter, we compute ROI-to-ROI correlations using the following steps: 1. The first 20 minutes of each scan are discarded to account for perfusion effects. 2. A detrending approach is applied to the remaining 60 minutes of BP<sub>ND</sub> time courses. 3. ROI-to-ROI correlations are then calculated and organized into subject-level correlation matrices, which are subsequently z-transformed to generate mean correlation matrices across subjects.

      We revised the methods section to explicitly state that both ROI-to-ROI correlations and ICA are integral components of the MC analysis to ensure this point is clear to readers (page 6).

      “The BP<sub>ND</sub> time courses were then used to calculate MC as described above for fMRI: ROI-to-ROI subject-level correlation matrices between all regional time courses were generated and z-transformed correlation coefficients were used to calculate mean correlation matrices.”
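      The three steps listed in this response can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the response does not specify the detrending method (a linear least-squares detrend is assumed here), the frame duration is left as a parameter, and the function names are hypothetical.

```python
import numpy as np


def detrend_linear(ts):
    """Remove a least-squares linear trend from each column of ts (frames x ROIs).

    Assumption: the source only says "a detrending approach"; linear is a guess.
    """
    t = np.arange(ts.shape[0], dtype=float)
    design = np.column_stack([t, np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(design, ts, rcond=None)
    return ts - design @ coef


def subject_z_matrix(bp_nd, frame_minutes, discard_minutes=20.0):
    """Fisher z-transformed ROI-to-ROI correlation matrix for one subject.

    bp_nd: array of shape (frames, ROIs) holding frame-wise BP_ND values.
    frame_minutes: frame duration in minutes (hypothetical parameter).
    """
    n_discard = int(round(discard_minutes / frame_minutes))
    kept = detrend_linear(np.asarray(bp_nd, dtype=float)[n_discard:])  # steps 1 and 2
    r = np.corrcoef(kept, rowvar=False)                                # step 3
    np.fill_diagonal(r, 0.0)  # avoid arctanh(1) = inf on the diagonal
    return np.arctanh(np.clip(r, -0.999999, 0.999999))


def group_mean_matrix(z_matrices):
    """Average the subject-level z matrices across subjects."""
    return np.mean(np.stack(z_matrices), axis=0)


# Usage with synthetic data: 3 subjects, 80 one-minute frames, 5 ROIs.
rng = np.random.default_rng(0)
subjects = [rng.standard_normal((80, 5)) for _ in range(3)]
mean_z = group_mean_matrix([subject_z_matrix(s, frame_minutes=1.0) for s in subjects])
print(mean_z.shape)
```

      Averaging in z space and, if correlation units are wanted, back-transforming with `np.tanh` is a common convention; whether the authors back-transform is not stated in the response.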

      Comment 7 - In the discussion, it could be useful to relate IC1 and IC2 to well-established neuroanatomical/molecular knowledge of the serotoninergic system. Did the authors expect the IC1 and IC2 anatomical distributions? is there a plausible biological reason as to why the time courses of BPnd variations would be somehow different between IC1 and IC2?

      We appreciate the reviewer’s insightful comment and agree on the importance of relating IC1 and IC2 to well-established neuroanatomical and molecular knowledge of the serotonergic system.

      In our discussion, we noted that IC1 primarily encompasses subcortical structures such as the brainstem, midbrain, and thalamus. These regions are consistent with areas housing dense serotonergic projections originating from the raphe nuclei, the primary source of serotonin release. In contrast, IC2 involves limbic and cortical regions - including the striatum, amygdala, cingulate, insular, and prefrontal cortices - which are key targets of the serotonergic pathways. This anatomical distinction aligns with the hierarchical organization of the serotonergic system, where the brainstem nuclei exert both local and distal serotonergic modulation.

      The observed differences in the temporal dynamics of the binding potential (BP<sub>ND</sub>) variations between IC1 and IC2 likely reflect the distinct functional roles of these regions within the serotonergic network. The more immediate changes in IC1 could be attributed to the direct effect of MDMA on the raphe nuclei, leading to rapid serotonin release in subcortical structures. In contrast, the delayed changes in IC2 may reflect downstream modulation in cortical and limbic regions involved in processing more complex emotional and cognitive functions.

      That said, while these interpretations are plausible based on current neuroanatomical and functional knowledge, the exact biological mechanisms underlying the differential time courses remain unclear. As discussed in the manuscript, future studies incorporating direct, simultaneous measurements of serotonin levels and imaging data will be essential to fully elucidate the temporal and spatial dynamics of serotonin transmission in these regions. We have revised the discussion section (page 17) to better highlight this limitation as an important area for further investigation:

      “Our results demonstrate that compared with FC, MDMA induces more pronounced changes in MCs, particularly in regions associated with the SERT subcortical network. The distinct temporal dynamics of BPnd variations between these components may reflect the hierarchical organization of the serotonergic system. Specifically, the raphe nuclei, as the primary source of serotonin, are likely to exert more immediate modulation on posterior subcortical structures (IC2), whereas downstream effects on limbic and cortical regions (IC1) may occur more gradually. While these findings align with current neuroanatomical and molecular knowledge, the precise biological mechanisms driving these temporal differences remain unclear. Future investigations are warranted to elucidate these mechanisms. Future studies combining direct measurements of serotonin levels with neuroimaging data will be critical to fully understanding these components’ distinct roles and temporal profiles in regulating serotonergic function.”

      Comment 8 - In the discussion (physiological basis), could the authors detail the expected "time scale" in changes in SERT expression? How quickly can SERT expression change, especially under resting-state conditions? Is it reasonable to consider tracer fluctuations under rest conditions as biologically meaningful?

      SERT regulation can occur over different time scales depending on the mechanism involved [7].

      Acute, rapid changes (milliseconds to seconds): Protein-protein interactions with key regulatory proteins (e.g., syntaxin1A, neuronal nitric oxide synthase) can lead to rapid modulation of SERT surface expression [8-11]. These interactions often involve changes in transporter trafficking or conformational states and can occur within milliseconds to seconds. For example, syntaxin1A directly interacts with the N-terminus of SERT, influencing its availability on the plasma membrane within short timescales.

      Intermediate time scales (seconds to minutes): Posttranslational modifications, such as phosphorylation by kinases (e.g., protein kinase C) or dephosphorylation by phosphatases, are known to influence SERT function and surface expression [12-14]. These processes are typically initiated in response to cellular signaling and occur over seconds to minutes, affecting the SERT trafficking dynamics and serotonin uptake capacity [15, 16].

      Longer-term changes (minutes to hours): Longer-term regulation involves processes like endocytosis, recycling, or degradation of SERT. These pathways typically take minutes to hours and are often part of more sustained cellular responses to changes in neuronal activity or serotonin levels. Such changes are slower but contribute to the overall cellular homeostasis of SERT under prolonged stimulation.

      Under resting-state conditions, where neurons are not subjected to rapid or dramatic fluctuations in neurotransmitter release or signaling, SERT expression and activity are generally stable but still subject to subtle fluctuations due to ongoing basal regulatory processes. Basal phosphorylation or low-level protein-protein interactions can still dynamically modulate SERT trafficking and function, albeit at a lower intensity than under stimulated conditions. These fluctuations, although smaller in magnitude, may reflect fine-tuning of serotonin homeostasis and can occur on shorter timescales (seconds to minutes).

      Biological Relevance of Tracer Fluctuations at Rest:

      It is reasonable to consider that tracer fluctuations under resting conditions could reflect biologically meaningful variations in SERT expression and function. Even subtle shifts in SERT surface availability or activity can impact serotonin clearance and signaling, given the fine balance required to maintain serotonergic tone. These fluctuations may reflect intrinsic neuronal variability or ongoing homeostatic adjustments to maintain optimal neurotransmitter levels or serve as early indicators of adaptive responses to environmental or physiological changes before more overt modifications in transporter expression or activity become apparent.

      In summary, while SERT expression can change rapidly in response to signaling events (milliseconds to minutes), even under resting-state conditions, subtle regulatory fluctuations can be biologically meaningful. These fluctuations likely reflect ongoing regulatory adjustments essential for maintaining serotonergic balance and should not be disregarded as noise, particularly in experimental measurements using tracers.

      We added the following paragraph to the discussion (page 16):

      In addition, SERT regulation occurs over multiple time scales, ranging from milliseconds to hours, depending on the mechanism involved [31]. Rapid changes in SERT surface expression can be mediated by protein-protein interactions or posttranslational modifications [32, 33], such as phosphorylation, which occur on a timescale of milliseconds to minutes. These processes dynamically modulate surface availability and function, allowing fine-tuned regulation of serotonin uptake even under resting-state conditions. Additionally, while slower processes involving endocytosis, recycling, and degradation typically occur over minutes to hours, subtle fluctuations in SERT trafficking and activity can still occur under basal conditions. These minor yet biologically relevant changes likely reflect ongoing homeostatic regulation essential for maintaining serotonergic balance. Therefore, tracer fluctuations observed during resting-state measurements should not be dismissed, as they may represent meaningful variations in SERT regulation that contribute to the fine control of serotonin clearance.

      Comment 9 - In the discussion, the SERT network results should be commented on more extensively, as there is now only a generic reference to MC changes being stronger than FC ones, without spatial reference to the SERT network (while only negative salience network results are referenced explicitly instead, making the paragraph a bit confusing).

      We expanded the discussion to accommodate a more thorough contemplation of this network. This revised paragraph (page 17) directly addresses the spatial aspects of the SERT network, highlighting the specific regions involved in serotonergic connectivity and contrasting molecular and functional connectivity changes induced by MDMA.

      Comment 10 - Figure 3; I'd switch left and right charts in the bottom panel (last row only), to keep the SERT network always on the left of the Figure.

      We agree with the suggestion and changed the figure accordingly.

      Comment 11 - Figure 4: I'd add FC decreases to the figure, to allow the reader to compare BPnd, MC, and FC changes more easily and I'd add a horizontal line at the equivalent of e.g. Z-1.96 (or similar) so that it is clear which measures/regions display significant changes.

      We prefer to keep the figure focusing on the two analyses of PET alterations, since we want to emphasize their complementarity in the context of PET specifically. However, we added lines indicating significances, in line with the reviewer’s suggestion.

      Comment 12 - In Figure 5D, the y-axis mentioned FC but I suppose it should mention MC.

      We amended the figure accordingly, together with the changes to the names of the networks implemented across the manuscript.

      (1) Marciano, S., et al., Combining CRISPR-Cas9 and brain imaging to study the link from genes to molecules to networks. Proc Natl Acad Sci U S A, 2022. 119(40): p. e2122552119.

      (2) Ionescu, T.M., et al., Striatal and prefrontal D2R and SERT distributions contrastingly correlate with default-mode connectivity. Neuroimage, 2021. 243: p. 118501.

      (3) Ionescu, T.M., et al., Neurovascular Uncoupling: Multimodal Imaging Delineates the Acute Effects of 3,4-Methylenedioxymethamphetamine. J Nucl Med, 2023. 64(3): p. 466-471.

      (4) Ionescu, T.M., et al., Elucidating the complementarity of resting-state networks derived from dynamic [(18)F]FDG and hemodynamic fluctuations using simultaneous small-animal PET/MRI. Neuroimage, 2021. 236: p. 118045.

      (5) Walker, M., et al., In Vivo Evaluation of 11C-DASB for Quantitative SERT Imaging in Rats and Mice. J Nucl Med, 2016. 57(1): p. 115-21.

      (6) Walker, M., et al., Imaging SERT Availability in a Rat Model of L-DOPA-Induced Dyskinesia. Mol Imaging Biol, 2020. 22(3): p. 634-642.

      (7) Lau, T. and P. Schloss, Differential regulation of serotonin transporter cell surface expression. Wiley Interdisciplinary Reviews: Membrane Transport and Signaling, 2012. 1(3): p. 259-268.

      (8) Haase, J., et al., Regulation of the serotonin transporter by interacting proteins. Biochem Soc Trans, 2001. 29(Pt 6): p. 722-8.

      (9) Quick, M.W., Regulating the conducting states of a mammalian serotonin transporter. Neuron, 2003. 40(3): p. 537-49.

      (10) Ciccone, M.A., et al., Calcium/calmodulin-dependent kinase II regulates the interaction between the serotonin transporter and syntaxin 1A. Neuropharmacology, 2008. 55(5): p. 763-70.

      (11) Chanrion, B., et al., Physical interaction between the serotonin transporter and neuronal nitric oxide synthase underlies reciprocal modulation of their activity. Proc Natl Acad Sci U S A, 2007. 104(19): p. 8119-24.

      (12) Qian, Y., et al., Protein kinase C activation regulates human serotonin transporters in HEK-293 cells via altered cell surface expression. J Neurosci, 1997. 17(1): p. 45-57.

      (13) Ramamoorthy, S., et al., Phosphorylation and regulation of antidepressant-sensitive serotonin transporters. J Biol Chem, 1998. 273(4): p. 2458-66.

      (14) Jayanthi, L.D., et al., Evidence for biphasic effects of protein kinase C on serotonin transporter function, endocytosis, and phosphorylation. Mol Pharmacol, 2005. 67(6): p. 2077-87.

      (15) Steiner, J.A., A.M. Carneiro, and R.D. Blakely, Going with the flow: trafficking-dependent and -independent regulation of serotonin transport. Traffic, 2008. 9(9): p. 1393-402.

      (16) Lau, T., et al., Monitoring mouse serotonin transporter internalization in stem cell-derived serotonergic neurons by confocal laser scanning microscopy. Neurochem Int, 2009. 54(3-4): p. 271-6.

    1. eLife Assessment

      This important work provides another layer of regulatory mechanism for TGF-beta signaling activity. The evidence convincingly supports the involvement of microtubules as a reservoir of Smad2/3, and association of Rudhira with microtubules is critical for this process. The work will be of broad interest to developmental biologists in general and molecular biologists in the field of growth factor signaling.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript aimed to study the role of Rudhira (also known as Breast Carcinoma Amplified Sequence 3), an endothelium-restricted microtubule-associated protein, in regulating TGFβ signaling. The authors demonstrate that Rudhira is a critical modulator of TGFβ signaling by releasing Smad2/3 from cytoskeletal microtubules and show that Rudhira is a Smad2/3 target gene. Taken together, the authors provide a model of how Rudhira contributes to TGFβ signaling activity to stabilize the microtubules, which is essential for vascular development.

      Strengths:

      The study used different methods and techniques to achieve its aims and support its conclusions, such as Gene Ontology analysis, functional analysis in culture, immunostaining analysis, and proximity ligation assay. This study reveals a previously unappreciated additional layer of TGFβ signaling activity regulation after ligand-receptor interaction.

      Weaknesses:

      (1) It is unclear how current findings provide a better understanding of Rudhira KO mice, which the authors published some years ago.

      (2) Why do they use HEK cells instead of SVEC cells in Fig 2 and 4 experiments?

      (3) A model shown in Fig 5E needs improvement to grasp their findings easily.

    3. Author response:

      The following is the authors’ response to the previous reviews

      According to the reviewers' comments, we appreciate your substantial updates. However, the statistical issue remains unsolved. The following is a general way to get fold changes between controls and experimental samples. Each sample will generate relative differences between target molecules and internal controls. For the case of Fig 1B, the target is pSmad2, and the internal control is the total Smad2. Three control samples will generate three numbers for pSmad2/Smad2 ratios with variations. Similarly, T204D samples will generate three numbers with variations. Then, the average of these three numbers will be set as 1 (with variations) to calculate fold changes between the control and T204D groups. The point is that the statistical significance needs to be evaluated between two groups with variations. This standard method differs from what you described in the manuscript. I hope this explains why the issue needs to be fixed. Please work on the following 11 panels to revise.

      (1) Fig 1B, WB, pSmad2, reference Smad2, loading control GAPDH, fold change by T204D.

      (2) Fig 1C, WB, pSmad2, reference Smad2, loading control GAPDH, fold change by Tb/Rudhira.

      (3) Fig 1D, QRT PCR, pai1/mmp9, fold change by Tb treatment, reference not disclosed.

      (4) Fig 2A, migration, crystal red absorbance.

      (5) Fig 2B, migration, crystal red absorbance.

      (6) Fig 4A, QRT PCR, fold change by Tb.

      (7) Fig 4B, WB, Rudhira, fold change by Tb.

      (8) Fig 4C, intensity, with variation, fine.

      (9) Fig 4D, WB, Rudhira, loading control GAPDH, fold change by Smad2/3 silencing.

      (10) Fig 5A, WB, Rudhira/Glu-Tub, loading control GAPDH, fold change by Tb and/or AcD.

      (11) Fig 5C, WB, Glu-Tub.
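
      The normalization described in the comment above can be sketched as follows. This is only an illustration under stated assumptions: the densitometry values are hypothetical (not data from the manuscript), the function names are invented for this sketch, and Welch's t statistic is one reasonable choice of two-group comparison; the comment itself does not name a specific test.

```python
from math import sqrt
from statistics import mean, variance


def ratios(target, reference):
    """Per-sample ratio of target band to internal control (e.g. pSmad2 / Smad2)."""
    return [t / r for t, r in zip(target, reference)]


def normalize_to_control_mean(control_ratios, treated_ratios):
    """Divide every sample by the control mean: the control group then
    averages 1 but keeps its sample-to-sample variation."""
    m = mean(control_ratios)
    return [x / m for x in control_ratios], [x / m for x in treated_ratios]


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two groups."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(b) - mean(a)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical band intensities, three blots per group (illustrative only).
ctrl_p, ctrl_total = [1.0, 1.2, 0.9], [2.0, 2.0, 2.0]
mut_p, mut_total = [2.1, 2.6, 2.4], [2.0, 2.1, 1.9]

ctrl_fc, mut_fc = normalize_to_control_mean(
    ratios(ctrl_p, ctrl_total), ratios(mut_p, mut_total)
)
t_stat, dof = welch_t(ctrl_fc, mut_fc)
print("control fold changes:", [round(x, 2) for x in ctrl_fc])
print("treated fold changes:", [round(x, 2) for x in mut_fc])
print(f"Welch t = {t_stat:.2f}, df = {dof:.1f}")
```

      Because each control sample is divided by the control mean rather than by itself, the control bar sits at 1 and still carries an error bar, which is what allows the two groups to be compared with their variation.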

      For western blots:

      Graphs for western blots in the following figures have been modified to show the variance in controls, as suggested:

      (1) Fig 1B, WB, pSmad2, reference Smad2, loading control GAPDH, fold change by T204D.

      (2) Fig 1C, WB, pSmad2, reference Smad2, loading control GAPDH, fold change by Tb/Rudhira.

      (7) Fig 4B, WB, Rudhira, fold change by Tb.

      (9) Fig 4D, WB, Rudhira, loading control GAPDH, fold change by Smad2/3 silencing.

      (10) Fig 5A, WB, Rudhira/Glu-Tub, loading control GAPDH, fold change by Tb and/or AcD.

      (11) Fig 5C, WB, Glu-Tub.

      For qPCRs:

      The reader’s comment asked to display error bars if the variance in controls was considered. The variance in controls was not considered, which is a standard practice in the qPCR assay. In this regard, an example from an eLife paper is cited below (variation not considered in controls):

      Fig 4C from Conti et al., N6-methyladenosine in DNA promotes genome stability, revised v2 Feb 3, 2025.

      Accordingly, the following graphs remain unchanged:

      (3) Fig 1D, QRT PCR, pai1/mmp9, fold change by Tb treatment, reference not disclosed.

      (6) Fig 4A, QRT PCR, fold change by Tb.

      For crystal violet experiments:

      Due to variability in the procedure introduced from CV preparation, uptake, and extraction etc., in the absence of a reference/standard, it is not possible to determine the absolute cell number across experiments. To simplify the calculation, we normalize CV intensity of all the samples to control for an experiment, so the control group doesn’t have error bars. In this regard, an example from an eLife paper is cited below (variation not considered in controls).

      Fig 2H from Brunner et al., PTEN and DNA-PK determine sensitivity and recovery in response to WEE1 inhibition in human breast cancer, version of record July 6, 2020.

      Accordingly, the following graphs remain unchanged:

      (4) Fig 2A, migration, crystal red absorbance.

      (5) Fig 2B, migration, crystal red absorbance.

      Lastly, #8 remains unchanged.

      (8) Fig 4C, intensity, with variation, fine.

    1. eLife Assessment

      The authors provide a compelling method for characterizing communication within brain networks. The study engages important, biologically pertinent, concerns related to the balance of dynamics and structure in assessing the focal points of brain communication. It will be of interest to researchers trying to dissect structure of complex interaction networks across scales, from cells to regions.

    2. Reviewer #2 (Public review):

      Summary:

      The authors provide a compelling method for characterizing communication within brain networks. The study engages important, biologically pertinent, concerns related to the balance of dynamics and structure in assessing the focal points of brain communication. The methods are clear, and seem broadly applicable, although they require some forethought about data and modeling choices.

      Strengths:

      The study is well-developed, providing overall clear exposition of relevant methods, as well as in-depth validation of the key network structural and dynamical assumptions. The questions and concerns raised in reading the text were always answered in time, with straightforward figures and supplemental materials.

      Weaknesses:

      In earlier drafts of the work, the narrative structure at times conflicted with interpretability; however, this was greatly improved during revisions. The only remaining limitation for broad applicability lies in the full observability required in the current paradigm. However, the authors point to avenues for relaxing this assumption, which could be fruitful next steps for researchers aiming to deploy this work on EM or two-photon based datasets.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public reviews):

      Summary:

      In this study, Fakhar et al. use a game-theoretical framework to model interregional communication in the brain. They perform virtual lesioning using MSA to obtain a representation of the influence each node exerts on every other node, and then compare the optimal influence profiles of nodes across different communication models. Their results indicate that cortical regions within the brain's "rich club" are most influential.

      Strengths:

      Overall, the manuscript is well-written. Illustrative examples help to give the reader intuition for the approach and its implementation in this context. The analyses appear to be rigorously performed and appropriate null models are included.

      Thank you.

      Weaknesses:

      The use of game theory to model brain dynamics relies on the assumption that brain regions are similar to agents optimizing their influence, and implies competition between regions. The model can be neatly formalized, but is there biological evidence that the brain optimizes signaling in this way? This could be explored further. Specifically, it would be beneficial if the authors could clarify what the agents (brain regions) are optimizing for at the level of neurobiology - is there evidence for a relationship between regional influence and metabolic demands? Identifying a neurobiological correlate at the same scale at which the authors are modeling neural dynamics would be most compelling.

      This is a fundamental point, and we put together a new project to address it. The current work focuses on, firstly, rigorously formalizing a prevailing assumption that brain regions optimize communication, and then uncovering what the characteristics of communication are if this optimization is indeed taking place. Based on our findings, we suspect the mechanism of optimal communication to be broadcasting (compared to other modes explored in our work, e.g., shortest-path signalling or diffusion). However, we recognize that our game-theoretical framework does not directly address “how” this mechanism is implemented. Thus, in our follow-up work, we are analyzing available datasets of signal propagation in the brain to see if communication dynamics there match the predictions of the game-theoretical setup. Following your question, we also extended our discussion to cover this point, citing five other works on this topic and outlining what we think could be the neurobiological mechanism of optimal signalling.

      It is not entirely clear what Figure 6 is meant to contribute to the paper's main findings on communication. The transition to describing this Figure in line 317 is rather abrupt. The authors could more explicitly link these results to earlier analyses to make the rationale for this figure clearer. What motivated the authors' investigation into the persistence of the signal influence across steps?

      Great question. Figure 6 in part follows Figure 5, which summarizes a key aspect of our work: Signals subside at every step but not exponentially (Figure 5), and they nearly fall apart after around 6 steps (Figure 6 A and B). Subplots A and B together suggest that although measures like communicability account for all possible pathways, the network uses a handful instead, presumably to balance signalling robustness against the energetic cost of signalling. Subplot C, one of our main findings, then shows that one simple model is all that is needed to predict a large portion of optimal influence, compared to other models and variables. In sum, Figure 5 focuses on the decay dynamics while Figure 6 focuses on the extent, in terms of steps, given that the decay is monotonic. Together, our motivation for this figure was to show how the right assumption about decay rate and dynamics can outperform other measures in predicting optimal communication.

      The authors used resting-state fMRI data to generate functional connectivity matrices, which they used to inform their model of neural dynamics. If I understand correctly, their functional connectivity matrices represent correlations in neural activity across an entire fMRI scan computed for each individual and then averaged across individuals. This approach seems limited in its ability to capture neural dynamics across time. Modeling time series data or using a sliding window FC approach to capture changes across time might make more sense as a means of informing neural dynamics.

      We agree with you on the fact that static fMRI is limited in capturing neural dynamics. However, we opted not to perform dynamic functional connectivity fitting just yet for a practical reason: Other communication models used here do not fit to any empirical data and provide a static view of the dynamics, comparable to the static functional connectivity. Since one of our goals was to compare different communication regimes, and the fact that fitting dynamics does not seem to substantially change the outcome if the end result is static (Figure 7), we decided to go with the poorer representation of neural data for this work. However, part of our follow-up project involves looking into the dynamics of influence over time and for that, we will fit our models to represent more realistic dynamics.
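
      For readers unfamiliar with the distinction drawn here, the following minimal sketch contrasts static functional connectivity (one correlation over the whole scan) with the sliding-window approach the reviewer mentions. The two time series are synthetic and the function names are hypothetical; this is not the pipeline used in the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sliding_window_fc(ts_a, ts_b, window, step=1):
    """Correlation between two regional time series inside each window."""
    last_start = len(ts_a) - window
    return [
        pearson(ts_a[s:s + window], ts_b[s:s + window])
        for s in range(0, last_start + 1, step)
    ]


# Synthetic signals: region B follows region A in the first half of the
# "scan" and is inverted relative to A in the second half.
region_a = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
region_b = region_a[:6] + [1 - v for v in region_a[6:]]

static_fc = pearson(region_a, region_b)
windowed_fc = sliding_window_fc(region_a, region_b, window=6, step=6)
print("static FC:", round(static_fc, 3))
print("windowed FC:", [round(r, 3) for r in windowed_fc])
```

      In this toy case the static estimate averages two opposite coupling regimes into a single number, while the windowed estimate separates them. The trade-off is that shorter windows give noisier correlations.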

      The authors evaluated their model using three different structural connectomes: one inferred from diffusion spectrum imaging in humans, one inferred from anterograde tract tracing in mice, and one inferred from retrograde tract-tracing in macaque. While the human connectome is presumably an undirected network, the mouse and macaque connectomes are directed. What bearing does experimentally inferred knowledge of directionality have on the derivation of optimal influence and its interpretation?

      As to whether directionality changes the interpretation of optimal influence, we think it limits how far the communication dynamics of these two types of networks can be compared. We think interpreting optimal communication in directed graphs requires disentangling incoming influence from outgoing influence, e.g., analyzing “projector hubs/coordinators” and “receiver hubs/integrators” instead of putting both into a common class of hubs. Also, here we showed the extent to which a signal travels before it significantly degrades, having done so in an undirected graph. One implication for a directed graph is the possibility that some nodes are unreachable from others, given the more restricted navigation. We did not observe this in the human connectome, as all nodes could reach others, although with limited influence (see Figure 2. C). We did not explore these differences, as we used mouse and macaque connectomes primarily to control for modality-specific confounds of DSI. However, our relatively poorer fit for directed networks (Supplementary Figure 2) motivated us to analyze how reciprocal connections shape dynamics and what impact they have on networks’ function. Using the same connectomes as the current work, we addressed this question in a separate publication (Hadaeghi et al., 2024) and plan to extend both works by analyzing the signalling properties of directed networks.

      It would be useful if the authors could assess the performance of the model for other datasets. Does the model reflect changes during task engagement or in disease states in which relative nodal influence would be expected to change? The model assumes optimality, but this assumption might be violated in disease states.

      This is a wonderful idea that we initially had in mind for this work as well, but we decided to dedicate a separate work to deviations in different task states, as well as disease states (mainly neurodegenerative disorders). We found that the practical challenges of fitting large-scale models to task dynamics and harmonizing neuroimaging datasets of neurodegenerative disorders are beyond the scope of the current work. Unfortunately, this effort, although exciting and promising, is still pending as the corresponding author does not yet have the required expertise in neuroimaging processing pipelines.

      The MSA approach is highly computationally intensive, which the authors touch on in the Discussion section. Would it be feasible to extend this approach to task or disease conditions, which might necessitate modeling multiple states or time points, or could adaptations be made that would make this possible?

      Continuing our response from the previous point, yes, we think, in theory, the framework is applicable to both settings. Currently, our main point of concern is not the computational cost of the framework but the harmonization of the data, to ensure differences in results are not due to differences in preprocessing steps. However, assuming that all is taken care of, we believe a reasonable compute cluster should suffice by parallelizing the analytical pipeline over subjects. We acknowledge that the process would still be time-consuming, but besides the fitting process, we expect a modern high-performance CPU with about 32–64 threads to take up to 3 days analyzing one subject, given 100 brain regions or fewer. This performance then scales with the number of cluster nodes that can each work on one subject. We note that analytical estimators such as SAR could be used instead, as they largely predict the results from MSA. The limitations are then the lack of dynamics over time and potential estimation errors.
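
      The scaling argument above amounts to simple arithmetic, sketched below. The figure of 3 days per subject is the upper bound quoted in the response; the cohort and cluster sizes are hypothetical, and the function assumes each cluster node processes one subject at a time with no scheduling overhead.

```python
from math import ceil


def wall_clock_days(n_subjects, n_nodes, days_per_subject=3.0):
    """Upper-bound wall-clock time when every cluster node analyzes one
    subject at a time and subjects are processed in successive batches."""
    batches = ceil(n_subjects / n_nodes)
    return batches * days_per_subject


# Hypothetical cohort of 100 subjects on clusters of different sizes.
for nodes in (1, 10, 100):
    print(nodes, "node(s):", wall_clock_days(100, nodes), "days")
```

      The estimate excludes model fitting, which the response treats as a separate cost.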

      Reviewer #2 (Public review):

      Summary:

      The authors provide a compelling method for characterizing communication within brain networks. The study engages important, biologically pertinent, concerns related to the balance of dynamics and structure in assessing the focal points of brain communication. The methods are clear and seem broadly applicable, however further clarity on this front is required.

      Strengths:

      The study is well-developed, providing an overall clear exposition of relevant methods, as well as in-depth validation of the key network structural and dynamical assumptions. The questions and concerns raised in reading the text were always answered in time, with straightforward figures and supplemental materials.

      Thank you.

      Weaknesses:

      The narrative structure of the work at times conflicts with the interpretability. Specifically, in the current draft, the model details are discussed and validated in succession, leading to confusion. Introducing a "base model" and "core datasets" needed for this type of analysis would greatly benefit the interpretability of the manuscript, as well as its impact.

      Following your suggestion, we modified the introduction to emphasize the human connectome and the linear model as the main toolkit. We also added a paragraph explaining the datasets that can be used instead.

      Recommendations for the authors:

      Essential Revisions (for the authors):

      (1) The method presents an important and well-validated method for linking structural and functional networks, but it was not clear precisely what the necessary data inputs were and what assumptions about the data mattered. To improve the clarity of the presentation for the reader, it would be beneficial to have an early and explicit description of the flow of the method - what exact kinds of datasets are needed and what decisions need to be made to perform the analysis. In addition, there were questions about how the use or interpretation of the method might change with different methods of measuring structure or function, which could be answered via an explicit discussion of the issue. For example, how do undirected fMRI correlation networks compare to directed tracer injection projection networks? Similarly, could this approach apply in cases like EM connectomics with linked functional imaging that do not have full observability in both modalities?

      This is an important point that we missed addressing in detail in the original manuscript. Now we did so, by first adding a paragraph (lines 292-305, page 10) explaining the pipeline and how our framework handles different modeling choices, and then further discussing it in the Discussion (lines 733-748, page 28). Moreover, we adjusted Figure 1, by delineating two main steps of the pipeline. Briefly, we clarified that MSA is model-agnostic, meaning that, in principle, any model of neural dynamics can be used with it, from the most abstract to the most biologically detailed. Moreover, the approach extends to networks built on EM connectomics, tract-tracing, DTI, and other measures of anatomical connectivity. However, we realized that a key detail was not explicitly discussed (pointed to by Reviewer #2), that is, the fact that these models naturally need to be fitted to the empirical dataset, even though this fitting step appears not to be critical, as shown in Figure 7.

      Lines 292-305:

      “The MSA begins by defining a ‘game.’ To derive OSP, this game is formulated as a model of dynamics, such as a network of interacting nodes. These can range from abstract epidemic and excitable models (Garcia et al., 2012; Messé et al., 2015a) to detailed spiking neural networks (Pronold et al., 2023) and to mean-field models of the whole brain dynamics, as chosen here (see below). The model should ideally be fitted to reflect real data dynamics, after which MSA systematically lesions all nodes to derive the OSP. Put together, the framework is general and model-agnostic in the sense that it accommodates a wide range of network models built on different empirical datasets, from human neuroimaging and electrophysiology to invertebrate calcium imaging, and anything in between. In essence, the framework is not bound to specific modelling paradigms, allowing direct comparison among different models (e.g., see section Global Network Topology is More Influential Than Local Node Dynamics).”
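
      The quoted pipeline (define a model as the ‘game’, then systematically lesion nodes) can be illustrated with a toy example. The sketch below assumes that MSA refers to multi-perturbation Shapley value analysis and computes exact Shapley values by enumerating all orderings of source nodes, which is only feasible for very small networks. The 4-node network, the discrete linear model, and all names (`W`, `target_activity`, `shapley_influence`, `alpha`, `steps`) are hypothetical and are not the authors' implementation.

```python
from itertools import permutations

# Hypothetical undirected toy network; W[i][j] is the weight from j to i.
W = [
    [0.0, 0.5, 0.2, 0.0],
    [0.5, 0.0, 0.4, 0.1],
    [0.2, 0.4, 0.0, 0.3],
    [0.0, 0.1, 0.3, 0.0],
]
N = len(W)


def target_activity(intact, target, steps=50, alpha=0.5):
    """The 'game': run a discrete linear model in which only the `intact`
    nodes and the target are active (all others are lesioned, i.e. clamped
    to zero) and return the target's final activity."""
    alive = set(intact) | {target}
    x = [0.0] * N
    for _ in range(steps):
        nxt = [0.0] * N
        for i in alive:
            drive = sum(W[i][j] * x[j] for j in alive)
            nxt[i] = alpha * drive + 1.0  # constant external input of 1
        x = nxt
    return x[target]


def shapley_influence(target):
    """Average marginal contribution of each source node to the target's
    activity over every order in which sources can be 'un-lesioned'."""
    sources = [i for i in range(N) if i != target]
    totals = {s: 0.0 for s in sources}
    orders = list(permutations(sources))
    for order in orders:
        coalition = []
        prev = target_activity(coalition, target)
        for s in order:
            coalition.append(s)
            cur = target_activity(coalition, target)
            totals[s] += cur - prev
            prev = cur
    return {s: total / len(orders) for s, total in totals.items()}


influence_on_0 = shapley_influence(target=0)
for source, value in influence_on_0.items():
    print(f"influence of node {source} on node 0: {value:.4f}")
```

      Repeating this for every target would fill a source-by-target influence matrix. Because the number of orderings grows factorially, practical analyses sample orderings rather than enumerate them; swapping `target_activity` for a different dynamical model leaves the rest unchanged, which is the sense in which such a scheme is model-agnostic.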

      Lines 733-740:

      “As noted in the introduction, OI is model-agnostic, here, we leveraged this liberty to compare signaling under different models of local dynamics, primarily built upon undirected human connectome data. We also considered different modalities, e.g., tract tracing in Macaque (see Structural and Functional Connectomes under Materials and Methods) to confirm that the influence of weak connections is not inflated due to imaging limitations (Supplementary Figure 5. A). The game theoretical formulation of signaling allows for systematic comparison among many combinations of modeling choices and data sources.”

      We then continued with addressing the issue of full observability. We clarified that in this work, full observability was assumed. However, the mathematical foundations of our method capture unobserved contributors/influencers as an extra term, similar to the additive error term of a linear regression model. To keep the paper as non-technical as possible, we omitted expanding the axioms and the proof of how this is achieved, and instead referred to previous papers introducing the framework. 

      Lines 740-748:

      “Nonetheless, in this work, we assumed full observability, i.e., complete empirical knowledge of brain structure and function that is not necessarily practically given. Although a detailed investigation of this issue is needed, mathematical principles behind the method suggest that the framework can isolate the unobserved influences. In these cases, activity of the target node is decomposed such that the influence from the observed sources is precisely mapped, while the unobserved influences form an extra term, capturing anything that is left unaccounted for, see (Algaba et al., 2019b; Fakhar et al., 2024) for more technical details.”

      (2) The value of the normative game theoretic approach was clear, but the neurobiological interpretation was less so. To better interpret the model and understand its range of applicability, it would be useful to have a discussion of the potential neurobiological correlates that were at the same level of resolution as the modeling itself. Would such an optimization still make sense in disease states that might also be of interest?

      This is a brilliant question, which we decided to explore further in separate studies. Specifically, the link between optimal communication and brain disorders is a natural next step that we are pursuing. Here, we expanded our discussion with a few lines first explaining the roots of our main assumption, which is that neurons optimize information flow, among other goals. We then hypothesized that the biological mechanisms by which this goal is achieved include (based on our findings) adopting a broadcasting regime of signaling. We suspect that this mode of communication, operationalized on complex network topologies, is a trade-off between robust signaling and energy efficiency. Currently, we are planning practical steps to test this hypothesis.

      Lines 943-962:

      “Nonetheless, our framework is grounded in game theory where its fundamental assumption is that nodes aim at maximizing their influence over each other, given the existing constraints. This assumption is well explored using various theoretical frameworks (Buehlmann and Deco, 2010; Bullmore and Sporns, 2012; Chklovskii et al., 2002; Laughlin and Sejnowski, 2003; O’Byrne and Jerbi, 2022) and remains open to further empirical investigation. Here, we used game theory to mathematically formalize a theoretical optimum for communication in brain networks. Our findings then provide a possible mechanism for achieving this optimality through broadcasting. Based on our results, we speculate that, there exists an optimal broadcasting strength that balances robustness of the signal with its metabolic cost. This hypothesis is reminiscent of the concept of brain criticality, which suggests the brain to be positioned in a state in which the information propagates maximally and efficiently (O’Byrne and Jerbi, 2022; Safavi et al., 2024). Together, we suggest broadcasting to be the possible mechanism with which communication is optimized in brain networks, however, further research directions include investigating whether signaling within brain networks indeed aligns with a game-theoretic definition of optimality. Additionally, if it does, subsequent studies could then examine how deviations from optimal communication contribute to or result from various brain states or neurological and psychiatric disorders.”

      Reviewer #1 (Recommendations for the authors):

      I would recommend that the authors consider the following point in a revision, as well as the major weaknesses of the public review. Some aspects of Figure 1 could be clearer. What is being illustrated by the looping arrow to MSA? What is being represented in the matrices (labeling "source" and "target" on the matrix might enhance clarity)? Is R2 the metric used to assess the degree of similarity between communication models? These could be addressed by making small additions to the figure legend or to the figure itself.

      Thank you for your constructive comment on Figure 1, which is arguably the most important figure in the manuscript. We adjusted the figure and its caption (see above) based on your suggestions. After doing so, we think the figure is now clearer regarding the pipeline used in this work.

      Reviewer #2 (Recommendations for the authors):

      Overall, as stated in the public review and the short assessment, the manuscript is in a clearly mature state and brings an important method to link the fields of structural and functional brain networks.

      Nevertheless, the paper would benefit from an early, and clear, discussion of the:

      (1) components of the model, and assumptions of each, should be stated at the end of the introduction, or early in results.

      (2) datasets necessary to run the analysis.

      The confusion arises from lines 130-131, stating "In the present work (summarized in Figure 1), we used the human connectome, large-scale models of dynamics, and a game-theoretical perspective of signaling." This, to me, indicated that a structural connectivity map may be the only dataset required, as the dynamics model and game theory component are solely simulated. However, later, lines 214-216 state that the empirical functional connectivity is estimated from the structural connectivity, indicating that the method is only applied to cases where we have both.

      Finally, Supplemental Figure 5 validates a number of metrics on different solely structural networks (which is a very necessary and well-done control). Similarly, while the dynamical model is discussed in depth, and beautifully shown that the specific choice of dynamical model does not directly impact the results, it would be helpful to clarify the dynamical model utilized in the early figures.

      Thank you for pointing out a critical detail that we missed elaborating sufficiently early in the paper: the modelling step. Following your suggestions, we added a paragraph from line 292 to 305 (page 10) expanding on the modelling framework. We also explicitly divided the modelling step in Figure 1 and briefly clarified our modelling choices in the caption. Together, we emphasized the fact that our framework is generally model agnostic, which allows different models of dynamics to be plugged into various anatomical networks. We then clarified that, like in any modelling effort, one needs to first fit/optimize the model parameters to reproduce empirical data. In other words, we emphasized the fact that our framework relies on a computational model as its ‘game’ to infer how regions interact, and we fine-tuned our models to reproduce the empirical FC.

      Again, this is not a critique of the methods, which are excellent, but the presentation. It would help readers, and even me, to have a clear indication of the model earlier. Further, it would help to discuss, both in the introduction and discussion, the datasets required for applying these methods more broadly. For instance, 2-photon recordings are discussed - would it be possible to apply this method then to EM connectomes with functional data recorded for them? In theory, it seems like yes, although the current datasets have 100% observability, whereas 2-photon imaging, or other local methods, will not have perfect overlap between structural and functional connectomes. Discussions like this, related to the assumptions of the model, the necessary datasets, and broader application directions beyond DSI, fMRI, and BOLD cases where the method was validated, would increase the impact and interpretability for a broad readership.

      This is a valid point that we should have been more explicit about. The revised manuscript now contains a paragraph (lines 740-748) clarifying the fact that, throughout this work, we assumed full observability. We then briefly discuss, based on the mathematical principles of the framework, what we expect to happen in cases with partial observability. We then point at two references in which the details of a framework with partial observability are laid out, one containing mathematical proofs and the other using numerical simulations.

      References:

      Hadaeghi, F., Fakhar, K., & Hilgetag, C. C. (2024). Controlling Reciprocity in Binary and Weighted Networks: A Novel Density-Conserving Approach (p. 2024.11.24.625064). bioRxiv. https://doi.org/10.1101/2024.11.24.625064

    1. eLife Assessment

      The paper addresses the problem of optimising the mapping of serum antibody responses against a known antigen. The manuscript describes a method using EM polyclonal epitope mapping to help elucidate endogenous antibodies. The work is interesting and valuable to the fields of immunology and serology, and the strength of evidence to support its findings is considered solid.

    2. Reviewer #1 (Public review):

      Summary:

      The paper addresses the problem of optimising the mapping of serum antibody responses against a known antigen. It uses the cryoEM analysis of polyclonal Fabs to antibody genes, with the ultimate aim of getting complete and accurate antibody sequences. The method, commonly termed EMPEM, is becoming increasingly used to understand responses in convalescent sera and optimisation of the workflows and provision of openly available tools is of genuine value to a growing number of people.

      The authors do not address the experimental aspects of the methods and do not present novel computational tools, rather they use a series of established computational methods to provide workflows that simplify the interpretation of the EM map in terms of the sequences of dominant antibodies.

      Strengths:

      The paper is well-written and clearly argued. The tests constructed seem appropriate and fair and demonstrate that the workflow works pretty well. For a small subset (~17%) of the EMPEM maps analysed the workflow was able to get convincing assignments of the V-genes.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors seek to demonstrate that it is possible to sequence antibody variable domains from cryoEM reconstructions in combination with bottom-up LC-MSMS. In particular, they extract de novo sequences from single particle-cryo-EM-derived maps of antibodies using the "deep-learning tool ModelAngelo", which are run through the program Stitch to try to select the top scoring V-gene and construct a placeholder sequence for the CDR3 of both the heavy and light chain of the antibody under investigation. These reconstructed variable domains are then used as templates to guide the assembly of de novo peptides from LC-MS/MS data to improve the accuracy of the candidate sequence.

      Using this approach the authors claim to have demonstrated that "cryoEM reconstructions of monoclonal antigen-antibody complexes may contain sufficient information to accurately narrow down candidate V-genes and that this can be integrated with proteomics data to improve the accuracy of candidate sequences".

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper addresses the problem of optimising the mapping of serum antibody responses against a known antigen. It uses the cryoEM analysis of polyclonal Fabs to antibody genes, with the ultimate aim of getting complete and accurate antibody sequences. The method, commonly termed EMPEM, is becoming increasingly used to understand responses in convalescent sera and optimisation of the workflows and provision of openly available tools is of genuine value to a growing number of people.

      The authors do not address the experimental aspects of the methods and do not present novel computational tools, rather they use a series of established computational methods to provide workflows that simplify the interpretation of the EM map in terms of the sequences of dominant antibodies.

      We would like to thank the reviewer for this assessment. While indeed we implement ModelAngelo as published without changes to its algorithms or code, we did add new functionality to Stitch to read the generated output from ModelAngelo and assemble it against known databases of germline-encoded antibody sequences. Of note, ModelAngelo was not primarily developed to determine exact sequence from CryoEM images, but instead to provide input for sequence determination from sequence searches with profile HMMs. Such models are designed to handle ambiguous calls of residues at different positions of a protein sequence. We are of the opinion that one of the main contributions of our study is to finally benchmark the EMPEM approach against known sequences to build a framework for data quality requirements in the future. From our study, in best-case scenarios EM data alone will provide sequences at 80-90% accuracy. In other words, the sequences are riddled with errors and cannot be taken at face value without orthogonal sequencing data. We demonstrate that mass spectrometry data can fill this requirement and yield much improved accuracy of the sequences even against high backgrounds of unrelated antibody sequences. We are incredibly excited about the prospects and future developments for EMPEM and believe that its integration with orthogonal sequencing approaches like MS is critical moving forward. By developing this pipeline we hope to have taken steps in the right direction.

      Strengths:

      The paper is well-written and clearly argued. The tests constructed seem appropriate and fair and demonstrate that the workflow works pretty well. For a small subset (~17%) of the EMPEM maps analysed the workflow was able to get convincing assignments of the V-genes.

      Thanks for the kind assessment.

      Weaknesses:

      The AI methods used are not a substitute for high quality data and at present very few of the results obtained from EMPEM will be of sufficient quality to robustly assign the sequence of the antibody. However, rather more are likely to be good enough, especially in combination with MS data, to provide a pretty good indication of the V-gene family.

      We fully agree with the assessment of the reviewer, as this is a general limitation of the EMPEM field. If anything, we hope our benchmark study and developed pipeline to integrate with MS-based sequencing data have more clearly established the current limitations of the technique and the requirements/prospects for orthogonal sequencing data to fill the missing gaps.

      Reviewer #2 (Public review):

      In this manuscript, the authors seek to demonstrate that it is possible to sequence antibody variable domains from cryoEM reconstructions in combination with bottom-up LC-MSMS. In particular, they extract de novo sequences from single particle-cryo-EM-derived maps of antibodies using the "deep-learning tool ModelAngelo", which are run through the program Stitch to try to select the top scoring V-gene and construct a placeholder sequence for the CDR3 of both the heavy and light chain of the antibody under investigation. These reconstructed variable domains are then used as templates to guide the assembly of de novo peptides from LC-MS/MS data to improve the accuracy of the candidate sequence.

      Using this approach the authors claim to have demonstrated that "cryoEM reconstructions of monoclonal antigen-antibody complexes may contain sufficient information to accurately narrow down candidate V-genes and that this can be integrated with proteomics data to improve the accuracy of candidate sequences".

      While the approach is clearly a work in progress, the manuscript should be made easier to understand for the general reader. Indeed, I had a hard time understanding the workflow until I got to Fig. 3. So re-ordering the figures, for example, may be helpful in this regard.

      It would be useful to provide additional concrete examples where the described workflow would assist in the elucidation of CDR3's, in cases where this isn't already known. (In the benchmark dataset from the Electron Microscopy Data Bank, all the antibodies and Fabs are presumably known, as is the case for the monoclonal antibody CR3022). I am having difficulty envisioning how one would prepare samples from actual plasma samples that would be appropriate for single particle cryo-EM and MS data on dominant antibodies of interest. In my experience, most of these samples tend to be quite complex mixtures. So additional discussion of this point would be helpful.

      We would like to thank the reviewer for their kind and critical assessment of our work. We have adopted the suggestion to reorder the graphical material, such that the workflow schematic is now Figure 1 in the main text. We hope this will improve the readability.

      Regarding the concrete examples where the workflow could aid in elucidating CDR3 sequences, we would like to refer to all published EMPEM studies and in particular those highlighted in Figure 6. We are also actively working to integrate EMPEM data with MS-based sequencing on novel samples, but those will be subject of later studies. We have added additional discussion regarding the experimental feasibility of the approach. We have highlighted several milestone results where functional antibodies were reconstructed from EMPEM and/or MS data. In the discussion we write:

      “While sample complexity remains an important bottleneck, and questions remain about the dynamic range of the true serum antibody repertoire and the depth of coverage from these novel experimental approaches, several studies have recently reached the important milestone of reconstructing functional antibodies from direct measurements of the secreted serum components.” (see references in manuscript)

      “We believe that both EMPEM and MS-based polyclonal antibody sequencing are still limited to the top 1-10 antibodies in the polyclonal mixture. The EMPEM approach is biased towards bigger and well-ordered target antigens, which calls for additional complementary approaches like HDX-MS for a comprehensive polyclonal epitope mapping exercise.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Line 172: I am surprised the heavy chain is not worse than the light chain

      We have added the following sentence:

      “The length of the complete antigen binding loops was estimated with an average error of 0.5 ± 3.3 or 1.7 ± 6.0 residues for heavy and light chain, with average sequence identities of 0.63 and 0.41. While CDRH3 is the more challenging region in MS-based approaches to antibody sequencing, we believe that the moderately better length and sequence accuracy of CDRH3 compared to CDRL3 in ModelAngelo output reflects the CDRH3’s notoriously tight involvement in antigen binding, hence a greater relative stability in the antibody-antigen complex, resulting in better order in the reconstructed EM density maps.”

      Line 175: Global FSC is not going to be useful. Why not use a local value?

      We agree that local resolution estimates would be more appropriate; that is exactly why we added this remark to our initial analysis. However, local resolution estimates are non-trivial and raise the question about ‘how local’ we need to estimate the quality of the map (see for instance https://doi.org/10.1016/j.sbi.2020.06.005). At present, we believe that the required work for this local resolution analysis is not warranted, only to arrive at the rather intuitive if not tautological conclusion that a better map quality translates into more accurate sequences. While we agree that a better quantitative understanding of the data requirements for EMPEM could benefit the field, we opted to leave this, especially considering that the Stitch alignment score is already a good alternative predictor of sequence accuracy compared to map resolution as demonstrated in Figure 3.

      Line 259: 'of the 23 maps' .... Actually there were 46 maps originally, so I feel this is a tad misleading.

      The statistic of ‘46 total’ was added to the text.

    1. eLife Assessment

      This valuable study presents an interesting analysis of the role of the polyamine precursor putrescine in the pili-dependent surface motility of a laboratory strain of Escherichia coli. The overall data convincingly demonstrate a role in this case. This study presents interesting findings for those studying uropathogenic bacteria, and those studying bacterial polyamine function.

    2. Reviewer #2 (Public review):

      Summary:

      Mehta et al., in constructing E. coli strains unable to synthesize polyamines, noted that strains deficient in putrescine synthesis showed decreased movement on semisolid agar. They show that strains incapable of synthesizing putrescine have decreased expression of Type I pilin and, hence, decreased ability to perform pilin-dependent surface motility.

      Strengths:

      The authors characterize the specific polyamine pathways that are important for this phenomenon. RNAseq provides a detailed overview of gene expression in the strain lacking putrescine. They rule out potential effects of pilin phase variation on the phenotype. The data suggest homeostatic control of polyamine synthesis and metabolic changes in response to putrescine.

      Weaknesses:

      The authors do not, in the end, uncover the molecular details of pilin expression per se, but that would require significantly more analyses and data; the mechanisms of pilin regulation are complicated and still not completely understood.

    3. Reviewer #3 (Public review):

      Summary:

      This study by Mehta et al. describes the mechanisms behind the observation that putrescine biosynthesis mutants in Escherichia coli strain W3110 are affected in surface motility. The manuscript shows that the surface motility phenotype is dependent on Type I fimbriae and that putrescine levels affect the expression level of fimbriae. The results further suggest that without putrescine, the metabolism of the cell is shifted towards production of putrescine and away from energy metabolism.

      Strengths:

      The authors show the effect of putrescine on the regulation of type I fimbriae using various strategies (mutants, addition of exogenous, RNA seq, etc.). All experiments converge to the same conclusion that an optimal level of putrescine is needed.

      Weakness:

      The authors use one isolate of E. coli strain W3110, that contains an insertion in fimE which controls the expression of type I fimbriae. The insertion in fimE likely modifies the ratio of cells expressing fimbriae in the population, and it would be important to confirm the results in other isolates or other strains.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Alternate explanations for major conclusions.

      The major conclusions are (a) surface motility of W3110 requires pili which is not novel, (b) pili synthesis and pili-dependent surface motility require putrescine — 1 mM is optimal, and 4 mM is inhibitory, and (c) the existence of a putrescine homeostatic network that maintains intracellular putrescine that involves compensatory mechanisms for low putrescine, including diversion of energy generation toward putrescine synthesis.

      Conclusion a: Reviewer 3 suggests that the mutant may have lost surface motility because of outer surface structures that actually mediate motility but are co-regulated with or depend on pili synthesis. The reviewer explicitly suggests flagella as the alternate appendage, although flagella and pili are reciprocally regulated. Most experiments were performed in a ΔfliC background, which lacks the major flagella subunit, in order to prevent the generation of fast-moving flagella-dependent variants. Furthermore, no other surface structure that could mediate surface motility is apparent in the electron microscope images. This observation does not definitively rule out this possibility, especially because of the large transcriptomic changes with low putrescine. Our explanation is the simplest.

      Conclusion b, first comment: Reviewer 1 states that “it is not possible to conclude that the effects of gene deletions to biosynthetic, transport or catabolic genes on pili-dependent surface motility are due to changes in putrescine levels unless one takes it on faith that there must be changes to putrescine levels.” The comment ignores both the nutritional supplementation and the transcript changes that strongly suggest compensatory mechanisms for low putrescine. Why compensate if the putrescine concentration does not change? The reviewer then implicitly acknowledges changes in putrescine content: “it is important to know how much putrescine must be depleted in order to exert a physiological effect”.

      Conclusion b, second comment: Reviewer 1 proposes that agmatine accumulation can account for some of the observed properties, but which property is not specified. With respect to motility, agmatine accumulation cannot account for motility defects because motility is impaired in (a) a speA mutant which cannot make agmatine and (b) a speC speF double mutant which should not accumulate agmatine. With respect to the transcriptomic results, even if high agmatine is the reason for some transcript changes, the results still suggest a putrescine homeostasis network.

      Conclusion c: the reviewers made no comments on the RNAseq analysis or the interpretation of the existence of a homeostatic network.

      Additional experiments proposed.

      Complementation. Reviewers 1 and 3 suggested complementation experiments, but the latter states that nutritional supplementation strengthens our arguments. The most relevant complementation is with speB.  We tried complementation and found that our control plasmid inhibited motility by increasing the lag time before movement commenced. A plasmid with speB did stimulate motility relative to the control plasmid, but movement with the speB plasmid took 4 days, while wild-type movement took 1.5 days. We think that interpretation of this result is ambiguous. We did not systematically search for plasmids that had no effect on motility.

      The purpose of complementation is to determine whether a second-site mutation is the actual cause of the motility defect. In this case, the artifact is that an alteration in polyamine metabolism is not the cause of the defect. However, external putrescine reverses the effects on motility and pili synthesis in the speB mutant. This result is inconsistent with a second-site mutation. Still, we agree that complementation is important, and because of our difficulties, we tested numerous mutants with defects in polyamine metabolism. The results present an interpretable and coherent pattern. For example, if putrescine is not the regulator, then mutants in putrescine transport and catabolism should have had no effect. Every single mutant is consistent with a role in movement and pili synthesis. The simplest explanation is that putrescine affects movement and pili synthesis.

      Phase variation. Reviewer 2 noted that we did not discuss phase variation. The comment came from the observation that the speB mutant had fewer fimB transcripts which could explain the loss of motility. The reviewer also suggested a simple experiment, which we performed and found that putrescine does not control phase variation. We present those results in the supplemental material. Our discussion of this topic includes a major qualification.

      Testing of additional strains. Published results from another lab showed that surface motility of MG1655 requires spermidine instead of putrescine (PMID 19493013 and 21266585). MG1655 and the W3110 that we used in our study are E. coli K-12 derivatives and phylogenetic group A. Any number of changes in enzymes that affect intracellular putrescine concentration could result in different responses to putrescine. We are currently studying pili synthesis and motility in other strains. While that study is incomplete, loss of speB in a strain of phylogenetic group D eliminates no surface motility. This work was intended as our initial analysis and the focus was on a single strain.

      Measuring intracellular polyamines. We felt that we had provided sufficient evidence to conclude that putrescine controls pili synthesis and putrescine concentrations are lower in the speB mutant: the nutritional supplementation, the lower levels of transcripts for putrescine catabolic enzymes which require putrescine for their expression strongly suggest lower putrescine in a mutant lacking a putrescine biosynthesis gene, and a transcriptomic analysis that found the speB mutant had transcript changes to compensate for low putrescine. We understand the importance of measuring intracellular polyamines. We are currently examining the quantitative relationship between intracellular polyamines and pili synthesis in multiple strains which respond differently to loss of speB.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors should measure putrescine, agmatine, cadaverine, and spermidine levels in their gene deletion strains.

      Polyamine concentration measurements will be part of a separate study on polyamine control of pili synthesis of a uropathogenic strain. A comparison is essential, and the results from W3110 will be part of that study.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 28. Your statements about urinary tract infections are pure speculation. They are fine for the discussion, but should not be in the abstract.

      The abstract from line 27 on has been reworked. The comment of the reviewer is fair.

      (2) Line 65. Do we need this discussion about the various strains? If you keep it, you should point out that they were all W3110 strains. But you could just say that you confirmed that your background strain can do PDSM (since you are also not showing any data for the other isolates). Discussing the various strains implies that you are not confident in your strain and raises the question of why you didn't use a sequenced wt MG1655, or something like that.

      This section has been reworked. Our strain of W3110 has an insertion in fimB which is relevant for movement but does not affect our results. The insertion limits our conclusions about phase variation. We want to point out that strain variations are large. We also sequenced our strain of W3110.

      (3) Related. You occasionally use "W3110-LR" to designate the wild type. You use this or not, but be consistent throughout the text.

      Fixed

      (4) Line 99. Does eLife allow "data not shown"?  

      (5) Line 119. As you note, the phenotype of the puuA patA double mutant is exactly the opposite of what one would expect. Although you provide additional evidence that high levels also inhibit motility, complementing the double mutant would provide confidence that the strain is correct.

      We rapidly ran into issues with complementation which are discussed in public responses to reviewer comments.

      (6) Figure 6C. Either you need to quantify these data or you need a better picture.

      The files were corrupted. It was repeated several times, but we lost the other data.

      (7) Figure 7. Label panels A and B to indicate that these strains are speB. Also, you need to switch panels C and D to match the order of discussion in the manuscript.

      Done

      (8) Line 134. Is there a statistically significant difference in the ELISA between 1 and 4 mM? You need to say one way or the other.

      There is no statistically significant difference, and this has been added to the paper.

      (9) Figure 10C. You need to quantify these data.

      Quantification added as an extra panel.

      (10) Line 164. You include H-NS in the group of "positive effectors that control fim operon expression" and you reference Ecocyc, rather than any primary reference. Nowhere in the manuscript do you mention phase variation. In the speB mutant, you see decreased fimB, increased fimE, and decreased hns expression. My interpretation of the literature suggests that this would drive the fim switch to the off-state. This could certainly explain some of the results. It is also easily measurable with PCR. This might require testing cells scraped directly from the plates.

      The experiments were performed. There is no need to scrape cells from plates because the fimB result from RNAseq was from a liquid culture, and the prediction would be that the phase-locking should be evident in these cells.

      (11) Figure 10. Likewise, do you know that your hns mutant is not locked in the off-state? Granted, the original hns mutants (pilG) showed increased rates of switching, but growth conditions might matter.

      We also did phase variation for the hns mutant and the hns mutant was not phase locked. This result is shown. In addition to growth conditions, the strain probably matters.

      (12) Line 342. You describe the total genome sequencing of W3110, yet this is not mentioned anywhere else in the manuscript.

      It is now

      Minor points:

      (13) Line 192. "One of the most differentially expressed genes...".

      (14) Line 202. "...implicates extracellular putrescine in putrescine homeostasis."

      (15) Line 209. "...potential pili regulators...".

      (16) You are using a variety of fonts on the figures. Pick one.

      (17) Figure 9A. It took me a few minutes to figure out the labeling for this figure and I was more confused after reading the legend. It would be simpler to independently label red triangles, blue triangles, red circles, and blue circles.

      (18) Figure 9B and 10. The reader can likely figure out what W3110_1.0_3 means, but more straightforward labeling would be better, or you need to define these labels.

      All points were addressed and fixed.

      Reviewer #3 (Recommendations for the authors):

      Other comments:

      (1) Please go through the figures and the reference to figures in the text, as they often do not refer to the right panel (ex: figures 2 and 7 for instance). In the text, please homogenize the reference to figures (Figure 2C vs Figure 3). To help compare motility experiments between figures, please use the same scale in all figures.

      This has been fixed.

      (2) Lines 65-70: I am not sure I get the reason behind choosing the W3110 strain from your lab stock. In what background were the initial mutants constructed (from l.64-65)? Were the nine strains tested, all variations of W3110? If so, is the phenotype described in the manuscript robust in all strains?

      We have provided more explanation. W3110 was the most stable: insertions that allowed flagella synthesis in the presence of glucose were frequent. We deleted the major flagella subunit for most experiments. Before introduction of the fliC deletion, we needed to perform experiments 10 times so that fast-moving variants, which had mutationally altered flagella synthesis, did not complicate results.

      (3) Line 82-84: As stated in the public review, I think more controls are needed before making this conclusion, especially as type I fimbriae are usually involved in sessile phenotypes.

      Response provided in the public response.

      (4) In Figure 3: Changing the order of the image to follow the text would make the figure easier to follow.

      Fixed as requested

      (5) Lines 100-101: simultaneous - the results presented here do not support this conclusion. In Figure 4b, the addition of putrescine to speB mutants is actually not different from WT. From the results, it seems like one of biosynthesis or transport is needed, but it's not clear if both are needed simultaneously. For this, a mutant with no biosynthesis and no transport is needed and/or completely non-motile mutants would be needed to compare.

      We disagree. If there are two pathways of putrescine synthesis and both are needed, then our conclusion follows.

      (6) Lines 104-105: '... because E. coli secretes putrescine.' - not sure why this statement is there, as most transporters tested after are importers of putrescine? It is also not clear to me if putrescine is supplemented in the media in these experiments. If not, is there putrescine in the GT media?

      Good points, and this section has been reworded to clarify these issues. Some of the material was moved to the discussion.

      (7) Line 109: 'We note that potE and plaP are more highly expressed than potE and puuP...' - first potE should be potF?

      This has been corrected.

      (8) Figure 8: What is the difference between the TEM images in Figure 1 and here? The WT in Figure 1 does show pili without the supplementation unless I'm missing something here. Please specify.

      The reviewer means Figure 2 and not Figure 1. Figure 2 shows a wild-type strain which has both putrescine anabolic pathways while Figure 8 is the ΔspeB strain which lacks one pathway.

      (9) Lines 160-162: Transcripts for the putrescine-responsive puuAP and puuDRCBE operons, which specify genes of the major putrescine catabolic pathway, were reduced from 1.6- to 14-fold (FDR ≤ 0.02) in the speB mutant (Supplemental Table 1), which implies lower intracellular putrescine. I might not get exactly the point here. If the catabolic pathways are repressed in the speB mutant, then there will be less degradation which means more putrescine!?

      Expression of these genes is a function of intracellular putrescine: higher expression means more putrescine. Any discussion of steady putrescine must include the anabolic pathways: the catabolic pathways do not determine the intracellular putrescine, they are a reflection of intracellular putrescine.

      (10) Lines 162-163: Deletion of speB reduced transcripts for genes of the fimA operon and fimE, but not of fimB. It seems that the results suggest the opposite: a reduction of fimB but not fimE!?

      The reviewer is correct, and it is our mistake. The text now states what is in the figure.

    1. eLife Assessment

      This important study analyzes the effect of heat treatment on phage-bacterial interactions and convincingly shows that prior heat exposure alters the bacterial cell envelope, enhancing persistence and bacterial survival when exposed to lytic phages. The study will interest researchers working on antibiotic resistance, tolerance, and phage therapy.

    2. Reviewer #1 (Public review):

      Summary:

      In this interesting and original paper, the authors examine the effect that heat stress can have on the ability of bacterial cells to evade infection by lytic bacteriophages. Briefly, the authors show that heat stress increases the tolerance of Klebsiella pneumoniae to infection by the lytic phage Kp11. They also argue that this increased tolerance facilitates the evolution of genetically encoded resistance to the phage. In addition, they show that heat can reduce the efficacy of phage therapy. Moreover, they define a likely mechanistic reason for both tolerance and genetically encoded resistance. Both lead to a reorganization of the bacterial cell envelope, which reduces the likelihood that phage can successfully inject their DNA.

      Strengths:

      I found large parts of this paper well-written and clearly presented. I also found many of the experiments simple yet compelling. For example, the experiments described in Figure 3 clearly show that prior heat exposure can affect the efficacy of phage therapy. In addition, the experiments shown in Figures 4 and 6 clearly demonstrate the likely mechanistic cause of this effect. The conceptual Figure 7 is clear and illustrates the main ideas well. I think this paper would work even without its central claim, namely that tolerance facilitates the evolution of resistance. The reason is that the effect of environmental stressors on stress tolerance has to my knowledge so far only been shown for drug tolerance, not for tolerance to an antagonistic species.

      Weaknesses:

      I did not detect any weaknesses that would require a major reorganization of the paper, or that may require crucial new experiments. However, the paper needs some work in clarifying specific and central conclusions that the authors draw. More specifically, it needs to improve the connection between what is shown in some figures, how these figures are described in the caption, and how they are discussed in the main text. This is especially glaring with respect to the central claim of the paper from the title, namely that tolerance facilitates the evolution of resistance. I am sympathetic to that claim, especially because this has been shown elsewhere, not for phage resistance but for antibiotic resistance. However, in the description of the results, this is perhaps the weakest aspect of the paper, so I'm a bit mystified as to why the authors focus on this claim. As I mentioned above, the paper could stand on its own even without this claim.

      More specific examples where clarification is needed:

      (1) A key figure of the paper seems to be Figure 2D, yet it was one of the most confusing figures. This results from a mismatch between the accompanying text starting on line 92 and the figure itself. The first thing that the reader notices in the figure itself is the huge discrepancy between the number of viable colonies in the absence of phage infection at the two-hour time point. Yet this observation is not even mentioned in the main text. The exclusive focus of the main text seems to be on the right-hand side of the figure, labeled "+Phage". It is from this right-hand panel that the authors seem to conclude that heat stress facilitates the evolution of resistance. I find this confusing, because there is no difference between the heat-treated and non-treated cells in survivorship, and it is not clear from this data that survivorship is caused by resistance, not by tolerance/persistence. (The difference between tolerance and resistance has only been shown in the independent experiments of Figure 1B.) Figure 2F supports the resistance claim, but it is not one of the strongest experiments of the paper, because the authors used only "turbidity" as an indicator of resistance. In addition, the authors performed the experiments described therein at small population sizes to avoid the presence of resistance mutations. But how do we know that the turbidity they describe does not result from persisters?

      I see three possibilities to address these issues. First, perhaps this is all a matter of explaining and motivating this particular experiment better. Second, the central claim of the paper may require additional experiments. For example, is it possible to block heat induced tolerance through specific mutations, and show that phage resistance does not evolve as rapidly if tolerance is blocked? A third possibility is to tone down the claim of the paper, and make it about heat tolerance rather than the evolution of heat resistance.

      A minor but general point here is that in Figure 2D and in other figures, the labels "-phage" and "+phage" do not facilitate understanding, because they suggest that cells in the "-phage" treatment have not been exposed to phage at all, but that is not the case. They have survived previous phage treatment and are then replated on media lacking phage.

      (2) Another figure with a mismatch between text and visual materials is Figure 5, specifically Figures 5B-F. The figure is about two different mutants, and it is not even mentioned in the text how these mutants were identified, for example in different or the same replicate populations. What is more, the two mutants are not discussed at all in the main text. That is, the text, starting on line 221, discusses these experiments as if there were only one mutant. This is especially striking as the two mutants behave very differently, as, for example, in Figure 5C. Implicitly, the text talks about the mutant ending in "...C2", and not the one ending in "...C1". To add to the confusion, the text states that the (C2) mutant shows a change in the pspA gene, but in Figure 5F, it is the other (undiscussed) mutant that has a mutation in this gene. Only pspA is discussed further, so what about the other mutants? More generally, it is hard to believe that these were the only mutants that occurred in the genome during experimental evolution. It would be useful to give the reader a 2-3 sentence summary of the genetic diversity that experimental evolution generated.

    3. Reviewer #2 (Public review):

      Summary:

      An initial screen of K. pneumoniae pretreated with different stresses identified heat stress as a protective factor against infection by the lytic phage Kp11. Experiments then showed that this protection is mediated not by an increase in phage-resistant bacteria but by an increase in a transiently phage-tolerant population, which the authors term bacteriophage persistence by analogy to antibiotic persistence. They then showed that phage persistence mediated by heat shock enhanced the evolution of bacterial resistance against the phage. The same trait was observed with other lytic phages, their combinations, and two clinical strains, as well as with E. coli and two T phages; hence, the phenomenon may be widespread in enterobacteria.

      Next, the authors investigated the mechanism of heat-induced phage persistence and determined that phage adsorption was not affected but phage DNA internalization was impaired by the heat pretreatment, likely due to alterations in the bacterial envelope, including the downregulation of envelope proteins and of LPS; furthermore, heat-treated bacteria were less sensitive to polymyxins due to the decrease in LPS.

      Finally, cyclic exposure to heat stress allowed the isolation of a mutant that was resistant to heat treatment, polymyxins, and lytic phage. This mutant had alterations in the PspA protein that conferred a gain of function and promoted the reduction of capsule production and loss of its structure. Nevertheless, the mutant was severely impaired in immune evasion, as it was easily cleared from mouse blood, evidencing the tradeoffs between phage/heat and antibiotic resistance and the ability to counteract the immune response.

      Strengths:

      The experimental design and the sequence in which the experiments are presented are ideal for understanding the study, and the conclusions are supported by the findings. The discussion also points out the relevance of the work, particularly for the effectiveness of phage therapy, and allows the design of strategies to improve that effectiveness.

      Weaknesses:

      In its present form, the manuscript lacks the incorporation of some relevant previous work that explored the role of heat stress in phage susceptibility, antibiotic susceptibility, tradeoffs between phage resistance and resistance against other kinds of stress, virulence, etc., and the fact that exposure to lytic phages induces antibiotic persistence.

    4. Reviewer #3 (Public review):

      PspA, a key regulator in the phage shock protein system, functions as part of the envelope stress response system in bacteria, preventing membrane depolarization and ensuring envelope stability. This protein has been associated with the quorum-sensing network and biofilm formation. (Moscoso M., Garcia E., Lopez R. 2006. Biofilm formation by Streptococcus pneumoniae: role of choline, extracellular DNA, and capsular polysaccharide in microbial accretion. J. Bacteriol. 188:7785-7795; Vidal JE, Ludewick HP, Kunkel RM, Zähner D, Klugman KP. The LuxS-dependent quorum-sensing system regulates early biofilm formation by Streptococcus pneumoniae strain D39. Infect Immun. 2011 Oct;79(10):4050-60.)

      It is interesting and very well-developed.

      (1) Could the authors develop experiments about the relationship between Quorum Sensing and this protein?

      (2) It would be interesting to analyze the link between phage infection and heat stress in relation to quorum sensing. The authors could study QS regulators or AI2 molecules.

      (3) Include the proteins or genes in a table or figure from lytic phage Kp11 (GenBank: ON148528.1).

    1. eLife Assessment

      This important study leverages the power of Drosophila genetics and sparsely-labeled neurons to propose an intriguing new model for neuronal injury signaling. The authors present convincing evidence to show that the somatic response to axonal injury can be suppressed if the injury is not complete, suggesting the presence of a new mode of injury 'integration.' While the underlying mechanism of this fascinating observation has yet to be determined, the phenomenon itself will be of broad significance in the field.

    2. Reviewer #1 (Public review):

      This manuscript presents an interesting exploration of the potential activation mechanisms of DLK following axonal injury. While the experiments are beautifully conducted and the data are solid, I feel that there is insufficient evidence to fully support the conclusions made by the authors.

      In this manuscript, the authors exclusively use the puc-lacZ reporter to determine the activation of DLK. This reporter has been shown to be induced when DLK is activated. However, there is insufficient evidence to confirm that the absence of reporter activation necessarily indicates that DLK is inactive. As with many MAP kinase pathways, the DLK pathway can be locally or globally activated in neurons, and the level of DLK activation may depend on the strength of the stimulation. This reporter might only reflect strong DLK activation and may not be turned on if DLK is weakly activated.

      As noted by the authors, DLK has been implicated in both axon regeneration and degeneration. Following axotomy, DLK activation can lead to the degeneration of distal axons, where synapses are located. This raises an important question: how is DLK activated in distal axons? The authors might consider discussing the significance of this "synapse connection-dependent" DLK activation in the broader context of DLK function and activation mechanisms.

    3. Reviewer #2 (Public review):

      Summary:

      The authors study a panel of sparsely labeled neuronal lines in Drosophila that each form multiple synapses. Critically, each axonal branch can be injured without affecting the others, allowing the authors to differentiate between injuries that affect all axonal branches versus those that do not, creating spared branches. This is a highly powerful model. Axonal injuries are known to cause Wnd (mammalian DLK)-dependent retrograde signals to the cell body, culminating in a transcriptional response. This work identifies a fascinating new phenomenon that this injury response is not all-or-none. If even a single branch remains uninjured, the injury signal is not activated in the cell body. The authors rule out that this could be due to changes in the abundance of Wnd (perhaps if incrementally activated at each injured branch) by Hiw, Wnd's known negative regulator. Thus there is both a yet-undiscovered mechanism to regulate Wnd signaling, and more broadly a mechanism by which the neuron can integrate the degree of injury it has sustained. It will now be important to tease apart the mechanism(s) of this fascinating phenomenon. But even absent a clear mechanism, this is a new biology that will inform the interpretation of injury signaling studies across species.

      Strengths:

      - A conceptually beautiful series of experiments that reveal a fascinating new phenomenon is described, with clear implications (as the authors discuss in their Discussion) for injury signaling in mammals.
      - Suggests a new mode of Wnd regulation, independent of Hiw.

      Weaknesses:

      - The use of a somatic transcriptional reporter for Wnd activity is powerful; however, the reporter indicates whether the transcriptional response was activated, not whether the injury signal was received. It remains possible that Wnd is still activated in the case of a spared branch, but that this activation is either local within the axons (impossible to determine in the absence of a local reporter) or that the retrograde signal was indeed generated but it was somehow insufficient to activate transcription when it entered the cell body. This is more of a mechanistic detail (and likely an extreme technical challenge to assess) and should not detract from the overall importance of the study.

      - That the protective effect of a spared branch is independent of Hiw, the known negative regulator of Wnd, is fascinating. But this leaves open a key question: what is the signal?

      Comments on revisions:

      I appreciate your discussion about the potential bi-modal regulation of the puckered transcriptional reporter and think that readers would benefit from a short discussion of this.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript seeks to understand how nerve injury-induced signaling to the nucleus is influenced, and it establishes a new location where these principles can be studied. By identifying and mapping specific bifurcated neuronal innervations in the Drosophila larvae, and using laser axotomy to localize the injury, the authors find that sparing a branch of a complex muscular innervation is enough to impair Wallenda-puc (analogous to DLK-JNK-cJun) signaling that is known to promote regeneration. It is only when all connections to the target are disconnected that cJun-transcriptional activation occurs.

      Overall, this is a thorough and well-performed investigation of the mechanism of spared-branch influence on axon injury signaling. The findings on control of wnd are important because this is a very widely used injury signaling pathway across species and injury models. The authors present detailed and carefully executed experiments to support their conclusions. Their effort to identify the control mechanism is admirable and will be of aid to the field as they continue to try to understand how to promote better regeneration of axons.

      Strengths:

      The paper does a very comprehensive job of investigating this phenomenon at multiple locations and through both pinpoint laser injury as well as larger crush models. They identify a non-hiw based restraint mechanism of the wnd-puc signaling axis that presumably is originating from the spared terminal. They also present a large list of tests they performed to identify the actual restraint mechanism from the spared branch, which has ruled out many of the most likely explanations. This is an extremely important set of information to report, to guide future investigators in this and other model organisms on mechanisms by which regeneration signaling is controlled (or not).

      Weaknesses:

      While there are many questions raised by these results that are not answered here, including the pathways upstream and downstream of DLK and how the binary switch control of DLK/puc signaling is executed, the model built in this manuscript is valuable to future work going after these important questions.

      Because the conclusions of the paper are focused on a single (albeit well validated) reporter in different types of motor neurons, it is hard to determine whether the mechanism of spared branch inhibition of regeneration requires wnd-puc (DLK/cJun) signaling, or whether this is a binary/threshold response in all contexts (for example, sensory axons or interneurons). However, the authors point out in the response that there are sensory neuron examples where a spared connection does not block DLK activation. As such, it may not be a universal mechanism but could provide a model for better understanding of DLK control across different contexts.

      Comments on revisions:

      The new panels in Figure 1E do not have Y-axis labels. (mean puc-lacZ intensity?)

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This manuscript presents an interesting exploration of the potential activation mechanisms of DLK following axonal injury. While the experiments are beautifully conducted and the data are solid, I feel that there is insufficient evidence to fully support the conclusions made by the authors.

      In this manuscript, the authors exclusively use the puc-lacZ reporter to determine the activation of DLK. This reporter has been shown to be induced when DLK is activated.

      However, there is insufficient evidence to confirm that the absence of reporter activation necessarily indicates that DLK is inactive. As with many MAP kinase pathways, the DLK pathway can be locally or globally activated in neurons, and the level of DLK activation may depend on the strength of the stimulation. This reporter might only reflect strong DLK activation and may not be turned on if DLK is weakly activated. The results presented in this manuscript support this interpretation. Strong stimulation, such as axotomy of all synaptic branches, caused robust DLK activation, as indicated by puc-lacZ expression. In contrast, weak stimulation, such as axotomy of some synaptic branches, resulted in weaker DLK activation, which did not induce the puc-lacZ reporter. This suggests that the strength of DLK activation depends on the severity of the injury rather than the presence of intact synapses. Given that this is a central conclusion of the study, it may be worthwhile to confirm this further. Alternatively, the authors may consider refining their conclusion to better align with the evidence presented.

      In Figure 1E we have replotted the puc-lacZ data to show comparisons between different injuries that leave different numbers of spared (or lost) boutons and branches. We observed no differences among injuries that remove only a small fraction of boutons (injury location (a)), injuries that remove nearly all of them (injury locations (b) and (c)), and uninjured neurons (Figure 1E). These observations argue against the interpretation that the strength of DLK activation (at least within the cell body) depends on the severity of injury. Rather, puc-lacZ induction appears to be bimodal. It is either induced (in various injuries that remove all synaptic boutons), or not induced, including in injuries that spared only a small fraction of the total boutons. We therefore think that the presence of a remaining synaptic connection rather than the extent of the injury per se is a major determinant of whether the cell body component of Wnd signaling can be activated.

      The reviewer (and others) fairly point out that our current study focuses on puc-lacZ as a reporter of Wnd signaling in the cell body. We consider this to be a downstream integration of events in axons that are more challenging to detect. It is striking that this integration appears strongly sensitized to the presence of spared synaptic boutons. Examination of Wnd’s activation in axons and synapses is a goal for our future work.

      As noted by the authors, DLK has been implicated in both axon regeneration and degeneration. Following axotomy, DLK activation can lead to the degeneration of distal axons, where synapses are located. This raises an important question: how is DLK activated in distal axons? The authors might consider discussing the significance of this "synapse connection-dependent" DLK activation in the broader context of DLK function and activation mechanisms.

      While it has been noted that inhibition of DLK can mildly delay Wallerian degeneration (Miller et al., 2009), this does not appear to be the case for retinal ganglion cell axons following optic nerve crush (Fernandes et al., 2014). It is also not the case for Drosophila motoneurons and NMJ terminals following peripheral nerve injury (Xiong et al., 2012; Xiong and Collins, 2012). Instead, overexpression of Wnd or activation of Wnd by a conditioning injury leads to an opposite phenotype - an increase in resiliency to Wallerian degeneration for axons that have been previously injured (Xiong et al., 2012; Xiong and Collins, 2012). The downstream outcome of Wnd activation is highly dependent on the context; it may be an integration of the outcomes of local Wnd/DLK activation in axons with downstream consequences of nuclear/cell body signaling. The current study suggests some rules for the cell body signaling; however, how Wnd is regulated at synapses and why it promotes degeneration in some circumstances but not others are important future questions.

      Regarding the reviewer’s suggestion, it is interesting to consider DLK’s potential contributions to the loss of NMJ synapses in a mouse model of ALS (Le Pichon et al., 2017; Wlaschin et al., 2023). Our findings suggest that the synaptic terminal is an important locus of DLK regulation, while dysfunction of NMJ terminals is an important feature of the ‘dying back’ hypothesis of disease etiology (Dadon-Nachum et al., 2011; Verma et al., 2022). We propose that the regulation of DLK at synaptic terminals is an important area for future study, and may reveal how DLK might be modulated to curtail disease progression. Of note, DLK inhibitors are in clinical trials (Katz et al., 2022; Le et al., 2023; Siu et al., 2018), but at least some have been paused due to safety concerns (Katz et al., 2022). Further understanding of the mechanisms that regulate DLK is needed to understand whether and how DLK and its downstream signaling can be tuned for therapeutic benefit.

      Reviewer #2 (Public review):

      Summary:

      The authors study a panel of sparsely labeled neuronal lines in Drosophila that each form multiple synapses. Critically, each axonal branch can be injured without affecting the others, allowing the authors to differentiate between injuries that affect all axonal branches versus those that do not, creating spared branches. Axonal injuries are known to cause Wnd (mammalian DLK)-dependent retrograde signals to the cell body, culminating in a transcriptional response. This work identifies a fascinating new phenomenon that this injury response is not all-or-none. If even a single branch remains uninjured, the injury signal is not activated in the cell body. The authors rule out that this could be due to changes in the abundance of Wnd (perhaps if incrementally activated at each injured branch) by Hiw, Wnd's known negative regulator. Thus there is both a yet-undiscovered mechanism to regulate Wnd signaling, and more broadly a mechanism by which the neuron can integrate the degree of injury it has sustained. It will now be important to tease apart the mechanism(s) of this fascinating phenomenon. But even absent a clear mechanism, this is a new biology that will inform the interpretation of injury signaling studies across species.

      Strengths:

      (1) A conceptually beautiful series of experiments that reveal a fascinating new phenomenon is described, with clear implications (as the authors discuss in their Discussion) for injury signaling in mammals.

      (2) Suggests a new mode of Wnd regulation, independent of Hiw.

      Weaknesses:

      (1) The use of a somatic transcriptional reporter for Wnd activity is powerful; however, the reporter indicates whether the transcriptional response was activated, not whether the injury signal was received. It remains possible that Wnd is still activated in the case of a spared branch, but that this activation is either local within the axons (impossible to determine in the absence of a local reporter) or that the retrograde signal was indeed generated but it was somehow insufficient to activate transcription when it entered the cell body. This is more of a mechanistic detail and should not detract from the overall importance of the study.

      We agree. The puc-lacZ reporter tells us about signaling in the cell body, but whether and how Wnd is regulated in axons and synaptic branches, which we think occurs upstream of the cell body response, remains to be addressed in future studies.

      (2) That the protective effect of a spared branch is independent of Hiw, the known negative regulator of Wnd, is fascinating. But this leaves open a key question: what is the signal?

      This is indeed an important future question, and would still be a question even if Hiw were part of the protective mechanism by the spared synaptic branch. Our current hypothesis (outlined in Figure 4) is that regulation of Wnd is tied to the retrograde trafficking of a signaling organelle in axons. The Hiw-independent regulation complements other observations in the literature that multiple pathways regulate Wnd/DLK (Collins et al., 2006; Feoktistov and Herman, 2016; Klinedinst et al., 2013; Li et al., 2017; Russo and DiAntonio, 2019; Valakh et al., 2013). It is logical for this critical stress response pathway to have multiple modes of regulation that may act in parallel to tune and restrain its activation. 

      Reviewer #3 (Public review):

      Summary:

      This manuscript seeks to understand how nerve injury-induced signaling to the nucleus is influenced, and it establishes a new location where these principles can be studied. By identifying and mapping specific bifurcated neuronal innervations in the Drosophila larvae, and using laser axotomy to localize the injury, the authors find that sparing a branch of a complex muscular innervation is enough to impair Wallenda-puc (analogous to DLK-JNK-cJun) signaling that is known to promote regeneration. It is only when all connections to the target are disconnected that cJun-transcriptional activation occurs.

      Overall, this is a thorough and well-performed investigation of the mechanism of spared-branch influence on axon injury signaling. The findings on control of wnd are important because this is a very widely used injury signaling pathway across species and injury models. The authors present detailed and carefully executed experiments to support their conclusions. Their effort to identify the control mechanism is admirable and will be of aid to the field as they continue to try to understand how to promote better regeneration of axons.

      Strengths:

      The paper does a very comprehensive job of investigating this phenomenon at multiple locations and through both pinpoint laser injury as well as larger crush models. They identify a non-hiw based restraint mechanism of the wnd-puc signaling axis that presumably originates from the spared terminal. They also present a large list of tests they performed to identify the actual restraint mechanism from the spared branch, which has ruled out many of the most likely explanations. This is an extremely important set of information to report, to guide future investigators in this and other model organisms on mechanisms by which regeneration signaling is controlled (or not).

      Weaknesses:

      The weakest data presented by this manuscript is the study of the actual amounts of Wallenda protein in the axon. The authors argue that increased Wnd protein is being anterogradely delivered from the soma, but no support for this is given. Whether this change is due to transcription/translation, protein stability, transport, or other means is not investigated in this work. However, because this point is not central to the arguments in the paper, it is only a minor critique.

      We agree and are glad that the reviewer considers this a minor critique; this is an area for future study. In Supplemental Figure 1 we present differences in the levels of an ectopically expressed GFP-Wnd-kinase-dead transgene, which is strikingly increased in axons that have received a full but not partial axotomy. We suspect this accumulation occurs downstream of the cell body response because of the timing. We observed the accumulations after 24 hours (Figure S1F) but not at early (1-4 hour) time points following axotomy (data not shown). Further study of the local regulation of Wnd protein and its kinase activity in axons is an important future direction.

      As far as the scope of impact: because the conclusions of the paper are focused on a single (albeit well-validated) reporter in different types of motor neurons, it is hard to determine whether the mechanism of spared branch inhibition of regeneration requires wnd-puc (DLK/cJun) signaling in all contexts (for example, sensory axons or interneurons). Is the nerve-muscle connection the rule or the exception in terms of regeneration program activation?

      DLK signaling is strongly activated in DRG sensory neurons following peripheral nerve injury (Shin et al., 2012), despite the fact that sensory neurons have bifurcated axons and their projections in the dorsal spinal cord are not directly damaged by injuries to the peripheral nerve. Therefore, it is unlikely that protection by a spared synapse is a universal rule for all neuron types. However, the molecular mechanisms that underlie this regulation may indeed be shared across different types of neurons but utilized in different ways. For instance, nerve growth factor withdrawal can lead to activation of DLK (Ghosh et al., 2011); however, neurotrophins and their receptors are regulated and implemented differently in different cell types. We suspect that the restraint of Wnd signaling by the spared synaptic branch shares a common underlying mechanism with the restraint of DLK signaling by neurotrophin signaling. Further elucidation of the molecular mechanism is an important next step towards addressing this question.

      Because changes in puc-lacZ intensity are the major readout, it would be helpful to better explain the significance of the amount of puc-lacZ in the nucleus with respect to the activation of regeneration. Is it known that scaling up the amount of puc-lacZ transcription scales functional responses (regeneration or others)? The alternative would be that only a small amount of puc-lacZ is sufficient to efficiently induce relevant pathways (threshold response).

      While induction of puc-lacZ expression correlates with Wnd-mediated phenotypes, including sprouting of injured axons (Xiong et al., 2010), protection from Wallerian degeneration (Xiong et al., 2012; Xiong and Collins, 2012) and synaptic overgrowth (Collins et al., 2006), we have not observed any correlation between the degree of puc-lacZ induction (e.g., modest, medium or high) and the phenotypic outcomes (sprouting, overgrowth, etc.). Rather, there appears to be a striking all-or-none difference in whether puc-lacZ is induced or not induced. There may indeed be a threshold that can be restrained through multiple mechanisms. We posit in Figure 4 that restraint may take place in the cell body, where it can be influenced by the spared bifurcation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      This is a beautiful study. Naturally, you're searching now for the underlying mechanism.

      A few questions:

      (1) At present you cannot determine if the Wnd signal is never initiated (when a spared branch is present) or if it gets to the cell body but is incapable of activating the puckered reporter. Is there any optical reporter (JNK activation?) that could differentiate this?

      The reviewer is correct that a tool to detect local activity of JNK kinase in axons would be ideal for probing the mechanisms that underlie our observations. A FRET reporter for JNK kinase activity has been developed and utilized in cultured cells (Fosbrink et al. 2010). It would be interesting to implement this reporter in Drosophila; it would need to be sensitive enough to visualize in single Drosophila axons. We have previously noted Wnd-dependent phosphorylated JNK in the cell body of injured motoneurons following nerve crush (Xiong et al., 2010). However, anti-pJNK antibodies detect what appears to be a constitutive signal in uninjured axons that does not appear to be influenced by activation or inhibition of Wnd (Xiong et al., 2010).

      (2) What happens when you injure the axon in a dSarm KO? This is more of a curiosity, not a necessity, but is it the axon dying or the detection of the injury itself?

      We have tested whether overexpression of Nmnat or the WldS transgene, which inhibit Wallerian degeneration of injured axons, affect the induction of puc-lacZ following nerve injury. This manipulation has no effect on puc-lacZ expression in uninjured animals, and also has no effect on the induction of puc-lacZ following peripheral nerve crush (TJ Waller, personal communication).

      (3) Are Wnd rescue experiments possible in this context? Would be an interesting place to do Wnd structure-function and compare it to the synaptic work.

      This is not possible with current reagents. Expression of wild type wnd cDNA under the Gal4/UAS promoter leads to strong induction of puc-lacZ in uninjured animals, even when weak Gal4 driver lines are used (Xiong et al., 2012, 2010). Similar observations of constitutively active signaling have been made in expression studies of DLK in mammalian cells (Hao et al., 2016; Huntwork-Rodriguez et al., 2013; Nihalani et al., 2000; and data not shown). These and other observations suggest that the levels of Wnd/DLK protein are tightly controlled by posttranscriptional mechanisms. Delineation of sequences within Wnd/DLK that are required for its regulation would be helpful for addressing this question.

      This will be required reading in my lab.

      That is an honor. We look forward to help from the field to understand how and why this pathway is restrained at synapses. Your students may bring new ideas to the table.

      Reviewer #3 (Recommendations for the authors):

      Piezo is spelled incorrectly in the supplemental table in multiple places.

      Thank you for pointing this out! We have made the correction.

      References cited (in rebuttal)

      Collins CA, Wairkar YP, Johnson SL, DiAntonio A. 2006. Highwire restrains synaptic growth by attenuating a MAP kinase signal. Neuron 51:57–69.

      Dadon-Nachum M, Melamed E, Offen D. 2011. The “dying-back” phenomenon of motor neurons in ALS. J Mol Neurosci 43:470–477.

      Feoktistov AI, Herman TG. 2016. Wallenda/DLK protein levels are temporally downregulated by Tramtrack69 to allow R7 growth cones to become stationary boutons. Development 143:2983–2993.

      Fernandes KA, Harder JM, John SW, Shrager P, Libby RT. 2014. DLK-dependent signaling is important for somal but not axonal degeneration of retinal ganglion cells following axonal injury. Neurobiol Dis 69:108–116.

      Ghosh AS, Wang B, Pozniak CD, Chen M, Watts RJ, Lewcock JW. 2011. DLK induces developmental neuronal degeneration via selective regulation of proapoptotic JNK activity. J Cell Biol 194:751–764.

      Hao Y, Frey E, Yoon C, Wong H, Nestorovski D, Holzman LB, Giger RJ, DiAntonio A, Collins C. 2016. An evolutionarily conserved mechanism for cAMP elicited axonal regeneration involves direct activation of the dual leucine zipper kinase DLK. Elife 5. doi:10.7554/eLife.14048

      Huntwork-Rodriguez S, Wang B, Watkins T, Ghosh AS, Pozniak CD, Bustos D, Newton K, Kirkpatrick DS, Lewcock JW. 2013. JNK-mediated phosphorylation of DLK suppresses its ubiquitination to promote neuronal apoptosis. J Cell Biol 202:747–763.

      Katz JS, Rothstein JD, Cudkowicz ME, Genge A, Oskarsson B, Hains AB, Chen C, Galanter J, Burgess BL, Cho W, Kerchner GA, Yeh FL, Ghosh AS, Cheeti S, Brooks L, Honigberg L, Couch JA, Rothenberg ME, Brunstein F, Sharma KR, van den Berg L, Berry JD, Glass JD. 2022. A Phase 1 study of GDC-0134, a dual leucine zipper kinase inhibitor, in ALS. Ann Clin Transl Neurol 9:50–66.

      Klinedinst S, Wang X, Xiong X, Haenfler JM, Collins CA. 2013. Independent pathways downstream of the Wnd/DLK MAPKKK regulate synaptic structure, axonal transport, and injury signaling. J Neurosci 33:12764–12778.

      Le K, Soth MJ, Cross JB, Liu G, Ray WJ, Ma J, Goodwani SG, Acton PJ, Buggia-Prevot V, Akkermans O, Barker J, Conner ML, Jiang Y, Liu Z, McEwan P, Warner-Schmidt J, Xu A, Zebisch M, Heijnen CJ, Abrahams B, Jones P. 2023. Discovery of IACS-52825, a potent and selective DLK inhibitor for treatment of chemotherapy-induced peripheral neuropathy. J Med Chem 66:9954–9971.

      Le Pichon CE, Meilandt WJ, Dominguez S, Solanoy H, Lin H, Ngu H, Gogineni A, Sengupta Ghosh A, Jiang Z, Lee S-H, Maloney J, Gandham VD, Pozniak CD, Wang B, Lee S, Siu M, Patel S, Modrusan Z, Liu X, Rudhard Y, Baca M, Gustafson A, Kaminker J, Carano RAD, Huang EJ, Foreman O, Weimer R, Scearce-Levie K, Lewcock JW. 2017. Loss of dual leucine zipper kinase signaling is protective in animal models of neurodegenerative disease. Sci Transl Med 9. doi:10.1126/scitranslmed.aag0394

      Li J, Zhang YV, Asghari Adib E, Stanchev DT, Xiong X, Klinedinst S, Soppina P, Jahn TR, Hume RI, Rasse TM, Collins CA. 2017. Restraint of presynaptic protein levels by Wnd/DLK signaling mediates synaptic defects associated with the kinesin-3 motor Unc-104. Elife 6. doi:10.7554/eLife.24271

      Miller BR, Press C, Daniels RW, Sasaki Y, Milbrandt J, DiAntonio A. 2009. A dual leucine kinase-dependent axon self-destruction program promotes Wallerian degeneration. Nat Neurosci 12:387–389.

      Nihalani D, Merritt S, Holzman LB. 2000. Identification of structural and functional domains in mixed lineage kinase dual leucine zipper-bearing kinase required for complex formation and stress-activated protein kinase activation. J Biol Chem 275:7273–7279.

      Russo A, DiAntonio A. 2019. Wnd/DLK is a critical target of FMRP responsible for neurodevelopmental and behavior defects in the Drosophila model of fragile X syndrome. Cell Rep 28:2581–2593.e5.

      Shin JE, Cho Y, Beirowski B, Milbrandt J, Cavalli V, DiAntonio A. 2012. Dual leucine zipper kinase is required for retrograde injury signaling and axonal regeneration. Neuron 74:1015–1022.

      Siu M, Sengupta Ghosh A, Lewcock JW. 2018. Dual Leucine Zipper Kinase Inhibitors for the Treatment of Neurodegeneration. J Med Chem 61:8078–8087.

      Valakh V, Walker LJ, Skeath JB, DiAntonio A. 2013. Loss of the spectraplakin short stop activates the DLK injury response pathway in Drosophila. J Neurosci 33:17863–17873.

      Verma S, Khurana S, Vats A, Sahu B, Ganguly NK, Chakraborti P, Gourie-Devi M, Taneja V. 2022. Neuromuscular junction dysfunction in amyotrophic lateral sclerosis. Mol Neurobiol 59:1502–1527.

      Wlaschin JJ, Donahue C, Gluski J, Osborne JF, Ramos LM, Silberberg H, Le Pichon CE. 2023. Promoting regeneration while blocking cell death preserves motor neuron function in a model of ALS. Brain 146:2016–2028.

      Xiong X, Collins CA. 2012. A conditioning lesion protects axons from degeneration via the Wallenda/DLK MAP kinase signaling cascade. J Neurosci 32:610–615.

      Xiong X, Hao Y, Sun K, Li J, Li X, Mishra B, Soppina P, Wu C, Hume RI, Collins CA. 2012. The Highwire ubiquitin ligase promotes axonal degeneration by tuning levels of Nmnat protein. PLoS Biol 10:e1001440.

      Xiong X, Wang X, Ewanek R, Bhat P, DiAntonio A, Collins CA. 2010. Protein turnover of the Wallenda/DLK kinase regulates a retrograde response to axonal injury. J Cell Biol 191:211–223.

    1. eLife Assessment

      This study reports that the RNA binding and cardiomyopathy-associated protein RBM20 is expressed in specific populations of neurons in the CNS, where it binds to and regulates the expression of synapse-related RNAs. This is an important finding because it reveals a new mechanism for gene regulation in neurons by an RNA binding protein previously studied in the heart; the authors also provide data to suggest that the mechanism by which RBM20 acts in neurons may be distinct from the splicing regulation studied in cardiac tissue. The data in support of the binding and regulation of RNAs by RBM20 is compelling, using leading edge sequencing methods to determine RNA binding profiles, and cell type specific genetics for evaluation of function.

    2. Reviewer #1 (Public review):

      Summary:

      The authors of this study set out to find RNA binding proteins in the CNS in cell-type specific sequencing data and discover that the cardiomyopathy-associated protein RBM20 is selectively expressed in olfactory bulb glutamatergic neurons and PV+ GABAergic neurons. They make an HA-tagged RBM20 allele to perform CLIP-seq to identify RBM20 binding sites and find direct targets of RBM20 in olfactory bulb glutamatergic neurons. In these neurons, RBM20 binds intronic regions. RBM20 has previously been implicated in splicing, but when they selectively knock out RBM20 in glutamatergic neurons they do not see changes in splicing, but they do see changes in RNA abundance, especially of long genes with many introns, which are enriched for synapse-associated functions. These data show that RBM20 has important functions in gene regulation in neurons, which was previously unknown, and they suggest it acts through a mechanism distinct from what has been studied before in cardiomyocytes.

      Strengths:

      The study finds expression of the cardiomyopathy-associated RNA binding protein RBM20 in specific neurons in the brain, opening new windows into its potential functions there.

      The study uses CLIP-seq to identify RBM20 binding RNAs in olfactory bulb neurons.

      Conditional knockout of RBM20 in glutamatergic or PV neurons allows the authors to detect mRNA expression that is regulated by RBM20.

      The data include substantial controls and quality control information to support the rigor of the findings.

      Weaknesses:

      The authors do not fully identify the mechanism by which RBM20 acts to regulate RNA expression in neurons, though they do provide data suggesting that neuronal RBM20 does not regulate alternate splicing in neurons, which is an interesting contrast to its proposed mechanism of function in cardiomyocytes. Discovery of the RNA regulatory functions of RBM20 in neurons is left as a question for future studies.

      The study does not identify functional consequences of the RNA changes in the conditional knockout cells, so this is also a question for the future.

    3. Reviewer #2 (Public review):

      Summary:

      The group around Prof. Scheiffele has made seminal discoveries regarding alternative splicing, as reflected by a current ERC advanced grant and landmark papers in eLife (2015), Science (2016), and Nature Neuroscience (2019). Recently, the group investigated proteins that contain an RRM motif in the mouse cortex. One of them, termed RBM20, was originally thought to be muscle-specific and involved in alternative splicing in cardiomyocytes. However, upon close inspection, RBM20 is expressed in a particular set of interneurons (PV positive cells of the somatosensory cortex) in the cortex as well as in mitral cells of the olfactory bulb (OB). Importantly, they used CLIP to identify targets in the OB and heart. Next and quite importantly, they generated a knock-in mouse line with a His-biotin acceptor peptide and an HA epitope to perform specific biochemistry. Not surprisingly, this allowed them to specifically identify transcripts with long introns; however, most of the intronic binding sites were very distant from the splice sites. Closer GO term inspection revealed that RBM20 specifically regulates synapse-related transcripts. In order to get in vivo insight into its function in the brain, the authors generated both global as well as conditional KO mice. Surprisingly, there were no significant differences in RBM20 PV interneurons; however, 409 transcripts were deregulated in OB glutamatergic neurons. Here, CLIP sites were mostly found to be very distant from differentially expressed exons. Furthermore, loss-of-function RBM20 primarily yields loss of transcripts, whereas upregulation appears to be indirect. Together, these results strongly suggest a role of RBM20 in the inclusion of cryptic exons thereby promoting target degradation.

      Strengths:

      The quality of the data and the figures is high, impressive and convincing. The reported results strongly suggest a role of RBM20 in the inclusion of cryptic exons thereby promoting target degradation.

      Weaknesses:

      In their revised manuscript, the authors significantly improved the intro and results section, which is now much better suited for the general public and makes it easier to follow the logic of the experiments. Also, the discussion has now been expanded, doing better justice to the importance of the findings presented.

      In my opinion, the revised manuscript clearly improved and represents a timely and important study, which provides major new insight into the expression and possible function of RBM20 in tissues outside of muscle.

    4. Reviewer #3 (Public review):

      Summary:

      The authors identified RBM20 expression in neural tissues using cell type-specific transcriptomic analysis. This discovery was further validated through in vitro and in vivo approaches, including RNA fluorescent in situ hybridization (FISH), open-source datasets, immunostaining, western blotting, and gene-edited RBM20 knockout (KO) mice. CLIP-seq and RiboTRAP data demonstrated that RBM20 regulates common targets in both neural and cardiac tissues, while also modulating tissue-specific targets. Furthermore, the study revealed that neuronal RBM20 governs long pre-mRNAs encoding synaptic proteins.

      Strengths:

      • Utilization of a large dataset combined with experimental evidence to identify and validate RBM20 expression in neural tissues.
      • Global and tissue-specific RBM20 KO mouse models provide robust support for RBM20 localization and expression.
      • Employing heart tissue as a control highlights the unique findings in neural tissues.

      Weaknesses:

      • Lack of physiological functional studies to explore RBM20's role in neural tissues.
      • Data quality requires improvement for stronger conclusions.

      Comments on revisions:

      The authors have effectively addressed most of my concerns, which has significantly improved the quality and reliability of the data. While sufficient functional data were not provided, the current findings offer valuable and novel insights into the expression of RBM20 in neurons. I have no further concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      We thank the three reviewers for the constructive suggestions made in the Public Reviews and the Recommendations to Authors. We have now addressed these comments in a revised manuscript as follows:

      (1) We will revise the text according to the reviewer suggestions and provide more detailed explanations in results and discussion.

      (2) We have uploaded higher resolution images of several figures (resolution had been reduced to achieve lower file sizes) to address the comment regarding “data quality”.

      (3) We have included additional data on eCLIP control experiments in the supplementary figures.

      (4) We have performed additional replications of the western blot analysis for Rbm20 knock-out animals and provided the data in a new Figure.

      Recommendations for the authors:

      Reviewer #1:

      (1) The study is missing CLIP-seq data from control mice that do not express HA, or HA-knocked into a safe-harbor locus. This is important because there is plenty of background HA staining in Figure S2B, in wild-type mice. Including this control would allow subsequent peak calling to distinguish between non-specific HA peaks and RBM20 specific peaks.

      The biochemical conditions used in immunostaining are much less stringent than the buffers employed for immunoprecipitation in the eCLIP protocol. Thus, background staining is not an informative reference to assess specificity of CLIP isolations. In previous experiments, we confirmed very low background with the anti-HA antibodies in our eCLIP protocol. In the present study, we used a “no-crosslinking control” where samples were not irradiated with UV light. This negative control is now included in Supplementary Figure 4.

      (2) The GO analysis performed to infer synapse-gene specific regulation would be more useful if the authors would discuss specific genes that are represented within these terms and have been shown to be associated with neuronal function.

      We have now noted several synapse-related genes identified in the text.

      (3) Some figures would benefit from larger size and higher resolution including Fig S1, S3.

      We had previously embedded Figures as png files in the text document. In the revised version we uploaded the figures in higher resolution as individual jpeg files. Moreover, we now split Figure S1 into two separate supplementary figures (new Fig.S2) which allowed for enlarging the size of panels. We further enlarged the panels of (former) Fig.S3 (now Fig.S4).

      (4) RBP genes in Figure 1A x-axis are all lowercase. This is not standard mouse gene nomenclature.

      We corrected this.

      (5) Typo in Figure S4F rightmost panel y-axis - 'Length' is misspelled.

      We corrected this.

      Reviewer #2:

      Minor points:

      - Shortly explain DESEQ2 (p4)

      We now added a brief note and corresponding reference in the main text of the manuscript.

      - Is RBM20 a shuttling protein? Any detection in the cytoplasm?

      Our immunostainings for the endogenous RBM20 in heart and olfactory bulb cells suggest that the vast majority of wild-type RBM20 is localized to the nucleus. Previous work on RBM20 disease mutants suggest that pathological forms can accumulate in the cytoplasm. However, with the sensitivity of our detection we did not obtain evidence for a significant cytoplasmic pool in neurons. This does not exclude the possibility that the protein is shuttling – but assessing this would require different types of experiments.

      Reviewer #3:

      (1) Figure 1C: It is shown that some of the RBM20 staining do not colocalize with PV. This observation requires further explanation and discussion to clarify the significance.

      As seen in the fluorescent in situ hybridizations as well as the RiboTRAP purifications (Fig.S1C,D), we observe RBM20 mRNA expression not only in parvalbumin-positive interneurons but also in somatostatin-positive cells of the neocortex. Accordingly, some RBM20-positive cells do not express parvalbumin. We now clarified this in the text.

      Additionally, in Figure S1C, the resolution of the image is low, making it difficult to conclusively determine whether RBM20 RNA is localized in the nucleus. A high-resolution image would be beneficial to address this ambiguity.

      The Rbm20 mRNA is localized in the nucleus and cytoplasm. We have now split Figure S1 into two separate figures to enlarge the panels for S1C and make this more visible. Moreover, we uploaded higher resolution figure files.

      (2) Figure 1E: The molecular weight of RBM20 is approximately 135 kDa, yet there is a band near 135 kDa in the KO heart. How do the authors determine that the 150 kDa band represents RBM20 rather than the 135 kDa band? The authors may consider increasing the sample size to confirm whether the smaller band consistently appears across all KO heart tissues.

      We appreciate that in this higher molecular weight range, the indicated weight markers may not be entirely accurate. We used a validated knock-out mouse line to identify the appropriate RBM20 protein band. As the 150 kDa band was reproducibly lost in the knock-out tissue in the brain and the heart tissue whereas the fainter band of lower mobility remained, we concluded that on our gel system RBM20 protein has an apparent molecular weight of 150 kDa. This is further supported by the fact that the endogenously tagged RBM20 protein also has a similar mobility.

      As suggested by the reviewer, we now re-ran Western blots from multiple wild-type and corresponding knock-out tissues. This further confirmed the migration of the protein and loss of the 150 kDa band in the mutant mice (new Figure 1E).

      (3) Figure 2A: A higher-resolution image is recommended. Prior studies on RBM20 mutation knock-in mice suggest that when RBM20 localizes to the cytoplasm, it promotes molecular condensate formation. This seems to be the case in Figure 2A; however, the low image quality makes it difficult to see these molecular condensates.

      Figure 2A shows endogenous RBM20 (not the epitope-tagged protein in the knock-in mice). The vast majority of the protein is localized in the nucleus rather than the cytoplasm. We are a bit uncertain what “condensates” the reviewer refers to. In the heart, we indeed see accumulations of RBM20 in foci (as described previously in the literature). As judged by their location within the DAPI-positive area, these foci are in the nucleus. By contrast, in the olfactory bulb neurons (which express lower levels of RBM20) we do not see a comparable concentration in nuclear foci but rather broad and diffuse staining. This is consistent with the hypothesis that the nuclear foci depend on the expression of highly expressed target transcripts such as titin. To better visualize this, we now uploaded files with higher resolution for the revised manuscript.

      (4) Figure 4D: This figure is not cited in the main text and should be referenced appropriately.

      We corrected this.

      (5) Page 5: The sentence "Finally, introns bound by RBM20 were significantly longer than expected by chance as assed..." contains a typo. The word "assed" should be corrected to "assessed".

      We corrected this.

      (6) Functional data: The study would benefit from functional experiments to elucidate the physiological role of RBM20 in PV neurons. For instance, since RBM20 regulates calcium-handling genes in neurons, does its absence impair calcium signaling in PV neurons? Additionally, given that RBM20 is involved in synaptic regulation, could RBM20 KO disrupt synaptic function? While it may not be feasible to address all these questions, providing some functional data would greatly enhance the overall significance of the study.

      We completely agree with the reviewer that this would greatly advance the study and the lack of data on cellular functions is the most significant limitation of this work. We attempted to obtain insights into cellular function through the structural investigations (Fig.S5). We had obtained some data on a behavioral phenotype in the mice which indicates that knock-out in vGLUT2 neurons precipitates alterations in behavior. However, due to conditions in our animal facility (emissions from construction) we struggled to solidify/confirm this data. Thus, in the interest of sharing the existing data in a timely manner we felt that more elaborate functional studies on synaptic transmission or calcium imaging should better be performed in a separate effort.

    1. eLife Assessment

      This study presents a useful method based on flow cytometry to study partitioning noise during cell division. The evidence supporting the claims of the authors is incomplete, as the method neglects other sources of noise present in cells. With the theoretical part extended, this paper would be of interest to cell biologists and biophysicists working on asymmetric partitioning during cell division.

    2. Reviewer #1 (Public review):

      Summary:

      The aim of this paper is to develop a simple method to quantify fluctuations in the partitioning of cellular elements. In particular, they propose a flow-cytometry-based method coupled with a simple mathematical theory as an alternative to conventional imaging-based approaches.

      Strengths:

      The approach they develop is simple to understand and its use with flow-cytometry measurements is clearly explained. Understanding how the fluctuations in the cytoplasm partition vary for different kinds of cells is particularly interesting.

      Weaknesses:

      The theory only considers fluctuations due to cellular division events. This seems a large weakness because it is well known that fluctuations in cellular components are largely affected by various intrinsic and extrinsic sources of noise and only under particular conditions does partitioning noise become the dominant source of noise.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present a combined experimental and theoretical workflow to study partitioning noise arising during cell division. Such quantifications usually require time-lapse experiments, which are limited in throughput. To bypass these limitations, the authors propose to use flow-cytometry measurements instead and analyse them using a theoretical model of partitioning noise. The problem considered by the authors is relevant and the idea to use statistical models in combination with flow cytometry to boost statistical power is elegant. The authors demonstrate their approach using experimental flow cytometry measurements and validate their results using time-lapse microscopy. However, while I appreciate the overall goal and motivation of this work, I was not entirely convinced by the strength of this contribution. The approach focuses on a quite specific case, where the dynamics of the labelled component depend purely on partitioning. As such it seems incompatible with studying the partitioning noise of endogenous components that exhibit production/turnover. The description of the methods was partly hard to follow and should be improved. In addition, I have several technical comments, which I hope will be helpful to the authors.

      Comments:

      (1) In the theoretical model, copy numbers are considered to be conserved across generations. As a consequence, concentrations will decrease over generations due to dilution. While this consideration seems plausible for the considered experimental system, it seems incompatible with components that exhibit production and turnover dynamics. I am therefore wondering about the applicability/scope of the presented approach and to what extent it can be used to study partitioning noise for endogenous components. As presented, the approach seems to be limited to a fairly small class of experiments/situations.

      (2) Similar to the previous comment, I am wondering what would happen in situations where the generations could not be as clearly identified as in the presented experimental system (e.g., due to variability in cell-cycle length/stage). In this case, it seems to be challenging to identify generations using a Gaussian Mixture Model. Can the authors comment on how to deal with such situations? In the abstract, the authors motivate their work by arguing that detecting cell divisions from microscopy is difficult, but doesn't their flow cytometry-based approach have a similar problem?

      (3) I could not find any formal definition of division asymmetry. Since this is the most important quantity of this paper, it should be defined clearly.

      (4) The description of the model is unclear/imprecise in several parts. For instance, it seems to me that the index "i" does not really refer to a cell in the population, but rather a subpopulation of cells that has undergone a certain number of divisions. Furthermore, why is the argument of Equation 11 suddenly the fraction f as opposed to the component number? I strongly recommend carefully rewriting and streamlining the model description and clearly defining all quantities and how they relate to each other.

      (5) Similarly, I was not able to follow the logic of Section D. I recommend carefully rewriting this section to make the rationale, logic, and conclusions clear to the reader.

      (6) Much theoretical work has been done recently to couple cell-cycle variability to intracellular dynamics. While the authors neglect the latter for simplicity, it would be important to further discuss these approaches and why their simplified model is suitable for their particular experiments.

      (7) In the discussion the authors note that the microscopy-based estimates may lead to an overestimation of the fluctuations due to limited statistics. I could not follow that reasoning. Due to the gating in the flow cytometry measurements, I could imagine that the resulting populations are more stringently selected as compared to microscopy. Could that also be an explanation? More generally, it would be interesting to see how robust the results are in terms of different gating diameters.

      (8) It would be helpful to show flow cytometry plots including the identified subpopulations for all cell lines, currently, they are shown only for HCT116 cells. More generally, very little raw data is shown.

      (9) The title of the manuscript could be tailored more to the considered problem. At the moment it is very generic.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The aim of this paper is to develop a simple method to quantify fluctuations in the partitioning of cellular elements. In particular, they propose a flow-cytometry-based method coupled with a simple mathematical theory as an alternative to conventional imaging-based approaches.

      Strengths:

      The approach they develop is simple to understand and its use with flow-cytometry measurements is clearly explained. Understanding how the fluctuations in the cytoplasm partition vary for different kinds of cells is particularly interesting.

      Weaknesses:

      The theory only considers fluctuations due to cellular division events. This seems a large weakness because it is well known that fluctuations in cellular components are largely affected by various intrinsic and extrinsic sources of noise and only under particular conditions does partitioning noise become the dominant source of noise.

      We thank the Reviewer for her/his evaluation of our manuscript. The point raised is indeed a crucial one. In a cell division cycle, there are at least three distinct sources of noise that affect component numbers [1]:

      (1) Gene expression and degradation, which determine component numbers fluctuations during cell growth.

      (2) Variability in cell division time, which depending on the underlying model may or may not be a function of protein level and gene expression.

      (3) Noise in the partitioning/inheritance of components between mother and daughter cells.

      Our approach specifically addresses the latter, with the goal of providing a quantitative measure of this noise source. For this reason, in the present work, we consider homogeneous cancer cell populations that could be considered to be stationary from a population point-of-view. By tracking the time evolution of the distribution of tagged components via live fluorescent markers, we aim at isolating partitioning noise effects. However, as noted by the Reviewer, other sources of noise are present, and depending on the considered system the relative contributions of the different sources may change. Thus, we agree that a quantification of the effect of the various noise sources on the accuracy of our measurements will improve the reliability of our method. 

      In this respect, assuming independence between noise sources, we reasoned that variability in cell cycle length would affect the timing of population emergence but not the intrinsic properties of those populations (e.g., Gaussian variance). To test this hypothesis, we conducted a preliminary set of simulations in which cell division times were drawn from an Erlang distribution (mean = 18 h, k = 4). The results, showing the behavior of the mean and variance of the component distributions across generations, are presented in Author response image 1. Under the assumption of independence between different noise sources, no significant effects were observed. Next, we plan to quantify the accuracy of our measurements in the presence of cross-talk between the various noise sources. As suggested, we will update the manuscript to include a more complete discussion on this topic and an evaluation of our model’s stability.

      Author response image 1.

      Variance and mean of the fluorescence intensity distribution as a function of generation for a time-course simulation with cell-cycle length variability. We repeated the simulations shown in Figure 1 of the manuscript, but introduced a variable division time for each cell, drawn from an Erlang distribution (mean = 18 h, k = 4). As the plots show, the results of our theoretical framework are not affected by the introduction of this variability. Hence, the Gaussian Mixture Model still gives the correct results even in a noisy environment.
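
      The reasoning behind this simulation can be illustrated with a minimal sketch. It is not the authors' code: it assumes symmetric binomial partitioning (each component goes to either daughter with probability 1/2), no production or degradation, and illustrative values for the initial component number and the snapshot time; the function names are hypothetical. Under these assumptions a cell in generation g holds Binomial(N0, 2^-g) components, so the per-generation mean and variance should match N0·2^-g and N0·2^-g·(1 − 2^-g) even when division times are Erlang-distributed.

```python
import random
from collections import defaultdict


def binomial_half(rng, n):
    """Draw from Binomial(n, 1/2) by counting set bits among n random bits."""
    if n == 0:
        return 0
    return bin(rng.getrandbits(n)).count("1")


def simulate_snapshot(n_founders=1000, n0=1000, t_end=54.0,
                      mean_cycle=18.0, k=4, seed=0):
    """Return {generation: [component counts]} for all cells alive at t_end.

    Assumed model (illustrative only): components are conserved and split
    binomially at each division; cycle lengths are Erlang(k) with the given mean.
    """
    rng = random.Random(seed)
    by_generation = defaultdict(list)
    # Each stack entry is (birth time in hours, generation, component count).
    stack = [(0.0, 0, n0) for _ in range(n_founders)]
    while stack:
        birth, gen, n = stack.pop()
        # An Erlang(k) variate is a gamma variate with integer shape k.
        division_time = birth + rng.gammavariate(k, mean_cycle / k)
        if division_time > t_end:
            by_generation[gen].append(n)  # still undivided at the snapshot
            continue
        inherited = binomial_half(rng, n)
        stack.append((division_time, gen + 1, inherited))
        stack.append((division_time, gen + 1, n - inherited))
    return by_generation


def summarize(by_generation, n0=1000, min_cells=1000):
    """Compare observed per-generation moments with the binomial prediction."""
    rows = []
    for gen in sorted(by_generation):
        counts = by_generation[gen]
        if len(counts) < min_cells:
            continue  # skip sparsely populated generations
        mean = sum(counts) / len(counts)
        variance = sum((c - mean) ** 2 for c in counts) / len(counts)
        p = 0.5 ** gen  # chance that a given component ends up in this cell
        rows.append({
            "generation": gen,
            "cells": len(counts),
            "mean": mean,
            "variance": variance,
            "expected_mean": n0 * p,
            "expected_variance": n0 * p * (1 - p),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(simulate_snapshot()):
        print(row)
```

      Because the division times are drawn independently of the component counts, cell-cycle variability changes only how many cells occupy each generation at the snapshot, not the distribution within a generation. Coupling division time to component number would break this independence and is the cross-talk case the response leaves for future work.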

      (1) Soltani, Mohammad, et al. "Intercellular variability in protein levels from stochastic expression and noisy cell cycle processes." PLoS computational biology 12.8 (2016): e1004972.

      Reviewer #2 (Public review):

      Summary:

      The authors present a combined experimental and theoretical workflow to study partitioning noise arising during cell division. Such quantifications usually require time-lapse experiments, which are limited in throughput. To bypass these limitations, the authors propose to use flow-cytometry measurements instead and analyse them using a theoretical model of partitioning noise. The problem considered by the authors is relevant and the idea to use statistical models in combination with flow cytometry to boost statistical power is elegant. The authors demonstrate their approach using experimental flow cytometry measurements and validate their results using time-lapse microscopy. However, while I appreciate the overall goal and motivation of this work, I was not entirely convinced by the strength of this contribution. The approach focuses on a quite specific case, where the dynamics of the labelled component depend purely on partitioning. As such it seems incompatible with studying the partitioning noise of endogenous components that exhibit production/turnover. The description of the methods was partly hard to follow and should be improved. In addition, I have several technical comments, which I hope will be helpful to the authors.

      We are grateful to the Reviewer for her/his comments. Indeed, both partitioning noise and production/turnover noise are in general fundamental processes. At present, the only way to consider them together is through time-consuming and costly transfection/microscopy/tracking experiments. In this work, we aimed at developing a method to effectively pinpoint the first component, i.e., partitioning noise; thus, we opted to separate the two noise sources.

      Below, we provide a point-by-point response that we hope will clarify all raised concerns.

      Comments:

      (1) In the theoretical model, copy numbers are considered to be conserved across generations. As a consequence, concentrations will decrease over generations due to dilution. While this consideration seems plausible for the considered experimental system, it seems incompatible with components that exhibit production and turnover dynamics. I am therefore wondering about the applicability/scope of the presented approach and to what extent it can be used to study partitioning noise for endogenous components. As presented, the approach seems to be limited to a fairly small class of experiments/situations.

      We see the Reviewer's point. Indeed, we are proposing a high-throughput and robust procedure to measure the partitioning/inheritance noise of cell components through flow cytometry time courses. By using live-cell staining of cellular compounds, we can track the effect of partitioning noise on the fluorescence intensity distribution across successive generations. This specific procedure is purposely optimized to isolate partitioning noise from other sources and, as it stands, cannot track endogenous components or dyes that require fixation. While this certainly limits the proposed approach, there are numerous contexts in which our methodology could be used to explore the role of asymmetric inheritance. Among others, (i) investigating how specific organelles are differentially partitioned and how this influences cellular behavior could provide deeper insights into fundamental biological processes: asymmetric segregation of organelles is a key factor in cell differentiation, aging, and stress response. During cell division, organelles such as mitochondria, the endoplasmic reticulum, lysosomes, peroxisomes, and centrosomes can be unequally distributed between daughter cells, leading to functional differences that influence their fate. For instance, Katajisto et al. [1] proposed that asymmetric division of mitochondria in stem cells is associated with the retention of stemness traits in one daughter cell and differentiation in the other. As organisms age, stem cells accumulate damage, and to prevent exhaustion and compromised tissue function, cells may use asymmetric inheritance to segregate older or damaged subcellular components into one daughter cell. (ii) Asymmetric division has also been linked to therapeutic resistance in Cancer Stem Cells [2]. Although the functional consequences are not yet fully determined, the asymmetric inheritance of mitochondria is recognized as playing a pivotal role [3].
      Another potential application of our methodology may be (iii) the inheritance of lysosomes, which, together with mitochondria, appears to play a crucial role in determining the fate of human blood stem cells [4]. Furthermore, similar to studies conducted on liquid tumors [5][6], our approach could be extended to investigate cell growth dynamics and the origins of cell size homeostasis in adherent cells [7][8][9]. These case studies can be readily addressed with our approach, which is applicable whenever live-cell dyes can be used. We will add a discussion of the strengths and limitations of the method to the Discussion section of the revised manuscript.

      (1) Katajisto, Pekka, et al. "Asymmetric apportioning of aged mitochondria between daughter cells is required for stemness." Science 348.6232 (2015): 340-343.

      (2) Hitomi, Masahiro, et al. "Asymmetric cell division promotes therapeutic resistance in glioblastoma stem cells." JCI insight 6.3 (2021): e130510.

      (3) García-Heredia, José Manuel, and Amancio Carnero. "Role of mitochondria in cancer stem cell resistance." Cells 9.7 (2020): 1693.

      (4) Loeffler, Dirk, et al. "Asymmetric organelle inheritance predicts human blood stem cell fate." Blood, The Journal of the American Society of Hematology 139.13 (2022): 2011-2023.

      (5) Miotto, Mattia, et al. "Determining cancer cells division strategy." arXiv preprint arXiv:2306.10905 (2023).

      (6) Miotto, Mattia, et al. "A size-dependent division strategy accounts for leukemia cell size heterogeneity." Communications Physics 7.1 (2024): 248.

      (7) Kussell, Edo, and Stanislas Leibler. "Phenotypic diversity, population growth, and information in fluctuating environments." Science 309.5743 (2005): 2075-2078.

      (8) McGranahan, Nicholas, and Charles Swanton. "Clonal heterogeneity and tumor evolution: past, present, and the future." Cell 168.4 (2017): 613-628.

      (9) De Martino, Andrea, Thomas Gueudré, and Mattia Miotto. "Exploration-exploitation tradeoffs dictate the optimal distributions of phenotypes for populations subject to fitness fluctuations." Physical Review E 99.1 (2019): 012417.

      (2) Similar to the previous comment, I am wondering what would happen in situations where the generations could not be as clearly identified as in the presented experimental system (e.g., due to variability in cell-cycle length/stage). In this case, it seems to be challenging to identify generations using a Gaussian Mixture Model. Can the authors comment on how to deal with such situations? In the abstract, the authors motivate their work by arguing that detecting cell divisions from microscopy is difficult, but doesn't their flow cytometry-based approach have a similar problem?

      The point raised is an important one, as it highlights the fundamental role of the gating strategy. The ability to identify the distribution of different generations using the Gaussian Mixture Model (GMM) strongly depends on the degree of overlap between distributions. The more the distributions overlap, the less capable we are of accurately separating them.
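
      The dependence of the GMM on peak overlap can be illustrated with a minimal sketch. It assumes log2-transformed intensities, so that each division shifts a generation's peak down by one unit, and fits a one-dimensional mixture with a hand-written expectation-maximisation loop. The function name fit_gmm_1d, the peak positions, and the peak width are illustrative assumptions, not the pipeline used in the manuscript.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(data, k, n_iter=100):
    """Fit a k-component 1D Gaussian mixture by expectation-maximisation."""
    lo, hi = min(data), max(data)
    # Initial guess: means evenly spaced across the data range, equal weights.
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    sds = [(hi - lo) / (2.0 * k)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: update weight, mean and standard deviation of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against empty components
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), 1e-3)
    # Return (mean, sd, weight) tuples ordered by increasing mean.
    return sorted(zip(means, sds, weights))


# Hypothetical data: three generations, one log2 unit apart, equal peak width.
rng = random.Random(0)
true_means = [8.0, 9.0, 10.0]
data = [rng.gauss(m, 0.2) for m in true_means for _ in range(500)]

fit = fit_gmm_1d(data, k=3)
for mean, sd, weight in fit:
    print(f"mean={mean:.2f}  sd={sd:.2f}  weight={weight:.2f}")
```

      Widening the simulated peaks (for example, replacing 0.2 with 0.6) makes the components overlap, and the recovered means and weights become correspondingly less reliable.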

      The extent of overlap is influenced by the coefficients of variation (CV) of both the partitioning distribution function and the initial component distribution. Specifically, the component distribution at time t results from the convolution of the component distribution itself at time t−1 and the partitioning distribution function. Therefore, starting with a narrow initial component distribution allows for better separation of the generation peaks. The balance between partitioning asymmetry and the width of the initial component distribution is thus crucial.

      As shown in Author response image 2, increasing the CV of either distribution reduces the ability to distinguish between different generations.

      Author response image 2.

      Component distributions at varying CVs of the initial component and partitioning distributions. Starting from a condition in which both the division asymmetry and the width of the initial component distribution are low and different generations are clearly separable, increasing either CV leads to mixing of the distributions and greater reconstruction difficulty.
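
      The trend described in this caption can be reproduced qualitatively with a small simulation. The sketch below is only an illustration under stated assumptions: Gaussian initial amounts and Gaussian inherited fractions (clipped to [0, 1]), one tracked daughter per mother, and a hypothetical separation index defined as the distance between generation means in units of the pooled standard deviation. None of these choices is taken from the manuscript.

```python
import random
import statistics


def simulate_generations(cv_init, cv_part, n_cells=2000, n_gen=2, seed=0):
    """Return a list of per-generation component amounts (generation 0 first)."""
    rng = random.Random(seed)
    # Generation 0: amounts with mean 1000 (arbitrary units) and CV cv_init.
    cells = [max(rng.gauss(1000.0, 1000.0 * cv_init), 1e-9) for _ in range(n_cells)]
    generations = [cells]
    for _ in range(n_gen):
        daughters = []
        for amount in generations[-1]:
            # Inherited fraction: mean 1/2, CV cv_part, clipped to [0, 1].
            f = min(max(rng.gauss(0.5, 0.5 * cv_part), 0.0), 1.0)
            daughters.append(amount * f)
        generations.append(daughters)
    return generations


def separation(a, b):
    """Distance between the means of two samples in units of their pooled SD."""
    pooled_sd = ((statistics.pvariance(a) + statistics.pvariance(b)) / 2.0) ** 0.5
    return abs(statistics.fmean(a) - statistics.fmean(b)) / pooled_sd


# (CV of initial distribution, CV of partitioning fraction); values are illustrative.
settings = {
    "narrow": (0.1, 0.1),
    "wide initial": (0.3, 0.1),
    "wide partitioning": (0.1, 0.3),
}
results = {}
for label, (cv_init, cv_part) in settings.items():
    gens = simulate_generations(cv_init, cv_part)
    results[label] = separation(gens[0], gens[1])
    print(f"{label}: separation between generations 0 and 1 = {results[label]:.2f}")
```

      In this toy setting, raising either CV lowers the separation index between consecutive generations, which is the mixing effect the caption describes.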

      However, the variance of the initial distribution cannot be reduced arbitrarily. While selecting a narrow distribution facilitates a better reconstruction of the distributions, it simultaneously limits the number of cells available for the experiment. Therefore, for components exhibiting a high level of asymmetry, further narrowing of the initial distribution becomes experimentally impractical.

      In such cases, an approach previously tested on liquid tumors [1] involves applying the Gaussian Mixture Model (GMM) in two dimensions by co-staining another cellular component with lower division asymmetry.

      Regarding time-lapse fluorescence microscopy, the main challenge lies not in disentangling the interplay of different noise sources, but rather in obtaining sufficient statistical power from experimental data. While microscopy provides detailed insights into the division process and component partitioning, its low throughput limits large-scale statistical analyses. Current segmentation algorithms still perform poorly in crowded environments and with complex cell shapes, requiring a substantial portion of the image analysis pipeline to be performed manually, a process that is time-consuming and difficult to scale. In contrast, our cytometry-based approach bypasses this analysis bottleneck, as it enables a direct population-wide measurement of the system's evolution. We will provide a detailed discussion on these aspects in the revised version of the manuscript.

      (1) Peruzzi, Giovanna, et al. "Asymmetric binomial statistics explains organelle partitioning variance in cancer cell proliferation." Communications Physics 4.1 (2021): 188.

      (3) I could not find any formal definition of division asymmetry. Since this is the most important quantity of this paper, it should be defined clearly.

      We thank the Reviewer for the note. By division asymmetry we mean a quantity that reflects how similar two daughter cells are likely to be in terms of inherited components after a division. We opted to measure it via the coefficient of variation (the square root of the variance divided by the mean) of the partitioning fraction distribution. We will add this missing definition to the revised version of the manuscript.
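
      As a concrete reading of this definition, the coefficient of variation of a set of inherited fractions can be computed in a few lines. The function name division_asymmetry and the two lists of fractions are made-up illustrations, not measured data.

```python
import statistics


def division_asymmetry(fractions):
    """Coefficient of variation of inherited fractions: SD divided by mean."""
    return statistics.pstdev(fractions) / statistics.fmean(fractions)


# Hypothetical inherited fractions for two kinds of division.
nearly_symmetric = [0.50, 0.49, 0.51, 0.50, 0.50, 0.51, 0.49]
strongly_asymmetric = [0.30, 0.70, 0.45, 0.55, 0.25, 0.75, 0.50]

print(f"nearly symmetric:    CV = {division_asymmetry(nearly_symmetric):.3f}")
print(f"strongly asymmetric: CV = {division_asymmetry(strongly_asymmetric):.3f}")
```

      A perfectly symmetric division, in which every daughter inherits exactly half, gives a CV of zero; broader fraction distributions give larger values.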

      (4) The description of the model is unclear/imprecise in several parts. For instance, it seems to me that the index "i" does not really refer to a cell in the population, but rather a subpopulation of cells that has undergone a certain number of divisions. Furthermore, why is the argument of Equation 11 suddenly the fraction f as opposed to the component number? I strongly recommend carefully rewriting and streamlining the model description and clearly defining all quantities and how they relate to each other.

      We are carefully amending the text to avoid double naming of variables and to clarify each step of the computation. In Equation 11, the variable f refers to the fluorescence intensity; the notation will be changed to increase clarity.

      (5) Similarly, I was not able to follow the logic of Section D. I recommend carefully rewriting this section to make the rationale, logic, and conclusions clear to the reader.

      We will update the manuscript clarifying the scope of section D and its results. In brief, Section A presents a general model to derive the variance of the partitioning distribution from flow cytometry time-course data without making any assumptions about the shape of the distribution itself. In Section D, our goal is to interpret the origin of asymmetry and propose a possible form for the partitioning distribution. Since the dyes used bind non-specifically to cytoplasmic amines, the tagged proteins are expected to be uniformly distributed throughout the cytoplasm and present in large numbers. Given these assumptions the least complex model for division follows the binomial distribution, with a parameter that measures the bias in the process. Therefore, we performed a similar computation to that in Section A, which allows us to estimate not only the variance but also the degree of biased asymmetry. Finally, we fitted the data to this new model and proposed an experimental interpretation of the results.
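
      A biased binomial partitioning of this kind can be sketched as follows. The example assumes that each of N components independently goes to one daughter with probability p and that the tracked daughter is chosen at random; under those assumptions the inherited fraction has variance p(1-p)/N + (p-1/2)^2, a textbook result that is not taken from the manuscript's equations. The values of N and p and the function names are hypothetical.

```python
import random
import statistics


def simulate_fraction_variance(n_components, p, n_divisions=5000, seed=1):
    """Empirical variance of the fraction inherited by a randomly tracked daughter."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_divisions):
        # Each component independently goes to daughter A with probability p.
        n_a = sum(1 for _ in range(n_components) if rng.random() < p)
        f_a = n_a / n_components
        # Track daughter A or daughter B with equal probability.
        fractions.append(f_a if rng.random() < 0.5 else 1.0 - f_a)
    return statistics.pvariance(fractions)


def predicted_fraction_variance(n_components, p):
    """Binomial sampling term plus the contribution of the bias away from 1/2."""
    return p * (1.0 - p) / n_components + (p - 0.5) ** 2


n_components, p = 100, 0.6  # illustrative values only
simulated = simulate_fraction_variance(n_components, p)
predicted = predicted_fraction_variance(n_components, p)
print(f"simulated variance = {simulated:.5f}, predicted variance = {predicted:.5f}")
```

      With many components the sampling term p(1-p)/N becomes small, so any remaining variance is dominated by the bias term, which is why a fit of this form can separate unbiased sampling noise from biased asymmetry.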

      (6) Much theoretical work has been done recently to couple cell-cycle variability to intracellular dynamics. While the authors neglect the latter for simplicity, it would be important to further discuss these approaches and why their simplified model is suitable for their particular experiments.

      We agree with the Reviewer and will discuss this aspect in the revised version of the manuscript.

      (7) In the discussion the authors note that the microscopy-based estimates may lead to an overestimation of the fluctuations due to limited statistics. I could not follow that reasoning. Due to the gating in the flow cytometry measurements, I could imagine that the resulting populations are more stringently selected as compared to microscopy. Could that also be an explanation? More generally, it would be interesting to see how robust the results are in terms of different gating diameters.

      The Reviewer is right about the importance of the sorting procedure. As already discussed in a previous point, the gating strategy we employed plays a fundamental role: it reduces the overlap of fluorescence distributions as generations progress, enables the selection of an initial distribution distinct from the fluorescence background (allowing for longer tracking of proliferation), and synchronizes the initial population. The narrower the initial distribution, the more separated the peaks of different generations will be. However, this also results in a smaller number of cells available for the experiment, requiring a careful balance between precision and experimental feasibility. A similar procedure, although it would certainly limit the estimation error, would be impracticable in the case of microscopy. Indeed, the primary limitation and source of error there is the number of recorded events. Our pipeline allowed us to track on the order of hundreds of division events, but the analysis time scales non-linearly with the number of events, so significantly increasing the dataset would have been extremely time-consuming. Restricting the analysis to cells with similar fluorescence, although possible in principle, would have reduced the statistics to a level where the sampling error would dominate the measurement. Moreover, different experiments would have been hardly comparable, since different fluorescence values could map to equally sized cells. In light of these factors, we expect a higher CV for the microscopy measurements than for the flow cytometry ones. In the plots below, we show the behaviour of the mean and the standard deviation of N numbers sampled from a Gaussian distribution N(0,1) as a function of the sample size N. The higher N is, the closer the sampled distribution will be to the true one. The region of hundreds of samples is still very noisy, but to do much better we would have to reach the order of thousands. We will add a discussion of these aspects in the revised version of the manuscript.

      Author response image 3.

      Standard deviation and mean value of a set of points sampled from a Gaussian distribution with mean 0 and standard deviation 1, versus the number of samples, N. Increasing N leads to a closer approximation of the expected values. The Microscopy Working Region (Microscopy WR), highlighted in orange, corresponds to the number of samples we are able to reach with microscopy experiments. Highlighted in yellow is the region we would have to reach to lower the estimation error, which is, however, very expensive in terms of analysis time.
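
      The scaling behind this figure can be checked numerically. The sketch below repeatedly draws N values from N(0,1) and measures how much the sample mean and sample standard deviation scatter across repeats; for Gaussian data these spreads are expected to be close to 1/sqrt(N) and 1/sqrt(2N), respectively. The sample sizes, the number of repeats, and the function name are illustrative choices.

```python
import math
import random
import statistics


def estimate_spread(n_samples, n_repeats=200, seed=2):
    """Scatter (SD across repeats) of the sample mean and sample SD of N(0, 1) draws."""
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(n_repeats):
        draws = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        means.append(statistics.fmean(draws))
        sds.append(statistics.stdev(draws))
    return statistics.pstdev(means), statistics.pstdev(sds)


results = {}
for n in (100, 1000):
    results[n] = estimate_spread(n)
    spread_mean, spread_sd = results[n]
    print(
        f"N={n}: spread of mean = {spread_mean:.3f} "
        f"(theory {1 / math.sqrt(n):.3f}), "
        f"spread of SD = {spread_sd:.3f} "
        f"(theory {1 / math.sqrt(2 * n):.3f})"
    )
```

      Because the error shrinks only as the square root of N, moving from hundreds to thousands of events reduces the scatter by roughly a factor of three, which is the trade-off between precision and analysis time discussed above.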

      (8) It would be helpful to show flow cytometry plots including the identified subpopulations for all cell lines, currently, they are shown only for HCT116 cells. More generally, very little raw data is shown.

      We will provide the requested plots for the other cell lines together with additional raw data coming from simulations in the Supplementary Material. 

      (9) The title of the manuscript could be tailored more to the considered problem. At the moment it is very generic.

      We see the Reviewer's point. The proposed title aims to convey the wide applicability of the presented approach, which ultimately allows for the assessment of fluctuations in the levels of cellular components at division. This in turn reflects the asymmetry of the division.

    1. eLife Assessment

      This study presents valuable findings suggesting that the late maturation of prefrontal cortex-based control processes enhances conceptual learning by allowing a period of less-constrained knowledge acquisition. The authors provide convincing computational evidence that delayed semantic control promotes learning without compromising representation integrity, with the strongest benefits emerging when control connections target intermediate layers of the model. However, the model's narrow scope raises concerns about scalability to more complex, real-world learning environments, and the meta-analysis, while supporting the developmental trajectory, does not directly test the model's specific predictions regarding task outcomes or error patterns.

    2. Reviewer #1 (Public review):

      Summary:

      This study was motivated by the general claim that delayed development of cognitive control can be beneficial for learning, and investigated this claim in the specific domain of conceptual development. A comprehensive set of computational model simulations showed that delaying the onset of semantic control produces faster learning with only minimal effects on conceptual abstraction. The simulations also showed that control was most effective at intermediate levels between modality-specific "spokes" and the multimodal "hub". A meta-analysis of developmental data was consistent with the claim of delayed onset of semantic control: young children show substantially better semantic knowledge than the ability to constrain that knowledge to a specific task at hand.

      Strengths:

      The computational modelling is based on a very well-established model of semantic cognition, which means that the simulations allow exploring the specific issues under investigation here in the context of a model that accounts for a very large set of semantic cognition phenomena. The simulations are comprehensive - manipulating different parameters of the model provides important insights into how (and why) it works.

      In addition to simulations exploring delayed maturation, there is an exploration of where semantic control is most effective, yielding the interesting result that control is most effective when it targets intermediate levels of semantic processing. To my knowledge, this is a novel finding and a concrete prediction for future testing.

      The meta-analysis is designed in a very clever way that allows extracting evidence of semantic control from a large body of prior work. The results are quite clear and compelling in showing that semantic knowledge is acquired before children are able to use task demands to constrain the use of that knowledge.

      Weaknesses:

      Computational models of cognition inherently require simplification in order to focus on the mechanisms under investigation. However, it is also important to keep these simplifications in mind because they limit the generality of the inferences that can be made from the simulation results. Two aspects are important in this context:

      (1) The multimodal structure was orthogonal to the surface similarity structure of the concepts to be learned. It is certainly true that multimodal structure does not perfectly mirror surface similarity, but closely related things tend to be perceptually similar. There are exceptions (whales, penguins, etc.), but they are *exceptional*, not typical. It may be that the somewhat extreme dissociation of multimodal and surface similarity structures creates demands that are not faced in natural conceptual development.

      (2) Much of the benefit of delayed semantic control seems to be because the model is not penalised for activating task-irrelevant features. This blurs the distinction between being aware of a feature and making a response based on that feature. A full model that also includes a response layer could become a lot more complicated and more difficult to understand, so maybe there is an advantage to using a simpler architecture.

      In addition, there is a bit of a misalignment between the model simulations and the meta-analysis. In the model, there are distinct modality-specific "spokes" and control is required in order to focus on modality/spoke in a task-appropriate way. The meta-analysis does not compare a task-defined selection of a modality; it compares the selection of taxonomic vs thematic relations, both of which are multimodal. One way to resolve this is to say that taxonomic and thematic relations are also represented in distinct sub-systems of semantic knowledge and semantic control is needed to select between them in a task-appropriate way.

      This is particularly relevant to the inference at the bottom of p. 38: "taxonomic and thematic relationships ...[are]... both being encoded within the same system of representation", which seems in direct contradiction to the present results, or at least to the logic of combining these simulations with this meta-analysis. The simulations are based on semantic control being used to select/constrain the correct distinct sub-system (modality-specific spoke); the meta-analysis is based on semantic control being used to select/constrain the correct relationship type. If these two things are analogous in some way, then the relationship type has to be something like a distinct sub-system.

    3. Reviewer #2 (Public review):

      Summary:

      This paper investigates the idea that the protracted maturation of the prefrontal cortex - often viewed as a developmental limitation - may actually confer advantages for conceptual learning in children. The authors focus on semantic control processes, which govern the context-sensitive application of conceptual knowledge, and are closely associated with late-developing regions of the prefrontal cortex.

      Drawing on a computational model, the paper formally tests whether delayed maturation of semantic control promotes the acquisition of conceptual knowledge. The simulations demonstrate that when semantic control and anatomical connectivity mature later, conceptual learning is accelerated without compromising the integrity of the learned representations. Notably, the benefit is most apparent when control connections target intermediate layers in the computational model, suggesting a nuanced interplay between control processes and the underlying conceptual network.

      To validate these computational insights in a human developmental context, the authors conduct a meta-analysis of the classic triadic matching task - a paradigm where participants decide which of two choices best matches a reference concept based on either taxonomic or thematic relations. Critically, when these relations conflict, semantic control is required to select the context-appropriate match. Results indicate that context-sensitive semantic control develops more slowly than basic conceptual knowledge, showing marked improvements between 3 and 6 years of age.

      Overall, the paper argues that the delayed development of prefrontal cortex-based control processes allows for a period of less constrained learning, ultimately enhancing conceptual acquisition. The findings challenge the traditional view of late PFC maturation as solely disadvantageous and instead position it as an adaptive feature for building robust conceptual frameworks in early childhood.

      Strengths:

      (1) Novel Theoretical Contribution
      The paper offers a compelling, counterintuitive argument that a developmental lag in the maturation of control processes might be beneficial for semantic learning. This stands in contrast to the conventional framing of late prefrontal cortex (PFC) development as purely disadvantageous (e.g., a "necessary but unfortunate" constraint).

      (2) Well-Grounded Computational Approach
      The authors propose a neural network model that is both theoretically driven (hub-and-spoke framework) and systematically tested under various conditions (different timelines for control onset, and different connectivity patterns). Their simulations replicate and extend previous findings about how insulating the multimodal hub from direct control inputs helps preserve abstract conceptual representations.

      (3) Neuro-anatomical basis
      The paper connects its computational claims to empirical neuroanatomy, particularly the lack of direct structural connectivity between ventral ATL (the "hub") and the PFC in humans. This lends biological plausibility to the argument that control signals likely reach the ATL via intermediate regions (e.g., posterior temporal cortex).

      (4) Meta-Analysis of Triadic Match-to-Sample
      The authors leverage decades of developmental data on conceptual matching tasks, reframing them in terms of semantic control vs. semantic representation. Their analysis nicely illustrates that children can identify semantic relationships (taxonomic or thematic) at age 2 if the task does not require them to select between conflicting semantic relations. In contrast, the ability to choose a task-relevant relation only emerges more robustly in 3-6 years. This developmental pattern aligns with the computational model's predictions.

      Weaknesses:

      The contribution of the paper might be considered rather specialist and might not appeal to the broad readership typical of a generalist journal. Moreover, the scope of the model is fairly narrow: its relatively small, controlled training environment raises questions about scalability to more naturalistic, high-dimensional data. Finally, the meta-analysis does not directly test the model's predictions in terms of specific task outcomes, error patterns, or model fit, but only the developmental pattern, which was an already observed phenomenon that in part motivated the hypothesis and the model itself.

    4. Author response:

      On the control of taxonomic versus thematic information. Both reviewers had questions about the relationship between the focus of the meta-analysis (the control of responses based on taxonomic versus thematic relationships) and the simulation. Both the model and the meta-analysis focus on the same mechanism: the controlled selection of task-appropriate features. In the case of the meta-analysis, these were the features and associations needed to identify the taxonomic or thematic relationships. As reviewer 1 notes, one possibility is that these kinds of structures are represented in distinct cortical regions. For instance, Mirman, Schwartz and colleagues have suggested that temporoparietal regions may preferentially support thematic knowledge while temporal regions may preferentially support taxonomic knowledge. Alternatively, they may be supported by different features instantiated within the same regions. However, whether or not taxonomic and thematic relationships require access to features in different regions is not crucial to the conclusions of this paper. The simulations used here happen to select features based on their inclusion in a particular sensory modality, yet they could learn to select any combination of features. Indeed, prior simulations using the Jackson et al. (2021) model show that the functional impact on learning of “deep” conceptual representations (together with controlled behaviours) is the same regardless of whether the potentiated features are localised within one spoke or distributed across spokes. Thus, the key results regarding the acquisition of semantic knowledge before the maturation of control in the current work should hold regardless of whether knowledge of taxonomic and thematic relations is localised to different anatomical regions.

      On model size and scalability. Both reviewers noted the relatively small size of the model and wondered about implications for ecological validity of the simulations and scalability to larger, noisier, and potentially more systematically structured training environments. We agree this is an important direction for future research, but one that faces two nontrivial challenges. First, reviewer 1 notes that, whereas our model environment employs orthogonal structures across spokes and for the cross-modal features, perceptual structure may be better-aligned with conceptual structure for real-world experience. While we appreciate the intuition, its validity depends to a key extent on how visual information about objects is encoded. Conceptual structure is certainly not apparent, for instance, in the distance between bitmap images of objects, nor the overlap of simple feature-extraction algorithms (such as edge detection or Fourier decomposition, etc). Even in this age of deep vision models, it remains unclear how the visual system extracts and discerns perceptual similarity from retinal input (see e.g. Mukherjee & Rogers, 2025). Most successful contemporary models train neural networks to assign visual images to semantic categories, suggesting that the visual features the model learns, and thus the perceptual similarities it represents, depend on learning to generate semantic information. Therefore, it is not clear whether the similarity that people perceive amongst instances of the same class is natively apparent in the bottom-up visual input, or whether it depends on semantic/cross-modal learning and representation. It should also be noted that within our training environment, there are features in each modality that are predictive of features in other modalities, as well as some that are only predictive of features within this modality. 
      Thus, the full cross-modality conceptual structure is not orthogonal to the information available in each sensory domain; instead, there is a relationship between surface and multimodal similarity in the dataset, as in the real-world environment. In general, one virtue of the small-scale modelling endeavour in the current work is that we can be very explicit about the nature of the structure apparent within and across spokes.

      The second non-trivial issue concerns the nature of the mechanisms that allow for context-sensitive responding in large-scale language/vision models such as GPT 4. Such models are trained on web-scale language and vision and provide a means of simulating controlled behaviour with realistic stimuli, so might seem to provide a means of assessing scalability of current neuro-cognitive models. Large language/vision models rely, however, on transformer architectures whose relationship to hypothesized mechanisms of control in the mind and brain is unclear. In transformers, context-sensitive responding depends upon “attention” mechanisms that are fully distributed and integrated throughout the entire system—there is no distinction between control, representation, and short-term memory in the architecture. As a consequence, it is very difficult to understand why a model behaves the way it does, or to relate patterns of behaviour to hypothesised mechanisms in the human mind/brain. Yet transformers are currently the only models capable of exhibiting context-sensitive patterns of responding based on both language and vision. Scaling up neuro-cognitive models will require developing alternative architectures that preserve the critical hypothesised distinctions between representation and control while retaining the ability of transformers to learn from large-scale ecologically realistic corpora of language and images. In the meantime, small-scale simulations like those reported here provide some critical insights into aspects of architecture and maturation that may aid in this endeavour.

      On including a response layer. Reviewer 1 notes that our model does not separately simulate response-generation and the selective activation of relevant feature representations. We agree that there are interesting questions about how feature-potentiation and response-generation relate to one another, and that incorporating response selection in the current model would significantly complicate the analysis. The general idea that control potentiates/suppresses task-relevant feature representations in addition to simply promoting the correct response derives from classic work by Martin and others (e.g., Martin et al., 1995) showing that, for instance, regions involved in colour perception activate more strongly in tasks requiring retrieval of colour than tasks involving retrieval of action and vice versa—results consistent with the model training/testing procedure in the current work. In general, it may be counterproductive to become aware of aspects of a concept that would be irrelevant, or even actively unhelpful in making a response, suggesting guided activation is a necessary precursor to response selection (Botvinick & Cohen, 2014). Here, we focus on this important feature potentiation step.

      On the novelty of the meta-analysis. Reviewer 2 suggests the results of the meta-analysis were already known and provided motivation for the simulation. However, an important contribution of the current work is the observation that, in fact, there is little prior work on the development of semantic control. The widely known developmental delay in domain-general executive control, which did indeed motivate the study, is exclusively based on tasks requiring very different forms of executive control. Many of these involve no meaningful stimuli or require the child to completely inhibit a practiced response and generate an opposite or completely arbitrary response, instead of requiring the child to use context to select among two or more meaningful behaviours that are equally valid in different contexts (see the introduction to Part 2). This observation, coupled with recent evidence that semantic control relies on dedicated neural systems that only partially overlap with those supporting executive function, illustrates the utility of the current meta-analysis: delineating the developmental trajectory of semantic control requires a task in which control is applied to the context-appropriate retrieval and manipulation of semantic knowledge, such as the triadic matching task. Moreover, the results show that semantic control, while arising later than semantic representation, nevertheless begins to mature earlier (around 2.5 years) than typical estimates for domain-general executive control (around 4 years). Thus, the meta-analysis contributes to our understanding of cognitive development while also testing a key prediction of the model.

    1. eLife Assessment

      The study presents valuable findings regarding the incidence and clinical impact of a mutation in a cardiac muscle protein and its association with the development of atrial fibrillation. The authors provide some convincing evidence of electrophysiological disturbances in cells with this mutation which would be of interest to cellular electrophysiologists. However, evidence supporting the conclusion that this mutation causes atrial fibrillation would benefit from more rigorous electrophysiologic approaches.

    2. Reviewer #1 (Public review):

      Summary:

      Pavel et al. analyzed a cohort of atrial fibrillation (AF) patients from the University of Illinois at Chicago, identifying TTN truncating variants (TTNtvs) and TTN missense variants (TTNmvs). They reported a rare TTN missense variant (T32756I) associated with adverse clinical outcomes in AF patients. To investigate its functional significance, the authors modeled the TTN-T32756I variant using human induced pluripotent stem cell-derived atrial cardiomyocytes (iPSC-aCMs). They demonstrated that mutant cells exhibit aberrant contractility, increased activity of the cardiac potassium channel KCNQ1 (Kv7.1), and dysregulated calcium homeostasis. Interestingly, these effects occurred without compromising sarcomeric integrity. The study further identified increased binding of the titin-binding protein Four-and-a-Half Lim domains 2 (FHL2) with KCNQ1 and its modulatory subunit KCNE1 in the TTN-T32756I iPSC-aCMs.

      Strengths:

      This work has translational potential, suggesting that targeting KCNQ1 or FHL2 could represent a novel therapeutic strategy for improving cardiac function. The findings may also have broader implications for treating patients with rare, disease-causing variants in sarcomeric proteins and underscore the importance of integrating genomic analysis with experimental evidence to advance AF research and precision medicine.

      Weaknesses:

      (1) Variant Identification: It is unclear how the TTN missense variant (T32756I) was identified using REVEL, as none of the patients' parents reportedly carried the mutation or exhibited AF symptoms. Are there other TTN variants identified in the three patients carrying TTN-T32756I? Clarification on this point is necessary.

      (2) Patient-Specific iPSC Lines: Since the TTN-T32756I variant was modeled using only one healthy iPSC line, it is unclear whether patient-specific iPSC-derived atrial cardiomyocytes would exhibit similar AF-related phenotypes. This limitation should be addressed.

      (3) Hypertension as a Confounding Factor: The three patients carrying TTN-T32756I also have hypertension. Could the hypertension associated with this variant contribute secondarily to AF? The authors should discuss or rule out this possibility.

      (4) FHL2 and KCNQ1-KCNE1 Interaction: Immunostaining data demonstrating the colocalization of FHL2 with the KCNQ1-KCNE1 (MinK) complex in TTN-T32756I iPSC-aCMs are needed to strengthen the mechanistic findings.

      (5) Functional Characterization of FHL2-KCNQ1-KCNE1 Interaction: Additional functional assays are necessary to characterize the interaction between FHL2 and the KCNQ1-KCNE1 complex in TTN-T32756I iPSC-aCMs to further validate the proposed mechanism.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present data from a single-center cohort of African-American and Hispanic/Latinx individuals with atrial fibrillation (AF). This study provides insight into the incidences and clinical impact of missense variants in the Titin (TTN) gene in this population. In addition, the authors identified a single amino acid TTN missense variant (TTN-T32756I) that was further studied using human induced pluripotent stem cell-derived atrial cardiomyocytes (iPSC-aCMs). These studies demonstrated that Four-and-a-Half Lim domains 2 (FHL2) has increased binding with KCNQ1 and its modulatory subunit KCNE1 in the TTN-T32756I-iPSC-aCMs, enhancing the slow delayed rectifier potassium current (Iks), which is a potential mechanism for atrial fibrillation. Finally, the authors demonstrate that suppression of FHL2 could normalize the Iks current.

      Strengths:

      The strengths of this manuscript/study are listed below:

      (1) This study includes a previously underrepresented population in the study of the genetic and mechanistic basis of AF.

      (2) The authors utilize current state-of-the-art methods to investigate the pathogenicity of a specific TTN missense variant identified in this underrepresented patient population.

      (3) The findings of this study identify a potential therapeutic for treating atrial fibrillation.

      Weaknesses:

      (1) The authors do not include a non-AF group when evaluating the incidence and clinical significance of TTN missense variants in AF patients.

      (2) The authors do not provide evidence that TTN-T32756I-iPSC-aCMs are arrhythmogenic, only that there is an increase in the Iks current and associated action potential changes. More specifically, the authors report "compared to the WT, TTN-T32756I-iPSC-aCMs exhibited increased arrhythmic frequency" yet it is unclear what they are referring to by "arrhythmic frequency".

      (3) There seem to be discrepancies regarding the impact of the TTN-T32756I variant on mechanical function. Specifically, the authors report "both reduced contraction and abnormal relaxation in TTN-T32756I-iPSC-aCMs" yet, separately report "the contraction amplitude of the mutant was also increased . . . suggesting an increased contractile force by the TTN-T32756I-iPSC-aCMs and TTN-T32756I-iPSC-CMs exhibited similar calcium transient amplitudes as the WT."

    4. Reviewer #3 (Public review):

      Summary:

      The authors describe the abnormal contractile function and cellular electrophysiology in an iPSC model of atrial myocytes with a titin missense variant. They provide contractility data by sarcomere length imaging, calcium imaging, and voltage clamp of the repolarizing current iKs. While each of the findings is separately interesting, the paper comes across as too descriptive because there is no merging of the data to support a cohesive mechanistic story/statement, especially from the electrophysiological standpoint. There is definitely not enough support for the title "A Titin Missense Variant Causes Atrial Fibrillation", since there is no strong causative evidence at all. There is some interesting clinical data regarding the variant of interest and its association with HF hospitalization, which may lead to future important discoveries regarding atrial fibrillation.

      Strengths:

      The manuscript is well written and there is a wide range of experimental techniques to probe this atrial fibrillation model.

      Weaknesses:

      (1) While the clinical data is interesting, it is extremely important to rule out heart failure with preserved EF as a confounder. HFpEF leads to AF due to increased atrial remodeling, so the fact that patients with this missense variant have increased HF hospitalizations does not necessarily directly support the variant as causative of AF. It could be that the variant is actually associated directly with HFpEF instead, and this needs to be addressed and corrected in the analyses.

      (2) All of the contractility and electrophysiologic data should be done with pacing at the same rate in both control and missense variant groups, to control for the effect of cycle length on APD and calcium loading. A shorter APD cannot be claimed when the firing rate of one set of cells is much faster than the other, since shorter APD is to be expected with a faster rate. Similarly, contractility is affected by diastolic interval because of the influence of SR calcium content on the myocyte power stroke. So the cells need to be paced at the same rate in the IonOptix for any direct comparison of contractility. The authors should familiarize themselves with the concept of electrical restitution.

      (3) It is interesting that the firing rate of the myocytes is faster with the missense variant. This should lead to a hypothesis and investigation of abnormal automaticity or triggered activity, which may also explain the increased contractility since all these mechanisms are related to the calcium clock and calcium loading of the SR. See #2 above for suggestions on how to adequately probe calcium handling. Such an investigation into impulse initiation mechanisms would be very powerful in supporting the primary statement of the paper since these are actual mechanisms thought to cause AF.

      (4) The claim of shortened APD without correcting for cycle length is problematic. However, the general concept of linking shortened APD in isolated cells alone to AF causation is more problematic. To have a setup for reentry, there must be a gradient of APD from short to long, and this can only be demonstrated at the tissue level, not really at the cellular level, so reentry should not be invoked here. If shortened APD is demonstrated with correction of the cycle length problem, restitution curves can be made showing APD shortening at different cycle lengths. If restitution is abnormal (i.e. the APD does not shorten normally in relation to the diastolic interval), this may lead to triggered activity which is an arrhythmogenic mechanism. This would also tie in well with the finding of abnormally elevated iKs current since iKs is a repolarizing current directly responsible for restitution.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Pavel et al. analyzed a cohort of atrial fibrillation (AF) patients from the University of Illinois at Chicago, identifying TTN truncating variants (TTNtvs) and TTN missense variants (TTNmvs). They reported a rare TTN missense variant (T32756I) associated with adverse clinical outcomes in AF patients. To investigate its functional significance, the authors modeled the TTN-T32756I variant using human induced pluripotent stem cell-derived atrial cardiomyocytes (iPSC-aCMs). They demonstrated that mutant cells exhibit aberrant contractility, increased activity of the cardiac potassium channel KCNQ1 (Kv7.1), and dysregulated calcium homeostasis. Interestingly, these effects occurred without compromising sarcomeric integrity. The study further identified increased binding of the titin-binding protein Four-and-a-Half Lim domains 2 (FHL2) with KCNQ1 and its modulatory subunit KCNE1 in the TTN-T32756I iPSC-aCMs.

      Strengths:

      This work has translational potential, suggesting that targeting KCNQ1 or FHL2 could represent a novel therapeutic strategy for improving cardiac function. The findings may also have broader implications for treating patients with rare, disease-causing variants in sarcomeric proteins and underscore the importance of integrating genomic analysis with experimental evidence to advance AF research and precision medicine.

      Weaknesses:

      (1) Variant Identification: It is unclear how the TTN missense variant (T32756I) was identified using REVEL, as none of the patients' parents reportedly carried the mutation or exhibited AF symptoms. Are there other TTN variants identified in the three patients carrying TTN-T32756I? Clarification on this point is necessary.  

      We thank the reviewer for their insightful comment. Our study identified deleterious missense variants using a stringent REVEL score threshold of ≥0.7; however, variants with a REVEL score above 0.5 are generally considered potentially pathogenic (Ioannidis, Nilah M., et al., Am J Hum Genet 2016; 99(4): 877-885). The TTN-T32756I variant (REVEL Score: 0.58758, Supplementary Table 1) was prioritized due to its occurrence in multiple unrelated individuals within our clinical AF cohort, despite no reported family history of AF in affected individuals. While no parental inheritance was observed, the possibility of a de novo origin cannot be excluded. Furthermore, this variant is located within a region overlapping a deletion mutation recently shown to cause AF in a zebrafish model (Jiang et al., iScience, 2024;27(7):110395), supporting its potential pathogenicity. Notably, the affected individuals did not carry additional loss-of-function TTN variants. We will clarify these points in the revised manuscript.

      (2) Patient-Specific iPSC Lines: Since the TTN-T32756I variant was modeled using only one healthy iPSC line, it is unclear whether patient-specific iPSC-derived atrial cardiomyocytes would exhibit similar AF-related phenotypes. This limitation should be addressed.

      We acknowledge the reviewer’s concern that patient-specific iPSC lines could further validate our findings. However, because peripheral blood mononuclear cells (PBMCs) from the patients were unavailable, we utilized a healthy iPSC line and introduced the TTN-T32756I variant using CRISPR/Cas9 genome editing. This approach ensures an isogenic background, thereby minimizing genetic variability and providing a controlled system to study the direct effects of the mutation. We will acknowledge this limitation in the revised manuscript.

      (3) Hypertension as a Confounding Factor: The three patients carrying TTN-T32756I also have hypertension. Could the hypertension associated with this variant contribute secondarily to AF? The authors should discuss or rule out this possibility.

      We agree that hypertension is a common comorbidity in patients with AF and could contribute to disease progression. However, all three individuals carrying TTN-T32756I exhibited early-onset AF (onset before 66 years), with one case occurring as early as 36 years. This suggests a potential two-hit mechanism, where genetic predisposition and comorbidities influence disease risk. Importantly, our iPSC model isolates the genetic effects of TTN-T32756I from other factors, supporting a direct pathogenic role. We will explicitly discuss this in the revised manuscript.

      (4) FHL2 and KCNQ1-KCNE1 Interaction: Immunostaining data demonstrating the colocalization of FHL2 with the KCNQ1-KCNE1 (MinK) complex in TTN-T32756I iPSC-aCMs are needed to strengthen the mechanistic findings.

      We appreciate the reviewer’s suggestion and agree that additional immunostaining data would strengthen the evidence for FHL2 colocalization with the KCNQ1-KCNE1 complex in TTN-T32756I iPSC-aCMs. We will work on obtaining these additional data to validate our mechanistic findings further.

      (5) Functional Characterization of FHL2-KCNQ1-KCNE1 Interaction: To further validate the proposed mechanism, additional functional assays are necessary to characterize the interaction between FHL2 and the KCNQ1-KCNE1 complex in TTN-T32756I iPSC-aCMs.

      We agree with the reviewer that additional functional assays would further validate the proposed mechanism. We will perform contractility and electrophysiological experiments, such as multielectrode array (MEA) assays, to better characterize the interaction between FHL2 and the KCNQ1-KCNE1 complex in TTN-T32756I iPSC-aCMs.

      Reviewer #2 (Public review):

      Summary:

      The authors present data from a single-center cohort of African-American and Hispanic/Latinx individuals with atrial fibrillation (AF). This study provides insight into the incidences and clinical impact of missense variants in this population in the Titin (TTN) gene. In addition, the authors identified a single amino acid TTN missense variant (TTN-T32756I) that was further studied using human induced pluripotent stem cell-derived atrial cardiomyocytes (iPSC-aCMs). These studies demonstrated that the Four-and-a-Half Lim domains 2 (FHL2) has increased binding with KCNQ1 and its modulatory subunit KCNE1 in the TTN-T32756I-iPSC-aCMs, enhancing the slow delayed rectifier potassium current (Iks) and is a potential mechanism for atrial fibrillation. Finally, the authors demonstrate that suppression of FHL2 could normalize the Iks current.

      Strengths:

      The strengths of this manuscript/study are listed below:

      (1) This study includes a previously underrepresented population in the study of the genetic and mechanistic basis of AF.

      (2) The authors utilize current state-of-the-art methods to investigate the pathogenicity of a specific TTN missense variant identified in this underrepresented patient population.

      (3) The findings of this study identify a potential therapeutic for treating atrial fibrillation.

      Weaknesses:

      (1) The authors do not include a non-AF group when evaluating the incidence and clinical significance of TTN missense variants in AF patients.

      We acknowledge the limitation of not including a non-AF group in our clinical analysis. Our cohort is derived from a single-center registry of individuals with AF, and we do not have a matched cohort of non-AF controls to compare the incidence of TTN missense variants. We recognize this as a limitation and will clarify that further studies are needed to define the prevalence of TTN missense variants in broader, multiethnic cohorts that include both AF and non-AF individuals.

      (2) The authors do not provide evidence that TTN-T32756I-iPSC-aCMs are arrhythmogenic, only that there is an increase in the Iks current and associated action potential changes. More specifically, the authors report that "compared to the WT, TTN-T32756I-iPSC-aCMs exhibited increased arrhythmic frequency," yet it is unclear what they are referring to by "arrhythmic frequency."

      We appreciate the reviewer’s request for clarification regarding "arrhythmic frequency." In our study, this term refers to the increased spontaneous beating rate and irregular action potentials observed in TTN-T32756I iPSC-aCMs compared to WT. Our findings suggest that the AF-associated TTN-T32756I variant induces ion channel remodeling and beating abnormalities, possibly contributing to an arrhythmogenic substrate for AF. We will refine our wording in the revised manuscript to enhance clarity and precision.

      (3) There seem to be discrepancies regarding the impact of the TTN-T32756I variant on mechanical function. Specifically, the authors report "both reduced contraction and abnormal relaxation in TTN-T32756I-iPSC-aCMs" yet, separately report "the contraction amplitude of the mutant was also increased … suggesting an increased contractile force by the TTN-T32756I-iPSC-aCMs and TTN-T32756I-iPSC-CMs exhibited similar calcium transient amplitudes as the WT."

      We thank the reviewer for pointing this out and apologize for the inconsistency. We intended to report on contraction duration and relaxation rather than contraction force alone. The increased contraction amplitude reflects altered contractile force, whereas the reduced contraction duration and impaired relaxation indicate dysfunctional contractile dynamics. We will revise the text and corresponding figures to convey these findings accurately.

      Reviewer #3 (Public review):

      Summary:

      The authors describe the abnormal contractile function and cellular electrophysiology in an iPSC model of atrial myocytes with a titin missense variant. They provide contractility data by sarcomere length imaging, calcium imaging, and voltage clamp of the repolarizing current iKs. While each of the findings is interesting, the paper comes across as too descriptive because there is no data merging to support a cohesive mechanistic story/statement, especially from the electrophysiological standpoint. There is not enough support for the title "A Titin Missense Variant Causes Atrial Fibrillation", since there is no strong causative evidence. There is some interesting clinical data regarding the variant of interest and its association with HF hospitalization, which may lead to future important discoveries regarding atrial fibrillation.

      Strengths:

      The manuscript is well written, and a wide range of experimental techniques are used to probe this atrial fibrillation model.

      Weaknesses:

      (1) While the clinical data is interesting, it is essential to rule out heart failure with preserved EF as a confounder. HFpEF leads to AF due to increased atrial remodeling, so the fact that patients with this missense variant have increased HF hospitalizations does not necessarily directly support the variant as causative of AF. It could be that the variant is associated directly with HFpEF instead, and this needs to be addressed and corrected in the analyses.

      We recognize that AF and HFpEF frequently coexist and that HFpEF-related atrial remodeling could contribute to AF development. The primary aim of our cohort analysis was to explore the potential clinical significance of TTNmv. While we acknowledge the inherent limitations of retrospective observational data in establishing causality, our subsequent in vitro experiments were designed to demonstrate that TTNmv can alter the electrophysiological substrate, potentially predisposing individuals to AF.

      As HFpEF is a potential confounder, it is reasonable to consider whether TTNmv may also be associated with HFpEF. However, to our knowledge, no existing literature directly links TTNmv to HFpEF. In contrast, loss-of-function TTN variants are typically associated with heart failure with reduced ejection fraction (HFrEF) and dilated cardiomyopathy, and even their role in HFrEF remains controversial. To address potential confounding, our multivariable analysis for clinical outcomes was adjusted for reduced ejection fraction, and we conducted a sensitivity analysis excluding patients with nonischemic dilated cardiomyopathy (Supplementary Table 6). We will clarify these points in the revised manuscript.

      (2) All contractility and electrophysiologic data should be done with pacing at the same rate in both control and missense variant groups, to control for the effect of cycle length on APD and calcium loading. A shorter APD cannot be claimed when the firing rate of one set of cells is much faster than the other, since shorter APD is to be expected with a quicker rate. Similarly, contractility is affected by diastolic interval because of the influence of SR calcium content on the myocyte power stroke. So the cells need to be paced at the same rate in the IonOptix for any direct comparison of contractility. The authors should familiarize themselves with the concept of electrical restitution.

      We appreciate the reviewer’s technical concern. iPSC-derived cardiomyocytes (iPSC-CMs) exhibit spontaneous beating due to the presence of pacemaker-like currents and the absence of I<sub>K1</sub>, which allows for the study of intrinsic electrophysiological properties, ion channel function, and disease modeling. In our study, we utilized this unique property of iPSC-CMs to test our hypothesis that TTNmvs alter electrophysiological properties through ion channel remodeling.

      While iPSC-CMs with identical backgrounds are expected to show comparable electrophysiological phenotypes under the same conditions, variability due to biological and technical factors (e.g., protein expression and culture handling) can result in differences between samples. We agree with the reviewer that pacing iPSC-CMs at the same rate for action potential duration (APD) and contractility measurements will control for cycle length effects and improve the reliability and interpretability of our findings. We will incorporate this approach into our revised experimental design.

      (3) It is interesting that the firing rate of the myocytes is faster with the missense variant. This should lead to a hypothesis and investigation of abnormal automaticity or triggered activity, which may also explain the increased contractility since all these mechanisms are related to the SR's calcium clock and calcium loading. See #2 above for suggestions on how to probe calcium handling adequately. Such an investigation into impulse initiation mechanisms would be compelling in supporting the primary statement of the paper since these are actual mechanisms thought to cause AF.

      We agree with the reviewer that investigating abnormal automaticity or triggered activity in relation to the increased firing rate observed with the missense variant could provide valuable insights into the mechanisms underlying AF. As these processes are closely linked to calcium handling and the calcium clock, probing calcium cycling abnormalities could strengthen our understanding of how TTNmvs contribute to AF. We will incorporate additional experiments to investigate these mechanisms, further supporting our study's central hypothesis.

      (4) The claim of shortened APD without correcting for cycle length is problematic. However, linking shortened APD in isolated cells alone to AF causation is more complicated. To have a setup for reentry, there must be a gradient of APD from short to long, and this can only be demonstrated at the tissue level, not at the cellular level, so reentry should not be invoked here. If shortened APD is demonstrated with correction of the cycle length problem, restitution curves can be made showing APD shortening at different cycle lengths. If restitution is abnormal (i.e. the APD does not shorten normally in relation to the diastolic interval), this may lead to triggered activity which is an arrhythmogenic mechanism. This would also tie in well with the finding of abnormally elevated iKs current since iKs is a repolarizing current directly responsible for restitution.

      We appreciate the reviewer’s insightful comment. We recognize that isolated cell studies cannot directly demonstrate reentrant circuits, and we agree that reentry should not be invoked solely based on cellular data. Our claim of shortened APD is based on observed abnormalities in APD and beating patterns, which may contribute to conditions conducive to reentry at the tissue level. We will clarify this distinction in the revised manuscript and refrain from directly linking APD shortening to reentry without tissue-level evidence.

    1. eLife Assessment

      Studying the biological roles of polyphosphates in metazoans has been a longstanding challenge to the field given that the polyP synthase has yet to be discovered in metazoans. This important study capitalizes on the sophisticated genetics available in the Drosophila system and uses a combination of methodologies to start to tease apart how polyphosphate participates in Drosophila development and in the clotting of Drosophila hemolymph. The data validating the tools are solid and well-documented and they will open up a field of research into the functional roles of polyP in a metazoan model.

    2. Reviewer #1 (Public review):

      Polymers of orthophosphate of varying lengths are abundant in prokaryotes and some eukaryotes where they regulate many cellular functions. Though they exist in metazoans, few tools exist to study their function. This study documents the development of tools to extract, measure, and deplete inorganic polyphosphates in *Drosophila*. Using these tools, the authors show:

      (1) that polyP levels are negligible in embryos and larvae of all stages while they are feeding. They remain high in pupae but their levels drop in adults.

      (2) that many cells in tissues such as the salivary glands, oocytes, haemocytes, imaginal discs, optic lobe, muscle, and crop, have polyP that is either cytoplasmic or nuclear (within the nucleolus).

      (3) that polyP is necessary in plasmatocytes for blood clotting in Drosophila.

      (4) that polyP controls the timing of eclosion.

      The tools developed in the study are innovative, well-designed, tested, and well-documented. I enjoyed reading about them and I appreciate that the authors have gone looking for the functional role of polyP in flies, which hasn't been demonstrated before. The documentation of polyP in cells is convincing, as is its role in plasmatocytes in clotting. Its control of eclosion timing, however, could result from non-specific effects of expressing an exogenous protein in all cells of an animal. The RNAseq experiments and their associated analyses on polyP-depleted animals and controls have not been discussed in sufficient detail. In its current form, the data look to be extremely variable between replicates and I'm therefore unsure of how the differentially regulated genes were identified.

      It is interesting that no kinases and phosphatases have been identified in flies. Is it possible that flies are utilising the polyP from their gut microbiota? It would be interesting to see if these signatures go away in axenic animals.

    3. Reviewer #2 (Public review):

      Summary:

      The authors of this paper note that although polyphosphate (polyP) is found throughout biology, the biological roles of polyP have been under-explored, especially in multicellular organisms. The authors created transgenic Drosophila that expressed a yeast enzyme that degrades polyP, targeting the enzyme to different subcellular compartments (cytosol, mitochondria, ER, and nucleus, terming these altered flies Cyto-FLYX, Mito-FLYX, etc.). The authors show the localization of polyP in various wild-type fruit fly cell types and demonstrate that the targeting vectors did indeed result in the expression of the polyP degrading enzyme in the cells of the flies. They then go on to examine the effects of polyP depletion using just one of these targeting systems (the Cyto-FLYX). The primary findings from the depletion of cytosolic polyP levels in these flies are that it accelerates eclosion and also appears to participate in hemolymph clotting. Perhaps surprisingly, the flies seemed otherwise healthy and appeared to have few other noticeable defects. The authors use transcriptomics to try to identify pathways altered by the Cyto-FLYX construct degrading cytosolic polyP, and it seems likely that their findings in this regard will provide avenues for future investigation. And finally, although the authors found that eclosion is accelerated in pupae of Drosophila expressing the Cyto-FLYX construct, the reason why this happens remains unexplained.

      Strengths:

      The authors capitalize on the work of other investigators who had previously shown that expression of recombinant yeast exopolyphosphatase could be targeted to specific subcellular compartments to locally deplete polyP, and they also use a recombinant polyP binding protein (PPBD) developed by others to localize polyP. They combine this with the considerable power of Drosophila genetics to explore the roles of polyP by depleting it in specific compartments and cell types to tease out novel biological roles for polyP in a whole organism. This is a substantial advance.

      Weaknesses:

      Page 4 of the Results (paragraph 1): I'm a bit concerned about the specificity of PPBD as a probe for polyP. The authors show that the fusion partner (GST) isn't responsible for the signal, but I don't think they directly demonstrate that PPBD is binding only to polyP. Could it also bind to other anionic substances? A useful control might be to digest the permeabilized cells and tissues with polyphosphatase prior to PPBD staining and show that the staining is lost.

      In the hemolymph clotting experiments, the authors collected 2 ul of hemolymph and then added 1 ul of their test substance (water or a polyP solution). They state that they added either 0.8 or 1.6 nmol polyP in these experiments (the description in the Results differs from that of the Methods). I calculate this will give a polyP concentration of 0.3 or 0.6 mM. This is an extraordinarily high polyP concentration and is much in excess of the polyP concentrations used in most of the experiments testing the effects of polyP on clotting of mammalian plasma. Why did the authors choose this high polyP concentration? Did they try lower concentrations? It seems possible that too high a polyP concentration would actually have less clotting activity than the optimal polyP concentration.
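      The concentration estimate above can be checked with a short calculation. The sketch below assumes that the 1 µL addition mixes completely with the 2 µL of hemolymph (3 µL final volume) and that the stated nmol amounts are in the same units the authors use to express polyP; the function name is illustrative only. It gives roughly 0.27 and 0.53 mM, consistent with the rounded 0.3 and 0.6 mM figures.

```python
def final_concentration_mM(amount_nmol: float,
                           hemolymph_uL: float = 2.0,
                           added_uL: float = 1.0) -> float:
    """Final concentration after mixing, in mM.

    nmol per microlitre is numerically equal to mmol per litre (mM),
    so no further unit conversion is needed.
    """
    total_volume_uL = hemolymph_uL + added_uL  # assumes complete mixing
    return amount_nmol / total_volume_uL


if __name__ == "__main__":
    # The two polyP amounts mentioned in the review text.
    for nmol in (0.8, 1.6):
        print(f"{nmol} nmol in 3 uL: {final_concentration_mM(nmol):.2f} mM")
```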

    4. Reviewer #3 (Public review):

      Summary:

      Sarkar, Bhandari, Jaiswal, and colleagues establish a suite of quantitative and genetic tools to use Drosophila melanogaster as a model metazoan organism to study polyphosphate (polyP) biology. By adapting biochemical approaches for use in D. melanogaster, they identify a window of increased polyP levels during development. Using genetic tools, they find that depleting polyP from the cytoplasm alters the timing of metamorphosis, accelerating eclosion. By adapting subcellular imaging approaches for D. melanogaster, they observe polyP in the nucleolus of several cell types. They further demonstrate that polyP localizes to cytoplasmic puncta in hemocytes, and further that depleting polyP from the cytoplasm of hemocytes impairs hemolymph clotting. Together, these findings establish D. melanogaster as a tractable system for advancing our understanding of polyP in metazoans.

      Strengths:

      (1) The FLYX system, combining cell type and compartment-specific expression of ScPpx1, provides a powerful tool for the polyP community.

      (2) The finding that cytoplasmic polyP levels change during development and affect the timing of metamorphosis is an exciting first step in understanding the role of polyP in metazoan development, and possible polyP-related diseases.

      (3) Given the significant existing body of work implicating polyP in the human blood clotting cascade, this study provides compelling evidence that polyP has an ancient role in clotting in metazoans.

      Limitations:

      (1) While the authors demonstrate that HA-ScPpx1 protein localizes to the target organelles in the various FLYX constructs, the capacity of these constructs to deplete polyP from the different cellular compartments is not shown. This is an important control to both demonstrate that the GST-PPBD labeling protocol works, and also to establish the efficacy of compartment-specific depletion. While not necessary to do this for all the constructs, it would be helpful to do this for the cyto-FLYX and nuc-FLYX.

      (2) The cell biological data in this study clearly indicate that polyP is enriched in the nucleolus in multiple cell types, consistent with recent findings from other labs, and also that polyP affects gene expression during development. Given that the authors also generate the Nuc-FLYX construct to deplete polyP from the nucleus, it is surprising that they test how depleting cytoplasmic but not nuclear polyP affects development. However, providing these tools is a service to the community, and testing the phenotypic consequences of all the FLYX constructs may arguably be beyond the scope of this first study.

    5. Author response:

      Our reviewers brought three things to our notice:

      (1) PolyP has not been introduced as an abbreviation in the abstract.

      (2) 'colorimetric' is misspelled as 'calorimetric' in the following sentence of the results section.

      This method involved the digestion of polyP by recombinant S. cerevisiae exopolyphosphatase 1 (_Sc_Ppx1) followed by calorimetric measurement of the released Pi by malachite green.

      (3) A reference for hNUDT3 has been deleted due to the same technical glitch from the following sentence of introduction.

      Recently, biochemical experiments led to the discovery of endopolyphosphatase NUDT3, an enzyme known as a dinucleoside phosphatase.

    1. eLife Assessment

      This is an important study that examines the impact of Streptococcus pneumoniae genetics on its in vitro growth kinetics, aiming to identify potential targets for vaccines and therapeutics. The study identified significant variations in growth characteristics among capsular serotypes and lineages, linked to phylogeny and high heritability, but genome-wide association studies did not reveal specific genomic loci associated with growth features independent of the genetic background. The evidence supporting these findings is solid.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript uses a diverse isolate collection of Streptococcus pneumoniae from hospital patients in the Netherlands to understand the population-level genetic basis of growth rate variation in this pathogen, which is a key determinant of S. pneumoniae within-host fitness. Previous efforts have studied this phenomenon in strain-specific comparisons, which can lack the statistical power and scope of population-level studies. The authors collected a rigorous set of in vitro growth data for each S. pneumoniae isolate and subsequently paired growth curve analysis with whole-genome analyses to identify how phylogenetics, serotype, and specific genetic loci influence in vitro growth. While there were noticeable correlations between capsular serotype and phylogeny with growth metrics, they did not identify specific loci associated with altered in vitro growth, suggesting that these phenotypes are controlled by the collective effect of the entire genetic background of a strain. This is an important finding that lays the foundation for additional, more highly-powered studies that capture more S. pneumoniae genetic diversity to identify these genetic contributions.

      Strengths:

      (1) The authors were able to completely control the experimental and genetic analyses to ensure all isolates underwent the same analysis pipeline to enhance the rigor of their findings.

      (2) The isolate collection captures an appreciable amount of S. pneumoniae diversity and, importantly, enables disentangling the contributions of the capsule and phylogenetic background to growth rates.

      (3) This study provides a population-level, rather than strain-specific, view of how genetic background influences the growth rate in S. pneumoniae. This is an advance over previous studies that have only looked at smaller sets of strains.

      (4) The methods used are well-detailed and robust to allow replication and extension of these analyses. Moreover, the manuscript is very well written and includes a thoughtful and thorough discussion of the strengths and limitations of the current study.

      Weaknesses:

      (1) As acknowledged by the authors, the genetic diversity and sample size of this newly collected isolate set are still limited relative to the known global diversity of S. pneumoniae, which evidently limits the power to detect loci with smaller/combinatorial contributions to growth rate (and ultimately infection).

      (2) The in vitro growth data is limited to a single type of rich growth medium, which may not fully reflect the nutritional and/or selective pressures present in the host.

      (3) The current study does not use genetic manipulation or in vitro/in vivo infection models to experimentally test whether alteration of growth rates as observed in this study is linked to virulence or successful infection. The availability of a naturally diverse collection with phylogenetic and serotype combinations already identified as interesting by the authors provides a strong rationale for wet-lab studies of these phenotypes.

    3. Reviewer #2 (Public review):

      Summary:

      The study by Chaguza et al. presents a novel perspective on pneumococcal growth kinetics, suggesting that the overall genetic background of Streptococcus pneumoniae, rather than specific loci, plays a more dominant role in determining growth dynamics. Through a genome-wide association study (GWAS) approach, the authors propose a shift in how we understand growth regulation, differing from earlier findings that pinpointed individual genes, such as wchA or cpsE, as key regulators of growth kinetics. This study highlights the importance of considering the cumulative impact of the entire genetic background rather than focusing solely on individual genetic loci.

      The study emphasizes the cumulative effects of genetic variants, each contributing small individual impacts, as the key drivers of pneumococcal growth. This polygenic model moves away from the traditional focus on single-gene influences. Through rigorous statistical analyses, the authors persuasively advocate for a more holistic approach to understanding bacterial growth regulation, highlighting the complex interplay of genetic factors across the entire genome. Their findings open new avenues for investigating the intricate mechanisms underlying bacterial growth and adaptation, providing fresh insights into bacterial pathogenesis.

      Strengths:

      This study exemplifies a holistic approach to unraveling key factors in bacterial pathogenesis. By analyzing a large dataset of whole-genome sequences and employing robust statistical methodologies, the authors provide strong evidence to support their main findings, which is a leap forward from previous studies focused on a relatively smaller number of strains. Their integration of genome-wide association studies (GWAS) highlights the cumulative, polygenic influences on pneumococcal growth kinetics, challenging the traditional focus on individual loci. This comprehensive strategy not only advances our understanding of bacterial growth regulation but also establishes a foundation for future research into the genetic underpinnings of bacterial pathogenesis and adaptation. The amount of data generated and the corresponding approaches to analyzing the data are impressive as well as convincing. The figures are convincing and comprehensible too.

      Weaknesses:

      Despite the strong outcomes of the GWAS approach, this study leaves room for differing interpretations. A key point of contention lies in the title, which initially gives the impression that the research addresses growth kinetics under both in vitro and in vivo conditions. However, the study is limited to in vitro growth kinetics, with the assumption that these findings are equally applicable to in vivo scenarios, a premise that is not universally valid. To more accurately reflect the study's scope and avoid potential misrepresentation, the title should explicitly specify "in vitro" growth kinetics. This clarification would better align the title with the study's actual focus and findings.

      This study suggests that the entire genetic background significantly influences bacterial growth kinetics. However, to transform these predictions into established facts, extensive experimental validation is necessary. This would involve "bench experiments" focusing on generating and studying mutant variants of serotypes or strains with diverse genomic variations, such as targeted deletions. The growth phenotypes of these mutants should be analyzed, complemented by complementation assays to confirm the specific roles of the deleted regions. These efforts would provide critical empirical evidence to support the findings from the GWAS approach and enhance understanding of the genetic basis of bacterial growth kinetics.

      In the discussion section, the authors state that "the influence of serotype appeared to be higher than the genetic background for the average growth rate" (lines 296-298). Alongside references 13-15, this emphasizes the important role of capsular variability, which is a key determinant of serotypes, in influencing growth kinetics. However, this raises the question: why isn't a specific locus like cps, which is central to capsule biogenesis, considered a strong influencer of growth kinetics in this study?

      One plausible explanation could be the absence of "elevated signals" for cps in the GWAS analysis. GWAS relies on identifying loci with statistically significant associations to phenotypes. The lack of such signals for cps may indicate that its contribution, while biologically important, does not stand out genome-wide. This might be due to the polygenic nature of growth kinetics, where the overall genetic background exerts a cumulative effect, potentially diluting the apparent influence of individual loci like cps in statistical analyses.

    4. Reviewer #3 (Public review):

      This study provides insights into the growth kinetics of a diverse collection of Streptococcus pneumoniae, identifying capsule and lineage differences. It was not able to identify any specific loci from the genome-wide association studies (GWAS) that were associated with the growth features. It does provide a useful study linking phenotypic data with large-scale genomic population data. The methods for the large part were appropriately written in sufficient detail, and data analysis was performed with rigour. The interpretation of the results was supported by the data, although some additional explanation of the significance of e.g. ancestral state reconstruction would be useful. Efforts were made to make the underlying data fully accessible to the readers although some of the supplementary material could be formatted and explained a bit better.

    1. eLife Assessment

      This important study examines the relationship between cognition and mental health and investigates how brain, genetics, and environmental measures mediate that relationship. The methods and results are compelling and well-executed. Overall, this study will be of interest in the field of population neuroscience and in studies of mental health.

    2. Reviewer #1 (Public review):

      Summary:

      This work integrates two timepoints from the Adolescent Brain Cognitive Development (ABCD) Study to understand how neuroimaging, genetic, and environmental data contribute to the predictive power of mental health variables in predicting cognition in a large early adolescent sample. Their multimodal and multivariate prediction framework involves a novel opportunistic stacking model to handle complex types of information to predict variables that are important in understanding mental health-cognitive performance associations.

      Strengths:

      The authors are commended for incorporating and directly comparing the contribution of multiple imaging modalities (task fMRI, resting state fMRI, diffusion MRI, structural MRI), neurodevelopmental markers, environmental factors, and polygenic risk scores in a novel multivariate framework (via opportunistic stacking), as well as interpreting mental health-cognition associations with latent factors derived from partial least squares. The authors also use a large well-characterized and diverse cohort of adolescents from the ABCD Study. The paper is also strengthened by commonality analyses to understand the shared and unique contribution of different categories of factors (e.g., neuroimaging vs mental health vs polygenic scores vs sociodemographic and adverse developmental events) in explaining variance in cognitive performance.

      Weaknesses:

      The paper is framed with an over-reliance on the RDoC framework in the introduction, despite deviations from the RDoC framework in the methods. The field is also learning more about RDoC's limitations when mapping cognitive performance to biology. The authors also focus on a single general factor of cognition as the core outcome of interest as opposed to different domains of cognition. The authors could consider predicting mental health rather than cognition. Using mental health as a predictor could be limited by the included 9-11 year age range at baseline (where many mental health concerns are likely to be low or not well captured), as well as the nature of how the data was collected, i.e., either by self-report or from parent/caregiver report.

    3. Reviewer #2 (Public review):

      Summary:

      This paper by Wang et al. uses rich brain, behaviour, and genetics data from the ABCD cohort to ask how well cognitive abilities can be predicted from mental-health-related measures, and how brain and genetics influence that prediction. They obtain an out-of-sample correlation of 0.4, with neuroimaging (in particular task fMRI) proving the key mediator. Polygenic scores contributed less.

      Strengths:

      This paper is characterized by the intelligent use of a superb sample (ABCD) alongside strong statistical learning methods and a clear set of questions. The outcome - the moderate level of prediction between the brain, cognition, genetics, and mental health - is interesting. Particularly important is the dissection of which features best mediate that prediction and how developmental and lifestyle factors play a role.

      Weaknesses:

      There are relatively few weaknesses to this paper. It has already undergone review at a different journal, and the authors clearly took the original set of comments into account in revising their paper. Overall, while the ABCD sample is superb for the questions asked, it would have been highly informative to extend the analyses to datasets containing more participants with neurological/psychiatric diagnoses (e.g. HBN, POND) or extend it into adolescent/early adult onset psychopathology cohorts. But it is fair enough that the authors want to leave that for future work.

      In terms of more practical concerns, much of the paper relies on comparing r or R2 measures between different tests. These are always presented as point estimates without uncertainty. There would be some value, I think, in incorporating uncertainty from repeated sampling to better understand the improvements/differences between the reported correlations.
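One way to attach uncertainty to an out-of-sample correlation, as suggested above, is a percentile bootstrap over held-out participants. The sketch below is only an illustration of that general idea: the data are synthetic, the function name and parameters are hypothetical, and it is not the resampling scheme used by the authors.

```python
import numpy as np


def bootstrap_r_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for Pearson's r.

    Participants are resampled with replacement; r is recomputed each time.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rng = np.random.default_rng(seed)
    n = y_true.size
    boot_r = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample participants with replacement
        boot_r[b] = np.corrcoef(y_true[idx], y_pred[idx])[0, 1]
    lower, upper = np.quantile(boot_r, [alpha / 2, 1 - alpha / 2])
    point = np.corrcoef(y_true, y_pred)[0, 1]
    return point, lower, upper


# Synthetic example (hypothetical data, not from the study).
rng = np.random.default_rng(1)
observed = rng.normal(size=500)
predicted = 0.4 * observed + rng.normal(size=500)
r, lo, hi = bootstrap_r_ci(observed, predicted)
print(f"r = {r:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Two models can then be compared by bootstrapping the difference in r on the same resampled participants, rather than by comparing two point estimates.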

      The focus on mental health in a largely normative sample leads to the predictions being largely based on the normal range. It would be interesting to subsample the data and ask how well the extremes are predicted.

      A minor query - why are only cortical features shown in Figure 3?

    1. eLife Assessment

      This study establishes the methodology (machine vision and gaze pose estimation) and behavioral apparatus for examining social interactions between pairs of marmoset monkeys. Their results enable unrestrained social interactions under more rigorous conditions with detailed quantification of position and gaze. It has been difficult to study social interactions using artificial stimuli, as opposed to genuine interactions between unrestrained animals. This study makes an important contribution to studying social neuroscience within a laboratory setting; the approach is novel and well-executed, backed by convincing evidence.

    2. Reviewer #1 (Public review):

      Summary:

      The current study by Xing et al. establishes the methodology (machine vision and gaze pose estimation) and behavioral apparatus for examining social interactions between pairs of marmoset monkeys. Their results enable unrestrained social interactions under more rigorous conditions with detailed quantification of position and gaze. It has been difficult to study social interactions using artificial stimuli, as opposed to genuine interactions between unrestrained animals. This study makes an important contribution for studying social neuroscience within a laboratory setting that will be valuable to the field.

      Strengths:

      Marmosets are an ideal species for studying primate social interactions due to their prosocial behavior and the ease of group housing within laboratory environments. They also predominantly orient their gaze through head movements during social monitoring. Recent advances in machine vision pose estimation set the stage for estimating 3D gaze position in marmosets but require additional innovation beyond DeepLabCut or equivalent methods. A six-point facial frame is designed to accurately fit marmoset head gaze. A key assumption in the study is that head gaze is a reliable indicator of the marmoset's gaze direction, which will also depend on the eye position. Overall, this assumption has been well supported by recent studies in head-free marmosets. Thus the current work introduces an important methodology for leveraging machine vision to track head gaze and demonstrates its utility for use with interacting marmoset dyads as a first step in that study.

      Weaknesses:

      One weakness that should be easily addressed is that no data is provided to directly assess how accurate the estimated head gaze is based on calibrations of the animals, for example, when they are looking at discrete locations like faces or video on a monitor. This would be useful to get an upper bound on how accurate the 3D gaze vector is estimated to be, for planned use in other studies. Although the accuracy appears sufficient for the current results, it would be difficult to know if it could be applied in other contexts where more precision might be necessary.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes novel technique development and experiments to track the social gaze of marmosets. The authors used video tracking of multiple cameras in pairs of marmosets to infer head orientation and gaze and then studied gaze direction as a function of distance between animals, relationships, and social conditions/stimuli.

      Strengths:

      Overall the work is interesting and well done. It addresses an area of growing interest in animal social behavior, an area that has largely been dominated by research in rodents and other non-primate species. In particular, this work addresses something that is uniquely primate (perhaps not unique, but not studied much in other laboratory model organisms), which is that primates, like humans, look at each other, and this gaze is an important social cue of their interactions. As such, the presented work is an important advance and addition to the literature that will allow more sophisticated quantification of animal behaviors. I am particularly enthusiastic about how the authors approach the cone of uncertainty in gaze, which can be due both to some error in head orientation measurements and to variable eye position.

      Weaknesses:

      There are a few technical points in need of clarification, both in terms of the robustness of the gaze estimate, and possible confounds by gaze to non-face targets which may have relevance but are not discussed. These are relatively minor, and more suggestions than anything else.

    1. eLife Assessment

      The study describes a useful tool for assessing microglia morphology in a variety of experimental conditions. The MorphoCellSorter provides a solid platform for ranking microglia to reflect their morphology continuum and may offer new insight into changes in morphology associated with injury or disease. While the study provides an alternative approach to existing methods for measuring microglia morphology, the functional significance of the measured morphological changes was not determined.

    2. Reviewer #1 (Public review):

      The current manuscript by Bendeker et al. (2024) presents a new platform, MorphoCellSorter, for performing population wide microglial morphological analyses. This method adds to the many programs/platforms available to determine characteristics of microglial morphology; however, MorphoCellSorter is unique in that it uses Andrew's plotting to rank populations of cells together (in control and experimental groups) and present "big picture" views of how entire populations of microglia alter under different conditions. In their ranking system, Bendeker et al. (2024) use PCA to determine which of the morphological characteristics most define microglial populations, avoiding user subjective biases to determine these parameters. Compared to "expert" evaluators, MorphoCellSorter appears to perform consistently and accurately, including in different types of tissue preservation methods and in live cells, a key feature of the program. In addition, the researchers point out that this platform can be used across a wide array of imaging techniques and most microscopes that are available in a basic research lab. There are minor concerns about the platform's utility in analyzing embryonic microglia and primary microglial cultures, but overall, this platform will be another useful tool for microglial researchers to consider using in future studies. Furthermore, the method of morphological assessment aligns with the current direction of the field in identifying microglial cells in more nuanced ways.

      In their current revision, the authors have done an excellent job responding to concerns and have updated the manuscript accordingly.

    3. Reviewer #2 (Public review):

      The authors introduce MorphoCellSorter, an open-source tool available on GitHub, designed for automated morphometric analysis of microglia. Current understanding suggests that microglia represent a heterogeneous population, especially in non-steady adult states, better characterized as a continuum rather than distinct cell groups.

      This tool was developed to classify microglia along this continuum. Using stained brain sections and microscope imaging, individual microglia are binarized and processed with MorphoCellSorter, which categorizes them based on 20 morphological parameters. Notably, the tool is versatile, as it can be applied to both fluorescent and brightfield brain sections, as demonstrated by the authors. Additionally, it has been tested across various setups (both fixed and live tissues) and biological contexts (including embryonic stages, Alzheimer's disease models, stroke, and primary cell cultures), showcasing its versatility and adaptability. Overall, the study is well-conceived and could have some value in the field.

      Numerous similar tools already exist, and the number is likely to grow, especially with advancements in AI. These tools have limited scientific utility as they provide descriptive rather than informative outputs. Microglial morphology varies due to external influences (such as developmental stages and injuries), but the significance of these variations remains largely hypothetical.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews (consolidated):

      In the microglia research community, it is accepted that microglia change their shape both gradually and acutely along a continuum that is influenced by external factors both in their microenvironments and in circulation. Ideally, a given morphological state reflects a functional state that provides insight into a microglia's role in physiological and pathological conditions. The current manuscript introduces MorphoCellSorter, an open-source tool designed for automated morphometric analysis of microglia. This method adds to the many programs and platforms available to assess the characteristics of microglial morphology; however, MorphoCellSorter is unique in that it uses Andrew's plotting to rank populations of cells together (in control and experimental groups) and presents "big picture" views of how entire populations of microglia alter under different conditions. Notably, MorphoCellSorter is versatile, as it can be used across a wide array of imaging techniques and equipment. For example, the authors use MorphoCellSorter on images of fixed and live tissues representing different biological contexts such as embryonic stages, Alzheimer's disease models, stroke, and primary cell cultures.

      This manuscript outlines a strategy for efficiently ranking microglia beyond the classical homeostatic vs. active morphological states. The outcome offers only a minor improvement over the already available strategies that have the same challenge: how to interpret the ranking functionally.

      We would like to thank the reviewers for their careful reading and constructive comments and questions. While MorphoCellSorter currently does not rank cells functionally based on their morphology, its broad range of application, ease of use and capacity to handle large datasets provide a solid foundation. Combined with advances in single-cell transcriptomics, MorphoCellSorter could potentially enable the future prediction of cell functions based on morphology.

      Strengths and Weaknesses:

      (1) The authors offer an alternative perspective on microglia morphology, exploring the option to rank microglia instead of categorizing them with means of clusterings like k-means, which should better reflect the concept of a microglia morphology continuum. They demonstrate that these ranked representations of morphology can be illustrated using histograms across the entire population, allowing the identification of potential shifts between experimental groups. Although the idea of using Andrews curves is innovative, the distance between ranked morphologies is challenging to measure, raising the question of whether the authors oversimplify the problem.

      We have access to the distance between cells through the Andrew’s score of each cell. However, the challenge is that these distances are relative values and specific to each dataset. While we believe that these distances could provide valuable information, we have not yet determined the most effective way to represent and utilize this data in a meaningful manner.
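To make the idea of a per-cell Andrews score concrete, the sketch below evaluates the standard Andrews curve, f_x(t) = x1/√2 + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ..., on standardized features and ranks cells by the curve value at a fixed t. How MorphoCellSorter actually derives its score is not described in this text, so the choice of t0, the z-scoring step, and all function names are assumptions made for illustration only.

```python
import numpy as np


def andrews_curve(x, t):
    """Evaluate the Andrews curve of feature vector x at angle(s) t."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    value = np.full(t.shape, x[0] / np.sqrt(2.0))
    for i in range(1, x.size):
        k = (i + 1) // 2  # harmonic index: 1, 1, 2, 2, 3, 3, ...
        basis = np.sin(k * t) if i % 2 == 1 else np.cos(k * t)
        value = value + x[i] * basis
    return value


def andrews_scores(features, t0=0.5):
    """Hypothetical score: Andrews curve value at t0 for each cell (row)."""
    features = np.asarray(features, dtype=float)
    # z-scoring uses this dataset's own mean and spread, so scores and the
    # distances between them are relative to the dataset being analyzed.
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    return np.array([float(andrews_curve(row, np.array(t0))) for row in z])


# Three hypothetical cells described by two morphological parameters.
cells = np.array([[3.0, 0.2], [1.0, 0.9], [2.0, 0.5]])
scores = andrews_scores(cells)
ranking = np.argsort(scores)   # cell indices ordered from lowest to highest score
gaps = np.diff(scores[ranking])  # distances between neighbouring ranks
print(ranking, gaps)
```

Because the standardization depends on the dataset, the gaps between ranked cells are not directly comparable across experiments, which is consistent with the limitation the authors describe.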

      Also, the discussion about the pipeline's uniqueness does not go into the details of alternative models. The introduction remains weak in outlining the limitations of current methods (L90). Acknowledging this limitation will be necessary.

      Thank you for these insightful comments. Alternative methods were already discussed in the discussion (L586-598), but in response to the reviewers' request we have revised the introduction and discussion sections to address the limitations of current methods more clearly and to discuss the uniqueness of the pipeline. Additionally, we have reorganized Figure 1 to more effectively highlight the main caveats associated with clustering, the primary method currently in use.

      (2) The manuscript suffers from several overstatements and simplifications, which need to be resolved. For example:

      a)  L40: The authors talk about "accurately ranked cells". Based on their results, the term "accuracy" is still unclear in this context.

      Thank you for this comment. Our use of the term "accurately" was intended to convey that the ranking was correct based on comparison with human experts, though we agree that it may have been overstated. We have removed "accurately" and propose to replace it with "properly" to better reflect the intended meaning.

      b) L50: Microglial processes are not necessarily evenly distributed in the healthy brain. Depending on their embedded environment, they can have longer process extensions (e.g., frontal cortex versus cerebellum).

      Thank you for bringing this point to our attention. We removed "evenly" so that this introductory sentence is more inclusive of the various morphologies of microglial cells.

      c)  L69: The term "metabolic challenge" is very broad, ranging from glycolysis/FAO switches to ATP-mediated morphological adaptations, and it needs further clarification about the author's intended meaning.

      Thank you for this comment, indeed we clarified to specify that we were talking about the metabolic challenge triggered by ischemia and added a reference as well.

      d) L75: Is morphology truly "easy" to obtain?

      Yes, it is in comparison to other parameters such as transcripts or metabolism, but we understand the point made by the reviewer and we found another way of writing it. As an alternative we propose: “morphology is an indicator accessible through…”

      e) L80: The sentence structure implies that clustering or artificial intelligence (AI) are parameters, which is incorrect. Furthermore, the authors should clarify the term "AI" in their intended context of morphological analysis.

      We apologize for this confusing writing, we reformulated the sentence as follows: “Artificial intelligence (AI) approaches such as machine learning have also been used to categorize morphologies (Leyh et al., 2021)”.

      f) L390f: An assumption is made that the contralateral hemisphere is a non-pathological condition. How confident are the authors about this statement? The brain is still exposed to a pathological condition, which does not stop at one brain hemisphere.

      We did not say that the contralateral hemisphere is non-pathological, but that its microglial cells have a non-pathological morphology, which is slightly different. The contralateral side in ischemic experiments is classically used as a control (Rutkai et al 2022). Differences in transcript levels have been reported between sham-operated animals and the contralateral hemisphere in tMCAO mice (Filippenkov et al 2022; https://doi.org/10.3390/ijms23137308), showing that the contralateral side is indeed in a different state than sham controls; however, no differences in terms of morphology have been reported.

      We have removed “non-pathological” to avoid misinterpretation.

      g)  Methodological questions:

      a) L299: An inversion operation was applied to specific parameters. The description needs to clarify the necessity of this since the PCA does not require it.

      Indeed, we are sorry for this lack of explanation. Some morphological indexes rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, simplifying data interpretation. This clarification has been added to the revised manuscript as follows:

      “Lacunarity, roundness factor, convex hull radii ratio, processes cell areas ratio and skeleton processes ratio were subjected to an inversion operation in order to homogenize the parameters before conducting the PCA: indeed, some parameters rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, thus simplifying data interpretation.”
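
The effect of such an inversion on ranking direction can be sketched as follows. This is an illustration only, not the authors' Matlab code: it assumes that "inversion" means taking the reciprocal (negation would reverse the order equally well), and the parameter values and the `ramification_index` name are hypothetical.

```python
def invert(values):
    """Reciprocal inversion (assumed form; negation would also reverse the order)."""
    return [1.0 / v for v in values]


def ranks(values):
    """Rank position of each value (0 = smallest); ties are not handled here."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


# Hypothetical values for three cells, ordered from amoeboid to ramified.
ramification_index = [1.2, 3.5, 8.0]  # grows with ramification
roundness_factor = [0.9, 0.4, 0.1]    # shrinks with ramification

# Before inversion the two parameters rank the cells in opposite directions;
# after inversion they agree.
inverted_roundness = invert(roundness_factor)
```

After the inversion, both parameters order the three cells identically, so their contributions to the PCA point in the same direction.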

      b) Different biological samples have been collected across different species (rat, mouse) and disease conditions (stroke, Alzheimer's disease). Sex is a relevant component in microglia morphology. At first glance, information on sex is missing for several of the samples. The authors should always refer to Table 1 in their manuscript to avoid this confusion. Furthermore, how many biological animals have been analyzed? It would be beneficial for the study to compare different sexes and see how accurate Andrew's ranking would be in ranking differences between males and females. If they have a rationale for choosing one sex, this should be explained.

      As reported in the literature, we acknowledge the presence of sex differences in microglial cell morphology. Due to ethical considerations and our commitment to reducing animal use, we did not conduct dedicated experiments specifically for developing MorphoCellSorter. Instead, we relied on existing brain sections provided by collaborators, which were already prepared and included tissue from only one sex—either female or male—except in the case of newborn pups, whose sex is not easily determined. Consequently, we were unable to evaluate whether MorphoCellSorter is sensitive enough to detect morphological differences in microglia attributable to sex. Although assessing this aspect is feasible, we are uncertain if it would yield additional insights relevant to MorphoCellSorter’s design and intended applications.

      To address this, we have included additional references in Table 1 of the revised manuscript and clearly indicated the sex of the animals from which each dataset was obtained.

      c) In the methodology, the slice thickness has been given in a range. Is there a particular reason for this variability?

      We could not spot any range in the text; we usually used 30 µm thick sections in order to capture entire or nearly entire microglial cells.

      Although the section thickness was identical for all the sections of a given dataset, only the planes containing the cells of interest were selected during imaging for both ischemic stroke models. This explains why the range of planes acquired varies depending on how the cell is distributed in Z.

      Also, the slice thickness is inadequate to cover the entire microglia morphology. How do the authors include this limitation of their strategy? Did the authors define a cut-off for incomplete microglia?

      We found that 30 µm sections provide an effective balance, capturing entire or nearly entire microglial cells (consistent with what we observe in vivo) while allowing sufficient antibody penetration to ensure strong signal quality, even at the section's center. In our segmentation process, we excluded microglia located near the section edges (i.e., cells with processes visible on the first or last plane of image acquisition, as well as those close to the field of view’s boundary). Although our analysis pipeline should also function with thicker sections (>30 µm), we confirmed that thinner sections (15 µm or less) are inadequate for detecting morphological differences, as tested initially on the AD model. Segmented, incomplete microglia lack the necessary structural information to accurately reflect morphological differences thus impairing the detection of existing morphological differences.

      c) The manuscript outlines that the authors have used different preprocessing pipelines, which is great for being transparent about this process. Yet, it would be relevant to provide a rationale for the different imaging processing and segmentation pipelines and platform usages (Supplementary Figure 7). For example, it is not clear why the Z maximum projection is performed at the end for the Alzheimer's Disease model, while it's done at the beginning of the others.

      The same holds true for cropping, filter values, etc. Would it be possible to analyze the images with the same pipelines and compare whether a specific pipeline should be preferable to others?

      The pre-processing steps depend on the quality of the images in each dataset. For example, in the AD dataset, images acquired with a wide-field microscope were considerably noisier compared to those obtained via confocal microscopy. In this case, reducing noise plane-by-plane was more effective than applying noise reduction on a Z-projection, as we would typically do for confocal images. Given that accurate segmentation is essential for reliable analysis in MorphoCellSorter, we chose to tailor the segmentation approach for each dataset individually. We recommend future users of MorphoCellSorter take a similar approach. This clarification has been added to the discussion.

      On a note, Matlab is not open-access,

      This is correct. We are currently translating this Matlab script into Python; it will be available soon on GitHub: https://github.com/Pascuallab/MorphCellSorter.

      This also includes combining the different animals to see which insights could be gained using the proposed pipelines.

      As explained earlier, a common segmentation process for very diverse types of acquisitions (magnification, resolution and type of images) is not optimal in terms of segmentation and accuracy of the analysis. Although we could feed MorphoCellSorter with all these data from a single segmentation pipeline, the results might be very difficult to interpret.

      d) L227: Performing manual thresholding isn't ideal because it implies the preprocessing could be improved. Additionally, it is important to consider that morphology may vary depending on the thresholding parameters. Comparing different acquisitions that have been binarized using different criteria could introduce biases.

      As noted earlier, segmentation is not the main focus of this paper, and we leave it to users to select the segmentation method best suited to their datasets. Although we acknowledge that automated thresholding would in theory be ideal, we were confronted with image acquisitions that were not uniform, even within the same sample. For instance, in ischemic brain samples, lipofuscin from cell death introduces background noise that can artificially impact threshold levels. We tested global and local algorithms to automatically binarize the cells, but these approaches often resulted in imperfect segmentation that was not optimized for every cell. In our experience, manually adjusting the threshold provides a more accurate, reliable, and comparable selection of cellular elements, even though it introduces some subjectivity. To ensure consistency in segmentation, we recommend that the same person performs the analysis across all conditions. This clarification has been added to the discussion.

      e) Parameter choices: L375: When using k-means clustering, it is good practice to determine the number of clusters (k) using silhouette or elbow scores. Simply selecting a value of k based on its previous usage in the literature is not rigorous, as the optimal number of clusters depends on the specific data structure. If they are seeking a more objective clustering approach, they could also consider employing other unsupervised techniques, (e.g. HDBSCAN) (L403f).

      We agree with the referee's comment, but the purpose of the k-means clustering we used was only to illustrate that the clusters generated are artificial and do not correspond to the reality of the continuum of microglial morphology. In the course of the study we used the elbow score to determine k, but this did not work well because no clear elbow was visible in some datasets (probably because of the continuum of microglial morphologies). In any case, the choice of k does not change the underlying problem: the clusters are quite artificial and their boundaries are quite arbitrary, whether k is determined manually or mathematically.
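
The absence of a clear elbow on continuous data can be illustrated with a toy example. The sketch below is not the authors' analysis: it runs a plain Lloyd's k-means on a hypothetical one-dimensional, evenly spread "continuum" and records the within-cluster sum of squares (inertia) for each k. The `kmeans_1d` helper, the initialization scheme, and the data are all assumptions made for illustration.

```python
def kmeans_1d(data, k, iterations=100):
    """Plain Lloyd's algorithm on 1-D data with evenly spread initial centroids."""
    lo, hi = min(data), max(data)
    centroids = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(min((x - c) ** 2 for c in centroids) for x in data)
    return centroids, inertia


# Hypothetical continuum: 100 evenly spaced values with no group structure.
continuum = [float(i) for i in range(100)]
inertias = [kmeans_1d(continuum, k)[1] for k in range(1, 7)]
```

On such data the inertia keeps shrinking as k grows (roughly as 1/k² for uniformly spread values) without a kink marking a natural number of groups, and every cluster boundary simply cuts the continuum at an arbitrary position.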

      L373: A rationale for the choice of the 20 non-dimensional parameters as well as a detailed explanation of their computation such as the skeleton process ratio is missing. Also, how strongly correlated are those parameters, and how might this correlation bias the data outcomes?

      Thank you for raising this point. There is no specific rationale beyond our goal of being as exhaustive as possible, incorporating most of the parameters found in the literature, as well as some additional ones that we believed could provide a more thorough description of microglial morphology.

      Indeed, some of these parameters are correlated. Initially, we considered this might be problematic, but we quickly found that these correlations essentially act as factors that help assign more weight to certain parameters, reflecting their likely greater importance in a given dataset. Rather than being a limitation, the correlated parameters actually enhance the ranking. We tested removing some of these parameters in earlier versions of MorphoCellSorter, and found that doing so reduced the accuracy of the tool.

      Differences between circularity and roundness factors are not coming across and require further clarification.

      These are two distinct ways of characterizing morphological complexity; we borrowed these parameters and kept their names from the existing literature, not necessarily in the context of microglia. In our case, these parameters are used to describe the overall shape of the cell. The advantage of using different metrics to calculate similar parameters is that, depending on the dataset, one method may be better suited to capture specific morphological features of a given dataset. MorphoCellSorter selects the parameter that best explains the greatest dispersion in the data, allowing for a more accurate characterization of the morphology. Author response image 1 shows how circularity and roundness describe cells differently.

      Author response image 1.

      Correlation between Circularity and Roundness Factor in the Alzheimer's disease dataset. A second-order polynomial correlation exists between the two parameters in our dataset. Indeed, (1) a single maximum is shared between both parameters. However, Circularity and Roundness Factor are not entirely redundant, as exemplified by (2) the possible variety of Roundness Factors for a given Circularity as well as (3) the very different morphology minima of these two parameters.

      One is applied to the soma and the other to the cell, but why is neither circularity nor roundness factor applied to both?

      None of the parameters concerns the cell body by itself. The cell body is always considered relative to one or more other metrics. Because these parameters and what they represent do not seem to be very clear, we have added a graphic representation of the types of measurement they provide in the revised version of the manuscript (Supplemental figure 8).

      f) PCA analysis:

      The authors spend a lot of text to describe the basic principles of PCA. PCA is mathematically well-described and does not require such depth in the description and would be sufficient with references.

      Thank you for this comment. Indeed, the description of PCA may be too exhaustive; we will simplify the text.

      Furthermore, there are the following points that require attention:

      L321: PC1 is the most important part of the data could be an incorrect statement because the highest dispersion could be noise, which would not be the most relevant part of the data. Therefore, the term "important" has to be clarified.

      We are not sure that, in the case of segmented images, noise would represent most of the data, since segmentation also removes most of the noise; but perhaps the reviewer is concerned about another type of noise? Nonetheless, we thank the reviewer for this comment and propose the following change, which should solve this potential issue.

      “PC<sub>1</sub> is the direction in which data is most dispersed.”

      L323: As before, it's not given that the first two components hold all the information.

      Thank you for this comment. We modified this statement as follows: “The first two components represent most of the information (about 70%), hence we can consider the plane PC<sub>1</sub>, PC<sub>2</sub> as the principal plane, reducing the dataset to a two-dimensional space.”

      L327 and L331 contain mistakes in the nomenclature: Mix up of "wi" should be "wn" because "i" does not refer to anything. The same for "phi i = arctan(yn/wn)" should be "phi n".

      Thanks a lot for these comments. We have made the changes in the text as proposed by the reviewer.

      L348: Spearman's correlation measures monotonic correlation, not linear correlation. Either the authors used Pearson Correlation for linearity or Spearman correlation for monotonic. This needs to be clarified to avoid misunderstandings.

      Sorry for the misunderstanding. We did use Spearman's correlation, which measures monotonic association; we have therefore replaced "linear" with "monotonic" in the text. Thanks a lot for the careful reading.
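
The distinction between the two coefficients can be made concrete with a small sketch. It is an illustration on made-up data, not the authors' code: Spearman's coefficient is computed here as Pearson's coefficient applied to ranks, and the helper names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up data: y increases monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
```

For these data Spearman's coefficient equals 1 (up to floating-point rounding) because the ordering is perfectly preserved, whereas Pearson's coefficient is lower because the relationship is curved.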

      g) If the authors find no morphological alteration, how can they ensure that the algorithm is sensitive enough to detect them? When morphologies are similar, it's harder to spot differences. In cases where morphological differences are more apparent, like stroke, classification is more straightforward.

      We are not entirely sure we fully understand the reviewer's comment. When data are similar or nearly identical, MorphoCellSorter performs comparably to human experts (see Table 1). However, the advantage of using MorphoCellSorter is that it ranks cells much faster while achieving accuracy similar to that of human experts, and it gives each cell a value on an axis (Andrews score), which a human expert cannot do. For example, in the case of mouse embryos, MorphoCellSorter’s ranking was as accurate as that made by human experts. Based on this ranking, the distributions were similar, suggesting that the morphologies are generally consistent across samples.

      The algorithm itself does not detect anything—it simply ranks cells according to the provided parameters. Therefore, it is unlikely that sensitivity is an issue; the algorithm ranks the cells based on existing data. The most critical factor in the analysis is the segmentation step, which is not the focus of our paper. However, the more accurate the segmentation, the more distinct the parameters will be if actual differences exist. Thus, sensitivity concerns are more related to the quality of image acquisition or the segmentation process rather than the ranking itself. Once MorphoCellSorter receives the parameters, it ranks the cells accordingly. When cells are very similar, the ranking process becomes more complex, as reflected in the correlation values comparing expert rankings to those from MorphoCellSorter (Table 1).

      Moreover, MorphoCellSorter does not only provide a ranking: the morphological indexes automatically computed offer useful information to compare the cells’ morphology between groups.

      h) Minor aspects:

      % notation requires to include (weight/volume) annotation.

      This has been done in the revised version of the manuscript.

      Citation/source of the different mouse lines should be included in the method sections (e.g. L117).

      The reference of the mouse line has been added (RRID:IMSR_JAX:005582) to the revised version of the manuscript.

      L125: The length of the single housing should be specified to ensure no variability in this context.

      The mice were housed individually for 24 h; this is now stated in the text.

      L673: Typo to the reference to the figure.

      This has been corrected, thank you for your thoughtful reading.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Methods

      (1) Alzheimer's disease model: was a perfusion performed and then an hour later brains extracted? Please clarify.

      This is indeed what has been done.

      (2) For in vitro microglial studies: was a percoll gradient used for the separation of immune cells? What percentage percoll was used? Was there separation of myelin and associated debris with the percoll centrifugation? Please clarify the protocol as it is not completely clear how these cells were separated from the initial brain lysate suspension. What cell density was plated?

      The protocol has been completed as follows: “Myelin and debris were then eliminated using a Percoll® PLUS solution (E0414, Sigma-Aldrich) diluted with DPBS10X (14200075, Gibco) and enriched in MgCl<sub>2</sub> and CaCl<sub>2</sub> (for 50 mL of myelin separation buffer: 90 mL of Percoll PLUS, 10 mL of DPBS10X, 90 μL of 1 M CaCl<sub>2</sub> solution, and 50 μL of 1 M MgCl<sub>2</sub> solution).” Thank you for your feedback.

      (3) How are the microglia "automatically cropped" in FIJI (for the Phox2b mutant)? Is there a function/macro in the program you used? This is very important for the workflow and needs to be clarified. The methods section of this manuscript is a guide for future users of this workflow and should be as descriptive as possible. It would be useful to give detailed information on the manual classification process, perhaps as a supplement. The authors do a nice job pointing out that these older methods are not effective in categorizing microglia that don't necessarily fit into a predefined phenotype.

      The protocol has been completed as follows: “Briefly, the centroid of each detected object (i.e. microglia), except those on the borders, was detected, and a crop of 300x300 pixels around each object was generated. Then, the pixels belonging to neighboring cells were manually removed on each generated crop.”
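
The cropping step described above can be sketched as follows. This is not the authors' FIJI macro: the function names are hypothetical, objects are represented as lists of (row, column) pixel coordinates, and clamping the crop to the image bounds is an assumption about how crops near the image edge might be handled. Only the 300x300 crop size and the exclusion of border objects come from the response.

```python
CROP_SIZE = 300  # crop edge length in pixels, as stated in the response


def centroid(pixels):
    """Mean (row, col) of an object's pixel coordinates."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return sum(rows) / len(rows), sum(cols) / len(cols)


def touches_border(pixels, height, width):
    """True if any pixel of the object lies on the outermost rows or columns."""
    return any(r in (0, height - 1) or c in (0, width - 1) for r, c in pixels)


def crop_box(pixels, height, width, size=CROP_SIZE):
    """Return (top, left, bottom, right) around the object's centroid.

    Objects touching the image border are skipped (None is returned).
    The box is clamped so that it stays inside the image (an assumption).
    """
    if touches_border(pixels, height, width):
        return None
    row_c, col_c = centroid(pixels)
    half = size // 2
    top = min(max(int(round(row_c)) - half, 0), max(height - size, 0))
    left = min(max(int(round(col_c)) - half, 0), max(width - size, 0))
    return top, left, min(top + size, height), min(left + size, width)


# Hypothetical objects in a 1000x1000 image.
inner_cell = [(499, 500), (500, 500), (501, 500)]
border_cell = [(0, 10), (1, 10)]
inner_box = crop_box(inner_cell, 1000, 1000)
border_box = crop_box(border_cell, 1000, 1000)
```

Keeping the returned box alongside each crop would also allow a cell to be traced back to its position in the original image. Removing pixels that belong to neighboring cells remains a manual step, as described in the response.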

      (4) Please address the concern that manual tuning and thresholding are required for this method's accuracy. Is this easily reproducible?

      Yes, it is easily reproducible for a given experimenter and is better suited than automatic thresholding. Although segmentation is not the primary focus of this paper, we leave it to users to choose the segmentation method that best fits their datasets.

      To address your question, we acknowledge that automated thresholding would theoretically be ideal. However, we encountered challenges due to non-uniform image acquisitions, even within the same sample. For instance, in ischemic brain samples, lipofuscin resulting from cell death introduced background noise that could artificially influence threshold levels. We tested both global and local algorithms for automatic binarization of cells, but these approaches often produced suboptimal segmentation results for individual cells.

      Based on our experience, manually adjusting the threshold provided more accurate, reliable, and consistent selection of cellular elements, even though it introduces a degree of subjectivity. To maintain consistency, we recommend that the same individual perform the analysis across all conditions.

      This clarification has been incorporated into the discussion as follows: “Although automated thresholding would be ideal, in our case image acquisitions were not entirely uniform, even within the same sample. For instance, in ischemic brain samples, lipofuscin from cell death introduces background noise that can artificially impact threshold levels. This effect is observed even when comparing contralateral and ipsilateral sides of the same brain. In our experience, manually adjusting the threshold provides a more accurate, reliable, and comparable selection of cellular elements, even though it introduces some subjectivity. To ensure consistency in segmentation, we recommend that the same person performs the analysis across all conditions.”

      (5) How are the authors performing the PCA---what program (e.g., R)? Again, please be explicit about how these mathematical operations were computed. (lines 302-345).

      The PCA was performed in Matlab; the code can be found on GitHub (https://github.com/Pascuallab/MorphCellSorter), as stated in the discussion.

      Other:

      (1) Can the authors comment on the challenges of the in vitro microglial analyses? The correlation of the experts v. MorphoCellSorter is much less than the fixed tissue. This is not addressed in the manuscript.

      In vitro, microglial cells exhibit a narrower range of morphological diversity compared to ex vivo or in vivo conditions. A higher proportion of cells share similar morphologies or morphologies with comparable complexities, which makes establishing a precise ranking more challenging. Consequently, the rank of many cells could be adjusted without significantly affecting the overall quality of the ranking.

      This explains why the rankings tend to show slightly greater divergence between experts. Interestingly, the ranking generated by MorphoCellSorter, which is objective and not subject to human bias, lies roughly midway between the rankings of the two experts.

      (2) You point out that the MorphoCellSorter may not be suited for embryonic/prenatal microglial analysis.

      This must be a misunderstanding because it is not what we concluded; we found that the ranking was correct but that we could not spot any differences due to transgenic alteration.

      The lack of differences observed in the embryonic microglia (Figure 5) is not necessarily surprising, as embryonic microglia have diverse morphological characteristics--- immature microglia do not possess highly ramified processes until postnatal development [see Hirosawa et al. (2005) https://doi.org/10.1002/jnr.20480 -they use an Iba1-GFP transgenic mouse to visualize prenatal microglia]. Also, see Bennett et al. (2016) [https://doi.org/10.1073/pnas.1525528113] which shows mature microglia not appearing until 14 days postnatal.

      We agree with the reviewer on that point. Nonetheless, MorphoCellSorter shows that the population is homogeneous and that the mutation has no effect on morphology.

      (3) Although a semantic issue, Figure 1's categorization of microglia shows predefined groups of microglia do not necessarily usefully bin many cells. Is it still possible to categorize the microglia without using hotly debated categorization methods? The literature review in the current manuscript correctly points out the spectrum phenomenon of microglial activation states, though some of the suggestions from Paolicelli et al. (2022) are not put into action. The use of "activated" only further perpetuates the oversimplified classification of microglia. Perhaps the authors could consider using the term "reactive", as it is recognized by the Microglial nomenclature paper cited above. Are "amoeboid microglia" not "activated microglia"? "Reactive" is a less loaded term and is a recommended descriptor. Amoeboid microglia are commonly understood to be indicative of a highly proinflammatory environment, though you could potentially use "hyper-reactive" to differentiate them from the slightly ramified "reactive" cells.

      We changed "activated microglia" to "reactive microglia" in the text, as requested by the reviewer. Thanks a lot for your comment.

      (4) The graphs in Figures 3 B-D are visually difficult to interpret. The better color contrast between the MorphoCellSorter/Expert and Expert1/Expert2 would be useful--- perhaps a color for Expert 1 and a different color for Expert 2. Is this the ranking from the same data in Figure 1 (lines 420-421)? It is unclear what the x-axis represents in 3B-D. E-G is much more intuitive.

      We believe the confusion stems more from Figure 1 than Figure 3, as both figures use similar representations for entirely different analyses (clustering vs. ranking). To address this, we have provided an updated version of Figure 1 to help clarify this distinction and avoid any potential misinterpretation.

      Regarding Figure 3B-D, we do not fully see the need for changing the colors. These panels are histograms that display the distribution of rank differences either between experts and MorphoCellSorter or between the two experts. Assigning specific colors to the experts or MorphoCellSorter would be challenging, as the histograms represent comparative distributions involving both an expert and MorphoCellSorter or the ranking differences between the two experts.

      The same reasoning applies to Figures 3E-G. In these scatter plots, each point is defined by an ordinate (ranking value for one expert) and an abscissa (ranking value for either the other expert or MorphoCellSorter). Therefore, it would not be straightforward or meaningful to assign distinct colors to these elements within this context.

      (5) Line 217: use the term "imaged" rather than "generated" ... or "images were generated of clusters of microglia located .... using MICROSCOPE and Zen software." You aren't generating microglia, rather, you are generating images.

      Thanks a lot for raising this problem; we changed the sentence as follows: “For the AD model, crops of individual microglial cells located in the secondary visual cortex were extracted from images using the Zen software (v3.5, Zeiss) and exported to the Tif image format.”

      (6) Elaborate on how an "inversion operation" was applied to Lacunarity, roundness factor, convex hull radii ratio, processes cell areas ratio, and skeleton processes. (Lines 299-300) Furthermore, a paragraph separation would be useful if the "inversion operation" is not what is described in the text immediately after this description.

      Indeed, we are sorry for this lack of explanation. Some morphological indexes rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, simplifying data interpretation. This clarification has been added to the revised manuscript as follows:

      “Lacunarity, roundness factor, convex hull radii ratio, processes cell areas ratio and skeleton processes ratio were subjected to an inversion operation in order to homogenize the parameters before conducting the PCA: indeed, some parameters rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, thus simplifying data interpretation.”

      (7) Line 560: "measureclarke" seems to be an error associated with the reference. Please correct.

      Thanks a lot, this has been corrected.

      (8) Discussion: compare MorphoCellSorter to the MIC-MAC program used by Salamanca et al. (2019). They use a similar approach, albeit not Andrew's plot.

      We have added the Salamanca reference.

      Reviewer #2 (Recommendations for the authors):

      While it's not expected that the authors address the significance of the morphology in relation to function here, they could help highlight the issue and produce data that would enhance the paper's significance. Therefore, I recommend a small-scale and straightforward study where the authors couple their analysis with a marker (e.g. Lysotracker or Mitotracker) to produce data that link their morphometric analysis to more functional readouts. Furthermore, I encourage the authors to elaborate on the practical applications of these morphometric tools and the implications of their measurements, as this would provide context for their work, which, as it stands, feels like just another tool.

      We would like to thank the reviewer for their thoughtful comment and suggestion. Indeed, MorphoCellSorter is simply another tool, but one that offers a more convenient and efficient approach, producing a variety of results tailored to specific research needs. We strongly believe that MorphoCellSorter should be used in conjunction with other tools, depending on the specific research question.

      In our view, MorphoCellSorter is particularly well-suited for researchers who need a quick and efficient way to determine whether their treatment, gene invalidation, or other experimental conditions affect microglial morphology. In this context, MorphoCellSorter is fast, user-friendly, and highly effective. However, for those who aim to uncover detailed differences in cell morphology, other tools requiring more time-intensive, full reconstructions of the cells would be more appropriate.

      Providing additional data on the relationship between cellular function and morphology could certainly pave the way for new questions and more robust evidence. For instance, combining single-cell transcriptomics with morphological analysis would be an excellent approach to exploring the relationship between function and morphology. However, this would involve significant time, expense, and effort, and it represents a different line of inquiry altogether.

      While it would be ideal to clearly demonstrate the link between morphology and function, we are concerned that pursuing such a goal would considerably delay the implementation and adoption of our tool, potentially raising additional questions beyond the scope of this study.

      Minor comments:

      (1) Can MorphoCellSorter be adapted for use with other cell types (e.g., astrocytes)?

      Yes, it could. We have carried out some fairly conclusive analyses on astrocytes, but some parameters have to be adapted before release.

      (2) What modifications would be necessary? If it is not applicable, would a name that includes "Microglia" be more descriptive?

      The modifications would be quite minor: it is mainly the parameters being considered that would change, which is why we will keep the MorphoCellSorter name. Thank you for the suggestion!

      (3) A common challenge with such tools is the technical expertise required to use them. Could a user-friendly interface be developed to better fulfill its intended purpose and benefit the community?

      This is a good point, thank you. The answer is yes: we will translate our Matlab code to Python to open it to a wider audience, and we will certainly work on a user-friendly interface!

      (4) Given that this tool relies on imaging, can users trace a cell (or group of cells) back to the original image?

      Yes, it is possible if each crop is annotated with its spatial coordinates during the segmentation step. This is not yet implemented in the current version of the software, and it mainly depends on the way segmentation is performed, which is not the topic of the paper.
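      As a minimal sketch of the idea described above (not part of MorphoCellSorter, which does not yet implement this), each crop could carry the coordinates of its bounding box in the source image, so that any cell can be located again after analysis. All names here (`CropRecord`, `to_source`, `build_index`, the file name) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CropRecord:
    """Hypothetical record linking one cropped cell to its source image."""
    cell_id: int
    source_image: str  # file name of the original image (illustrative)
    x0: int            # left edge of the crop in source-image pixels
    y0: int            # top edge of the crop in source-image pixels
    width: int
    height: int

    def to_source(self, x: int, y: int) -> tuple[int, int]:
        """Map a pixel inside the crop back to source-image coordinates."""
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise ValueError("point lies outside the crop")
        return (self.x0 + x, self.y0 + y)


def build_index(records):
    """Index crops by cell identifier for quick lookup after ranking."""
    return {record.cell_id: record for record in records}


# Illustrative values only; real coordinates would come from segmentation.
records = [
    CropRecord(1, "slice_01.tif", 120, 340, 64, 64),
    CropRecord(2, "slice_01.tif", 410, 95, 64, 64),
]
index = build_index(records)
print(index[1].source_image, index[1].to_source(10, 5))
```

      Storing the offset at segmentation time keeps the downstream morphology analysis unchanged; only the lookup table needs to travel with the crops.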

      (5) Line 36: The "biologically relevant" statement is central and needs to be expanded.

      This is not easy, as the abstract has a word limit. What we mean by this sentence is that, when classifying cells, we use mathematical tools to force them into groups based on metrics that do not necessarily have a biological meaning. We suggest the following modification: “However, this classification may lack biological relevance, as microglial morphologies represent a continuum rather than distinct, separate groups, and do not correspond to mathematically defined, clusters irrelevant of microglial cells function.”

      (6) Line 49-50: Provide reference and elaborate. For example, does this apply during early life?

      We have slightly changed the sentence and added a reference.

      (7) Line 69: Provide reference.

      The reference (Hubert et al., 2021) has been added.

      (8) Lines 78-88: A table summarizing other efforts in morphometric characterization of microglia would be helpful in distinguishing your work from others.

      This has already been done in some review articles; we have therefore added references directing readers to these reviews. Here is the revised version of the sentence: “To date, the literature contains a wide variety of criteria to quantitatively describe microglial morphology, ranging from descriptive measures such as cell body surface area, perimeter, and process length to indices calculating different parameters such as circularity, roundness, branching index, and clustering (Adaikkan et al., 2019; Heindl et al., 2018; Kongsui, Beynon, Johnson, & Walker, 2014; Morrison et al., 2017; Young & Morrison, 2018).”

      (9) Lines 130, 145: Please provide complete genotype information and the sources of the animals used.

      This has been done.

      (10) Materials and Methods:

      (1) Standardize the presentation of products (e.g., using # consistently).

      This has been done.

      (2) Provide versions of software used.

      We have modified the text accordingly.

      (3) Lines 372-373: A table listing the 20 parameters with brief explanations (as partially done in Materials and Methods) would greatly improve readability.

      This is provided in Supplementary Figure 8.

      (4) Since nomenclature is a critical issue in the literature, you used specific definitions (lines 376-383). However, please indicate (with a reference) why you use the term "activated," as it implies that the others are non-activated. Alternatively, define "activated" cluster differently.

      We changed “activated microglia” to “reactive microglia”, as requested by Reviewer #1.

      (4) Figure 1: In my opinion, placing this figure as the first main figure is problematic, as it confuses the message of the paper. Since the authors introduce a new approach for morphological characterization in Figure 2, I recommend that, for the sake of readability and clarity, the latter be the first main figure, while Figure 1 can move to the supplements.

      We agree with the reviewer and have therefore changed Figure 1, as explained earlier in our response to Reviewer #1. Nonetheless, because it is an important step in our reasoning, we believe it can stay as a main figure. We hope the changes made to Figure 1 clarify the message of the paper.

      (5) Figure 1: Please indicate on the figure the marker for the analysis.

      Figure 2 has been changed

      (6) No funding agencies are communicated.

      This has been corrected

    1. eLife Assessment

      This manuscript represents a fundamental contribution demonstrating that fentanyl-induced respiratory depression can be reversed with a peripherally-restricted mu opioid receptor antagonist. The paper reports compelling and rigorous physiological, pharmacokinetic, and behavioral evidence supporting this major claim, and furthers mechanistic understanding of how peripheral opioid receptors contribute to respiratory depression. These findings reshape our understanding of opioid-related effects on respiration and have significant therapeutic implications given that medications currently used to reverse opioid overdose (such as naloxone) produce severe aversive and withdrawal effects via actions within the central nervous system.

    2. Reviewer #1 (Public review):

      Summary:

      This paper shows that the synthetic opioid fentanyl induces respiratory depression in rodents. This effect is reversed by the opioid receptor antagonist naloxone, as expected. Unexpectedly, the peripherally restricted opioid receptor antagonist naloxone methiodide also blocks fentanyl-induced respiratory depression.

      Strengths:

      The paper reports compelling physiology data supporting the induction of respiratory distress in fentanyl-treated animals. Evidence suggesting that naloxone methiodide reverses this respiratory depression is compelling. This is further supported by pharmacokinetic data suggesting that naloxone methiodide does not penetrate into the brain, nor is it metabolized into brain-penetrant naloxone.

      Weaknesses:

      The paper would be further strengthened by establishing the functional significance of the altered neural activity detected in the nTS (as measured by cFos and GCaMP/photometry) in the context of opioid-induced respiratory depression.

    3. Reviewer #2 (Public review):

      Summary:

      In this article, Ruyle and colleagues assessed the contribution of central and peripheral mu opioid receptors in mediating fentanyl-induced respiratory depression using both naloxone and naloxone methiodide, which does not cross the blood-brain barrier. Both compounds prevented and reversed fentanyl-induced respiratory depression to a comparable degree. The advantage of peripheral treatments is that they circumvent the withdrawal-like effects of naloxone. Moreover, neurons located in the nucleus of the solitary tract are no longer activated by fentanyl when naloxone methiodide is administered, suggesting that these responses are mediated by peripheral mu opioid receptors. The results delineate a role for peripheral mu opioid receptors in fentanyl-derived respiratory depression and identify a potentially advantageous approach to treating overdoses without inflicting withdrawal on the patients.

      Strengths:

      The strengths of the article include the intravenous delivery of all compounds, which increases the translational value of the article. The authors address both prevention and reversal of fentanyl-derived respiratory depression. The experimental design and data interpretation are rigorous and appropriate controls were used in the study. Multiple doses were screened in the study and the approaches were multipronged. The authors demonstrated activation of NTS cells using multiple techniques and the study links peripheral activation of mu opioid receptors to central activation of NTS cells. Both males and females were used in the experiments. The authors demonstrate the peripheral restriction of naloxone methiodide.

      Weaknesses:

      Naloxone is already broadly used to prevent overdoses from opioids so in some respects, the effects reported here are somewhat incremental.

      Comments on the latest version:

      I think the authors have adequately addressed previous critiques and I don't have any additional comments.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript outlines a series of very exciting and game-changing experiments examining the role of peripheral MORs in OIRD. The authors outline experiments that demonstrate a peripherally restricted MOR antagonist (NLX Methiodide) can rescue fentanyl-induced respiratory depression and this effect coincides with a lack of conditioned place aversion. This approach would be a massive boon to the OUD community, as there are a multitude of clinical reports showing that naloxone rescue post fentanyl over-intoxication is more aversive than the potential loss-of-life to the individuals involved. This important study reframes our understanding of successful overdose rescue with a potential for reduced aversive withdrawal effects.

      Strengths:

      Strengths include the plethora of approaches arriving at the same general conclusion, the inclusion of both sexes, and the result that a peripheral approach for OIRD rescue may side-step severe negative withdrawal symptoms of traditional NLX rescue.

      Weaknesses:

      All weaknesses were addressed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This manuscript represents a fundamental contribution demonstrating that fentanyl-induced respiratory depression can be reversed with a peripherally-restricted mu opioid receptor antagonist. The paper reports compelling and rigorous physiological, pharmacokinetic, and behavioral evidence supporting this major claim, and furthers mechanistic understanding of how peripheral opioid receptors contribute to respiratory depression. These findings reshape our understanding of opioid-related effects on respiration and have significant therapeutic implications given that medications currently used to reverse opioid overdose (such as naloxone) produce severe aversive and withdrawal effects via actions within the central nervous system.

      We thank the reviewers for their insightful comments and critiques, which we have incorporated into the manuscript. We believe these revisions have significantly improved the manuscript. Additionally, following discussions among the authors, we have revised the color scheme across all figures. For example, the color of the symbols in Figure 1B-D now matches the bars in Figure 1E-J, rather than the symbols. We feel that this change improves the clarity and visual consistency of the figures, making it easier to interpret the data across figures.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper shows that the synthetic opioid fentanyl induces respiratory depression in rodents. This effect is reversed by the opioid receptor antagonist naloxone, as expected. Unexpectedly, the peripherally restricted opioid receptor antagonist naloxone methiodide also blocks fentanyl-induced respiratory depression.

      Strengths:

      The paper reports compelling physiology data supporting the induction of respiratory distress in fentanyl-treated animals. Evidence suggesting that naloxone methiodide reverses this respiratory depression is compelling. This is further supported by pharmacokinetic data suggesting that naloxone methiodide does not penetrate into the brain, nor is it metabolized into brain-penetrant naloxone.

      Weaknesses:

      A weakness of the study is the fact that the functional significance of opioid-induced changes in neural activity in the nTS (as measured by cFos and GCaMP/photometry) is not established. Does the nTS regulate fentanyl-induced respiratory depression, and are changes in nTS activity induced by naloxone and naloxone methiodide relevant to their ability to reverse respiratory depression?

      Reviewer #2 (Public review):

      Summary:

      In this article, Ruyle and colleagues assessed the contribution of central and peripheral mu opioid receptors in mediating fentanyl-induced respiratory depression using both naloxone and naloxone methiodide, which does not cross the blood-brain barrier. Both compounds prevented and reversed fentanyl-induced respiratory depression to a comparable degree. The advantage of peripheral treatments is that they circumvent the withdrawal-like effects of naloxone. Moreover, neurons located in the nucleus of the solitary tract are no longer activated by fentanyl when naloxone methiodide is administered, suggesting that these responses are mediated by peripheral mu opioid receptors. The results delineate a role for peripheral mu opioid receptors in fentanyl-derived respiratory depression and identify a potentially advantageous approach to treating overdoses without inflicting withdrawal on the patients.

      Strengths:

      The strengths of the article include the intravenous delivery of all compounds, which increases the translational value of the article. The authors address both the prevention and reversal of fentanyl-derived respiratory depression. The experimental design and data interpretation are rigorous and appropriate controls were used in the study. Multiple doses were screened in the study and the approaches were multipronged. The authors demonstrated the activation of NTS cells using multiple techniques and the study links peripheral activation of mu opioid receptors to central activation of NTS cells. Both males and females were used in the experiments. The authors demonstrate the peripheral restriction of naloxone methiodide.

      Weaknesses:

      Naloxone is already broadly used to prevent overdoses from opioids so in some respects, the effects reported here are somewhat incremental.

      The reviewer is correct that naloxone is the standard antidote for reversing opioid-induced respiratory depression. However, its limitations, including the risk of precipitated withdrawal, are well-documented in both preclinical and clinical studies. The likelihood of withdrawal increases when multiple doses of naloxone are administered. Since naloxone-induced withdrawal is centrally mediated, this study aimed to evaluate a peripherally restricted MOR antagonist for its ability to prevent or reverse fentanyl-induced respiratory depression. A key finding is that NLXM reversed OIRD without inducing aversive behavior. This suggests that peripheral antagonists like NLXM may be integrated into intervention strategies that save lives while preventing the adverse behavioral and physiological effects that are observed after treatment with naloxone.

      Reviewer #3 (Public review):

      Summary:

      This manuscript outlines a series of very exciting and game-changing experiments examining the role of peripheral MORs in OIRD. The authors outline experiments that demonstrate a peripherally restricted MOR antagonist (NLX Methiodide) can rescue fentanyl-induced respiratory depression and this effect coincides with a lack of conditioned place aversion. This approach would be a massive boon to the OUD community, as there are a multitude of clinical reports showing that naloxone rescue post fentanyl over-intoxication is more aversive than the potential loss-of-life to the individuals involved. This important study reframes our understanding of successful overdose rescue with potential for reduced aversive withdrawal effects.

      Strengths:

      Strengths include the plethora of approaches arriving at the same general conclusion, the inclusion of both sexes and the result that a peripheral approach for OIRD rescue may side-step severe negative withdrawal symptoms of traditional NLX rescue.

      Weaknesses:

      The major weakness of this version relates to how the data analysis assessed sex-specific contributors to the results.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Some points for the authors to consider are:

      (1) In the Abstract, it is unclear why "high potency and lipophilicity" contribute to opioid-induced respiratory depression.

      The higher potency of fentanyl compared to other opioids significantly increases the risk of overdose and subsequent respiratory depression. Its high lipophilicity facilitates rapid absorption and central nervous system penetration, which contributes to the rapid onset of cardiorespiratory depression. The narrow therapeutic window of fentanyl further emphasizes the critical need for timely intervention when an overdose has occurred, and for effective antagonists to reverse respiratory depression and save lives. We have revised the abstract to clarify these points.

      (2) Are the doses of fentanyl used in the study (2, 20, or 50 µg/kg IV) relevant to those achieved by fentanyl-exposed human drug users?

      In these studies, we intravenously administered three doses of fentanyl. The human equivalent doses (HED) of 20 µg/kg and 50 µg/kg fentanyl are ~3 µg/kg and ~8 µg/kg, respectively. These doses have previously been shown to induce respiratory depression in humans (Dahan et al., 2005).
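      The response does not state how the HED values were derived. The sketch below assumes the standard body-surface-area conversion from FDA guidance (Km factors of 6 for rat and 37 for an adult human), which reproduces the quoted ~3 and ~8 µg/kg figures. The function name and dictionary are illustrative only.

```python
# Assumed method: body-surface-area scaling with FDA guidance Km factors.
# The source gives only the resulting HED values, not the calculation.
KM_FACTORS = {"rat": 6.0, "human": 37.0}


def human_equivalent_dose(animal_dose_ug_per_kg: float, species: str = "rat") -> float:
    """Convert an animal dose (µg/kg) to a human equivalent dose (µg/kg)."""
    return animal_dose_ug_per_kg * KM_FACTORS[species] / KM_FACTORS["human"]


for dose in (2, 20, 50):
    hed = human_equivalent_dose(dose)
    print(f"{dose} µg/kg (rat) is roughly {hed:.1f} µg/kg (human)")
```

      Under this assumption, dividing the rat dose by about 6.2 gives the human equivalent, so the sub-threshold 2 µg/kg dose corresponds to roughly 0.3 µg/kg.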

      (3) In Figure 1, it appeared that only a small fraction of tyrosine hydroxylase-positive (TH+) neurons expressed cFos in response to fentanyl, and the degree of cFos expression was largely similar across all fentanyl doses tested. Thus, it is unclear whether TH+ neurons play a role in fentanyl-induced respiratory depression, and the value of these data is unclear (see point #6 below also).

      As shown in the mean data, the lowest dose of fentanyl, which was below the threshold for inducing OIRD, activated approximately 50% of tyrosine hydroxylase-positive (TH+) nTS neurons. In contrast, the highest dose of fentanyl resulted in a statistically significant increase, with ~75% of TH+ cells co-expressing Fos-IR.

      We included the assessment of catecholaminergic nTS cells for several reasons. The regions of the nTS evaluated in this study contain high expression of MOR and are the termination points of sensory afferent fibers transmitting cardiorespiratory information to the nTS (Aicher et al., 2000; Furdui et al., 2024). Catecholaminergic cells receive direct excitatory inputs from visceral afferents (Appleyard et al., 2007) and exhibit intensity-dependent increases in Fos-IR in rats exposed to hypoxic air (Kline et al., 2010; King et al., 2012). These neurons are essential for generating appropriate cardiorespiratory responses to hypoxic challenges (Bathina et al., 2013; King et al., 2015). As the reviewer notes, rats exposed to fentanyl exhibit a high degree of Fos-IR in the nTS, including catecholaminergic neurons. Despite the robust fentanyl-induced activation (increased Fos-IR) of nTS neurons, there appears to be a failure to initiate appropriate chemoreflex-mediated cardiorespiratory responses. Our photometry data further indicate that fentanyl-induced changes in neuronal activity are mediated, in part, by peripheral MOR. Collectively, these findings suggest that fentanyl impacts nTS activity through alterations in peripheral afferent signaling to the nTS, which may contribute to the severity and duration of OIRD.

      (4) It would help with the flow of the paper if the pharmacokinetic data shown in Figure 6 were presented earlier (as part of Figure 2).

      We have moved the biodistribution data earlier in the manuscript, now presenting it as Figure 2. The numbering of all subsequent figures has been adjusted accordingly.

      (5) In Figure 5, there appears to be a large number of GCaMP-expressing neurons located outside the nTS. To what degree can the changes in calcium signaling, attributed to alterations in neural activity in the nTS, be explained by altered activity of neurons located outside the nTS?

      The reviewer is correct that our viral spread extends beyond the boundaries of the nTS, raising the possibility that the responses observed in Figure 5 may be influenced by neural activity of cells outside the nTS. While some viral spread beyond the target region is unavoidable, calcium transients were measured at the tip of the fiber, which was positioned directly within the nTS.

      To address this concern further, we performed Fos immunohistochemistry in a subset of animals that received bilateral GCaMP virus injections into the nTS. Following fentanyl administration (50 µg/kg IV), brains were collected two hours later. As shown in the accompanying image, we observed Fos-IR co-expression with GCaMP exclusively within the nTS boundaries. No Fos-IR was detected outside the nTS, including in GCaMP cells. Taken together, these findings support our conclusion that the data depicted in our photometry figure (now Figure 6) accurately represent fentanyl-induced activity changes in nTS neurons.

      Author response image 1.

      Arrowheads: Fos-negative GCaMP cell; Arrows: Co-labeled Fos/GCaMP cell; Asterisk: Fos+ GCaMP-negative cell

      (6) Currently, the cFos and photometry data are descriptive in nature. Are opioid-induced changes in nTS neural activity relevant to respiratory depression? If so, one might expect DREADD-mediated stimulation of the nTS neural activity (or stimulating nTS activity by some other means) would reverse fentanyl-induced respiratory depression similar to naloxone and methyl-naloxone.

      The reviewer raises an interesting point regarding the relevance of the nTS in the context of OIRD. The nTS is a major site of integration of sensory afferent information and is involved in the initiation of reflex responses that facilitate a return to homeostasis. As described above, we characterized the collective response of nTS neurons to intravenous fentanyl using both Fos immunohistochemistry and fiber photometry. Our data indicate that fentanyl-induced changes in nTS activity are strongly mediated by peripheral MOR. While the suggestion to use global chemogenetic activation of nTS neurons to reverse fentanyl-induced respiratory depression is intriguing, results from these experiments may be difficult to interpret due to the extensive heterogeneity of the nTS. However, we are currently conducting similar experiments using a more selective approach that will allow us to isolate and evaluate specific nTS phenotypes to better understand their contributions to OIRD.

      (7) Are peripherally restricted mu opioid receptor (MOR) agonists available? If so, it would strengthen the paper if such compounds could be used to show that stimulation of peripheral MORs is sufficient to induce respiratory distress independent of actions on centrally located MORs.

      Peripherally acting Mu Opioid Receptor Antagonists (PAMORAs) are indeed available and currently being evaluated in our laboratory.

      Reviewer #2 (Recommendations for the authors):

      Consider having the figures/data numbered in the order that they appear in the manuscript. Right now, Figure 6 is mentioned between Figures 1 and 2 (minor).

      Thank you for this suggestion. We have reordered the figures so that the biodistribution figure appears before the MOR antagonist pretreatment and reversal figures.

      Reviewer #3 (Recommendations for the authors):

      This manuscript outlines a series of very exciting and game-changing experiments examining the role of peripheral MORs in OIRD. The authors outline experiments that demonstrate a peripherally restricted MOR antagonist (NLX Methiodide) can rescue fentanyl-induced respiratory depression and this effect coincides with a lack of conditioned place aversion. This approach would be a massive boon to the OUD community, as there are a multitude of clinical reports showing that naloxone rescue post fentanyl over-intoxication is more aversive than the potential loss-of-life to the individuals involved. This important study reframes our understanding of successful overdose rescue with potential for reduced aversive withdrawal effects.

      While this is an exciting and important study, there are a few minor to moderate critiques for the authors to consider. These are below.

      (1) Title: "devoid of aversive effects" - While CPA is a good, cumulative indicator of potential aversive effects, it is not an exhaustive one. Since no other withdrawal measures were included, this is an overstatement.

      The reviewer is correct in noting that our analysis of aversive effects is not exhaustive. Since we only assessed changes in aversive behavior between NLX and NLXM, we believe it is more accurate to modify the title accordingly. We have changed the title from “devoid of aversive effects” to “devoid of aversive behavior” to better reflect the scope of the experiments conducted.

      (2) Page 3, top line: MOR (mu opioid receptor) is highly expressed...

      An article should likely be included prior to MOR or make plural and adjust the sentence.

      Thank you for this suggestion. We have reworked this section in the manuscript.

      (3) Figure 6D: this figure is very important for the interpretation of every single figure. It should either be moved to figure 1 or 2 or combined with figure 1 or 2.

      Thank you for this suggestion. The biodistribution figure has been moved to Figure 2.

      (4) Page 5, line 164, Figure 21-D: remove the 1.

      Done.

      (5) Sex differences (or lack thereof):

      Throughout the manuscript, the authors report a lack of sex differences. However, while the data is not powered for the distinction of sex differences, there appears to be a bi-modal distribution of the individual data points that likely correspond to sex across most experiments. For example, in Figure 2E there are both color and clear dots, which this reviewer assumes indicates sex (however, this wasn't easily apparent if it was commented on at all in the paper). If you look at the saline oxygen saturation (nadir) levels (2e), there is wide variability with the red-filled circles, but not the clear ones. This may indicate a bimodal distribution (and may be related to the baseline HR sex differences highlighted). This is also the case in Figure 2L but is perhaps more obvious in the CPA score data (Figure 4d), where it seems the nlx negative CPA effects were likely driven primarily by one sex. While this reviewer does not expect a full powering of experiments for sex differences (and also is very appreciative of the inclusion of both sexes), full raw data with sex indicated included in the supplemental data would greatly aid the field in general and allow for those with a specific interest in this area to build upon this data. Additionally, further discussion regarding the potential role of sex differences in the translational value of these findings is also warranted.

      For all bar graphs, open symbols represent females and filled symbols represent males. This information can be found in the first paragraph of the Materials and Methods section. We have also added this information to each figure for increased visibility. We appreciate the acknowledgement of our inclusion of both sexes. For all experiments, we attempted to balance by sex. Unfortunately, we occasionally had to exclude animals for technical reasons (with clogged catheters being the most common reason for exclusion). This sometimes led to an imbalance in sex in some groups, as the reviewer has noted. In the graph of oxygen saturation nadir values in Fig 2E (now Fig 3E in the revised manuscript), all animals received intravenous fentanyl at a dose of 20 µg/kg. The reviewer is correct that there is greater variability in the males (filled symbols) compared to the females (open symbols) in this graph. However, this variability in the distribution was not observed in Fig 1E or Fig 4E, in which male and female rats received an identical dose of 20 µg/kg. Taking this into account, our overall interpretation of the data is that there are relatively minor sex differences in the responses observed after intravenous fentanyl, and the variability in Fig 3E is primarily due to a lower n compared to Fig 1E.

      All raw data will be uploaded to a data repository.

      (6) Page 7, line 209: Figure 5D should be Figure 6D.

      We have incorporated this change.

      (7) Page 8, line 267: Cure should be Curve.

      We have incorporated this change.

      (8) Discussion: Page 10, line 322 states that "no detectable NLX ... was found in brain tissue". This is incorrect based on Figure 6.

      The sentence the reviewer highlighted refers to detection of NLX or NLXM in brain tissue from animals that received intravenous NLXM. As demonstrated in the biodistribution figure (now Figure 2 in the manuscript), our data demonstrate that an intravenous injection of NLXM did not result in NLX formation in the brain. We have reworked the sentence for clarity.

      (9) jGCaMP injections: Figure 5B/C shows the distribution of the GCaMP across animals. The optic fiber is placed directly over the NTS. However, how are we certain there isn't a nearby nucleus/structure outside the NTS that is contributing to the photometry data presented in D-G?

      See our response above to Reviewer #1, comment (5).

      (10) Fiber Photometry and Sex: These studies unfortunately may have had only 1 of a sex included in the fiber photometry data. While the inclusion is overall good, the single value for a sex suggests that there are differences, given the clustering of the data. While the anesthesia may be driving this potential sex effect, it is not clear based on the data presented. For reference: https://link.springer.com/article/10.1007/s12975-012-0229-y

      The reviewer is correct that there was an imbalance of sex in this dataset. While we made every attempt to balance for sex across all experiments, we unfortunately had to exclude some animals for technical reasons (clogged catheter, missed injection site, etc). This produced an imbalance in our photometry studies and did not allow us to thoroughly evaluate sex differences in fentanyl-induced changes in neural activity or in the responses to anesthesia. We have expanded on this limitation in the discussion.

      (11) Figure 5 - the bars are not the color indicated by the legend.

      We have corrected this in the figure. Thank you.

    1. eLife Assessment

      In this revised work, Barzó et al. assessed the electrophysiological and anatomical properties of a large number of layer 2/3 pyramidal neurons in brain slices of human neocortex across a wide range of ages, from infancy to elderly individuals, using whole-cell patch clamp recordings and anatomical reconstructions. This large data set represents an important contribution to our understanding of how these properties change across the human lifespan, supported by convincing data and analyses. The authors have addressed the concerns raised in previous reviews. Overall, this study strengthens our understanding of how the neural properties of human cortical neurons change with age and will contribute to building more realistic models of human cortical function.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript co-authored by Pál Barzó et al is very clear and very well written, demonstrating the electrophysiological and morphological properties of the human cortical layer 2/3 pyramidal cells across a wide age range, from 1 month to 85 years of age, using whole-cell patch clamp. To my knowledge, this is the first study that looks at cross-age differences in the biophysical and morphological properties of human cortical pyramidal cells. The community will also appreciate the significant effort involved in recording data from 485 cells, given the challenges associated with collecting data from human tissue. Understanding the electrophysiological properties of individual cells, which are essential for brain function, is crucial for comprehending human cortical circuits. I think this research enhances our knowledge of how biophysical properties change over time in the human cortex. I also think that by building models of human single cells at different ages using these data, we can develop more accurate representations of brain function. This, in turn, provides valuable insights into human cortical circuits and function and helps in predicting changes in biophysical properties in both health and disease.

      Strengths:

      The strength of this work lies in demonstrating how the electrophysiological and morphological features of human cortical layer 2/3 pyramidal cells change with age, offering crucial insights into brain function throughout life.

      Comments on revisions:

      Thanks to the authors for addressing my comments and providing greater clarity in the methodology. The analysis is much clearer now. I also appreciate their additional data analysis, particularly on morphology, which strengthens the paper.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Barzo and colleagues aim to establish an appraisal for the development of basal electrophysiology of human layer 2/3 pyramidal cells across life and compare their morphological features at the same ages.

      Strengths:

      The authors have generated recordings from an impressive array of patient samples, allowing them to directly compare the same electrophysiological features as a function of age and other biological features. These data are extremely robust and well organised.

      The authors group patient ages into developmentally organised bins, which are elaborated on in supplementary analysis, exemplifying the importance of determining the effect of early postnatal development on human neuron function.

      Weaknesses:

      The authors' use of (perhaps) arbitrary categorisation of spine morphology could limit the full usefulness of these data.

      Overall, the authors achieve their aims by assessing the physiological and morphological properties of human L2/3 pyramidal neurons across life. Their findings have extremely important ramifications for our understanding of human brain development and implications for how different neuronal properties may influence life and disease associated with neurological conditions.

      Comments on revisions:

      Overall, the authors have satisfied my concerns. I fully appreciate their candour with their data and the potential limitations. I especially appreciate their supplementary data inclusions, which I believe truly strengthen their conclusions and are a valuable resource for the field.

      I agree whole-heartedly with the authors' assertion that using the most sophisticated equipment is not always the most appropriate choice. However, statistical rigour should still be standard. As such, my one remaining concern relates to inappropriate replicate choice for the spine morphology data in Figure 6. I commend the authors' inclusion of additional reconstructions and morphology data from further cells in this data set. However, to the best of my interpretation, these still represent data from 3 cells and 1 patient per age. I feel it would be more helpful to plot cell averages +/- SD for each cell, even if side-by-side with data from all spines. Likewise, it is unclear what statistical test was performed on these data and whether it took into account the fact that a) these values are from 3 technical replicates per group, and b) many of the data sets consist of many zero values (would a categorical test be more appropriate?).

    4. Reviewer #3 (Public review):

      Summary:

      To understand the specificity of age-dependent changes in the human neocortex, this paper investigated the electrophysiological and morphological characteristics of pyramidal cells in a wide age range from infants to the elderly.

      The results show that some electrophysiological characteristics change with age, particularly in early childhood. In contrast, the larger morphological structures, such as the spatial extent and branching frequency of dendrites, remained largely stable from infancy to old age. On the other hand, the shape of dendritic spines is considered immature in infancy, i.e., the proportion of mushroom-shaped spines increases with age.

      Strengths:

      Whole-cell recordings and intracellular staining of pyramidal cells in defined areas of the human neocortex allowed the authors to compare quantitative parameters of electrophysiological and morphological properties between finely divided age groups.

      They succeeded in finding symmetrical changes specific to both infants and the elderly, and asymmetrical changes specific to either infants or the elderly. The similarity of pyramidal cell characteristics between areas is unexpected.

      Weaknesses:

      Human L2/3 pyramidal cells are thought to be heterogeneous, as L2/3 has expanded to a high degree during the evolution from rodents to humans. However, the diversity (subtyping) is not revealed in this paper.

      Comments on revisions:

      I believe that the current version has been sufficiently revised based on my comments.

    1. eLife assessment:

      This study describes a new set of genetic tools for optimized Cre-mediated gene deletion in mice. The advances are substantial and will facilitate biomedical research. Although the tools have been validated using solid methodologies, the quantitative assessment of their recombination efficiency is not yet sufficiently described. Evaluating their ability to mediate the deletion of multiple alleles in a mosaic setting would also be a highly valuable addition.

    2. Reviewer #1 (Public Review):

      Summary:

      Shi and colleagues report the use of modified Cre lines in which the coding region of Cre is disrupted by rox-STOP-rox or lox-STOP-lox sequences to prevent the expression of functional protein in the absence of Dre or Cre activity, respectively. The main purpose of these tools is to enable intersectional or tamoxifen-induced Cre activity with minimal or no leaky activity from the second, Cre-expressing allele. It is a nice study but lacks some functional data required to determine how useful these alleles will be in practice, especially in comparison with the figure line that stimulated their creation.

      Strengths:

      The new tools can reduce Cre leak in vivo.

      Weaknesses:

      (1) Activity of R26-loxCre line. As the authors point out, the greatest value of this approach is to accomplish a more complete Cre-mediated gene deletion using CreER transgenes that are combined with low-efficiency floxed alleles using their R26-loxCre line that is similar to the iSure Cre reported by Benedito and colleagues. The data in Figure 5 show strong activity at the Confetti locus, but the design of the newly reported R26-loxCre line lacks a WPRE sequence that was included in the iSure-Cre line to drive very robust protein expression. Thus while the line appears to have minimal leak, as the design would predict, the question of how much of a deletion increase is obtained over simple use of the CreER transgene alone is a key question for use by investigators. This is further addressed in Figure 6 where it is compared with Alb-CreER alone to recombine the Ctnnb1 floxed allele. They demonstrate that recombination frequency is clearly improved, but the western blot in Figure 6E does not look like there was a large amount of remaining b-catenin to remove. These data are certainly promising, but the most valuable experiment for such a new tool would be a head-to-head comparison with iSure (or the latest iSure version from the Benedito lab) using the same CreER and target floxed allele. At the very least a comparison of Cre protein expression between the two lines using identical CreER activators is needed.

      (2) In vivo analysis of mCre activities. Why did the authors not use the same driver to compare mCre 1, 4, 7, and 10? The study in Figure 2 uses Alb-roxCre for 1 and 7 and Cdh5-roxCre for 4 and 10, with clearly different levels of activity driven by the two alleles in vivo. Thus whether mCre1 is really better than mCre4 or 10 is not clear.

      (3) Technical details are lacking. The authors provide little specific information regarding the precise way that the new alleles were generated, i.e. exactly what nucleotide sites were used and what the sequence of the introduced transgenes is. Such valuable information must be gleaned from schematic diagrams that are insufficient to fully explain the approach.

    3. Reviewer #2 (Public Review):

      Summary:

      This work presents new genetic tools for enhanced Cre-mediated gene deletion and genetic lineage tracing. The authors optimise and generate mouse models that convert temporally controlled CreER or DreER activity to constitutive Cre expression, coupled with the expression of tdT reporter for the visualizing and tracing of gene-deleted cells. This was achieved by inserting a stop cassette into the coding region of Cre, splitting it into N- and C-terminal segments. Removal of the stop cassette by Cre-lox or Dre-rox recombination results in the generation of modified Cre that is shown to exhibit similar activity to native Cre. The authors further demonstrate efficient gene knockout in cells marked by the reporter using these tools, including intersectional genetic targeting of pericentral hepatocytes.

      Strengths:

      The new models offer several important advantages. They enable tightly controlled and highly effective genetic deletion of even alleles that are difficult to recombine. By coupling Cre expression to reporter expression, these models reliably report Cre-expressing i.e. gene-targeted cells, and circumvent false positives that can complicate analyses in genetic mutants relying on separate reporter alleles. Moreover, the combinatorial use of Dre/Cre permits intersectional genetic targeting, allowing for more precise fate mapping.

      Weaknesses:

      The scenario where the lines would demonstrate their full potential compared to existing models has not been tested. Mosaic genetics is increasingly recognized as a key methodology for assessing cell-autonomous gene functions. The challenge lies in performing such experiments, as low doses of tamoxifen needed for inducing mosaic gene deletion may not be sufficient to efficiently recombine multiple alleles in individual cells while at the same time accurately reporting gene deletion. Therefore, a demonstration of the efficient deletion of multiple floxed alleles in a mosaic fashion would be a valuable addition.

      In addition, a drawback of this line is the constitutive expression of Cre. When combined with the confetti line, the reporter cassette will continue flipping, potentially leading to misleading lineage tracing results. Constitutive expression of Cre is also associated with toxicity, as discussed by the authors in the introduction. These drawbacks should be acknowledged.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors report a new version of the iSuRe-Cre approach, which was originally developed by Rui Benedito's group in Spain (https://doi.org/10.1038/s41467-019-10239-4). Shi et al claim that their approach shows reduced leakiness compared to the iSuRe-Cre line. Shi et al elaborate strongly about the leakiness of iSuRe-Cre mice, although leakiness is rather minor according to the original publication and the senior author of the study wrote in a review a few years ago that there is no leakiness (https://doi.org/10.1016/j.jbc.2021.100509). Furthermore, a new R26-roxCre-tdT mouse line was established after extensive testing, which enables efficient expression of the Cre recombinase after activation of the Dre recombinase.

      Strengths:

      The authors carefully evaluated the efficiency and leakiness of the new strains and demonstrated the applicability by marking peri-central hepatocytes in an intersectional genetics approach, amongst others. I can only find very few weaknesses in the paper, which represents the result of an enormous effort. Carefully conducted technical studies have considerable value. However, I would have preferred to see a study, which uses the wonderful new tools to address a major biological question, rather than a primarily technical report, which describes the ongoing efforts to further improve Cre and Dre recombinase-mediated recombination.

      Weaknesses:

      Very high levels of Cre expression may cause toxic effects as previously reported for the hearts of Myh6-Cre mice. Thus, it seems sensible to test for unspecific toxic effects, which may be done by bulk RNA-seq analysis, cell viability, and cell proliferation assays. It should also be analyzed whether the combination of R26-roxCre-tdT with the Tnni3-Dre allele causes cardiac dysfunction, although such dysfunctions should be apparent from potential changes in gene expression.

      The R26-GFP or R26-tdT reporters, Alb-roxCre1-tdT, Cdh5-roxCre4-tdT, Alb-roxCre7-GFP, and Cdh5-roxCre10-GFP demonstrate no leakiness without Dre-rox recombination (Figure S1-S2). Is there any leakiness when the inducible DreER allele is introduced but no tamoxifen treatment is applied? This should be documented. The same also applies to loxCre mice.

      The enhanced efficiency of loxCre and roxCre systems holds promise for reducing the necessary tamoxifen dosage, potentially reducing toxicity and side effects. In Figure 6, the author demonstrates an enhanced recombination efficiency of loxCre mice, which makes it possible to achieve efficient deletion of Ctnnb1 with a single dose of tamoxifen, whereas a conventional driver (Alb-CreER) requires five dosages. It would be very helpful to include a dose-response curve for determining the minimum dosage required in Alb-CreER; R26-loxCre-tdT; Ctnnb1flox/flox mice for efficient recombination.

      In the liver panel of Figure 4F, tdT signals do not seem to colocalize with the VE-cad signals, which is odd. Is there any compelling explanation?

      The authors claim that "virtually all tdT+ endothelial cells simultaneously expressed YFP/mCFP" (right panel of Figure 5D). Well, it seems that the abundance of tdT is much lower compared to YFP/mCFP. If the recombination of R26-Confetti was mainly triggered by R26-loxCre-tdT, the expression of tdT and YFP/mCFP should be comparable. This should be clarified.

      In several cases, the authors seem to have mixed up "R26-roxCre-tdT" with "R26-loxCre-tdT". There are errors in #251 and #256, and furthermore in the passage from line #278 to #301. In lines #297 and #300 it should probably read "Alb-CreER; R26-loxCre-tdT;Ctnnb1flox/flox" rather than "Alb-CreER;R26-tdT2;Ctnnb1flox/flox".

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) It is a nice study but lacks some functional data required to determine how useful these alleles will be in practice, especially in comparison with the figure line that stimulated their creation.

      We are grateful for this comment. Regarding the usefulness of these alleles, Figure 3 shows that specific and efficient genetic manipulation of one cell subpopulation can be achieved by crossing the DreER mouse strain with the roxCre mouse strain. In addition, Figure 6 shows that R26-loxCre-tdT can effectively ensure Cre-loxP recombination on some gene alleles for genetic manipulation. The expression of the tdT protein is aligned with the expression of the Cre protein (Alb roxCre-tdT and R26-loxCre-tdT, Figure 2 and Figure 5), which ensures the accuracy of the tracing experiments. We believe more functional data can be shown in future articles that use the mouse lines described in this manuscript.

      (2) The data in Figure 5 show strong activity at the Confetti locus, but the design of the newly reported R26-loxCre line lacks a WPRE sequence that was included in the iSure-Cre line to drive very robust protein expression.

      Thank you for raising this point. In the R26-loxCre-tdT knock-in strategy, the WPRE sequence is added behind the loxCre-P2A-tdT sequence.

      (3) the most valuable experiment for such a new tool would be a head-to-head comparison with iSure (or the latest iSure version from the Benedito lab) using the same CreER and target floxed allele. At the very least a comparison of Cre protein expression between the two lines using identical CreER activators is needed.

      According to the reviewer’s suggestion, we will compare iSuRe-Cre with R26-loxCre-tdT by using Alb-CreER and target R26-Confetti in the revised manuscript.

      (4) Why did the authors not use the same driver to compare mCre 1, 4, 7, and 10? The study in Figure 2 uses Alb-roxCre for 1 and 7 and Cdh5-roxCre for 4 and 10, with clearly different levels of activity driven by the two alleles in vivo. Thus whether mCre1 is really better than mCre4 or 10 is not clear.

      Thank you for raising this concern. After screening out four robust versions of mCre, we generated these four roxCre knock-in mice. We could not predict which mCre would be the most robust in vivo; it might be that only one or two mCre versions work efficiently. For example, if Alb-mCre1 was competitive with Cdh5-mCre10, we can use them for targeting genes in different cell types, broadening the potential utility of these mice.

      (5) Technical details are lacking. The authors provide little specific information regarding the precise way that the new alleles were generated, i.e. exactly what nucleotide sites were used and what the sequence of the introduced transgenes is. Such valuable information must be gleaned from schematic diagrams that are insufficient to fully explain the approach.

      Thank you for your careful suggestions.

      We will provide schematic figures as well as nucleotide sequences for mice generation in the revised manuscript.

      Reviewer #2 (Public Review):

      (1) The scenario where the lines would demonstrate their full potential compared to existing models has not been tested.

      We are grateful for this suggestion. We will compare iSuRe-Cre with R26-loxCre-tdT by using Alb-CreER and target R26-Confetti in the revised manuscript.

      (2) The challenge lies in performing such experiments, as low doses of tamoxifen needed for inducing mosaic gene deletion may not be sufficient to efficiently recombine multiple alleles in individual cells while at the same time accurately reporting gene deletion. Therefore, a demonstration of the efficient deletion of multiple floxed alleles in a mosaic fashion would be a valuable addition.

      Thank you for your constructive comments. Mosaic analysis using sparse labeling and efficient gene deletion would be our future direction using roxCre and loxCre strategies. We will include some discussion of using such strategy in the revised manuscript.

      (3) When combined with the confetti line, the reporter cassette will continue flipping, potentially leading to misleading lineage tracing results.

      Thank you for your professional comments. Indeed, the Confetti reporter used in this study can continue flipping, which could lead to misleading lineage tracing results. We used R26-Confetti to demonstrate the robustness of mCre for recombination. Some multicolor mouse lines that do not flip have been published, for example, R26-Confetti2 (10.1038/s41588-019-0346-6) and Rainbow (10.1161/CIRCULATIONAHA.120.045750). These reporters could be used for tracing Cre-expressing cells without concerns about flipping of the reporter cassette.

      (4) Constitutive expression of Cre is also associated with toxicity, as discussed by the authors in the introduction.

      Thank you for your professional comments. The toxicity of constitutive expression of Cre and the toxicity associated with tamoxifen treatment in CreER mouse lines (10.1038/s44161-022-00125-6) are known to the field. This study cannot resolve the toxicity of constitutive Cre expression. Many mouse lines with constitutive Cre driven by different promoters are in use across various fields and present similar toxicity. To solve this issue, it would be possible to construct a new strategy that enables the removal of Cre after its expression.

      Reviewer #3 (Public Review):

      (1) Although leakiness is rather minor according to the original publication and the senior author of the study wrote in a review a few years ago that there is no leakiness (https://doi.org/10.1016/j.jbc.2021.100509).

      Thank you so much for your careful check. In this review (https://doi.org/10.1016/j.jbc.2021.100509), the writer’s comments on iSuRe-Cre are on the reader's side, and all summary words are based on the original published paper (10.1038/s41467-019-10239-4). Currently, we have tested iSuRe-Cre in our hands. We did detect some leakiness in the heart and muscle, but hardly any in other tissues, as shown in the following figure.

      Author response image 1.

      Leakiness in Alb CreER;iSuRe-Cre mouse line. Pictures are representative results for 5 mice. Scale bars, white 100 µm.

      (2) I would have preferred to see a study, which uses the wonderful new tools to address a major biological question, rather than a primarily technical report, which describes the ongoing efforts to further improve Cre and Dre recombinase-mediated recombination.

      We gratefully appreciate your valuable comment. The roxCre and loxCre mice mentioned in this study provide more effective methods for inducible genetic manipulation in studying gene function. We hope that the application of our new genetic tools could help address some major biological questions in different biomedical fields in the future.

      (3) Very high levels of Cre expression may cause toxic effects as previously reported for the hearts of Myh6-Cre mice. Thus, it seems sensible to test for unspecific toxic effects, which may be done by bulk RNA-seq analysis, cell viability, and cell proliferation assays. It should also be analyzed whether the combination of R26-roxCre-tdT with the Tnni3-Dre allele causes cardiac dysfunction, although such dysfunctions should be apparent from potential changes in gene expression.

      We are sorry that we mistakenly wrote R26-roxCre-tdT instead of R26-loxCre-tdT in our manuscript. We have not generated an R26-roxCre-tdT mouse line. We also thank the reviewer for the concerns about the toxicity of high Cre expression. The toxicity of constitutive expression of Cre and the toxicity of tamoxifen treatment in CreER mouse lines (10.1038/s44161-022-00125-6) are known to the field. This study cannot resolve the toxicity of constitutive Cre expression. Many mouse lines with constitutive Cre driven by different promoters are in use across various fields and present similar toxicity. To solve this issue, it would be possible to construct a new strategy that enables the removal of Cre after its expression.

      (4) Is there any leakiness when the inducible DreER allele is introduced but no tamoxifen treatment is applied? This should be documented. The same also applies to loxCre mice.

      In this study, we developed new mouse tool lines, including Alb roxCre1-tdT, Cdh5 roxCre4-tdT, Alb roxCre7-GFP, Cdh5 roxCre10-GFP and R26-loxCre-tdT. As shown in supplementary figure 1, supplementary figure 2, and figure 4D, Alb roxCre1-tdT, Cdh5 roxCre4-tdT, Alb roxCre7-GFP, Cdh5 roxCre10-GFP and R26-loxCre-tdT are not leaky. Therefore, if there is any leakiness when the inducible DreER or CreER allele is introduced, that leakiness derives from the DreER or CreER allele. We will supplement relevant experimental data in the revision.

      (5) It would be very helpful to include a dose-response curve for determining the minimum dosage required in Alb-CreER; R26-loxCre-tdT; Ctnnb1flox/flox mice for efficient recombination.

      Thank you for your suggestion. We understand the reviewer’s concern. We can do a dose-response curve in the revision work.

      (6) In the liver panel of Figure 4F, tdT signals do not seem to colocalize with the VE-cad signals, which is odd. Is there any compelling explanation?

      Because the file-upload website has a file size limit, image compression left some signals unclear. The following are the zoom-out figures. The staining in Figure 4F will be optimized and high-resolution images will be provided in the revision.

      Author response image 2.

      (7) The authors claim that "virtually all tdT+ endothelial cells simultaneously expressed YFP/mCFP" (right panel of Figure 5D). Well, it seems that the abundance of tdT is much lower compared to YFP/mCFP. If the recombination of R26-Confetti was mainly triggered by R26-loxCre-tdT, the expression of tdT and YFP/mCFP should be comparable. This should be clarified.

      Thank you so much for your careful check. We checked these signals carefully and did not find a “much lower” tdT signal. Because the file-upload website has a file size limit, image compression left some signals unclear. We have attached clear high-resolution images here. The following figure shows how we split the tdT signal and compared it with YFP/mCFP.

      Author response image 3.

      (8) In several cases, the authors seem to have mixed up "R26-roxCre-tdT" with "R26-loxCre-tdT". There are errors in #251 and #256. Furthermore, in the passage from line #278 to #301. In the lines #297 and #300 it should probably read "Alb-CreER; R26-loxCre-tdT;Ctnnb1flox/flox" rather than "Alb-CreER;R26-tdT2;Ctnnb1flox/flox".

      We are grateful for these careful observations. We have corrected these typos accordingly.

    1. eLife Assessment

      This is an important study that characterizes a surprising interaction between two different cytokine/hormone receptors using nanoscale resolution (dSTORM) microscopy. The study provides solid evidence that the interaction is ligand-dependent, and is mediated by the receptor-associated intracellular signalling molecule JAK2. While at present limited to growth hormone and prolactin receptors in a limited number of cell lines, there are potentially broad implications for cytokine signalling, as such JAK2-mediated interactions could occur between a range of different cytokines. Moreover, the specific hormone interactions shown in the manuscript may have significant implications for understanding how these hormones can have differential effects in breast cancer, under different conditions.

    2. Reviewer #3 (Public review):

      Summary:

      The authors are interested in the relative importance of PRL versus GH and their interactive signaling in breast cancer. After examining GHR-PRLR interactions in response to ligands, they suggest that a reduction in cell surface GHR in response to PRL may be a mechanism whereby PRL can sometimes be protective against breast cancer.

      Strengths:

      The strengths of the study include the interesting question being addressed and the application of multiple complementary techniques, including dSTORM, which is technically very challenging, especially when using double labeling. Thus, dSTORM is used to analyze co-clustering of GHR and PRLR, and, in response to PRL, rapid internalization of GHR and increased cell surface PRLR. Conclusions from proximity ligation assays are that some GHR and PRLR are within 40 nm (≈ 4 plasma membranes) of each other and that upon ligand stimulation, they move apart. Intact receptor knockin and knockout approaches and receptor constructs without the Jak2 binding domain demonstrate a) a requirement for the PRLR for there to be PRL-driven internalization of GHR, and b) that Jak2-PRLR interactions are necessary for stability of the GHR-PRLR colocalizations.

      Weaknesses:

      Although improved over the first version, the manuscript still suffers from a lack of detail, which in places makes it difficult to evaluate the data and would make it very difficult for the results to be replicated by others.

      Comments on revised version:

      Points for improvement of the manuscript:

      (1) There is still insufficient detail about the proximity ligation assay. For example, PLAs that use reagents from Sigma (as now reported) require primary antibodies from two different species and yet both the anti-PRLR and anti-GHR used for dSTORM were mouse monoclonals. On line 356 it says that the ECD antibodies were used for microscopy and the PLA is microscopy. Were instead the ICD antibodies used for the PLA? If so, how do we know that one or more of the proteins in the very strong "non-specific" bands seen on Figure 5A are not what is being localized? Could you do a Western blot of just cell membrane proteins? There needs to be further clarity/explanation.

      (2) Although the manuscript now shows a Western blot using the antibodies against intracellular regions of the receptor, a full Western blot is not provided for the antibodies against the S2 extracellular domain used for the dSTORM. While I haven't checked the papers showing characterization of the anti-GHR, I did re-check reference 70, which the authors say shows full characterization of the PRLR antibody, and this does not show a full Western (only portions of gels). How do we know that this antibody is not recognizing some other cell surface molecule, the surface expression of which increases upon stimulation of the cells with PRL? Is there only one band when blotting whole cell extracts with either the GHR or PRLR ECD antibodies so we can be sure of specificity? Figure S2 helps some, but these are different cells and the relative expression of the PRLR versus some other potential cell surface protein in these engineered cells may well be completely different.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The questions after reading this manuscript are what novel insights have been gained that significantly go beyond what was already known about the interaction of these receptors and, more importantly, what are the physiological implications of these findings? The proposed significance of the results in the last paragraph of the Discussion section is speculative since none of the receptor interactions have been investigated in TNBC cell lines. Moreover, no physiological experiments were conducted using the PRLR and GH knockout T47D cells to provide biological relevance for the receptor heteromers. The proposed role of JAK2 in the cell surface distribution and association of both receptors as stated in the title was only derived from the analysis of box 1 domain receptor mutants. A knockout of JAK2 was not conducted to assess heteromer formation.

      We thank the reviewer for these comments. The novel insight is that two different cytokine receptors can interact in an asymmetric, ligand-dependent manner, such that one receptor regulates the other receptor’s surface availability, mediated by JAK2. To our knowledge this has not been reported before. Beyond our observations, there is the question if this could be a much more common regulatory mechanism and if it has therapeutic relevance. However, answering these questions is beyond the scope of this work.

      Along the same line, the question regarding the biological relevance of our receptor heteromers and JAK2’s role in cell surface distribution is undoubtedly very important. Studying GHR-PRLR cell surface distributions in JAK2 knockout cells and certain TNBC cell lines as proposed by the reviewer could perhaps be insightful. However, most TNBCs down-regulate PRLR [1], so we would first have to identify TNBC cell lines that actually express PRLR at sufficiently high levels. Moreover, knocking out JAK2 is known to significantly reduce GHR surface availability [2,3], such that the proposed experiment would probably provide only limited insights.

      Unfortunately, our team is currently not in a position to perform any experiments (due to lack of funding and shortage of personnel). However, to address the reviewer’s comment as much as possible, we have revised the respective paragraph of the discussion section to emphasize the speculative nature of our statement and have added another paragraph discussing shortcomings and future experiments (see revised manuscript, pages 23-24).

      (1) López-Ozuna, V., Hachim, I., Hachim, M. et al. Prolactin Pro-Differentiation Pathway in Triple Negative Breast Cancer: Impact on Prognosis and Potential Therapy. Sci Rep 6, 30934 (2016). https://www.nature.com/articles/srep30934

      (2) He, K., Wang, X., Jiang, J., Guan, R., Bernstein, K.E., Sayeski, P.P., Frank, S.J. Janus kinase 2 determinants for growth hormone receptor association, surface assembly, and signaling. Mol Endocrinol. 2003;17(11):2211-27. doi: 10.1210/me.2003-0256. PMID: 12920237.

      (3) He, K., Loesch, K., Cowan, J.W., Li, X., Deng, L., Wang, X., Jiang, J., Frank, S.J. Janus Kinase 2 Enhances the Stability of the Mature Growth Hormone Receptor. Endocrinology, Volume 146, Issue 11, 2005, Pages 4755–4765, https://doi.org/10.1210/en.2005-0514

      (2) Except for some investigation of γ2A-JAK2 cells, most of the experiments in this study were conducted on a single breast cancer cell line. In terms of rigor and reproducibility, this is somewhat borderline. The CRISPR/Cas9 mutant T47D cells were not used for rescue experiments with the corresponding full-length receptors and the box1 mutants. A missed opportunity is the lack of an investigation correlating the number of receptors with physiological changes upon ligand stimulation (e.g., cellular clustering, proliferation, downstream signaling strength).

      We appreciate the reviewer’s comments. While we are confident in the reproducibility of our findings, including those obtained in the T47D cell line, we acknowledge that testing in additional cell lines would have strengthened the generalizability of our results. We also recognize that performing a rescue experiment using our T47D hPRLR or hGHR KO cells would have been valuable. Furthermore, examining physiological changes, such as proliferation rates and downstream signaling responses, would have provided additional insights. Unfortunately, these experiments were not conducted at the time, and we currently lack the resources to carry them out.

      (3) An obvious shortcoming of the study that was not discussed seems to be that the main methodology used in this study (super-resolution microscopy) does not distinguish the presence of various isoforms of the PRLR on the cell surface. Is it possible that the ligand stimulation changes the ratio between different isoforms? Which isoforms besides the long form may be involved in heteromers formation, presumably all that can bind JAK2?

      This is a very good point. We fully agree with the reviewer that a discussion of the results in the light of different PRLR isoforms is appropriate. We have added information on PRLR isoforms to the Introduction (see revised manuscript, page 2) and Discussion sections (see revised manuscript, pages 23-24).

      (4) Changes in the ligand-inducible activation of JAK2 and STAT5 were not investigated in the T47D knockout models for the PRL and GHR. It is also a missed opportunity to use super-resolution microscopy as a validation tool for the knockouts on the single cell level and how it might affect the distribution of the corresponding other receptor that is still expressed.

      We thank the reviewer for this comment. We fully agree that such additional experiments could be very valuable. As mentioned above, however, we are unable to address this at this stage due to lack of personnel and funding. We do hope to address these and other proposed experiments in the future.

      (5) Why does the binding of PRL not cause a similar decrease (internalization and downregulation) of the PRLR, and instead, an increase in cell surface localization? This seems to be contrary to previous observations in MCF-7 cells (J Biol Chem. 2005 October 7; 280(40): 33909-33916).

      It has been recently reported for GHR that not only JAK2 but also LYN binds to the box1-box2 region, creating competition that results in divergent signaling cascades and affects GHR nanoclustering [1]. So, it is reasonable to assume that similar mechanisms may be at work that regulate PRLR cell surface availability. Differences in cells’ expression of such kinases could perhaps play a role in the perceived inconsistency. Also, Lu et al. [2] studied the downregulation of the long PRLR isoform in response to PRL. All other PRLR isoforms were not detectable in MCF-7 cells. So, differences between MCF-7 and T47D may lead to this perceived contradiction.

      At this stage, we can only speculate about the actual reasons for these seemingly contradictory results. However, for full transparency, we are now mentioning this apparent contradiction in the Discussion section (see page 23) and have added the references below.

      (1) Chhabra, Y., Seiffert, P., Gormal, R.S., et al. Tyrosine kinases compete for growth hormone receptor binding and regulate receptor mobility and degradation. Cell Rep. 2023;42(5):112490. doi: 10.1016/j.celrep.2023.112490. PMID: 37163374.

      https://www.cell.com/cell-reports/pdf/S2211-1247(23)00501-6.pdf

      (2) Lu, J.C., Piazza, T.M., Schuler, L.A. Proteasomes mediate prolactin-induced receptor down-regulation and fragment generation in breast cancer cells. J Biol Chem. 2005 Oct 7;280(40):33909-16. doi: 10.1074/jbc.M508118200. PMID: 16103113; PMCID: PMC1976473.

      (6) Some figures and illustrations are of poor quality and were put together without paying attention to detail. For example, in Fig 5A, the GHR was cut off, possibly to omit other nonspecific bands, the WB images look 'washed out'. 5B, 5D: the labels are not in one line over the bars, and what is the point of showing all individual data points when the bar graphs with all annotations and SD lines are disappearing? As done for the y2A cells, the illustrations in 5B-5E should indicate what cell lines were used. No loading controls in Fig 5F, is there any protein in the first lane? No loading controls in Fig 6B and 6H.

      We thank the reviewer for pointing this out. We have amended Fig. 5A to now show larger crops of the two GHR and PRLR Western Blot images and thus a greater range of proteins present in the extracts. Please note that the bands in the WBs other than what is identified as GHR and PRLR are non-specific and reflect roughly equivalent loading of protein in each lane.

      We also made some changes to Figures 5B-5E.

      (7) The proximity ligation method was not described in the M&M section of the manuscript.

      We thank the reviewer for pointing this out. We have added a description of the PL method to the Methods section.

      Reviewer #1 (Recommendations for the Authors):

      A final suggestion for future investigations: Instead of focusing on the heteromer formation of the GHR/PRLR which both signal all through the same downstream effectors (JAK2, STAT5), it would have been more cancer-relevant, and perhaps even more interesting, to look for heteromers between the PRLR and receptors of the IL-6 family since it had been shown that PRL can stimulate STAT3, which is a unique feature of cancer cells. If that is the case, this would require a different modality of the interaction between different JAK kinases.

      We highly appreciate the reviewer’s recommendation and hope to follow up on it in the near future.

      Reviewer #2 (Public Review):

      (1) I could not fully evaluate some of the data, mainly because several details on acquisition and analysis are lacking. It would be useful to know what the background signal was in dSTORM and how the authors distinguished the specific signal from unspecific background fluorescence, which can be quite prominent in these experiments. Typically, one would evaluate the signal coming from antibodies randomly bound to a substrate around the cells to determine the switching properties of the dyes in their buffer and the average number of localisations representing one antibody. This would help evaluate if GHR or PRLR appeared as monomers or multimers in the plasma membrane before stimulation, which is currently a matter of debate. It would also provide better support for the model proposed in Figure 8.

      We are grateful for the reviewer’s comment. In our experience, the background signal is more relevant in dSTORM when imaging proteins that are located at deeper depths (> 3 μm) above the coverslip surface. In our experiments, cells are attached to the coverslip surface and the proteins being imaged are on the cell membrane. In addition, we employed dSTORM’s TIRF (total internal reflection fluorescence) microscopy mode to image membrane receptor proteins. TIRFM exploits the unique properties of an induced evanescent field in a limited specimen region immediately adjacent to the interface between two media having different refractive indices. It thereby dramatically reduces background by rejecting fluorescence from out-of-focus areas in the detection path and illuminating only the area right near the surface.
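      To give a sense of how thin the illuminated layer is, the following minimal sketch evaluates the standard expression for the evanescent-field penetration depth, d = λ / (4π · sqrt(n1² sin²θ − n2²)). All numerical inputs (a 647 nm excitation line, a glass index of 1.518, an aqueous sample index of 1.33, and a 65° incidence angle) are illustrative assumptions and are not settings reported for these experiments.

```python
import math

def penetration_depth_nm(wavelength_nm, n_glass, n_sample, angle_deg):
    """Depth at which the evanescent intensity has decayed to 1/e.

    Uses d = lambda / (4*pi*sqrt(n1^2 * sin^2(theta) - n2^2)).
    """
    theta = math.radians(angle_deg)
    term = (n_glass * math.sin(theta)) ** 2 - n_sample ** 2
    if term <= 0:
        # Below the critical angle the beam is refracted, not totally reflected.
        raise ValueError("incidence angle is below the critical angle")
    return wavelength_nm / (4 * math.pi * math.sqrt(term))

# Assumed, illustrative values (not taken from the manuscript).
depth = penetration_depth_nm(wavelength_nm=647, n_glass=1.518,
                             n_sample=1.33, angle_deg=65)
print(f"penetration depth: {depth:.0f} nm")
```

      With these assumed inputs the depth comes out in the 100 to 200 nm range, so only fluorophores very close to the coverslip are excited.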

      Having said that, a few other sources such as auto-fluorescence, scattering, and non-bleached fluorescent molecules close to and distant from the focal plane can contribute to the background signal. We tried to reduce auto-fluorescence by ensuring that cells are grown in phenol-red-free media, imaging is performed in STORM buffer which reduces autofluorescence, and our immunostaining protocol includes a quenching step aside from using blocking buffer with different serum, in addition to BSA. Moreover, we employed extensive washing steps following antibody incubations to eliminate non-specifically bound antibodies. Ensuring that the TIRF illumination field is uniform helps reduce scatter. Additionally, an extended bleach step prior to the acquisition of frames to determine localizations helped further reduce the probability of non-bleached fluorescent molecules.

      In short, due to the experimental design we do not expect much background. However, in the future, we will address this concern and estimate background in a subtype dependent manner. To this end we will distinguish two types of background noise: (A) background with a small change between subsequent frames, which mainly consists of auto-fluorescence and non-bleached out-of-focus fluorescent molecules; and (B) background that changes every imaging frame, which is mainly from non-bleached fluorescent molecules near the focal plane. For type (A) background, temporal filters must be used for background estimation [1]; for type (B) background, low-pass filters (e.g., wavelet transform) should be used for background estimation [2].

      (1) Hoogendoorn, Crosby, Leyton-Puig, Breedijk, Jalink, Gadella, and Postma (2014). The fidelity of stochastic single-molecule super-resolution reconstructions critically depends upon robust background estimation. Scientific reports, 4, 3854. https://doi.org/10.1038/srep03854

      (2) Patel, Williamson, Owen, and Cohen (2021). Blinking statistics and molecular counting in direct stochastic reconstruction microscopy (dSTORM). Bioinformatics, Volume 37, Issue 17, September 2021, Pages 2730–2737, https://doi.org/10.1093/bioinformatics/btab136
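      As a rough illustration of the temporal-filter idea for type (A) background, the sketch below applies a sliding temporal median to the intensity trace of a single pixel. The function name, the window length, and the synthetic trace are hypothetical choices made for illustration only; this is not the analysis pipeline used in the study.

```python
from statistics import median

def temporal_median_background(trace, window=11):
    """Estimate slowly varying background of one pixel's intensity trace.

    For each frame, the background is the median of the surrounding
    `window` frames. Brief blinking events occupy few frames and
    therefore barely affect the median.
    """
    half = window // 2
    background = []
    for t in range(len(trace)):
        lo = max(0, t - half)
        hi = min(len(trace), t + half + 1)
        background.append(median(trace[lo:hi]))
    return background

# Synthetic trace: constant background of 100 counts plus two brief blinks.
trace = [100.0] * 50
trace[10] += 500.0
trace[30] += 400.0

background = temporal_median_background(trace)
corrected = [value - bg for value, bg in zip(trace, background)]
```

      In this toy trace the estimated background stays at the constant level, so subtracting it leaves only the two blinking events.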

      (2) Since many of the findings in this work come from the evaluation of localisation clusters, an image showing actual localisations would help support the main conclusions. I believe that the dSTORM images in Figures 1 and 2 are density maps, although this was not explicitly stated. Alexa 568 and Alexa 647 typically give a very different number of localisations, and this is also dependent on the concentration of BME. Did the authors take that into account when interpreting the results and creating the model in Figures 2 and 8?

      I believe that including this information is important as findings in this paper heavily rely on the number of localisations detected under different conditions.

      Including information on proximity labelling and CRISPR/Cas9 in the methods section would help with the reproducibility of these findings by other groups.

      Figures 1 and 2 show Gaussian interpolations of actual localizations, not density maps. Imaging captured the fluorophores’ blinking events, and localizations were counted as true localizations when at least 5 consecutive blinking events had been observed. Nikon software was used for Gaussian fitting. In other words, we show reconstructed images based on identifying true localizations using Gaussian fitting and strict parameters to identify true fluorophore blinking. This allowed us to identify true localizations with high confidence and generate a high-resolution image of the membrane receptors.

      Indeed, Alexa 568 and 647 give different numbers of localizations. This depends on the intrinsic photo-physics of the fluorophores. Specifically, each fluorophore has a different duty cycle, switching cycle, and survival fraction. However, we note that we focused on capturing the relative changes in receptor numbers over time, before and after stimulation by ligands, not the absolute numbers of surface GHR and PRLR. We are not comparing the absolute numbers of localizations or drawing comparisons between localization numbers for 568 and 647. Across all these conditions and time points, the photo-physics of a particular fluorophore remains the same. This allows us to make relative comparisons.

      As far as the effect of BME is concerned, the concentration of mercaptoethanol needs to be carefully optimized, as too high a concentration can potentially quench the fluorescence or affect the overall stability of the sample. However, we are using an optimized concentration which has been previously validated across multiple STORM experiments. This makes the concerns relating to the concentration of BME irrelevant to the current experimental design. Besides, the concentration of BME is maintained across all experimental conditions.

      We have added information regarding PL and CRISPR/Cas9 for generating hGHR KO and hPRLR KO cells in two new subsections to the Methods section.

      Reviewer #2 (Recommendations for the authors):

      In the methods please include:

      (1) A section with details on proximity ligation assays.

      We have added a description of the PL method to the Methods section.

      (2) A section on CRISPR/Cas9 technology.

      We have added two new sections on “Generating hGHR knockout and hPRLR knockout T47D cells” and “Design of sgRNAs for hGHR  or hPRLR knockout” to the Methods section.

      (3) List the precise composition of the buffer or cite the paper that you followed.

      We used the buffer recipe described in this protocol [1] and have added the components with concentrations as well as the following reference to the manuscript.

      (1) Beggs, R.R., Dean, W.F., Mattheyses, A.L. (2020). dSTORM Imaging and Analysis of Desmosome Architecture. In: Turksen, K. (eds) Permeability Barrier. Methods in Molecular Biology, vol 2367. Humana, New York, NY. https://doi.org/10.1007/7651_2020_325

      (4) Exposure time used for image acquisition to put 40 000 frames in the context of total imaging time and clarify why you decided to take 40 000 images per channel.

      Our Nikon Ti2 N-STORM microscope is equipped with an iXon DU-897 Ultra EMCCD camera from Andor (Oxford Instruments). According to the camera’s manufacturer, this camera platform uses a back-illuminated 512 x 512 frame transfer sensor and overclocks readout to 17 MHz, pushing speed performance to 56 fps (in full frame mode). We note that we always tried to acquire STORM images at the maximal frame rate. As for the exposure time, according to the manufacturer it can be as short as 17.8 ms. We would like to emphasize that we did not specify/alter the exposure time.

      See also: https://andor.oxinst.com/assets/uploads/products/andor/documents/andor-ixon-ultra-emccd-specifications.pdf

      The decision to take 40,000 frames per channel was based on our intention to identify the true population of the molecules of interest that are localized and accurately represented in the final reconstructed image. The total number of frames depends on the sample complexity, the density of sample labeling, and the desired resolution. We tested a range of 20,000 to 60,000 frames and found that, for our experimental design and output requirements, 40,000 frames provided the best balance between maximal resolution and a sufficient number of localizations to make consistent and accurate localization estimates across different stimulation conditions compared to basal controls.
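      To put 40,000 frames in the context of total imaging time, the short calculation below converts the frame count into an acquisition time. It assumes that acquisition actually ran at the manufacturer's quoted maximum full-frame rate of 56 fps, so the result should be read as a lower bound rather than a measured duration.

```python
FRAMES_PER_CHANNEL = 40_000
MAX_FRAME_RATE_FPS = 56.0  # manufacturer's full-frame maximum; assumed to be reached

def acquisition_time_s(n_frames, frame_rate_fps):
    """Minimum time needed to record n_frames at a given frame rate."""
    return n_frames / frame_rate_fps

per_channel_s = acquisition_time_s(FRAMES_PER_CHANNEL, MAX_FRAME_RATE_FPS)
frame_period_ms = 1000.0 / MAX_FRAME_RATE_FPS

print(f"frame period: {frame_period_ms:.1f} ms")
print(f"time per channel: {per_channel_s / 60:.1f} min")
```

      Under this assumption each channel takes roughly 12 minutes, and the frame period is consistent with the shortest exposure time of about 17.8 ms quoted above.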

      (5) The lasers used to switch Alexa 568 and Alexa 647. Were you alternating between the lasers for switching and imaging of dyes? Intermittent and continuous illumination will produce very different unspecific background fluorescence.

      Yes, we used an alternating approach for the lasers exciting Alexa 647 and Alexa 568, for both switching and imaging of the dyes.

      (6) A paragraph with a detailed description of methods used to differentiate the background fluorescence from the signal.

      We have addressed the background fluorescence under Point 1 (Public Review). We have added a paragraph in the Methods section on this issue.

      (7) Minor corrections to the text:

      It appears as though there is a large difference in the expression level of GHR and PRLR in basal conditions in Figure 1. This can be due to the switching properties of the dyes, which is related to the amount of BME in the buffer, or it can be because there is indeed more PRL. Would the authors be able to comment on this?

      We thank the reviewer for this comment. According to expression data available online, there is indeed more PRLR than GHR in T47D cells. According to CellMiner [1], T47D cells have an RNA-Seq gene expression level log2(FPKM + 1) of 6.814 for PRLR and 3.587 for GHR, strongly suggesting that there is more PRLR than GHR in basal conditions, matching the reviewer’s interpretation of our images in Fig. 1 (basal). However, we would advise against using STORM images for direct comparisons of receptor expression. First, with TIRF images, we are only looking at the membrane fraction that is attached to the coverslip (within ~150 nm of the coverslip-membrane interface). Second, as discussed above, our data represent relative cell surface receptor levels that allow for comparison of different conditions (basal vs. stimulation) and do not represent absolute quantifications. All values are relative to controls.

      Also, BME is not going to change the level of expression. The differences in growth factor expression as estimated by relative comparison can be attributed to actual changes in growth factors and are not an artifact of the amount of BME in the buffer or the properties of the dyes. These factors are maintained across all experimental conditions and do not influence the final outcome.

      (1) https://discover.nci.nih.gov/cellminer/
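      For orientation, the quoted log2(FPKM + 1) values can be converted back to a linear scale as in the sketch below. The helper name is hypothetical, and the resulting ratio describes transcript abundance only; it makes no claim about protein levels at the cell surface.

```python
def fpkm_from_log2(log2_fpkm_plus_one):
    """Invert the log2(FPKM + 1) transform."""
    return 2 ** log2_fpkm_plus_one - 1

prlr_fpkm = fpkm_from_log2(6.814)  # PRLR in T47D, value quoted from CellMiner
ghr_fpkm = fpkm_from_log2(3.587)   # GHR in T47D, value quoted from CellMiner
ratio = prlr_fpkm / ghr_fpkm

print(f"PRLR: {prlr_fpkm:.1f} FPKM, GHR: {ghr_fpkm:.1f} FPKM, ratio: {ratio:.1f}")
```

      On a linear scale this corresponds to roughly ten times more PRLR than GHR transcript.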

      (8) I would encourage the authors to use unspecific binding to characterize the signal coming from single antibodies bound to the substrate. This would provide a mean number of localizations that a single antibody generates. With this information, one can evaluate how many receptors there are per cluster, which would strengthen the findings and potentially provide additional support for the model presented in Figure 8. It would also explain why the distributions of localisations per cluster in Fig. 3B look very different for hGHR and hPRLR. As the authors point out in the discussion, the results on predimerization of these receptors in basal conditions are conflicting and therefore it is important to shed more light on this topic.

      We thank the reviewer for this suggestion. While we are unable to perform this experiment at this stage, we will keep it in mind for future experiments.

      (9) Minor corrections to the figures:

      Figure 1:

      In the legend, please say what representation was used. Are these density maps or another representation? Please provide examples of actual localisations (either as dots or crosses representing the peaks of the Gaussians). Most findings of this work rely on the characterisation of the clusters of localisations and therefore it is of essence to show what the clusters look like. This could potentially go to the supplemental info to minimise additional work. It's very hard to see the puncta in this figure.

      If the authors created zoomed regions in each of the images (as in Figure 3), it would be much easier to evaluate the expression level and the extent of colocalisation. Halfway through GHR 3 min green pixels become grey, but this may be the issue with the document that was created. Please check. Either increase the font on the scale bars in this figure or delete it.

      As described above, Figure 1 does not show density maps. Imaging captured the fluorophores’ blinking events, and localizations were counted as true localizations when at least 5 consecutive blinking events had been observed. Nikon software was used for Gaussian fitting and smoothing.

      We have generated zoomed regions. In our files (original as well as pdf) we do not see pixels become grey. We increased the font size above one of the scale bars and removed all others.

      Figure 3:

      In A, the GHR clusters are colour coded but PRLR are not. Are both DBSCAN images? Explain the meaning of colour coding or show it as black and white. Was brightness also increased in the PRLR image? The font on the scale bars is too small. In B, right panels, the font on the axes is too small. In the figure legend explain the meaning of 33.3 and 16.7.

      In our document, both GHR and PRLR are color coded, but the hGHR clusters are certainly bigger and therefore appear brighter than the hPRLR clusters. Both are DBSCAN images. The color coding serves only to distinguish different clusters (it has no other meaning). We have kept the color coding but have added a sentence to the caption addressing this. Brightness was increased equally in both images of Panel B. 33.3 and 16.7 are the median cluster sizes. We have added a sentence to the caption explaining this. We have increased the font on the axes in B (right panels).

      Figure 4:

      I struggled to see any colocalization in the 2nd and the 3rd image. Please show zoomed-in sections. In the panels B and C, the data are presented as fractions. Is this per cell? My interpretation is that ~80% of PRL clusters also contain GHR.

      Is this in agreement with Figures 1 and 2? In Figure 1, PRL 3 min, Merge, colocalization seems much smaller. Could the authors give the total numbers of GHR and PRLR from which the fractions were calculated at least in basal conditions?

      We have provided zoomed-in views. As for panels B and C, fractions are the number of clusters containing both receptors divided by the total number of clusters. We used the same strategy that we had used for calculating the localization changes: we randomly selected 4 ROIs (regions of interest) per cell to calculate fractions and then calculated the average of three different cells from independently repeated experiments. We did not calculate total numbers of GHR/PRLR. The numbers are fractions of cluster numbers.

      Moreover, the reviewer interprets the results in panels B and C to mean that ~80% of PRLR clusters also contain GHR. We assume the reviewer refers to the basal state. This interpretation is not correct for the following reason: ~80% of all clusters contain both receptors. How many of the remaining (~20%) clusters contain only PRLR or only GHR is not revealed in the panels. Only if 100% of clusters contained PRLR could we conclude that 80% of PRLR clusters also contain GHR.

      Also, while Figures 1 and 2 show localization based on dSTORM images, Figure 3 indicates and quantifies co-localization based on proximity ligation assays following DBSCAN analysis using Clus-DoC. We do not think that the results are directly comparable.
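      The distinction drawn above between "fraction of all clusters containing both receptors" and "fraction of PRLR clusters that also contain GHR" can be made concrete with a small calculation. The cluster counts below are invented purely for illustration and are not data from the study.

```python
def colocalization_fractions(n_both, n_prlr_only, n_ghr_only):
    """Compare the reported fraction with receptor-specific fractions."""
    total = n_both + n_prlr_only + n_ghr_only
    return {
        # Quantity described for panels B and C: both receptors / all clusters.
        "both_over_all_clusters": n_both / total,
        # Quantity in the reviewer's reading: PRLR clusters that also hold GHR.
        "prlr_clusters_with_ghr": n_both / (n_both + n_prlr_only),
        "ghr_clusters_with_prlr": n_both / (n_both + n_ghr_only),
    }

# Two hypothetical scenarios with the same 80% "both" fraction.
all_remaining_prlr_only = colocalization_fractions(80, 20, 0)
all_remaining_ghr_only = colocalization_fractions(80, 0, 20)

for name, result in [("remaining clusters PRLR-only", all_remaining_prlr_only),
                     ("remaining clusters GHR-only", all_remaining_ghr_only)]:
    print(name, {key: round(value, 2) for key, value in result.items()})
```

      Both scenarios give the same 80% value for the plotted quantity, yet the share of PRLR clusters that also contain GHR is 80% in the first and 100% in the second, which is why the panels alone cannot settle the reviewer's reading.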

      Reviewer #3 (Public Review):

      (1) The manuscript suffers from a lack of detail, which in places makes it difficult to evaluate the data and would make it very difficult for the results to be replicated by others. In addition, the manuscript would very much benefit from a full discussion of the limitations of the study. For example, the manuscript is written as if there is only one form of the PRLR while the anti-PRLR antibody used for dSTORM would also recognize the intermediate form and short forms 1a and 1b on the T47D cells. Given the very different roles of these other PRLR forms in breast cancer (Dufau, Vonderhaar, Clevenger, Walker and other labs), this limitation should at the very least be discussed. Similarly, the manuscript is written as if Jak2 essentially only signals through STAT5 but Jak2 is involved in multiple other signaling pathways from the multiple PRLRs, including the long form. Also, while there are papers suggesting that PRL can be protective in breast cancer, the majority of publications in this area find that PRL promotes breast cancer. How then would the authors interpret the effect of PRL on GHR in light of all those non-protective results? [Check papers by Hallgeir Rui]

      We thank the reviewer for such thoughtful comments. We have added a paragraph in the Discussion section on the limitations of our study, including sole focus on T47D and γ2A-JAK2 cells and lack of PRLR isoform-specific data. Also, we are now mentioning that these isoforms play different roles in breast cancer, citing papers by Dufau, Vonderhaar, Clevenger, and Walker labs.

      We did not mean to imply that JAK2 signals only via STAT5 or by only binding the long form. We have made this point clear in the Introduction as well as in our revised Discussion section. Moreover, we have added information and references on JAK2 signaling and PRLR isoform specific signaling.

      In our Discussion section, we now also mention the findings that PRL promotes breast cancer. We would like to point out that it is quite conceivable that PRL is protective in BC by reducing surface hGHR availability, but that this effect may depend on JAK2 levels as well as on expression levels of other kinases that competitively bind Box1 and/or Box2 [1]. It is also possible that PRL’s effect depends on BC stage. In any case, we have emphasized the speculative nature of our statement.

      (1) Chhabra, Y., Seiffert, P., Gormal, R.S., et al. Tyrosine kinases compete for growth hormone receptor binding and regulate receptor mobility and degradation. Cell Rep. 2023;42(5):112490. doi: 10.1016/j.celrep.2023.112490. PMID: 37163374.

      Reviewer #3 (Recommendations for the authors):

      Points for improvement of the manuscript:

      (1) Method details -

      a) "we utilized CRISPR/Cas9 to generate hPRLR knockout T47D cells ......" Exactly how? Nothing is said under methods. Can we be sure that you knocked out the whole gene?

      We have addressed this point by adding two new sections on “Generating hGHR knockout and hPRLR knockout T47D cells” and “Design of sgRNAs for hGHR or hPRLR knockout” to the Methods section.

      b) Some of the Western blots are missing mol wt markers. How specific are the various antibodies used for Westerns? For example, the previous publications quoted as providing characterization of the antibodies also seem to use just band cutouts and do not show the full molecular weight range of whole cell extracts blotted. Anti-PRLR antibodies are notoriously bad and so this is important.

      There is an antibody referred to in Figure 5 that is not listed under "antibodies" in the methods.

      We have modified Figure 5A, which now shows the entire gel as well as molecular weight markers. As for the specificity of our antibodies, we used the monoclonal antibodies Anti-GHR-ext-mAB 74.3 and Anti-PRLR-ext-mAB 1.48, which have been previously tested and used. In addition, we did our own control experiments to ensure specificity. We have added some of our many control results as Supplementary Figures S2 and S3.

      We thank the reviewer for noticing the missing antibody in the Methods section. We have now added information about this antibody.

      c) There is no description of the proximity ligation assay.

      We have addressed this by adding a paragraph on PLA in the Methods section.

      d) What is the level of expression of GHR, PRLR, and Jak2 in the gamma2A-JAK2 cells compared to the T47D cells? Artifacts of overexpression are always a worry.

      The γ2A-JAK2 cell series over-express the receptors. For this reason, we did not rely solely on the observations in γ2A-JAK2 cell lines but also performed the experiments in T47D cells.

      e) There are no concentrations given for components of the dSTORM imaging buffer. On line 380, I think the authors mean alternating lasers not alternatively.

      Thank you. Indeed, we meant alternating lasers. We are referring to [1] (the protocol we followed) for information on the imaging buffer.

      (1) Beggs, R.R., Dean, W.F., Mattheyses, A.L. (2020). dSTORM Imaging and Analysis of Desmosome Architecture. In: Turksen, K. (eds) Permeability Barrier. Methods in Molecular Biology, vol 2367. Humana, New York, NY. https://doi.org/10.1007/7651_2020_325

      f) In general, a read-through to determine whether there is enough detail for others to replicate is required. 4% PFA in what? Do you mean PBS or should it be Dulbecco's PBS etc., etc.?

      We prepared a 4% PFA in PBS solution. We mean Dulbecco's PBS.

      (2) There are no controls shown or described for the dSTORM. For example, non-specific primary antibody and second antibodies alone for non-specific sticking. Do the second antibodies cross-react with the other primary antibody? Is there only one band when blotting whole cell extracts with the GHR antibody so we can be sure of specificity?

      We used monoclonal antibodies Anti-GHR-ext-mAB 74.3 and Anti-PRLR-ext-mAB 1.48 (but also tested several other antibodies). While these antibodies have been previously tested and used, we performed additional control experiments to ensure specificity of our primary antibodies and absence of non-specific binding of our secondary antibodies. We have added some of our many control results as Supplementary Figures S2 and S3.

      (3) Writing/figures-

      a) As discussed in the public review regarding different forms of the PRLR and the presence of other Jak2-dependent signaling

      We have added paragraphs on PRLR isoforms and other JAK2-dependent signaling pathways to the Introduction. Also, we have added a paragraph on PRLR isoforms (in the context of our findings) to the Discussion section.

      b) What are the units for figure 3c and d?

      The figures show numbers of localizations (obtained from fluorophore blinking events). In the figure caption to 3C and 3D, we have specified the unit (i.e. counts).

      c) The wheat germ agglutinin stains more than the plasma membrane and so this sentence needs some adjustment.

      We thank the reviewer for this comment. We have rephrased this sentence (see caption to Fig. 4).

      d) It might be better not to use the term "downregulation" since this is usually associated with expression and not internalization.

      While we understand the reviewer’s discomfort with the use of the word “downregulation”, we still think that it best describes the observed effect. Moreover, we would like to note that in the field of receptorology “downregulation” is a specific term for trafficking of cell surface receptors in response to ligands. That said, to address the reviewer’s comment, we are now using the terms “cell surface downregulation” or “downregulation of cell surface [..] receptor” throughout the manuscript in order to explicitly distinguish it from gene downregulation.

      e) Line 420 talks about "previous work", a term that usually indicates work from the same lab. My apologies if I am wrong, but the reference doesn't seem to be associated with the authors.

      At the end of the sentence containing the phrase “previous work”, we are referring to reference [57], which has Dr. Stuart Frank as senior and corresponding author. Dr. Frank is also a co-corresponding author on this manuscript. While in our opinion, “previous work” does not imply some sort of ownership, we are happy to confirm that one of us was responsible for the work we are referencing.

      Reviewing Editor's recommendations:

      The reviewers have all provided a very constructive assessment of the work and offered many useful suggestions to improve the manuscript. I'd advise thinking carefully about how many of these can be reasonably addressed. Most will not require further experiments. I consider it essential to improve the methods to ensure others could repeat the work. This includes adding methods for the PLA and including detail about the controls for the dSTORM. The reviewers have offered suggestions about types of controls to include if these have not already been done.

      We thank the editor for their recommendations. We have revised the methods section, which now includes a paragraph on PLA as well as on CRISPR/Cas9-based generation of mutant cell lines. We have also added information on the dSTORM buffer to the manuscript. Data of controls indicating antibody specificity (using confocal microscopy) have been added to the manuscript’s supplementary material (see Fig. S2 and S3).

      I agree with the reviewers that the different isoforms of the prolactin receptor need to be considered. I think this could be done as an acknowledgment and point of discussion.

      We have revised the Discussion section and have added, among other changes, a paragraph on the different PRLR isoforms.

      For Figure 2E, make it clear in the figure (or at least in legend) that the middle line is the basal condition.

      We thank the editor for their comment. We have made changes to Fig 2E and have added a sentence to the legend making it clear that the middle line depicts the basal condition.

      My biggest concern overall was the fact that this is all largely conducted in a single cell line. This was echoed by at least one of the reviewers. I wonder if you have replicated this in other breast cancer cell lines or mammary epithelial cells? I don't think this is necessary for the current manuscript but would increase confidence if available.

      We thank the editor for their comment and fully agree with their assessment. Unfortunately, we have not replicated these experiments in other BC cell lines or in mammary epithelial cells but would certainly want to do so in the near future.

    1. eLife Assessment

      In their valuable study, Lee et al. explore a role for the Hippo signaling pathway, specifically wts-1/LATS and the downstream regulator yap, in age-dependent neurodegeneration and microtubule dynamics using C. elegans mechanosensory neurons as a model. The authors demonstrate that disruption of wts-1/LATS leads to age-associated morphological and functional neuronal abnormalities, linked to enhanced microtubule stabilization, and show a genetic connection between yap and microtubule stability. Overall, the study employs robust genetic and molecular approaches to reveal a convincing link between the Hippo pathway, microtubule dynamics, and neurodegeneration.

    2. Joint Public Review:

      The Lee et al. study has been revised in response to reviewer comments. It presents a valuable investigation into the role of the Hippo signaling pathway (specifically wts-1/LATS and yap) in age-dependent neurodegeneration and microtubule dynamics in C. elegans TRNs. The authors convincingly demonstrated that disruption of wts-1/LATS leads to age-associated neuronal abnormalities and enhanced microtubule stabilization, with a genetic link to yap. While the study was praised for its well-conducted and well-controlled approaches, reviewers raised concerns about the specificity of the Hippo pathway's effects to TRNs, the correlation of Hpo signaling decline in TRNs with age, and the mechanistic link between Hpo-mediated gene expression and microtubule regulation. The authors addressed the TRN specificity by suggesting the unique microtubule structure of these neurons might contribute to their susceptibility. They acknowledged the difficulty in detecting Hpo signaling decline specifically in aged TRNs but noted increased YAP-1 nuclear localization in other tissues. Importantly, the authors provided evidence suggesting that YAP-TEAD-mediated transcriptional regulation is responsible for neuronal degeneration, as loss of yap-1 or egl-44 restored the wts-1 mutant phenotype. However, the specific transcriptional targets of YAP-1 regulating microtubule stability remain unidentified, representing a key limitation. The authors also discussed the possibility of non-cell-autonomous effects of YAP-1 and offered explanations for the seemingly moderate impairment of the touch response despite structural damage. Finally, they attributed the shorter lifespan of wts-1 and wts-1; yap-1 mutants to roles of wts-1 beyond TRNs and potential synergistic effects of yap-1. Overall, the study provides significant insights into the Hippo pathway's role in neuronal aging and microtubule dynamics, while acknowledging remaining mechanistic gaps.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate the role of microtubule dynamics and its effects on neuronal aging. Using C. elegans as a model, the authors investigate the role of the evolutionarily conserved Hippo pathway in microtubule dynamics of touch receptor neurons (TRNs) in an age-dependent manner. Using genetic, molecular, behavioral, and pharmacological approaches, the authors show that age-dependent loss of microtubule dynamics might underlie structural and functional aging of TRNs. Further, the authors show that the Hippo pathway specifically functions in these neurons to regulate microtubule dynamics. Specifically, the authors show that hyperactivation of YAP-1, a downstream component of the Hippo pathway that is usually inhibited by the kinase activity of the upstream components of the pathway, results in microtubule stabilization, which might underlie the structural and functional decline of TRNs with age. However, how the Hippo pathway regulates microtubule dynamics and neuronal aging was not investigated by the authors.

      Strengths:

      This is a well-conducted and well-controlled study, and the authors have used multiple approaches to address different questions.

      Weaknesses:

      There are no major weaknesses identified, except that the effect of the Hippo pathway seems to be specific to only a subset of neurons. I would like the authors to address the specificity of the effect of the Hippo pathway in TRNs, in their resubmission.

      Although our genetic experiments, including TRN-specific rescue/overexpression of YAP-1 and knockdown of WTS-1, strongly suggest a cell-autonomous function of the WTS-1-YAP-1 axis in TRNs, the Hpo pathway could have broader roles in neuroprotection. While this pathway may regulate microtubule stability in multiple neurons, other characteristics of TRNs, such as their anatomical localization near the cuticle or their long projections along the body axis, could contribute to their susceptibility to age-related deformation. Alternatively, the Hpo pathway may be truly TRN-specific. TRNs have unique microtubules in terms of both composition and structure. Among the nine α- and six β-tubulin genes in C. elegans, one α-tubulin (mec-12) and one β-tubulin (mec-7) show highly enriched expression in TRNs [1, 2], and TRNs contain a special 15-protofilament microtubule structure, while all other neurons in C. elegans have 11-protofilament microtubules [3]. Transcriptional regulation through YAP-1 may affect the specific microtubule structure of TRNs, leading to premature neuronal deformation. We have included this in the discussion section of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This study examines a novel role of the Hpo signaling pathway, specifically of wts-1/LATS and the downstream regulator of gene expression, yap, in age-related neurodegeneration in C. elegans touch-responsive mechanosensory neurons, ALM and PLM. The study shows that knockdown or deletion of wts-1/LATS causes age-associated morphological abnormalities of these neurons, accompanied by functional loss of touch responsiveness. This is further associated with enhanced, abnormal, microtubule stabilization in these neurons.

      Strengths:

      This study examines a novel role of the Hpo signaling pathway, specifically of wts-1/LATS and the downstream regulator of gene expression, yap, in age-related neurodegeneration in C. elegans touch-responsive mechanosensory neurons, ALM and PLM. The study shows that knockdown or deletion of wts-1/LATS causes age-associated morphological abnormalities of these neurons, accompanied by functional loss of touch responsiveness. This is further associated with enhanced, abnormal, microtubule stabilization in these neurons. Strong pharmacological and especially genetic manipulations of MT-stabilizing or severing proteins show a strong genetic link between yap and regulation of MT stability. The study is strong and uses robust approaches, especially strong genetics. The demonstrations on the aging-related roles of the Hpo signaling pathway, and the link to MTs, are novel and compelling. Nevertheless, the study also has mechanistic weaknesses (see below).

      Weaknesses:

      Specific comments:

      (1) The study demonstrates age-specific roles of the Hpo pathway, specifically of wts-1/LATS and yap, specifically in TRN mechanosensory neurons, without observing developmental defects in these neurons, or effects in other neurons. This is a strong demonstration. Nevertheless, the study does not address whether there is a correlation of Hpo signaling pathway activity decline specifically in these neurons, and not other neurons, and at the observed L4 stage and onwards (including the first day of adulthood, 1DA stage). Such demonstrations of spatio-temporal regulation of the Hpo signaling pathway and its activation seem important for linking the Hpo pathway with the observed age-related neurodegeneration. Can this age-related response indeed be correlated with a decline in Hpo signaling during adulthood? Especially at L4 and onwards? It will be informative to measure this by examining the decline in wts-1 as well as yap levels and yap nuclear localization.

      As described above, we have included possible explanations for the specificity of the Hpo pathway in TRNs. Since components of the Hpo pathway are expressed in various tissues, including the intestine and hypodermis, this pathway could have broader neuroprotective roles across multiple neurons. Alternatively, it could function specifically in TRNs. Given that the TRNs possess unique microtubules in both structure and composition, and that the Hpo pathway has crucial roles in microtubule stability regulation, the roles of the Hpo pathway may indeed be TRN-specific. As we described in the manuscript, our observations, along with those of others, indicate that neuronal deformation of TRNs begins around the 4th day of adulthood. Additionally, the degree of morphological deformation in wts-1 mutants at the L4 stage is comparable to that of aged wild-type worms on the 15th day of adulthood. Therefore, to assess the functional decline of WTS-1 or nuclear localization of YAP-1, observations should begin in 4-day-old animals. Using fluorescence-tagged YAP-1 under the mec-4 promoter, we couldn’t detect a significant increase in nuclear YAP-1 in TRNs of 4-day-old adults. Additionally, we were unable to assess YAP-1 intracellular localization in older animals, such as 10-day-old animals, possibly due to the small cell size of neurons or morphological alterations that accompany aging of TRNs. Although we did not detect functional decline of WTS-1 or increased nuclear YAP-1 in TRNs, nuclear localization of YAP-1 increases with age in other tissues, such as the intestine and hypodermis (Author response image 1). This may result from inactivation of the Hippo (Hpo) pathway, an indirect consequence of structural and functional decline (such as tissue stiffness associated with aging), or a combination of both. Additionally, given that morphological deformation of TRNs appears to begin around the fourth day of adulthood, nuclear localization of YAP-1 in the intestine and hypodermis seems to have a later onset and be more moderate. It is possible that YAP-1 nuclear localization in TRNs occurs earlier or that other factors contribute to early-stage touch neuronal deformation.

      Author response image 1.

      Quantification of the proportion of worms exhibiting nuclear localization of YAP-1. We used GFP-tagged YAP-1 driven by its own 4 kb promoter. A total of 90 animals were observed each day.

      (2) The Hpo pathway eventually activates gene expression via yap. Although the study uses robust genetic manipulations of yap and wts-1/LATS, it is not clear whether the observed effects are attributed to yap-mediated regulation of gene expression (see 3).

      Given that the neuronal deformation in the wts-1 mutant was completely restored by the loss of yap-1 or egl-44, it strongly suggests that YAP-TEAD-mediated transcriptional regulation is responsible for the premature neuronal degeneration of the wts-1 mutant. However, in this study, we were unable to identify specific transcriptional target genes associated with these phenomena, which represents a limitation of our research (please see below).

      (3) The observations on the abnormal MT stabilization, and the subsequent genetic examinations of MT-stability/severing genes, are a significant strength of the study. Nevertheless, despite the strong genetic links to yap and wts-1/LATS, it is not clear whether MT-regulatory genes are regulated by transcription downstream of the Hpo pathway, thus not enabling a strong causal link between MT regulation and Hpo-mediated gene expression, making this strong part of the study mechanistically circumstantial. Specifically, it will be good to examine whether the genes addressed herein, for example, Spastin, are transcriptionally regulated downstream of the Hpo pathway. This comment is augmented by the finding that in the wts-1; yap-1 double mutants, MT abnormality, and subsequent neuronal morphology and touch responses are restored, clearly indicating that there is an associated transcriptional regulation.

      If the target genes of YAP-1 are not identified, it will be difficult to fully understand how YAP-1 regulates microtubule stability. Microtubule-stabilizing genes, whose knockdown alleviates wts-1 mutant neuronal deformation, could be potential transcriptional targets of YAP-1. Among these genes, PTRN-1 and DLK-1 contain MCAT sequences (CATTCCA/T), a well-conserved DNA motif recognized by the TEAD transcription factor, in their promoters near the transcription start site (TSS). We hypothesized that the expression of fluorescence-tagged reporters of promoter regions containing these MCAT sequences would be enhanced in the absence of wts-1 activity. Although both reporters were expressed in TRNs, they did not show significant changes in the wts-1 mutant background. We also focused on spv-1, a worm homolog of ARHGAP29, which negatively regulates RhoA. YAP is known to modulate actin cytoskeleton rigidity through transcriptional regulation of ARHGAP29 [4]. The promoter of spv-1 contains two MCAT sequences, and loss of spv-1 mitigated neuronal deformation of the wts-1 mutant. However, reporters of promoter regions containing MCAT sequences were only weakly expressed in the processes of TRNs. More importantly, ectopic expression of a dominant-negative form of rho-1/rhoA did not lead to significant deformation of TRNs. While YAP typically functions as a transcriptional co-activator, it has also been reported to repress target gene expression, such as DDIT4 and Trail, in collaboration with the TEAD transcription factor [5]. As the reviewer pointed out, spas-1 might be transcriptionally repressed by yap-1, given that its loss leads to premature deformation of TRNs. However, since the phenotype of the spas-1 mutant has a later onset than the wts-1 mutant and is relatively restricted to ALM, we excluded it from our candidate gene search. Despite extensive genetic approaches, we were unable to establish a strong causal link between YAP-1 and the regulation of microtubule stability. Unbiased screenings, such as tissue-specific transcriptome analysis, may help address the remaining questions. We have outlined the limitations of this study in the discussion section of the revised manuscript.
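
      As an illustration of the promoter analysis described in this response, the following is a minimal sketch of how MCAT motifs (CATTCCA/T) could be located on both strands of a promoter sequence. The function name `find_mcat_sites` and the example sequence are hypothetical and are not taken from the study; the only element drawn from the response is the motif itself.

```python
import re

# MCAT motif from the response: CATTCCA/T, i.e. CATTCC followed by A or T.
# Lookaheads are used so that overlapping occurrences are also reported.
MCAT_FORWARD = re.compile(r"(?=(CATTCC[AT]))")
# Reverse complement of the motif, for sites on the opposite strand.
MCAT_REVERSE = re.compile(r"(?=([AT]GGAATG))")


def find_mcat_sites(sequence):
    """Return sorted (position, strand, matched_text) tuples, 0-based."""
    seq = sequence.upper()
    hits = [(m.start(), "+", m.group(1)) for m in MCAT_FORWARD.finditer(seq)]
    hits += [(m.start(), "-", m.group(1)) for m in MCAT_REVERSE.finditer(seq)]
    return sorted(hits)


# Hypothetical toy sequence, not a real promoter.
example = "ggCATTCCAttTGGAATGcc"
sites = find_mcat_sites(example)
print(sites)
```

      A scan like this only nominates candidate sites; as the response notes, reporter expression is still needed to test whether a site is functionally regulated.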

      Other comments:

      (1) The TRN-specific knockdown of wts-1 and yap-1 is a clear strength. Nevertheless, these do not necessarily show cell-autonomous effects, as the yap transcription factor may regulate the expression of external cues, secreted or otherwise, thus generating non-cell autonomous effects. For example, it is known that yap regulates TGF-beta expression and signaling.

      In the absence of LATS1/2 activity, activated YAP has been reported to drive biliary epithelial cell lineage specification by directly regulating TGF-β transcription during and after liver development [6]. Even when functioning in an autocrine manner, TGF-β can exhibit non-cell autonomous effects. While it primarily acts on the same cell that secretes it, some molecules may also affect neighboring cells, leading to paracrine effects. Additionally, TGF-β can modify the extracellular matrix (ECM), indirectly affecting surrounding cells. Similarly, if YAP regulates transcription of secretory proteins in TRNs, the resulting extracellular factors or surrounding cells may influence touch neuronal microtubules in a non-cell-autonomous manner. Although our genetic data strongly suggest a cell-autonomous function of WTS-1-YAP-1 in TRNs, we could not exclude the possibility that YAP-1 functions non-cell-autonomously, as we were unable to identify its transcriptional targets. We have included this in the discussion section of the revised manuscript.

      (2) Continuing from comment (3) above, it seems that many of the MT-regulators chosen here for genetic examinations were chosen based on demonstrated roles in neurodegeneration in other studies. It would be good to show whether these MT-associated genes are directly regulated by transcription by the Hpo pathway.

      As we described above, several MT-associated genes, such as ptrn-1, dlk-1 and spv-1, contain MCAT sequences in their promoters and their knockdown alleviated wts-1-induced neuronal deformation. These genes were tested to determine whether they were directly regulated by WTS-1-YAP-1. Based on our findings, we concluded that they were unlikely to be regulated by the Hpo pathway in TRNs.

      (3) The impairment of the touch response may not be robust: it is only a 30-40% reduction at L4, and even less reduction at 1DA. It would be good to offer possible explanations for this finding.

      As pointed out by the reviewer, the touch responses of wts-1 mutants showed an approximately 33% reduction at both L4 and 1DA compared to age-matched wild-type animals. At the L4 stage, control worms responded to nearly every gentle touch (94%), whereas wts-1 mutants responded to only 60% of stimuli. By 1DA, control worms exhibited a slight decline in touch responses compared to L4 (82.5%), whereas wts-1 mutants displayed more pronounced impairment (55.7%) (Fig 1E). Given the severity and frequency of structural degeneration in wts-1 mutants at both stages, this functional impairment appears to be relatively moderate. As we noted in the manuscript, our observations, along with those of others, indicate that structural abnormalities in ALM and PLM neurons begin to appear around the fourth day of adulthood and progressively worsen as the worms age [7]. In a previous study, Tank et al. categorized day 10-aged worms into two groups based on their movement ability and then assessed structural deformation in each animal to determine whether structural and functional degeneration of TRNs were correlated. In these same groups of animals, they examined the gentle touch response and found that the animals responded to 46 ± 5.1% and 84 ± 12.2% of gentle touches, respectively [8]. Thus, day 10 animals had a touch response of about 65% on average, which is consistent with our observation in day 10 animals (Fig. 5E, 56.3%). Given these observations, the function of TRNs in wts-1 mutants or aged animals appears to be preserved despite severe structural failures. The gentle touch response evokes an escape behavior in which animals quickly move away from the stimulus; thus, proper touch responses are essential for avoiding predators and ensuring survival. It has been reported to be necessary for evading fungal predation, such as escaping from a constricting hyphal ring [9]. Given that the gentle touch response is crucial for survival, its function is likely well preserved despite structural abnormalities, such as age-related deformation.
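
      The percentages quoted in this response can be checked with a short calculation. The sketch below recomputes the relative reduction in touch response from the stated values and the simple mean of the two day-10 groups reported by Tank et al.; the helper name `relative_reduction` is illustrative, and the unweighted mean assumes the two groups were of equal size, which the response does not state.

```python
def relative_reduction(control_pct, mutant_pct):
    """Fractional drop of the mutant response relative to the control."""
    return (control_pct - mutant_pct) / control_pct


# Values quoted in the response (Fig 1E): wild type vs. wts-1 mutants.
l4_reduction = relative_reduction(94.0, 60.0)
day1_reduction = relative_reduction(82.5, 55.7)

# Unweighted mean of the two day-10 groups (46% and 84%);
# equal group sizes are an assumption made here for illustration.
day10_mean = (46.0 + 84.0) / 2

print(f"L4: {l4_reduction:.1%}, 1DA: {day1_reduction:.1%}, "
      f"day-10 mean: {day10_mean:.0f}%")
# prints "L4: 36.2%, 1DA: 32.5%, day-10 mean: 65%"
```

      Under these inputs the reductions come out at roughly 36% (L4) and 32% (1DA), in line with the approximate one-third reduction described above.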

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) Why is the effect of the Hippo pathway on microtubule dynamics specific to TRNs? Is it the structure of TRNs that makes them prone to the effects of age-dependent decline in microtubule dynamics? The authors are advised to discuss it in their resubmission.

      As described above, we have included possible explanations for the tissue specificity of the Hpo pathway in TRNs and the vulnerability of TRNs to age-associated decline in the discussion section of the revised manuscript.

      (2) The authors are advised to explain the shorter life span of wts-1; yap-1 double mutants (with restored TRNs) compared to wts-1 single mutants in Figure 2F. The life span of yap-1 single mutants should be included in Figure 2F. Further, based on the data, the shorter lifespan of wts-1 mutants cannot be attributed to abnormal TRNs as the lifespan of wts-1; yap-1 double mutants is even shorter. The authors are advised to explain the shorter life span of wts-1 mutants compared to wild-type controls.

      wts-1 is known to be involved in various developmental processes, including the maintenance of apicobasal polarity in the intestine, growth rate control, and dauer formation [10-12]. Since WTS-1 activity is restored in the intestine of the mutant used for lifespan measurement, the shorter lifespan of the wts-1 mutant may result from the loss of WTS-1 in tissues other than the intestine. Although we were unable to include lifespan data for the yap-1 mutant, recent studies indicate that the yap-1(tm1416) mutant or yap-1 RNAi-treated worms exhibit a shortened lifespan [13, 14]. Thus, our data showing a slightly shorter lifespan of the wts-1; yap-1 mutant compared with the wts-1 mutant may result from the synergistic action of yap-1 and yap-1-independent downstream factors of wts-1. While this study does not provide an explanation for the shortened lifespan of wts-1 or wts-1; yap-1 mutants, the fact that the wts-1; yap-1 double mutant with restored TRNs still has a shorter lifespan than the wts-1 mutant strongly suggests that premature deformation of the wts-1 neurons is a touch neuron-specific event, rather than being associated with the whole body, as described in the manuscript.

      Minor comments:

      (1) In the abstract, please provide definitions for LATS and YAP. Authors can mention that LATS is a kinase and YAP a transcriptional co-activator in the Hippo pathway.

      (2) In the last paragraph on page 9, change "these function" to "this function", and change "knock-downed" to "knocked down".

      (3) On page 10, paragraph 2, change "regarding the action mechanism" to "regarding the mechanism of action".

      (4) On page 11, paragraph 1, change "endogenous WTS-1 could inhibits" to "endogenous WTS-1 could inhibit".

      (5) On page 16, paragraph 1, change "consistent to the hypothesis" to "consistent with this hypothesis".

      (6) Overall, the paper is well written. However, there is still room to improve the language and diction used by the authors.

      We have addressed all of the reviewer's minor comments in the revised manuscript.

      References

      (1) Hamelin M, Scott IM, Way JC, Culotti JG. The mec-7 beta-tubulin gene of Caenorhabditis elegans is expressed primarily in the touch receptor neurons. EMBO J. 1992;11(8):2885-93. Epub 1992/08/01. doi: 10.1002/j.1460-2075.1992.tb05357.x. PubMed PMID: 1639062; PubMed Central PMCID: PMCPMC556769.

      (2) Fukushige T, Siddiqui ZK, Chou M, Culotti JG, Gogonea CB, Siddiqui SS, et al. MEC-12, an alpha-tubulin required for touch sensitivity in C. elegans. J Cell Sci. 1999;112(Pt 3):395-403. Epub 1999/01/14. doi: 10.1242/jcs.112.3.395. PubMed PMID: 9885292.

      (3) Chalfie M, Thomson JN. Structural and functional diversity in the neuronal microtubules of Caenorhabditis elegans. J Cell Biol. 1982;93(1):15-23. Epub 1982/04/01. doi: 10.1083/jcb.93.1.15. PubMed PMID: 7068753; PubMed Central PMCID: PMCPMC2112106.

      (4) Qiao Y, Chen J, Lim YB, Finch-Edmondson ML, Seshachalam VP, Qin L, et al. YAP Regulates Actin Dynamics through ARHGAP29 and Promotes Metastasis. Cell Rep. 2017;19(8):1495-502. Epub 2017/05/26. doi: 10.1016/j.celrep.2017.04.075. PubMed PMID: 28538170.

      (5) Kim M, Kim T, Johnson RL, Lim DS. Transcriptional co-repressor function of the hippo pathway transducers YAP and TAZ. Cell Rep. 2015;11(2):270-82. Epub 2015/04/07. doi: 10.1016/j.celrep.2015.03.015. PubMed PMID: 25843714.

      (6) Lee DH, Park JO, Kim TS, Kim SK, Kim TH, Kim MC, et al. LATS-YAP/TAZ controls lineage specification by regulating TGFbeta signaling and Hnf4alpha expression during liver development. Nat Commun. 2016;7:11961. Epub 2016/07/01. doi: 10.1038/ncomms11961. PubMed PMID: 27358050; PubMed Central PMCID: PMCPMC4931324.

      (7) Toth ML, Melentijevic I, Shah L, Bhatia A, Lu K, Talwar A, et al. Neurite sprouting and synapse deterioration in the aging Caenorhabditis elegans nervous system. J Neurosci. 2012;32(26):8778-90. Epub 2012/06/30. doi: 10.1523/JNEUROSCI.1494-11.2012. PubMed PMID: 22745480; PubMed Central PMCID: PMCPMC3427745.

      (8) Tank EM, Rodgers KE, Kenyon C. Spontaneous age-related neurite branching in Caenorhabditis elegans. J Neurosci. 2011;31(25):9279-88. Epub 2011/06/24. doi: 10.1523/JNEUROSCI.6606-10.2011. PubMed PMID: 21697377; PubMed Central PMCID: PMCPMC3148144.

      (9) Maguire SM, Clark CM, Nunnari J, Pirri JK, Alkema MJ. The C. elegans touch response facilitates escape from predacious fungi. Curr Biol. 2011;21(15):1326-30. Epub 2011/08/02. doi: 10.1016/j.cub.2011.06.063. PubMed PMID: 21802299; PubMed Central PMCID: PMCPMC3266163.

      (10) Cai Q, Wang W, Gao Y, Yang Y, Zhu Z, Fan Q. Ce-wts-1 plays important roles in Caenorhabditis elegans development. FEBS Lett. 2009;583(19):3158-64. Epub 2009/09/10. doi: 10.1016/j.febslet.2009.09.002. PubMed PMID: 19737560.

      (11) Kang J, Shin D, Yu JR, Lee J. Lats kinase is involved in the intestinal apical membrane integrity in the nematode Caenorhabditis elegans. Development. 2009;136(16):2705-15. Epub 2009/07/15. doi: 10.1242/dev.035485. PubMed PMID: 19605499.

      (12) Lee H, Kang J, Ahn S, Lee J. The Hippo Pathway Is Essential for Maintenance of Apicobasal Polarity in the Growing Intestine of Caenorhabditis elegans. Genetics. 2019;213(2):501-15. Epub 2019/07/29. doi: 10.1534/genetics.119.302477. PubMed PMID: 31358532; PubMed Central PMCID: PMCPMC6781910.

      (13) Teuscher AC, Statzer C, Goyala A, Domenig SA, Schoen I, Hess M, et al. Longevity interventions modulate mechanotransduction and extracellular matrix homeostasis in C. elegans. Nat Commun. 2024;15(1):276. Epub 2024/01/05. doi: 10.1038/s41467-023-44409-2. PubMed PMID: 38177158; PubMed Central PMCID: PMCPMC10766642.

      (14) Saul N, Dhondt I, Kuokkanen M, Perola M, Verschuuren C, Wouters B, et al. Identification of healthspan-promoting genes in Caenorhabditis elegans based on a human GWAS study. Biogerontology. 2022;23(4):431-52. Epub 2022/06/25. doi: 10.1007/s10522-022-09969-8. PubMed PMID: 35748965; PubMed Central PMCID: PMCPMC9388463.

    1. eLife Assessment

      This important study aims to understand the function of ProSAP-interacting protein 1 (Prosapip1) in the brain. Using a conditional Prosapip1 KO mouse (floxed prosapip1 crossed with Syn1-Cre line), the authors performed analysis including protein biochemistry, synaptic physiology, and behavioral learning. Convincing evidence from this study supports a role of Prosapip 1 in synaptic protein composition, synaptic NMDA responses, LTP, and spatial memory.

    2. Reviewer #1 (Public review):

      Summary:

      Summary of what the authors were trying to achieve: In the manuscript by Hoisington et al., the authors utilized a novel conditional neuronal prosap2-interacting protein 1 (Prosapip1) knockout mouse to delineate how both neuronal and dorsal hippocampal (dHP)-specific knockout of Prosapip1 impacts biochemical and electrophysiological neuroadaptations within the dHP that may mediate behaviors associated with this brain region.

      Strengths:

      (1) Methodological Strengths

      a) The generation and use of a conditional neuronal knockout of Prosapip1 is a strength. These mice will be useful for anyone interested in studying or comparing and contrasting the effects of loss of Prosapip1 in different brain regions or in non-neuronal tissues.

      b) The use of biochemical, electrophysiological, and behavioral approaches is a strength. By providing data across multiple domains, a picture begins to emerge about the mechanistic role for Prosapip1. While questions still remain, the use of the 3 domains is a strength.

      c) The use of both global, constitutive neuronal loss of Prosapip1 and postnatal dHP-specific knockout of Prosapip1 helps support and validate the behavioral conclusions.

      (2) Strengths of the results

      a) It is interesting that loss of Prosapip1 leads to specific alterations in the expression of GluN2B and PSD95 but not GluA1 or GluN2A in a post-homogenization fraction that the authors term a "synaptic" fraction. Therefore, these results suggest protein-specific modulation of glutamatergic receptors within a "synaptic" fraction.

      b) The electrophysiological data demonstrate an NMDAR-dependent alteration in measures of hippocampal synaptic plasticity, including long-term potentiation (LTP) and NMDAR input/output. These data correspond with the biochemical data demonstrating a biochemical effect on GluN2B localization. Therefore, the conclusion that loss of Prosapip1 influences NMDAR function is well supported.

      c) The behavioral data suggest deficits in memory, in particular novel object recognition and spatial memory, in the Prosapip1 knockout mice. These data are strongly bolstered by both the pan-neuronal knockout and the dHP Cre transduction.

      The authors highlight potential future studies to further the understanding of Prosapip1.

    3. Reviewer #2 (Public review):

      The authors provide valuable findings characterizing a Prosapip1 conditional knockout mouse and the effects of knockout on hippocampal excitatory transmission, NMDAR transmission, and several learning behaviors. Furthermore, the authors selectively and conditionally knock out Prosapip1 in the dorsal hippocampus and show that it is required for the same spatial learning and memory assessed in the conditional knockout mice. The study uncovers how Prosapip1 is involved in PSD organization and is a functional and critical player in dorsal hippocampal LTP via its interaction with GluN2B subunits. The study is well controlled and detailed, and the data in the paper match the conclusions.

      Comments on revisions:

      The authors have addressed all concerns.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors):

      The biochemical fractionation and use of the term "synaptic" were my biggest issues. I would recommend using a more targeted approach to measure the PSD or compare and contrast synaptic from extrasynaptic. For instance, PMID 16797717 does a PSD purification, whereas other papers have fractionated extrasynaptic from synaptic. Moreover, a PSD95 immunoprecipitation may be of interest, as one question that could arise is: since you see decreases in PSD95 and GluN2B, but not GluN2A or GluA1, could the association of PSD95 with the different proteins be altered? To evaluate this, proteomics or some other unbiased methodology could enhance an understanding of the full panoply of changes induced by Prosapip1 within the dHP.

      The reviewer makes valid points; however, this is a large endeavor, which we will address in future experiments.

      There seems to be a missed opportunity to really determine how Prosapip1 is influencing protein expression and/or phosphorylation at the PSD.

      There is no indication that Prosapip1 is linked to transcription or translation machinery; therefore, we don’t see the value of examining protein expression in this context. Phosphorylation is a broad term, and although this can be answered through phosphoproteomics, this is outside the scope of this study.

      At the very least, additional discussion within this realm would help the reader contextualize the biochemical data.

      Further studies are needed to determine the mechanism by which Prosapip1 controls the localization of PSD95, GluN2B, and potentially others. It is plausible that posttranslational modifications are responsible for Prosapip1 function. For example, the Prosapip1 sequence contains a potential glycosylation site (Ser622), and several potential phosphorylation sites (https://glygen.org/protein/O60299#Glycosylation, https://www.phosphosite.org/proteinAction.action?id=18395&showAllSites=true#appletMsg). These posttranslational modifications can contribute to the stabilization of the synaptic localization of GluN2B and PSD95.

      We added to the discussion the paragraph above as well as the caveat that proteomic studies are needed for a comprehensive study of the role of Prosapip1 in the PSD.

      Weaknesses:

      (1) Methodological Weaknesses

      a. The synapsin-Cre mice may more broadly express Cre-recombinase than just in neuronal tissues. Specifically, according to Jackson Laboratories, there is a concern with these mice expressing Cre-recombinase germline. As the human protein atlas suggests that Prosapip1 protein is expressed extraneuronally, validation of neuron or at least brain-specific knockout would be helpful in interpreting the data. Having said that, the data demonstrating that the brain region-specific knockout has similar behavioral impacts helps alleviate this concern somewhat; however, there are no biochemical or electrophysiological readouts from these animals, and therefore an alternative mechanism in this adult knockout cannot be excluded.

      This is a valuable insight from the reviewer, especially considering the information from Jackson Laboratories. As mentioned in the paper, we exclusively used female Syn1-Cre carrying breeders to avoid germline recombination. Furthermore, we consistently assessed the prevalence of the Prosapip1 flox sites alongside the presence of Syn1-Cre with our regular litter genotyping, confirming the presence of Prosapip1. Additionally, Prosapip1 protein expression was directly examined in rats in Wendholdt et al., 2006, where this group reported that Prosapip1 is a brain-specific protein, minimizing the potential consequences of a peripheral loss of Prosapip1. In addition, to confirm that Prosapip1 is a brain-specific protein in mice, we performed a western blot analysis on the dorsal hippocampus, liver, and kidney of a C57BL/6 mouse (Author response image 1), and found that Prosapip1 protein is not found in these peripheral organs, aligning with the findings in rats reported by Wendholdt et al.

      Author response image 1. Prosapip1 protein in the dorsal hippocampus, liver, and kidney of C57BL/6 mice.

      b. The use of the word synaptic and the crude fractionation make some of the data difficult to interpret/contextualize. It is unclear how a single centrifugation that eliminates the staining of a nuclear protein can be considered a "synaptic" fraction. This is highlighted by the presence of GAPDH in this fraction which is a cytosolically-enriched protein. While GAPDH may be associated with some membranes it is not a synaptic protein. There is no quantification of GAPDH against total protein to validate that it is not enriched in this fraction over control. Moreover, it should not be used as a loading control in the synaptic fraction. There are multiple different ways to enrich membranes, extrasynaptic fractions, and PSDs and a better discussion on the caveats of the biochemical fractionation is a minimum to help contextualize the changes in PSD95 and GluN2B.

      We apologize for the confusion. As we described in the methods section, the crude synaptosome was isolated by several centrifugations, as depicted in the figure that we are now including in the manuscript. As shown in Extended Figure 2, the P2 fraction does contain PSD-95 and synapsin, as well as GluN2B, GluN2A, and GluA1; however, it does not contain the transcription factor CREB, indicating the isolation of the crude synaptosomal fraction. As shown in the figure, a small amount of GAPDH is present in the crude synaptosomal fraction. The presence of GAPDH in the crude synaptosomal fraction has been previously reported (Ikemoto et al., 2003; Lee et al., 2016; Wang et al., 2012). As we have added to the discussion, there remains a caveat that we cannot differentiate the pre- and post-synaptic fraction, and as a result we do not know if Prosapip1 plays a role in the assembly of axonal proteins.

      c. Also, the word synaptosomal on page 7 is not correct. One issue is this is more than synaptosomes and another issue is synaptosomes are exclusively presynaptic terminals. The correct term to use is synaptoneurosome, which includes both pre and postsynaptic components. Moreover, as stated above, this may contain these components but is most likely not a pure or even enriched fraction.

      Since we cannot exclude the possibility that Prosapip1 is also expressed in glia, we do not believe that the term synaptoneurosome is accurate.

      d. The age at which the mice underwent injection of the Cre virus was not mentioned.

      We apologize for the oversight. As now noted in the methods, the mice used for experiments underwent surgery to infect neurons with the AAV-GFP or AAV-Cre viruses between 5 and 6 weeks of age to ensure full viral expression by the experimental window beginning at 8 weeks old.

      (2) Weaknesses of Results

      a. There were no measures of GluN1 or GluA2 in the biochemical assays. As GluN1 is the obligate subunit, how it is impacted by the loss of Prosapip1 may help contextualize the fact that GluN2B, but not GluN2A, is altered. Moreover, as GluA2 has different calcium permeance, alterations in it may be informative.

      Since we detect NMDAR current, which requires the obligatory subunit GluN1 and at least one GluN2 subunit (GluN2A, GluN2B, GluN2C, GluN2D), we did not see the rationale behind examining the level of GluN1 in the Prosapip1 knockout mice.

      b. While there was no difference in GluA1 expression in the "synaptic" fraction, it does not mean that AMPAR function is not impacted by the loss of Prosapip1. This is particularly important as Prosapip1 may interact with kinases or phosphatases or their targeting proteins. Therefore, measuring AMPAR function electrophysiologically or synaptic protein phosphorylation would be informative.

      We agree with the reviewer that the loss of Prosapip1 could potentially impact AMPAR function. To address this, we measured spontaneous excitatory postsynaptic currents (sEPSCs) in hippocampal pyramidal neurons from both Prosapip1(flx/flx);Syn1-Cre(-) and Prosapip1(flx/flx);Syn1-Cre(+) mice. Given that neurons were voltage-clamped at -70 mV and extracellular Mg²⁺ was maintained at 1.3 mM, the sEPSCs we recorded were primarily mediated by AMPARs.

      We found no significant differences in either the frequency or amplitude of these AMPA-mediated sEPSCs between Prosapip1(flx/flx);Syn1-Cre(-) and Prosapip1(flx/flx);Syn1-Cre(+) mice, suggesting that AMPAR function in hippocampal pyramidal neurons is not noticeably affected by the loss of Prosapip1 (see Author response image 2 below).

      Author response image 2. Comparison of hippocampal sEPSCs between Prosapip1(flx/flx);Syn1-Cre(-) (Cre(-)) and Prosapip1(flx/flx);Syn1-Cre(+) (Cre(+)) mice. sEPSCs were recorded in the presence of 1.3 mM Mg²⁺ and 0.1 mM picrotoxin, with neurons clamped at -70 mV. (A) Sample sEPSC traces from Prosapip1(flx/flx);Syn1-Cre(-) (top) and Prosapip1(flx/flx);Syn1-Cre(+) (bottom) mice. (B, C) Bar graphs showing no significant differences in sEPSC frequency (B) or amplitude (C) between Prosapip1(flx/flx);Syn1-Cre(-) and Prosapip1(flx/flx);Syn1-Cre(+) mice. Statistical analysis was performed using an unpaired t-test; p > 0.05, n.s. (not significant). Data represent 11 neurons from 3 Prosapip1(flx/flx);Syn1-Cre(-) mice (11/3) and 8 neurons from 3 Prosapip1(flx/flx);Syn1-Cre(+) mice (8/3).

      c. There is a lack of mechanistic data on what specifically and how GluN2B and PSD95 expression is altered. This is due to some of the challenges with interpreting the biochemical fractionation and a lack of results regarding changes in protein posttranslational modifications.

      See response above.

      d. The loss of social novelty measures in both the global and dHP-specific Prosapip1 knockout mice were not very robust. As they were consistently lost in both approaches and as there were other consistent memory deficits, this does not impact the conclusions, but may be important to temper discussion to match these smaller deficits within this domain.

      There is a clear difference between the Prosapip1(flx/flx);Syn1-Cre(-) and Prosapip1(flx/flx);Syn1-Cre(+) mice as well as the AAV-GFP and AAV-Cre mice in the loss of social novelty metric. We have emphasized that the Prosapip1(flx/flx);Syn1-Cre(+) mice and AAV-Cre mice do not recognize social novelty, which is supported by the statistics.

      4E: Two-way ANOVA: Effect of Social Novelty F(1,20) = 17.60, p = 0.0002; Post hoc Familiar vs. Novel (Cre(-)) p = 0.0008, Familiar vs. Novel (Cre(+)) p = 0.1451.

      5I: Two-way ANOVA: Effect of Social Novelty F(1,31) = 9.777, p = 0.0038; Post hoc Familiar vs. Novel (AAV-GFP) p = 0.0303, Familiar vs. Novel (AAV-Cre) p = 0.1319.

      e. Alterations in presynaptic paired-pulse ratio measures are intriguing and may point to a role for Prosapip1 in synapse development, as discussed in the manuscript. It would be interesting to delineate if these PPR changes also occur in the adult knockout to help detail the specific Prosapip1-induced neuroadaptations that link to the alterations in novelty-induced behaviors.

      This interesting question will be addressed in future studies.

      Reviewer #2 (Recommendations for the authors):

      (1) The test statistics are required for each experiment for completeness. Currently, only p-values, tests used, and N are included.

      The entirety of the statistical information can be found in Table 1, including test statistics and degrees of freedom (see Column 7, ‘Result’).

      (2) The authors claim that the function of Prosapip1 is not known in vivo, yet detail a study in the NAc where they investigated its function in vivo. The wording or discussion around what is and is not known should be altered to reflect this.

      The reviewer is correct to point to our previous manuscript (Laguesse et al., Neuron, 2017), in which we found that Prosapip1 is important in mechanisms underlying alcohol-associated molecular, cellular, and behavioral adaptations. However, these findings are specific to alcohol-related paradigms. Since the normal physiological role of Prosapip1 has never been delineated, this study aimed to begin addressing this gap in knowledge.

      References

      Wang, M., Li, S., Zhang, H. et al. Direct interaction between GluR2 and GAPDH regulates AMPAR-mediated excitotoxicity. Mol Brain 5, 13 (2012). https://doi.org/10.1186/1756-6606-5-13

      Ikemoto, A., Bole, D.G., Ueda, T. Glycolysis and Glutamate Accumulation into Synaptic Vesicles: Role of Glyceraldehyde Phosphate Dehydrogenase and 3-Phosphoglycerate Kinase. Journal of Biological Chemistry 278(8) (2003). https://doi.org/10.1074/jbc.M211617200

      Lee, F., Su, P., Xie, YF. et al. Disrupting GluA2-GAPDH Interaction Affects Axon and Dendrite Development. Sci Rep 6, 30458 (2016). https://doi.org/10.1038/srep30458

    1. eLife Assessment

      This valuable study investigates how the neural representation of individual finger movements changes during the early period of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide incomplete evidence of an early, swift change in the brain regions correlated with sequence learning, including a set of previously unreported frontal cortical regions. The manuscript would be strengthened by additional control analyses to rule out an influence of head movement artefacts on the findings, and by further explanation of the proposal that offline contextualization during short rest periods is the basis for performance improvement.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements, and accomplish an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain machine interfaces - one now can decode individual elements within a sequence with high precision, but these representations are not static but develop over the course of learning.

      Strengths:

      The work follows a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most, of the gains in behaviour (i.e., speed of finger movements) occur in these so-called micro-offline rest periods.

      The authors use a range of new decoding techniques, and exhaustively interrogate their data in different ways, using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed, but when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%.

      Weaknesses:

      A formal analysis and quantification of how head movement may have contributed to the results should be included in the paper or supplemental material. The type of correlated head movements coming from vigorous key presses aren't necessarily visible to the naked eye, and even if arms etc are restricted, this will not preclude shoulder, neck or head movement necessarily; if ICA was conducted, for example, the authors are in the position to show the components that relate to such movement; but eye-balling the data would not seem sufficient. The related issue of eye movements is addressed via classifier analysis. A formal analysis which directly accounts for finger/eye movements in the same analysis as the main result (ie any variance related to these factors) should be presented.

      This reviewer recommends inclusion of a formal analysis showing that the intra- and inter-parcel features are indeed completely independent. For example, the authors state that the inter-parcel features reflect "lower spatially resolved whole-brain activity patterns or global brain dynamics". A formal quantitative demonstration that the signals indeed show "complete independence" (as claimed by the authors) and are orthogonal would be helpful.
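
      One way such a quantitative check could be sketched is shown below. It computes the largest absolute Pearson correlation between any intra-parcel feature and any inter-parcel feature across samples. The function name and the (samples x features) array layout are illustrative assumptions, not the authors' pipeline, and a near-zero value would show only linear decorrelation, which is weaker than statistical independence.

```python
import numpy as np

def max_abs_cross_correlation(intra, inter):
    """Largest |Pearson r| between any column of `intra` and any column of `inter`.

    Both inputs are (n_samples, n_features) arrays (hypothetical layout).
    """
    intra = np.asarray(intra, dtype=float)
    inter = np.asarray(inter, dtype=float)
    # z-score each feature column across samples (population std)
    zi = (intra - intra.mean(axis=0)) / intra.std(axis=0)
    ze = (inter - inter.mean(axis=0)) / inter.std(axis=0)
    # cross-correlation matrix of shape (n_intra_features, n_inter_features)
    cross = zi.T @ ze / intra.shape[0]
    return float(np.abs(cross).max())
```

      A value close to zero for every feature pair would support orthogonality of the two feature sets; constant feature columns would need to be removed first, since their standard deviation is zero.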

    3. Reviewer #2 (Public review):

      Summary:

      The current paper consists of two parts. The first part is the rigorous feature optimization of the MEG signal to decode individual finger identity performed in a sequence (4-1-3-2-4; 1~4 corresponds to little~index fingers of the left hand). By optimizing various parameters for the MEG signal, in terms of (i) reconstructed source activity in voxel- and parcel-level resolution and their combination, (ii) frequency bands, and (iii) time window relative to press onset for each finger movement, as well as the choice of decoders, the resultant "hybrid decoder" achieved extremely high decoding accuracy (~95%). This part seems driven almost by pure engineering interest in gaining as high decoding accuracy as possible.

      In the second part of the paper, armed with the successful 'hybrid decoder,' the authors asked more scientific questions about how neural representation of individual finger movement that is embedded in a sequence, changes during a very early period of skill learning and whether and how such representational change can predict skill learning. They assessed the difference in MEG feature patterns between the first and the last press 4 in sequence 41324 at each training trial and found that the pattern differentiation progressively increased over the course of early learning trials. Additionally, they found that this pattern differentiation specifically occurred during the rest period rather than during the practice trial. With a significant correlation between the trial-by-trial profile of this pattern differentiation and that for accumulation of offline learning, the authors argue that such "contextualization" of finger movement in a sequence (e.g., what-where association) underlies the early improvement of sequential skill. This is an important and timely topic for the field of motor learning and beyond.

      Strengths:

      Each part has its own strength. For the first part, the use of temporally rich neural information (MEG signal) has a significant advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. For the second part, the finding of the early "contextualization" of the finger movement in a sequence and its correlation to early (offline) skill improvement is interesting and important. The comparison between "online" and "offline" pattern distance is a neat idea.

      Weaknesses:

      Despite the strengths raised, the specific goal for each part of the current paper, i.e., achieving high decoding accuracy and answering the scientific question of early skill learning, seems not to harmonize with each other very well. In short, the current approach, which is solely optimized for achieving high decoding accuracy, does not provide enough support and interpretability for the paper's interesting scientific claim. This reminds me of the accuracy-explainability tradeoff in machine learning studies (e.g., Linardatos et al., 2020). More details follow.

      There are a number of different neural processes occurring before and after a key press, such as planning of the upcoming movement and of movements further ahead in premotor/parietal cortices, motor command generation in primary motor cortex, sensory feedback related processes in sensory cortices, and performance monitoring/evaluation around the prefrontal area. Some of these may show learning-dependent change and others may not.

      Given the use of whole-brain MEG features with a wide time window (up to ~200 ms after each key press) under the situation of 3~4 Hz (i.e., 250~330 ms press interval) typing speed, these different processes in different brain regions could have contributed to the expression of the "contextualization," making it difficult to interpret what really contributed to the "contextualization" and whether it is learning related. Critically, the majority of data used for decoder training has the chance of such potential overlap of signal, as the typing speed almost reached a plateau already at the end of the 11th trial and stayed until the 36th trial. Thus, the decoder could have relied on such overlapping features related to the future presses. If that is the case, a gradual increase in "contextualization" (pattern separation) during earlier trials makes sense, simply because the temporal overlap of the MEG feature was insufficient for the earlier trials due to slower typing speed.

      Direct ways to address the above concern, at some cost to decoding accuracy, would be to use a shorter temporal window for the MEG features or to train the model on data from the early learning period only (trials 1 through 11), and then to check whether the main results are unaffected.
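
      The two controls suggested above amount to restricting the data before the decoder is trained. A minimal sketch follows, assuming epochs stored as an (epochs x channels x time samples) array with a time axis in seconds relative to keypress onset and a per-epoch practice trial number; these names and this layout are hypothetical and not taken from the manuscript.

```python
import numpy as np

def crop_window(epochs, times, tmin, tmax):
    """Keep only time samples with tmin <= t <= tmax.

    epochs: (n_epochs, n_channels, n_times); times: (n_times,) in seconds
    relative to keypress onset (hypothetical layout).
    """
    times = np.asarray(times, dtype=float)
    keep = (times >= tmin) & (times <= tmax)
    return np.asarray(epochs)[:, :, keep], times[keep]

def select_trials(epochs, labels, trial_ids, first, last):
    """Keep only epochs recorded in practice trials first..last (inclusive)."""
    trial_ids = np.asarray(trial_ids)
    keep = (trial_ids >= first) & (trial_ids <= last)
    return np.asarray(epochs)[keep], np.asarray(labels)[keep]
```

      The decoder would then be refit on the cropped or early-trial data, and the contextualization analysis repeated to see whether the effect survives.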

    4. Reviewer #3 (Public review):

      Summary:

      One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multi-scale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements. Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training, and correlates with a performance metric which the authors interpret as an indicator of offline learning.

      Strengths:

      A strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of concurrent distribution of neural coding across local circuits as well as large-scale networks. A further strength of the study is the large number of tested dimension reduction techniques and classifiers.

      Weaknesses:

      A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, which partly arise from the experimental design (mainly the use of a single sequence) and which are described below, question the neurobiological implications proposed by the authors, and provide a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence, described below, casts doubt on this assumption.

      Specifically:

      The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence, and test the classifier on other sequences that require the same movements, but in different positions (Kornysheva et al., Neuron 2019). In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4). As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 4 - supplement 2 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the keypress, up to at least ±100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress. Currently, the manuscript provides little evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context.

      During the review process, the authors pointed out that a "mixing" of temporally overlapping information from consecutive keypresses, as described above, should result in systematic misclassifications and therefore be detectable in the confusion matrices in Figures 3C and 4B, which indeed do not provide any evidence that consecutive keypresses are systematically confused. However, such absence of evidence (of systematic misclassification) should be interpreted with caution, and, of course, provides no evidence of absence. The authors also pointed out that such "mixing" would hamper the discriminability of the two ordinal positions of the index finger, given that "ordinal position 5" is systematically followed by "ordinal position 1". This is a valid point which, however, cannot rule out that "contextualization" nevertheless reflects the described "mixing".

      During the review process, the authors responded to my concern that training of a single sequence introduces the potential confound of "mixing" described above, which could have been avoided by training on several sequences, as in Kornysheva et al. (Neuron 2019), by arguing that Day 2 in their study did include control sequences. However, the authors' findings regarding these control sequences are fundamentally different from the findings in Kornysheva et al. (2019), and do not provide any indication of effector-independent ordinal information in the described contextualization - but, actually, the contrary. In Kornysheva et al. (Neuron 2019), ordinal, or positional, information refers purely to the rank of a movement in a sequence. In line with the idea of competitive queuing, Kornysheva et al. (2019) have shown that humans prepare for a motor sequence via a simultaneous representation of several of the upcoming movements, weighted by their rank in the sequence. Importantly, they could show that this gradient carries information that is largely devoid of information about the order of specific effectors involved in a sequence, or their timing, in line with competitive queuing. They showed this by training a classifier to discriminate between the five consecutive movements that constituted one specific sequence of finger movements (five classes: 1st, 2nd, 3rd, 4th, 5th movement in the sequence) and then testing whether that classifier could identify the rank (1st, 2nd, 3rd, etc) of movements in another sequence, in which the fingers moved in a different order, and with different timings. Importantly, this approach demonstrated that the graded representations observed during preparation were largely maintained after this cross-decoding, indicating that the sequence was represented via ordinal position information that was largely devoid of information about the specific effectors or timings involved in sequence execution. This result differs completely from the findings in the current manuscript. Dash et al. report a drop in detected ordinal position information (degree of contextualization in figure 5C) when testing for contextualization in their novel, untrained sequences on Day 2, indicating that context and ordinal information as defined in Dash et al. is not at all devoid of information about the specific effectors involved in a sequence. In this regard, a main concern in my public review, as well as the second reviewer's public review, is that Dash et al. cannot tell apart, by design, whether there is truly contextualization in the neural representation of a sequence (which they claim), or whether their results regarding "contextualization" are explained by what they call "mixing" in their author response, i.e., an overlap of representations of consecutive movements, as suggested as an alternative explanation by Reviewer 2 and myself.
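
      The cross-decoding logic described above (train on the ranks of one sequence, test on the ranks of a different sequence) can be sketched as follows. The nearest-centroid classifier is a simple stand-in chosen for brevity, not the classifier used in Kornysheva et al. or by the authors, and all function names and the (samples x features) layout are hypothetical.

```python
import numpy as np

def fit_centroids(X, ranks):
    """Mean feature vector per ordinal rank; X is (n_samples, n_features)."""
    classes = np.unique(ranks)
    centroids = np.stack([X[ranks == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_rank(X, classes, centroids):
    """Assign each sample to the rank with the nearest centroid (squared Euclidean)."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

def cross_sequence_accuracy(X_train, ranks_train, X_test, ranks_test):
    """Train on keypresses of one sequence, test on keypresses of another."""
    classes, centroids = fit_centroids(X_train, ranks_train)
    predicted = predict_rank(X_test, classes, centroids)
    return float((predicted == ranks_test).mean())
```

      Accuracy well above chance on the held-out sequence would indicate rank information that generalizes across effectors, whereas a drop toward chance would be consistent with effector-dependent information.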

      Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. The authors seem to argue that their regression analysis in Figure 5 - figure supplement 3 speaks against any influence of tapping speed on "ordinal coding" (even though that argument is not made explicitly in the manuscript). However, Figure 5 - figure supplement 3 shows inter-individual differences in a between-subject analysis (across trials, as in panel A, or separately for each trial, as in panel B), and, therefore, says little about the within-subject dynamics of "ordinal coding" across the experiment. A regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject, or at a group-level, after averaging across subjects) could address this issue. Given the highly similar dynamics of "ordinal coding" on the one hand (Figure 4C), and tapping speed on the other hand (Figure 1B), I would expect a strong relationship between the two in the suggested within-subject (or group-level) regression. Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time, and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. To draw that conclusion, the physical context should remain stable (or any changes to the physical context should be controlled for).
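
      The within-subject regression suggested above could look roughly like the following sketch, which fits an ordinary least-squares line of trial-by-trial "ordinal coding" on trial-by-trial tapping speed for a single subject. The function name and inputs are hypothetical; the residuals are returned so that any learning-related trend remaining after removing the linear effect of speed can be inspected.

```python
import numpy as np

def regress_coding_on_speed(speed, coding):
    """OLS of per-trial ordinal coding on per-trial tapping speed (one subject).

    Returns slope, intercept, Pearson r, and the residual coding per trial.
    """
    speed = np.asarray(speed, dtype=float)
    coding = np.asarray(coding, dtype=float)
    slope, intercept = np.polyfit(speed, coding, 1)  # degree-1 fit: highest power first
    r = np.corrcoef(speed, coding)[0, 1]
    residuals = coding - (slope * speed + intercept)
    return slope, intercept, r, residuals
```

      Running this per subject and then testing the slopes (or correlations) at the group level would speak to within-subject dynamics, which the between-subject analysis cannot.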

      A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence, but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses. Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023).

      A further complication in interpreting the results stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen. It is not clear why the authors introduced this complicating visual feedback in their task, besides consistency with their previous studies. The resulting systematic link between the pattern of visual stimulation (the number of asterisks on the screen) and the ordinal position of a keypress makes the interpretation of "contextual information" that differentiates between ordinal positions difficult. During the review process, the authors reported a confusion matrix from a classification of asterisk position based on eye tracking data recorded during the task, and concluded that the classifier performed at chance level and gaze was, thus, apparently not biased by the visual stimulation. However, the confusion matrix showed a huge bias that was difficult to interpret (a very strong tendency to predict one of the five asterisk positions, despite chance-level performance). Without including additional information for this analysis (or simply the gaze position as a function of the number of asterisks on the screen) in the manuscript, this important control analysis cannot be properly assessed, and is not available to the public.

      The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, this does not address the question whether there is a trial-by-trial relation between the degree of "contextualization" and the amount of micro-offline gains - i.e., the question whether performance changes (micro-offline gains) are less pronounced across rest periods for which the change in "contextualization" is relatively low. The single-subject correlation between contextualization changes "during" rest and micro-offline gains (Figure 5 - figure supplement 4) addresses this question; however, the critical statistical test (whether the correlation coefficients differ significantly from zero) is not included. Given the displayed distribution, it seems unlikely that correlation coefficients are significantly above zero.
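
      One way to run the missing test is sketched below: Fisher z-transform the per-participant correlation coefficients and compare their mean against zero with a sign-flip permutation test. This is only an illustration under stated assumptions. The coefficients are invented placeholders rather than values from the manuscript, and the function names and the choice of a sign-flip test (instead of, say, a one-sample t-test) are ours.

```python
import math
import random


def fisher_z(r):
    """Fisher z-transform, which makes correlation coefficients roughly normal."""
    return math.atanh(r)


def sign_flip_test(values, n_permutations=10000, seed=0):
    """One-sided test of mean(values) > 0 by randomly flipping signs."""
    rng = random.Random(seed)
    observed = sum(values) / len(values)
    exceed = 0
    for _ in range(n_permutations):
        # Under the null hypothesis each value is equally likely to be +v or -v.
        flipped = [v if rng.random() < 0.5 else -v for v in values]
        if sum(flipped) / len(flipped) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (1 + exceed) / (1 + n_permutations)


# Hypothetical per-participant correlation coefficients (placeholders only).
subject_r = [0.12, -0.05, 0.30, 0.08, -0.15, 0.02, 0.21, -0.09, 0.04, 0.10]
z_values = [fisher_z(r) for r in subject_r]
mean_z, p_value = sign_flip_test(z_values)
mean_r = math.tanh(mean_z)  # back-transform the mean to the correlation scale
```

      Reporting `mean_r` together with `p_value` would let readers judge whether the single-subject correlations are reliably above zero.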

      The authors follow the assumption that micro-offline gains reflect offline learning. However, there is no compelling evidence in the literature, and no evidence in the present manuscript, that micro-offline gains (during any training phase) reflect offline learning. Instead, emerging evidence in the literature indicates that they do not (Das et al., bioRxiv 2024): they reflect transient performance benefits when participants train with breaks, compared to participants who train without breaks, and these benefits vanish within seconds after training if both groups of participants perform under comparable conditions. During the review process, the authors argued that differences in the design between Das et al. (2024) on the one hand (Experiments 1 and 2), and the study by Bönstrup et al. (2019) on the other hand, may have prevented Das et al. (2024) from finding the assumed (lasting) learning benefit by micro-offline consolidation. However, the Supplementary Material of Das et al. (2024) includes an experiment (Experiment S1) whose design closely follows the early learning phase of Bönstrup et al. (2019), and which, nevertheless, demonstrates that there is no lasting benefit of taking breaks for the acquired skill level, despite the presence of micro-offline gains.

      Along these lines, the authors' claim, based on Bönstrup et al. 2020, that "retroactive interference immediately following practice periods reduces micro-offline learning", is not supported by that very reference. Citing Bönstrup et al. (2020), "Regarding early learning dynamics (trials 1-5), we found no differences in microscale learning parameters (micro-online/offline) or total early learning between both interference groups." That is, contrary to Dash et al.'s current claim, Bönstrup et al. (2020) did not find any retroactive interference effect on the specific behavioral readout (micro-offline gains) that the authors assume to reflect consolidation.

      The authors conclude that performance improves, and representation manifolds differentiate, "during" rest periods (see, e.g., abstract). However, micro-offline gains (as well as offline contextualization) are computed from data obtained during practice, not rest, and may, thus, just as well reflect a change that occurs "online", e.g., at the very onset of practice (like pre-planning) or throughout practice (like fatigue, or reactive inhibition). That is, the definition of micro-offline gains (as well as offline contextualization) conflates online and "offline" processes. This becomes strikingly clear in the recent Nature paper by Griffin et al. (2025), who computed micro-offline gains as the difference in average performance across the first five sequences in a practice period (a block, in their terminology) and the last five sequences in the previous practice period. Averaging across sequences in this way minimises the chance to detect online performance changes, and inflates changes in performance "offline". The problem that "offline" gains (or contextualization) are actually computed from data entirely generated online, and are therefore subject to processes that occur online, is inherent in the very definition of micro-offline gains, whether or not they are computed from averaged performance.
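
      The point can be made concrete with a small sketch of how such a gain is computed. The function name, the `window` parameter and the tapping speeds below are hypothetical and chosen for illustration only; they are not taken from any of the cited studies.

```python
def micro_offline_gain(prev_period_speeds, next_period_speeds, window=1):
    """Speed at the start of the next practice period minus speed at the
    end of the previous one, each averaged over `window` sequences.

    Both inputs are recorded during practice; no data come from the rest itself.
    """
    end_prev = prev_period_speeds[-window:]
    start_next = next_period_speeds[:window]
    return sum(start_next) / len(start_next) - sum(end_prev) / len(end_prev)


# Hypothetical tapping speeds (keypresses per second) for consecutive sequences.
prev_period = [2.0, 2.2, 2.4, 2.1, 1.9]  # slows down towards the end of practice
next_period = [2.6, 2.5, 2.7, 2.8, 2.9]

single_gain = micro_offline_gain(prev_period, next_period, window=1)
averaged_gain = micro_offline_gain(prev_period, next_period, window=5)
```

      In this made-up example, the slowdown at the end of the previous period (from 2.4 to 1.9) is an online change, yet it is counted entirely as part of the "offline" gain for either window size.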

      A simple control analysis based on shuffled class labels could lend further support to the authors' complex decoding approach. As a control analysis that completely rules out any source of overfitting, the authors could test the decoder after shuffling class labels. Following such shuffling, decoding accuracies should drop to chance-level for all decoding approaches, including the optimized decoder. This would also provide an estimate of actual chance-level performance (which is informative over and beyond the theoretical chance level). During the review process, the authors reported this analysis to the reviewers. Given that readers may consider following the presented decoding approach in their own work, it would have been important to include that control analysis in the manuscript to convince readers of its validity.
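
      Such a label-shuffling control could look roughly like the sketch below. It is not the authors' decoding pipeline: the nearest-centroid classifier, the leave-one-out scheme, the synthetic features and all function names are assumptions made to keep the example self-contained.

```python
import random


def nearest_centroid_loo_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        # Build class centroids from every sample except the held-out one.
        sums, counts = {}, {}
        for j, (xj, yj) in enumerate(zip(features, labels)):
            if j == i:
                continue
            acc = sums.setdefault(yj, [0.0] * len(xj))
            for k, v in enumerate(xj):
                acc[k] += v
            counts[yj] = counts.get(yj, 0) + 1
        centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        # Predict the class whose centroid is closest (squared Euclidean distance).
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
        correct += pred == y
    return correct / len(labels)


def shuffled_label_control(features, labels, n_permutations=100, seed=0):
    """Compare true-label accuracy with the accuracy after shuffling labels."""
    rng = random.Random(seed)
    observed = nearest_centroid_loo_accuracy(features, labels)
    null = []
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # breaks any link between features and classes
        null.append(nearest_centroid_loo_accuracy(features, shuffled))
    empirical_chance = sum(null) / len(null)
    p_value = (1 + sum(a >= observed for a in null)) / (1 + n_permutations)
    return observed, empirical_chance, p_value


def make_synthetic_data(n_per_class=12, n_classes=5, n_features=4, seed=1):
    """Synthetic 5-class data; only the first feature carries class information."""
    rng = random.Random(seed)
    features, labels = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            features.append(
                [rng.gauss(3.0 * c if k == 0 else 0.0, 1.0) for k in range(n_features)]
            )
            labels.append(c)
    return features, labels


features, labels = make_synthetic_data()
observed, empirical_chance, p_value = shuffled_label_control(features, labels)
```

      With five balanced classes, `empirical_chance` should land near the theoretical 20%, while `observed` stays well above it. The important design point is that the whole pipeline, including any feature selection or hyperparameter optimization, is re-run on each shuffled label set.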

      Furthermore, the authors' approach to cortical parcellation raises questions regarding the information carried by varying dipole orientations within a parcel (which currently seems to be ignored?) and the implementation of the mean-flipping method (given that there are two dimensions - space and time - it is unclear what the authors refer to when they talk about the sign of the "average source", line 477).

    1. eLife Assessment

      This work investigates the functional difference between the most commonly expressed form of PTH, and a mutant form of PTH, identified in a patient with chronic hypocalcemia and hyperphosphatemia which characterizes hypoparathyroidism. The authors investigate the hypothesis that this mutant PTH assumes a dimeric form in vivo and serves anabolic functions in the bone. The data are compelling and the translational aspects are fundamental in understanding PTH-1 Receptor activation.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the functional difference between the most commonly expressed form of PTH, and a novel point mutation in PTH identified in a patient with chronic hypocalcemia and hyperphosphatemia. The value of this mutant form of PTH as a potential anabolic agent for bone is investigated alongside PTH(1-84), which is a current anabolic therapy. The authors have achieved the aims of the study.

      Strengths:

      The work is novel, as it describes the function of a novel, naturally occurring, variant of PTH in terms of its ability to dimerise, to lead to cAMP activation, to increase serum calcium, and its pharmacological action compared to normal PTH.

      Comments on revisions: No further recommendations for revisions. Acceptable as the paper stands.

      [Editors' note: the original reviews are here, https://doi.org/10.7554/eLife.97579.1.sa1]

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the functional difference between the most commonly expressed form of PTH, and a novel point mutation in PTH identified in a patient with chronic hypocalcemia and hyperphosphatemia. The value of this mutant form of PTH as a potential anabolic agent for bone is investigated alongside PTH(1-84), which is a current anabolic therapy. The authors have achieved the aims of the study.

      Strengths:

      The work is novel, as it describes the function of a novel, naturally occurring, variant of PTH in terms of its ability to dimerise, to lead to cAMP activation, to increase serum calcium, and its pharmacological action compared to normal PTH.

      Recommendations for the authors:

      (1) In your response to the reviewers you included a figure. You said it was for the reviewers only. We are *not* including it here. Is that correct or should it be in the Public Reviews?

      We apologize for any confusion and appreciate your thorough review. The phrase “data only for reviewers” was intended to indicate that the content was included in the revision based on reviewers’ comments, not in the main text (article). However, we acknowledge that this phrasing may be inappropriate. We agree to have the figure included in the previous author response in the public reviews. Accordingly, we propose to revise the previous author response as follows:

      - Remove "(data only for reviewers)".

      - Correct the typo from "perosteal" to "periosteal".

      - “Thank you for your comment. First, we ensured that the bones sampled during the experiment showed no defects, and we carefully separated the femur bones from the mice to preserve their integrity. In the 3-point bending test, PTH treatment significantly increased the maximum load of the femur bone compared to the OVX-control group. Additionally, the maximum load in the PTH treatment group was significantly greater than that observed in the PTH dimer group. Furthermore, structural factors influencing bone strength, such as the periosteal perimeter and the endocortical bone perimeter, were also increased in the PTH treatment group compared to the PTH dimer group.”

      (2) Do you mean to always have R<sup>0</sup> (have a superscript) and RG (never have a superscript) or should they be shown in the same way throughout your paper?

      Thank you for your thorough review. Based on previous studies that addressed the conformation of PTH1R, R<sup>0</sup> is typically shown with a superscript, while RG is not (Hoare et al., 2001; Dean et al., 2006; Okazaki et al., 2008). We have followed this notation and will ensure consistency throughout our paper.

      Hoare, S. R., Gardella, T. J., & Usdin, T. B. (2001). Evaluating the signal transduction mechanism of the parathyroid hormone 1 receptor: effect of receptor-G-protein interaction on the ligand binding mechanism and receptor conformation. Journal of Biological Chemistry, 276(11), 7741-7753.

      Dean, T., Linglart, A., Mahon, M. J., Bastepe, M., Jüppner, H., Potts Jr, J. T., & Gardella, T. J. (2006). Mechanisms of ligand binding to the parathyroid hormone (PTH)/PTH-related protein receptor: selectivity of a modified PTH (1–15) radioligand for GαS-coupled receptor conformations. Molecular endocrinology, 20(4), 931-943.

      Okazaki, M., Ferrandon, S., Vilardaga, J. P., Bouxsein, M. L., Potts Jr, J. T., & Gardella, T. J. (2008). Prolonged signaling at the parathyroid hormone receptor by peptide ligands targeted to a specific receptor conformation. Proceedings of the National Academy of Sciences, 105(43), 16525-16530.

      (3) The following grammatical and fact changes and word changes are requested.

      We appreciate the thoughtful review and thank you for pointing out the grammatical, factual, and word changes required. We have carefully reviewed and addressed each of these corrections to ensure the paper's accuracy and readability.

      We appreciate the reviewers' detailed and constructive reviews. We have addressed all the comments to improve the quality of our paper.

    1. eLife Assessment

      Catani and colleagues provide data on antigenic properties of neuraminidase proteins of pandemic H1N1 and show that antigenic diversity of the neuraminidase from 2009 to 2020 largely falls into two groups. These antigenic groups map to two phylogenetic groups, and substitutions at positions 432 and 321 are likely associated with the antigenic change. These data and results allow useful insights into the antigenic properties of N1 influenza and the evidence supporting the conclusions is solid.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors have performed an antigenic assay for human seasonal N1 neuraminidase using antigens and mouse sera from 2009-2020 (with one avian N1 antigen). This shows two distinct antigen groups. There is poorer reactivity with sera from 2009-2012 against antigens from 2015-2019, and poorer reactivity with sera from 2015-2020 against antigens from 2009-2013. There is a long branch separating these two groups. However, 321 and 432 are the only two positions that are consistently different between the two groups. Therefore these are the most likely cause of these antigenic differences.

      Strengths:

      (1) A sensible rationale was given for the choice of sera, in terms of the genetic diversity.

      (2) There were two independent batches of one of the antigens used for generating sera, which demonstrated the level of heterogeneity in the experimental process.

      (3) Replicate of the Wisconsin/588/2019 antigen (as H1 and H6) is another useful measure of heterogeneity.

      (4) The presentation of the data, e.g. Figure 2, clearly shows two main antigenic groups.

      (5) The most modern sera are more recent than those in other related papers, which demonstrates that there has been no major antigenic change.

      Weaknesses:

      (1) Issues with experimental methods

      As I am not an experimentalist, I cannot comment fully on the experimental methods. However, I note that BALB/c mice sera were used, whereas outbred ferret sera are typically used in influenza antigenic characterisation, so the antigenic difference observed may not be relevant in humans. Similarly, the mice were immunised with an artificial NA immunogen where the typical approach would be to infect the ferret with live virus intra-nasally.

      (2) Five mice sera were generated per immunogen and then pooled, but data was not presented that demonstrated these sera were sufficiently homogenous that this approach is valid.

      (3) There were no homologous antigens for most of the sera. This makes the responses difficult to interpret as the homologous titre is often used to assess the overall reactivity of a serum. The sequence of the antigens used is not described, which again makes it difficult to interpret the results.

      (4) To be able to untangle the effects of the individual substitutions at 321, 386, and 432, it would have been useful to have included the naturally occurring variants at these positions, or to have generated mutants at these positions. Gao et al clearly show an antigenic difference with ferret sera correlated separately with N386K and I321V/K432E.

      (5) The challenge experiments in Gao et al showed that NI titre was not a good correlate of protection, so that limits the interpretation of these results.

      Issues with the computational methods

      (6) The NAI titres were normalised using the ELISA results, and the motivation for this is not explained. It would be nice to see the raw values.

      (7) It is not clear what value the random forest analysis adds here, given that positions 321 and 432 are the only two that consistently differ between the two groups.

      (8) As with the previous N2 paper, the metric for antigenic distance (the root mean square of the difference between the titres for two sera) is not one that would be consistent when different sera are included. More usual metrics of distance are Archetti-Horsfall, fold down from homologous, or fold down from maximum.

      (9) Antigenic cartography of these data is fraught. I wonder whether 2 dimensions are required for what seems like a 1-dimensional antigenic difference - certainly, the antigens, excluding the H5N1, are in a line. The map may be skewed by the high reactivity Brisbane/18 antigen. It is not clear if the column bases (normalisation factors for calculating antigenic distance) have been adjusted to account for the lack of homologous antigens. It is typical to present antigenic maps with a 1:1 x:y ratio.

      Issues with interpretation

      (10) Figure 2 shows the NAI titres split into two groups for the antigens, however, A/Brisbane is an outlier in the second antigenic group with high reactivity.

      (11) Following Gao et al, I think you can claim that it is more likely that the antigenic change is due to K432E than I321V, based on a comparison of the amino acid change.

      Appraisal:

      Taking into account the limitations of the experimental techniques (which I appreciate are due to resource constraints), this paper meets its aim of measuring the antigenic relationships between 2009-2020 seasonal N1s, showing that there were two main groups. The authors discovered that the difference between the two antigenic groups was likely attributable to positions 321 and 432, as these were the only two positions that were consistently different between the two groups. They came to this finding by using a random forest model, but other simpler methods could have been used.

      Impact:

      This paper contributes to the growing literature on the potential benefit of NA in the influenza vaccine.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Catani et al. have immunized mice with 17 recombinant N1 neuraminidases (NAs) from human isolates circulating between 2009-2020 to investigate antigenic diversity. NA inhibition (NAI) titers revealed two groups that were antigenically and phylogenetically distinct. Machine learning was used to estimate the antigenic distances between the N1 NAs and mutations at residues K432E and I321V were identified as key determinants of N1 NA antigenicity.

      Strengths:

      Observation of mutations associated with N1 antigenic drift.

      Weaknesses:

      Validation that K432E and I321V are responsible for antigenic drift was not determined in a background strain with native K432 and I321 or the restitution of antibody binding by reversion to K432 and I321 in strains that evaded sera.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors have performed an antigenic assay for human seasonal N1 neuraminidase using antigens and mouse sera from 2009-2020 (with one avian N1 antigen). This shows two distinct antigen groups. There is poorer reactivity with sera from 2009-2012 against antigens from 2015-2019, and poorer reactivity with sera from 2015-2020 against antigens from 2009-2013. There is a long branch separating these two groups. However, 321 and 432 are the only two positions that are consistently different between the two groups. Therefore these are the most likely cause of these antigenic differences.

      Strengths:

      (1) A sensible rationale was given for the choice of sera, in terms of the genetic diversity.

      (2) There were two independent batches of one of the antigens used for generating sera, which demonstrated the level of heterogeneity in the experimental process.

      (3) Replicate of the Wisconsin/588/2019 antigen (as H1 and H6) is another useful measure of heterogeneity.

      (4) The presentation of the data, e.g. Figure 2, clearly shows two main antigenic groups.

      (5) The most modern sera are more recent than those in other related papers, which demonstrates that there has been no major antigenic change.

      Weaknesses:

      (1) Issues with experimental methods

      As I am not an experimentalist, I cannot comment fully on the experimental methods. However, I note that BALB/c mice sera were used, whereas outbred ferret sera are typically used in influenza antigenic characterisation, so the antigenic difference observed may not be relevant in humans. Similarly, the mice were immunised with an artificial NA immunogen where the typical approach would be to infect the ferret with live virus intra-nasally.

      Indeed, ferrets are the gold standard model for the study of influenza. The main reason for this is the susceptibility of ferrets to infection with primary human influenza virus isolates and their ability to transmit human influenza A and B viruses. Although mouse models often require the use of mouse-adapted influenza virus strains, mice are still the most widely used model for studying new developments in influenza vaccines.

      In our previous publication we performed a parallel analysis of sera from ferrets that were primed by infection and boosted with recombinant protein, and from mice that, as in this study that focuses on N1 NA, were prime-boosted with purified recombinant NA proteins in the presence of an adjuvant. Our data indicate that the NAI responses in immune sera from ferrets, after infection and after boost, enable a similar antigenic classification and correlate strongly with those induced in mice that had been prime-boosted with adjuvanted recombinant NA (Catani et al., eLife 2024). To a large extent, the immunogenicity of an antigen relies on epitope accessibility, which may dictate a universal rule of immunogenicity and antigenicity (Altman et al., 2015).

      (2) Five mice sera were generated per immunogen and then pooled, but data was not presented that demonstrated these sera were sufficiently homogenous that this approach is valid.

      Although individual sera were not tested here, based on previous studies from our group we are confident that a prime-boost schedule with 1 µg of adjuvanted soluble tetrameric NA induces a highly homogeneous response in mice (Catani et al., 2022).

      (3) There were no homologous antigens for most of the sera. This makes the responses difficult to interpret as the homologous titre is often used to assess the overall reactivity of a serum. The sequence of the antigens used is not described, which again makes it difficult to interpret the results.

      The absence of homologous antigens may indeed make interpretation more difficult. However, we have observed that homologous sera do not always coincide with the highest reactivity, although highest reactivity is always found within an antigenic cluster. A sequence comparison would be appropriate to improve interpretability of the data. Therefore, a sequence alignment and a pairwise comparison will be provided in the revised manuscript as supplement. 

      (4) To be able to untangle the effects of the individual substitutions at 321, 386, and 432, it would have been useful to have included the naturally occurring variants at these positions, or to have generated mutants at these positions. Gao et al clearly show an antigenic difference with ferret sera correlated separately with N386K and I321V/K432E.

      The prevalence of single amino acid substitutions in N1 NA of clinical H1N1 virus strains isolated between 2009 and 2024 is minimal, which may indicate reduced fitness (see Author response image 1) in strains with these substitutions in NA. Nevertheless, we agree that the rescue of single mutants would provide important evidence to untangle those individual impacts on antigenicity. We plan to generate mutants with substitution at these positions in NA of A/Wisconsin/588/2019 H1N1 and determine the NAI against our panel of sera.

      Author response image 1.

      Prevalence of the indicated N1 NA substitutions in all clinical human H1N1 isolates with unique sequences deposited in the GISAID data bank since 2009.

      (5) The challenge experiments in Gao et al showed that NI titre was not a good correlate of protection, so that limits the interpretation of these results.

      On the contrary, the challenge experiments confirmed that drift occurred in NA from H1N1 viruses isolated between 2009 (CA/09) and 2015 (MI/15). The dilution of transferred sera to equal inhibitory titers indicates that the homologous ferret sera (shown in figure 5e-f) (Gao et al., 2019) are still effective in protecting against infection while heterologous sera are not. This result emphasises that the nature of the homologous NAI response is well-suited for protection against a homologous challenge, although mechanistic data were not provided.

      Issues with the computational methods

      (6) The NAI titres were normalised using the ELISA results, and the motivation for this is not explained. It would be nice to see the raw values.

      Mice were immunized with different batches of recombinant protein. Each of those batches may have distinct intrinsic immunogenicity, as observed in Figure 1d. For that reason, NAI values were normalized using homologous ELISA titers induced by each respective NA antigen. A table with the raw values will be included in the revised manuscript.

      (7) It is not clear what value the random forest analysis adds here, given that positions 321 and 432 are the only two that consistently differ between the two groups.

      The substitutions at positions 321 and 432 are indeed the only 2 consistently differing amino acids among the tested N1s. Although their correlation with antigenic clustering may be obvious after analysis, a random forest analysis could reveal less obvious substitutions that contribute to the antigenic diversity. In the future, we intend to expand this methodology to strains that are not currently included in the panel. A random forest model is a relatively simple and performant method to deal with a new dataset.

      (8) As with the previous N2 paper, the metric for antigenic distance (the root mean square of the difference between the titres for two sera) is not one that would be consistent when different sera are included. More usual metrics of distance are Archetti-Horsfall, fold down from homologous, or fold down from maximum.

      The antigenic distances calculated prior to our random forest analysis do use fold-difference as the metric, computed as log2(max(EC50) / EC50). After having obtained the fold-difference values, a pairwise dissimilarity matrix was calculated to obtain the average antigenic distance between pairs of sera. A more detailed description of the methodology will be included in the methods section, including the R-code.
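
      As an illustration only, the two steps described above can be sketched as follows. This is Python rather than the R code the authors mention, and it rests on assumptions not confirmed by the text: that the maximum EC50 is taken per serum, and that the pairwise dissimilarity is the root mean square difference named by the reviewer. The titres and function names are hypothetical.

```python
import math


def fold_down_from_max(titres):
    """log2(max(EC50) / EC50) for one serum across all antigens."""
    top = max(titres)
    return [math.log2(top / t) for t in titres]


def rms_distance(a, b):
    """Root mean square difference between two fold-difference profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pairwise_distances(titre_table):
    """Dissimilarity for every ordered pair of sera in the table."""
    names = list(titre_table)
    folds = {name: fold_down_from_max(titre_table[name]) for name in names}
    return {(p, q): rms_distance(folds[p], folds[q]) for p in names for q in names}


# Hypothetical EC50 titres: one row per serum, one column per antigen.
titres = {
    "serum_A": [1600.0, 800.0, 100.0],
    "serum_B": [800.0, 400.0, 50.0],
    "serum_C": [100.0, 400.0, 1600.0],
}
distances = pairwise_distances(titres)
```

      In this toy table, `serum_B` is uniformly half as potent as `serum_A`, so their distance is zero after normalising by each serum's maximum, whereas `serum_C` has the opposite reactivity pattern and is far from both. Because the distance averages over all antigens in the panel, its value depends on which antigens are included.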

      (9) Antigenic cartography of these data is fraught. I wonder whether 2 dimensions are required for what seems like a 1-dimensional antigenic difference - certainly, the antigens, excluding the H5N1, are in a line. The map may be skewed by the high reactivity Brisbane/18 antigen. It is not clear if the column bases (normalisation factors for calculating antigenic distance) have been adjusted to account for the lack of homologous antigens. It is typical to present antigenic maps with a 1:1 x:y ratio.

      Antigenic cartography will be repeated excluding H5N1 and/or Brisbane/18 antigen. Data will be provided in the final rebuttal letter.

      Issues with interpretation

      (10) Figure 2 shows the NAI titres split into two groups for the antigens, however, A/Brisbane is an outlier in the second antigenic group with high reactivity.

      Indeed, A/Brisbane/02/2018 has overall higher IC50 values. However, it still falls into the same cluster that we called AG2. Highlighting A/Brisbane/02/2018 may lead to the misinterpretation of a non-existent antigenic group. 

      (11) Following Gao et al, I think you can claim that it is more likely that the antigenic change is due to K432E than I321V, based on a comparison of the amino acid change.

      Indeed, we would expect that substitution of the basic lysine by an acidic glutamate is more likely to impact antigenicity than the apolar isoleucine-to-valine substitution. Testing of mutant reassortants with single mutations may provide the definitive answer to that question.

      Appraisal:

      Taking into account the limitations of the experimental techniques (which I appreciate are due to resource constraints), this paper meets its aim of measuring the antigenic relationships between 2009-2020 seasonal N1s, showing that there were two main groups. The authors discovered that the difference between the two antigenic groups was likely attributable to positions 321 and 432, as these were the only two positions that were consistently different between the two groups. They came to this finding by using a random forest model, but other simpler methods could have been used.

      Impact:

      This paper contributes to the growing literature on the potential benefit of NA in the influenza vaccine.

      Reviewer #2 (Public review):

      Summary:

      In this study, Catani et al. have immunized mice with 17 recombinant N1 neuraminidases (NAs) from human isolates circulating between 2009-2020 to investigate antigenic diversity. NA inhibition (NAI) titers revealed two groups that were antigenically and phylogenetically distinct. Machine learning was used to estimate the antigenic distances between the N1 NAs and mutations at residues K432E and I321V were identified as key determinants of N1 NA antigenicity.

      Strengths:

      Observation of mutations associated with N1 antigenic drift.

      Weaknesses:

      Validation that K432E and I321V are responsible for antigenic drift was not determined in a background strain with native K432 and I321 or the restitution of antibody binding by reversion to K432 and I321 in strains that evaded sera.

      Reassortant A/Wisconsin/588/2019 with E432K, V321I and also K386N single mutations will be rescued and tested against the panel of sera.

    1. eLife Assessment

      The study by Chi and colleagues presents important new tools for precise genetic manipulation and lineage tracing in mice. The characterization of these new models was conducted using validated, state-of-the-art methodologies and convincingly demonstrates their ability to enhance the precision of genetic manipulation in distinct cell types. This work will be of great interest to many laboratories worldwide and will facilitate future research across various biomedical disciplines.

    2. Reviewer #1 (Public review):

      Summary:

      Shi and colleagues report the use of modified Cre lines in which the coding region of Cre is disrupted by rox-STOP-rox or lox-STOP-lox sequences to prevent the expression of functional protein in the absence of Dre or Cre activity, respectively. The main purpose of these tools is to enable intersectional or tamoxifen-induced Cre activity with minimal or no leaky activity from the second, Cre-expressing allele. It is a nice study but lacks some functional data required to determine how useful these alleles will be in practice, especially in comparison with the figure line that stimulated their creation.

      Strengths:

      The new tools can reduce Cre leak in vivo.

      Comments on revisions:

      The major improvement in my mind is the inclusion of Supp Fig 7 where the authors compare their loxCre to iSureCre. The discussion is somewhat improved, but still fails to discuss significant issues such as Cre toxicity in detail. As noted by most reviewers, without a biological question the paper is entirely a technical description of a couple of new tools. However, I do feel that these tools will be of use to the field.

    3. Reviewer #2 (Public review):

      This work presents new genetic tools for enhanced Cre-mediated gene deletion and genetic lineage tracing. The authors optimise and generate mouse models that convert temporally controlled CreER or DreER activity to constitutive Cre expression, coupled with the expression of a tdT reporter for visualizing and tracing gene-deleted cells. This was achieved by inserting a stop cassette into the coding region of Cre, splitting it into N- and C-terminal segments. Removal of the stop cassette by Cre-lox or Dre-rox recombination results in the generation of modified Cre that is shown to exhibit similar activity to native Cre. The authors further demonstrate efficient gene knockout in cells marked by the reporter using these tools, including intersectional genetic targeting of pericentral hepatocytes.

      The new models offer several important advantages. They enable tightly controlled and highly effective genetic deletion of even alleles that are difficult to recombine. By coupling Cre expression to reporter expression, these models reliably report Cre-expressing i.e. gene-targeted cells and circumvent false positives that can complicate analyses in genetic mutants relying on separate reporter alleles. Moreover, the combinatorial use of Dre/Cre permits intersectional genetic targeting, allowing for more precise fate mapping.

      The study and the new models also have some limitations. The efficient deletion of multiple floxed alleles in a mosaic fashion, a scenario where the lines would demonstrate their full potential compared to existing models, has not been tested in the current study. Mosaic genetics is increasingly recognized as a key methodology for assessing cell-autonomous gene functions. The challenge lies in performing such experiments, as the low doses of tamoxifen needed for inducing mosaic gene deletion may not be sufficient to efficiently recombine multiple alleles in individual cells while at the same time accurately reporting gene deletion. In addition, as discussed by the authors, a limitation of this line is the constitutive expression of Cre, which is associated with toxicity in some cases.

    4. Reviewer #3 (Public review):

      Shi et al describe a new set of tools to facilitate Cre or Dre-recombinase-mediated recombination in mice. The strategies are not completely novel but have been pursued previously by the lab, which is world-leading in this field, and by others. The authors report a new version of the iSuRe-Cre approach, which was originally developed by Rui Benedito's group in Spain. Shi et al describe that their approach shows reduced leakiness compared to the iSuRe-Cre line. Furthermore, a new R26-roxCre-tdT mouse line was established after extensive testing, which enables efficient expression of the Cre recombinase after activation of the Dre recombinase. The authors carefully evaluated efficiency and leakiness of the new line and demonstrated the applicability by marking peri-central hepatocytes in an intersectional genetics approach. The paper represents the result of enormous, carefully executed efforts. Although I would have preferred to see a study, which uses the wonderful new tools to address a major biological question, carefully conducted technical studies have a considerable value for the scientific community, justifying publication.

      It seems very likely that the new mouse lines generated in this study will enhance the precision of genetic manipulation in distinct cell types and greatly facilitate future work in numerous laboratories. The authors have expertly eliminated the weaknesses of the initial submission. One minor issue remains. The authors did not investigate potential toxic effects that might be caused by high-level expression of a combination of "foreign" genes such as recombinases and fluorescence reporters. The authors refer to published studies about toxic effects, speculating that they can only be prevented by removing recombinases in an additional step. Although this is a valid argument, I would have appreciated seeing an assessment of putative toxic effects by RNA-sequencing, since different combinations of recombinases and fluorescence reporters can sometimes generate unexpected effects. However, this minor issue does not compromise the value of this important study.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) It is a nice study but lacks some functional data required to determine how useful these alleles will be in practice, especially in comparison with the figure line that stimulated their creation.

      We are grateful for this comment. Regarding the usefulness of these alleles, Figure 3 shows that specific and efficient genetic manipulation of one cell subpopulation can be achieved by crossing the DreER mouse strain with the rox-Cre mouse strain. In addition, Figure 6 shows that R26-loxCre-tdT can effectively ensure Cre-loxP recombination at some gene alleles for genetic manipulation. The expression of the tdT protein is aligned with the expression of the Cre protein (Alb roxCre-tdT and R26-loxCre-tdT, Figure 2 and Figure 5), which ensures the accuracy of the tracing experiments. We believe more functional data can be shown in future articles that use the mouse lines described in this manuscript.

      (2) The data in Figure 5 show strong activity at the Confetti locus, but the design of the newly reported R26-loxCre line lacks a WPRE sequence that was included in the iSure-Cre line to drive very robust protein expression.

      Thank you for bringing up this point in the manuscript. In the R26-loxCre-tdT mice knock-in strategy, the WPRE sequence is added behind the loxCre-P2A-tdT sequence, as shown in Supplementary Figure 9.

      (3) The most valuable experiment for such a new tool would be a head-to-head comparison with iSure (or the latest iSure version from the Benedito lab) using the same CreER and target floxed allele. At the very least, a comparison of Cre protein expression between the two lines using identical CreER activators is needed.

      Thank you for your valuable and insightful comment. The comparison results of R26-loxCre-tdT with iSuRe-Cre using Alb-CreER and targeting R26-Confetti can be found in Supplementary Figure 7 C-E, according to the reviewer’s suggestion.

      (4) Why did the authors not use the same driver to compare mCre 1, 4, 7, and 10? The study in Figure 2 uses Alb-roxCre for 1 and 7 and Cdh5-roxCre for 4 and 10, with clearly different levels of activity driven by the two alleles in vivo. Thus whether mCre1 is really better than mCre4 or 10 is not clear.

      Thank you for raising this concern. After screening out four robust versions of mCre, we generated these four roxCre knock-in mice. We could not predict in advance which mCre version would be the most robust in vivo; it might be that one or two mCre versions work efficiently. For example, if Alb-mCre1 were competitive with Cdh5-mCre10, we could use them for targeting genes in different cell types, broadening the potential utility of these mice.

      (5) Technical details are lacking. The authors provide little specific information regarding the precise way that the new alleles were generated, i.e. exactly what nucleotide sites were used and what the sequence of the introduced transgenes is. Such valuable information must be gleaned from schematic diagrams that are insufficient to fully explain the approach.

      We appreciate your thoughtful suggestions. The schematic figures, along with the nucleotide sequences for the generation of mice, can be found in the revised Supplementary Figure 9.

      Reviewer #2 (Public Review):

      (1) The scenario where the lines would demonstrate their full potential compared to existing models has not been tested.

      Thank you for your thoughtful and constructive comment. The comparative analysis of R26-loxCre-tdT with iSuRe-Cre, employing Alb-CreER to target R26-Confetti, is provided in Supplementary Figure 7 C-E.

      (2) The challenge lies in performing such experiments, as low doses of tamoxifen needed for inducing mosaic gene deletion may not be sufficient to efficiently recombine multiple alleles in individual cells while at the same time accurately reporting gene deletion. Therefore, a demonstration of the efficient deletion of multiple floxed alleles in a mosaic fashion would be a valuable addition.

      Thank you for your constructive comments. Mosaic analysis using sparse labeling and efficient gene deletion would be our future direction using roxCre and loxCre strategies.

      (3) When combined with the confetti line, the reporter cassette will continue flipping, potentially leading to misleading lineage tracing results.

      Thank you for your professional comments. Indeed, the Confetti reporter used in this study can continue flipping, which could lead to misleading lineage tracing results. We used R26-Confetti to demonstrate the robustness of mCre for recombination. Some multicolor mouse lines that don’t flip have been published, for example, R26-Confetti2 (10.1038/s41588-019-0346-6) and Rainbow (10.1161/CIRCULATIONAHA.120.045750). These reporters could be used for tracing Cre-expressing cells without concerns about flipping of the reporter cassette.

      (4) Constitutive expression of Cre is also associated with toxicity, as discussed by the authors in the introduction.

      Thank you for your professional comments. The toxicity of constitutive expression of Cre and the toxicity associated with tamoxifen treatment in CreER mouse lines (10.1038/s44161-022-00125-6) are known to the field. This study can’t solve the toxicity of constitutive Cre expression. Many mouse lines with constitutive Cre driven by different promoters are in use across various fields and present similar toxicity. To solve this issue, it would be possible to construct a new strategy that enables the removal of Cre after its expression.

      Reviewer #3 (Public Review):

      (1) Although leakiness is rather minor according to the original publication and the senior author of the study wrote in a review a few years ago that there is no leakiness (https://doi.org/10.1016/j.jbc.2021.100509).

      Thank you so much for your careful check. In this review (https://doi.org/10.1016/j.jbc.2021.100509), the writer’s comments on iSuRe-Cre are on the reader's side, and all summary words are based on the original published paper (10.1038/s41467-019-10239-4). Currently, we have tested iSuRe-Cre in our hands. We did detect some leakiness in the heart and muscle, but hardly in other tissues as shown in Author response image 1.

      Author response image 1.

      Leakiness in the Alb-CreER;iSuRe-Cre mouse line. Pictures are representative results for 5 mice. Scale bars, white 100 µm.

      (2) I would have preferred to see a study, which uses the wonderful new tools to address a major biological question, rather than a primarily technical report, which describes the ongoing efforts to further improve Cre and Dre recombinase-mediated recombination.

      We gratefully appreciate your valuable comment. The roxCre and loxCre mice mentioned in this study provide more effective methods for inducible genetic manipulation in studying gene function. We hope that the application of our new genetic tools could help address some major biological questions in different biomedical fields in the future.

      (3) Very high levels of Cre expression may cause toxic effects as previously reported for the hearts of Myh6-Cre mice. Thus, it seems sensible to test for unspecific toxic effects, which may be done by bulk RNA-seq analysis, cell viability, and cell proliferation assays. It should also be analyzed whether the combination of R26-roxCre-tdT with the Tnni3-Dre allele causes cardiac dysfunction, although such dysfunctions should be apparent from potential changes in gene expression.

      We are sorry that we mistakenly wrote R26-roxCre-tdT instead of R26-loxCre-tdT in our manuscript. We have not generated the R26-roxCre-tdT mouse line. We also thank the reviewer for raising concerns about the toxicity of high Cre expression. The toxicity of constitutive expression of Cre and the toxicity of tamoxifen treatment in CreER mouse lines (10.1038/s44161-022-00125-6) are known to the field. This study can’t solve the toxicity of constitutive Cre expression. Many mouse lines with constitutive Cre driven by different promoters are in use across various fields and present similar toxicity. To solve this issue, it would be possible to construct a new strategy that enables the removal of Cre after its expression.

      (4) Is there any leakiness when the inducible DreER allele is introduced but no tamoxifen treatment is applied? This should be documented. The same also applies to loxCre mice.

      In this study, we developed new mouse tool lines, including Alb roxCre1-tdT, Cdh5 roxCre4-tdT, Alb roxCre7-GFP, Cdh5 roxCre10-GFP and R26-loxCre-tdT. As shown in Supplementary Figure 1, Supplementary Figure 2, and Figure 4D, Alb roxCre1-tdT, Cdh5 roxCre4-tdT, Alb roxCre7-GFP, Cdh5 roxCre10-GFP and R26-loxCre-tdT are not leaky. Therefore, if any leakiness is observed when an inducible DreER or CreER allele is introduced, it is derived from the DreER or CreER. Additional pertinent experimental data can be found in Figure S4C, Figure S7A-B, and Figure S8A.

      (5) It would be very helpful to include a dose-response curve for determining the minimum dosage required in Alb-CreER; R26-loxCre-tdT; Ctnnb1flox/flox mice for efficient recombination.

      Thank you for your suggestion. We value your feedback and have incorporated your suggestion to strengthen our study. Relevant experimental data can be referenced in Figure S8E-G.

      (6) In the liver panel of Figure 4F, tdT signals do not seem to colocalize with the VE-cad signals, which is odd. Is there any compelling explanation?

      The staining in Figure 4F in the revision is intended to deliver optimized and high-resolution images.

      (7) The authors claim that "virtually all tdT+ endothelial cells simultaneously expressed YFP/mCFP" (right panel of Figure 5D). Well, it seems that the abundance of tdT is much lower compared to YFP/mCFP. If the recombination of R26-Confetti was mainly triggered by R26-loxCre-tdT, the expression of tdT and YFP/mCFP should be comparable. This should be clarified.

      Thank you so much for your careful check. We checked these signals carefully and didn't find a “much lower” tdT signal. Because the file-upload website has a file size limitation, image compression made some of the signal unclear. We have attached clear high-resolution images here. Author response image 2 shows how we split the tdT signal and compared it with YFP/mCFP.

      Author response image 2.

      (8) In several cases, the authors seem to have mixed up "R26-roxCre-tdT" with "R26-loxCre-tdT". There are errors in #251 and #256. Furthermore, in the passage from line #278 to #301. In the lines #297 and #300 it should probably read "Alb-CreER;R26-loxCre-tdT;Ctnnb1flox/flox" rather than "Alb-CreER;R26-tdT2;Ctnnb1flox/flox".

      We are grateful for these careful observations. We have corrected these typos accordingly.

      Recommendations for the authors:

      Reviewer #1:

      (1) However, for it to be useful to investigators a more direct comparison with the Benedito iSure line (or the latest version) is required as that is the crux of the study.

      Thank you for emphasizing this point, which we have now addressed in the revised manuscript and in Figure S7D-G.

      (2) I would like to know how the authors will make these new lines available to outside investigators.

      Please contact the lead author by email to consult about the availability of new mouse lines developed in this study.

      (3) The discussion is overly long and fails to address potential weaknesses. Much of it reiterates what was already said in the results section.

      We are thankful for your critical evaluation, which has helped us improve our discussion.

      Reviewer #2:

      (1) Assessing the efficiency and accuracy of the lines in mosaic deletions of multiple alleles and reporting them in single cells after low-dose tamoxifen exposure would be highly beneficial to demonstrate the full potential of the models.

      We appreciate your careful consideration of this issue. Our future endeavors will focus on mosaic analysis utilizing sparse labeling and efficient gene deletion, employing both roxCre and loxCre strategies.

      (2) Performing FACS analysis to confirm that all targeted (Cre reporter-positive) cells are also tdT-positive would provide more precise data and avoid vague statements like 'virtually all' or 'almost complete' in the results section:

      Line 166: Although mCre efficiently labeled virtually all targeted cells (Figure S3A-E)…

      Line 293: ... and not a single tdT+ hepatocyte expressed Cyp2e1 (Figure 6D)... However, the authors do not provide any quantification. FACS would be ideal here.

      Line 244: ...expression of beta-catenin and GS almost disappeared in the 4W mutant sample... The resolution in the provided PDF is not adequate for assessment.

      Line 296: ... revealed almost complete deletion of Ctnnb1 in the Alb-CreER;R26-tdT2;Ctnnb1flox/flox mice...

      Thank you for suggesting these improvements, which have strengthened the robustness of our conclusions. In the revised version, we have incorporated FACS results that correspond to related sections. Additionally, a quantification statement has been included in the statistical analysis section. We appreciate your meticulous review and comments, which have significantly improved the clarity of our manuscript.

      (3) In the beginning of the results section, it is not clear which results are from this study and which are known background information (like Figure 1A). For example, it is not clear if Figure 1C presents data from R26-iSuRe-Cre. Please revise the text to more clearly present the experimental details and new findings.

      Thank you for this observation. Figure 1C presents data from this study, and we have revised the related statement in the text for improved clarity.

      (4) Experimental details regarding the genetic constructs and genotyping of the new knock-in lines are missing. Are R26 constructs driven by the endogenous R26 promoter or were additional enhancers used?

      Thank you for emphasizing this point. The schematic figures and nucleotide sequences for the generation of mice can be found in the revised Supplementary Figure 9, which can help to address this issue.

      (5) The method used to quantify mCre activity in terms of reporter+ target cells is not specified. From images or by FACS?

      Additionally, if images were used for quantification, it would be important to provide details on the number of images analyzed, the number of cells counted per image, and how individual cells were identified.

      Thank you for your comment. We have included the quantification statement in the statistical analysis section. Analyzing R26-Confetti+ target cells using FACS is challenging due to the limitations of the sorting instrument. Consequently, we quantified the related data from images. Each dot on the chart represents one sample, and the value for each mouse was obtained by averaging the data from five 10x fields taken from different sections.
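
      The per-mouse averaging described in this response can be sketched as follows. This is only an illustrative sketch, not the authors' analysis code: the function name `per_mouse_percentage`, the (positive, total) count layout, the choice to average per-field percentages rather than pool counts, and the example numbers are all assumptions.

```python
from statistics import mean


def per_mouse_percentage(fields):
    """Average the percentage of reporter+ target cells across imaging fields.

    `fields` is a list of (reporter_positive, total_target_cells) counts,
    one tuple per 10x field. Hypothetical layout; the response only states
    that five 10x fields from different sections were averaged per mouse.
    """
    if not fields:
        raise ValueError("at least one field is required")
    percentages = []
    for positive, total in fields:
        if total <= 0:
            raise ValueError("each field needs at least one target cell")
        percentages.append(100.0 * positive / total)
    # One value per mouse: the mean of the per-field percentages.
    return mean(percentages)


# Made-up counts for a single mouse (five fields), for illustration only.
example_fields = [(95, 100), (90, 100), (98, 100), (92, 100), (100, 100)]
mouse_value = per_mouse_percentage(example_fields)
```

      Each mouse then contributes a single value (one dot on the chart), so the sample size for statistics is the number of mice rather than the number of fields.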

      (6) Line 160: These data demonstrate that roxCre was functionally efficient yet non-leaky. Functional efficiency in vivo was not shown in the preceding experiments.

      Functional efficiency in vivo can be referred to in Figures S1-S2 and S4C.

      (7) It would be useful to provide a reference for easy vs low-efficiency recombination of different reporter alleles (lines 56-58).

      We are grateful for this comment, as it has allowed us to improve the clarity of our explanation. Consequently, we have made the necessary modifications.

      (8) Discussion on the potential drawbacks and limitations of the lines would be useful.

      We are thankful for your evaluation, which has significantly contributed to the enhancement of our discourse.

    1. eLife Assessment

      This important study examined orientation representations along the visual hierarchy during perception and working memory. The authors provide results suggesting that during working memory there is a gradient where representations are more categorical in nature later in the visual hierarchy. The evidence presented is solid, most notably a match between behavioral data, though a minor weakness can be attributed to the tasks and behaviors not being designed to address this question. The findings should be of interest to a relatively broad audience, namely those interested in the relationship between sensory coding and memory.

    2. Reviewer #1 (Public review):

      Summary:

      In this article, Chunharas and colleagues compared the representational differences of orientation information during a sensory task and a working memory task. By reanalyzing data from a previous fMRI study and applying representational similarity analysis (RSA), they observed that orientation information was represented differently in the two tasks: during visual perception, orientation representation resembled the veridical model, which captures the known naturalistic statistics of orientation information; whereas during visual working memory, a categorical model, which assumes different psychological distances between orientations, better explained the data, particularly in more anterior retinotopic regions. The authors suggest fundamental differences in the representational geometry of visual perception and working memory along the human retinotopic cortex.
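
      For readers unfamiliar with representational similarity analysis, the general logic of comparing an observed similarity matrix against candidate model matrices can be sketched as below. This is a toy illustration under stated assumptions, not the authors' pipeline: the simulated voxel tuning, the `linear_model` and `two_category_model` matrices, and all function names are hypothetical stand-ins, and they are not the veridical and categorical models constructed in the study.

```python
import numpy as np


def similarity_matrix(patterns):
    """Pearson correlation between condition patterns (conditions x voxels)."""
    return np.corrcoef(patterns)


def model_fit(observed_rsm, model_rsm):
    """Correlate the off-diagonal upper triangles of two similarity matrices."""
    idx = np.triu_indices_from(observed_rsm, k=1)
    return np.corrcoef(observed_rsm[idx], model_rsm[idx])[0, 1]


def circular_distance(a, b, period=180.0):
    """Shortest angular distance between orientations (degrees, 180-periodic)."""
    d = np.abs(a - b) % period
    return np.minimum(d, period - d)


orientations = np.arange(0.0, 180.0, 15.0)  # 12 hypothetical orientation bins

# Toy model 1: similarity falls off linearly with circular distance.
dist = circular_distance(orientations[:, None], orientations[None, :])
linear_model = 1.0 - dist / 90.0

# Toy model 2: two arbitrary categories (0-89 deg vs 90-179 deg).
category = (orientations // 90).astype(int)
two_category_model = (category[:, None] == category[None, :]).astype(float)

# Simulated noise-free voxels with smooth orientation tuning (an assumption).
preferred = np.linspace(0.0, 180.0, 50, endpoint=False)
patterns = np.cos(2 * np.pi * (orientations[:, None] - preferred[None, :]) / 180.0)

observed = similarity_matrix(patterns)
fit_linear = model_fit(observed, linear_model)
fit_category = model_fit(observed, two_category_model)
```

      With these smoothly tuned simulated voxels, the distance-based toy model should fit the simulated data better than the two-category one; with real data, each fit would additionally need a significance test before the models are compared.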

      Strengths:

      Examining the differences in representational geometry between perception and working memory has important implications for the understanding of the nature of working memory. This study presents a carefully-executed reanalysis of previous data to address this question. The authors developed a novel method (model construction combined with RSA) to examine the representational geometry of orientation information under different tasks, and the control analyses provide rich, convincing support for their claims.

      Weaknesses:

      Although the control analyses are convincing, I still have alternative explanations for some of the results. I'm also concerned about the low sample size (n = 6 in the fMRI experiment). Overall, I think additional analyses may help to further clarify the issues and strengthen the claims.

      (1) The central claim of the current study is that orientation information is represented in a veridical manner during the sensory task, and in a categorical manner during working memory. However, in the sensory task, a third type of representational geometry was observed, especially in brain regions from V3AB and beyond. These regions showed a symmetric pattern in which oblique orientations (45 and 135 degrees) appeared more similar to each other. In fact, a similar pattern can even be found in V1-V3, although the effect looked weaker. The authors raised two possible explanations for this in the discussion, one being that participants might have used verbal labels (e.g., diagonal) for both orientations, and the other being a lack of attention to orientation. Either way, this suggests that a veridical model may not be the best fit for these ROIs. How would this symmetric model explain the sensory data, in comparison to the veridical model?

      (2) If the symmetric model also explains the sensory data well, I wonder whether this result challenges the authors' central claim, or instead suggests that the sensory task is not ideal for the purpose of the study. One way to address this issue might be to use the sample period of the working memory task as the perception task, as some other studies have been doing (e.g., Kwak & Curtis, 2022). This epoch of data might function as a stronger version of the attention task as the authors discussed in the discussion. What would the representational geometry look like in the sample period? I would also like to note that the current analyses used 5.6-13.6 s after stimulus onset for the memory task, which I think may reflect a mix of sample- and delay-related activity.

      (3) When comparing the veridical and categorical models, it is important to first show the significance of each model before making comparisons. For instance, was the veridical model significant in different ROIs in the memory task? And was either model significant in IPS1-3 in the two tasks? I'm asking about this because the two models appear to be both significant in the memory task, whereas only the veridical model was significant in the sensory task (with overall lower correlation coefficients than the categorical model in the memory task).

      (4) The current study has a low sample size of six participants. With such a small sample, it would be helpful to show results from individual participants. For example, I appreciate that Figures 2D and 3C showed individual data points, but additionally showing the representational geometry plot (i.e., Figure 1C) for each subject could better illustrate the robustness of the effect. Alternatively, the original paper from which the fMRI data were drawn actually had two fMRI experiments with similar task designs. I wonder if the authors could replicate these patterns using data from the second experiment with seven participants. This might provide even stronger support for the current findings with a more reasonable sample size.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examined the representational geometry of orientation representations during visual perception and working memory along the visual hierarchy. Using representational similarity analysis, they found that similarity was relatively evenly distributed among all orientations during perception, while higher around oblique orientations during WM. There were some noticeable differences along the visual hierarchy. IPS showed the most pronounced oblique orientation preferences during WM but no clear patterns during perception, likely due to the different task demands for the WM orientation task and the perception contrast discrimination task. The authors proposed two models to capture the differences. The veridical model estimated the representational geometry in perception by assuming an efficient coding framework, while the categorical model estimated the pattern in WM using psychological distances to measure the differences among orientations (including estimates from a separate psychophysical study performed outside the scanner). Therefore, I think this work is valuable and advances our understanding of the transition from perception to memory.

      Strengths:

      The use of RSA to identify representational biases goes beyond simply relying on response patterns and helps identify how representational formats change from perception to WM. The study nicely leverages ideas about efficient coding to explain perceptual representations that are more veridical, while leaning on ideas about abstractions of percepts that are more categorical-psychological in nature (but see (1) below). Moreover, the match between memory biases of orientation and the patterns estimated with RSA was compelling (but see (2) below). I found the analyses showing how RSA and decoding (eg, cross-generalization) are associated and how/why they may differ to be particularly interesting.

      Weaknesses:

      (1) The idea that later visual maps (ie, IPS0) encode perceptions of orientation in a veridical form and then in a categorical form during WM is an attractive idea. However, the support is somewhat weakened by a few issues. The RSA plots in Figure 1C for IPS0 appear to show a similar pattern, but just of lower amplitude during perception. But in the model fits either for orientation statistics or estimated from the psychophysics task, the Veridical model fits best for perception and the Categorical model fits best for memory in IPS0. By my eye, the modeled RSMs in Figures 2 & 3 do not look like the observed ones in Figure 1C. Those modeled RSMs look way more categorical than the observed IPS0. They look like something in between.

      (2) My biggest concern is the omission of the in-scanner behavioral data. Yes, on the one hand, they used the N=17 outside the scanner psychophysics dataset for the analyses in Figure 3. On the other hand, they do not even mention the behavioral data collected in the scanner along with the BOLD data. Those data had clear oblique effects if I recall correctly. Why use the data from the psychophysics experiment? Also, perhaps a missed opportunity; I wonder whether the fit of the Veridical/Categorical models to a single subject's RSA data matches that subject's behavioral biases. That would really be compelling if found.

      The data were collected (reanalysis of published study) without consideration for the aims of the current study, and are therefore not optimized to test their goals. The biggest issue is that "The distractors are really distracting me." I'm somewhat concerned about how the distractors may have impacted the results. I honestly did not notice that the authors were using delay periods that had 11s of distractor stimuli until way into the paper. On the one hand, the "patterns" of the model fits across the ROIs appear to be qualitatively similar. That's good if you want to pool data like the authors did. But, while the authors state on line 350 "..we also confirmed that the presence of distractors during the delay did not impact the pattern of results in the memory task (Supplementary Figure 5)," when looking at Supplementary Figure 5, I noticed that there are a couple of exceptions to this. In the Gratings distractor data, V1 shows a better fit to the Veridical model, while V4 and IPS0 show no better fit to either model. And in the Noise distractor data, neither model fits better for any ROI. At first glance, I was concerned, but then looking at the No distractor data, the pattern is identical to that of the combined data. Thus, this can be seen as a glass half full/empty issue as almost all of the ROIs show a similar pattern, but still it would concern me if I were leading this study. This gets me to my key question, why even use the distractor trials at all, where the interpretation can get dicey? For instance, the authors have shown in this exact data that the impact of distraction affects the fidelity of representations differently along the visual hierarchy (Rademaker, 2019), consistent with several other studies (eg., Bettencourt & Xu, 2016; Lorenc, 2018; Hallenbeck et al., 2022) and with one of the authors' preprints (Rademaker & Serences, 2024). My guess is that without the full dataset, some of the RSA analyses are underpowered. If that is the case, I'm fine with it, but it might be nice to state that.

    1. eLife Assessment

      The songbird vocal motor nucleus HVC contains cells that project to the basal ganglia, the auditory system, or downstream vocal motor structures. In this fundamental study, the authors conduct optogenetic circuit mapping to clarify how four distinct inputs to HVC act on these distinct HVC cell types. They provide compelling evidence that all long-range projections engage inhibitory circuits in HVC and can also exhibit cell-type specific preferences in monosynaptic input strength. Understanding HVC at this microcircuit level is critical for informing models of song learning and production.

    2. Reviewer #1 (Public review):

      Summary:

      This work set out to map the synaptic connectivity between the inputs and outputs of the song premotor nucleus HVC in zebra finches, in order to understand how sensory (auditory) and motor circuits interact to coordinate song production and learning. The authors optimized an AAV-based optogenetic technique to manipulate inputs from specific auditory areas one by one, and recorded synaptic activity with whole-cell recordings in slice preparations, identifying each neuron's projection target by retrograde neuronal tracing. This thorough and detailed analysis provides compelling evidence of synaptic connections between 4 major auditory inputs (3 forebrain and 1 thalamic region) and three classes of projection neurons in HVC; all areas give monosynaptic excitatory inputs and polysynaptic inhibitory inputs, but the proportions of projections to each class of projection neuron varied. They also find specific reciprocal connections between mMAN and Av. Taken together, the authors provide a map of the synaptic connections between intercortical sensory and motor areas that are suggested to be involved in zebra finch song production and learning.

      Strengths:

      The authors optimized optogenetic tools with eGtACR1 delivered by AAV, which allows them to manipulate synaptic inputs in a projection-specific manner in zebra finches. They also identify HVC cell types based on projection area. With this technical advance and thorough experiments, they provide a detailed map of synaptic connections.

      Weaknesses:

      As this is a study in brain slices, the functional implications of the synaptic connectivity are limited. In particular, because all the experiments were done in adult preparations, there could be a gap when discussing functions in developmental song learning.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes the synaptic connectivity of four main classes of sensory afferents in the songbird cortex onto three known classes of projection neurons of the pre-motor cortical region HVC. HVC is a region associated with the generation of learned bird songs. The investigators use only male zebra finches to examine the functional anatomy of this region using patch clamp methods combined with optogenetic activation of select neuronal groups.

      Strengths:

      The quality of the recordings is extremely high and the quantity of data is on a very significant scale; this will certainly aid the field.

      Weaknesses:

      The authors could make the figures a little easier to navigate. Most of the figures use actual anatomical images, but it would be nice to have each fluorescence image accompanied by a cartoon-format schematic linked to a zebra finch atlas. Additionally, for the most part, figures showing the labeling lack scale bar values (in µm). These should be added to the figures, not just given in the legends.

      The authors could make it clear in the abstract that this is all male zebra finches - perhaps this is obvious given the bird song focus, but it should be stated. The number of recordings from each neuron class and the overall number of birds employed should be clearly stated in the methods (this is in the figures, but it should say n=birds or cells as appropriate).

      The authors should consider sharing the actual electrophysiology records as data.

    4. Reviewer #3 (Public review):

      Nucleus HVC is critical both for song production as well as learning and arguably, sitting at the top of the song control system, is the most critical node in this circuit, receiving a multitude of inputs and sending precisely timed commands that determine the temporal structure of song. The complexity of this structure and its underlying organization seem to become more apparent with each experimental manipulation, and yet our understanding of the underlying circuit organization remains relatively poor. In this study, Trusel and Roberts use classic whole-cell patch clamp techniques in brain slices coupled with optogenetic stimulation of select inputs to provide a careful characterization and quantification of synaptic inputs into HVC. By identifying individual projection neurons using retrograde tracer injections combined with pharmacological manipulations, they classify monosynaptic inputs onto each of the three main classes of glutamatergic projection neurons in HVC (RA-, Area X- and Av-projecting neurons). This study is remarkable in the amount of information that it generates, and the tremendous labor involved for each experiment, from the expression of opsins in each of the target inputs (Uva, NIf, mMAN, and Av), the retrograde labelling of each type of projection neuron, and ultimately the optical stimulation of infected axons while recording from identified projection neurons. Taken together, this study makes an important contribution to increasing our identification, and ultimately understanding, of the basic synaptic elements that make up the circuit organization of HVC, and how external inputs, which we know to be critical for song production and learning, contribute to the intrinsic computations within this critical circuit.

      This study is impressive in its scope, rigorous in its implementation, and thoughtful regarding its limitations. The manuscript is well-written, and I appreciate the clarity with which the authors use our latest understanding of the evolutionary origins of this circuit to place these studies within a larger context and their relevance to the study of vocal control, including human speech. My comments are minor and primarily about legibility, clarification of certain manipulations, and organization of some of the summary figures.

    1. eLife Assessment

      This work presents important findings that the human frontal cortex is involved in a flexible, dual role in both maintaining information in short-term memory, and controlling this memory content to guide adaptive behavior and decisions. The evidence supporting the conclusions is compelling, with a well-designed task, best-practice decoding methods, and careful control analyses. The work will be of broad interest to cognitive neuroscience researchers working on working memory and cognitive control.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Shao et al. investigate the contribution of different cortical areas to working memory maintenance and control processes, an important topic involving different ideas about how the human brain represents and uses information when no longer available to sensory systems. In two fMRI experiments, they demonstrate that human frontal cortex (area sPCS) represents stimulus (orientation) information both during typical maintenance, but even more so when a categorical response demand is present. That is, when participants have to apply an added level of decision control to the WM stimulus, sPCS areas encode stimulus information more than conditions without this added demand. These effects are then expanded upon using multi-area neural network models, recapitulating the empirical gradient of memory vs control effects from visual to parietal and frontal cortices. Multiple experiments and analysis frameworks provide support for the authors' conclusions, and control experiments and analysis are provided to help interpret and isolate the frontal cortex effect of interest. While some alternative explanations/theories may explain the roles of frontal cortex in this study and experiments, important additional analyses have been added that help ensure a strong level of support for these results and interpretations.

      Strengths:

      - The authors use an interesting and clever task design across two fMRI experiments that is able to parse out contributions of WM maintenance alone along with categorical, rule-based decisions. Importantly, the second experiment uses only one fixed rule, providing both an internal replication of Experiment 1's effects and extending them to a different situation in which rule-switching effects are not involved across mini-blocks.

      - The reported analyses using both inverted encoding models (IEM) and decoders (SVM) demonstrate the stimulus reconstruction effects across different methods, which may be sensitive to different aspects of the relationship between patterns of brain activity and the experimental stimuli.

      - Linking the multivariate activity patterns to memory behavior is critical in thinking about the potential differential roles of cortical areas in sub-serving successful working memory. Figure 3 nicely shows a similar interaction to that of Figure 2 in the role of sPCS in the categorization vs. maintenance tasks. This is an important contribution to the field when we consider how a distributed set of interacting cortical areas support successful working memory behavior.

      - The cross-decoding analysis in Figure 4 is a clever and interesting way to parse out how stimulus and rule/category information may be intertwined, which would have been one of the foremost potential questions or analyses requested by careful readers.

      - Additional ROI analyses in more anterior regions of the PFC help to contextualize the main effects of interest in the sPCS (and no effect in the inferior frontal areas, which are also retinotopic, adds specificity). And, more explanation for how motor areas or preparation are likely not involved strengthens the takeaways of the study (M1 control analysis).

      - Quantitative link via RDM-style analyses between the RNNs constructed and fMRI data.

      Weaknesses:

      - In the given tasks, multiple types of information codes may be present, and more detail on this possibility could always be added analytically or in discussion. However, the authors have added beneficial support to this comparison in this version of the manuscript.

      - The space of possible RNN architectures and their biological feasibility could always be explored more, but links between the fMRI and RNN data provide a good foundation for this work moving forward.

    3. Reviewer #2 (Public review):

      Summary:

      The authors provide evidence that helps resolve long-standing questions about the differential involvement of frontal and posterior cortex in working memory. They show that whereas early visual cortex shows stronger decoding of memory content in a memorization task vs a more complex categorization task, frontal cortex shows stronger decoding during categorization tasks than memorization tasks. They find that task-optimized RNNs trained to reproduce the memorized orientations show some similarities in neural decoding to people. Together, this paper presents interesting evidence for differential responsibilities of brain areas in working memory.

      Strengths:

      This paper was overall strong. It had a well-designed task, best-practice decoding methods, and careful control analyses. The neural network modeling adds additional insight into the potential computational roles of different regions.

      Weaknesses:

      Few. The RNN-fMRI correspondence could be a little more comprehensive, but the paper contributes a compelling set of empirical findings and interpretations that can inform future research.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      We would like to sincerely thank the reviewers again for their insightful comments on the previous version of our manuscript. In the last round of review, the reviewers were mostly satisfied with our revision but raised a few suggestions and/or remaining concerns. We have further edited the manuscript to address these concerns.

      Reviewer #1:

      - An explicit, quantitative link between the RNN and fMRI data is perhaps a last point that would integrate the RNN conclusion and analyses in line with the human imaging data.

      Reviewer #2:

      - Few. While more could be perhaps done to understand the RNN-fMRI correspondence, the paper contributes a compelling set of empirical findings and interpretations that can inform future research.

      To better align the RNN and fMRI results qualitatively, we performed an additional representational similarity analysis (RSA) on the data. Specifically, we computed the representational dissimilarity matrices (RDMs) for fMRI and RNN data separately, and calculated the correlation between the RDMs to quantify the similarity between fMRI data and different RNN models. We found that, consistent with our main claims, RNN2 generally demonstrated higher similarity with the fMRI data compared to RNN1. These results provide further support that RNN2 aligns better with human neuroimaging data. We have included this result (lines 496-505) and the corresponding figure (Figure 7) in the manuscript.
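      The RDM comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the response does not say which dissimilarity measure or which correlation was used, so the sketch assumes 1 − Pearson r between condition patterns for the RDM and a Spearman correlation between RDMs. All function names and the toy patterns are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def rdm_upper(patterns):
    """Upper triangle of the RDM (assumed metric: 1 - Pearson r per condition pair)."""
    return [1 - pearson(patterns[i], patterns[j])
            for i, j in combinations(range(len(patterns)), 2)]


def rdm_similarity(patterns_a, patterns_b):
    """Spearman correlation between two RDMs (assumed comparison statistic)."""
    return pearson(ranks(rdm_upper(patterns_a)), ranks(rdm_upper(patterns_b)))


# Toy data: rows are conditions, columns are voxels (or RNN units).
toy_fmri = [[1, 2, 3, 4], [1, 2, 4, 3], [4, 3, 2, 1], [2, 1, 4, 3]]
toy_model_a = [[2 * v + 3 for v in row] for row in toy_fmri]  # same geometry, rescaled
toy_model_b = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 2, 4, 3], [3, 4, 1, 2]]

print("model A vs fMRI:", rdm_similarity(toy_fmri, toy_model_a))
print("model B vs fMRI:", rdm_similarity(toy_fmri, toy_model_b))
```

      Because the comparison is made on RDMs rather than on raw activity, the two systems do not need the same number of units or voxels, only the same set of conditions.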

      Reviewer #1:

      - As Rev 2 mentions, multiple types of information codes may be present, and the response letter Figure 5 using representational similarity (RSA) gets at this question. It would strengthen the work to, at minimum, include this analysis as an extended or supplemental figure.

      Following this suggestion, we have now included Response Letter Figure 5 from the previous round of review in the manuscript (lines 381-387 and Appendix 1 – figure 7).

      Reviewer #1:

      - To sum up the results, a possible, brief schematic of each cortical area analyzed and its contribution to information coding in WM and successful subsequent behavior may help readers take away important conclusions of the cortical circuitry involved.

      Following this suggestion, we have added a schematic figure illustrating the contribution of each cortical region in our experiment to better summarize our findings (Figure 8).

      We hope that these changes further clarify the issues and strengthen the key claims in our manuscript.

    1. eLife Assessment

      This study presents a fundamental finding on how m6A levels are controlled, invoking a consolidated model where degradation of modified RNAs in the cytoplasm plays a primary role in shaping m6A patterns and dynamics, rather than needing active regulation by m6A erasers and other related processes. The evidence is compelling through its use of transcriptome-wide data and mechanistic modeling. Relevant for any reader with an interest in RNA metabolism, this new framework consolidates previous observations and highlights the importance of careful experimentation when evaluating m6A levels.

    2. Reviewer #1 (Public review):

      Here, the authors propose that changes in m6A levels may be predictable via a simple model that is based exclusively on mRNA metabolic events. Under this model, m6A mRNAs are "passive" victims of RNA metabolic events with no "active" regulatory events needed to modulate their levels by m6A writers, readers, or erasers; looking at changes in RNA transcription, RNA export, and RNA degradation dynamics is enough to explain how m6A levels change over time.

      The relevance of this study is extremely high at this stage of the epitranscriptome field. This compelling paper is in line with more and more recent studies showing how m6A is a constitutive mark reflecting overall RNA redistribution events. At the same time, it reminds every reader to carefully evaluate changes in m6A levels if observed in their experimental setup. It highlights the importance of performing extensive evaluations on how much RNA metabolic events could explain an observed m6A change.

    3. Reviewer #2 (Public review):

      Dierks et al. investigate the impact of m6A RNA modifications on the mRNA life cycle, exploring the links between transcription, cytoplasmic RNA degradation and subcellular RNA localization. Using transcriptome-wide data and mechanistic modelling of RNA metabolism, the authors demonstrate that a simplified model of m6A primarily affecting cytoplasmic RNA stability is sufficient to explain the nuclear-cytoplasmic distribution of methylated RNAs and the dynamic changes in m6A levels upon perturbation. Based on multiple lines of evidence, they propose that passive mechanisms based on the restricted decay of methylated transcripts in the cytoplasm play a primary role in shaping condition-specific m6A patterns and m6A dynamics. The authors support their hypothesis with multiple large-scale datasets and targeted perturbation experiments. Overall, the authors present compelling evidence for their model which has the potential to explain and consolidate previous observations on different m6A functions, including m6A-mediated RNA export.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript works with a hypothesis where the overall m6A methylation levels in cells are influenced by mRNA metabolism (sub-cellular localization and decay). The basic assumption is that m6A causes mRNA decay and this happens in the cytoplasm. They go on to experimentally test their model to confirm its predictions. This is confirmed by sub-cellular fractionation experiments which show high m6A levels in the nuclear RNA. Nuclear localized RNAs have higher methylation. Using a heat shock model, they demonstrate that RNAs with increased nuclear localization or transcription are methylated at higher levels. Their overall argument is that changes in m6A levels are rather determined by passive processes that are influenced by RNA processing/metabolism. However, it should be considered that erasers have their roles under specific environments (early embryos or germline) and are not modelled by the cell culture systems used here.

      Strengths:

      This is a thought-provoking series of experiments that challenge the idea that active mechanisms of recruitment or erasure are major determinants for m6A distribution and levels.

      Comments on revisions:

      The authors have done a good job with the revision.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, the authors propose that changes in m6A levels may be predictable via a simple model that is based exclusively on mRNA metabolic events. Under this model, m6A mRNAs are "passive" victims of RNA metabolic events with no "active" regulatory events needed to modulate their levels by m6A writers, readers, or erasers; looking at changes in RNA transcription, RNA export, and RNA degradation dynamics is enough to explain how m6A levels change over time.

      The relevance of this study is extremely high at this stage of the epitranscriptome field. This compelling paper is in line with more and more recent studies showing how m6A is a constitutive mark reflecting overall RNA redistribution events. At the same time, it reminds every reader to carefully evaluate changes in m6A levels if observed in their experimental setup. It highlights the importance of performing extensive evaluations on how much RNA metabolic events could explain an observed m6A change.

      Weaknesses:

      It is essential to notice that m6ADyn does not exactly recapitulate the observed m6A changes. First, this can be due to m6ADyn's limitations. The authors do a great job in the Discussion highlighting these limitations. Indeed, they mention how m6ADyn cannot interpret m6A's implications on nuclear degradation or splicing and cannot model more complex scenario predictions (i.e., a scenario in which m6A both impacts export and degradation) or the contribution of single sites within a gene.

      Secondly, since predictions do not exactly recapitulate the observed m6A changes, "active" regulatory events may still play a partial role in regulating m6A changes. The authors themselves highlight situations in which data do not support m6ADyn predictions. Active mechanisms to control m6A degradation levels or mRNA export levels could exist and may still play an essential role.

      We are grateful for the reviewer’s appreciation of our findings and their implications, and are in full agreement with the reviewer regarding the limitations of our model and the discrepancies, in some cases, between its predictions and our experimental measurements, which potentially point to more complex biology than is captured by m6ADyn. We certainly cannot dismiss the possibility that active mechanisms may play a role in shaping m6A dynamics at some sites, or in some contexts. Our study aims to broaden the discussion in the field, and to introduce the possibility that passive models can explain a substantial extent of the variability observed in m6A levels.

      (1) "We next sought to assess whether alternative models could readily predict the positive correlation between m6A and nuclear localization and the negative correlations between m6A and mRNA stability. We assessed how nuclear decay might impact these associations by introducing nuclear decay as an additional rate, δ. We found that both associations were robust to this additional rate (Supplementary Figure 2a-c)."

      Based on the data, I would say that model 2 (m6A-dep + nuclear degradation) is better than model 1. The discussion of these findings in the Discussion could help clarify how to interpret this prediction. Is nuclear degradation playing a significant role, more than expected by previous studies?

      This is an important point, which we’ve now clarified in the discussion. Including nonspecific nuclear degradation in the m6ADyn framework provides a model that better aligns with the observed data, particularly by mitigating unrealistic predictions such as excessive nuclear accumulation for genes with very low sampled export rates. This adjustment addresses potential artifacts in nuclear abundance and half-life estimations. However, we continued to use the simpler version of m6ADyn for most analyses, as it captures the key dynamics and relationships effectively without introducing additional complexity. While including nuclear degradation enhances the model's robustness, it does not fundamentally alter the primary conclusions or outcomes. This balance allows for a more straightforward interpretation of the results.

      (2) The authors classify m6A levels as "low" or "high," and it is unclear how "low" differs from unmethylated mRNAs.

      We thank the reviewer for this observation. We analyzed gene methylation levels using the m6A-GI (m6A gene index) metric, which reflects the enrichment of the IP fraction across the entire gene body (CDS + 3'UTR). While some genes may have minimal or no methylation, most genes likely exist along a spectrum from low to high methylation levels. Unlike earlier analyses that relied on arbitrary thresholds to classify sites as methylated, GLORI data highlight the presence of many low-stoichiometry sites that are typically overlooked. To capture this spectrum, we binned genes into equal-sized groups based on their m6A-GI values, allowing a more nuanced interpretation of methylation patterns as a continuum rather than a binary or discrete classification (e.g., no, low, or high methylation).

      (3) The authors explore whether m6A changes could be linked with differences in mRNA subcellular localization. They tested this hypothesis by looking at mRNA changes during heat stress, a complex scenario to predict with m6ADyn. According to the collected data, heat shock is not associated with dramatic changes in m6A levels. However, the authors observe a redistribution of m6A mRNAs during the treatment and recovery time, with highly methylated mRNAs being retained in the nucleus, being associated with a shorter half-life, and being transcriptionally induced by HSF1. Based on this observation, the authors use m6ADyn to predict the contribution of RNA export, RNA degradation, and RNA transcription to the observed m6A changes. However:

      (a) Do the authors have a comparison of m6ADyn predictions based on the assumption that RNA export and RNA transcription may change at the same time?

      We thank the reviewer for this point. Under the simple framework of m6ADyn in which RNA transcription and RNA export are independent of each other, the effect of simultaneously modulating two rates is additive. In Author response image 1, we simulate some scenarios wherein we simultaneously modulate two rates. For example, transcriptional upregulation and decreased export during heat shock could reinforce m6A increases, whereas transcriptional downregulation might counteract the effects of reduced export. Note that while production and export can act in similar or opposing directions, the former can only lead to temporary changes in m6A levels but without impacting steady-state levels, whereas the latter (changes in export) can alter steady-state levels. We have clarified this in the manuscript results to better contextualize how these dynamics interact.

      Author response image 1.

      m6ADyn predictions of m6A gene levels (left) and Nuc to Cyt ratio (right) upon varying perturbations of a sampled gene. The left panel depicts the simulated dynamics of log2-transformed m6A gene levels under varying conditions. The lines represent the following perturbations: (1) export is reduced to 10% (β), (2) production is increased 10-fold (α) while export is reduced to 10% (β), (3) export is reduced to 10% (β) and production is reduced to 10% (α), and (4) export is only decreased for methylated transcripts (β^m6A) to 10%. The right panel shows the corresponding nuclear:cytoplasmic (log2 Nuc:Cyt) ratios for perturbations 1 and 4.
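      The claim that production changes are only transient while export changes shift the steady state can be checked with a small two-compartment calculation. This is a hedged sketch, not m6ADyn itself: it assumes a fraction m of transcripts is methylated at production (rate α), that both species are exported at rate β, and that cytoplasmic decay is faster for methylated transcripts (λ_m6A > λ). The function name and all parameter values are hypothetical.

```python
from math import isclose


def steady_state_m6a_level(alpha, beta, lam, lam_m6a, m):
    """Methylated fraction of total (nuclear + cytoplasmic) mRNA at steady state.

    Assumed kinetics (illustrative only):
      nucleus:   dN/dt   = alpha - beta * N            (same for both species)
      cytoplasm: dC_u/dt = beta * N_u - lam * C_u
                 dC_m/dt = beta * N_m - lam_m6a * C_m
    """
    nuc = alpha / beta                   # total nuclear pool
    cyt_unmeth = alpha * (1 - m) / lam   # unmethylated cytoplasmic pool
    cyt_meth = alpha * m / lam_m6a       # methylated cytoplasmic pool
    methylated = m * nuc + cyt_meth
    total = nuc + cyt_unmeth + cyt_meth
    return methylated / total


# Hypothetical rates for one sampled gene.
params = dict(alpha=1.0, beta=1.0, lam=0.2, lam_m6a=1.0, m=0.5)
base = steady_state_m6a_level(**params)

# Perturbations analogous to those in the caption above, evaluated at steady state.
export_down = steady_state_m6a_level(**{**params, "beta": 0.1})
production_up = steady_state_m6a_level(**{**params, "alpha": 10.0})
both = steady_state_m6a_level(**{**params, "alpha": 10.0, "beta": 0.1})

print(f"baseline m6A level:          {base:.3f}")
print(f"export reduced to 10%:       {export_down:.3f}")
print(f"production increased 10x:    {production_up:.3f}")
print(f"production up + export down: {both:.3f}")

# alpha cancels out of the ratio, so production alone cannot move the steady state.
assert isclose(production_up, base)
assert isclose(both, export_down)
assert export_down > base
```

      Under these assumptions α scales every pool equally and drops out of the ratio, whereas lowering β enlarges the nuclear pool, where methylated transcripts are not preferentially degraded. The sketch says nothing about the transient time course, which requires integrating the equations.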

      (b) They arbitrarily set the global reduction of export to 10%, but I'm not sure we can completely rule out whether m6A mRNAs have an export rate during heat shock similar to the non-methylated mRNAs. What happens if the authors simulate that the block in export could be preferential for m6A mRNAs only?

      We thank the reviewer for this interesting suggestion. While we cannot fully rule out such a scenario, we can identify arguments against it being an exclusive explanation. Specifically, an exclusive reduction in the export rate of methylated transcripts would be expected to increase the relationship between steady-state m6A levels (the ratio of methylated to unmethylated transcripts) and changes in localization, such that genes with higher m6A levels would exhibit a greater relative increase in the nuclear-to-cytoplasmic (Nuc:Cyt) ratio. However, the attached analysis shows only a weak association during heat stress, where genes with higher m6A-GI levels tend to increase just a little more in the Nuc:Cyt ratio, likely due to cytoplasmic depletion. A global reduction of export (β 10%) produces a similar association, while a scenario where only the export of methylated transcripts is reduced (β^m6A 10%) results in a significantly stronger association (Author response image 2). This supports the plausibility of a global export reduction. Additionally, genes with very low methylation levels in control conditions also show a significant increase in the Nuc:Cyt ratio, which is inconsistent with a scenario of preferential export reduction for methylated transcripts (data not shown).

      Author response image 2.

      Wild-type MEFs m6A-GIs (x-axis) vs. fold change nuclear:cytoplasmic localization heat shock 1.5 h and control (y-axis), Pearson’s correlation indicated (left panel). m6ADyn, rates sampled for 100 genes based on gamma distributions and simulation based on reducing the global export rate (β) to 10% (middle panel). m6ADyn simulation for reducing the export rate for m6A methylated transcripts (β^m6A) to 10% (right panel).
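      The contrast between a global and an m6A-specific export block can be illustrated with a short time-course simulation. This is only a sketch under assumptions, not the authors' m6ADyn code: it uses the same two-compartment kinetics with hypothetical rates (treated as per hour), forward-Euler integration, and two example genes that differ only in the fraction m of transcripts methylated at production.

```python
from math import log2


def nuc_cyt_log2fc(m, unmeth_export_scale, meth_export_scale, hours=1.5,
                   dt=0.001, alpha=1.0, beta=1.0, lam=0.2, lam_m6a=1.0):
    """log2 fold change of the Nuc:Cyt ratio after an export perturbation.

    m is the (assumed) fraction of transcripts methylated at production.
    Export rates of unmethylated and methylated transcripts are multiplied
    by the given scale factors at time zero.
    """
    # Start from the unperturbed steady state.
    nu, nm = alpha * (1 - m) / beta, alpha * m / beta
    cu, cm = alpha * (1 - m) / lam, alpha * m / lam_m6a
    before = (nu + nm) / (cu + cm)

    bu, bm = beta * unmeth_export_scale, beta * meth_export_scale
    for _ in range(round(hours / dt)):          # forward Euler steps
        d_nu = alpha * (1 - m) - bu * nu        # unmethylated, nucleus
        d_nm = alpha * m - bm * nm              # methylated, nucleus
        d_cu = bu * nu - lam * cu               # unmethylated, cytoplasm
        d_cm = bm * nm - lam_m6a * cm           # methylated, cytoplasm
        nu, nm = nu + dt * d_nu, nm + dt * d_nm
        cu, cm = cu + dt * d_cu, cm + dt * d_cm

    after = (nu + nm) / (cu + cm)
    return log2(after / before)


low_m, high_m = 0.1, 0.9  # weakly vs strongly methylated example genes

# Scenario A: export of all transcripts reduced to 10%.
global_gap = nuc_cyt_log2fc(high_m, 0.1, 0.1) - nuc_cyt_log2fc(low_m, 0.1, 0.1)
# Scenario B: export reduced to 10% for methylated transcripts only.
specific_gap = nuc_cyt_log2fc(high_m, 1.0, 0.1) - nuc_cyt_log2fc(low_m, 1.0, 0.1)

print(f"high-m minus low-m log2FC, global block:       {global_gap:.2f}")
print(f"high-m minus low-m log2FC, m6A-specific block: {specific_gap:.2f}")
```

      With these hypothetical rates, the global block yields a modest dependence on methylation, driven by faster cytoplasmic depletion of methylated transcripts, while the m6A-specific block yields a much steeper one, which is the qualitative distinction the response draws.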

      (c) The dramatic increase in the nuclear:cytoplasmic ratio of mRNA upon heat stress may not reflect the overall m6A mRNA distribution upon heat stress. It would be interesting to repeat the same experiment in METTL3 KO cells. Of note, m6A mRNA granules have been observed within 30 minutes of heat shock. Thus, some m6A mRNAs may still be preferentially enriched in these granules for storage rather than being directly degraded. Overall, it would be interesting to understand the authors' position relative to previous studies of m6A during heat stress.

      The reviewer suggests that methylation is actively driving localization during heat shock, rather than being passively regulated. To address this question, we have now knocked down WTAP, an essential component of the methylation machinery, and monitored nuclear:cytoplasmic localization over the course of a heat shock response. Even with reduced m6A levels, high PC1 genes exhibit increased nuclear abundance during heat shock. Notably, the dynamics of this trend are altered, with the peak effect delayed from 1.5 hours of heat shock in siCTRL samples to 4 hours in siWTAP samples (Supplementary Figure 4). This finding underscores that m6A is not the primary driver of these mRNA localization changes but rather reflects broader mRNA metabolic shifts during heat shock. These findings have been added as panel e of Supplementary Figure 4.

      (d) Gene Ontology analysis based on the top 1000 PC1 genes shows an enrichment of GOs involved in post-translational protein modification more than GOs involved in cellular response to stress, which is highlighted by the authors and used as justification to study RNA transcriptional events upon heat shock. How do the authors think that GOs involved in post-translational protein modification may contribute to the observed data?

      High PC1 genes exhibit increased methylation and a shift in nuclear-to-cytoplasmic localization during heat stress. While the enriched GO terms for these genes are not exclusively related to stress-response proteins, one could speculate that their nuclear retention reduces translation during heat stress. The heat stress response genes are of particular interest, which are massively transcriptionally induced and display increased methylation. This observation supports m6ADyn predictions that elevated methylation levels in these genes are driven by transcriptional induction rather than solely by decreased export rates.

      (e) Additionally, the authors first mention that there is no dramatic change in m6A levels upon heat shock, "subtle quantitative differences were apparent," but then mention a "systematic increase in m6A levels observed in heat stress". It is unclear which systematic increase they are referring to. Are the authors referring to previous studies? It is confusing in the field what exactly is going on after heat stress. For instance, in some papers, a preferential increase of 5'UTR m6A has been proposed rather than a systematic and general increase.

      We thank the reviewer for raising this point. In our manuscript, we sought to emphasize, on the one hand, the fact that m6A profiles are, to a first approximation, “constitutive”, as indicated by high Pearson correlations between conditions (Supplementary Figure 4a). On the other hand, we sought to emphasize that, the above notwithstanding, subtle quantitative differences are apparent in heat shock, encompassing large numbers of genes, and these differences are coherent with time following heat shock (and in this sense ‘systematic’), rather than randomly fluctuating across time points. Based on our analysis, these changes do not appear to be preferentially enriched at 5′UTR sites but occur more broadly across gene bodies (potentially with a slight 3’ bias). A quick analysis of the HSF1-induced heat stress response genes, focusing on their relative enrichment of methylation upon heat shock, shows that the 5'UTR regions exhibit a roughly similar increase in methylation after 1.5 hours of heat stress compared to the rest of the gene body (Author response image 3). Although a prominent previous publication (Zhou et al. 2015) suggested that m6A levels specifically increase in the 5'UTR of HSPA1A in a YTHDF2- and HSF1-dependent manner, and highlighted the role of 5'UTR m6A methylation in regulating cap-independent translation, our findings do not support a 5'UTR-specific enrichment. However, we do observe that the methylation changes are still HSF1-dependent. Of note, the m6A-GI (m6A gene level) is a metric that captures the m6A enrichment of the gene body excluding the 5’UTR, due to an overlap with transcription-start-site-associated, m6Am-derived signal.

      Author response image 3.

      Fold change of m6A enrichment (m6A-IP / input) comparing 1.5 h heat shock and control conditions for the 5′UTR region and the rest of the gene body (CDS and 3′UTR) in the 10 HSF1-dependent stress response genes.

      Reviewer #2 (Public review):

      Dierks et al. investigate the impact of m6A RNA modifications on the mRNA life cycle, exploring the links between transcription, cytoplasmic RNA degradation, and subcellular RNA localization. Using transcriptome-wide data and mechanistic modelling of RNA metabolism, the authors demonstrate that a simplified model of m6A primarily affecting cytoplasmic RNA stability is sufficient to explain the nuclear-cytoplasmic distribution of methylated RNAs and the dynamic changes in m6A levels upon perturbation. Based on multiple lines of evidence, they propose that passive mechanisms based on the restricted decay of methylated transcripts in the cytoplasm play a primary role in shaping condition-specific m6A patterns and m6A dynamics. The authors support their hypothesis with multiple large-scale datasets and targeted perturbation experiments. Overall, the authors present compelling evidence for their model which has the potential to explain and consolidate previous observations on different m6A functions, including m6A-mediated RNA export.

      We thank the reviewer for the spot-on suggestions and comments on this manuscript.

      Reviewer #3 (Public review):

      Summary:

      This manuscript works with a hypothesis where the overall m6A methylation levels in cells are influenced by mRNA metabolism (sub-cellular localization and decay). The basic assumption is that m6A causes mRNA decay and this happens in the cytoplasm. They go on to experimentally test their model to confirm its predictions. This is confirmed by sub-cellular fractionation experiments which show high m6A levels in the nuclear RNA. Nuclear localized RNAs have higher methylation. Using a heat shock model, they demonstrate that RNAs with increased nuclear localization or transcription, are methylated at higher levels. Their overall argument is that changes in m6A levels are rather determined by passive processes that are influenced by RNA processing/metabolism. However, it should be considered that erasers have their roles under specific environments (early embryos or germline) and are not modelled by the cell culture systems used here.

      Strengths:

      This is a thought-provoking series of experiments that challenge the idea that active mechanisms of recruitment or erasure are major determinants for m6A distribution and levels.

      We sincerely thank the reviewer for their thoughtful evaluation and constructive feedback.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Supplementary Figure 5A Data: Please double-check the label of the y-axis and the matching legend.

      We corrected this.

      (2) A better description of how the nuclear: cytoplasmic fractionation is performed.

      We added missing information to the Material & Methods section.

      (3) Rec 1hr or Rec 4hr instead of r1 and r4 to indicate the recovery.

      For brevity in Figure panels, we have chosen to stick with r1 and r4.

      (4) Figure 2D: are hours plotted?

      The right panel shows the fold change (FC) of the calculated half-lives, which are measured in hours. For the model (left), the plotted value is the fold change of a dimensionless time unit between conditions with and without m6A-facilitated degradation. We have now clarified this in the legend.

      (5) How many genes do we have in each category? How many genes are you investigating each time?

      We thank the reviewer for this question. In all cases where we binned genes, we used equal-sized bins of genes that met the required coverage thresholds. We have reviewed the manuscript to ensure that the number of genes included in each analysis or the specific coverage thresholds used are clearly stated throughout the text.

      (6) Simulations on 1000 genes or 2000 genes?

      We thank the reviewer for this question and went over the text to correct for cases in which this was not clearly stated.

      Reviewer #2 (Recommendations for the authors):

      Specific comments:

      (1) The manuscript is very clear and well-written. However, some arguments are a bit difficult to understand. It would be helpful to clearly discriminate between active and passive events. For example, in the sentence: "For example, increasing the m6A deposition rate (⍺m6A) results in increased nuclear localization of a transcript, due to the increased cytoplasmic decay to which m6A-containing transcripts are subjected", I would directly write "increased relative nuclear localization" or "apparent increase in nuclear localization".

      We thank the reviewer for this careful observation. We have modified the quoted sentence, and also sought to correct additional instances of ambiguity in the text.
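
      The distinction between an absolute and an apparent (relative) increase in nuclear localization can be illustrated with a minimal steady-state sketch. This is not the m6ADyn implementation: it assumes a hypothetical two-compartment model in which methylated and unmethylated transcripts share the same export rate and differ only in their cytoplasmic decay rate, and all parameter names and default values are illustrative assumptions.

```python
def steady_state(m, production=1.0, export=1.0, decay_unmeth=0.2, decay_meth=0.6):
    """Steady-state RNA pools for one gene (illustrative assumptions only).

    m is the fraction of transcripts methylated at production (0 <= m <= 1).
    Methylation is assumed to change only the cytoplasmic decay rate.
    """
    # Nuclear pools: production is balanced by export (N = production / export).
    nuc_meth = production * m / export
    nuc_unmeth = production * (1 - m) / export
    # Cytoplasmic pools: export flux is balanced by decay (C = production / decay).
    cyt_meth = production * m / decay_meth
    cyt_unmeth = production * (1 - m) / decay_unmeth

    nuclear = nuc_meth + nuc_unmeth
    cytoplasmic = cyt_meth + cyt_unmeth
    return {
        "nuclear_pool": nuclear,
        "nuclear_fraction": nuclear / (nuclear + cytoplasmic),
        "m6a_nuclear": nuc_meth / nuclear,
        "m6a_cytoplasmic": cyt_meth / cytoplasmic,
    }


low = steady_state(0.2)
high = steady_state(0.8)
# The nuclear pool is unchanged; only the cytoplasmic pool shrinks,
# so the nuclear *fraction* rises with methylation.
print(low["nuclear_pool"], high["nuclear_pool"])
print(low["nuclear_fraction"], high["nuclear_fraction"])
```

      Under these assumptions the absolute nuclear pool does not depend on the methylated fraction, whereas the nuclear fraction does, which is the sense in which the increase is "relative" or "apparent".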

      Also, it is important to ensure that all relationships are described correctly. For example, in the sentence: "This model recovers the positive association between m6A and nuclear localization but gives rise to a positive association between m6A and decay", I think "decay" should be replaced with "stability". Similarly, the sentence: "Both the decrease in mRNA production rates and the reduction in export are predicted by m6ADyn to result in increasing m6A levels, ..." should it be "Both the increase in mRNA production and..."?

      We have corrected this.

      This sentence was difficult for me to understand: "Our findings raise the possibility that such changes could, at least in part, also be indirect and be mediated by the redistribution of mRNAs secondary to loss of cytoplasmic m6A-dependent decay." Please consider rephrasing it.

      We rephrased this sentence as suggested.

      (2) Figure 2d: "A final set of predictions of m6ADyn concerns m6A-dependent decay. m6ADyn predicts that (a) cytoplasmic genes will be more susceptible to increased m6A mediated decay, independent of their m6A levels, and (b) more methylated genes will undergo increased decay, independently of their relative localization (Figure 2d left) ... Strikingly, the experimental data supported the dual, independent impact of m6A levels and localization on mRNA stability (Figure 2d, right)."

      I do not understand, either from the text or from the figure, why the authors claim that m6A levels and localization independently affect mRNA stability. It is clear that "cytoplasmic genes will be more susceptible to increased m6A mediated decay", as they always show shorter half-lives (top-to-bottom perspective in Figure 2d). Nonetheless, as I understand it, the effect is not "independent of their m6A levels", as half-lives are clearly the shortest with the highest m6A levels (left-to-right perspective in each row).

      The two-dimensional heatmaps allow for exploring conditional independence between conditions. If an effect (in this case delta half-life) is a function of the X axis (in this case m6A levels), continuous increases should be seen going from one column to another. Conversely, if it is a function of the Y axis (in this case localization), a continuous effect should be observed from one row to another. Given that effects are generally observed both across rows and across columns, we concluded that the two act independently. The fact that half-life is shortest when genes are most cytoplasmic and have the highest m6A levels is therefore not necessarily inconsistent with two effects acting independently, but instead interpreted by us as the additive outcome of two independent effects. Having said this, a close inspection of this plot does reveal a very low impact of localization in contexts where m6A levels are very low, which could point at some degree of synergism between m6A levels and localization. We have therefore now revised the text to avoid describing the effects as "independent."
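
      One way to make the additive-versus-synergistic distinction concrete is to subtract the row and column effects from a binned table and inspect what remains. The sketch below is an illustration only, not the analysis used in the manuscript; the function name and the toy tables are hypothetical, with rows standing for localization bins and columns for m6A-level bins.

```python
def interaction_residuals(grid):
    """Residuals of a rows x columns table after removing additive effects.

    For a purely additive table (cell = row effect + column effect + constant)
    every residual is zero; non-zero residuals indicate an interaction.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    grand = sum(sum(row) for row in grid) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in grid]
    col_means = [sum(grid[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    return [[grid[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n_cols)]
            for i in range(n_rows)]


# Toy example 1: two effects that simply add up.
additive = [[-(loc + m6a) for m6a in (0, 1, 2)] for loc in (0, 1, 2)]
# Toy example 2: localization matters only when m6A is present (synergy).
synergistic = [[-(loc * m6a) for m6a in (0, 1, 2)] for loc in (0, 1, 2)]

for name, table in (("additive", additive), ("synergistic", synergistic)):
    worst = max(abs(x) for row in interaction_residuals(table) for x in row)
    print(name, worst)
```

      In the second toy table the effect of localization vanishes in the lowest m6A bin, which is the kind of pattern that leaves non-zero residuals.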

      (3) The methods part should be extended. For example, the description of the mRNA half-life estimation is far too short and lacks details. Also, information on the PCA analysis (Figure 4e & f) is completely missing. The code should be made available, at least for the differential model.

      We thank the reviewer for this point and expanded the methods section on mRNA stability analysis and PCA. Additionally, we added a supplementary file, providing R code for a basic m6ADyn simulation of m6A depleted to normal conditions (added Source Code 1).

      https://docs.google.com/spreadsheets/d/1Wy42QGDEPdfT-OAnmH01Bzq83hWVrYLsjy_B4n CJGFA/edit?usp=sharing

      (4) Figure 4e, f: The authors use a PCA analysis to achieve an unbiased ranking of genes based on their m6A level changes. From the present text and figures, it is unclear how this PCA was performed. Besides a description in the methods sections, the authors could show additional evidence that the PCA results in a meaningful clustering and that PC1 indeed captures induced/reduced m6A level changes for high/low-PC1 genes.

      We have added passages to the text, hoping to clarify the analysis approach.

      (5) In Figure 4i, I was surprised about the m6A dynamics for the HSF1-independent genes, with two clusters of increasing or decreasing m6A levels across the time course. Can the model explain these changes? Since expression does not seem to be systematically altered, are there differences in subcellular localization between the two clusters after heat shock?

      A general aspect of our manuscript is attributing changes in m6A levels during heat stress to alterations in mRNA metabolism, such as production or export. As shown in Supplementary Figure 4d, even in WT conditions, m6A level changes are not strictly associated with apparent changes in expression; we try to show that they instead reflect the decreased export rate. In the specific context of HSF1-dependent stress response genes, we observe a clear co-occurrence of increased m6A levels with increased expression levels, which we propose is attributable to enhanced production rates during heat stress. This suggests that transcriptional induction can drive the apparent rise in m6A levels. We try to control for this with the HSF1 KO cells, in which both the increased production rates and the m6A level changes are absent for the specific cluster of stress-induced genes, further supporting the role of transcriptional activation in shaping m6A levels for these genes. For HSF1-independent genes, the HSF1 KO cells mirror the behavior of WT conditions when looking at the 500 genes with the highest and lowest PC1 values (based on the prior analysis in WT cells), suggesting that changes in m6A levels are primarily driven by altered export rates rather than changes in production.

      Among the HSF1 targets, Hspa1a seems to show an inverse behaviour, with the highest methylation in ctrl, even though expression strongly goes up after heat shock. Is this related to the subcellular localization of this particular transcript before and after heat shock?

      Upon reviewing the heat stress target genes, we identified an issue with the proper labeling of the gene symbols, which has now been corrected (Figure 4 panel i). The inverse behavior observed for Hspb1 and partially for Hsp90aa1 is not accounted for by the m6ADyn model, and is indeed an interesting exception with respect to all other induced genes. Further investigation will be required to understand the methylation dynamics of Hspb1 during the response to heat stress.

      Reviewer #3 (Recommendations for the authors):

      Page 4. Indicate reference for "a more recent study finding reduced m6A levels in chromatin-associated RNA.".

      We thank the reviewer for this point and have added two publications, one of them very recent, both showing that chromatin-associated nascent RNA has less m6A methylation.

      The manuscript is perhaps a bit too long. It took me a long time to get to the end. The findings can be clearly presented in a more concise manner and that will ensure that anyone starting to read will finish it. This is not a weakness, but a hope that the authors can reduce the text.

      We have respectfully chosen to maintain the length of the manuscript. The model, its predictions and their relationship to experimental observations are somewhat complex, and we felt that further reduction of the text would come at the expense of clarity.

    1. eLife Assessment

      This valuable study builds on previous work by the authors by presenting a potentially key method for correcting optical aberrations in GRIN lens-based microendoscopes used for imaging deep brain regions. By combining simulations and experiments, the authors provide convincing evidence showing that the obtained field of view is significantly increased with corrected, versus uncorrected microendoscopes. Because the approach described in this paper does not require any microscope or software modifications, it can be readily adopted by neuroscientists who wish to image neuronal activity deep in the brain.

    2. Reviewer #1 (Public review):

      Summary:

      Sattin, Nardin, and colleagues designed and evaluated corrective microlenses that increase the useable field of view of two long (>6mm) thin (500 um diameter) GRIN lenses used in deep-tissue two-photon imaging. This paper closely follows the thread of earlier work from the same group (esp. Antonini et al, 2020; eLife), filling out the quiver of available extended-field-of-view 2P endoscopes with these longer lenses. The lenses are made by a molding process that appears practical and easy to adopt with conventional two-photon microscopes.

      Simulations are used to motivate the benefits of extended field of view, demonstrating that more cells can be recorded, with less mixing of signals in extracted traces, when recorded with higher optical resolution. In vivo tests were performed in piriform cortex, which is difficult to access, especially in chronic preparations.

      The design, characterization, and simulations are clear and thorough, but they do not break new ground in optical design or biological application. However, the approach shows much promise, including for applications such as miniaturized GRIN-based microscopes. Readers will largely be interested in this work for practical reasons: to apply the authors' corrected endoscopes to their own research.

      Strengths:

      The text is clearly written, the ex vivo analysis is thorough and well supported, and the figures are clear. The authors achieved their aims, as evidenced by the images presented, and were able to make measurements from large numbers of cells simultaneously in vivo in a difficult preparation.

      The authors did a good job of addressing issues I raised in initial review, including analyses of chromaticity and the axial field of view, descriptions of manufacturing and assembly yield, explanations in the text of differences between ex vivo and in vivo imaging conditions, and basic analysis of the in vivo recordings relative to odor presentations. They have also shortened the text, reduced repetition, and better motivated their approach in the introduction.

      Weaknesses:

      As discussed in review and nicely simulated by the authors, the large figure error indicated by profilometry (~10 um in some cases on average) is inconsistent with the optical performance improvements observed, suggesting that those measurements are inaccurate. I see no reason to include these inaccurate measurements.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors present an approach to correct GRIN lens aberrations, which primarily cause a decrease in signal-to-noise ratio (SNR), particularly in the lateral regions of the field-of-view (FOV), thereby limiting the usable FOV. The authors propose to mitigate these aberrations by designing and fabricating aspherical corrective lenses using ray trace simulations and two-photon lithography, respectively; the corrective lenses are then mounted on the back aperture of the GRIN lens.

      This approach was previously demonstrated by the same lab for GRIN lenses shorter than 4.1 mm (Antonini et al., eLife, 2020). In the current work, the authors extend their method to a new class of GRIN lenses with lengths exceeding 6 mm, enabling access to deeper brain regions such as the most ventral regions of the mouse brain. Specifically, they designed and characterized corrective lenses for GRIN lenses measuring 6.4 mm and 8.8 mm in length. Finally, they applied these corrected long micro-endoscopes to perform high-precision calcium signal recordings in the olfactory cortex.

      Compared with alternative approaches using adaptive optics, the main strength of this method is that it does not require hardware or software modifications, nor does it limit the system's temporal resolution. The manuscript is well-written, the data are clearly presented, and the experiments convincingly demonstrate the advantages of the corrective lenses.

      The implementation of these long corrected micro-endoscopes, demonstrated here for deep imaging in the mouse olfactory bulb, will also enable deep imaging in larger mammals such as rats or marmosets.

      Comments on revisions:

      The authors have clearly addressed all my comments.

    4. Reviewer #3 (Public review):

      Summary:

      This work presents the development, characterization and use of new thin microendoscopes (500µm diameter) whose accessible field of view has been extended by the addition of a corrective optical element glued to the entrance face. Two microendoscopes of different lengths (6.4mm and 8.8mm) have been developed, allowing imaging of neuronal activity in brain regions >4mm deep. An alternative solution to increase the field of view could be to add an adaptive optics loop to the microscope to correct the aberrations of the GRIN lens. The solution presented in this paper does not require any modification of the optical microscope and can therefore be easily accessible to any neuroscience laboratory performing optical imaging of neuronal activity.

      Strengths:

      (1) The paper is generally clear and well written. The scientific approach is well structured and numerous experiments and simulations are presented to evaluate the performance of corrected microendoscopes. In particular, we can highlight several consistent and convincing pieces of evidence for the improved performance of corrected microendoscopes:

      - PSFs measured with corrected microendoscopes 75µm from the centre of the FOV show a significant reduction in optical aberrations compared to PSFs measured with uncorrected microendoscopes.

      - Morphological imaging of fixed brain slices shows that optical resolution is maintained over a larger field of view with corrected microendoscopes compared to uncorrected ones, allowing neuronal processes to be revealed even close to the edge of the FOV.

      - Using synthetic calcium data, the authors showed that the signals obtained with the corrected microendoscopes have a significantly stronger correlation with the ground truth signals than those obtained with uncorrected microendoscopes.

      (2) There is a strong need for high quality microendoscopes to image deep brain regions in vivo. The solution proposed by the authors is simple, efficient and potentially easy to disseminate within the neuroscience community.

      Weaknesses:

      Weaknesses that were present in the first version of the paper were carefully addressed by the authors.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study builds on previous work by the authors by presenting a potentially key method for correcting optical aberrations in GRIN lens-based microendoscopes used for imaging deep brain regions. By combining simulations and experiments, the authors show that the obtained field of view is significantly increased with corrected, versus uncorrected microendoscopes. The evidence supporting the claims of the authors is solid, although some aspects of the manuscript should be clarified and missing information provided. Because the approach described in this paper does not require any microscope or software modifications, it can be readily adopted by neuroscientists who wish to image neuronal activity deep in the brain.

      We thank the Referees for their interest in the paper and for the constructive feedback. We have taken the time necessary to address all of their comments, acquiring new data and performing additional analyses. With the inclusion of these new results, we modified four main figures (Figures 1, 6, 7, and 8), added three new Supplementary Figures (Supplementary Figures 1, 2, and 3), and significantly edited the text. Based on the additional work suggested by the Referees, we believe that we have improved our manuscript, provided missing information, and clarified some aspects of the manuscript, which the Referees pointed our attention to.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Referee’s comment: Sattin, Nardin, and colleagues designed and evaluated corrective microlenses that increase the useable field of view of two long (>6mm) thin (500 um diameter) GRIN lenses used in deep-tissue two-photon imaging. This paper closely follows the thread of earlier work from the same group (e.g. Antonini et al, 2020; eLife), filling out the quiver of available extended-field-of-view 2P endoscopes with these longer lenses. The lenses are made by a molding process that appears practical and easy to adopt with conventional two-photon microscopes.

      Simulations are used to motivate the benefits of extended field of view, demonstrating that more cells can be recorded, with less mixing of signals in extracted traces, when recorded with higher optical resolution. In vivo tests were performed in the piriform cortex, which is difficult to access, especially in chronic preparations.

      The design, characterization, and simulations are clear and thorough, but not exhaustive (see below), and do not break new ground in optical design or biological application. However, the approach shows much promise, including for applications not mentioned in the present text such as miniaturized GRIN-based microscopes. Readers will largely be interested in this work for practical reasons: to apply the authors' corrected endoscopes.

      Strengths:

      The text is clearly written, the ex vivo analysis is thorough and well-supported, and the figures are clear. The authors achieved their aims, as evidenced by the images presented, and were able to make measurements from large numbers of cells simultaneously in vivo in a difficult preparation.

      Weaknesses:

      Referee’s comment: (1) The novelty of the present work over previous efforts from the same group is not well explained. What needed to be done differently to correct these longer GRIN lenses?

      We thank the Referee for the positive evaluation of our work. The optical properties of GRIN lenses depend on the geometrical and optical features of the specific GRIN lens type considered, i.e. its diameter, length, numerical aperture, pitch, and radial modulation of the refractive index. Our approach is based on the addition of a corrective optical element at the back end of the GRIN lens to compensate for aberrations that light encounters as it travels through the GRIN lens. The corrective optical element must, therefore, be specifically tailored to the specific GRIN lens type we aim to correct the aberrations of. The novelty of the present article lies in the successful execution of the ray-trace simulations and two-photon lithography fabrication of corrective optical elements necessary to achieve aberration correction in the two novel and long GRIN lens types, i.e. NEM-050-25-15-860-S-1.5p and NEM-050-23-15-860-S-2.0p (GRIN length, 6.4 mm and 8.8 mm, respectively). Our previous work (Antonini et al. eLife 2020) demonstrated aberration correction with GRIN lenses shorter than 4.1 mm. The design and fabrication of a single corrective optical element suitable to enlarge the field-of-view (FOV) in these longer GRIN lenses is not obvious, especially because longer GRIN lenses are affected by stronger aberrations. To better clarify this point, we revised the Introduction at page 5 (lines 3-10 from bottom) as follows:

      “Recently, a novel method based on 3D microprinting of polymer optics was developed to correct for GRIN aberrations by placing specifically designed aspherical corrective lenses at the back end of the GRIN lens 7. This approach is attractive because it is built-in on the GRIN lens and corrected microendoscopes are ready-to-use, requiring no change in the optical set-up. However, previous work demonstrated the feasibility of this method only for GRIN lenses of length < 4.1 mm 7, which are too short to reach the most ventral regions of the mouse brain. The applicability of this technology to longer GRIN lenses, which are affected by stronger optical aberrations 19, remained to be proven.”

      (2) Some strong motivations for the method are not presented. For example, the introduction (page 3) focuses on identifying neurons with different coding properties, but this can be done with electrophysiology (albeit with different strengths and weaknesses). Compared to electrophysiology, optical methods more clearly excel at genetic targeting, subcellular measurements, and molecular specificity; these could be mentioned.

      Thank you for the comment. We added a paragraph in the Introduction (page 3, lines 2-8), as suggested by the Reviewer:

      “High resolution 2P fluorescence imaging of the awake brain is a fundamental tool to investigate the relationship between the structure and the function of brain circuits 1. Compared to electrophysiological techniques, functional imaging in combination with genetically encoded indicators allows monitoring the activity of genetically targeted cell types, access to subcellular compartments, and tracking the dynamics of many biochemical signals in the brain (2). However, a critical limitation of multiphoton microscopy lies in its limited (< 1 mm) penetration depth in scattering biological media 3”.

      Another example, in comparing microfabricated lenses to other approaches, an unmentioned advantage is miniaturization and potential application to mini-2P microscopes, which use GRIN lenses.

      We added the concept suggested by the Reviewer in the Discussion (page 21, lines 4-7 from bottom). The text now reads:

      “Another advantage of long corrected microendoscopes described here over adaptive optics approaches is the possibility to couple corrected microendoscopes with portable 2P microscopes 42-44, allowing high resolution functional imaging of deep brain circuits on an enlarged FOV during naturalistic behavior in freely moving mice”.

      (3) Some potentially useful information is lacking, leaving critical questions for potential adopters:

      How sensitive is the assembly to decenter between the corrective optic and the GRIN lens?

      Following the Referee’s comment, we conducted new optical simulations to evaluate the decrease in optical performance of the corrected endoscopes as a function of the radial shift of the corrective lens from the optical axis of the GRIN rod (decentering, new Supplementary Figure 3), using light rays passing either off- or on-axis. For off-axis rays, we found that the Strehl ratio remained above 0.8 (Maréchal criterion) for positive translations in the range 6-11.5 microns and 16-50 microns for the 6.4 mm- and the 8.8 mm-long corrected microendoscope, respectively, while the Strehl ratio decreased below 0.8 for negative translations of amplitude ~ 5 microns. Please note that for the most marginal rays, a negative translation produces a mismatch between the corrective microlens and the GRIN lens such that the light rays no longer pass through the corrective lens. In contrast, rays passing near the optical axis were still focused by the corrected probe with a Strehl ratio above 0.8 for radial shifts between -40 and 40 microns for both microendoscope types. Altogether, these new simulations suggest that decentering between the corrective microlens and the GRIN lens of less than 5 microns does not substantially affect the optical properties of the corrected endoscopes. These new results are now displayed in Supplementary Figure 3 and described on page 7 (lines 3-5 from bottom).
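
      For readers less familiar with the 0.8 threshold, the relation between the Strehl ratio and the residual wavefront error can be sketched with the extended Maréchal approximation, S ≈ exp(-(2πσ)²), where σ is the RMS wavefront error in units of wavelength. This is a textbook approximation that is accurate only for small aberrations; it is not how the Strehl ratios above were obtained (those come from ray-trace simulations), and the function names are illustrative.

```python
import math


def strehl_from_rms(rms_waves):
    """Approximate Strehl ratio for an RMS wavefront error given in waves."""
    return math.exp(-(2 * math.pi * rms_waves) ** 2)


def rms_for_strehl(strehl):
    """Invert the approximation: RMS wavefront error (waves) for a Strehl ratio."""
    return math.sqrt(-math.log(strehl)) / (2 * math.pi)


# RMS wavefront error corresponding to the Marechal criterion (Strehl = 0.8).
threshold_rms = rms_for_strehl(0.8)
print(threshold_rms)      # roughly 0.075 waves
print(1 / threshold_rms)  # i.e. about one thirteenth to one fourteenth of a wave
```

      Under this approximation, a Strehl ratio of 0.8 corresponds to a residual RMS wavefront error of roughly one fourteenth of the wavelength, which is why it is commonly used as the boundary for diffraction-limited performance.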

      What is the yield of fabrication and of assembly?

      The fabrication yield using molding was ~ 90% (N > 30 molded lenses). The main limitation of this procedure was the formation of air bubbles between the mold negative and the glass coverslip. Molded lenses were visually inspected with a stereomicroscope and, in case of air bubble formation, they were discarded.

      The assembly yield, i.e. correct positioning of the GRIN lens with respect to the coverslip, was 100 % (N = 27 endoscopes).

      We added this information in the Methods at page 29 (lines 1-12), as follows:

      “After UV curing, the microlens was visually inspected at the stereomicroscope. In case of formation of air bubbles, the microlens was discarded (yield of the molding procedure: ~ 90 %, N > 30 molded lenses). The coverslip with the attached corrective lens was sealed to a customized metal or plastic support ring of appropriate diameter (Fig. 2C). The support ring, the coverslip and the aspherical lens formed the upper part of the corrected microendoscope, to be subsequently coupled to the proper GRIN rod (Table 2) using a custom-built opto-mechanical stage and NOA63 (Fig. 2C) 7. The GRIN rod was positioned perpendicularly to the glass coverslip, on the other side of the coverslip compared to the corrective lens, and aligned to the aspherical lens perimeter (Fig. 2C) under the guidance of a wide field microscope equipped with a camera. The yield of the assembly procedure for the probes used in this work was 100 % (N = 27 endoscopes). For further details on the assembly of corrected microendoscope see(7)”. 

      Supplementary Figure 1: Is this really a good agreement between the design and measured profile? Does the figure error (~10 um in some cases on average) noticeably degrade the image?

      As the Reviewer correctly noticed, the discrepancy between the simulated profile and the experimentally measured profile can be up to 5-10 microns at specific radial positions. This discrepancy could be due to issues with: (i) the fabrication of the microlens; (ii) the experimental measurement of the lens profile with the stylus profilometer. To discriminate among these two possibilities, we asked what would be the expected optical properties of the corrected endoscope should the corrective lens have the experimentally measured (not the simulated) profile. To this aim, we performed new optical simulations of the point spread function (PSF) of the corrected probe using, as corrective microlens profile, the average, experimentally measured, profile of a fabricated corrective lens. For both microendoscope types, we first fitted the mean experimentally measured profile of the fabricated lens with the aspherical function reported in equation (1) of the main text:

      $$z(r) = \frac{c\,r^{2}}{1 + \sqrt{1 - (1 + k)\,c^{2} r^{2}}} + \sum_{i=1}^{n} \alpha_{i}\, r^{2i} \qquad (1)$$

      where:

      - $r$ is the radial distance from the optical axis;

      - $c$ is equal to $1/R$, where $R$ is the radius of curvature;

      - $k$ is the conic constant;

      - $\alpha_{1} - \alpha_{n}$ are asphericity coefficients;

      - $z$ is the height of the microlens profile on-axis.

      The fitting values of the parameters of equation (1) for the two lenses are reported for the Referee’s inspection here below (variables describing distances are expressed in mm):

      Author response table 1.

      Fitting values for the parameters of Equation (1) describing the profile of corrective microlens replicas measured with the stylus profilometer. Distances are expressed in mm.

      We then assumed that the profiles of the corrective microlenses were equal to the mean experimentally measured profiles and used the aspherical fitting functions in the optical simulations to compute the performance of corrected microendoscopes. For both microendoscope types, we found that the Strehl ratio was lower than 0.35, well below the theoretical diffraction-limited threshold of 0.8 (Maréchal criterion) at moderate distances from the optical axis (68 μm-94 μm and 67 μm-92 μm on the focal plane in the object space, after the front end of the GRIN lens, for the 6.4 mm- and the 8.8 mm-long corrected microendoscope, respectively, Author response image 1A, C), and the PSF was strongly distorted (Author response image 1B, D).
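
      For intuition about the 0.8 threshold, the sketch below uses the extended Maréchal approximation, which relates the Strehl ratio to the RMS wavefront error. This is a textbook approximation and an assumption here: the Strehl ratios reported in this response come from ray-trace simulations, not from this formula, and the approximation becomes less accurate at low Strehl ratios. The function names are illustrative.

```python
import math

def strehl_marechal(rms_waves):
    """Extended Marechal approximation: S ~ exp(-(2*pi*sigma)^2), sigma in waves."""
    return math.exp(-(2.0 * math.pi * rms_waves) ** 2)

def rms_for_strehl(strehl):
    """RMS wavefront error (in waves) that yields the requested Strehl ratio."""
    return math.sqrt(-math.log(strehl)) / (2.0 * math.pi)

# RMS wavefront error at the diffraction-limited threshold (S = 0.8)
# and at the upper bound reported for the measured profiles (S = 0.35).
rms_at_threshold = rms_for_strehl(0.8)
rms_at_035 = rms_for_strehl(0.35)
```

      Under this approximation, a Strehl ratio of 0.35 corresponds to roughly twice the RMS wavefront error tolerated at the 0.8 threshold.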

      Author response image 1.

      Simulated optical performance of corrected probes with profiles of corrective microlenses equal to the mean experimentally measured profiles of fabricated corrective lenses. A) The Strehl ratio for the 6.4 mm-long corrected microendoscope with measured microlens profile (black dots) is computed on-axis (distance from the center of the FOV d = 0 µm) and at two radial distances off-axis (d = 68 μm and 94 μm on the focal plane in the object space) and compared to the Strehl ratio of the uncorrected (red line) and corrected (blue line) microendoscopes. B) Lateral (x,y) and axial (x,z) fluorescence intensity (F) profiles of simulated PSFs on-axis (left) and off-axis (right, at the indicated distance d computed on the focal plane in the object space) for the 6.4 mm-long corrected microendoscope with measured microlens profile. C) Same as in (A) for the 8.8 mm-long corrected microendoscope (off-axis d = 67 μm and 92 μm on the focal plane in the object space). D) Same as in (B) for the 8.8 mm-long corrected microendoscope.

      These simulated findings are in contrast with the experimentally measured optical properties of our corrected endoscopes (Figure 3). In other words, these new simulation results show that the experimentally measured profiles of the corrective lenses are incompatible with the experimental measurements of the optical properties of the corrected endoscopes. Therefore, our experimental recording of the lens profile shown in Supplementary Figure 1 of the first submission (now Supplementary Figure 4) should be used only as a coarse measure of the lens shape and cannot be used to precisely compare simulated lens profiles with measured lens profiles.

      How do individual radial profiles compare to the presented means?

      We provide below a modified version of Supplementary Figure 4 (Supplementary Figure 1 in the first submission), where individual profiles measured with the stylus profilometer and the mean profile are displayed for both microendoscope types (Author response image 2). In the manuscript (Supplementary Figure 4), we suggest keeping the mean profiles ± standard errors of the mean, as in the original submission.

      Author response image 2.

      Characterization of polymeric corrective lens replicas. A) Stylus profilometer measurements were performed along the radius of the corrective polymer microlens replica for the 6.4 mm-long corrected microendoscope. Individual measured profiles (grey solid lines) obtained from n = 3 profile measurements on m = 3 different corrective lens replicas, plus the mean profile (black solid line) are displayed. B) Same as (A) for the 8.8 mm-long microendoscope.

      What is the practical effect of the strong field curvature? Are the edges of the field, which come very close to the lens surface, a practical limitation?

      A first practical effect of the field curvature is that structures at different z coordinates are sampled. The observed field curvature of corrected endoscopes may therefore impact imaging in brain regions characterized by strong axially organized anatomy (e.g., the pyramidal layer of the hippocampus), but would not significantly affect imaging in regions with homogeneous cell density within the axial extension of the field curvature (< 170 µm, see more details below). A second consequence of the field curvature, as the Referee correctly points out, is that cells at the border of the FOV are closer to the front end of the GRIN lens. In measurements of subresolved fluorescent layers (Figure 3A-D), we observed that the field curvature extends in the axial direction to ~ 110 μm and ~ 170 μm for the 6.4 mm- and the 8.8 mm-long microendoscopes, respectively. Considering that the nominal working distances on the object side of the 6.4 mm- and the 8.8 mm-long microendoscopes were, respectively, 210 μm and 178 μm (Table 3), structures positioned at the very edge of the FOV were ~ 100 μm and ~ 8 μm away from the GRIN front end for the 6.4 mm-long and for the 8.8 mm-long probe, respectively. Previous studies have shown that brain tissue within 50-100 μm from the GRIN front end may show signs of tissue reaction to the implant (Curreli et al. PLOS Biology 2022, Attardo et al. Nature 2015). Therefore, structures at the very edge of the FOV of the 8.8 mm-long endoscopes, but not those at the edge of the 6.4 mm-long endoscopes, may be within the volume showing tissue reaction. We added a paragraph in the text to discuss these points (page 18 lines 10-14).
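
      The edge distances quoted above follow from subtracting the axial extent of the field curvature from the nominal working distance. The short sketch below reproduces that arithmetic from the numbers given in this response; the variable names and the choice of 100 μm as the comparison value (the upper end of the 50-100 μm range) are assumptions made for illustration.

```python
# Values quoted in the response above (all in micrometres).
working_distance_um = {"6.4 mm": 210, "8.8 mm": 178}   # nominal, object side
field_curvature_um = {"6.4 mm": 110, "8.8 mm": 170}    # approximate axial extent

# Upper end of the 50-100 um tissue-reaction range cited above (assumed cut-off).
TISSUE_REACTION_RANGE_UM = 100

# Distance between structures at the FOV edge and the GRIN front end.
edge_distance_um = {
    probe: working_distance_um[probe] - field_curvature_um[probe]
    for probe in working_distance_um
}

for probe, distance in edge_distance_um.items():
    status = "within" if distance < TISSUE_REACTION_RANGE_UM else "outside"
    print(f"{probe}-long probe: FOV edge ~{distance} um from the front end ({status} the range)")
```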

      The lenses appear to be corrected for monochromatic light; high-performance microscopes are generally achromatic. Is the bandwidth of two-photon excitation sufficient to warrant optimization over multiple wavelengths?

      Thanks for this comment. All optical simulations described in the first submission were performed at a fixed wavelength (λ = 920 nm). Following the Referee’s request, we explored the effect of changing wavelength on the Strehl ratio using new optical simulations. We found that the Strehl ratio remains > 0.8 at least within ± 10 nm of λ = 920 nm (new Supplementary Figure 1A-D, left panels), which covers the limited bandwidth of our femtosecond laser. Moreover, these simulations demonstrate that, over a much wider wavelength range (800 - 1040 nm), a high Strehl ratio is obtained, but at different z planes (new Supplementary Figure 1A-D, right panels). This means that the corrective lens works as expected also for wavelengths other than 920 nm, with different wavelengths having the most enlarged FOV located at different working distances. These new results are now described on page 7 (lines 8-10).

      GRIN lenses are often used to access a 3D volume by scanning in z (including in this study). How does the corrective lens affect imaging performance over the 3D field of view?

      The optical simulations we did to design the corrective lenses were performed maximizing aberration correction only in the focal plane of the endoscope. Following the Referee’s comment, we explored the effect of aberration correction outside the focal plane using new optical simulations. In corrected endoscopes, we found that for off-axis rays (radial distance from the optical axis > 40 μm) the Strehl ratio was > 0.8 (Maréchal criterion) in a larger volume compared to uncorrected endoscopes (new Supplementary Figure 2), demonstrating that the aberration correction method developed in this study does extend beyond the focal plane for short distances. For example, at a radial distance of ~ 90 μm from the optical axis, the axial range in which the Strehl ratio was > 0.8 in corrected endoscopes was 28 μm and 19 μm for the 6.4 mm- and the 8.8 mm-long microendoscope, respectively. These new results are now described on page 7 (lines 10-19).

      (4) The in vivo images (Figure 7D) have a less impressive resolution and field than the ex vivo images (Figure 4B), and the reason for this is not clear. Given the difference in performance, how does this compare to an uncorrected endoscope in the same preparation? Is the reduced performance related to uncorrected motion, field curvature, working distance, etc?

      In comparing images in Figure 4B with images shown in Figure 7D, the following points should be considered:

      (1) Figure 4B is a maximum fluorescence intensity projection of multiple axial planes of a z-stack acquired through a thin brain slice (slice thickness: 50 µm) using 8 frame averages for each plane. In contrast, images in Figure 7D are median projections of a t-series acquired on a single plane in the awake mouse at 30 Hz resonant scanning imaging (8 min, 14,400 frames).

      (2) Images of the fixed brain slice in Figure 4B were acquired at 1024 pixels x 1024 pixels resolution, nominal pixel size 0.45 µm/pixel, and with objective NA = 0.50, whereas in vivo images in Figure 7D were acquired at 512 pixels x 512 pixels resolution, nominal pixel size 0.72 - 0.84 µm/pixel, and with objective NA = 0.45.

      (3) In the in vivo preparation (Figure 7D), excitation and emission light travel through > 180 µm of scattering and absorbing brain tissue, reducing spatial resolution and the SNR of the collected fluorescence signal.

      (4) By shifting the sample in the x, y plane, in Figure 4B we could choose a FOV containing homogeneously stained cells. x, y shifting and selecting across multiple FOVs was not possible in vivo, as the GRIN lens was cemented on the animal skull.

      (5) Images in Figure 7D were motion corrected, but we cannot exclude that part of the decrease in resolution observed in Figure 7D when compared to images in Figure 4B is due to incomplete correction of motion artifacts.

      For all the reasons listed above, we believe that it is expected to see lower resolution and contrast in images recorded in vivo (Figure 7D) compared to images acquired in fixed tissue (Figure 4B).
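
      The acquisition parameters listed in points (1) and (2) can be cross-checked with simple arithmetic. The sketch below recomputes the number of frames in the in vivo t-series and the nominal scan extent implied by the pixel counts and nominal pixel sizes. Treating pixels times pixel size as a scan extent is an assumption made here for illustration; it is not the effective FOV through the microendoscope, whose magnification varies with radial distance.

```python
# In vivo t-series: 30 Hz resonant scanning for 8 minutes.
frame_rate_hz = 30
duration_min = 8
n_frames = frame_rate_hz * duration_min * 60   # frames in the 8 min t-series

def nominal_extent_um(n_pixels, pixel_size_um):
    """Nominal side length of the scanned area (pixels x nominal pixel size)."""
    return n_pixels * pixel_size_um

# Fixed slice (Figure 4B): 1024 pixels at 0.45 um/pixel.
slice_extent_um = nominal_extent_um(1024, 0.45)

# In vivo (Figure 7D): 512 pixels at 0.72-0.84 um/pixel.
in_vivo_extent_um = (nominal_extent_um(512, 0.72), nominal_extent_um(512, 0.84))
```

      The frame count matches the 14,400 frames stated in point (1); the coarser in vivo sampling (0.72-0.84 versus 0.45 µm/pixel) is one of the factors listed above.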

      Regarding the question of how images from uncorrected and corrected endoscopes compare in vivo, we think that this comparison is better performed in fixed tissue (Figure 4) or in simulated calcium data (Figure 5-6), rather than in in vivo recordings (Figure 7). In fact, in the brain of living mice, motion artifacts, changes in fluorophore expression level, and variation in the optical properties of the brain (e.g., the presence of a blood vessel over the FOV) may make the comparison of images acquired with uncorrected and corrected microendoscopes difficult, requiring a large number of animals to cancel out the contributions of these factors. Comparing optical properties in fixed tissue is, in contrast, devoid of these confounding factors. Moreover, the major advantage of quantifying how the optical properties of uncorrected and corrected endoscopes affect the ability to extract information about neuronal activity in simulated calcium data is that, under simulated conditions, we can count on a known ground truth as reference (e.g., how many neurons are in the FOV, where they are, and what their electrical activity is). This is clearly not possible in the in vivo recordings.

      Regarding Figure 7, there is no analysis of the biological significance of the calcium signals or even a description of where olfactory stimuli were presented.

      We appreciate the Reviewer pointing out the lack of detailed analysis regarding the biological significance of the calcium signals and the presentation of olfactory stimuli in Figure 7. Our initial focus was on demonstrating the effectiveness of the optimized GRIN lenses for imaging deep brain areas like the piriform cortex, with an emphasis on the improved signal-to-noise ratio (SNR) these lenses provide. However, we agree that including more context about the experimental conditions would enhance the manuscript. To address this point, we added a new panel (Figure 7F) showing calcium transients aligned with the onset of olfactory stimulus presentations, which are now indicated by shaded light blue areas. Additionally, we have specified the timing of each stimulus presented in Figure 7E. This revision allows readers to better understand the relationship between the calcium signals and the olfactory stimuli.

      The timescale of jGCaMP8f signals in Figure 7E is uncharacteristically slow for this indicator (compared to Zhang et al 2023 (Nature)), though perhaps this is related to the physiology of these cells or the stimuli.

      Regarding the timescale of the calcium signals observed in Figure 7E, we apologize for the confusion caused by a mislabeling we inserted in the original manuscript. The experiments presented in Figure 7 were conducted using jGCaMP7f, not jGCaMP8f as previously stated (both indicators were used in this study but in separate experiments). We have corrected this error in the Results section (caption of Figure 7D, E). It is important to note that jGCaMP7f has a longer half-decay time compared to jGCaMP8f, which could in part account for the slower decay kinetics observed in our data. Furthermore, the prolonged calcium signals can be attributed to the physiological properties of neurons in the piriform cortex. Upon olfactory stimulation, these neurons often fire multiple action potentials, resulting in extended calcium transients that can last several seconds. This sustained activity has been documented in previous studies, such as Roland et al. (eLife 2017, Figure 1C therein) in anesthetized animals and Wang et al. (Neuron 2020, Figure 1E therein) in awake animals, which report similar durations for calcium signals.

      (5) The claim of unprecedented spatial resolution across the FOV (page 18) is hard to evaluate and is not supported by references to quantitative comparisons. The promises of the method for future studies (pages 18-19) could also be better supported by analysis or experiment, but these are minor and to me, do not detract from the appeal of the work.

      GRIN lens-based imaging of piriform cortex in the awake mouse had already been done in Wang et al., Neuron 2020. The GRIN lens used in that work was NEM-050-50-00920-S-1.5p (GRINTECH, length: 6.4 mm; diameter: 0.5 mm), similar to the one that we used to design the 6.4 mm-long corrected microendoscope. Here we used a microendoscope specifically designed to correct off-axis aberrations and enlarge the FOV, in order to maximize the number of neurons recorded with the highest possible spatial resolution, while keeping the tissue invasiveness to the minimum. Following the Referee’s comments, we revised the sentence at page 19 (lines 6-8 from bottom) as follows:

      “We used long corrected microendoscopes to measure population dynamics in the olfactory cortex of awake head-restrained mice with an unprecedented combination of high spatial resolution across the FOV and minimal invasiveness (17)”.

      (6) The text is lengthy and the material is repeated, especially between the introduction and conclusion. Consolidating introductory material to the introduction would avoid diluting interesting points in the discussion.

      We thank the Reviewer for this comment. As suggested, we edited the Introduction and shortened the Discussion.

      Reviewer #2 (Public review):

      In this manuscript, the authors present an approach to correct GRIN lens aberrations, which primarily cause a decrease in signal-to-noise ratio (SNR), particularly in the lateral regions of the field-of-view (FOV), thereby limiting the usable FOV. The authors propose to mitigate these aberrations by designing and fabricating aspherical corrective lenses using ray trace simulations and two-photon lithography, respectively; the corrective lenses are then mounted on the back aperture of the GRIN lens.

      This approach was previously demonstrated by the same lab for GRIN lenses shorter than 4.1 mm (Antonini et al., eLife, 2020). In the current work, the authors extend their method to a new class of GRIN lenses with lengths exceeding 6 mm, enabling access to deeper brain regions such as the most ventral regions of the mouse brain. Specifically, they designed and characterized corrective lenses for GRIN lenses measuring 6.4 mm and 8.8 mm in length. Finally, they applied these corrected long micro-endoscopes to perform high-precision calcium signal recordings in the olfactory cortex.

      Compared with alternative approaches using adaptive optics, the main strength of this method is that it does not require hardware or software modifications, nor does it limit the system's temporal resolution. The manuscript is well-written, the data are clearly presented, and the experiments convincingly demonstrate the advantages of the corrective lenses.

      The implementation of these long corrected micro-endoscopes, demonstrated here for deep imaging in the mouse olfactory bulb, will also enable deep imaging in larger mammals such as rats or marmosets.

      We thank the Referee for the positive comments on our study. We address the points indicated by the Referee in the “Recommendation to the authors” section below.

      Reviewer #3 (Public review):

      Summary:

      This work presents the development, characterization, and use of new thin microendoscopes (500µm diameter) whose accessible field of view has been extended by the addition of a corrective optical element glued to the entrance face. Two micro endoscopes of different lengths (6.4mm and 8.8mm) have been developed, allowing imaging of neuronal activity in brain regions >4mm deep. An alternative solution to increase the field of view could be to add an adaptive optics loop to the microscope to correct the aberrations of the GRIN lens. The solution presented in this paper does not require any modification of the optical microscope and can therefore be easily accessible to any neuroscience laboratory performing optical imaging of neuronal activity.

      Strengths:

      (1) The paper is generally clear and well-written. The scientific approach is well structured and numerous experiments and simulations are presented to evaluate the performance of corrected microendoscopes. In particular, we can highlight several consistent and convincing pieces of evidence for the improved performance of corrected micro endoscopes:

      a) PSFs measured with corrected micro endoscopes 75µm from the centre of the FOV show a significant reduction in optical aberrations compared to PSFs measured with uncorrected micro endoscopes.

      b) Morphological imaging of fixed brain slices shows that optical resolution is maintained over a larger field of view with corrected micro endoscopes compared to uncorrected ones, allowing neuronal processes to be revealed even close to the edge of the FOV.

      c) Using synthetic calcium data, the authors showed that the signals obtained with the corrected microendoscopes have a significantly stronger correlation with the ground truth signals than those obtained with uncorrected microendoscopes.

      (2) There is a strong need for high-quality micro endoscopes to image deep brain regions in vivo. The solution proposed by the authors is simple, efficient, and potentially easy to disseminate within the neuroscience community.

      Weaknesses:

      (1) Many points need to be clarified/discussed. Here are a few examples:

      a) It is written in the methods: “The uncorrected microendoscopes were assembled either using different optical elements compared to the corrected ones or were obtained from the corrected probes after the mechanical removal of the corrective lens.”

      This is not very clear: are the uncorrected microendoscopes not simply the unmodified GRIN lenses?

      We apologize for not being clear enough on this point. Uncorrected microendoscopes are not simply unmodified GRIN lenses; rather, they are GRIN lenses attached to a round glass coverslip (thickness: 100 μm). The glass coverslip was included in ray-trace optical simulations of the uncorrected system, and this is the reason why commercial GRIN lenses and the corresponding uncorrected microendoscopes have different working distances, as reported in Tables 2-3. To make the text clearer, we added the following sentence at page 27 (last 4 lines):

      “To evaluate the impact of corrective microlenses on the optical performance of GRIN-based microendoscopes, we also simulated uncorrected microendoscopes composed of the same optical elements as corrected probes (glass coverslip and GRIN rod), but in the absence of the corrective microlens”.

      b) In the results of the simulation of neuronal activity (Figure 5A, for example), the neurons in the center of the FOV have a very large diameter (of about 30µm). This should be discussed.

      Thanks for this comment. In synthetic calcium imaging t-series, cell radii were randomly sampled from a Gaussian distribution with mean = 10 µm and standard deviation (SD) = 3 µm. Both values were estimated from the literature (ref. no. 28: Suzuki & Bekkers, Journal of Neuroscience, 2011) as described in the Methods (page 35). In the image shown in Figure 5A, neurons near the center of the FOV have a radius of ~ 20 µm, corresponding to the right tail of the distribution (mean + 3SD = 19 µm). It is also important to note that, for corrected microendoscopes, neurons in the central portion of the FOV appear larger than cells located near the edges of the FOV, because the magnification depends on the distance from the optical axis (see Figure 3E, F) and near the center the magnification is > 1 for both microendoscope types.
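
      The sampling of cell radii described above can be sketched as follows, using the stated mean (10 µm) and SD (3 µm). The function name, the seed, and the sample size are hypothetical, and the sketch does not truncate the distribution; whether and how implausibly small or negative radii were handled in the actual simulations is not stated here.

```python
import random

MEAN_RADIUS_UM = 10.0   # mean cell radius stated above
SD_RADIUS_UM = 3.0      # standard deviation stated above

def sample_radii(n, seed=0):
    """Draw n cell radii (um) from a Gaussian distribution (no truncation)."""
    rng = random.Random(seed)
    return [rng.gauss(MEAN_RADIUS_UM, SD_RADIUS_UM) for _ in range(n)]

radii = sample_radii(10_000)

# Right tail referred to in the text: mean + 3 SD.
upper_tail_um = MEAN_RADIUS_UM + 3 * SD_RADIUS_UM

# Fraction of sampled cells with radius beyond the right tail
# (expected to be on the order of 0.1% for a Gaussian).
fraction_large = sum(r > upper_tail_um for r in radii) / len(radii)
```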

      Also, why is the optical resolution so low on these images?

      Images shown in Figure 5 are median fluorescence intensity projections of 5 minute-long simulated t-series. Simulated calcium data were generated with pixel size 0.8 μm/pixel and frame rate 30 Hz, similarly to in vivo recordings. In the simulations, pixels not belonging to any cell soma were assigned a value of background fluorescence randomly sampled from a normal distribution with mean and standard deviation estimated from experimental data, as described in the Methods section (page 37). To simulate activity, the mean spiking rate of neurons was set to 0.3 Hz, thus in a large fraction of frames neurons do not show calcium transients. Therefore, the median fluorescence intensity value of somata will be close to their baseline fluorescence value (F0). Since in simulations F0 values (~ 45-80 a.u.) were not much higher than the background fluorescence level (~ 45 a.u.), this may generate the appearance of a low-contrast image in Figure 5A. Finally, we suspect that PDF rendering also contributed to degrading the quality of those images. We will now submit high resolution images alongside the PDF file.
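
      The argument that a median projection sits close to baseline when transients are sparse can be illustrated with a toy trace. In the sketch below, the frame rate, duration, spike rate, and background level follow the values quoted above; the regular spike timing, the linear transient shape, its 1 s duration and amplitude, and the chosen F0 are simplifying assumptions, not the simulation used in the study.

```python
from statistics import median

FRAME_RATE_HZ = 30
N_FRAMES = 5 * 60 * FRAME_RATE_HZ   # 5-minute t-series
F0 = 60.0                           # baseline, within the ~45-80 a.u. range quoted
BACKGROUND = 45.0                   # background level quoted above (a.u.)

def toy_trace(spike_every=100, transient_len=30, peak=120.0):
    """Baseline F0 plus a linearly decaying transient after each spike.

    spike_every=100 frames corresponds to 0.3 Hz at 30 Hz; the transient
    shape and duration are hypothetical.
    """
    trace = [F0] * N_FRAMES
    for start in range(0, N_FRAMES, spike_every):
        for j in range(transient_len):
            if start + j < N_FRAMES:
                trace[start + j] = F0 + peak * (1 - j / transient_len)
    return trace

trace = toy_trace()
median_f = median(trace)                         # most frames are at baseline
contrast = (median_f - BACKGROUND) / BACKGROUND  # soma vs background in the projection
```

      With these toy numbers the median equals the baseline, so the projected soma is only about one third brighter than the background even though individual transients are large.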

      c) It seems that we can't see the same neurons on the left and right panels of Figure 5D. This should be discussed.

      The Referee is correct. When we intersected the simulated 3D volume of ground truth neurons with the focal surface of microendoscopes, the center of the FOV for the 8.8 mm-long corrected microendoscope was located at a larger depth than the FOV of the 8.8 mm-long uncorrected microendoscope. This effect was due to the larger field curvature of corrected 8.8 mm-long endoscopes compared to 8.8 mm-long uncorrected endoscopes. This is the reason why different neurons were displayed for uncorrected and corrected endoscopes in Figure 5D. We added this explanation in the text at page 37 (lines 1-4). The text reads:

      “Due to the stronger field curvature of the 8.8 mm-long corrected microendoscope (Figure 1C) compared to 8.8 mm-long uncorrected microendoscopes, the center of the corrected imaging focal surface resulted at a larger depth in the simulated volume compared to the center of the uncorrected focal surface(s). Therefore, different simulated neurons were sampled in the two cases”.

      d) It is not very clear to me why in Figure 6A, F the fraction of adjacent cell pairs that are more correlated than expected increases as a function of the threshold on peak SNR. The authors showed in Supplementary Figure 3B that the mean purity index increases as a function of the threshold on peak SNR for all micro endoscopes. Therefore, I would have expected the correlation between adjacent cells to decrease as a function of the threshold on peak SNR. Similarly, the mean purity index for the corrected short microendoscope is close to 1 for high thresholds on peak SNR: therefore, I would have expected the fraction of adjacent cell pairs that are more correlated than expected to be close to 0 under these conditions. It would be interesting to clarify these points.

      Thanks for raising this point. We defined the fraction of adjacent cell pairs more correlated than expected as the number of adjacent cell pairs more correlated than expected divided by the number of adjacent cell pairs. The reason why this fraction rises as a function of the SNR threshold is shown in Supplementary Figure 2 in the first submission (now Supplementary Figure 5). There, we separately plotted the number of adjacent cell pairs more correlated than expected (numerator) and the number of adjacent cell pairs (denominator) as a function of the SNR threshold. For both microendoscope types, we observed that the denominator decreased more rapidly with peak SNR threshold than the numerator. Therefore, the fraction of adjacent cell pairs more correlated than expected increases with the peak SNR threshold.

      To understand why the denominator decreases with SNR threshold, it should be considered that, due to the deterioration of spatial resolution and attenuation of fluorescent signal collection as a function of the radial distance from the optical axis (see for example fluorescent film profiles in Figure 3A, C), increasing the threshold on the peak SNR of extracted calcium traces implies limiting cell detection to those cells located within a smaller distance from the center of the FOV. This information is shown in Figure 5C, F.

      In the manuscript text, this point is discussed at page 12 (lines 1-3 from bottom) and page 13 (lines 1-4):

      “The fraction of pairs of adjacent cells (out of the total number of adjacent pairs) whose activity correlated significantly more than expected increased as a function of the SNR threshold for corrected and uncorrected microendoscopes of both lengths (Fig. 6A, F). This effect was due to a larger decrease of the total number of pairs of adjacent cells as a function of the SNR threshold compared to the decrease in the number of pairs of adjacent cells whose activity was more correlated than expected (Supplementary Figure 5)”.
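
      The ratio argument can be made concrete with a small numerical example. The counts below are hypothetical and chosen only to show the mechanism: both the numerator and the denominator fall as the SNR threshold increases, but the fraction rises because the denominator falls faster.

```python
# Hypothetical counts, for illustration only (not data from the study).
snr_thresholds = [10, 15, 20, 25]
n_adjacent_pairs = [400, 240, 120, 50]      # denominator: drops quickly
n_pairs_over_expected = [40, 30, 20, 10]    # numerator: drops more slowly

# Fraction of adjacent pairs more correlated than expected at each threshold.
fractions = [
    num / den for num, den in zip(n_pairs_over_expected, n_adjacent_pairs)
]

# True when the fraction grows monotonically with the SNR threshold.
is_increasing = all(a < b for a, b in zip(fractions, fractions[1:]))
```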

      e) Figures 6C, H: I think it would be fairer to compare the uncorrected and corrected endomicroscopes using the same effective FOV.

      To address the Reviewer’s concern, we repeated the linear regression of purity index as a function of the radial distance using the same range of radial distances for the uncorrected and corrected case of both microendoscope types. Below, we provide an updated version of Figure 6C, H for the Referee’s perusal. Please note that the maximum value displayed on the x-axis of both graphs now corresponds to the smaller of the two maximum radial distances obtained in the uncorrected and corrected cases (maximum radial distance displayed: 151.6 µm and 142.1 μm for the 6.4 mm- and the 8.8 mm-long GRIN rod, respectively). Using the same effective FOV, we found that the purity index drops significantly more rapidly with the radial distance for uncorrected microendoscopes compared to the corrected ones, similar to what was observed in the original version of Figure 6. The values of the linear regression parameters and the statistical significance of the difference between the slopes in the uncorrected and corrected cases are stated in the Author response image 3 caption below for both microendoscope types. In the manuscript, we suggest keeping the data corresponding to all detected cells, as in the original submission.

      Author response image 3.

      Linear regression of purity index as a function of the radial distance. A) Purity index of extracted traces with peak SNR > 10 was estimated using a GLM of ground truth source contributions and plotted as a function of the radial distance of cell identities from the center of the FOV for n = 13 simulated experiments with the 6.4 mm-long uncorrected (red) and corrected (blue) microendoscope. Black lines represent the linear regression of data ± 95% confidence intervals (shaded colored areas). Maximum value of radial distance displayed: 151.6 μm. Slopes ± standard error (s.e.): uncorrected, (-0.0015 ± 0.0002) µm⁻¹; corrected, (-0.0006 ± 0.0001) μm⁻¹. Uncorrected, n = 991; corrected, n = 1156. Statistical comparison of slopes, p < 10⁻¹⁰, permutation test. B) Same as (A) for n = 15 simulated experiments with the 8.8 mm-long uncorrected and corrected microendoscope. Maximum value of radial distance displayed: 142.1 μm. Slopes ± s.e.: uncorrected, (-0.0014 ± 0.0003) μm⁻¹; corrected, (-0.0010 ± 0.0002) µm⁻¹. Uncorrected, n = 718; corrected, n = 1328. Statistical comparison of slopes, p = 0.0082, permutation test.
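
      One common way to compare two regression slopes with a permutation test is sketched below: the group labels of the (distance, purity) pairs are shuffled and the slope difference is recomputed each time. This is an assumed scheme shown for illustration; the exact permutation procedure used for the p-values in the caption is not described here. The synthetic data, noise level, intercept, and all function names are hypothetical; only the two slope values are borrowed from panel A.

```python
import random

def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

def permutation_test_slopes(group_a, group_b, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the difference between two slopes."""
    rng = random.Random(seed)
    observed = abs(slope(group_a) - slope(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(slope(pooled[:n_a]) - slope(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # The +1 terms avoid reporting a p-value of exactly zero.
    return (hits + 1) / (n_perm + 1)

def make_group(true_slope, n, rng, noise_sd=0.05):
    """Synthetic purity-vs-distance data (hypothetical intercept and noise)."""
    points = []
    for _ in range(n):
        x = rng.uniform(0.0, 150.0)  # radial distance in um
        points.append((x, 1.0 + true_slope * x + rng.gauss(0.0, noise_sd)))
    return points

data_rng = random.Random(1)
uncorrected = make_group(-0.0015, 200, data_rng)
corrected = make_group(-0.0006, 200, data_rng)
p_value = permutation_test_slopes(uncorrected, corrected)
```

      Note that the smallest p-value this procedure can report is 1 / (n_perm + 1), so resolving very small p-values requires a correspondingly large number of permutations.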

      f) Figure 7E: Many calcium transients have a strange shape, with a very fast decay following a plateau or a slower decay. Is this the result of motion artefacts or analysis artefacts?

      Thank you for raising this point about the unusual shapes of the calcium transients in Figure 7E. The observed rapid decay following a plateau or a slower decay is indeed a result of how the data were presented in the original submission. Our experimental protocol consisted of 22 s-long trials with an inter-trial interval of 10 s (see Methods section, page 44). In the original figure, data from multiple trials were concatenated, which led to artefactual time courses and apparent discontinuities in the calcium signals. To resolve this issue, we revised Figure 7E to accurately represent individual concatenated trials. We also added a new panel (please see new Figure 7F) showing examples of single cell calcium responses in individual trials without concatenation, with annotations indicating the timing and identity of presented olfactory stimuli.

      Also, the duration of many calcium transients seems to be long (several seconds) for GCaMP8f. These points should be discussed.

      Author response: regarding the timescale of the calcium signals observed in Figure 7E, we apologize for the confusion caused by a mislabeling we inserted in the manuscript. The experiments presented in Figure 7 were conducted using jGCaMP7f, not jGCaMP8f as previously stated (both indicators were used in this study, but in separate experiments). We have corrected this error in the Results section (caption of Figure 7D, E). It is important to note that jGCaMP7f has a longer half-decay time compared to jGCaMP8f, which could in part account for the slower decay kinetics observed in our data. Furthermore, the prolonged calcium signals can be attributed to the physiological properties of neurons in the piriform cortex. Upon olfactory stimulation, these neurons often fire multiple action potentials, resulting in extended calcium transients that can last several seconds. This sustained activity has been documented in previous studies, such as Roland et al. (eLife 2017, Figure 1C therein) in anesthetized animals and Wang et al. (Neuron 2020, Figure 1E therein) in awake animals, which report similar durations for calcium signals. We cite these references in the text. We believe that these revisions and clarifications address the Reviewer's concern and enhance the overall clarity of our manuscript.

      g) The authors do not mention the influence of the neuropil on their data. Did they subtract the neuropil's contribution to the signals from the somata? It is known from the literature that the presence of the neuropil creates artificial correlations between neurons, which decrease with the distance between the neurons (Grødem, S., Nymoen, I., Vatne, G.H. et al. An updated suite of viral vectors for in vivo calcium imaging using intracerebral and retro-orbital injections in male mice. Nat Commun 14, 608 (2023). https://doi.org/10.1038/s41467-023-36324-3; Keemink SW, Lowe SC, Pakan JMP, Dylda E, van Rossum MCW, Rochefort NL. FISSA: A neuropil decontamination toolbox for calcium imaging signals. Sci Rep. 2018 Feb 22;8(1):3493. doi: 10.1038/s41598-018-21640-2. PMID: 29472547; PMCID: PMC5823956)

      This point should be addressed.

      We apologize for not having been clear enough in the previous version of the manuscript. The neuropil was subtracted from calcium traces in both simulated and experimental data. Please note that instead of using the term “neuropil”, we used the word “background”. We decided to use the more general term “background” because it also applies to the case of synthetic calcium t-series, where neurons were modeled as spheres devoid of processes. The background subtraction is described in the Methods on page 39:

      “F(t) was computed frame-by-frame as the difference between the average signal of pixels in each ROI and the background signal. The background was calculated as the average signal of pixels that: i) did not belong to any bounding box; ii) had intensity values higher than the mean noise value measured in pixels located at the corners of the rectangular image, which do not belong to the circular FOV of the microendoscope; iii) had intensity values lower than the maximum value of pixels within the boxes”.
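      The three pixel criteria quoted above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' code: the function name, the argument names, and the choice to evaluate the intensity criteria on the time-averaged image are hypothetical, since the Methods excerpt does not specify them.

```python
import numpy as np

def background_subtracted_traces(movie, roi_masks, box_mask, corner_mask):
    """Sketch of F(t) = mean ROI signal - background signal, frame by frame.

    movie       : (T, H, W) array of fluorescence frames
    roi_masks   : (N, H, W) boolean array, one mask per ROI
    box_mask    : (H, W) boolean array, union of all ROI bounding boxes
    corner_mask : (H, W) boolean array, image corners outside the circular FOV
    All names are hypothetical; thresholds are taken from the time-averaged
    image, which is an assumption not stated in the source.
    """
    mean_img = movie.mean(axis=0)
    noise_level = mean_img[corner_mask].mean()   # criterion ii) lower bound
    box_max = mean_img[box_mask].max()           # criterion iii) upper bound
    bg_mask = (~box_mask) & (mean_img > noise_level) & (mean_img < box_max)
    background = movie[:, bg_mask].mean(axis=1)  # shape (T,)
    roi_means = np.stack([movie[:, m].mean(axis=1) for m in roi_masks])  # (N, T)
    return roi_means - background                # broadcasts over ROIs
```

      Taking the thresholds from the mean image keeps the background pixel set fixed across frames; a per-frame definition would be an equally plausible reading of the text.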

      h) Also, what are the expected correlations between neurons in the pyriform cortex? Are there measurements in the literature with which the authors could compare their data?

      We appreciate the reviewer's interest in the correlations between neurons in the piriform cortex. The overall low correlations between piriform neurons we observed (Figure 8) are consistent with a published study describing ‘near-zero noise correlations during odor inhalation’ in the anterior piriform cortex of rats, based on extracellular recordings (Miura et al., Neuron 2013). However, to the best of our knowledge, measurements directly comparable to ours have not been described in the literature. Recent analyses of the correlations between piriform neurons were restricted to odor exposure windows, with the goal of quantifying odor-specific activation patterns (e.g. Roland et al., eLife 2017; Bolding et al., eLife 2017; Pashkovski et al., Nature 2020; Wang et al., Neuron 2020). Here, we used correlation analyses to characterize the technical advancement of the optimized GRIN lens-based endoscopes. We showed that correlations of pairs of adjacent neurons were independent of radial distance (Figure 8B), highlighting homogeneous spatial resolution in the field of view.

      (2) The way the data is presented doesn't always make it easy to compare the performance of corrected and uncorrected lenses. Here are two examples:

      a) In Figures 4 to 6, it would be easier to compare the FOVs of corrected and uncorrected lenses if the scale bars (at the centre of the FOV) were identical. In this way, the neurons at the centre of the FOV would appear the same size in the two images, and the distances between the neurons at the centre of the FOV would appear similar. Here, the scale bar is significantly larger for the corrected lenses, which may give the illusion of a larger effective FOV.

      We appreciate the Referee’s comment. Below, we explain why we believe that the way we currently present imaging data in the manuscript is preferable:

      (1) current figures show images of the acquired FOV as they are recorded from the microscope (raw data), without rescaling. In this way, we exactly show what potential users will obtain when using a corrected microendoscope.

      (2) In the current version of the figures, the fact that the pixel size is not homogeneous across the FOV, nor equal between uncorrected and corrected microendoscopes, is initially shown in Figure 3E, F and then explicitly stated throughout the manuscript when images acquired with a corrected microendoscope are shown.

      (3) Rescaling images acquired with the corrected endoscopes gives the impression that the acquisition parameters were different between acquisitions with the corrected and uncorrected microendoscopes, which was not the case.

      Importantly, the larger FOV of the corrected microendoscope, which is one of the important technological achievements presented in this study, can be appreciated in the images regardless of the presentation format.

      b) In Figures 3A-D it would be more informative to plot the distances in microns rather than pixels. This would also allow a better comparison of the micro endoscopes (as the pixel sizes seem to be different for the corrected and uncorrected micro endoscopes).

      The Referee is correct that the pixel size is different between the corrected and uncorrected probes. This is because of the different magnification factor introduced by the corrective microlens, as described in Figure 3E, F. The rationale for showing images in Figure 3A-D in pixels rather than microns is the following:

      (1) Optical simulations in Figure 1 suggest that a corrective optical element is effective in compensating for some of the optical aberrations in GRIN microendoscopes.

      (2) After fabricating the corrective optical element (Figure 2), in Figure 3A-D we conduct a preliminary analysis of the effect of the corrective optical element on the optical properties of the GRIN lens. We observed that the microfabricated optical element corrected for some aberrations (e.g., astigmatism), but also that the microfabricated optical element was characterized by significant field curvature. This can be appreciated showing distances in pixels.

      (3) The observed field curvature and the aspherical profile of the corrected lens prompted us to characterize the magnification factor of the corrected endoscopes as a function of the radial distance. We found that the magnification factor changed as a function of the radial distance (Figure 3E-F) and that pixel size was different between uncorrected and corrected endoscopes. We also observed that, in corrected endoscopes, pixel size was a function of the radial distance (Figure 3E-F).

      (4) Once all of the above was established and quantified, we assigned precise pixel size to images of uncorrected and corrected endoscopes and we show all following images of the study (Figure 3G on) using a micron (rather than pixel) scale.

      (3) There seems to be a discrepancy between the performance of the long lenses (8.8 mm) in the different experiments, which should be discussed in the article. For example, the results in Figure 4 show a considerable enlargement of the FOV, whereas the results in Figure 6 show a very moderate enlargement of the distance at which the Pearson's correlation with the first ground truth emitter starts to drop.

      Thanks for raising this point and helping us clarify data presentation. Images in Figure 4B are average z-projections of z-stacks acquired through a mouse fixed brain slice and they were taken with the purpose of showing all the neurons that could be visualized from the same sample using an uncorrected and a corrected microendoscope. In Figure 4B, all illuminated neurons are visible regardless of whether they were imaged with high axial resolution (e.g., < 10 µm as defined in Figure 3J) or poor axial resolution. In contrast, in Figure 6J we evaluated the correlation between the calcium trace extracted from a given ROI and the real activity trace of the first simulated ground truth emitter for that specific ROI. The moderate increase in the correlation for the corrected microendoscope compared to the uncorrected microendoscope (Figure 6J) is consistent with the moderate improvement in the axial resolution of the corrected probe compared to the uncorrected probe at intermediate radial distances (60-100 µm from the optical axis, see Figure 3J). We added a paragraph in the Results section (page 14, lines 8-18) to summarize the points described above.
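      As a rough illustration of the quantity discussed for Figure 6J, the correlation between an ROI's extracted trace and the trace of its ground truth emitter can be computed as a Pearson coefficient. The sketch below is an assumption-laden example, not the authors' pipeline: the function name and inputs are hypothetical, and only `numpy.corrcoef` is a real library call.

```python
import numpy as np

def correlation_with_ground_truth(extracted_trace, ground_truth_trace):
    """Pearson correlation between an ROI's extracted calcium trace and the
    activity trace of its (simulated) ground truth emitter.
    Both inputs are 1D sequences of equal length; names are hypothetical."""
    extracted = np.asarray(extracted_trace, dtype=float)
    truth = np.asarray(ground_truth_trace, dtype=float)
    # corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
    return float(np.corrcoef(extracted, truth)[0, 1])
```

      Because the Pearson coefficient is invariant to offset and positive scaling, a trace that is a clean but dimmer copy of the ground truth still scores 1; the value drops only when signal from other emitters or noise is mixed into the ROI, which is the effect that poorer axial resolution is expected to produce.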

      a) There is also a significant discrepancy between measured and simulated optical performance, which is not discussed. Optical simulations (Figure 1) show that the useful FOV (defined as the radius for which the size of the PSF along the optical axis remains below 10µm) should be at least 90µm for the corrected microendoscopes of both lengths. However, for the long microendoscopes, Figure 3J shows that the axial resolution at 90µm is 17µm. It would be interesting to discuss the origin of this discrepancy: does it depend on the microendoscope used?

      As the Reviewer correctly pointed out, the size of simulated PSFs at a given radial distance (e.g., 90 µm) tends to be generally smaller than that of the experimentally measured PSFs. This might be due to multiple reasons:

      (1) simulated PSFs are excitation PSFs, i.e. they describe the intensity spatial distribution of focused excitation light. On the contrary, measured PSFs result from the excitation and emission process, thus they are also affected by aberrations of light emitted by fluorescent beads and collected by the microscope.

      (2) in the optical simulations, the Zemax file of the GRIN lenses contained first-order aberrations. High-order aberrations were therefore not included in simulated PSFs.

      (3) intrinsic variability of experimental measurements (e.g., intrinsic variability of the fabrication process, alignment of the microendoscope to the optical axis of the microscope, the distance between the GRIN back end and the objective…) are not considered in the simulations.

      We added a paragraph in the Discussion section (page 17, lines 9-18) summarizing the abovementioned points.

      Are there inaccuracies in the construction of the aspheric corrective lens or in the assembly with the GRIN lens? If there is variability between different lenses, how are the lenses selected for imaging experiments?

      The fabrication yield, i.e. the yield of generating the corrective lenses, using molding was ~ 90% (N > 30 molded lenses). The main limitation of this procedure was the formation of air bubbles between the mold negative and the glass coverslip. Molded lenses were visually inspected with the stereoscope and, in case of air bubble formation, they were discarded.

      The assembly yield, i.e. the yield of correct positioning of the GRIN lens with respect to the coverslip, was 100 % (N = 27 endoscopes).

      We added this information in the Methods at page 29 (lines 1-12), as follows:

      “After UV curing, the microlens was visually inspected at the stereomicroscope. In case of formation of air bubbles, the microlens was discarded (yield of the molding procedure: ~ 90 %, N > 30 molded lenses). The coverslip with the attached corrective lens was sealed to a customized metal or plastic support ring of appropriate diameter (Fig. 2C). The support ring, the coverslip and the aspherical lens formed the upper part of the corrected microendoscope, to be subsequently coupled to the proper GRIN rod (Table 2) using a custom-built opto-mechanical stage and NOA63 (Fig. 2C)(7). The GRIN rod was positioned perpendicularly to the glass coverslip, on the other side of the coverslip compared to the corrective lens, and aligned to the aspherical lens perimeter (Fig. 2C) under the guidance of a wide field microscope equipped with a camera. The yield of the assembly procedure for the probes used in this work was 100 % (N = 27 endoscopes). For further details on the assembly of corrected microendoscope see (7)”.

      Reviewer #1 (Recommendations for the authors):

      (1) Page 4, what is meant by 'ad-hoc" in describing software control?

      With “ad-hoc” we meant “specifically designed”. We revised the text to make this clear.

      (2) It was hard to tell how the PSF was modeled for the simulations (especially on page 34, describing the two spherical shells of the astigmatic PSF and ellipsoids modeled along them). Images or especially videos that show the modeling would make this easier to follow.

      Simulated calcium t-series were generated following previous work by our group (Antonini et al., eLife 2020), as stated in the Methods on page 37 (line 5). In Figure 4A of Antonini et al. eLife 2020, we provided a schematic to visually describe the procedure of simulated data generation. In the present paper, we decided not to include a similar drawing and cite the eLife 2020 article to avoid redundancy.

      (3) Some math symbols are missing from the methods in my version of the text (page 36/37).

      We apologize for the inconvenience. This issue arose in the PDF conversion of our Word document and we did not spot it at the time of submission. We will now make sure the PDF version of our manuscript correctly reports symbols and equations.

      (4) The Z extent of stacks (i.e. number of steps) used to generate images in Figure 4 is missing.

      We thank the Reviewer for the comment and we now revised the caption of Figure 4 and the Methods section as follows:

      “Figure 4. Aberration correction in long GRIN lens-based microendoscopes enables high-resolution imaging of biological structures over enlarged FOVs. A) jGCaMP7f-stained neurons in a fixed mouse brain slice were imaged using 2PLSM (λexc = 920 nm) through an uncorrected (left) and a corrected (right) microendoscope based on the 6.4 mm-long GRIN rod. Images are maximum fluorescence intensity (F) projections of a z-stack acquired with a 5 μm step size. Number of steps: 32 and 29 for uncorrected and corrected microendoscope, respectively. Scale bars: 50 μm. Left: the scale applies to the entire FOV. Right, the scale bar refers only to the center of the FOV; off-axis scale bar at any radial distance (x and y axes) is locally determined multiplying the length of the drawn scale bar on-axis by the corresponding normalized magnification factor shown in the horizontal color-coded bar placed below the image (see also Fig. 3, Supplementary Table 3, and Materials and Methods for more details). B) Same results for the microendoscope based on the 8.8 mm-long GRIN rod. Number of steps: 23 and 31 for uncorrected and corrected microendoscope, respectively”.

      We also modified the text in the Methods (page 35, lines 1-2):

      “(1024 pixels x 1024 pixels resolution; nominal pixel size: 0.45 µm/pixel; axial step: 5 µm; number of axial steps: 23-32; frame averaging = 8)”.

      (5) Overall, the text is wordy and a bit repetitive and could be cut down significantly in length without loss of clarity. This is true throughout, but especially when comparing the introduction and discussion.

      We edited the text (Discussion and Introduction), as suggested by the Reviewer.

      (6) Although I don't think it's necessary, I would advise including comparison data with an uncorrected endoscope in the same in vivo preparation.

      We thank the Referee for the suggestion. Below, we list the reasons why we decided not to perform the comparison between the uncorrected and corrected endoscopes in the in vivo preparation:

      (1) We believe that the comparison between uncorrected and corrected endoscopes is better performed in fixed tissue (Figure 4) or in simulated calcium data (Figure 5-6), rather than in vivo recordings (Figure 7). In fact, in the brain of living mice motion artifacts, changes in fluorophore expression level, variation in the optical properties of the brain (e.g., the presence of a blood vessel over the FOV) may make the comparison of images acquired with uncorrected and corrected microendoscopes difficult, requiring a large number of animals to cancel out the contributions of all these factors. Comparing optical properties in fixed tissue is, in contrast, devoid of these confounding factors.

      (2) A major advantage of quantifying how the optical properties of uncorrected and corrected endoscope impact on the ability to extract information about neuronal activity in simulated calcium data is that, under simulated conditions, we can count on a known ground truth as reference (e.g., how many neurons are in the FOV, where they are, and which is their electrical activity). This is clearly not possible under in vivo conditions.

      (3) The proposed experiment requires imaging in the awake mouse with a corrected microendoscope, then anesthetizing the animal to carefully remove the corrective microlens using forceps, and finally repeating the optical recordings in the awake mouse with the uncorrected microendoscope. Although this is feasible (we performed the proposed experiment in Antonini et al. eLife 2020 using a 4.1 mm-long microendoscope), the yield of success of these experiments is low. The low yield is due to the fact that the mechanical force applied on top of the microendoscope to remove the corrective microlens may induce movement of the GRIN lens inside the brain, both in vertical and horizontal directions. This can randomly result in change of the focal plane, death or damage of the cells, tissue inflammation, and bleeding. From our own experience, the number of animals used for this experiment is expected to be high.

      Reviewer #2 (Recommendations for the authors):

      Below, I provide a few minor corrections and suggestions for the authors to consider before final submission.

      (1) Page 5: when referring to Table 1 maybe add "Table 1 and Methods".

      Following the Reviewer’s comment, we revised the text at page 6 (lines 4-5 from bottom) as follows:

      “(see Supplementary Table 1 and Materials and Methods for details on simulation parameters)”.

      (2) Page 8: "We set a threshold of 10 µm on the axial resolution to define the radius of the effective FOV (corresponding to the black triangles in Fig. 3I, J) in uncorrected and corrected microendoscopes. We observed an enlargement of the effective FOV area of 4.7 times and 2.3 times for the 6.4 mm-long microendoscope and the 8.8 mm-long microendoscope, respectively (Table 1). These findings were in agreement with the results of the ray-trace simulations (Figure 1) and the measurement of the subresolved fluorescence layers (Figure 3A-D)." I could not find the information given in this paragraph, specifically:

      a) Upon examining the black triangles in Figure 3I and J, the enlargement of the effective FOV does not appear to be 4.7 and 2.3 times.

      In Figure 3I, J, black triangles mark the intersections between the curves fitting the data and the threshold of 10 µm on the axial resolution. The values on the x-axis corresponding to the intersections (Table 1, “Effective FOV radius”) represent the estimated radius of the effective FOV of the probes, i.e. the radius within which the microendoscope has spatial resolution below the threshold of 10 μm. The ratios of the effective FOV radii are 2.17 and 1.53 for the 6.4 mm- and the 8.8 mm-long microendoscope, respectively, which correspond to 4.7 and 2.3 times larger FOV (Table 1). To make this point clearer, we modified the indicated sentence as follows (page 10, lines 3-11 from bottom):

      “We set a threshold of 10 µm on the axial resolution to define the radius of the effective FOV (corresponding to the black triangles in Fig. 3I, J) in uncorrected and corrected microendoscopes. We observed a relative increase of the effective FOV radius of 2.17 and 1.53 for the 6.4 mm- and the 8.8 mm-long microendoscope, respectively (Table 1). This corresponded to an enlargement of the effective FOV area of 4.7 times and 2.3 times for the 6.4 mm-long microendoscope and the 8.8 mm-long microendoscope, respectively (Table 1). These findings were in agreement with the results of the ray-trace simulations (Figure 1) and the measurement of the subresolved fluorescence layers (Figure 3A-D)."
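      The step from radius ratios to area ratios in the response above is a simple squaring, which can be checked directly. The sketch assumes a circular effective FOV, so that area scales with the square of the radius; the function name is illustrative only.

```python
def area_enlargement(radius_ratio):
    """Area ratio of two circular FOVs whose radii differ by radius_ratio."""
    return radius_ratio ** 2

# Radius ratios reported in the response (Table 1 of the manuscript)
for length_mm, ratio in [(6.4, 2.17), (8.8, 1.53)]:
    print(f"{length_mm} mm-long probe: {area_enlargement(ratio):.1f}x larger area")
# prints:
# 6.4 mm-long probe: 4.7x larger area
# 8.8 mm-long probe: 2.3x larger area
```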

      b) I do not understand how the enlargements in Figure 3I and J align with the ray trace simulations in Figure 1, indicating an enlargement of 5.4 and 5.6.

      In Figure 1C, E of the first submission we showed the Strehl ratio of focal spots focalized after the microendoscope, in the object plane, as a function of radial distance from the optical axis of focal spots focalized in the focal plane at the back end of the GRIN rod (“Objective focal plane” in Figure 1A, B), before the light has traveled along the GRIN lens. After reading the Referee’s comment, we realized this choice does not facilitate the comparison between Figure 1 and Figure 3I, J. We therefore decided to modify Figure 1C, E by showing the Strehl ratio of focal spots focalized after the microendoscope as a function of their radial distance from the optical axis in the object plane (where the Strehl ratio is computed), after the light has traveled through the GRIN lens (radial distances are still computed on a plane, not along the curved focal surface represented by the “imaging plane” in Figure 1A, B). Computing radial distances in the object space, we found that the relative increase in the radius of the FOV due to the correction of aberrations was 3.50 and 3.35 for the 6.4 mm- and the 8.8 mm-long microendoscope, respectively. We also revised the manuscript text accordingly (page 7, lines 6-8):

      “The simulated increase in the radius of the diffraction-limited FOV was 3.50 times and 3.35 times for the 6.4 mm-long and 8.8 mm-long probe, respectively (Fig. 1C, E)”. We believe this change should facilitate the comparison of the data presented in Figure 1 and Figure 3.

      Moreover, in comparing results in Figure 1 and Figure 3, it is important to keep in mind that:

      (1) the definitions of the effective FOV radius were different in simulations (Figure 1) and real measurements (Figure 3). In simulations, we considered a theoretical criterion (Maréchal criterion) and set the lower threshold for a diffraction-limited FOV to a Strehl ratio value of 0.8. In real measures, the effective FOV radius obtained from fluorescent bead measurements was defined based on the empirical criterion of setting the upper threshold for the axial resolution to 10 µm.

      (2) the Zemax file of the GRIN lenses contained low-order aberrations and not high-order aberrations.

      (3) the small variability in some of the experimental parameters (e.g., the distance between the GRIN back end and the focusing objective) were not reflected in the simulations.

      Given the reasons listed above, it is expected that the predictions of the simulations do not perfectly match the experimental measurements and tend to show larger improvements from aberration correction than those measured experimentally.
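      To give a sense of how strict the simulation criterion is, the Strehl threshold of 0.8 can be translated into a wavefront error using the standard Maréchal approximation, S ≈ 1 - (2πσ/λ)², where σ is the RMS wavefront error. The sketch below is only an illustration: the authors obtained Strehl ratios from Zemax ray tracing, not from this formula, and the function names are hypothetical.

```python
import math

def strehl_marechal(rms_wavefront_error, wavelength):
    """Maréchal approximation of the Strehl ratio for small aberrations.
    Both arguments must use the same length unit."""
    return 1.0 - (2.0 * math.pi * rms_wavefront_error / wavelength) ** 2

def max_rms_error(wavelength, strehl_threshold=0.8):
    """Largest RMS wavefront error still meeting the Strehl threshold,
    obtained by inverting the approximation above."""
    return wavelength * math.sqrt(1.0 - strehl_threshold) / (2.0 * math.pi)

sigma_nm = max_rms_error(920.0)  # excitation wavelength used in the study
print(f"RMS wavefront error at S = 0.8: {sigma_nm:.1f} nm")
# prints: RMS wavefront error at S = 0.8: 65.5 nm
```

      Under this approximation the threshold corresponds to roughly λ/14 of RMS wavefront error, a much tighter condition than the empirical 10 µm axial resolution cutoff used for the bead measurements, which is one reason the two FOV definitions are not directly interchangeable.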

      c) Finally, how can the enlargement in Figure 3I be compared to the measurements of the sub-resolved fluorescence layers in Figures 3A-D? Could the authors please clarify these points?

      When comparing measurements of subresolved fluorescent films and beads it is important to keep in mind that the two measures have different purposes and spatial resolution. We used subresolved fluorescent films to visualize the shape and extent of the focal surface of microendoscopes in a continuous way along the radial dimension (in contrast to bead measurements that are quantized in space). This approach comes at the cost of spatial resolution, as we are using fluorescent layers, which are subresolved in the axial but not in the radial dimension. Therefore, fluorescent film profiles are not used in our study to extract relevant quantitative information about effective FOV enlargement or spatial resolution of corrected microendoscopes. In contrast, to quantitatively characterize axial and lateral resolutions we used measurements of 100 nm-diameter fluorescent beads (therefore subresolved in the x, y, and z dimensions) located at different radial distances from the center of the FOV, using a much smaller nominal pixel size compared to the fluorescent films (beads, lateral resolution: 0.049 µm/pixel, axial resolution: 0.5 µm/pixel; films, lateral resolution: 1.73 µm/pixel, axial resolution: 2 µm/pixel).

      (3) On page 15, the statement "significantly enlarge the FOV" should be more specific by providing the actual values for the increase. It would also be good to mention that this is not a xy lateral increase; rather, as one moves further from the center, more of the imaged cells belong to axially different planes.

      The values of the experimentally determined FOV enlargements (4.7 times and 2.3 times for 6.4 mm- and 8.8 mm-long microendoscope, respectively) are provided in Table 1 and are now referenced on page 10. Following the Referee’s request, we added the following sentence in the discussion (page 18, lines 10-14) to underline that the extended FOV samples on different axial positions because of the field curvature effect:

      “It must be considered, however, that the extended FOV achieved by our aberration correction method was characterized by a curved focal plane. Therefore, cells located in different radial positions within the image were located at different axial positions and cells at the border of the FOV were closer to the front end of the microendoscope”.

      (4) On page 36, most of the formulas appear to be corrupted. This may have occurred during the conversion to the merged PDF. Please verify this and check for similar problems in other equations throughout the text as well.

      We apologize for the inconvenience. This issue arose in the PDF conversion of our Word document and we did not spot it upon submission. We will now make sure the PDF version of our manuscript correctly reports symbols and equations.

      (5) In the discussion, the authors could potentially add comments on how the verified performance of the corrective lenses depends on the wavelength and mention the range within which the wavelength can be changed without the need to redesign a new corrective lens.

      Following this comment and those of other Reviewers, we explored the effect of changing wavelength on the Strehl ratio using new Zemax simulations. We found that the Strehl ratio remains > 0.8 within at least ± 10 nm from λ = 920 nm (new Supplementary Figure 1A-D, left panels), which covers the limited bandwidth of our femtosecond laser. Moreover, these simulations demonstrate that, on a much wider wavelength range (800 - 1040 nm), high Strehl ratio is obtained but at different z planes (new Supplementary Figure 1A-D, right panels). These new results are now described on page 7 (lines 8-10).

      (6) Also, they could discuss if and how the corrective lens could be integrated into fiberscopes for freely moving experiments.

      Following the Referee’s suggestion, we added a short text in the Discussion (page 21, lines 4-7 from bottom). It reads:

      “Another advantage of long corrected microendoscopes described here over adaptive optics approaches is the possibility to couple corrected microendoscopes with portable 2P microscopes(42-44), allowing high resolution functional imaging of deep brain circuits on an enlarged FOV during naturalistic behavior in freely moving mice”.

      (7) Finally, since the main advantage of this approach is its simplicity, the authors should also comment on or outline the steps to follow for potential users who are interested in using the corrective lenses in their systems.

      Thanks for this comment. The Materials and Methods section of this study and that of Antonini et al. eLife 2020 describe in detail the experimental steps that potential users need to follow to reproduce the corrective lenses and apply them to their own experimental configuration.

      Reviewer #3 (Recommendations for the authors):

      (1) Suggestions for improved or additional experiments, data, or analyses, and Recommendations for improving the writing and presentation:

      See Public Review.

      Please see our point-by-point response above.

      (2) Minor corrections on text and figures: a) Figure 6A: is the fraction of cells expressed in %?

      Author response: yes, that is correct. Thank you for spotting it. We added the “%” symbol to the y label.

      b) Figure 8A, left: The second line is blue and not red dashed. In addition, it could be interesting to also show a line corresponding to the 0 value.

      Thank you for the suggestions. We modified Figure 8 according to the Referee’s comments.

      c) Some parts of equation (1) and some variables in the Material and Methods section are missing

      We apologize for the inconvenience. This issue arose in the PDF conversion of our Word document and we did not spot it upon submission. We will now make sure the PDF version of our manuscript correctly reports symbols and equations.

      d) In the methods, the authors mention a calibration ruler with ticks spaced every 10 µm along two orthogonal directions and refer to the following product: 4-dot calibration slide, Cat. No. 1101002300142, Motic, Hong Kong. However, this product does not seem to correspond to a calibration ruler.

      We double-checked. The catalog number 1101002300142 is correct and product details can be found at the following link:

      https://moticmicroscopes.com/products/calibration-slide-4-dots-1101002300142?srsltid=AfmBOorGYx9PcXtAlIMmSs_tEpxS4nX21qIcV8Kfn4qGwizQK3LYOQn3

    1. eLife Assessment

      This paper represents an important contribution to the field. Summarizing results from neural recording experiments in mice across ten labs, the work provides compelling evidence that basic electrophysiology features, single-neuron functional properties, and population-level decoding are fairly reproducible across labs with proper preprocessing. The results and suggestions regarding preprocessing and quality metrics may be of significant interest to investigators carrying out such experiments in their own labs.

    2. Reviewer #1 (Public review):

      The IBL here presents an important paper that aims to assess potential reproducibility issues in rodent electrophysiological recordings across labs and suggests solutions to these. The authors carried out a series of analyses on data collected across 10 laboratories while mice performed the same decision-making task, and provided convincing evidence that basic electrophysiology features, single-neuron functional properties, and population-level decoding were fairly reproducible across labs with proper preprocessing. This well-motivated large-scale collaboration allowed systematic assessment of lab-to-lab reproducibility of electrophysiological data, and the suggestions outlined in the paper for streamlining preprocessing pipelines and quality metrics will provide general guidance for the field, especially with continued effort to benchmark against standard practices (such as manual curation).

      The authors have carefully incorporated our suggestions. As a result, the paper now better reflects where reproducibility is affected when using common, simple, and more complex analyses and preprocessing methods, and it is more informative and more reflective of the field overall. We thank the reviewers for this thorough revision. We have 2 remaining suggestions on text clarification:

      (1) Regarding benchmarking the automated metrics to manual curation of units: although we appreciate that a proper comparison may require a lot of effort potentially beyond the scope of the current paper; we do think that explicit discussion regarding this point is needed in the text, to remind the readers (and indeed future generations of electrophysiologists) the pros and cons of different approaches.

      In addition to what the authors have currently stated (line 469-470): "Another significant limitation of the analysis presented here is that we have not been able to assess the extent to which other choices of quality metrics and inclusion criteria might have led to greater or lesser reproducibility."

      Maybe also add: "In particular, a thorough comparison of automated metrics against a careful, large, manually-curated dataset, is an important benchmarking step for future studies."

      (2) The authors now include, in Figure 3-Figure Supplement 1, a panel that highlights how much probe depth is adjusted by using electrophysiological features such as LFP power to estimate probe and channel depth. This plot is immensely informative for the field, as it implies that there can be substantial variability (sometimes up to 1 mm discrepancy between insertions) in depth estimation based on anatomical DiI track tips alone. Using electrophysiological features in this way for probe depth estimation is currently not standard in the field and has only been made possible with Neuropixels, which span several millimeters. These figures highlight that this should be a critical step in preprocessing pipelines, and the paper provides solid evidence for this.

      Currently, this part of the figure is only subtly referenced in the text. We think it would be helpful to explicitly reference this particular panel and discuss its implications in the text.

    3. Reviewer #2 (Public review):

      Summary:

      The authors sought to evaluate whether analyses of large-scale electrophysiology data obtained from 10 different individual laboratories are reproducible when they use standardized procedures and quality control measures. They were able to reproduce most of their experimental findings across all labs. Despite attempting to target the same brain areas in each recording, variability in electrode targeting was a source of some differences between datasets.

      Strengths:

      This paper gathered a standardized dataset across 10 labs and performed a host of state-of-the-art analyses on it. Their ability to assess the reproducibility of each analysis across this kind of data is an important contribution to the field.

      Comments on revisions:

      The authors have addressed almost all of the concerns that I raised in this revised version. The new RIGOR notebook is helpful, as are the new analyses.

      This paper attributes much error in probe insertion trajectory planning to the fact that the Allen CCF and standard stereotaxic coordinate systems are not aligned. Consequently, it would be very helpful for the community if this paper could recommend software tools, procedures, or code to do trajectory planning that accounts for this.

      I think it would still be helpful for the paper to have some discussion comparing/contrasting the use of the RIGOR framework with existing spike sorting statistics. They mention in their response to reviewers that this is indeed a large space of existing approaches. Most labs performing Neuropixels recordings already do some type of quality control, but these approaches are not standardized. This work is well-positioned to discuss the advantages and disadvantages of these alternative approaches (even briefly) but does not currently do so; it does not need to run any of these competing approaches to helpfully mention ideas for what a reader of the paper should do for quality control with their own data.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their careful read of our paper, and appreciate the thoughtful comments.

      Both reviewers agreed that our work had several major strengths: the large dataset collected in collaboration across ten labs, the streamlined processing pipelines, the release of code repositories, the multi-task neural network, and that we definitively determined that electrode placement is an important source of variability between datasets.

      However, a number of key potential improvements were noted: the reviewers felt that a more standard model-based characterization of single neuron responses would benefit our reproducibility analysis, that more detail was needed about the number of cells, sessions, and animals, and that more information was needed to allow users to deploy the RIGOR standards and to understand their relationship to other metrics in the field.

      We agree with these suggestions and have implemented many major updates in our revised manuscript. Some highlights include:

      (1) A new regression analysis that specifies the response profile of each neuron, allowing a comparison of how similar these are across labs and areas (See Figure 7 in the new section, “Single neuron coefficients from a regression-based analysis are reproducible across labs”);

      (2) A new decoding analysis (See Figure 9 in the section, “Decodability of task variables is consistent across labs, but varies by brain region”);

      (3) A new RIGOR notebook to ease useability;

      (4) A wealth of additional information about the cells, animals and sessions in each figure;

      (5) Many new additional figure panels in the main text and supplementary material to clarify the specific points raised by the reviewers.

      Again, we are grateful to the reviewers and editors for their helpful comments, which have significantly improved the work. We are hopeful that the many revisions we have implemented will be sufficient to change the “incomplete” designation that was originally assigned to the manuscript.

      Reviewer #1 (Public review):

      Summary:

      The authors explore a large-scale electrophysiological dataset collected in 10 labs while mice performed the same behavioral task, and aim to establish guidelines to aid reproducibility of results collected across labs. They introduce a series of metrics for quality control of electrophysiological data and show that histological verification of recording sites is important for interpreting findings across labs and should be reported in addition to planned coordinates. Furthermore, the authors suggest that although basic electrophysiology features were comparable across labs, task modulation of single neurons can be variable, particularly for some brain regions. The authors then use a multi-task neural network model to examine how neural dynamics relate to multiple interacting task- and experimenter-related variables, and find that lab-specific differences contribute little to the variance observed. Therefore, analysis approaches that account for correlated behavioral variables are important for establishing reproducible results when working with electrophysiological data from animals performing decision-making tasks. This paper is very well-motivated and needed. However, what is missing is a direct comparison of task modulation of neurons across labs using standard analysis practice in the fields, such as generalized linear model (GLM). This can potentially clarify how much behavioral variance contributes to the neural variance across labs; and more accurately estimate the scale of the issues of reproducibility in behavioral systems neuroscience, where conclusions often depend on these standard analysis methods.

      We fully agree that a comparison of task-modulation across labs is essential. To address this, we have performed two new analyses and added new corresponding figures to the main text (Figures 7 and 9). As the reviewer hoped, this analysis did indeed clarify how much behavioral variance contributes to the variance across labs. Critically, these analyses suggested that our results were more robust to reproducibility than the more traditional analyses would indicate.

      Additional details are provided below (See detailed response to R1P1b).

      Strengths:

      (1) This is a well-motivated paper that addresses the critical question of reproducibility in behavioural systems neuroscience. The authors should be commended for their efforts.

      (2) A key strength of this study comes from the large dataset collected in collaboration across ten labs. This allows the authors to assess lab-to-lab reproducibility of electrophysiological data in mice performing the same decision-making task.

      (3) The authors' attempt to streamline preprocessing pipelines and quality metrics is highly relevant in a field that is collecting increasingly large-scale datasets where automation of these steps is increasingly needed.

      (4) Another major strength is the release of code repositories to streamline preprocessing pipelines across labs collecting electrophysiological data.

      (5) Finally, the application of MTNN for characterizing functional modulation of neurons, although not yet widely used in systems neuroscience, seems to have several advantages over traditional methods.

      Thanks very much for noting these strengths of our work.

      Weaknesses:

      (1) In several places the assumptions about standard practices in the field, including preprocessing and analyses of electrophysiology data, seem to be inaccurately presented:

      a) The estimation of how much the histologically verified recording location differs from the intended recording location is valuable information. Importantly, this paper provides citable evidence for why that is important. However, histological verification of recording sites is standard practice in the field, even if not all studies report them. Although we appreciate the authors' effort to further motivate this practice, the current description in the paper may give readers outside the field a false impression of the level of rigor in the field.

      We agree that labs typically do perform histological verification. Still, our methods offer a substantial improvement over standard practice, and this was critical in allowing us to identify errors in targeting. For instance, we used new software, LASAGNA, which is an innovation over the traditional, more informal approach to localizing recording sites. Second, the requirement that two independent reviewers concur on each proposed location for a recording site is also an improvement over standard practice. Importantly, these reviewers use electrophysiological features to more precisely localize electrodes, when needed, which is an improvement over many labs. Finally, most labs use standard 2D atlases to identify recording location (a traditional approach); our use of a 3D atlas and a modern image registration pipeline has improved the accuracy of identifying the true placement of probes in 3D space.

      Importantly, we don’t necessarily advocate that all labs adopt our pipeline; indeed, this would be infeasible for many labs. Instead, our hope is that the variability in probe trajectory that we uncovered will be taken into account in future studies. Here are 3 example ways in which that could happen. First, groups hoping to target a small area for an experiment might elect to use a larger cohort than previously planned, knowing that some insertions will miss their target. Second, our observation that some targeting error arose because experimenters had to move probes due to blood vessels will impact future surgeries: when an experimenter realizes that a blood vessel is in the way, they might still re-position the probe, but they can also adjust its trajectory (e.g., changing the angle) knowing that even little nudges to avoid blood vessels can have a large impact on the resulting insertion trajectory. Third, our observation of a 7 degree deviation between stereotaxic coordinates and Allen Institute coordinates can be used for future trajectory planning steps to improve accuracy of placement. Uncovering this deviation required many insertions and our standardized pipeline, but now that it is known, it can be easily corrected without needing such a pipeline.

      We thank the reviewer for bringing up this issue and have added new text (and modified existing text) in the Discussion to highlight the innovations we introduced that allowed us to carefully quantify probe trajectory across labs (lines 500 - 515):

      “Our ability to detect targeting error benefited from an automated histological pipeline combined with alignment and tracing that required agreement between multiple users, an approach that greatly exceeds the histological analyses done by most individual labs. Our approach, which enables scalability and standardization across labs while minimizing subjective variability, revealed that much of the variance in targeting was due to the probe entry positions at the brain surface, which were randomly displaced across the dataset. … Detecting this offset relied on a large cohort size and an automated histological pipeline, but now that we have identified the offset, it can be easily accounted for by any lab. Specifically, probe angles must be carefully computed from the CCF, as the CCF and stereotaxic coordinate systems do not define the same coronal plane angle. Minimizing variance in probe targeting is another important element in increasing reproducibility, as slight deviations in probe entry position and angle can lead to samples from different populations of neurons. Collecting structural MRI data in advance of implantation could reduce targeting error, although this is infeasible for most labs. A more feasible solution is to rely on stereotaxic coordinates but account for the inevitable off-target measurements by increasing cohort sizes and adjusting probe angles when blood vessels obscure the desired location.”

      b) When identifying which and how neurons encode particular aspects of stimuli or behaviour in behaving animals (when variables are correlated by the nature of the animal's behaviour), it has become the standard in behavioral systems neuroscience to use GLMs - indeed many labs participating in the IBL also have a long history of doing this (e.g., Steinmetz et al., 2019; Musall et al., 2023; Orsolic et al., 2021; Park et al., 2014). The reproducibility of results when using GLMs is never explicitly shown, but the supplementary figures to Figure 7 indicate that results may be reproducible across labs when using GLMs (as it has similar prediction performance to the MTNN). This should be introduced as the first analysis method used in a new dedicated figure (i.e., following Figure 3 and showing results of analyses similar to what was shown for the MTNN in Figure 7). This will help put into perspective the degree of reproducibility issues the field is facing when analyzing with appropriate and common methods. The authors can then go on to show how simpler approaches (currently in Figures 4 and 5) - not accounting for a lot of uncontrolled variabilities when working with behaving animals - may cause reproducibility issues.

      We fully agree with the reviewer's suggestion. We have addressed their concern by implementing a Reduced-Rank Regression (RRR) model, which builds upon and extends the principles of Generalized Linear Models (GLMs). The RRR model retains the core regression framework of GLMs while introducing shared, trainable temporal bases across neurons, enhancing the model’s capacity to capture the structure in neural activity (Posani, Wang, et al., bioRxiv, 2024). Importantly, Posani, Wang et al compared the predictive performance of GLMs vs the RRR model, and found that the RRR model provided (slightly) improved performance, so we chose the RRR approach here.

      We highlight this analysis in a new section (lines 350-377) titled, “Single neuron coefficients from a regression-based analysis are reproducible across labs”. This section includes an entirely new Figure (Fig. 7), where this new analysis felt most appropriate, since it is closer in spirit to the MTNN analysis that follows (rather than as a new Figure 3, as the reviewer suggested). As the reviewer hoped, this analysis provides some reassurance that including many variables when characterizing neural activity furnishes results with improved reproducibility. We now state this in the Results and the Discussion (line 456-457), highlighting that these analyses complement the more traditional selectivity analyses, and that using both methods together can be informative.

      When the authors introduce a neural network approach (i.e. MTNN) as an alternative to the analyses in Figures 4 and 5, they suggest: 'generalized linear models (GLMs) are likely too inflexible to capture the nonlinear contributions that many of these variables, including lab identity and spatial positions of neurons, might make to neural activity'. This is despite the comparison between MTNN and GLM prediction performance (Supplement 1 to Figure 7) showing that the MTNN is only slightly better at predicting neural activity compared to standard GLMs. The introduction of new models to capture neural variability is always welcome, but the conclusion that standard analyses in the field are not reproducible can be unfair unless directly compared to GLMs.

      In essence, it is really useful to demonstrate how different analysis methods and preprocessing approaches affect reproducibility. But the authors should highlight what is actually standard in the field, and then provide suggestions to improve from there.

      Thanks again for these comments. We have also edited the MTNN section slightly to accommodate the addition of the previous new RRR section (line 401-402).

      (2) The authors attempt to establish a series of new quality control metrics for the inclusion of recordings and single units. This is much needed, with the goal to standardize unit inclusion across labs that bypasses the manual process while keeping the nuances from manual curation. However, the authors should benchmark these metrics to other automated metrics and to manual curation, which is still a gold standard in the field. The authors did this for whole-session assessment but not for individual clusters. If the authors can find metrics that capture agreed-upon manual cluster labels, without the need for manual intervention, that would be extremely helpful for the field.

      We thank the reviewer for their insightful suggestions regarding benchmarking our quality control metrics against manual curation and other automated methods at the level of individual clusters. We are indeed, as the reviewer notes, publishing results from spike sorting outputs that have been automatically but not manually verified on a neuron-by-neuron basis. To get to the point where we trust these results to be of publishable quality, we manually reviewed hundreds of recordings and thousands of neurons, refining both the preprocessing pipeline and the single-unit quality metrics along the way. All clusters, both those passing QCs and those not passing QCs, are available to review with detailed plots and quantifications at https://viz.internationalbrainlab.org/app (turn on “show advanced metrics” in the upper right, and navigate to the plots furthest down the page, which are at the individual unit level). We would emphasize that these metrics are definitely imperfect (and fully-automated spike sorting remains a work in progress), but so is manual clustering. Our fully automated approach has the advantage of being fully reproducible, which is absolutely critical for the analyses in the present paper. Indeed, if we had actually done manual clustering or curation, one would wonder whether our results were actually reproducible independently. Nevertheless, it is not part of the present manuscript’s objectives to validate or defend these specific choices for automated metrics, which have been described in detail elsewhere (see our Spike Sorting whitepaper, https://figshare.com/articles/online_resource/Spike_sorting_pipeline_for_the_International_Brain_Laboratory/19705522?file=49783080). It would be a valuable exercise to thoroughly compare these metrics against a careful, large, manually-curated set, but doing this properly would be a paper in itself and is beyond the scope of the current paper.
      We also acknowledge that our analyses studying reproducibility across labs could, in principle, result in more or less reproducibility under a different choice of metrics, which we now describe in the Discussion (line 469-470):

      “Another significant limitation of the analysis presented here is that we have not been able to assess the extent to which other choices of quality metrics and inclusion criteria might have led to greater or lesser reproducibility.”

      (3) With the goal of improving reproducibility and providing new guidelines for standard practice for data analysis, the authors should report the n of cells, sessions, and animals used in plots and analyses throughout the paper, both to aid understanding of the variability in the plots and to set a good example.

      We wholeheartedly agree and have added the number of cells, mice and sessions for each figure. This information is included as new tabs in our quality control spreadsheet (https://docs.google.com/spreadsheets/d/1_bJLDG0HNLFx3SOb4GxLxL52H4R2uPRcpUlIw6n4n-E/). This is referred to in line 158-159 (as well as its original location on line 554 in the section, “Quality control and data inclusion”).

      Other general comments:

      (1) In the discussion (line 383) the authors conclude: 'This is reassuring, but points to the need for large sample sizes of neurons to overcome the inherent variability of single neuron recording'. - Based on what is presented in this paper we would rather say that their results suggest that appropriate analytical choices are needed to ensure reproducibility, rather than large datasets - and they need to show whether using standard GLMs actually allows for reproducible results.

      Thanks. The new GLM-style RRR analysis in Figure 7, following the reviewer’s suggestion, does indeed indicate improved reproducibility across labs. As described above, we see this new analysis as complementary to more traditional analyses of neural selectivity and argue that the two can be used together. The new text (line 461) states:

      “This is reassuring, and points to the need for appropriate analytical choices to ensure reproducibility.”

      (2) A general assumption in the across-lab reproducibility questions in the paper relies on intralab variability vs across-lab variability. An alternative measure that may better reflect experimental noise is across-researcher variability, as well as the amount of experimenter experience (if the latter is a factor, it could suggest researchers may need more training before collecting data for publication). The authors state in the discussion that this is not possible. But maybe certain measures can be used to assess this (e.g. years of conducting surgeries/ephys recordings etc)?

      We agree that understanding experimenter-to-experimenter variability would be very interesting and indeed we had hoped to do this analysis for some time. The problem is that typically, each lab employed one trainee to conduct all the data collection. This prevents us from comparing outcomes from two different experimenters in the same lab. There are exceptions to this, such as the Churchland lab in which 3 personnel (two postdocs and a technician) collected the data. However, even this fortuitous situation did not lend itself well to assessing experimenter-to-experimenter variation: the Churchland lab moved from Cold Spring Harbor to UCLA during the data collection period, which might have caused variability that is totally independent of experimenter (e.g., different animal facilities). Further, once at UCLA, the postdoc and technician worked closely together, alternating roles in animal training, surgery and electrophysiology. We believe that the text in our current Discussion (line 465-468) accurately characterizes the situation:

      “Our experimental design precludes an analysis of whether the reproducibility we observed was driven by person-to-person standardization or lab-to-lab standardization. Most likely, both factors contributed: all lab personnel received standardized instructions for how to implant head bars and train animals, which likely reduced personnel-driven differences.”

      Quantifying the level of experience of each experimenter is an appealing idea and we share the reviewer’s curiosity about its impact on data quality. Unfortunately, quantifying experience is tricky. For instance, years of conducting surgeries is not an unambiguously determinable number. Would we count an experimenter who did surgery every day for a year as having the same experience as an experimenter who did surgery once/month for a year? Would we count a surgeon with expertise in other areas (e.g., windows for imaging) in the same way as surgeons with expertise in ephys-specific surgeries? Because of the ambiguities, we leave this analysis to be the subject of future work; this is now stated in the Discussion (line 476).

      (3) Figure 3b and c: Are these plots before or after the probe depth has been adjusted based on physiological features such as the LFP power? In other words, is the IBL electrophysiological alignment toolbox used here and is the reliability of location before using physiological criteria or after? Beyond clarification, showing both before and after would help the readers to understand how much the additional alignment based on electrophysiological features adjusts probe location. It would also be informative if they sorted these penetrations by which penetrations were closest to the planned trajectory after histological verification.

      The plots in Figure 3b and 3c reflect data after the probe depth has been adjusted based on electrophysiological features. This adjustment incorporates criteria such as LFP power and spiking activity to refine the trajectory and ensure precise alignment with anatomical landmarks. The trajectories have also been reviewed and confirmed by two independent reviewers. We have clarified this in line 180 and in the caption of Figure 3.

      To address this concern, we have added a new panel c in Figure 3 supplementary 1 (also shown below) that shows the LFP features along the probes prior to using the IBL alignment toolbox. We hope the reviewer agrees that a comparison of panels (a) and (c) below makes clear the improvement afforded by our alignment tools.

      In Figure 3 and Figure 3 supplementary 1, as suggested, we have also now sorted the probes by those that were closest to the planned trajectory. This way of visualizing the data makes it clear that as the distance from the planned trajectory increases, the power spectral density in the hippocampal regions becomes less pronounced and the number of probes that have a large portion of the channels localized to VISa/am, LP and PO decreases. We have added text to the caption to describe this. We thank the reviewer for this suggestion and agree that it will help readers to understand how much the additional alignment (based on electrophysiological features) adjusts probe location.

      (4) In Figures 4 and 6: If the authors use a 0.05 threshold (alpha) and a cell simply has to be significant on 1/6 tests to be considered task modulated, that means that they have a false positive rate of ~30% (0.05*6=0.3). We ran a simple simulation looking for significant units (from random null distribution) from these criteria which shows that out of 100,000 units, 26,500 units would come out significant (false positive rate: 26.5%). That is very high (and unlikely to be accepted in most papers), and therefore it is not surprising that the fraction of task-modulated units across labs is highly variable. This high false positive rate may also have implications for the investigation of the spatial position of task-modulated units (as effects of the spatial position may drown in falsely labelled 'task-modulated' cells).

      Thank you for this concern. The different tests were kept separate, so we did not consider a neuron modulated if it was significant in only one out of six tests, but instead we asked whether a neuron was modulated according to test one, whether it was modulated according to test two, etc., and performed further analyses separately for each test. Thus, we are only vulnerable to the ‘typical’ false positive rate of 0.05 for any given test. We made this clearer in the text (lines 232-236) and hope that the 5% false positive rate seems more acceptable.
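The arithmetic behind point (4) can be illustrated with a minimal sketch. It assumes six independent tests per unit and p-values that are uniform under the null; the function name and parameters are hypothetical and are not taken from the paper's code. Under those assumptions, flagging a unit when any one of six tests passes gives a rate of 1 - 0.95^6 (about 26.5%, matching the reviewer's simulation), whereas analysing each test separately, as the authors describe, keeps the per-test rate at alpha.

```python
import random


def any_test_false_positive_rate(n_units=100_000, n_tests=6, alpha=0.05, seed=0):
    """Fraction of null units flagged when passing ANY of n_tests counts as a hit.

    Assumes the tests are independent and that null p-values are uniform on [0, 1].
    """
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_units):
        # A null unit is (falsely) flagged if at least one p-value falls below alpha.
        if any(rng.random() < alpha for _ in range(n_tests)):
            flagged += 1
    return flagged / n_units


simulated = any_test_false_positive_rate()
analytic = 1 - (1 - 0.05) ** 6  # probability of at least one false positive in 6 tests
print(f"simulated: {simulated:.3f}, analytic: {analytic:.3f}")
```

If the six tests were correlated rather than independent, the any-of-six rate would generally be lower than this analytic value.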

      (5) The authors state from Figure 5b that the majority of cells could be well described by 2 PCs. The distribution of R2 across neurons is almost uniform, so depending on what R2 value one considers a 'good' description, that is the fraction of 'good' cells. Furthermore, movement onset has now been well-established to be affecting cells widely and in large fractions, so while this analysis may work for something with global influence - like movement - more sparsely encoded variables (as many are in the brain) may not be well approximated with this suggestion. The authors could expand this analysis into other epochs like activity around stimulus presentation, to better understand how this type of analysis reproduces across labs for features that have a less global influence.

      We thank the reviewer for the suggestion and fully agree that the window used in our original analysis would tend to favor movement-driven neurons. To address this, we repeated the analysis, this time using a window centered around stimulus onset (from -0.5 s prior to stimulus onset until 0.1 s after stimulus onset). As the reviewer suspected, far fewer neurons were active in this window and consequently far fewer were modelled well by the first two PCs, as shown in Author response image 1b (below). Similar to our original analysis using the post-movement window, we found mixed results for the stimulus-centered window across labs. Interestingly, regional differences were weaker in this new analysis compared to the original analysis of the post-movement window. We have added a sentence to the results describing this. Because the results are similar to the post-movement window main figure, we would prefer to restrict the new analysis only to this point-by-point response, in the hopes of streamlining the paper.

      Author response image 1.

      PCA analysis applied to a stimulus-aligned window ([-0.5, 0.1] sec relative to stim onset). Figure conventions as in main text Fig 5. Results are comparable to the post-movement window analysis; however, regional differences are weaker here, possibly because fewer cells were active in the pre-movement window. We added panel j here and in the main figure, showing cell-number-controlled results. That is, for each test, the minimum neuron number of the compared classes (say, labs in a region) was sampled from all classes; this sampling was repeated 1000 times and the p-values were combined via Fisher’s method, overall resulting in far fewer significant differences across laboratories and, independently, regions.

      (6) Additionally, in Figure 5i: could the finding that one can only distinguish labs when taking cells from all regions, simply be a result of a different number of cells recorded in each region for each lab? It makes more sense to focus on the lab/area pairing as the authors also do, but not to make their main conclusion from it. If the authors wish to do the comparison across regions, they will need to correct for the number of cells recorded in each region for each lab. In general, it was a struggle to fully understand the purpose of Figure 5. While population analysis and dimensionality reduction are commonplace, this seems to be a very unusual use of it.

      We agree that controlling for varying cell numbers is a valuable addition to this analysis. We added panel j in Fig. 5 showing cell-number-controlled test results of panel i. That is, for a given statistical comparison, we sample the lowest number of cells of the compared classes from the others, do the test, and repeat this sampling 1000 times, before combining the p-values using Fisher’s method. This cell-number-controlled version of the tests resulted in clearly fewer significant differences across distributions, as seen similarly for the pre-movement window shown in panel j of Author response image 1. We hope this clarifies our aim to illustrate that low-dimensional embedding of cells’ trial-averaged activity can show how regional differences compare with laboratory differences.
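The cell-number control described in this response can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a two-sample Kolmogorov-Smirnov test from SciPy as the per-resample test, and the function name `size_matched_ks` and its arguments are hypothetical.

```python
import numpy as np
from scipy.stats import combine_pvalues, ks_2samp


def size_matched_ks(group_a, group_b, n_repeats=1000, seed=0):
    """Compare two groups of per-cell scores after equalising their sizes.

    Each repeat draws (without replacement) as many cells from each group as
    the smaller group contains, runs a two-sample KS test, and the resulting
    p-values are combined with Fisher's method.
    """
    rng = np.random.default_rng(seed)
    group_a, group_b = np.asarray(group_a), np.asarray(group_b)
    n_min = min(len(group_a), len(group_b))
    pvals = []
    for _ in range(n_repeats):
        sub_a = rng.choice(group_a, size=n_min, replace=False)
        sub_b = rng.choice(group_b, size=n_min, replace=False)
        pvals.append(ks_2samp(sub_a, sub_b).pvalue)
    # Clip away exact zeros so the log transform in Fisher's method stays finite.
    pvals = np.clip(pvals, 1e-300, 1.0)
    _, combined_p = combine_pvalues(pvals, method="fisher")
    return combined_p


# Hypothetical usage: PC1 scores of cells from two labs with unequal cell counts.
demo_rng = np.random.default_rng(1)
lab_small = demo_rng.normal(0.0, 1.0, size=60)
lab_large = demo_rng.normal(3.0, 1.0, size=150)
print(size_matched_ks(lab_small, lab_large, n_repeats=100))
```

Because the resamples are drawn from the same cells, the per-repeat p-values are not independent, so the combined value is probably best read as a summary across resamples rather than as an exact significance level.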

      As a complementary statistical analysis to the shown KS tests, we fitted a linear mixed-effects model (statsmodels.formula.api mixedlm) to the first and second PC for both activity windows (“Move”: [-0.5,1] first movement aligned; “Stim”: [-0.5,0.1] stimulus onset aligned), independently. Author response image 2 (in this rebuttal only) is broadly in line with the KS results, showing more regional than lab influences on the distributions of first PCs for the post-movement window.

      Author response image 2:

      Linear mixed effects model results for two PCs and two activity windows. For the post-movement window (“Move”), regional influences are significant (red color in plots) for all but one region while only one lab has a significant model coefficient for PC1. For PC2 more labs and three regions have significant coefficients. For the pre-movement window (“Stim”) one region for PC1 or PC2 has significant coefficients. The variance due to session id was smaller than all other effects (“eids Var”). “Intercept” shows the expected value of the response variable (PC1, PC2) before accounting for any fixed or random effects. All p-values were grouped as one hypothesis family and corrected for multiple comparisons via Benjamini-Hochberg.
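The multiple-comparison step mentioned in this caption can be sketched with the standard Benjamini-Hochberg adjustment. This is a generic, dependency-free version for illustration; the function name is hypothetical, and the authors may well have used a library routine instead.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Treat all coefficients' p-values as one hypothesis family, as in the caption.
family = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(family))
```

A coefficient would then be reported as significant if its adjusted p-value falls below the chosen false discovery rate.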

      (7) In the discussion the authors state: " Indeed this approach is a more effective and streamlined way of doing it, but it is questionable whether it 'exceeds' what is done in many labs.

      Classically, scientists trace each probe manually with light microscopy and designate each area based on anatomical landmarks identified with Nissl or DAPI stains together with gross landmarks. When not automated with 2-PI serial tomography and anatomically aligned to a standard atlas, this is a less effective process, but it is not clear that it is less precise, especially in studies before Neuropixels where active electrodes were located in a much smaller area. While more effective, transforming into a common atlas does make additional assumptions about warping the brain into the standard atlas, especially in cases where the brain has been damaged/lesioned. Readers can appreciate the effectiveness and streamlining provided by these new tools without the need to invalidate previous approaches.

      We thank the reviewer for highlighting the effectiveness of manual tracing methods used traditionally. Our intention in the statement was not to invalidate the precision or value of these classical methods but rather to emphasize the scalability and streamlining offered by our pipeline. We have revised the language to more accurately reflect this (lines 500-504):

      “Our ability to detect targeting error benefited from an automated histological pipeline combined with alignment and tracing that required agreement between multiple users, an approach that greatly exceeds the histological analyses done by most individual labs. Our approach, which enables scalability and standardization across labs while minimizing subjective variability, revealed that much of the variance in targeting was due to the probe entry positions at the brain surface, which were randomly displaced across the dataset.”

      (8) What about across-lab population-level representation of task variables, such as in the coding direction for stimulus or choice? Is the general decodability of task variables from the population comparable across labs?

      Excellent question, thanks! We have added the new section “Decodability of task variables is consistent across labs, but varies by brain region” (lines 423-448) and Figure 9 in the revised manuscript to address this question. In short, yes, the general decodability of task variables from the population is comparable across labs, providing additional reassurance of reproducibility.

      Reviewer #2 (Public review):

      Summary:

      The authors sought to evaluate whether observations made in separate individual laboratories are reproducible when they use standardized procedures and quality control measures. This is a key question for the field. If ten systems neuroscience labs try very hard to do the exact same experiment and analyses, do they get the same core results? If the answer is no, this is very bad news for everyone else! Fortunately, they were able to reproduce most of their experimental findings across all labs. Despite attempting to target the same brain areas in each recording, variability in electrode targeting was a source of some differences between datasets.

      Major Comments:

      The paper had two principal goals:

      (1) to assess reproducibility between labs on a carefully coordinated experiment

      (2) distill the knowledge learned into a set of standards that can be applied across the field.

      The manuscript made progress towards both of these goals but leaves room for improvement.

      (1) The first goal of the study was to perform exactly the same experiment and analyses across 10 different labs and see if you got the same results. The rationale for doing this was to test how reproducible large-scale rodent systems neuroscience experiments really are. In this, the study did a great job showing that when a consortium of labs went to great lengths to do everything the same, even decoding algorithms could not clearly discern laboratory identity from the raw data. However, the amount of coordination between the labs was so great that these findings are hard to generalize to the situation where similar (or conflicting!) results are generated by two labs working independently.

      Importantly, the study found that electrode placement (and thus likely also errors inherent to the electrode placement reconstruction pipeline) was a key source of variability between datasets. To remedy this, they implemented a very sophisticated electrode reconstruction pipeline (involving two-photon tomography and multiple blinded data validators) in just one lab, and all brains were sliced and reconstructed in this one location. This is a fantastic approach for ensuring similar results within the IBL collaboration, but makes it unclear how much variance would have been observed if each lab had attempted to reconstruct their probe trajectories themselves using a mix of histology techniques, from conventional brain slicing to light sheet microscopy to MRI.

      This approach also raises a few questions. The use of standard procedures, pipelines, etc. is a great goal, but most labs are trying to do something unique with their setup. Bigger picture, shouldn't highly "significant" biological findings, akin to the discovery of place cells or grid cells, be so clear and robust that they can be identified with different recording modalities and analysis pipelines?

      We agree, and hope that this work may help readers understand what effect sizes may be considered “clear and robust” from datasets like these. We certainly support the reviewer’s point that multiple approaches and modalities can help to confirm any biological findings, but we would contend that a clear understanding of the capabilities and limitations of each approach is valuable, and we hope that our paper helps to achieve this.

      Related to this, how many labs outside of the IBL collaboration have implemented the IBL pipeline for their own purposes? In what aspects do these other labs find it challenging to reproduce the approaches presented in the paper? If labs were supposed to perform this same experiment, but without coordinating directly, how much more variance between labs would have been seen? Obviously investigating these topics is beyond the scope of this paper. The current manuscript is well-written and clear as is, and I think it is a valuable contribution to the field. However, some additional discussion of these issues would be helpful.

      We thank the reviewer for raising this important issue. We know of at least 13 labs that have implemented the behavioral task software and hardware that we published in eLife in 2021, and we expect that over the next several years labs will also implement these analysis pipelines (note that it is considerably cheaper and faster to implement software pipelines than hardware). In particular, a major goal of the staff in the coming years is to continue and improve the support for pipeline deployment and use. However, our goal in this work, which we have aimed to state more clearly in the revised manuscript, was not so much to advocate that others adopt our pipeline, but instead to use our standardized approach as a means of assessing reproducibility under the best of circumstances (see lines 48-52): “A high level of reproducibility of results across laboratories when procedures are carefully matched is a prerequisite to reproducibility in the more common scenario in which two investigators approach the same high-level question with slightly different experimental protocols.”

      Further, a number of our findings are relevant to other labs regardless of whether they implement our exact pipeline, a modified version of our pipeline, or something else entirely. For example, we found probe targeting to be a large source of variability. Our ability to detect targeting error benefited from an automated histological pipeline combined with alignment and tracing that required agreement between multiple users, but now that we have identified the offset, it can be easily accounted for by any lab. Specifically, probe angles must be carefully computed from the CCF, as the CCF and stereotaxic coordinate systems do not define the same coronal plane angle. Relatedly, we found that slight deviations in probe entry position can lead to samples from different populations of neurons. Although this took large cohort sizes to discover, knowledge of this discovery means that future experiments can plan for larger cohort sizes to allow for off-target trajectories, and can re-compute probe angle when the presence of blood vessels necessitates moving probes slightly. These points are now highlighted in the Discussion (lines 500-515).

      Second, the proportion of responsive neurons (a quantity often used to determine that a particular area subserves a particular function) sometimes failed to reproduce across labs. For example, for movement-driven activity in PO, UCLA reported an average change of 0 spikes/s, while CCU reported a large and consistent change (Figure 4d, rightmost panel, compare orange vs. yellow traces). This argues that neuron-to-neuron variability means that comparisons across labs require large cohort sizes. A small number of outlier neurons in a session can heavily bias responses. We anticipate that this problem will be remedied as tools for large-scale neural recordings become more widely used. Indeed, the use of 4-shank instead of single-shank Neuropixels (as we used here) would have greatly increased the number of PO neurons we measured in each session. We have added new text to Results explaining this (lines 264-268):

      “We anticipate that the feasibility of even larger scale recordings will make lab-to-lab comparisons easier in future experiments; multi-shank probes could be especially beneficial for cortical recordings, which tend to be the most vulnerable to low cell counts since the cortex is thin and is the most superficial structure in the brain and thus the most vulnerable to damage. Analyses that characterize responses to multiple parameters are another possible solution (See Figure 7).”

      (2) The second goal of the study was to present a set of data curation standards (RIGOR) that could be applied widely across the field. This is a great idea, but its implementation needs to be improved if adoption outside of the IBL is to be expected. Here are three issues:

      (a) The GitHub repo for this project (https://github.com/int-brain-lab/paper-reproducible-ephys/) is nicely documented if the reader's goal is to reproduce the figures in the manuscript. Consequently, the code for producing the RIGOR statistics seems mostly designed for re-computing statistics on the existing IBL-formatted datasets. There doesn't appear to be any clear documentation about how to run it on arbitrary outputs from a spike sorter (i.e. the inputs to Phy).

      We agree that clear documentation is key for others to adopt our standards. To address this, we have added a section at the end of the README of the repository that links to a jupyter notebook (https://github.com/int-brain-lab/paper-reproducible-ephys/blob/master/RIGOR_script.ipynb) that runs the RIGOR metrics on a user’s own spike sorted dataset. The notebook also contains a tutorial that walks through how to visually assess the quality of the raw and spike sorted data, and computes the noise level metrics on the raw data as well as the single cell metrics on the spike sorted data.

      (b) Other sets of spike sorting metrics that are more easily computed for labs that are not using the IBL pipeline already exist (e.g. "quality_metrics" from the Allen Institute ecephys pipeline [https://github.com/AllenInstitute/ecephys_spike_sorting/blob/main/ecephys_spike_sorting/modules/quality_metrics/README.md] and the similar module in the Spike Interface package [https://spikeinterface.readthedocs.io/en/latest/modules/qualitymetrics.html]). The manuscript does not compare these approaches to those proposed here, but some of the same statistics already exist (amplitude cutoff, median spike amplitude, refractory period violation).

      There is a long history of researchers providing analysis algorithms and code for spike sorting quality metrics, and we agree that the Allen Institute’s ecephys code and the Spike Interface package are the current options most widely used (but see also, for example, Fabre et al. https://github.com/Julie-Fabre/bombcell). Our primary goal in the present work is not to advocate for a particular implementation of any quality metrics (or any spike sorting algorithm, for that matter), but instead to assess reproducibility of results, given one specific choice of spike sorting algorithm and quality metrics. That is why, in our comparison of yield across datasets (Fig 1F), we downloaded the raw data from those comparison datasets and re-ran them under our single fixed pipeline, to establish a fair standard of comparison. A full comparison of the analyses presented here under different choices of quality metrics and spike sorting algorithms would undoubtedly be interesting and useful for the field - however, we consider it to be beyond the scope of the present work. It is therefore an important assumption of our work that the result would not differ materially under a different choice of sorting algorithm and quality metrics. We have added text to the Discussion to clarify this limitation:

      “Another significant limitation of the analysis presented here is that we have not been able to assess the extent to which other choices of quality metrics and inclusion criteria might have led to greater or lesser reproducibility.”

      That said, we still intend for external users to be able to easily run our pipelines and quality metrics.

      (c) Some of the RIGOR criteria are qualitative and must be visually assessed manually. Conceptually, these features make sense to include as metrics to examine, but would ideally be applied in a standardized way across the field. The manuscript doesn't appear to contain a detailed protocol for how to assess these features. A procedure for how to apply these criteria for curating non-IBL data (or for implementing an automated classifier) would be helpful.

      We agree. To address this, we have provided a notebook that runs the RIGOR metrics on a user’s own dataset, and contains a tutorial on how to interpret the resulting plots and metrics (https://github.com/int-brain-lab/paper-reproducible-ephys/blob/master/RIGOR_script.ipynb).

      Within this notebook there is a section focused on visually assessing the quality of both the raw data and the spike sorted data. The code in this section can be used to generate plots, such as raw data snippets or the raster map of the spiking activity, which are typically used to visually assess the quality of the data. In Figure 1 Supplement 2 we have provided examples of such plots that show different types of artifactual activity that should be inspected.

      Other Comments:

      (1) How did the authors select the metrics they would use to evaluate reproducibility? Was this selection made before doing the study?

      Our metrics were selected on the basis of our experience and expertise with extracellular electrophysiology. For example: some of us previously published on epileptiform activity and its characteristics in some mice (Steinmetz et al. 2017), so we included detection of that type of artifact here; and, some of us previously published detailed investigations of instability in extracellular electrophysiological recordings and methods for correcting them (Steinmetz et al. 2021, Windolf et al. 2024), so we included assessment of that property here. These metrics therefore represent our best expert knowledge about the kinds of quality issues that can affect this type of dataset, but it is certainly possible that future investigators will discover and characterize other quality issues.

      The selection of metrics was primarily performed before the study (we used these assessments internally before embarking on the extensive quantifications reported here), and in cases where we refined them further during the course of preparing this work, it was done without reference to statistical results on reproducibility but instead on the basis of manual inspection of data quality and metric performance.

      (2) Was reproducibility within-lab dependent on experimenter identity?

      We thank the reviewer for this question. We have addressed it in our response to R1 General comment 2, as follows:

      We agree that understanding experimenter-to-experimenter variability would be very interesting and indeed we had hoped to do this analysis for some time. The problem is that typically, each lab employed one trainee to conduct all the data collection. This prevents us from comparing outcomes from two different experimenters in the same lab. There are exceptions to this, such as the Churchland lab in which 3 personnel (two postdocs and a technician) collected the data. However, even this fortuitous situation did not lend itself well to assessing experimenter-to-experimenter variation: the Churchland lab moved from Cold Spring Harbor to UCLA during the data collection period, which might have caused variability that is totally independent of experimenter (e.g., different animal facilities). Further, once at UCLA, the postdoc and technician worked closely together, alternating roles in animal training, surgery and electrophysiology. We believe that the text in our current Discussion (lines 465-468) accurately characterizes the situation:

      “Our experimental design precludes an analysis of whether the reproducibility we observed was driven by person-to-person standardization or lab-to-lab standardization. Most likely, both factors contributed: all lab personnel received standardized instructions for how to implant head bars and train animals, which likely reduced personnel-driven differences.”

      Quantifying the level of experience of each experimenter is an appealing idea and we share the reviewer’s curiosity about its impact on data quality. Unfortunately, quantifying experience is tricky. For instance, years of conducting surgeries is not an unambiguously determinable number. Would we count an experimenter who did surgery every day for a year as having the same experience as an experimenter who did surgery once/month for a year? Would we count a surgeon with expertise in other areas (e.g., windows for imaging) in the same way as surgeons with expertise in ephys-specific surgeries? Because of the ambiguities, we leave this analysis to be the subject of future work; this is now stated in the Discussion (line 476).

      (3) They note that UCLA and UW datasets tended to miss deeper brain region targets (lines 185-188) - they do not speculate why these labs show systematic differences. Were they not following standardized procedures?

      Thank you for raising this point. All researchers across labs were indeed following standardised procedures. We note that our statistical analysis of probe targeting coordinates and angles did not reveal a significant effect of lab identity on targeting error, even though we noted the large number of mis-targeted recordings in UCLA and UW to help draw attention to the appropriate feature in the figure. Given that these differences were not statistically significant, we can see how it was misleading to call out these two labs specifically. While the overall probe placement surface error and angle error both show no such systematic difference, the magnitude of surface error showed a non-significant tendency to be higher for samples in UCLA & UW, which, compounded with the direction of probe angle error, caused these probe insertions to land in a final location outside LP & PO.

      This shows how subtle differences in probe placement & angle accuracy can lead to compounded inaccuracies at the probe tip, especially when targeting deep brain regions, even when following standard procedures. We believe this is driven partly by the accuracy limit or resolution of the stereotaxic system, along with slight deviations in probe angle, occurring during the setup of the stereotaxic coordinate system during these recordings.
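
      The compounding of surface and angle errors at depth can be made concrete with a small geometric sketch. It assumes a straight probe and the worst case in which both errors displace the tip in the same direction; the function name and the example values for angle error and depth are hypothetical, not measurements from the dataset.

```python
import math


def tip_offset_um(surface_error_um, angle_error_deg, depth_um):
    """Worst-case lateral tip displacement for a straight probe (illustrative)."""
    # Lateral shift produced by the angle error alone grows with depth.
    angular_shift = depth_um * math.tan(math.radians(angle_error_deg))
    # Worst case: the entry-point error and the angular shift add up.
    return surface_error_um + angular_shift


if __name__ == "__main__":
    # Assumed values: 360 um entry error, 5 degree angle error.
    for depth in (1000, 2000, 4000):
        print(depth, round(tip_offset_um(360, 5, depth), 1))
```

      Under these assumptions the angular term is small near the surface but grows linearly with depth, which is why deep targets are the most exposed to off-target trajectories.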

      We have updated the relevant text in lines 187-190 as follows, to clarify:

      “Several trajectories missed their targets in deeper brain regions (LP, PO), as indicated by gray blocks, despite the lack of significant lab-dependent effects in targeting as reported above. These off-target trajectories tended to have both a large displacement from the target insertion coordinates and a probe angle that unfavorably drew the insertions away from thalamic nuclei (Figure 2f).”

      (4) The authors suggest that geometrical variance (difference between planned and final identified probe position acquired from reconstructed histology) in probe placement at the brain surface is driven by inaccuracies in defining the stereotaxic coordinate system, including discrepancies between skull landmarks and the underlying brain structures. In this case, the use of skull landmarks (e.g. bregma) to determine locations of brain structures might be unreliable and provide an error of ~360 microns. While it is known that there is indeed variance in the position between skull landmarks and brain areas in different animals, the quantification of this error is a useful value for the field.

      We thank the reviewer for their thoughtful comment and are glad that they found the quantification of variance useful for the field.

      (5) Why are the thalamic recording results particularly hard to reproduce? Does the anatomy of the thalamus simply make it more sensitive to small errors in probe positioning relative to the other recorded areas?

      We thank the reviewer for raising this interesting question. We believe that they are referring to Figure 4: indeed when we analyzed the distribution of firing rate modulations, we saw some failures of reproducibility in area PO (bottom panel, Figure 4h). However, the thalamic nuclei were not, in other analyses, more vulnerable to failures in reproducibility. For example, in the top panel of Figure 4h, VisAM shows failures of reproducibility for modulation by the visual stimulus. In Fig. 5i, area CA1 showed a failure of reproducibility. We fear that the figure legend title in the previous version (which referred to the thalamus specifically) was misleading, and we have revised this. The new title is, “Neural activity is modulated during decision-making in five neural structures and is variable between laboratories.” This new text more accurately reflects that there were a number of small, idiosyncratic failures of reproducibility, but that these were not restricted to a specific structure. The new analysis requested by R1 (now in Figure 7) provides further reassurance of overall reproducibility, including in the thalamus (see Fig. 7a, right panels; lab identity could not be decoded from single neuron metrics, even in the thalamus).

      Reviewer #1 (Recommendations for the authors):

      (1) Figure font sizes and formatting are variable across panels and figures. Please streamline the presentation of results.

      Thank you for your feedback. We have remade all figures with the same standardized font sizes and formatting.

      (2) Please correct the noncontinuous color scales in Figures 3b and 3d.

      Thank you for pointing this out; we have fixed the color bar.

      (3) In Figures 5d and g, the error bars are described as: 'Error bands are standard deviation across cells normalised by the square root of the number of sessions in the region'. How does one interpret this error? It seems to be related to the standard error of the mean (std/sqrt(n)) but instead of using the n from which the standard deviation is calculated (in this case across cells), the authors use the number of sessions as n. If they took the standard deviation across sessions this would be the sem across sessions, and interpretable (as sem*1.96 is the 95% parametric confidence interval of the mean). Please justify why these error bands are used here and how they can be interpreted - it also seems like it is the only time these types of error bands are used.

      We agree and, for clarity, now use the standard error across cells; the error bars do not change dramatically either way.
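
      For reference, the standard error across cells and the parametric 95% interval mentioned by the reviewer can be computed as in this minimal sketch. The function name and the firing-rate values are hypothetical.

```python
import math
import statistics


def sem_across_cells(values):
    """Standard error of the mean: sample std divided by sqrt(n cells)."""
    return statistics.stdev(values) / math.sqrt(len(values))


if __name__ == "__main__":
    # Hypothetical per-cell firing-rate modulations (spikes/s).
    cells = [1.2, 0.4, 2.1, 1.7, 0.9, 1.3]
    mean = statistics.mean(cells)
    sem = sem_across_cells(cells)
    # 1.96 * SEM gives the parametric 95% confidence interval of the mean.
    print(mean - 1.96 * sem, mean + 1.96 * sem)
```

      Here n is the number of cells, the same sample over which the standard deviation is taken, so the band has the usual interpretation.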

      (4) It is difficult to understand what is plotted in Figures 5e,h, please unpack this further and clarify.

      Thank you for pointing this out. We have added additional explanation in the figure caption (See caption for Figure 5c) to explain the KS test.

      (5) In lines 198-201 the authors state that they were worried that Bonferroni correction with 5 criteria would be too lenient, and therefore used 0.01 as alpha. I am unsure whether the authors mean that they are correcting for multiple comparisons across features or areas. Either way, 0.01 alpha is exactly what a Bonferroni corrected alpha would be when correcting for either 5 features or 5 areas: 0.05/5=0.01. Or do they mean they apply the Bonferroni correction to the new 0.01 alpha: i.e., 0.01/5=0.002? Please clarify.

      Thank you, that was indeed written confusingly. We considered all tests and regions as a whole, so 7 tests * 5 regions = 35 tests, which would result in a very strong Bonferroni correction. Indeed, if one considers the different tests individually, the correction we apply from 0.05 to 0.01 can be considered as correcting for the number of regions, which we now highlight better. We apply no further corrections of any kind to our alpha=0.01. We clarified this in the manuscript in all relevant places (lines 205-208, 246, 297-298, and 726-727).
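
      The arithmetic behind the two options is shown below as a small sketch; the helper name is hypothetical.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return family_alpha / n_tests


if __name__ == "__main__":
    # Correcting for the 5 regions only: the alpha used in the manuscript.
    print(bonferroni_alpha(0.05, 5))
    # Correcting for all 7 tests * 5 regions = 35 comparisons.
    print(bonferroni_alpha(0.05, 7 * 5))
```

      Correcting across all 35 comparisons would lower the threshold to roughly 0.0014, about seven times stricter than the alpha of 0.01 used.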

      (6) Did the authors take into account how many times a probe was used/how clean the probe was before each recording? Was this streamlined between labs? This can have an effect on yield and quality of recording.

      We appreciate the reviewer highlighting the potential impact of probe use and cleanliness on recording quality and yield. While we did not track the number of times each probe was used, we ensured that all probes were cleaned thoroughly after each use using a standardized cleaning protocol (Section 16: Cleaning the electrode after data acquisition in Appendix 2: IBL protocol for electrophysiology recording using Neuropixels probe). We acknowledge that tracking the specific usage history of each probe could provide additional insights, but unfortunately we did not track this information for this project. In prior work the re-usability of probes has been quantified, showing insignificant degradation with use (e.g. Extended Data Fig 7d from Jun et al. 2017).

      (7) Figure 3, Supplement1: DY_013 missed DG entirely? Was this included in the analysis?

      Thank you for this question. We believe the reviewer is referring to the lack of a prominent high-amplitude LFP band in this mouse, and lack of high-quality sorted units in that region. Despite this, our histology did localize the recording trajectory to DG. This recording did pass our quality control criteria overall, as indicated by the green label, and was used in relevant analyses.

      The lack of normal LFP features and neuron yield might reflect the range of biological variability (several other sessions also have relatively weak DG LFP and yield, though DY_013 is the weakest), or could reflect some damage to the tissue, for example as caused by local bleeding. Because we could not conclusively identify the source of this observation, we did not exclude it.

      (8) Given that the authors argue for using the MTNN over GLMs, it would be useful to know exactly how much better the MTNN is at predicting activity in the held-out dataset (shown in Figure 7, Supplement 1). It looks like a very small increase in prediction performance between MTNN and GLMs, is it significantly different?

      The average variance explained on the held-out dataset, as shown in Figure 8–Figure Supplement 1 Panel B, is 0.065 for the GLMs and 0.071 for the MTNN. As the reviewer correctly noted, this difference is not significant. However, one of the key advantages of the MTNN over GLMs lies in its flexibility to easily incorporate covariates, such as electrophysiological characteristics or session/lab IDs, directly into the analysis. This feature is particularly valuable for assessing effect sizes and understanding the contributions of various factors.

      (9) In line 723: why is the threshold for mean firing rate for a unit to be included in the MTNN results so high (>5 Hz), and how does it perform on units with lower firing rates?

      We thank the reviewer for pointing this out. The threshold for including units with a mean firing rate above 5 Hz was set because most units with firing rates below this threshold were silent in many trials, and reducing the number of units helped keep the MTNN training time reasonable. Based on this comment, we ran the MTNN experiments including all units with firing rates above 1 Hz, and the results remained consistent with our previous conclusions (Figure 8). Crucially, the leave-one-out analysis consistently showed that lab and session IDs had effect sizes close to zero, indicating that both within-lab and between-lab random effects are small and comparable.

      Reviewer #2 (Recommendations for the authors):

      (1) Most of the more major issues were already listed in the above comments. The strongest recommendation for additional work would be to improve the description and implementation of the RIGOR statistics such that non-IBL labs that might use Neuropixels probes but not use the entire IBL pipeline might be able to apply the RIGOR framework to their own data.

      We thank the reviewer for highlighting the importance of making the RIGOR statistics more accessible to a broader audience. We agree that improving the description and implementation of the RIGOR framework is essential for enabling non-IBL labs that use Neuropixels probes to adopt it. To address this, we created a jupyter notebook with step-by-step guidance that is not dependent on the IBL pipeline. This tool (https://github.com/int-brain-lab/paper-reproducible-ephys/blob/develop/RIGOR_script.ipynb) is publicly available through the repository, accompanied by example datasets and usage tutorials.

      (2) Table 1: How are qualitative features like "drift" defined? Some quantitative statistics like "presence ratio" (the fraction of the dataset where spikes are present) already exist in packages like ecephys_spike_sorting. Who measured these qualitative features? What are the best practices for doing these qualitative analyses?

      At the probe level, we compute the estimate of the relative motion of the electrodes to the brain tissue at multiple depths along the electrode. We overlay the drift estimation over a raster plot to detect sharp displacements as a function of time. Quantitatively, the drift is the cumulative absolute electrode motion estimated during spike sorting (µm). We clarified the corresponding text in Table 1.
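
      One way to read "cumulative absolute electrode motion" is as the sum of absolute changes in the estimated probe displacement over time. The sketch below implements that reading as an assumption; the function name and input format (one displacement estimate in µm per time bin, at a single depth) are hypothetical and may differ from the spike-sorting pipeline's output.

```python
def cumulative_drift_um(displacement_um):
    """Sum of absolute bin-to-bin changes in estimated displacement (um)."""
    return sum(abs(later - earlier)
               for earlier, later in zip(displacement_um, displacement_um[1:]))


if __name__ == "__main__":
    # Hypothetical motion estimate: slow drift followed by a sharp jump.
    motion = [0.0, 1.0, 2.0, 2.5, 12.5, 12.0]
    print(cumulative_drift_um(motion))
```

      A sharp displacement contributes a large single step to this sum, which is the kind of event the raster overlay is meant to reveal visually.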

      The qualitative assessments were carried out by IBL staff and experimentalists. We have now provided code to run the RIGOR metrics along with an embedded tutorial, to complement the supplemental figures we have shown about qualitative metric interpretation.

      (3) Table 1: What are the units for the LFP derivative?

      We thank the reviewer for noting that the unit was missing. The unit (decibel per unit of space) is now in the table.

      (4) Table 1: For "amplitude cutoff", the table says that "each neuron must pass a metric". What is the metric?

      We have revised the table to include this information. This metric was designed to detect potential issues in amplitude distributions caused by thresholding during deconvolution, which could result in missed spikes. There are quantitative thresholds on the distribution of the low tail of the amplitude histogram relative to the high tail, and on the relative magnitude of the bins in the low tail. We now reference the methods text from the table, which includes a more extended description and gives the specific threshold numbers. Also, the metric and thresholds are more easily understood with graphical assistance; see the IBL Spike Sorting Whitepaper for this (Fig. 17 in that document and nearby text; https://doi.org/10.6084/m9.figshare.19705522.v4). This reference is now also cited in the text.

      (5) Figure 2: In panel A, the brain images look corrupted.

      Thanks; in the revised version we have changed the filetype to improve the quality of the panel image.

      (6) Figure 7: In panel D, make R2 into R^2 (with a superscript)

      Panel D y-axis label has been revised to include superscript (note that this figure is now Figure 8).

      Works Cited

      Julie M.J. Fabre, Enny H. van Beest, Andrew J. Peters, Matteo Carandini, and Kenneth D. Harris. Bombcell: automated curation and cell classification of spike-sorted electrophysiology data, July 2023. URL https://doi.org/10.5281/zenodo.8172822.

      James J. Jun, Nicholas A. Steinmetz, Joshua H. Siegle, Daniel J. Denman, Marius Bauza, Brian Barbarits, Albert K. Lee, Costas A. Anastassiou, Alexandru Andrei, Çağatay Aydın, Mladen Barbic, Timothy J. Blanche, Vincent Bonin, João Couto, Barundeb Dutta, Sergey L. Gratiy, Diego A. Gutnisky, Michael Häusser, Bill Karsh, Peter Ledochowitsch, Carolina Mora Lopez, Catalin Mitelut, Silke Musa, Michael Okun, Marius Pachitariu, Jan Putzeys, P. Dylan Rich, Cyrille Rossant, Wei-lung Sun, Karel Svoboda, Matteo Carandini, Kenneth D. Harris, Christof Koch, John O’Keefe, and Timothy D. Harris. Fully integrated silicon probes for high-density recording of neural activity. Nature, 551(7679):232–236, Nov 2017. ISSN 1476-4687. doi: 10.1038/nature24636. URL https://doi.org/10.1038/nature24636.

      Simon Musall, Xiaonan R. Sun, Hemanth Mohan, Xu An, Steven Gluf, Shu-Jing Li, Rhonda Drewes, Emma Cravo, Irene Lenzi, Chaoqun Yin, Björn M. Kampa, and Anne K. Churchland. Pyramidal cell types drive functionally distinct cortical activity patterns during decision-making. Nature Neuroscience, 26(3):495–505, Mar 2023. ISSN 1546-1726. doi: 10.1038/s41593-022-01245-9. URL https://doi.org/10.1038/s41593-022-01245-9.

      Ivana Orsolic, Maxime Rio, Thomas D Mrsic-Flogel, and Petr Znamenskiy. Mesoscale cortical dynamics reflect the interaction of sensory evidence and temporal expectation during perceptual decision-making. Neuron, 109(11):1861–1875.e10, April 2021.

      Hyeong-Dong Park, Stéphanie Correia, Antoine Ducorps, and Catherine Tallon-Baudry. Spontaneous fluctuations in neural responses to heartbeats predict visual detection. Nature Neuroscience, 17(4):612–618, Apr 2014. ISSN 1546-1726. doi: 10.1038/nn.3671. URL https://doi.org/10.1038/nn.3671.

      Lorenzo Posani, Shuqi Wang, Samuel Muscinelli, Liam Paninski, and Stefano Fusi. Rarely categorical, always high-dimensional: how the neural code changes along the cortical hierarchy. bioRxiv, 2024. doi: 10.1101/2024.11.15.623878. URL https://www.biorxiv.org/content/early/2024/12/09/2024.11.15.623878.

      Nicholas A. Steinmetz, Christina Buetfering, Jerome Lecoq, Christian R. Lee, Andrew J. Peters, Elina A. K. Jacobs, Philip Coen, Douglas R. Ollerenshaw, Matthew T. Valley, Saskia E. J. de Vries, Marina Garrett, Jun Zhuang, Peter A. Groblewski, Sahar Manavi, Jesse Miles, Casey White, Eric Lee, Fiona Griffin, Joshua D. Larkin, Kate Roll, Sissy Cross, Thuyanh V. Nguyen, Rachael Larsen, Julie Pendergraft, Tanya Daigle, Bosiljka Tasic, Carol L. Thompson, Jack Waters, Shawn Olsen, David J. Margolis, Hongkui Zeng, Michael Hausser, Matteo Carandini, and Kenneth D. Harris. Aberrant cortical activity in multiple gcamp6-expressing transgenic mouse lines. eNeuro, 4(5), 2017. doi: 10.1523/ENEURO.0207-17.2017. URL https://www.eneuro.org/content/4/5/ENEURO.0207-17.2017.

      Nicholas A. Steinmetz, Peter Zatka-Haas, Matteo Carandini, and Kenneth D. Harris. Distributed coding of choice, action and engagement across the mouse brain. Nature, 576(7786):266–273, Dec 2019. ISSN 1476-4687. doi: 10.1038/s41586-019-1787-x. URL https://doi.org/10.1038/s41586-019-1787-x.

      Nicholas A. Steinmetz, Cagatay Aydin, Anna Lebedeva, Michael Okun, Marius Pachitariu, Marius Bauza, Maxime Beau, Jai Bhagat, Claudia Böhm, Martijn Broux, Susu Chen, Jennifer Colonell, Richard J. Gardner, Bill Karsh, Fabian Kloosterman, Dimitar Kostadinov, Carolina Mora-Lopez, John O’Callaghan, Junchol Park, Jan Putzeys, Britton Sauerbrei, Rik J. J. van Daal, Abraham Z. Vollan, Shiwei Wang, Marleen Welkenhuysen, Zhiwen Ye, Joshua T. Dudman, Barundeb Dutta, Adam W. Hantman, Kenneth D. Harris, Albert K. Lee, Edvard I. Moser, John O’Keefe, Alfonso Renart, Karel Svoboda, Michael Häusser, Sebastian Haesler, Matteo Carandini, and Timothy D. Harris. Neuropixels 2.0: A miniaturized high-density probe for stable, long-term brain recordings. Science, 372(6539):eabf4588, 2021. doi: 10.1126/science.abf4588. URL https://www.science.org/doi/abs/10.1126/science.abf4588.

      Charlie Windolf, Han Yu, Angelique C. Paulk, Domokos Meszéna, William Muñoz, Julien Boussard, Richard Hardstone, Irene Caprara, Mohsen Jamali, Yoav Kfir, Duo Xu, Jason E. Chung, Kristin K. Sellers, Zhiwen Ye, Jordan Shaker, Anna Lebedeva, Manu Raghavan, Eric Trautmann, Max Melin, João Couto, Samuel Garcia, Brian Coughlin, Csaba Horváth, Richárd Fiáth, István Ulbert, J. Anthony Movshon, Michael N. Shadlen, Mark M. Churchland, Anne K. Churchland, Nicholas A. Steinmetz, Edward F. Chang, Jeffrey S. Schweitzer, Ziv M. Williams, Sydney S. Cash, Liam Paninski, and Erdem Varol. Dredge: robust motion correction for high-density extracellular recordings across species. bioRxiv, 2023. doi: 10.1101/2023.10.24.563768. URL https://www.biorxiv.org/content/early/2023/10/29/2023.10.24.563768.

    1. eLife Assessment

      This study provides valuable insights into the evolutionary histories and cellular infection responses of two Salmonella Dublin genotypes. While the evidence is compelling, a more phylogenetically diverse bacterial collection would enhance the findings. This research is relevant to scientists studying Salmonella and gastroenteritis-related pathogens.

    2. Reviewer #1 (Public review):

      The manuscript consists of two separate but interlinked investigations: genomic epidemiology and virulence assessment of Salmonella Dublin. ST10 dominates the epidemiological landscape of S. Dublin, while ST74 was uncommonly isolated. Detailed genomic epidemiology of ST10 unfolded the evolutionary history of this common genotype, highlighting clonal expansions linked to each distinct geography. Notably, North American ST10 was associated with more antimicrobial resistance compared to others. The authors also performed long-read sequencing on a subset of isolates (ST10 and ST74) and uncovered a novel recombinant virulence plasmid in ST10 (IncX1/IncFII/IncN). Separately, the authors performed cell invasion and cytotoxicity assays on the two S. Dublin genotypes, showing differential responses between the two STs. ST74 replicates better intracellularly in macrophages compared to ST10, but both STs induced comparable cytotoxicity levels. Comparative genomic analyses between the two genotypes showed certain genetic content unique to each genotype, but no further analyses were conducted to investigate which genetic factors were likely associated with the observed differences. The study provides a comprehensive and novel understanding of the evolution and adaptation of two S. Dublin genotypes, which can inform public health measures. The methodology included in both approaches was sound and written in sufficient detail, and data analysis was performed with rigour. Source data were fully presented and accessible to readers.

      Comments on revised version:

      The authors have addressed all the points raised by the reviewer. The manuscript is now much enhanced in clarity and accuracy. The re-written Discussion is more relevant and brings in comparison with other invasive Salmonella serotypes.

      Comments:

      In light of the metadata supplied in this revision, for Australian isolates, all human cases of ST74 (n=7) were from faeces (presumably from gastroenteritis), while 18/40 of ST10 were from invasive specimens (blood and abscess). This may contradict the manuscript's findings and discussion on the different experimental phenotypes of the two STs, with ST74 showing more replication in macrophages and being potentially more invasive. The reviewer therefore suggests that the authors mention this disparity in the Discussion and discuss possible underlying reasons. This can strengthen the authors' rationale for further in vivo studies.

    3. Reviewer #2 (Public review):

      This is a comprehensive analysis of Salmonella Dublin genomes that offers insights into the global spread of this pathogen and region-specific traits that are important to understanding its evolution. The phenotyping of isolates of ST10 and ST74 also offers insights into the variability that can be seen in S. Dublin, which is also seen in other Salmonella serovars, and reminds the field that it is important to look beyond lab-adapted strains to truly understand these pathogens. This is a valuable contribution to the field. The only limitation, which the authors also acknowledge, is the bias towards S. Dublin genomes from high-income settings. However, there is no selection bias; this is simply a consequence of publicly available sequences.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      The manuscript consists of two separate but interlinked investigations: genomic epidemiology and virulence assessment of Salmonella Dublin. ST10 dominates the epidemiological landscape of S. Dublin, while ST74 was uncommonly isolated. Detailed genomic epidemiology of ST10 unfolded the evolutionary history of this common genotype, highlighting clonal expansions linked to each distinct geography. Notably, North American ST10 was associated with more antimicrobial resistance compared to others. The authors also performed long-read sequencing on a subset of isolates (ST10 and ST74) and uncovered a novel recombinant virulence plasmid in ST10 (IncX1/IncFII/IncN). Separately, the authors performed cell invasion and cytotoxicity assays on the two S. Dublin genotypes, showing differential responses between the two STs. ST74 replicates better intracellularly in macrophages compared to ST10, but both STs induced comparable cytotoxicity levels.

      Comparative genomic analyses between the two genotypes showed certain genetic content unique to each genotype, but no further analyses were conducted to investigate which genetic factors were likely associated with the observed differences. The study provides a comprehensive and novel understanding of the evolution and adaptation of two S. Dublin genotypes, which can inform public health measures. 

      The methodology included in both approaches was sound and written in sufficient detail, and data analysis was performed with rigour. Source data were fully presented and accessible to readers. Certain aspects of the manuscript could be clarified and extended to improve the manuscript. 

      (1) For epidemiology purposes, it is not clear which human diseases were associated with the genomes included in this manuscript. This is important since S. Dublin can cause invasive bloodstream infections in humans. While such information may be unavailable for public sequences, this should be detailed for the 53 isolates sequenced for this study, especially for isolates selected to perform experiments in vitro.

      Thank you for the suggestion. We have added the sample type for the 53 isolates sequenced for this study. These additional details have been added to Supplementary Tables 1, 4, 9 and 10.

      (2) The major AMR plasmid described in S. Dublin was the IncC plasmid associated with clonal expansion in North America. While this plasmid is not found in the Australian isolates sequenced in this study, the reviewer finds that it is still important to include its characterization, since it carries blaCMY-2 and was stably inherited in ST10 clade 5. If the plasmid structure is already published, the authors should include the accession number in the Main Results.

      We have provided accessions and context for two of the IncC hybrid plasmids that have been previously reported in the literature in the Introduction. The text now reads:

      “These MDR S. Dublin isolates all type as sequence type 10 (ST10), and the AMR determinants have been demonstrated to be carried on an IncC plasmid that has recombined with a virulence plasmid encoding the spvRABCD operon (12,16,18,19).  This has resulted in hybrid virulence and AMR plasmids circulating in North America including a 329kb megaplasmid with IncX1, IncFIA, IncFIB, and IncFII replicons (isolate CVM22429, NCBI accession CP032397.1) (12,16) and a smaller hybrid plasmid 172,265 bases in size with an IncX1 replicon (isolate N13-01125, NCBI accession KX815983.1) (19).”

      Further characterisation of the IncA/C plasmid circulating in North America was beyond the scope of this study.

      (3) The reviewer is concerned that the multiple annotations missing from the plasmid structures in Supplementary Figures 5 & 6, and the genetic content unique to ST10 and ST74, were due to insufficient annotation by Prokka. I would recommend the authors use another annotation tool, such as Bakta (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8743544/), for plasmid annotation and for reconstruction of the pangenome described in Supplementary Figure 10. Since the recombinant virulence plasmid in ST10 is a novel one, I would recommend putting Supplementary Figure 5 as a main figure, with better annotations to show the virulence region, plasmid maintenance/replication, and possible conjugation cluster.

      In the supplementary figures of the plasmids, we sought to highlight key traits of interest on the plasmids, namely plasmid replicons, antimicrobial resistance and heavy metal resistance (Supplementary Figure 5) and virulence genes (Supplementary Figure 6). The accessions of publicly available isolates are included to provide references for characterised plasmids such as the S. Dublin virulence plasmid (NCBI accession: CP001143).

      For the potentially hybrid plasmid with IncN/IncX1/IncFII reported in Supplementary Figure 6, we have undertaken additional analyses of the two Australian isolates, reannotating them with Bakta, which provides more detailed annotations.

      We have added new text to the methods which reads as: 

      “The final genome assemblies were confirmed as S. Dublin using SISTR and annotated using both Prokka v1.14.6 (69) for consistency with the draft genome assemblies and  Bakta v1.10.1 (93) which provides for more detailed annotations (Supplementary Table 13). Both Prokka and Bakta annotations were in agreement for AMR, HMR and virulence genes, with Bakta annotating between 3-7 additional CDS which were largely ‘hypothetical protein’.”

      For the pangenome analysis of the seven ST74 and ten ST10 isolates, we have continued to use the Prokka annotated draft genome assemblies for input to Panaroo. 

      (4) The authors are lauded for the use of multiple strains of ST10 and ST74 in the in vitro experiment. While results for ST74 were more consistent, readouts from ST10 were more heterogeneous (Figures 5 and 6). This is interesting, as the tested ST10 were mostly clade 1, so ST10 was, as expected, of lower genetic diversity compared to the tested ST74 (partly shown in Figure 1D). Could the authors confirm this by constructing an SNP table separately for tested ST10 and ST74? Additionally, the tested ST10 did not represent the phylogenetic diversity of the global epidemiology, and this limitation should be reflected in the Discussion.

      In response to the reviewer’s comments, we have provided a detailed SNP table (Supplementary Table 12) to further clarify the genetic diversity within the tested ST10 and ST74 strains. 
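      For readers unfamiliar with this kind of table, a pairwise SNP distance table can be sketched as below. This is a minimal illustration under stated assumptions, not the method used to build Supplementary Table 12: the input is assumed to be an equal-length core-genome alignment, ambiguous bases and gaps are skipped, and the isolate names and sequences are hypothetical toy data.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count aligned positions where two sequences differ.

    Positions where either sequence has a character in `ignore`
    (ambiguous base or gap) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def snp_table(alignment):
    """Build a symmetric table of pairwise SNP distances.

    `alignment` maps isolate name -> aligned sequence.
    """
    names = sorted(alignment)
    table = {n: {m: 0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        d = snp_distance(alignment[a], alignment[b])
        table[a][b] = table[b][a] = d
    return table


# Hypothetical toy alignment; real input would be a core-genome SNP alignment.
toy = {"iso1": "ACGTACGT", "iso2": "ACGTACGA", "iso3": "ACNTTCGA"}
table = snp_table(toy)
```

      Building one such table per sequence type makes it straightforward to compare the range of within-ST10 distances against within-ST74 distances.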

      Additionally, we have expanded on the limitation regarding the phylogenetic diversity of the ST10 isolates in the Discussion, highlighting how the strains used in the in vitro experiments may not fully represent the global epidemiological diversity of S. Dublin ST10. The new text now reads:

      “This study has limitations, including a focus on ST10 isolates from clade 1, which do not represent global phylogenetic diversity. Nonetheless, our pangenome analysis identified >900 uncharacterised genes unique to ST74, offering potential targets for future research. Another limitation is the geographic bias in available genomes, with underrepresentation from Asia and South America. This reflects broader disparities in genomic research resources but may improve as public health genomics capacity expands globally.”

      (5) The comparative genomics between ST10 and ST74 can be further improved to allow more interpretation of the experiments. Why were only SPI-1, 2, 6, and 19 included in the search for virulome, how about other SPIs? ST74 lacks SPI-19 and has truncated SPI-6, so what would explain the larger genome size of ST74? Have the authors screened for other SPIs using more well-annotated databases or references (S. Typhi CT18 or S. Typhimurium ST313)? The mismatching between in silico prediction of invasiveness and phenotypes also warrants a brief discussion, perhaps linked to bigger ST74 genome size (as intracellular lifestyle is usually linked with genome degradation).

      Systematic screening for SPIs with detailed reporting on individual genes and known effectors is still an area of development in Salmonella comparative genomics. In our characterisation of the virulome in this S. Dublin dataset, we decided to focus on SPI-1, SPI-2, SPI-6 and SPI-19, as these had been identified in previous studies and were considered most likely to be linked to the invasive phenotype of S. Dublin. We thought the truncation of SPI-6 and lack of SPI-19 in ST74 compared to the ST10 isolates would provide a basis to explore genomic differences in the two genotypes, with the screening for individual genes on each SPI reported in Supplementary Figure 7 and Supplementary Table 9.

      We have expanded upon the mismatch between the in silico prediction of invasiveness and the phenotypes in the Discussion. We now explore the increased genome size and intracellular replication of the ST74 population. We hypothesise that invasiveness has not been studied as thoroughly in zoonotic iNTS as in human-adapted iNTS and S. Typhi, and that the increased genome content may be required for survival in different host species. The new text now reads:

      “Our phenotypic data demonstrated a striking difference in replication dynamics between ST10 and ST74 populations in human macrophages. ST74 isolates replicated significantly over 24 hours, whereas ST10 isolates were rapidly cleared after 9 hours of infection. ST74 induced significantly less host cell death during the early-mid stage of macrophage infection, supported by limited processing and release of IL-1β at 9 hpi. While NTS are generally potent inflammasome activators (60), most supporting data come from laboratory-adapted S. Typhimurium strains. Our findings suggest that ST74 isolates may employ immune evasion mechanisms to avoid host recognition and activation of cell death signaling in early infection stages. Similar trends have been observed with S. Typhimurium ST313, which induces less inflammasome activation than ST19 during murine macrophage infection (61). This could facilitate increased replication and dissemination at later stages of infection. Consistent with this, we observed comparable cytotoxicity between ST10 and ST74 isolates at 24 hpi, suggesting ST74 induces cell death via alternative mechanisms once intracellular bacterial numbers are unsustainable. Further research is needed to identify genomic factors underpinning these observations.”

      (6) On the epidemiology scale, ST10 is more successful, perhaps due to its ongoing adaptation to replication inside GI epithelial cells, favouring shedding. ST74 may tend to cause more invasive disease and less transmission via fecal shedding. The presence of T6SS in ST10 also can benefit its competition with other gut commensals, overcoming gut colonization resistance. The reviewer thinks that these details should be more clearly rephrased in the Discussion, as the results highly suggested different adaptations of two genotypes of the same serovar, leading to different epidemiological success.

      We thank the reviewer for highlighting that we could rephrase this important point. We have added text to the Discussion to better interpret the differences between the two genotypes of S. Dublin and how these relate to their different epidemiological success. The new text now reads:

      “While machine learning predicted lower invasiveness for ST74 compared to ST10, the increased genomic content of ST74 may support higher replication in macrophages. We speculate that increased intracellular replication could enhance systemic dissemination, though this requires in vivo validation. Invasiveness of S. enterica is often linked to genome degradation (4,62–64). However, this is mostly based on studies of human-adapted iNTS (ST313) and S. Typhi, leaving open the possibility that the additional genomic content of ST74 supports survival in diverse host species. An uncharacterised virulence factor may underlie this replication advantage. Collectively, these findings highlight phenotypic differences between S. Dublin populations ST10 and ST74. Enhanced intra-macrophage survival of ST74 could promote invasive disease, whereas the prevalence of ST10 may relate to better intestinal adaptation and enhanced faecal shedding. In vivo models are needed to test this hypothesis. Interestingly, the absence of SPI-19 in ST74, which encodes a T6SS, may reflect adaptation to enhanced replication in macrophages. SPI-19 has been linked to intestinal colonisation in poultry (23,56) and mucosal virulence in mice (56). It’s possible that the efficient replication of ST74 in macrophages might compensate for the absence of SPI-19, relying instead on phagocyte uptake via M cells or dendritic cells. The larger pangenome of ST74 compared to ST10 could further enhance survival within hosts. These findings highlight important knowledge gaps in zoonotic NTS host-pathogen interactions and drivers of emerging invasive NTS lineages with broad host ranges.”

      Reviewer #2 (Public review): 

      This is a comprehensive analysis of Salmonella Dublin genomes that offers insights into the global spread of this pathogen and region-specific traits that are important to understanding its evolution. The phenotyping of isolates of ST10 and ST74 also offers insights into the variability that can be seen in S. Dublin, which is also seen in other Salmonella serovars, and reminds the field that it is important to look beyond lab-adapted strains to truly understand these pathogens. This is a valuable contribution to the field. The only limitation, which the authors also acknowledge, is the bias towards S. Dublin genomes from high-income settings. However, there is no selection bias; this is simply a consequence of publicly available sequences.

      Reviewer #1 (Recommendations for the authors): 

      (1) The Abstract did not summarize the main findings of the study. The authors should rewrite to highlight the key findings in genomic epidemiology (low AMR generally, novel plasmid of which Inc type, etc.) and the in vitro experiments. The findings clearly illustrate the differing adaptations of the two genotypes. Suggest to omit 'economic burden' and 'livestock' as this study did not specifically address them.

      We agree with the Reviewer and have re-written the abstract to directly reflect the major outcomes of the research. We have also deleted wording such as ‘livestock’, ‘economic burden’ and ‘One Health’ as we did not specifically address these issues as highlighted by the Reviewer. 

      (2) Figure 2: The MCC tree should include posterior support in major internal nodes. The current colour scheme is also confusing to readers (columns 1, 2). Suggest to revise and include additional key information as columns: major AMR genes (blaCMY-2, strAB, floR) and mer locus, so this info can be visualized in the main figure. 

      Thank you for your valuable feedback. We have revised Figure 2 so that the MCC tree includes posterior support on the internal nodes, and we have amended the figure legend to explain the additional coloured internal nodes. We have also amended the heatmap in Figure 2 to include additional white space between the columns, making them easier for readers to distinguish. We did not change the colours in this figure, as we have used the same colours throughout for the different traits reported in this study. Further, we chose to keep the AMR profiles reported in Figure 2 at the level of susceptible, resistant or MDR. This was done to convey an overview of the AMR profiles, and we provide detail on the AMR and HMR determinants in the Supplementary Figures and Tables.

      (3) The manuscript title is not informative, as it did not study the 'dynamics' of the two genotypes. Suggest to revise the study title along the lines of main results.

      Thank you for the feedback on the title. We have amended this to better reflect the main findings of the study, and it now reads as “Distinct adaptation and epidemiological success of different genotypes within Salmonella enterica serovar Dublin”

      (4) The co-occurrence of AMR and heavy metal resistance genes (like mer) is quite common in Salmonella and E. coli. This is not a novel finding. The reviewer would suggest shortening the details related to heavy metal resistance in Results and Discussion, to make the writing more streamlined.

      In line with the Reviewer comments, we have shortened the details in the Results and Discussion on the co-occurrence of AMR and HMR.  

      (5) L185: missing info after n=82. 

      This has been revised to now read as “n=82 from Canada”. 

      (6) I think Vi refers to the capsular antigen, not flagella. Please double-check this.

      Thank you for highlighting this mistake. We have revised all instances.

      (7) L252-253: Which statistic was used to state 'no association'? Also, there is no evidence presented to support 'no fitness cost associated with resistance and virulence'.

      We have removed this sentence.

      (8) L320: Figure 6F is a scatterplot, not PCA. Please confirm.

      The reviewer is correct, this is in fact a scatterplot. We have amended the figure legend and text.

      (9) For Discussion, it would be helpful to compare the phenotype findings with that of other invasive Salmonella like Typhi or Typhimurium ST313.

      Thank you for noting this. We had alluded to findings from ST313 but have now expanded this to include further comparisons to S. Typhimurium ST313 and added references for these within the Discussion. The additional text now reads:

      “Similar trends have been observed with S. Typhimurium ST313, which induces less inflammasome activation than ST19 during murine macrophage infection (61). This could facilitate increased replication and dissemination at later stages of infection.”

      “Invasiveness of S. enterica is often linked to genome degradation (4,62–64). However, this is mostly based on studies of human-adapted iNTS (ST313) and S. Typhi, leaving open the possibility that the additional genomic content of ST74 supports survival in diverse host species. An uncharacterised virulence factor may underlie this replication advantage.”

      (10) L440: no evidence for "successful colonization" of ST74. Actually, the findings suggested otherwise.

      Thank you for picking this up, we have amended the sentence to better reflect the findings. The amended text now reads as:

      “It’s possible that the efficient replication of ST74 in macrophages might compensate for the absence of SPI-19, relying instead on phagocyte uptake via M cells or dendritic cells. The larger pangenome of ST74 compared to ST10 could further enhance survival within hosts.”

      (11) L460-461: The data did not show an increasing trend of iNTS related to S. Dublin.

      Thank you for identifying this. This sentence has been revised accordingly and now reads as:

      “While the data did not indicate an increasing trend of iNTS associated with S. Dublin, the potential public health risk of this pathogen suggests it may still warrant considering it a notifiable disease, similar to typhoid and paratyphoid fever.”

      (12) L465: Data were not analyzed explicitly in the context of animal vs. human. Suggest omitting 'One Health' from the conclusion.

      Thank you for the suggestion. We have omitted “One Health” from the conclusion.

      (13) L500: Was the alignment not checked for recombination using Gubbins? The approach here is inconsistent with the method described in the subtree selected for BEAST analysis (L546).

      We have now applied Gubbins to the phylogenetic tree constructed using IQTREE, and the methods and results have been updated accordingly.

      (14) What was the output of TempEst? Correlation or R² value?

      We have now included the R² value from TempEst and reported this in the manuscript.

      (15) L556: marginal likelihood to allow evaluation of the best-fit model. Please rephrase to state this clearly.

      We have rephrased this in the manuscript to state this clearly.

    1. eLife Assessment

      This valuable study reports that epididymal proteins are required for embryogenesis after fertilization. The data presented are generally supportive of the conclusion and considered solid. This work will be of interest to reproductive biologists and andrologists.

    2. Reviewer #1 (Public review):

      Summary:

      The main observation that sperm from the CRISP1 and CRISP3 KO lines are less developmentally competent post-fertilization is convincing. The data showing progressive acquisition of the sperm defects during epididymal transport and the exchange fluid studies showing the altered epididymal environment are important. However, the molecular characterization of the mechanism(s) that leads to these defects requires additional studies.

      Strengths:

      The generation of these double mutant mice is valuable for the field. Moreover, the fact that the double mutant line of Crisp 1 and 3 is phenotypically different from the Crisp 1 and 4 line suggests different functions of these epididymal proteins. The methods used to demonstrate that developmental defects are largely due to post-fertilization defects are also a considerable strength. The initial characterization showing that these sperm have altered intracellular Ca2+ levels and increased rates of DNA fragmentation is valuable. The increased fragmentation of control sperm DNA when exposed to mutant epididymal fluid is significant and an excellent platform for future studies.

      Weaknesses:

      The study is mechanistically incomplete because evidence of how these proteins alter the environment is not shown. What are the target(s) of these proteins that result in increased Ca2+?

    3. Reviewer #2 (Public review):

      Summary:

      The study highlights the role of CRISP1 and CRISP3, two epididymal proteins, in early embryo development through DNA integrity. The authors demonstrate that C1/C3 DKO sperm exhibit defects in DNA integrity, probably due to Ca2+ dysregulation in the epididymis. However, direct evidence for this mechanism requires further experiments. The finding that the epididymal environment is involved in embryogenesis is significant, but some results on the fertilizing ability of C1/C3 DKO sperm were similar to those of the previous report. This raises concerns about novelty.

      Strengths:

      The authors demonstrate that CRISP1 and CRISP3 regulate Ca2+ in the epididymal fluid, and loss of CRISP1 and CRISP3 disrupts Ca2+ regulation in the epididymal fluid, leading to sperm DNA fragmentation and impaired embryonic development after fertilization. This proposed mechanism is both novel and intriguing, offering valuable insights into the epididymal control of sperm quality.

      Weaknesses:

      The evidence supporting the mechanism of CRISP1 and CRISP3 in calcium regulation within epididymis and its contribution to the sperm DNA damage remains limited.

      Major comments:

      The data provided in this manuscript (Figure 2A and B) appear to overlap with data in a previously published paper (PMID: 33037689), despite differences in the duration of in vivo fertilization after mating. The results in both studies show similar findings, raising concerns about potential data redundancy.

      As shown in Figure 6A, while wild-type sperm were exposed to the epididymal fluid of C1/C3 DKO mice, the wild-type sperm exhibited DNA fragmentation. Additionally, when wild-type sperm were exposed to the epididymal fluid of wild-type mice with 10 mM Ca2+, DNA fragmentation is still observed. Therefore, the authors conclude that the DNA fragmentation in C1/C3 DKO sperm is due to the increased level of the Ca2+. However, the connection between the DNA damage in wild-type sperm exposed to the epididymal fluid of C1/C3 DKO mice and the increased levels of Ca2+ remains unclear. To clarify this, it is suggested that intracellular calcium levels in the wild type sperm should be analyzed before and after exposure to the epididymal fluid of C1/C3 DKO mice (or before and after adding 10 mM Ca2+ into wild-type fluid). Furthermore, the author should explain detailed information on epididymal fluid collection, because Ca2+ levels vary between different sections of the epididymis.

      In lines 321-323, the authors mention the selection system of the female reproductive tract that only allows high-quality sperm to reach the eggs (Cummins and Yanagimachi 1982), but this paper is not listed in the bibliography. It is important to ensure proper referencing.

      The discussion section is too long and difficult to follow well because there is redundancy of the results in many parts. It is recommended to shorten it by focusing only on relevant and important information.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The main observation that the sperm from CRISP proteins 1 and 3 KO lines are post-fertilization less developmentally competent is convincing. However, the molecular characterization of the mechanism that leads to these defects and the temporal appearance of the defects requires additional studies.

      We thank the reviewer for the valuable comments. As requested, additional experiments were carried out to analyze both the molecular mechanisms and the temporal appearance of the observed defects. Our results showed that DNA integrity defects appear during epididymal maturation and/or storage (see Figure 5B), that the epididymal fluid contributes to sperm DNA fragmentation defects (See Figure 6A) and that these defects seem not to be due to an increase in oxidative stress (Figure 5C) but rather to a dysregulation in Ca<sup>2+</sup> homeostasis within the epididymis (Figure 6A,B).

      Strengths:

      The generation of these double mutant mice is valuable for the field. Moreover, the fact that the double mutant line of Crisp 1 and 3 is phenotypically different from the Crisp 1 and 4 line suggests different functions of these epididymis proteins. The methods used to demonstrate that developmental defects are largely due to post-fertilization defects are also a considerable strength. The initial characterization showing that these sperm have altered intracellular Ca<sup>2+</sup> levels and increased rates of DNA fragmentation is valuable.

      We thank the reviewer for the positive comments on our work.

      Weaknesses:

      The study is mechanistically incomplete because there is no direct demonstration that the absence of these proteins alters the epididymal environment and fluid, wherein during the passage through the epididymis the sperm become affected. Also, a direct demonstration of how the proteins in question cause or lead to DNA damage and increased Ca<sup>2+</sup> requires further characterization.

      The new experiments included in the revised version (see Figure 6A) showed that exposure of control WT sperm to epididymal fluid from mutant mice leads to an increase in sperm DNA fragmentation levels, confirming that the absence of CRISP1 and CRISP3 alters the epididymal fluid wherein the sperm become affected. In addition, new observations showing that WT sperm exposed to WT epididymal fluid in the presence of Ca<sup>2+</sup> also exhibit higher DNA fragmentation levels (Figure 6A), together with the finding that mutant sperm exhibit higher intracellular Ca<sup>2+</sup> levels (Figure 6B) but no higher levels of ROS, strongly support a dysregulation in Ca<sup>2+</sup> homeostasis within the epididymis and sperm as the main cause of the DNA integrity defects.

      Reviewer #2 (Public Review):

      The authors showed that CRISP1 and CRISP3, secreted proteins in the epididymis, are required for early embryogenesis after fertilization through DNA integrity in cauda epididymal sperm. This paper is the first report showing that the epididymal proteins are required for embryogenesis after fertilization. However, some data in this paper (Table 1 and Figure 2A) overlap with a published paper (Curci et al., FASEB J, 34, 15718-15733, 2020; PMID: 33037689). Furthermore, the authors did not provide direct evidence of why the disruption of CRISP1/3 leads to these phenomena (the increased intracellular Ca<sup>2+</sup> level and impaired DNA integrity in sperm). Therefore, if the authors can address the following comments to improve the paper's novelty and clarity, this paper may be worthwhile to readers.

      We thank the reviewer for the constructive comments. Regarding the data included in Table 1 and Figure 2A, it is important to note that Table 1 includes previously unpublished data on embryo development corresponding to C1/C4 DKO mice, for which the data on embryo development corresponding to C1/C3 DKO mice served as a simultaneous control. Figure 2A shows in vivo fertilization results at short times after mating (4 h instead of 18 h), which have not been reported before either.

      Regarding studies to address why the disruption of CRISP1 and CRISP3 leads to defects in DNA integrity and Ca<sup>2+</sup> levels, we have carried out new experiments showing that mutant sperm do not exhibit higher levels of ROS (see Figure 5C), not favoring oxidative stress as the mechanism underlying mutant sperm defects. In addition, we found that DNA integrity defects develop during epididymal transit (Figure 5B) and that exposure of WT sperm to epididymal fluid from mutant mice leads to an increase in sperm DNA fragmentation levels (Figure 6A), confirming that the absence of CRISP1 and CRISP3 alters the epididymal fluid. Finally, our new results showing that WT sperm exposed to WT epididymal fluid in the presence of Ca<sup>2+</sup> also exhibit higher DNA fragmentation levels (Figure 6A), together with the higher intracellular Ca<sup>2+</sup> levels detected in mutant sperm (Figure 6B), strongly support a dysregulation in Ca<sup>2+</sup> homeostasis within the epididymis and sperm as the main cause of the DNA integrity defects.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Overall comments:

      This manuscript investigates the mechanisms whereby the absence of the epididymal CRISP proteins 1 and 3 (Cysteine-Rich Secretory Proteins) causes infertility and lower embryo developmental rates. This strain's infertility seems to have a post-fertilization origin because the rates of in vivo fertilization are like the controls, but the development to the blastocyst stage is decreased. The results of this study show that (1) mutant sperm viability, progressive motility, and morphology are normal;

      (2) in vivo fertilization rates are comparable to controls, but embryo development is reduced;

      (3) in vitro fertilization studies found reduced fertilization rates and activation rates even in zona-free studies;

      (4) additional functional studies showed increased rates of DNA fragmentation and elevated Ca<sup>2+</sup> levels in mutant sperm.

      The results presented are credible and hint that the epididymis might play a role before and after fertilization and directly affect embryo development. However, the study is mechanistically incomplete, as there is no direct demonstration that the absence of these proteins alters the epididymal environment and fluid, wherein the passage through the epididymis the sperm become functionally defective, and whether mutant or control epididymal fluid or purified CRISP proteins can change, either reduce or overcome, respectively, the developmental competence of the control or mutant sperm and induce functional changes in the counterpart sperm. In summary, the main observation that the sperm from CRISP proteins 1 and 3 KO lines are post-fertilization less developmentally competent is significant and important, but the molecular characterization of the defects and the temporal appearance of defects requires additional studies.

      Specific comments:

      (1) Introduction.

      It is too long. The description of the function of the epididymis should be reduced. The functional properties of the Crisp genes should also be substantially shortened.

      As requested, the Introduction has been revised and the descriptions of the epididymis and CRISP proteins have been shortened.

      (2) Results.

      • Lines 140 to 142. Remove these initial lines. Start directly addressing the results of the C1/C3 strain, which is the mutant under consideration here. Referring to the C1/C4 results detracts from the focus of the study.

      As suggested by the reviewer, lines 140 to 142 have been removed.

      • Table 1. Move the two-cell embryo line to the top of the Table and place the Blastocyst line below it. This organization is the conventional method to present this type of data.

      As suggested, the order of the lines in Table 1 has been modified to align with the conventional presentation method.

      • Figures 1 and 2A and B data are solid and support the notion that enough sperm reach the site of fertilization, and that the sperm are defective in their capacity to support embryo development. Figures 2C and D have interesting data, although additional information would strengthen these results. The authors concluded that the sperm were defective in the epididymis. Where in the epididymis? These sperm were all from the cauda. Could the authors collect sperm from the upper portion of the cauda, or midportion, and compare if the defects manifest gradually?

      We appreciate this interesting and appropriate comment from the reviewer. In this regard, all the studies in our work were carried out using sperm from the whole cauda epididymis, which is why we could not determine where in the epididymis the defective sperm appear. In view of this, we have now conducted a comparative DNA fragmentation analysis between caput and cauda sperm from both genotypes. Our findings indicate that while cauda mutant sperm once again showed higher DNA fragmentation levels than controls, caput sperm exhibited levels of DNA damage not significantly different between genotypes. These results confirm that defects in DNA appear following sperm passage through the epididymal caput, supporting the hypothesis that defects in DNA fragmentation manifest during sperm transit through the epididymis and/or during storage in the cauda. These results have been included in the revised version of the manuscript (see lines 235-240/Figure 5B of the revised version).

      • Figure 3 displays the results of in vitro fertilization, either COCs A-C or zona-free fertilization D-F. The results are important and differ from those produced by fertilization in vivo. The authors indicate that these confirm that the in vivo conditions overcome in vitro defects. However, this study never addresses the reason behind it. Is there less expression of proteins related to these functions, or the function of some proteins is compromised? The authors should advance a hypothesis or a rationale to explain these results.

      As indicated by the reviewer, our results showed differences between the fertilization rates observed for mutant mice under in vivo and in vitro conditions, as previously observed for all our single and multiple KO models (Da Ros et al., 2008; PMID: 18571638, Brukman et al., 2016; PMID: 26786179, Weigel Muñoz, 2018; PMID: 29481619, Ernesto et al., 2015; PMID: 26416967, Carvajal et al., 2018; PMID: 30510210) and also reported by other groups (Okabe et al., 2007; PMID: 17558467). In this regard, it has been well established that, although millions of sperm are ejaculated into the female tract, only a few (approximately one per oocyte) reach the fertilization site (i.e. the ampulla) (Cummins and Yanagimachi, 1982; doi:10.1002/mrd.1120050304). This efficient selection system by the female reproductive tract leads to the arrival of only the best sperm at the fertilization site, even in males with reproductive deficiencies, thereby “masking” sperm defects that can be detected under in vitro conditions due to the competition between good and bad quality sperm for the egg. Thus, although we cannot exclude other mechanisms to explain the commonly observed differences between in vivo and in vitro fertilization rates, our rationale is that the natural and efficient sperm selection process that takes place within the female reproductive tract masks sperm defects that can otherwise be detected under the competitive in vitro conditions. This explanation is now included in the discussion of the revised version of the manuscript (see lines 320-325).

      • Data in Figures 4 and 5 support the interpretation of the authors. However, it is necessary to establish the level of oxidative stress in the mutant sperm vs. the controls. Also, a question to explore is for how long does the sperm need to reside in that mutant environment to start undergoing the DNA fragmentation reported?

      In response to the valuable request from the reviewer regarding the level of oxidative stress in sperm, we have analyzed reactive oxygen species (ROS) levels in mutant and control epididymal sperm. Our results showed that ROS levels in mutant sperm were not higher than those observed in the control group, supporting the idea that mechanisms other than oxidative stress may be leading to the increased DNA fragmentation observed in mutant sperm. These results are now included in the revised version of the manuscript (see Figure 5C).

      Regarding the question on how long the sperm need to reside in the mutant environment to undergo DNA fragmentation, recent experiments carried out in response to this reviewer in which we analyzed DNA fragmentation in caput sperm led us to conclude that DNA fragmentation develops during epididymal transit and/or storage in the cauda. While these observations do not precisely define the time within the epididymis that sperm require for exhibiting DNA fragmentation, our additional new in vitro experiments analyzing the effect of epididymal fluids on sperm DNA integrity showed that exposure of WT sperm to DKO fluid for only 1 hr already leads to an increase in DNA fragmentation (see Figure 6A of the revised manuscript), suggesting that sperm do not need long periods within the mutant environment to be affected.

      (3) The length of the Discussion section should be shortened, especially by not recapitulating data presented in the Results section.

      As requested by the reviewer, sections recapitulating results have been modified.

      Minor comments:

      (1) The sentence in lines 171 and 172 is unclear, "However, despite the short time after mating, once again, the in vivo fertilized eggs corresponding to the mutant group exhibited clear defects to reach the blastocyst stage in vitro compared to controls." What do the authors mean by short time? It is the expected time, correct?

      It is well established that after copulatory plug formation, most oocytes are fertilized within 2 to 8 hours, with fertilization rates that increase over time: 0–5% at 1.5 hours post-mating, 40% at 4 hours post-mating, and more than 90% at 7 hours after mating (Muro et al., 2016; PMID: 26962112, La Spina et al., 2016; PMID: 26872876). In order to examine whether the embryo development defects observed for mutant mice were due to a delayed arrival of sperm to the ampulla, we decided to analyze the percentage of fertilized eggs recovered from the ampulla at “short times” (4 hours) after mating. This avoids the possibility that the prolonged stay of sperm within the female tract under the usual “overnight mating” schedule could give defective sperm enough time to reach the ampulla and, finally, fertilize the eggs (i.e. delayed fertilization). Our results showed that, despite the expected lower fertilization rates observed for both control and mutant males when analyzed just 4 hours after mating, the fertilized eggs corresponding to the mutant group still exhibited clear defects in developing into blastocysts compared to controls, not favoring the idea that embryo development defects were due to delayed fertilization. The sentence in lines “171 and 172” has been modified in the revised version of the manuscript to better explain this conclusion (see lines 152-155 of the revised version).

      (2) Line 177. Mutant epididymal sperm already carry defects leading to embryo development failure. Under this subheading, the authors compare within the same female the ability of mutant and control sperm delivered into different horns to support fertilization and embryo development. They show that the embryo development induced by mutant sperm is diminished vs. controls under very similar conditions, confirming the previous results of post-fertilization failure. The data also answers the question raised by the authors of whether the fertilization defects appear during or after epididymal transit; the interpretation of the results is the functional defects in the sperm are present before the transport into the female tract. Important unaddressed questions are, could these defects begin even earlier before arriving at the cauda? Did the authors try to incubate the mutant sperm with the epididymal fluid of WT mice to examine if the sperm defects could be rescued? The opposite experiment could also be performed, where WT sperm are incubated with the epididymal fluid of mutant mice, and the treated sperm examined for altered Ca<sup>2+</sup> levels or DNA fragmentation.

      First of all, we would like to clarify that our question about whether the fertilization defects appear “during or after epididymal transit” was in fact referring to whether defects appear during epididymal maturation or later on, at the moment of ejaculation. In this regard, our in vivo and in vitro fertilization studies allowed us to conclude that defects were already present in epididymal sperm without excluding the possibility that additional defects could appear at the vas deferens or at the moment of ejaculation due to the contribution of seminal plasma secretions.

      Regarding whether sperm defects could appear even earlier, before arriving at the cauda, we have now analyzed DNA fragmentation in caput vs cauda sperm from both mutant and control mice, observing differences between genotypes only for cauda sperm. Based on these observations, we conclude that DNA integrity defects appear within the epididymis after sperm passage through the caput, either when sperm reach the corpus or the cauda epididymis, or during their storage within the cauda region.

      Also, as suggested by the reviewer, we incubated WT sperm in vitro with epididymal fluid from DKO mice (and vice versa) and then analyzed DNA fragmentation levels. Results showed that exposure of control sperm to the mutant epididymal fluid for 1 hr significantly increased DNA fragmentation levels. When mutant sperm (exhibiting higher levels of DNA fragmentation than control sperm) were exposed to epididymal fluid from WT mice, no differences between groups were observed. Together, these results confirm both that the epididymal fluid from mutant mice contributes to the higher DNA fragmentation levels detected in mutant sperm, and that normal epididymal fluid would not be able to rescue the DNA fragmentation present in mutant cells. These results are now included in the revised version of the manuscript (see Figure 6A).

      (3) Lines 203 to 216. In these paragraphs the authors indicate "that mutant sperm had a lower percentage of fertilization and lower rates of blastocysts (Figure 3D, E), indicating that defects in egg coat penetration were not responsible for embryo development failure." Later, they indicate that a few eggs fertilized by mutant sperm failed to activate. It is shown that Ca<sup>2+</sup> oscillations are normal, indicating that the defects lie elsewhere. Could the authors propose a mechanism based on their sperm DNA defects?

      As described in the Results and Discussion sections of the original manuscript, we decided to investigate the existence of possible defects in sperm DNA fragmentation based on evidence indicating that delays in early embryo development may result from the time taken by the egg to repair damaged paternal DNA (Esbert et al., 2018; PMID: 30259705, Newman et al., 2022; PMID: 34954800, Nguyen et al., 2023; PMID: 37658763). In this regard, it is known that time is needed before the first embryonic cell division for activation of the egg DNA repair machinery (Martin et al., 2019; PMID: 30541031, Newman et al., 2022; PMID: 34954800) and that increased sperm DNA damage may necessitate more time for repair by the oocyte (Martin et al., 2019; PMID: 30541031, Newman et al., 2022; PMID: 34954800). Based on this, we decided to examine possible DNA damage in sperm. Our finding that sperm DNA fragmentation was in fact clearly increased in mutant sperm led us to propose that delays in early embryo development in our mutant colonies may result from the time required by the egg to repair sperm DNA fragmentation.

      (4) The demonstration that C1/C3 sperm have abnormal rates of DNA fragmentation and Ca<sup>2+</sup> levels is significant. Additional studies would strengthen the findings reported here. For example, what are the levels of oxidative stress in these sperm? Are there other changes related to oxidative stress? Performing a TUNEL assay will strengthen the notion of DNA damage demonstrated here with the chromatin dispersion assay.

      As mentioned previously, we analyzed oxidative stress by evaluating ROS levels in control and mutant sperm, observing no differences between genotypes. These results have been included in the revised version of the manuscript (see Figure 5C). We appreciate the suggestion of performing a TUNEL assay in future studies.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      (1) There are some reports that small RNAs gained during the epididymal transit of sperm are essential for embryonic development (e.g., Conine et al., Dev Cell, 46, 470-480, 2018; PMID: 30057276), suggesting that the luminal changes in the Crisp1/3 double KO (dKO) epididymis lead to the phenotype in this study. In fact, there is no evidence of whether CRISP1/CRISP3 secreted from the epididymis exists in cauda epididymal sperm and directly controls the observed phenomena. Also, the authors wrote that there is no strong evidence to exclude the possible role of small RNA in Crisp1/3 dKO sperm (lines 370-372). Therefore, it is at least necessary to measure small RNA abundance in dKO mice.

      As mentioned by the reviewer and as cited in our manuscript, there is a report indicating that the small RNAs gained during epididymal transit may play a role in embryonic development (Conine et al., 2018; PMID: 30057276). However, the need for small RNAs in embryonic development remains a topic of debate (Wang et al. 2020; PMCID: PMC7799177). In this regard, clear evidence indicating that sperm DNA fragmentation is associated with embryo development defects, together with the increase in DNA fragmentation levels observed in mutant sperm, supports sperm DNA damage as one of the causes leading to the observed phenotype in our mutant mice. Moreover, recent experiments carried out in response to Reviewer 1's comments revealed that exposure of control sperm to epididymal fluid from mutant mice significantly increases DNA fragmentation levels, confirming that the absence of CRISP1 and CRISP3 proteins in epididymal fluid contributes to sperm DNA damage in mutant sperm. Finally, whereas oxidative stress might also lead to embryo development impairment as mentioned in our original manuscript, recent evaluation of ROS levels in control and mutant sperm carried out in response to Reviewer 1’s comments did not show higher ROS levels in mutant sperm. Thus, although, as mentioned in the manuscript, we do not exclude the possibility that small RNAs may also contribute to embryo development defects, our observations support DNA fragmentation and a dysregulation in Ca<sup>2+</sup> homeostasis within the epididymis and sperm as the main causes of embryo development failure in our mutant males. The experiments using epididymal fluid (Figure 6A) and those evaluating ROS levels (Figure 5C) have been included in the revised version of the manuscript and discussed accordingly.

      (2) Lines 245-248 and 354-374: According to Figure 5C, the intracellular Ca<sup>2+</sup> level significantly increased in Crisp1/3 dKO sperm compared to control. The author hypothesized that this increase could destroy sperm DNA integrity, causing defects in early embryogenesis. However, the authors did not show the direct evidence.

      Specifically, as CRISP1 inhibits CatSper (line 95), the authors believed the increased Ca<sup>2+</sup> level in Crisp1/3 dKO sperm was observed. Crisp1/3 dKO and Crisp1/4 dKO mice share the disruption of Crisp1, but the phenotype is totally different. Thus, the authors should also examine the CatSper activity in Crisp1/3 dKO sperm.

      We appreciate the reviewer's insightful comments. In this regard, whereas the C1/C3 and C1/C4 DKO colonies share the disruption of Crisp1, the intracellular Ca<sup>2+</sup> levels in these two colonies are different, as no increase in sperm intracellular Ca<sup>2+</sup> was detected in C1/C4 DKO mice. Thus, this difference in intracellular Ca<sup>2+</sup> levels might explain the different embryo development phenotypes observed in our two DKO colonies. In this regard, our results revealed that sperm intracellular Ca<sup>2+</sup> levels differ depending on the Crisp gene being deleted. Whereas the lack of Crisp1 did not affect intracellular sperm Ca<sup>2+</sup> levels (Weigel Muñoz et al., 2018; PMID: 29481619), there was an increase in Ca<sup>2+</sup> levels in CRISP2 KO sperm (Brukman et al., 2016; PMID: 26786179) and a decrease in sperm when Crisp4 was deleted (Carvajal 2019, Ph.D Thesis). Thus, although the ability of CRISP3 to regulate sperm Ca<sup>2+</sup> channels has not yet been reported, the existence of functional compensations between homologous CRISP members (Curci et al., 2020; PMID: 33037689) makes it complicated to draw straightforward conclusions based on the behavior of each individual protein in Ca<sup>2+</sup> regulation. In fact, while the lack of CRISP1 and CRISP4 does not affect sperm Ca<sup>2+</sup> concentration (Carvajal 2019, Ph.D Thesis), the simultaneous lack of CRISP1 and CRISP3 produced an increase in Ca<sup>2+</sup> levels and the lack of the four CRISP proteins showed a decrease in the intracellular levels of the cation after capacitation (Curci et al., 2020). Based on these observations, we conclude that the absence of CRISP1 may or may not lead to altered intracellular Ca<sup>2+</sup> levels depending on the other simultaneously deleted gene/s.

      The authors hypothesize that the increased Ca<sup>2+</sup> level may lead to damaged DNA integrity by citing a published paper (lines 360-363). In the published paper, the authors examined the influence of the luminal fluid of the epididymis and vas deferens on sperm chromatin fragmentation (Gawecka et al., 2015). However, they did not mention the increased DNA fragmentation in epididymal sperm when these sperm were incubated with Ca<sup>2+</sup> or Mn<sup>2+</sup>. So, the authors' hypothesis is an over-interpretation. Thus, the correlation between the intracellular Ca<sup>2+</sup> level and DNA integrity in sperm is still unclear. So, the authors should show with direct evidence why the increased Ca<sup>2+</sup> level leads to DNA fragmentation.

      We appreciate the reviewer’s comment regarding the work by Gawecka et al. (2015), and the opportunity to clarify the proposed mechanism underlying our observations. In the above-mentioned paper, the authors reported that when mouse epididymal or vas deferens sperm were incubated with divalent cations (Ca<sup>2+</sup> and Mn<sup>2+</sup>) in the presence of luminal fluid, they were induced to degrade their DNA in a process termed sperm chromatin fragmentation (SCF). The fact that both the ejaculated and epididymal mutant sperm used in our studies had been exposed to epididymal fluid lacking CRISP proteins known to regulate sperm Ca<sup>2+</sup> channels opened the possibility that changes in Ca<sup>2+</sup> levels within the epididymal fluid and/or sperm could be responsible for the higher DNA fragmentation levels observed in mutant cells. In this regard, it is important to note that, as requested by Reviewer 1, we performed additional in vitro experiments in which WT epididymal sperm were exposed to mutant or WT epididymal fluid in the presence or absence of Ca<sup>2+</sup>, and DNA fragmentation was analyzed at the end of incubation. Results showed a significant increase in DNA fragmentation in WT sperm exposed to either mutant epididymal fluid or WT fluid in the presence of Ca<sup>2+</sup> (Figure 6A). We believe these observations, together with the higher intracellular Ca<sup>2+</sup> levels detected in DKO sperm (Figure 6B), provide strong evidence supporting changes in Ca<sup>2+</sup> homeostasis in the epididymis and sperm as the main cause of the observed sperm DNA integrity defects. This could be mediated by the activation of Ca<sup>2+</sup>-dependent nucleases present within the epididymal fluid and/or sperm cells as previously suggested (Shaman et al., 2006; PMID: 16914690, Sotolongo et al., 2005; PMID: 15713834, Boaz et al., 2008; PMID: 17879959, Dominguez and Ward, 2009; PMID: 19938954). These observations have now been included and discussed in the revised version of the manuscript (see lines 245-265 and 427-439).

      Minor Comments:

      (3) Standards for measuring rates should be clarified, such as two-cell rates are determined by dividing the number of two-cell embryos by the total number of eggs.

      As requested, standards for measuring rates have now been clarified in the corresponding figure legends.

    1. eLife Assessment

      This study provides valuable information on a novel gene that regulates meiotic progression in both male and female meiosis. The evidence supporting the conclusions of the authors is solid. This study will be of interest to developmental and reproductive biologists.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate the role of BEND2, a novel regulator of meiosis, in both male and female fertility. Huang et al have created a mouse model where the full-length BEND2 transcript is depleted but the truncated BEND2 version remains. This mouse model is fertile, and the authors used it to study the role of BEND2 on both male and female meiosis. Overall, the full-length BEND2 appears dispensable for male meiosis. The more interesting phenotype was observed in females. Females exhibit a lower ovarian reserve suggesting that full-length BEND2 is involved in the establishment of the primordial follicle pool.

      Strengths:

      The authors generated a mouse model that enabled them to study the role of BEND2 in meiosis. The role of BEND2 in female fertility is novel and enhances our knowledge of genes involved in the establishment of the primordial follicle pool.

      Weaknesses highlighted previously:

      The manuscript extensively explores the role of BEND2 in male meiosis; however, a more interesting result was obtained from the study of female mice.

    3. Reviewer #2 (Public review):

      In their manuscript entitled "BEND2 is a crucial player in oogenesis and reproductive aging", the authors present their findings that full-length BEND2 is important for repair of meiotic double strand break repair in spermatocytes, regulation of LINE-1 elements in spermatocytes, and proper oocyte meiosis and folliculogenesis in females. The manuscript utilizes an elegant system to specifically ablate the full-length form of BEND2 which has been historically difficult to study due to its location on the X chromosome and male sterility of global knockout animals.

      The authors have been extremely responsive to reviewer critiques and have presented strong data and appropriate conclusions, making it an excellent addition to the field.

    4. Reviewer #3 (Public review):

      Huang et al. investigated the phenotype of Bend2 mutant mice which expressed truncated isoform. Bend2 deletion in male showed fertility and this enabled them to analyze the BEND2 function in females. They showed that Bend2 deletion in females showed decreasing follicle number which may lead to loss of ovarian reserve.

      Strengths:

      They found the truncated isoform of Bend2 and the depletion of this isoform showed decreasing follicle number at birth.

      Weaknesses highlighted previously:

      The authors showed novel factors that impact ovarian reserve. Although the number of follicles and conception rate are reduced in mutant mice, the in vitro fertilization rate is normal and follicles remain at 40 weeks of age. It is difficult to know how critical this is when applied to the human case.

      [Editors' note: We thank the authors for considering the previous recommendations and suggested corrections.]

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript, the authors investigate the role of BEND2, a novel regulator of meiosis, in both male and female fertility. Huang et al have created a mouse model where the full-length BEND2 transcript is depleted but the truncated BEND2 version remains. This mouse model is fertile, and the authors used it to study the role of BEND2 on both male and female meiosis. Overall, the full-length BEND2 appears dispensable for male meiosis. The more interesting phenotype was observed in females. Females exhibit a lower ovarian reserve suggesting that full-length BEND2 is involved in the establishment of the primordial follicle pool. 

      Strengths: 

      The authors generated a mouse model that enabled them to study the role of BEND2 in meiosis. The role of BEND2 in female fertility is novel and enhances our knowledge of genes involved in the establishment of the primordial follicle pool. 

      Weaknesses: 

      The manuscript extensively explores the role of BEND2 in male meiosis; however, a more interesting result was obtained from the study of female mice. 

      We sincerely appreciate the reviewer’s thoughtful evaluation of our work and recognition of the strengths of our study. We are especially grateful for the acknowledgment of the novelty of our findings regarding the role of BEND2 in female fertility. While we extensively characterized the effects of BEND2 depletion in male meiosis, we agree that the phenotype observed in females provides particularly interesting insights into the establishment of the primordial follicle pool.

      Reviewer #2 (Public review): 

      In their manuscript entitled "BEND2 is a crucial player in oogenesis and reproductive aging", the authors present their findings that full-length BEND2 is important for repair of meiotic double strand break repair in spermatocytes, regulation of LINE-1 elements in spermatocytes, and proper oocyte meiosis and folliculogenesis in females. The manuscript utilizes an elegant system to specifically ablate the full-length form of BEND2 which has been historically difficult to study due to its location on the X chromosome and male sterility of global knockout animals.

      The authors have been extremely responsive to reviewer critiques and have presented strong data and appropriate conclusions, making it an excellent addition to the field. 

      We are truly grateful for the reviewer’s thoughtful review and recognition of the key contributions of our study. We appreciate the acknowledgment of how our model overcomes the challenges in studying BEND2 and the importance of our findings in both male and female meiosis. We also value the reviewer’s encouraging comments on our responsiveness to their feedback and the quality of our data and conclusions.

      Reviewer #3 (Public review): 

      Huang et al. investigated the phenotype of Bend2 mutant mice which expressed truncated isoform. Bend2 deletion in male showed fertility and this enabled them to analyze the BEND2 function in females. They showed that Bend2 deletion in females showed decreasing follicle number which may lead to loss of ovarian reserve. 

      Strengths: 

      They found the truncated isoform of Bend2 and the depletion of this isoform showed decreasing follicle number at birth. 

      Weaknesses: 

      The authors showed novel factors that impact ovarian reserve. Although the number of follicles and conception rate are reduced in mutant mice, the in vitro fertilization rate is normal and follicles remain at 40 weeks of age. It is difficult to know how critical this is when applied to the human case. 

      We greatly appreciate the reviewer’s comments and recognition of the strengths of our work. We are grateful for their acknowledgment of our findings related to the truncated isoform of Bend2 and its effect on ovarian reserve. We also agree that, although our study provides important insights, we are still far from directly applying these results to human clinical scenarios. Much further research is needed before these findings can be translated.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have addressed all concerns both editorially and experimentally. This is a very nice manuscript, and I congratulate the authors on their work. 

      We sincerely appreciate your kind words and thoughtful review. Your feedback has been invaluable in improving our manuscript, and we are grateful for your time and effort. Thank you for your support and encouragement!

      Reviewer #2 (Recommendations for the authors):

      In Figure 3, graphs in panels C & D have typos in the early zygotene column where it reads "zyotene". 

      We appreciate your careful review and thank you for pointing out the typos in Figure 4, which have been corrected in the new version of the manuscript.

      Reviewer #3 (Recommendations for the authors): 

      ・Since there are two isoforms of Bend2, and the authors depleted one isoform, this is not suitable to use "full length" in the titles and in the manuscripts. 

      We respectfully disagree with the reviewer’s comment. In our mouse model, we specifically remove the full-length isoform of Bend2. Therefore, we consider it appropriate to refer to it as such in the manuscript. Our results indicate that the full-length isoform is not required to complete meiotic prophase in males but is indispensable for setting up the ovarian reserve in females. We appreciate the reviewer’s input and are happy to clarify this point further if needed.

      ・Is there any reason why the authors used 7-month-old females for in vitro fertilization? These may not be recognized as aged mice, but they seem a bit old for IVF, especially when the ovarian reserve in mutant mice is decreased. If there is any reason, please clarify it. In addition, since the authors added IVF data, which showed a similar fertilization ratio between control and mutant, the authors need to discuss why the litter size was decreased in mutant mice. It may be too strong to conclude "subfertility".

      We used 7-month-old females for IVF because this falls within the age range of the samples analyzed for ovarian reserve, with the oldest females being 8 months old. Regarding the apparent discrepancy between IVF results and litter size, we addressed this in the discussion section of the manuscript: 'Interestingly, our mutant oocyte quality analysis suggests that mature oocytes from mutant females are equally competent to develop into a blastocyst as control ones. These data suggest that the subfertility observed in Bend2 mutants may be due to errors in later developmental stages, such as implantation or organogenesis.' We appreciate the reviewer’s feedback and hope this clarification helps.

    1. eLife Assessment

      This important study shows that a very slow (infraslow) oscillation occurs in voltage recordings from the dentate gyrus of the adult mouse. The authors suggest that it is related to sleep stage and serotonin acting at one type of serotonin receptor in the dentate gyrus. The results are significant because they suggest new insight into how a slow oscillation affects memory through serotonin receptors in the dentate gyrus. Convincing data are provided to support the claims.

    2. Reviewer #1 (Public review):

      Turi, Teng and the team used state of the art techniques to provide convincing evidence on the infraslow oscillation of DG cells during NREM sleep, and how serotonergic innervation modulates hippocampal activity pattern during sleep and memory. First, they showed that the glutamatergic DG cells become activated following an infraslow rhythm during NREM sleep. In addition, the infraslow oscillation in the DG is correlated with rhythmic serotonin release during sleep. Finally, they found that specific knockdown of 5-HT receptors in the DG impairs the infraslow rhythm and memory, suggesting that serotonergic signaling is crucial for regulating DG activity during sleep. Given that the functional role of infraslow rhythm still remains to be studied, their findings deepen our understanding on the role of DG cells and serotonergic signaling in regulating infraslow rhythm, sleep microarchitecture and memory.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigated DG neuronal activity at the population and single cell level across sleep/wake periods. They found an infraslow oscillation (0.01-0.03 Hz) in both granule cells (GC) and mossy cells (MC) during NREM sleep. The important findings are:

      (1) The antiparallel temporal dynamics of DG neuron activities and serotonin neuron activities/extracellular serotonin levels during NREM sleep.

      (2) The GC Htr1a-mediated GC infraslow oscillation.

      Strengths:

      (1) The combination of polysomnography, Ca-fiber photometry, two-photon microscopy and gene depletion is technically sound. The coincidence of microarousals and dips in DG population activity is convincing. The dip in activity in upregulated cells is responsible for the dip at the population level.

      (2) DG GCs express excitatory Htr4 and Htr7 in addition to inhibitory Htr1a, but deletion of Htr1a is sufficient to disrupt DG GC infraslow oscillation, supporting the importance of Htr1a in DG activity during NREM sleep.

      Weaknesses from the original round of review:

      (1) The current data set and analysis are insufficient to interpret the observation correctly [...].

      (2) It is acceptable that DG Htr1a KO induces the reduced freezing in the CFC test (Fig. 6E, F), but it is too much of a stretch that the disruption of DG ISO causes impaired fear memory. There should be a correlation.

      (3) It is necessary to describe the extent of AAV-Cre infection. The authors injected AAV into the dorsal DG (AP -1.9 mm), but the histology shows the ventral DG (Supplementary Fig. 4), which reduces the reliability of this study.

      Comments on revisions:

      Thank you for the clarification of the detection criteria and the quantification of the specific events. This reviewer can now follow the authors' interpretation.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Turi, Teng and the team used state-of-the-art techniques to provide convincing evidence on the infraslow oscillation of DG cells during NREM sleep, and how serotonergic innervation modulates hippocampal activity pattern during sleep and memory. First, they showed that the glutamatergic DG cells become activated following an infraslow rhythm during NREM sleep. In addition, the infraslow oscillation in the DG is correlated with rhythmic serotonin release during sleep. Finally, they found that specific knockdown of 5-HT receptors in the DG impairs the infraslow rhythm and memory, suggesting that serotonergic signaling is crucial for regulating DG activity during sleep. Given that the functional role of infraslow rhythm still remains to be studied, their findings deepen our understanding on the role of DG cells and serotonergic signaling in regulating infraslow rhythm, sleep microarchitecture and memory.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated DG neuronal activity at the population and single cell level across sleep/wake periods. They found an infraslow oscillation (0.01-0.03 Hz) in both granule cells (GC) and mossy cells (MC) during NREM sleep. The important findings are 1) the antiparallel temporal dynamics of DG neuron activities and serotonin neuron activities/extracellular serotonin levels during NREM sleep, and 2) the GC Htr1a-mediated GC infraslow oscillation.

      Strengths:

      (1) The combination of polysomnography, Ca-fiber photometry, two-photon microscopy and gene depletion is technically sound. The coincidence of microarousals and dips in DG population activity is convincing. The dip in activity in upregulated cells is responsible for the dip at the population level.

      (2) DG GCs express excitatory Htr4 and Htr7 in addition to inhibitory Htr1a, but deletion of Htr1a is sufficient to disrupt DG GC infraslow oscillation, supporting the importance of Htr1a in DG activity during NREM sleep.

      Weaknesses:

      (1) The current data set and analysis are insufficient to interpret the observation correctly.

      a. In Fig 1A, during NREM, the peaks and troughs of GC population activities seem to gradually decrease over time. Please address this point.

      b. In Fig 1F, about 30% of Ca dips coincided with MA (EMG increase) and 60% of Ca dips did not coincide with EMG increase. If this is true, the readers can find 8 Ca dips which are not associated with MAs from Fig 1E. If MAs were clustered, please describe this properly.

      c. In Fig 1F, the legend stated the percentage during NREM. If the authors want to include the percentage of wake and REM, please show the traces with Ca dips during wake and REM. This concern applies to all pie charts provided by the authors.

      d. In Fig 1C, please provide line plots connecting the same session. This request applies to all related figures.

      e. In Fig 2C, the significant increase during REM and the same level during NREM are not convincing. In Fig 2A, the several EMG increasing bouts do not appear to be MA, but rather wakefulness, because the duration of the EMG increase is greater than 15 seconds. Therefore, it is possible that the wake bouts were mixed with NREM bouts, leading to the decrease of Ca activity during NREM. In fact, In Fig 2E, the 4th MA bout seems to be the wake bout because the EMG increase lasts more than 15 seconds.

      f. Fig 5D REM data are interesting because the DRN activity is stably silenced during REM. The varied correlation means the varied DG activity during REM. The authors need to address it.

      g. In Fig 6, the authors should show the impact of DG Htr1a knockdown on sleep/wake structure including the frequency of MAs. I agree with the impact of Htr1a on DG ISO, but possible changes in sleep bout may induce the DG ISO disturbance.

      (2) It is acceptable that DG Htr1a KO induces the reduced freezing in the CFC test (Fig. 6E, F), but it is too much of a stretch that the disruption of DG ISO causes impaired fear memory. There should be a correlation.

      (3) It is necessary to describe the extent of AAV-Cre infection. The authors injected AAV into the dorsal DG (AP -1.9 mm), but the histology shows the ventral DG (Supplementary Fig. 4), which reduces the reliability of this study.

      Responses to weaknesses mentioned above have been addressed in the first revision.

      Comments on revisions:

      In the first revision, I pointed out the inappropriate analysis of the EEG/EMG/photometry data and gave examples. The authors responded only to the points raised and did not seem to see the need to improve the overall analysis and description. In this second revision, I would like to ask the authors to improve them. The biggest problem is that the detection criteria and the quantification of the specific event are not described at all in Methods and it is extremely difficult to follow the statement. All interpretations are made by the inappropriate data analysis; therefore, I have to say that the statement is not supported by the data.

      Please read my following concerns carefully and improve them.

      (1) The definition of the event is critical to the detection of the event and the subsequent analysis. In particular, the authors should explicitly describe the definition of MA (microarousal), the trough and peak of the population level of intracellular Ca concentrations, and the onset of the decline and surge of Ca levels.

      (1-1) The authors categorized wake bouts of <15 seconds with high EMG activity as MA (in Methods). What degree of high EMG is relevant to MA and what is the lower limit of high EMG? In Fig 1E, there are some EMG spikes, but it was unclear which spike/wave (amplitude/duration) was detected as MA-relevant spike and which spike was not detected. In Fig 2E, the 3rd MA coincides with the EMG spike, but other EMG spikes have comparable amplitude to the 3rd MA-relevant EMG spike. Correct counting of MA events is critical in Fig 1F, 2F, 4C.

      We have added more information about the MA definition in Methods, including EMG amplitude. Furthermore, we have re-analyzed MA and MA-related calcium signals in Fig1 and Fig2. Fig-S1 shows the traces of EMG amplitude for all MA events shown in Fig1G and Fig2G.
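
      The duration criterion quoted by the reviewer (wake bouts of <15 seconds are categorized as MA) can be illustrated with a minimal sketch. This is not the authors' scoring code: the state labels ("W", "N"), the fixed epoch length, and the function name `classify_wake_bouts` are assumptions made here for illustration, and the EMG amplitude criterion is omitted because its value is not given in the response.

```python
def classify_wake_bouts(states, epoch_s, max_ma_s=15.0):
    """Label each wake bout as a microarousal ("MA") or a wake bout ("wake").

    states   : sequence of per-epoch labels; "W" marks wake (hypothetical coding)
    epoch_s  : epoch length in seconds (assumed constant)
    max_ma_s : bouts strictly shorter than this are labeled "MA"
    Returns a list of (start_time_s, duration_s, label) tuples.
    """
    bouts = []
    start = None
    # A trailing sentinel closes a wake bout that runs to the end of the record.
    for i, state in enumerate(list(states) + [None]):
        if state == "W" and start is None:
            start = i
        elif state != "W" and start is not None:
            duration = (i - start) * epoch_s
            label = "MA" if duration < max_ma_s else "wake"
            bouts.append((start * epoch_s, duration, label))
            start = None
    return bouts
```

      With 4 s epochs, for example, a two-epoch wake bout (8 s) would be labeled as an MA, whereas a five-epoch bout (20 s) would be labeled as wake.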

      (1-2) Please describe the definition of Ca trough in your experiments. In Fig 1G, the averaged trough time is clear (~2.5 s), so I can acknowledge that MA is followed by Ca trough. However, the authors state on page 4 that "30% of the calcium troughs during NREM sleep were followed by an MA epoch". This discrepancy should be corrected.

      We apologize for the misleading statement. We meant 30% of ISO events during NREM sleep. We have corrected this. To detect the calcium trough of ISO, we first calculated a moving baseline (blue line in Fig-S2 below) by smoothing the calcium signals over 60 s, then set a threshold (0.2 standard deviation from the moving baseline) for events of calcium decrease, and finally detected the minimum point (red dots in Fig-S2) in each event as the calcium trough. We have added these details to the Methods.
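
      The three steps described above (moving baseline, threshold, minimum per event) can be sketched as follows. This is a minimal illustration rather than the authors' implementation. It assumes that the smoothing is a centered moving average, that the standard deviation is computed over the whole trace, and that an event is a contiguous run of samples below the threshold; the function names are hypothetical.

```python
from statistics import pstdev


def moving_average(signal, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def detect_troughs(signal, fs, smooth_s=60.0, k_sd=0.2):
    """Return the sample index of the minimum of each calcium-decrease event.

    signal   : list of calcium values
    fs       : sampling rate in Hz
    smooth_s : smoothing span for the moving baseline (60 s in the response)
    k_sd     : threshold in SD units below the baseline (0.2 in the response)
    """
    window = max(1, int(round(smooth_s * fs)))
    baseline = moving_average(signal, window)
    threshold = k_sd * pstdev(signal)  # assumption: SD of the whole trace
    troughs = []
    start = None
    for i, (value, base) in enumerate(zip(signal, baseline)):
        below = (base - value) > threshold
        if below and start is None:
            start = i  # an event of calcium decrease begins
        elif not below and start is not None:
            troughs.append(min(range(start, i), key=signal.__getitem__))
            start = None
    if start is not None:  # event still open at the end of the trace
        troughs.append(min(range(start, len(signal)), key=signal.__getitem__))
    return troughs
```

      The nested-loop moving average is kept for readability; for long recordings a cumulative-sum or library-based filter would be the more practical choice.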

      (1-3) Relating to comment 1-2, I agree that the latency is between MA and Ca trough on page 4, as the authors explain in the methods, but, in Fig 1G, t (latency) is labeled at an incorrect position. Please correct this.

      We are sorry for the mistake in describing the latency in the Methods. The latency was defined as the time difference between the onset of calcium decline (see details below in 1-4) and the onset of the MA. We have corrected this in the revised manuscript. Thus, the labeling in Fig1G was correct.

      (1-4) The authors may want to determine the onset of the decline in population Ca activity and the latency between onset and trough (Fig 1G, latency t). If so, please describe how the onset of the decline is determined. In Fig 1G, 2G, S6, I can find the horizontal dashed line and infer that the intersection of the horizontal line and the Ca curve is considered the onset. However, I have to say that the placement of this horizontal line is super arbitrary. The results (t and Drop) are highly dependent on the position of horizontal line, so the authors need to describe how to set the horizontal line.

      Indeed, we used the onset of calcium decline to calculate the latency as mentioned above. First, we defined the baseline (dashed line in Fig1G) by calculating the average of calcium signals in the 10 s window before the MA (from -15 s to -5 s in Fig1G). The onset of calcium decline is defined as the timepoint where the calcium decrease exceeded 0.05 SD from this baseline. We have added these details to the Methods.
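
      The onset criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: time is expressed relative to the MA onset (t = 0), the SD is taken over the whole peri-event trace, the search for the onset starts at the end of the baseline window, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def decline_onset(times, calcium, base_start=-15.0, base_end=-5.0, k_sd=0.05):
    """Return the time of calcium-decline onset relative to MA onset, or None.

    times      : sample times in seconds, with the MA onset at t = 0
    calcium    : calcium values at those times
    base_start, base_end : baseline window (-15 s to -5 s in the response)
    k_sd       : threshold in SD units (0.05 in the response)
    """
    base_values = [c for t, c in zip(times, calcium) if base_start <= t < base_end]
    baseline = mean(base_values)
    threshold = k_sd * pstdev(calcium)  # assumption: SD of the whole trace
    for t, c in zip(times, calcium):
        # Assumption: only samples after the baseline window can mark the onset.
        if t >= base_end and (baseline - c) > threshold:
            return t
    return None
```

      Under this convention a negative return value means that the calcium decline began before the MA onset, and its absolute value corresponds to the latency between the two onsets.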

      (1-5) In order to follow Fig 1F correctly, the authors need to indicate the detection criteria of "Ca dip (in legend)". Please indicate "each Ca dip" in Fig 1E. As a reader, I would like to agree with the Ca dip detection of this Ca curve based on the criteria. Please also indicate "each Ca dip" in Fig 2E and 2F. In the case of the 2nd and 3rd MAs, do they follow a single Ca dip or does each MA follow each Ca dip? This chart is highly dependent on the detection criteria of Ca dip.

      We have indicated each Ca dip in Fig 1 and Fig 2.

      As I mentioned above, most of the quantifications are not based on the clear detection criteria. The authors need to re-analyze the data and fix the quantification. Please interpret data and discuss the cellular mechanism of ISO based on the re-analyzed quantification.

      As suggested, we have re-analyzed the MA and MA-related photometry signals. Accordingly, parts of Fig1 and Fig2 have been revised. Although there are some small changes, the main results and conclusions remain unchanged.

      Reviewer #3 (Public review):

      Summary:

      The authors employ a series of well-conceived and well-executed experiments involving photometric imaging of the dentate gyrus and raphe nucleus, as well as cell-type specific genetic manipulations of serotonergic receptors that together serve to directly implicate serotonergic regulation of dentate gyrus (DG) granule (GC) and mossy cell (MC) activity in association with an infraslow oscillation (ISO) of neural activity that has been previously linked to general cortical regulation during NREM sleep and microarousals.

      Strengths:

      There are a number of novel and important results, including the modulation of dentate granule cell activity by the infraslow oscillation during NREM sleep, the selective association of different subpopulations of granule cells with microarousals (MA), and the anticorrelation of raphe activity with infraslow dentate activity.

      The discussion includes a general survey of ISOs and recent work relating to their expression in other brain areas and other potential neuromodulatory system involvement, as well as possible connections with infraslow oscillations, microarousals, and sensory sensitivity.

      Weaknesses:

      - The behavioral results showing contextual memory impairment resulting from 5-HT1a knockdown are fine, but are over-interpreted. The term memory consolidation is used several times, as well as references to sleep-dependence. This is not what was tested. The receptor was knocked down, and then 2 weeks later animals were found to have fear conditioning deficits. They can certainly describe this result as indicating a connection between 5-HT1a receptor function and memory performance, but the connection to sleep and consolidation would just be speculation. The fact that 5-HT1a knockdown also impacted DG ISOs does not establish dependency. Some examples of this are:

      – The final conclusion asserts "Together, our study highlights the role of neuromodulation in organizing neuronal activity during sleep and sleep-dependent brain functions, such as memory.", but the reported memory effects (impairment of fear conditioning) were not shown to be explicitly sleep-dependent.

      – Earlier in the discussion it mentions "Finally, we showed that local genetic ablation of 5-HT1a receptors in GCs impaired the ISO and memory consolidation". The effect shown was on general memory performance - consolidation was not specifically implicated.

      – The assertion on page 9 that the results demonstrate "that the 5-HT is directly acting in the DG to gate the oscillations" is a bit strong given the magnitude of effect shown in Fig. 6D, and the absence of demonstration of negative effect on cortical areas that also show ISO activity and could impact DG activity (see requested cortical sigma power analysis).

      – Recent work has shown that abnormal DG GC activity can result from the use of the specific Ca indicator being used (GCaMP6s). (Teng, S., Wang, W., Wen, J.J.J. et al. Expression of GCaMP6s in the dentate gyrus induces tonic-clonic seizures. Sci Rep 14, 8104 (2024). https://doi.org/10.1038/s41598-024-58819-9). The authors of that study found that the effect seemed to be specific to GCaMP6s and that GCaMP6f did not lead to abnormal excitability. Note this is of particular concern given similar infraslow variation of cortical excitability in epilepsy (cf Vanhatalo et al. PNAS 2004). While I don't think that the experiments need to be repeated with a different indicator to address this concern, you should be able to use the 2p GCaMP7 experiments that have already been done to provide additional validation by repeating the analyses done for the GCaMP6s photometry experiments. This should be done anyway to allow appropriate comparison of the 2p and photometry results.

      – While the discussion mentions previous work that has linked ISOs during sleep with regulation of cortical oscillations in the sigma band, oddly no such analysis is performed in the current work even though it is presumably available and would be highly relevant to the interpretation of a number of primary results including the relationship between the ISOs and MAs observed in the DG and similar results reported in other areas, as well as the selective impact of DG 5-HT1a knockdown on DG ISOs. For example, in the initial results describing the cross correlation of calcium activity and EMG/EEG with MA episodes (paragraph 1, page 4), similar results relating brief arousals to the infraslow fluctuation in sleep spindles (sigma band) have been reported also at 0.02 Hz associated with variation in sensory arousability (cf. Cardis et al., "Cortico-autonomic local arousals and heightened somatosensory arousability during NREMS of mice in neuropathic pain", eLife 2021). It would be important to know whether the current results show similar cortical sigma band correlations. Also, in the results on ISO attenuation following 5-HT1 knockdown on page 7 (fig. 6), how is cortical EEG affected? Is ISO still seen in EEG but attenuated in DG?

      – The illustrations of the effect of 5-HT1a knockdown shown in Figure 6 are somewhat misleading. The examples in panels B and C show an effect that is much more dramatic than the overall effect shown in panel D. Panels B and C do not appear to be representative examples. Which of the sample points in panel D are illustrated in panels B, C? It is not appropriate to arbitrarily select two points from different animals for comparison, or worse, to take points from the extremes of the distributions. If the intent is to illustrate what the effect shown in D looks like in the raw data, then you need to select examples that reflect the means shown in panel D. It is also important to show the effect on cortical EEG, particularly in sigma band to see if the effects are restricted to the DG ISOs. It would also be helpful to show that MAs and their correlations as shown in Fig 1 or G as well as broader sleep architecture are not affected.

      – On page 9 of the results it states that GCs and MCs are upregulated during NREM and their activity is abruptly terminated by MAs through a 5-HT mediated mechanism. I didn't see anything showing the 5-HT dependence of the MA activity correlation. The results indicate a reduction in ISO modulation of GC activity but not the MA correlated activity. I would like to see the equivalent of Fig 1,2 G panels with the 5-HT1a manipulation.

      Responses to Reviewer #3 have been addressed in the first revision.

      Reviewer #1 (Recommendations for the authors):

      Minor comment: Several recent publications from different laboratories have shown rhythmic release of norepinephrine (NE) (~0.03 Hz) in the medial prefrontal cortex, the thalamus, and in the locus coeruleus (LC) of the mouse during sleep-wake cycles-> Please add "preoptic area" here

      We have added the citation.

      Reviewer #2 (Recommendations for the authors):

      Minor

      (1) (abstract, page 2 line 9) what kind of "increased activity" did the authors find?

      Increased activity compared to that during wakefulness. We have added this.

      (2) (result, page 4) please define first, early, and late stage of NREM sleep in the methods.

      We have added these in the Methods.

      (3) (result, page 6) please define "the risetime of the phasic increase".

      It refers to the latency between the increase of 5-HT and the MA onset. We have clarified this in the text.

      (4) (supplement Fig 3 legend) please reword "5-HT events" and "5-HT signals" because these are ambiguous.

      We have defined the events in the legend.

      (5) (Fig 5A) please replace the picture without bubbles.

      We have replaced the image in Fig5A.

    1. eLife Assessment

      This important manuscript proposes a dual behavioral/computational approach to assess emotional regulation in humans. The authors present solid evidence for the idea that emotional distancing (as routinely used in clinical interventions for e.g. mood and anxiety disorders) enhances emotional control.